K-means algorithm
Hi,
I am trying to implement the k-means algorithm in Java for my data.
My data is stored in an MS Access database, and my application uses SQL to retrieve the selected records. For testing, I put some sample data into a Vector,
e.g. {fund A, 0.022, 0.45}, {fund B, 0.432, 0.888} etc.
As I understand the theory, the k-means algorithm works by finding the minimum distance between points and then forming clusters among the data.
The concept is as follows.
input:  P = {p1 ... pk} points
        n = no. of clusters (which I set to 2 for testing)
output: C = {c1 ... cn} cluster centroids
        m: P -> {1 ... n} cluster membership

procedure kmeans
    set C to initial values (e.g. p1 can be selected as a starting point)
    for each pi in P
        m(pi) = arg min over j in {1 ... n} of distance(pi, cj)
    while m has changed
        for each i in {1 ... n}
            recompute ci as the centroid of {p | m(p) = i}
        for each pi in P
            m(pi) = arg min over j in {1 ... n} of distance(pi, cj)
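A minimal Java sketch of the procedure above, in case it helps as a starting point. It assumes each point is a plain double[] of numeric values and that the first n points serve as the initial centroids. The class name KMeansSketch, its method names, and the two extra sample points in main are made up for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only; none of these names come from a library.
class KMeansSketch {

    // Squared Euclidean distance over ALL coordinates of two points.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to p (the "arg min" step).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int j = 1; j < centroids.length; j++) {
            if (squaredDistance(p, centroids[j]) < squaredDistance(p, centroids[best])) {
                best = j;
            }
        }
        return best;
    }

    // Returns m, where m[i] is the cluster index (0-based) of points[i].
    // Assumption: the first n points are used as the initial centroids.
    static int[] cluster(double[][] points, int n) {
        int dims = points[0].length;
        double[][] centroids = new double[n][];
        for (int j = 0; j < n; j++) {
            centroids[j] = points[j].clone();
        }

        // Initial assignment of every point to its nearest centroid.
        int[] m = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            m[i] = nearest(points[i], centroids);
        }

        boolean changed = true;
        while (changed) {
            // Recompute each centroid as the mean of its member points.
            double[][] sums = new double[n][dims];
            int[] counts = new int[n];
            for (int i = 0; i < points.length; i++) {
                counts[m[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[m[i]][d] += points[i][d];
                }
            }
            for (int j = 0; j < n; j++) {
                if (counts[j] == 0) continue; // keep the old centroid if a cluster is empty
                for (int d = 0; d < dims; d++) {
                    centroids[j][d] = sums[j][d] / counts[j];
                }
            }

            // Reassign points and record whether any membership changed.
            changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != m[i]) {
                    m[i] = c;
                    changed = true;
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // First two rows are the sample values from the post;
        // the last two are hypothetical extra points.
        double[][] points = {
            {0.022, 0.45}, {0.432, 0.888}, {0.03, 0.40}, {0.50, 0.90}
        };
        System.out.println(Arrays.toString(cluster(points, 2)));
    }
}
```

With these four sample points, main prints [0, 1, 0, 1]: the first and third points end up in one cluster, the second and fourth in the other. Squared distance is used only because it gives the same nearest centroid as the true distance without needing a square root.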
With the vector I have at the moment, I don't see how to separate the data, e.g. how to compare 0.022 with 0.432 in {fund A, 0.022, 0.45}, {fund B, 0.432, 0.888}, since all the values of one record sit at the same index. Does that mean I have to separate the data into different vectors, like
vector a = {0.022}, {0.432}
vector b = {0.45}, {0.888}
in order to compare the distances?
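One possible way to avoid splitting the columns is to keep each record as one object and compute a single distance across all of its numeric values, as k-means is usually described with Euclidean distance on the whole point. The sketch below assumes exactly that. The Fund and FundDemo classes and the distanceTo method are hypothetical names, not part of any existing code.

```java
// Hypothetical container: one object per database row, keeping the
// fund name together with its numeric values.
class Fund {
    final String name;
    final double[] values;

    Fund(String name, double... values) {
        this.name = name;
        this.values = values;
    }

    // Euclidean distance using every numeric value at once,
    // rather than comparing one column at a time.
    double distanceTo(Fund other) {
        double sum = 0.0;
        for (int d = 0; d < values.length; d++) {
            double diff = values[d] - other.values[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}

class FundDemo {
    public static void main(String[] args) {
        Fund a = new Fund("fund A", 0.022, 0.45);
        Fund b = new Fund("fund B", 0.432, 0.888);
        // Computes sqrt((0.022 - 0.432)^2 + (0.45 - 0.888)^2).
        System.out.println(a.name + " to " + b.name + ": " + a.distanceTo(b));
    }
}
```

For the two sample funds the distance comes out at roughly 0.6. Objects like these can be stored in a single Vector or ArrayList, so the name stays attached to its values while the distance still uses both numbers.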
Can someone give me a starting point for the algorithm? I am very new to Java and to algorithms.
I have already set my no. of clusters to 2,
and I have the selected data. I assume the first centroid is the first point p1, e.g. 0.022. But do I then compare it with the next available one in the index (0.432)? I am also quite confused about the minimum distance: how do I determine the minimum distance among all the data values and put points into the same cluster?
I very much appreciate your help.
thank you.
z

