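k-means: "mean" refers to the average, and "k" to the number of clusters the data is grouped into.
Goal of the algorithm: partition the data into k clusters.
1. First, randomly pick k data points from the whole data set to serve as the k initial center points.
2. For each data point, compute its distance to each of the k centers and label the point with the cluster of the nearest center.
3. Once every point has been labeled, recompute the centers: the new center of each cluster is the mean of the points assigned to it.
4. Repeat steps 2 and 3 until the centers no longer change.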
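To make steps 2 and 3 concrete, here is a minimal sketch of a single pass on a made-up one-dimensional data set. The data values, the starting centers, and the variable names are assumptions chosen for illustration; they are not taken from the scripts in this post. The sketch also assumes MATLAB R2016b or later (or GNU Octave), because it relies on implicit expansion.

```matlab
% Toy data: six 1-D points (hypothetical values, for illustration only)
X = [1; 2; 4; 10; 11; 12];
% Assume the random initialisation happened to pick these two points as centers
C = [1; 4];

% Step 2: label each point with the index of its nearest center
D = abs(X - C');            % 6-by-2 matrix of point-to-center distances
[~, I] = min(D, [], 2);     % I(n) is the cluster label of point n

% Step 3: move each center to the mean of the points assigned to it
for k = 1:2
    C(k) = mean(X(I == k));
end

disp(I');   % labels after one pass
disp(C');   % centers after one pass
```

After this single pass the labels are 1 1 2 2 2 2 and the centers move to 1.5 and 9.25. On the next pass the point 4 is closer to 1.5 than to 9.25, so it switches to the first cluster, which is why steps 2 and 3 have to be repeated.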
main.m
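% Start from a clean workspace, command window and figure state
clear; clc; close all;
% Algorithm parameters
TOL = 0.0004;   % tolerance used by the stopping test (not a learning rate)
ITER = 30;      % maximum number of iterations
kappa = 15;     % number of clusters
% Load the data; replace this with your own .mat file if needed.
% The file is expected to contain a matrix X with one sample per row.
data = load('Yale.mat');
X = data.X;
% Run k-means 10 times; each run starts from a different random initialisation
for i = 1:10
    tic;
    [C, I, iter] = myKmeans(X, kappa, ITER, TOL);
    toc
    % Show the number of iterations this run took
    disp(['k-means instance took ' int2str(iter) ' iterations to complete']);
end
% Wait for a key press, then close any open figure windows
pause;
close all;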
myKmeans.m
function [C, I, iter] = myKmeans(X, K, maxIter, TOL)
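% Number of rows (samples) and columns (features) in the data
[vectors_num, dim] = size(X);
% R is a random permutation of the row indices; its first K entries
% select the rows of X used as initial centers
R = randperm(vectors_num);
% Indicator vector: I(n) holds the cluster index assigned to row n of X
I = zeros(vectors_num, 1);
% Center matrix: K centers, so K rows and dim columns, initialised to zero
C = zeros(K, dim);
% Fill the center matrix with the rows of X indexed by R(1:K)
for k=1:K
    C(k,:) = X(R(k),:);
end
% Iteration counter
iter = 0;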
while 1
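    % Assignment step: find the closest center for every point,
    % processing X one row at a time
    for n=1:vectors_num
        % Start by assuming the first center is the closest one
        minIdx = 1;
        % Euclidean distance from this point to the first center
        minVal = norm(X(n,:) - C(minIdx,:), 2);
        for j=2:K
            % Euclidean distance from this point to center j
            dist = norm(C(j,:) - X(n,:), 2);
            if dist < minVal
                % A closer center was found, so remember its index
                minIdx = j;
                minVal = dist;
            end
        end
        % Label the point with its nearest center
        I(n) = minIdx;
    end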
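    % Update step: recompute the K centers
    for k=1:K
        members = (I == k);
        % Keep the previous center if no point was assigned to this
        % cluster, which avoids a division by zero
        if any(members)
            C(k, :) = mean(X(members, :), 1);
        end
    end
    % Mean squared distance from each point to its assigned center
    RSS_error = 0;
    for idx=1:vectors_num
        RSS_error = RSS_error + norm(X(idx, :) - C(I(idx),:), 2)^2;
    end
    RSS_error = RSS_error / vectors_num;
    % Increment the iteration counter
    iter = iter + 1;
    % Stop when the error changed by less than TOL since the previous
    % iteration, a sign that the centers have settled
    if iter > 1 && abs(prev_RSS - RSS_error) < TOL
        break;
    end
    prev_RSS = RSS_error;
    % Stop after maxIter iterations even without convergence
    if iter >= maxIter
        break;
    end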
end
disp(['k-means took ' int2str(iter) ' steps to converge']);
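The nested loops of the assignment step can also be written without loops. The following is a sketch rather than a drop-in replacement for the function above: it assumes squared Euclidean distance and MATLAB R2016b or later (or GNU Octave) for implicit expansion, and the toy points and centers are hypothetical values picked for illustration.

```matlab
% Toy inputs (hypothetical values): four 2-D points and two centers
X = [0 0; 0 1; 5 5; 6 5];
C = [0 0.5; 5.5 5];

% Squared Euclidean distance between every point and every center, using
% ||x - c||^2 = ||x||^2 - 2*x*c' + ||c||^2; the result is N-by-K
D = sum(X.^2, 2) - 2 * (X * C') + sum(C.^2, 2)';

% Index of the nearest center for each row of X
[~, I] = min(D, [], 2);

disp(I');
```

Here the first two points are assigned to center 1 and the last two to center 2. Because the ranking of squared distances is the same as the ranking of distances, taking the square root is unnecessary when only the nearest center is needed.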