import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from sklearn import datasets
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
# Load the data
df = pd.read_csv('https://query.data.world/s/ei6k5toqscwnarxr2ttfavrx2zodxc')
df.head(5)
| | number | density | sugercontent |
|---|---|---|---|
| 0 | 1 | 0.697 | 0.460 |
| 1 | 2 | 0.774 | 0.376 |
| 2 | 3 | 0.634 | 0.264 |
| 3 | 4 | 0.608 | 0.318 |
| 4 | 5 | 0.556 | 0.215 |
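If the URL above is unreachable, the five rows displayed by `df.head(5)` can be rebuilt locally so the later cells still have something to work on. This is a minimal sketch; the names `sample` and `X` are hypothetical, and these five rows are only a subset of the full dataset, so any clustering result on them will differ from one on the complete data.

```python
import pandas as pd

# Hypothetical offline copy of ONLY the five rows shown by df.head(5),
# not the full dataset behind the URL
sample = pd.DataFrame(
    {
        "number": [1, 2, 3, 4, 5],
        "density": [0.697, 0.774, 0.634, 0.608, 0.556],
        "sugercontent": [0.460, 0.376, 0.264, 0.318, 0.215],
    }
)

# Feature matrix for clustering: "number" is just a row ID, so it is dropped
X = sample[["density", "sugercontent"]].to_numpy()
print(X.shape)  # (5, 2)
```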
df.plot.scatter(x='density', y='sugercontent')
[Figure: scatter plot of `sugercontent` against `density`]
K-Means algorithm
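Given a sample set, K-Means partitions the samples into K clusters according to the distances between them, so that the points within each cluster are as close together as possible while the clusters themselves are as far apart as possible. Concretely, it minimizes the within-cluster sum of squared Euclidean distances from each sample to the mean (centroid) of its cluster.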
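The idea can be sketched with Lloyd's algorithm, the standard iterative procedure for K-Means: alternately assign each sample to its nearest centroid, then move each centroid to the mean of its assigned samples, until the centroids stop moving. This is a minimal NumPy sketch, not the notebook's own implementation; the function name `kmeans`, random initialization from the data points, and keeping an empty cluster's previous centroid are all assumptions made for illustration.

```python
import numpy as np


def _assign(X, centroids):
    # Squared Euclidean distance from every sample to every centroid, shape (n, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical Lloyd's-algorithm sketch: random init, Euclidean distance."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct samples at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each sample
        labels = _assign(X, centroids)
        # Update step: each centroid moves to the mean of its samples;
        # an empty cluster keeps its previous centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment against the final centroids
    labels = _assign(X, centroids)
    # Within-cluster sum of squared distances: the objective K-Means minimizes
    inertia = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, inertia


# Two well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids, inertia = kmeans(X, k=2)
print(labels)
print(inertia)
```

Lloyd's algorithm only finds a local minimum that depends on the initial centroids, which is why production implementations such as `sklearn.cluster.KMeans` use k-means++ initialization and can run several restarts (`n_init`), keeping the one with the lowest inertia.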