Python hierarchical clustering: implementation and explanation with the scipy library


2020-02-23 18:46

I. Code

from scipy.cluster.hierarchy import linkage, fcluster
import numpy as np
from matplotlib import pyplot as plt

data = np.random.rand(100, 2)
# Run hierarchical clustering (linkage returns the linkage matrix z)
z = linkage(data, method='complete', metric='euclidean')
# Cut the tree at a distance threshold (fcluster returns the cluster number of each point, starting from 1)
cluster_assignments = fcluster(z, t=0.5, criterion='distance')
print('Cluster assignments:', cluster_assignments)

# Use np.where to collect the indices of the points belonging to each cluster number
clusters = [np.where(cluster_assignments == i)[0].tolist() for i in range(1, cluster_assignments.max() + 1)]
print('Clusters:', clusters)
# Plot the clustering result, one color per cluster
for indices in clusters:
    plt.scatter(data[indices][:, 0], data[indices][:, 1])
plt.show()

Output (the data are random, so the exact numbers differ on every run):

Cluster assignments: [8 4 3 1 9 2 1 3 5 9 4 2 2 4 5 5 7 5 7 6 8 9 9 1 9 6 3 5 6 8 3 1 6 6 9 8 2 9 2 8 8 7 2 9 8 8 5 4 5 4 4 1 2 9 8 4 9 2 7 6 3 9 1 9 2 7 1 3 7 2 2 7 8 2 6 7 7 2 3 5 4 6 5 2 6 9 2 9 3 1 5 2 2 8 1 9 9 5 7 3]
Clusters: [[3, 6, 23, 31, 51, 62, 66, 89, 94], [5, 11, 12, 36, 38, 42, 52, 57, 64, 69, 70, 73, 77, 83, 86, 91, 92], [2, 7, 26, 30, 60, 67, 78, 88, 99], [1, 10, 13, 47, 49, 50, 55, 80], [8, 14, 15, 17, 27, 46, 48, 79, 82, 90, 97], [19, 25, 28, 32, 33, 59, 74, 81, 84], [16, 18, 41, 58, 65, 68, 71, 75, 76, 98], [0, 20, 29, 35, 39, 40, 44, 45, 54, 72, 93], [4, 9, 21, 22, 24, 34, 37, 43, 53, 56, 61, 63, 85, 87, 95, 96]]
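The meaning of the threshold t=0.5 can be checked directly. Below is a minimal sketch that assumes a fixed NumPy seed (not part of the original code) so reruns are reproducible. With complete linkage, two clusters merge at the largest point-to-point distance between them, so cutting the tree at `t` should leave no cluster whose diameter (largest internal pairwise distance) exceeds `t`. `pdist` from `scipy.spatial.distance` is used here to measure each cluster's diameter.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)  # assumed fixed seed, only for reproducibility
data = rng.random((100, 2))

z = linkage(data, method='complete', metric='euclidean')
labels = fcluster(z, t=0.5, criterion='distance')

# With complete linkage the merge height is the largest pairwise distance
# inside the merged cluster, so every diameter should be <= t.
diameters = {}
for k in np.unique(labels):
    members = data[labels == k]
    # A single-point cluster has no pairs; treat its diameter as 0
    diameters[int(k)] = pdist(members).max() if len(members) > 1 else 0.0
    print(f'cluster {k}: {len(members)} points, diameter {diameters[int(k)]:.3f}')
```

Every printed diameter stays at or below 0.5; with single linkage this guarantee would not hold, because there the merge height is the closest-pair distance.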

II. Function explanations

The following is a brief introduction; for details, see: https://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html#module-scipy.cluster.hierarchy

1. The linkage function

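Prototype: `scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean', optimal_ordering=False)`

| | Name | Type | Description |
|---|---|---|---|
| Parameters | y | ndarray | The data to cluster: either a condensed distance matrix, i.e. the $\binom{n}{2}$ precomputed pairwise distances between the points, or an $n \times d$ array of $n$ observations in $d$ dimensions. |
| | method | str | How the distance between two clusters is computed. For example, 'single' uses the distance between the closest points of the two clusters, and 'complete' uses the distance between their farthest points. Other options include 'average', 'weighted', 'centroid', 'median' and 'ward'. |
| | metric | str or function | How the distance between two points is computed, e.g. 'euclidean' for Euclidean distance. Only used when y contains observation vectors; ignored when y is already a distance matrix. |
| | optimal_ordering | bool | If True, the linkage matrix is reordered so the tree is more intuitive when visualized (e.g. as a dendrogram), but the algorithm becomes slower. For large datasets it is best left as False. |
| Returns | z | ndarray | The hierarchical clustering encoded as a linkage matrix of shape $(n-1) \times 4$: row i merges clusters Z[i,0] and Z[i,1] at distance Z[i,2] into a new cluster of Z[i,3] points. |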
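To see what the linkage matrix contains and how 'single' and 'complete' differ, here is a small sketch on four hypothetical points on the x-axis, chosen only for illustration. It also shows that passing a precomputed condensed distance vector from `scipy.spatial.distance.pdist` gives the same result as passing the coordinates.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Four hypothetical points on the x-axis at x = 0, 1, 4, 10
points = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [10.0, 0.0]])

# Each row of Z is [cluster_a, cluster_b, merge_distance, size_of_new_cluster];
# indices >= n (here n = 4) refer to clusters created by earlier rows.
z_single = linkage(points, method='single', metric='euclidean')
z_complete = linkage(points, method='complete', metric='euclidean')
print(z_single)
print(z_complete)

# A condensed distance vector (the n*(n-1)/2 values from pdist) gives the same
# tree; metric is not needed here because the distances are already computed.
z_from_condensed = linkage(pdist(points), method='complete')
print(np.allclose(z_complete, z_from_condensed))
```

The third column holds the merge distances: 1, 3, 6 for 'single' and 1, 4, 10 for 'complete'. After points 0 and 1 merge at distance 1, point 2 is 3 away from the nearest member of that cluster but 4 away from the farthest, which is why the two methods diverge from the second merge on.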

2. The fcluster function

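Prototype: `scipy.cluster.hierarchy.fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None)`

| | Name | Type | Description |
|---|---|---|---|
| Parameters | Z | ndarray | The linkage matrix returned by linkage. |
| | t | scalar | Its meaning depends on criterion: for 'inconsistent', 'distance' or 'monocrit' it is the merge threshold; for 'maxclust' or 'maxclust_monocrit' it is the maximum number of clusters. |
| | criterion | str | The criterion used to form flat clusters: 'inconsistent' (default), 'distance', 'maxclust', 'monocrit' or 'maxclust_monocrit'. |
| | depth | int | Maximum depth used when computing the inconsistency statistic ('inconsistent' only). |
| | R | ndarray | The inconsistency matrix used by 'inconsistent'; computed automatically if not given. |
| | monocrit | ndarray | Array of length n-1 whose i-th entry is the statistic used to threshold non-singleton cluster i; used by 'monocrit' and 'maxclust_monocrit'. |
| Returns | fcluster | ndarray | Array of length n giving the cluster number (starting from 1) of each input point. |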

