Julia Machine Learning: Clustering Analysis Code Examples


2020-06-28 23:39

Clustering.jl is a basic Julia library for cluster analysis. Its documentation is short on code examples, so I have put together a few here.

K-means

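K-means is a classic method for clustering or vector quantization. It produces a fixed number of clusters, each associated with a center (also called a prototype), and each data point is assigned to the cluster with the nearest center.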

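From a mathematical standpoint, K-means is a coordinate descent algorithm that solves the following optimization problem:

$$\min_{\mu, z} \; \sum_{i=1}^{n} \lVert x_i - \mu_{z_i} \rVert^2$$

Here, $x_i$ is the i-th data point, $\mu_k$ is the center of the k-th cluster, and $z_i$ is the index of the cluster that the i-th point is assigned to.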
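To make the coordinate descent view concrete, here is a minimal sketch in plain Julia that alternates the two steps: fix the centers and update the assignments, then fix the assignments and update the centers. The function name `lloyd_kmeans` and the toy data are assumptions made for illustration; this is not how Clustering.jl implements `kmeans`.

```julia
# Minimal K-means sketch (alternating updates); `lloyd_kmeans` is a hypothetical
# helper for illustration, not part of Clustering.jl.
# X holds one data point per column; `centers` holds one initial center per column.
function lloyd_kmeans(X::Matrix{Float64}, centers::Matrix{Float64}; iters::Int=10)
    n = size(X, 2)
    k = size(centers, 2)
    centers = copy(centers)
    z = zeros(Int, n)
    for _ in 1:iters
        # Step 1: fix the centers, assign each point to its nearest center.
        for i in 1:n
            z[i] = argmin([sum(abs2, X[:, i] .- centers[:, j]) for j in 1:k])
        end
        # Step 2: fix the assignments, move each center to the mean of its points.
        for j in 1:k
            members = findall(==(j), z)
            if !isempty(members)
                centers[:, j] = vec(sum(X[:, members], dims=2)) ./ length(members)
            end
        end
    end
    # Objective value: total squared distance from each point to its assigned center.
    cost = sum(sum(abs2, X[:, i] .- centers[:, z[i]]) for i in 1:n)
    return centers, z, cost
end

# Toy data: two well-separated groups of three 2-D points.
X = [0.0 0.0 1.0 10.0 10.0 11.0;
     0.0 1.0 0.0 10.0 11.0 10.0]
centers, z, cost = lloyd_kmeans(X, X[:, [1, 4]])  # start from points 1 and 4
println(z)
println(cost)
```

Neither step can increase the objective, which is why the procedure settles, although possibly in a local rather than a global minimum.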

Code example

using RDatasets, Clustering, Plots
using DataFrames
using CSV

iris = dataset("datasets", "iris"); # load the data

features = collect(Matrix(iris[:, 1:4])'); # features to use for clustering
result = kmeans(features, 3); # run K-means for the 3 clusters

#plot with the point color mapped to the assigned cluster index
scatter(iris.PetalLength, iris.PetalWidth, marker_z=result.assignments,
                color=:lightrainbow, legend=false)

Fuzzy C-means

Fuzzy C-means is a clustering method that provides cluster membership weights instead of a "hard" classification (as K-means does).

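From a mathematical standpoint, Fuzzy C-means solves the following optimization problem:

$$\min_{c, w} \; \sum_{i=1}^{n} \sum_{j=1}^{C} w_{ij}^{m} \, \lVert x_i - c_j \rVert^2$$

Here, $c_j$ is the center of the j-th cluster, $w_{ij}$ is the membership weight of the i-th point in the j-th cluster (the weights of each point sum to 1), and $m > 1$ is a user-defined fuzziness parameter.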
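As an illustration of what "membership weights" means, the sketch below computes the standard membership update for fixed centers, w_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), where d_ij is the distance from point i to center j. The helper name `fuzzy_memberships` and the toy data are assumptions for this example; the library's own implementation may differ in detail.

```julia
# One fuzzy C-means membership update for fixed centers; `fuzzy_memberships` is a
# hypothetical helper for illustration, not the Clustering.jl implementation.
# X: one point per column; centers: one center per column; m > 1 is the fuzziness.
function fuzzy_memberships(X::Matrix{Float64}, centers::Matrix{Float64}, m::Float64)
    n, C = size(X, 2), size(centers, 2)
    W = zeros(n, C)
    for i in 1:n
        # Distances from point i to every center (tiny offset avoids division by zero).
        d = [sqrt(sum(abs2, X[:, i] .- centers[:, j])) + 1e-12 for j in 1:C]
        for j in 1:C
            W[i, j] = 1 / sum((d[j] / d[k])^(2 / (m - 1)) for k in 1:C)
        end
    end
    return W
end

X = [0.0 1.0 5.0 9.0 10.0]   # five 1-D points, stored as a 1×5 matrix
centers = [0.0 10.0]         # two centers, stored as a 1×2 matrix
W = fuzzy_memberships(X, centers, 2.0)
println(round.(W, digits=3))
```

Each row of `W` sums to 1. A point halfway between two centers gets equal weights, and a larger `m` pushes all weights toward equality, which is the sense in which `m` controls fuzziness.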

Code example:

using RDatasets, Clustering, Plots
using DataFrames
using CSV

iris = dataset("datasets", "iris"); # load the data

features = collect(Matrix(iris[:, 1:4])'); # features to use for clustering
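# 3 clusters, fuzziness 4 (must be > 1); print progress on each iteration
result = fuzzy_cmeans(features, 3, 4, maxiter=150, display=:iter)

# result.weights is an n×3 membership matrix, so take each point's strongest
# cluster to get one color index per point
assignments = [argmax(result.weights[i, :]) for i in 1:size(result.weights, 1)]

# plot with the point color mapped to the strongest cluster index
scatter(iris.PetalLength, iris.PetalWidth, marker_z=assignments,
                color=:lightrainbow, legend=false)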

K-medoids

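K-medoids is a clustering algorithm similar in spirit to K-means, but instead of computing means it picks actual data points (called medoids) as cluster centers, so that the total distance between each data point and its nearest medoid is minimized. It takes an N×N square matrix of pairwise distances (dissimilarities) between the N points as input.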
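To show how a method can work from the distance matrix alone, here is a simple alternating sketch in plain Julia: assign each point to its nearest medoid, then move each medoid to the cluster member with the smallest total distance to the other members. The name `simple_kmedoids`, the starting medoids, and the toy data are assumptions; Clustering.jl's `kmedoids` may use a different update strategy.

```julia
# Simple alternating K-medoids sketch on a precomputed distance matrix D (N×N).
# `simple_kmedoids` is a hypothetical helper, not the Clustering.jl implementation.
function simple_kmedoids(D::Matrix{Float64}, medoids::Vector{Int}; iters::Int=10)
    n = size(D, 1)
    medoids = copy(medoids)
    z = zeros(Int, n)
    for _ in 1:iters
        # Assign each point to its nearest medoid.
        for i in 1:n
            z[i] = argmin(D[i, medoids])
        end
        # Within each cluster, pick the member with the smallest total distance
        # to the other members as the new medoid.
        for j in eachindex(medoids)
            members = findall(==(j), z)
            isempty(members) && continue
            totals = [sum(D[c, members]) for c in members]
            medoids[j] = members[argmin(totals)]
        end
    end
    # Recompute assignments so they match the final medoids.
    for i in 1:n
        z[i] = argmin(D[i, medoids])
    end
    cost = sum(D[i, medoids[z[i]]] for i in 1:n)
    return medoids, z, cost
end

x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]   # six points on a line
D = [abs(a - b) for a in x, b in x]      # 6×6 pairwise distance matrix
medoids, z, cost = simple_kmedoids(D, [1, 4])
println(medoids)
println(z)
```

Because only `D` is used, the same code works for any dissimilarity measure, including ones for which a mean is not defined.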

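# Distances.jl is an extra package (not in the original imports), used here
# to build the pairwise distance matrix that kmedoids expects
using Clustering, Distances, Plots

points = rand(2, 100); # 100 random 2-D points, one per column
dist = pairwise(Euclidean(), points, dims=2); # 100×100 pairwise distance matrix
result = kmedoids(dist, 3); # run K-medoids for 3 clusters

# plot with the point color mapped to the assigned cluster index
scatter(points[1, :], points[2, :], marker_z=result.assignments,
                color=:lightrainbow, legend=false)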

MCL (Markov Cluster Algorithm)

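This is a graph clustering algorithm that can be used, for example, to analyze social graphs of people; the methods above are all feature-based clustering algorithms. The Markov Cluster Algorithm works by simulating a stochastic (Markov) flow in a weighted graph, where each node is a data point and the edge weights are defined by the adjacency matrix. When the algorithm converges, it produces new edge weights that define the new connected components of the graph (i.e. the clusters). It takes an N×N square adjacency matrix as input.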
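The flow simulation can be sketched in a few lines of plain Julia: "expansion" (a matrix power) lets flow spread along longer paths, and "inflation" (an elementwise power followed by column normalization) strengthens strong flows and weakens weak ones. The helper `mcl_sketch`, its default parameters, and the toy graph are assumptions for illustration, not the Clustering.jl implementation.

```julia
# Minimal Markov clustering sketch; `mcl_sketch` is a hypothetical helper,
# not the Clustering.jl implementation.
function mcl_sketch(adj::Matrix{Float64}; expansion::Int=2, inflation::Float64=2.0,
                    iters::Int=50)
    n = size(adj, 1)
    M = copy(adj)
    for i in 1:n
        M[i, i] = 1.0           # add self-loops
    end
    M = M ./ sum(M, dims=1)     # make each column a probability distribution
    for _ in 1:iters
        M = M^expansion         # expansion: let the flow spread along longer paths
        M = M .^ inflation      # inflation: strengthen strong flows, weaken weak ones
        M = M ./ sum(M, dims=1) # renormalize the columns
    end
    # Label each node by the row ("attractor") that holds most of its column's flow.
    return [argmax(M[:, j]) for j in 1:n]
end

# Two triangles (nodes 1-3 and 4-6) joined by one weak edge between nodes 3 and 4.
adj = zeros(6, 6)
for (a, b, w) in [(1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0),
                  (4, 5, 1.0), (4, 6, 1.0), (5, 6, 1.0), (3, 4, 0.1)]
    adj[a, b] = w
    adj[b, a] = w
end
labels = mcl_sketch(adj)
println(labels)
```

The labels are attractor node indices rather than consecutive cluster numbers; nodes sharing a label belong to the same cluster. A larger inflation value tends to produce more, smaller clusters.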

# Distances.jl is an extra package (not in the original imports), used here
# to turn point coordinates into a similarity graph
using Clustering, Distances, Plots

points = rand(2, 100); # 100 random 2-D points, one per column
# adjacency (similarity) matrix: Gaussian kernel on squared distances;
# the bandwidth 0.01 is an arbitrary choice for this illustration
adj = exp.(-pairwise(SqEuclidean(), points, dims=2) ./ 0.01);
result = mcl(adj; add_loops=true, expansion=3, inflation=4); # run MCL on the graph

# println(result.assignments)
# println(result.converged)
# println(result.iterations)
# println(result.rel_Δ)

# plot with the point color mapped to the assigned cluster index
scatter(points[1, :], points[2, :], marker_z=result.assignments,
                color=:lightrainbow, legend=false)

AP (Affinity Propagation) graph clustering algorithm

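Affinity propagation (AP for short) was proposed in 2007 by Frey and Dueck in the Science paper "Clustering by Passing Messages Between Data Points". It is well suited to fast clustering of high-dimensional data with many classes. It is newer than the traditional clustering algorithms, and it is reported to improve on them in both clustering quality and efficiency.

The basic idea of AP: treat all samples as nodes of a network, then compute each sample's cluster center by passing messages along the edges of the network. Two kinds of messages are passed between nodes during clustering: responsibility and availability. AP iteratively updates the responsibility and availability of every point until m high-quality exemplars (similar to centroids) emerge, and assigns the remaining data points to the corresponding clusters.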
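The two message updates can be written down directly. The sketch below follows the update rules from the 2007 paper with simple damping: responsibility r(i,k) = s(i,k) - max over k' ≠ k of [a(i,k') + s(i,k')], and availability a(i,k) = min(0, r(k,k) + Σ max(0, r(i',k))) for i ≠ k, with a(k,k) = Σ max(0, r(i',k)). The helper `ap_sketch`, the damping value, the iteration count, and the preference of -20 are assumptions for illustration; this is not the Clustering.jl implementation.

```julia
# Minimal affinity propagation sketch; `ap_sketch` is a hypothetical helper,
# not the Clustering.jl implementation. S is an N×N similarity matrix whose
# diagonal holds each point's "preference" for being an exemplar.
function ap_sketch(S::Matrix{Float64}; iters::Int=200, damp::Float64=0.5)
    n = size(S, 1)
    R = zeros(n, n)   # responsibilities r(i, k)
    A = zeros(n, n)   # availabilities a(i, k)
    for _ in 1:iters
        # Responsibility: how well k suits i as exemplar, relative to the best rival.
        Rnew = similar(R)
        for i in 1:n, k in 1:n
            rival = maximum(A[i, kk] + S[i, kk] for kk in 1:n if kk != k)
            Rnew[i, k] = S[i, k] - rival
        end
        R = damp .* R .+ (1 - damp) .* Rnew
        # Availability: accumulated support from other points for choosing k.
        Anew = similar(A)
        for i in 1:n, k in 1:n
            support = 0.0
            for ii in 1:n
                if ii != i && ii != k
                    support += max(0.0, R[ii, k])
                end
            end
            Anew[i, k] = i == k ? support : min(0.0, R[k, k] + support)
        end
        A = damp .* A .+ (1 - damp) .* Anew
    end
    # Each point picks the exemplar k that maximizes a(i, k) + r(i, k).
    return [argmax(A[i, :] .+ R[i, :]) for i in 1:n]
end

x = [0.0, 1.0, 2.0, 20.0, 21.0, 22.0]   # two groups of points on a line
S = [-(a - b)^2 for a in x, b in x]      # similarity = negative squared distance
for i in eachindex(x)
    S[i, i] = -20.0                      # preference: arbitrary value for this example
end
labels = ap_sketch(S)
println(labels)
```

Each label is the index of the exemplar a point chose, so the number of clusters is not fixed in advance. It is steered by the preference values on the diagonal of `S`: lower preferences give fewer exemplars.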

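# Distances.jl, LinearAlgebra and Statistics are additions to the original
# imports; they are needed to build the similarity matrix
using RDatasets, Clustering, Distances, Plots
using LinearAlgebra, Statistics

iris = dataset("datasets", "iris"); # load the data

features = collect(Matrix(iris[:, 1:4])'); # features to use for clustering

# affinityprop expects an N×N similarity matrix, not raw features:
# use negative squared Euclidean distances, and set the diagonal (the
# "preference") to the median similarity, a common default choice
S = -pairwise(SqEuclidean(), features, dims=2);
S[diagind(S)] .= median(S);

result = affinityprop(S)

# plot with the point color mapped to the assigned cluster index
scatter(iris.PetalLength, iris.PetalWidth, marker_z=result.assignments,
                color=:lightrainbow, legend=false)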

DBSCAN

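DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. Unlike partitioning and hierarchical clustering methods, it defines a cluster as a maximal set of density-connected points. It can turn regions of sufficiently high density into clusters and can discover clusters of arbitrary shape in noisy spatial data.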
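The notion of density-connected points can be illustrated with a small sketch in plain Julia. A point is a "core" point if at least `minpts` points (itself included) lie within `radius` of it; clusters grow outward from core points, and points reachable from no core point are noise. The helper `dbscan_sketch`, its parameter names, and the toy data are assumptions for illustration; this is not the `dbscan` function of Clustering.jl, whose interface may differ between versions.

```julia
# Minimal DBSCAN sketch on a precomputed distance matrix; `dbscan_sketch` is a
# hypothetical helper, not the Clustering.jl implementation.
# Returns one label per point: 0 = noise, 1, 2, ... = cluster index.
function dbscan_sketch(D::Matrix{Float64}, radius::Float64, minpts::Int)
    n = size(D, 1)
    labels = fill(-1, n)                          # -1 = not visited yet
    neighbors(i) = findall(<=(radius), D[i, :])   # includes the point itself
    cluster = 0
    for i in 1:n
        labels[i] == -1 || continue
        seeds = neighbors(i)
        if length(seeds) < minpts
            labels[i] = 0                         # provisionally noise
            continue
        end
        cluster += 1                              # i is a core point: start a cluster
        labels[i] = cluster
        queue = copy(seeds)
        while !isempty(queue)
            j = popfirst!(queue)
            if labels[j] == 0
                labels[j] = cluster               # noise near a core point: border point
            end
            labels[j] == -1 || continue
            labels[j] = cluster
            nj = neighbors(j)
            if length(nj) >= minpts               # j is a core point: keep expanding
                append!(queue, nj)
            end
        end
    end
    return labels
end

x = [0.0, 0.5, 1.0, 1.5, 5.0, 5.5, 6.0, 20.0]   # two dense runs and one outlier
D = [abs(a - b) for a in x, b in x]
labels = dbscan_sketch(D, 0.6, 3)
println(labels)   # → [1, 1, 1, 1, 2, 2, 2, 0]
```

Note that the number of clusters is not specified: it follows from `radius` and `minpts`, and the isolated point at 20.0 is reported as noise rather than forced into a cluster.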

To be continued...
