Clustering Uncertain Data Based on Probability Distribution Similarity

IEEE Transactions on Knowledge and Data Engineering

Bin Jiang et al.

Key Points

  1. Common clustering algorithms group objects by their coordinates, but when each object is itself a collection of data, its internal distribution should also be considered. For example, two items may both be rated 4.5 stars while their votes are distributed very differently, in which case we may want to place them in different clusters.

  2. The overall approach remains similar to K-means, but the authors use Kullback-Leibler divergence (also called relative entropy) to measure the dissimilarity between distributions. Note that KL divergence is asymmetric, so it is not a distance in the strict metric sense.

  3. Kernel density estimation is used to estimate each object's distribution and, from it, the KL divergence; the fast Gauss transform is used to speed up the computation.

  4. K-medoids is used as the clustering method.
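
The key points above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not the authors' implementation: each uncertain object is a 1-D sample, the bandwidth comes from Silverman's rule of thumb, the KL divergence is estimated as the mean log density ratio over the first object's own sample, and the clustering is a plain greedy-swap K-medoids. The fast Gauss transform is not implemented here; the kernel sums are computed directly. All function names (`kde_density`, `kl_divergence`, `k_medoids`) are hypothetical.

```python
import numpy as np

def kde_density(sample, points):
    """Gaussian kernel density estimate of `sample`, evaluated at `points` (1-D)."""
    n = sample.size
    h = 1.06 * sample.std() * n ** (-1 / 5)  # Silverman's rule of thumb (assumption)
    z = (points[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

def kl_divergence(p_sample, q_sample, floor=1e-300):
    """Estimate D(P || Q) as the mean log density ratio over P's own sample."""
    p = np.maximum(kde_density(p_sample, p_sample), floor)
    q = np.maximum(kde_density(q_sample, p_sample), floor)  # floor avoids log(0)
    return float(np.mean(np.log(p / q)))

def total_cost(dist, medoids):
    """Sum over objects of the divergence to their closest medoid."""
    return dist[:, medoids].min(axis=1).sum()

def k_medoids(dist, k):
    """Greedy-swap K-medoids on a (possibly asymmetric) dissimilarity matrix.

    dist[i, j] is read as the divergence from object i to candidate medoid j.
    """
    n = dist.shape[0]
    medoids = list(range(k))  # naive initialisation: the first k objects
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[slot] = cand
                cost = total_cost(dist, trial)
                if cost < best:  # accept any swap that lowers the total cost
                    medoids, best, improved = trial, cost, True
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels

# Six uncertain objects with the same mean but different spread,
# mirroring the "same average rating, different vote distribution" example.
rng = np.random.default_rng(0)
objects = [rng.normal(0.0, 1.0, 200) for _ in range(3)]
objects += [rng.normal(0.0, 3.0, 200) for _ in range(3)]

# Pairwise divergences: dist[i, j] estimates D(P_i || P_j).
dist = np.array([[kl_divergence(p, q) for q in objects] for p in objects])
medoids, labels = k_medoids(dist, k=2)
print(labels)
```

A coordinate-based method would see six objects with nearly identical means, whereas here the three narrow objects and the three wide objects should end up in separate clusters. Because KL divergence is asymmetric, the sketch fixes one direction (object to medoid); that choice is an assumption of this example.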
