Outputting and Visualizing Mahout Clustering Results

Original

wanghailong000

2018-08-27 19:53

1. In Mahout, the class org.apache.mahout.utils.clustering.ClusterDumper outputs clustering results. To print them to the console, use:

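// Both constructor arguments are org.apache.hadoop.fs.Path objects.
ClusterDumper clusterdumper = new ClusterDumper(sequentialfile, clusterpoints);
// Passing null means no dictionary is used when printing.
clusterdumper.printClusters(null);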

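The first argument is a Path to the serialized cluster centers (the sequence files for the clusters). The second argument is a Path to the serialized clustered points (the sequence files that map input vectors to their clusters).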

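To write the results to a file instead, run ClusterDumper.java from the command line. To run it in Eclipse, add the required arguments to the run configuration for ClusterDumper.java and then run it. The arguments are: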

--help                                    Print out help
--input (-i) input                        The directory containing Sequence
                                          Files for the Clusters
                                          (path to the serialized cluster centers of the clustering result)
--output (-o) output                      The output file.  If not specified,
                                          dumps to the console.
                                          (path where the deserialized result is written)
--outputFormat (-of) outputFormat         The optional output format to write
                                          the results as. Options: TEXT, CSV, or GRAPH_ML
--substring (-b) substring                The number of chars of the
                                          asFormatString() to print
--pointsDir (-p) pointsDir                The directory containing points
                                          sequence files mapping input vectors
                                          to their cluster.  If specified,
                                          then the program will output the
                                          points associated with a cluster
                                          (serialized data points of the clustering result)
--dictionary (-d) dictionary              The dictionary file.
--dictionaryType (-dt) dictionaryType     The dictionary file type
                                          (text|sequencefile)
--distanceMeasure (-dm) distanceMeasure   The classname of the DistanceMeasure.
                                          Default is SquaredEuclidean.
--numWords (-n) numWords                  The number of top terms to print
--tempDir tempDir                         Intermediate output directory
--startPhase startPhase                   First phase to run
--endPhase endPhase                       Last phase to run
--evaluate (-e)                           Run ClusterEvaluator and CDbwEvaluator over the
                                          input. The output will be appended to the rest of
                                          the output at the end.

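The parameters marked in red are required. These appear to be the ones carrying a parenthetical note above: --input, --output and --pointsDir.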
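As a rough illustration of supplying these arguments from code rather than from an Eclipse run configuration, the sketch below assembles the argument array for the three annotated options. The class name ClusterDumperArgs, the helper build, and the kmeans-out/... paths are hypothetical and not part of Mahout. The sketch only prints the arguments; passing them on to ClusterDumper assumes Mahout is on the classpath and is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): builds the argument array
// that would be handed to ClusterDumper.
public class ClusterDumperArgs {

    // clustersDir is always emitted; outputFile and pointsDir may be null,
    // in which case the corresponding option is left out.
    public static String[] build(String clustersDir, String pointsDir, String outputFile) {
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(clustersDir);
        if (outputFile != null) {
            args.add("--output");
            args.add(outputFile);
        }
        if (pointsDir != null) {
            args.add("--pointsDir");
            args.add(pointsDir);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Placeholder paths; substitute the directories produced by your own clustering job.
        String[] args = build(
                "kmeans-out/clusters-final",
                "kmeans-out/clusteredPoints",
                "kmeans-out/dump.txt");
        System.out.println(String.join(" ", args));
        // Assumption: with Mahout on the classpath, the array could be passed on with
        // org.apache.mahout.utils.clustering.ClusterDumper.main(args);
    }
}
```

Leaving out the output file makes the dump go to the console, which matches the description of --output above.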

2. Visualizing the clustering results:

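The Mahout source contains the corresponding visualization classes in the package org.apache.mahout.clustering.display. Run them directly to see the results. They are written with Java Swing.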
