Mining Frequent Patterns with Mahout's Built-in FPGrowth Algorithm

Original

tangtang5156

2020-02-22 05:22

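Create a test file and upload it to HDFS. My test file, fp.txt, is just a few lines of numbers I made up, one comma-separated transaction per line: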

1,5,2,3
5,7,3,4
5,2,3
1,5,2,7,3,4
1,2,4
5,2,4
1,2,3
1,5,2,6,3
1,5,6,3

hadoop fs -put fp.txt /

hadoop jar /opt/mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i /fp.txt -o /out -s 1 -method mapreduce
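On a file this small, the supports the job ought to find can be checked by brute force. The Python sketch below is not how FPGrowth works internally: it enumerates every subset of every transaction instead of building an FP-tree, so it is only practical for toy inputs. The names TRANSACTIONS and frequent_itemsets are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# The nine transactions from fp.txt, one comma-separated transaction per line.
TRANSACTIONS = """\
1,5,2,3
5,7,3,4
5,2,3
1,5,2,7,3,4
1,2,4
5,2,4
1,2,3
1,5,2,6,3
1,5,6,3
"""


def frequent_itemsets(lines, min_support):
    """Return {itemset tuple: support} for itemsets seen at least min_support times."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # De-duplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(line.split(",")))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # min_support=3 keeps the listing short; the job above was run with -s 1.
    result = frequent_itemsets(TRANSACTIONS.splitlines(), min_support=3)
    for itemset, n in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
        print(",".join(itemset), n)
```

Comparing a few of these counts against the dumped Mahout output is a quick way to tell whether the job read the input the way you intended.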

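After the job finished, the output looked garbled when I viewed it.

Reason: the output Mahout writes is serialized (a Hadoop SequenceFile), so it has to be converted to a text file and saved locally before it can be read: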

mahout seqdumper -i /out/frequentpatterns/part-r-00000 -o /home/mahout_test/fpresult1.txt
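Whether a given file is serialized at all can be checked before reaching for seqdumper, because Hadoop SequenceFiles begin with the three bytes SEQ. A minimal Python sketch, assuming the file has first been copied to the local disk (for example with hadoop fs -get); the function name looks_like_sequence_file is hypothetical:

```python
import sys


def looks_like_sequence_file(path):
    """Return True if the file starts with the 3-byte SequenceFile magic b'SEQ'."""
    with open(path, "rb") as f:
        return f.read(3) == b"SEQ"


if __name__ == "__main__":
    # Usage: python check_seq.py part-r-00000 fpresult1.txt ...
    for path in sys.argv[1:]:
        kind = "SequenceFile" if looks_like_sequence_file(path) else "not a SequenceFile"
        print(f"{path}: {kind}")
```

A part-r-00000 file that passes this check is binary and will look garbled in a text editor; a seqdumper output file should not pass it.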

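When I opened the dumped file, the result was still garbled.

My guess: perhaps every file Mahout processes has to be serialized, and the problem is that my input file is plain text rather than a sequence file.

I tried converting it to a sequence file with the command mahout seqdirectory -i /test/ -o /test1/ -c UTF-8, but it failed with: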

Error: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:164)
   at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.<init>(CombineFileRecordReader.java:126)
   at org.apache.mahout.text.MultipleTextFileInputFormat.createRecordReader(MultipleTextFileInputFormat.java:43)
   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:155)
   ... 10 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
   at org.apache.mahout.text.WholeFileRecordReader.<init>(WholeFileRecordReader.java:59)
   ... 15 more

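I was not able to resolve this. The innermost cause, an IncompatibleClassChangeError on org.apache.hadoop.mapreduce.TaskAttemptContext, usually indicates that the Mahout jar was compiled against Hadoop 1.x, where TaskAttemptContext is a class, but is running on Hadoop 2.x, where it is an interface.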
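In a long Java trace like this one, the last Caused by: line is usually the place to start reading. A small Python sketch that pulls it out of a saved log; the helper name root_cause is made up for illustration:

```python
def root_cause(trace_text):
    """Return the text of the last 'Caused by:' line, or the first line if there is none."""
    lines = [ln.strip() for ln in trace_text.splitlines() if ln.strip()]
    prefix = "Caused by:"
    causes = [ln[len(prefix):].strip() for ln in lines if ln.startswith(prefix)]
    if causes:
        return causes[-1]
    return lines[0] if lines else ""


if __name__ == "__main__":
    sample = """\
Error: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
   ... 10 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
   ... 15 more
"""
    print(root_cause(sample))
```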

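So I tried testing with the example data that ships with Mahout instead.

The test sample is in the Mahout source tree, at F:\mahout\mahout-distribution-0.9-src\mahout-distribution-0.9\core\src\test\resources\retail.dat

Upload it to HDFS, then run the following command: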

hadoop jar /opt/mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i /test/retail.dat -o /out2 -s 5 -method mapreduce

The result is written to /out2 on HDFS. Dump it to a local text file:

mahout seqdumper -i /out2/frequentpatterns/part-r-00000 -o /home/mahout_test/fpresult5.txt

Once the file is on the local disk, open it directly with vi fpresult5.txt to see the results. Part of the output is shown below:

Key: 10 : Value: ([10 ],6)
Key: 10 39 : Value: ([10 39 ],5)
Key: 1034 : Value: ([1034 ],5)
Key: 1146 : Value: ([1146 ],5)
Key: 13518 : Value: ([13518 ],15)
Key: 14098 : Value: ([14098 ],6)
Key: 14099 : Value: ([14099 ],6)
Key: 14386 : Value: ([14386 ],14)
Key: 15094 : Value: ([15094 ],11)
Key: 15685 : Value: ([15685 ],7)
Key: 15686 : Value: ([15686 ],7)
Key: 170 : Value: ([170 ],12)
Key: 2046 : Value: ([2046 ],19)
Key: 225 : Value: ([225 ],7)
Key: 225 2238 : Value: ([225 2238 ],6)
Key: 237 : Value: ([237 ],5)
Key: 286 : Value: ([286 ],14)
Key: 31 : Value: ([31 ],8)
Key: 32 : Value: ([32 ],273)
Key: 32 1046 : Value: ([32 1046 ],10)

The key is the frequent pattern, and the number in the value is how many times that pattern occurs.
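For further processing, the dumped lines can be turned into (pattern, count) pairs. The Python sketch below assumes every record has exactly the shape of the lines shown above, Key: ... : Value: ([items ],count); the names LINE_RE, PATTERN_RE and parse_line are made up for illustration.

```python
import re

# "Key: <key> : Value: <value>", as printed in the dump above.
LINE_RE = re.compile(r"^Key: (?P<key>.*?)\s*: Value: (?P<value>.*)$")
# One "([item item ... ],count)" group inside the value.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],(\d+)\)")


def parse_line(line):
    """Return (key, [(items tuple, count), ...]) or None if the line is not a record."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    patterns = [
        (tuple(items.split()), int(count))
        for items, count in PATTERN_RE.findall(m.group("value"))
    ]
    return m.group("key"), patterns


if __name__ == "__main__":
    for text in ("Key: 10 : Value: ([10 ],6)", "Key: 225 2238 : Value: ([225 2238 ],6)"):
        print(parse_line(text))
```

Lines that are not records, such as the Input Path header, return None and can simply be skipped.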

If you only want to know how many results there are, add -c:

mahout seqdumper -i /out2/frequentpatterns/part-r-00000 -o /home/mahout_test/fpresult5.txt -c

fpresult5.txt then contains:

Input Path: /out2/frequentpatterns/part-r-00000
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns
Count: 147

There are 147 results.
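If the count is needed in a script rather than read by eye, it can be extracted from the summary text. A minimal Python sketch, assuming the summary contains a line of the form Count: N as shown above; the function name parse_count is hypothetical:

```python
import re


def parse_count(summary_text):
    """Return the integer after 'Count:' in the summary, or None if no such line exists."""
    m = re.search(r"^Count:\s*(\d+)\s*$", summary_text, flags=re.MULTILINE)
    return int(m.group(1)) if m else None


if __name__ == "__main__":
    summary = """\
Input Path: /out2/frequentpatterns/part-r-00000
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns
Count: 147
"""
    print(parse_count(summary))
```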
