Before we start

As a beginner, I really have to spend a lot of time trying all kinds of things before a problem gets solved... ~~o(>_<)o ~~

First, here is what I tried along the way, which may help you troubleshoot your own setup.

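1. I started with the latest Mahout release, 0.13, and wanted to learn how to use random forests by following the official example (Classifying with random forests). It turned out the guide cannot be followed as written: although the site documents the 0.13 API, the walkthrough still targets an older version. The new release ships no jar like mahout-core-.., and its mahout-examples-job-..jar does not contain the BuildForest class either, so the example cannot be reproduced on Mahout 0.13. Being new to Mahout, I wanted to get some tests running quickly rather than write my own BuildForest.java at this point, so I decided to switch Mahout versions. Running the builder class on 0.13 anyway failed with: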

Exception in thread "main" java.lang.NoSuchMethodException: org.apache.mahout.classifier.df.mapreduce.Builder.main([Ljava.lang.String;)
	at java.lang.Class.getMethod(Class.java:1786)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:228)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

2. Most examples I found online are based on Mahout 0.9, so I switched to Mahout 0.9. However, training the random forest then threw the following exception:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:113)
    at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
    at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:294)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:228)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:188)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:252)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

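This exception occurs because Mahout 0.9 and earlier are built against Hadoop 1.x and are not compatible with Hadoop 2.x: org.apache.hadoop.mapreduce.JobContext is a class in Hadoop 1.x but an interface in Hadoop 2.x, so code compiled against 1.x fails at run time.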

So in the end I switched to Mahout 0.10.0.

Mahout random forest: a small example

This example follows the official guide, Classifying with random forests.

1. Generate a file descriptor for the dataset

hadoop jar mahout-mr-0.13.0-job.jar org.apache.mahout.classifier.df.tools.Describe -p testdata/DataSet10000-100.csv -f testdata/DataSet10000-100.info -d 7 N L

Alternatively, you can run the command exactly as given in the official guide:

$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/core/target/mahout-core-<VERSION>-job.jar org.apache.mahout.classifier.df.tools.Describe -p testdata/KDDTrain+.arff -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L

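A note on the arguments after -d: they describe the type of each attribute. N means numerical (continuous), C means categorical, and L marks the class label. A number in front of a type repeats it, so my "7 N L" means seven consecutive numerical attributes followed by the label.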
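To check that a descriptor covers every column, it helps to expand the repeat counts by hand. The sketch below uses a hypothetical helper, `expand_descriptor`, written from my reading of the syntax above (a number repeats the type that follows it); it is not Mahout's own parser.

```python
def expand_descriptor(tokens):
    """Expand a Describe-style descriptor into one type letter per column.

    Hypothetical helper: a numeric token is treated as a repeat count for
    the type token that follows it, e.g. "7 N L" -> 7 x 'N' then 'L'.
    """
    types = []
    repeat = 1
    for tok in tokens:
        if tok.isdigit():
            repeat = int(tok)              # remember the count for the next type
        else:
            types.extend([tok] * repeat)   # emit the type `repeat` times
            repeat = 1                     # counts apply to one type only
    return types


mine = expand_descriptor("7 N L".split())
kdd = expand_descriptor("N 3 C 2 N C 4 N C 8 N 2 C 19 N L".split())
print(len(mine), mine)   # 8 ['N', 'N', 'N', 'N', 'N', 'N', 'N', 'L']
print(len(kdd), kdd.count("N"), kdd.count("C"), kdd.count("L"))   # 42 34 7 1
```

The descriptor from the official command expands to 42 entries (34 N, 7 C, 1 L); that total should equal the number of columns in the data file.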

Also, the dataset file should preferably not contain a first row of feature names.
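If your CSV was exported with a header row, a minimal sketch like the following drops it before you upload the file to HDFS. The function name `strip_header` is hypothetical, and the example uses in-memory data; with real files you would pass handles opened with `newline=""`.

```python
import csv
import io


def strip_header(src, dst):
    """Copy CSV rows from src to dst, dropping the first (header) row."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    next(reader, None)  # skip the feature-name header (no-op on an empty file)
    for row in reader:
        writer.writerow(row)


raw = io.StringIO("f1,f2,label\n1.0,2.0,A\n3.5,4.1,B\n")
out = io.StringIO()
strip_header(raw, out)
print(out.getvalue(), end="")   # prints the two data rows only
```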

2. Run the example (train the random forest)

hadoop jar mahout-examples-0.10.0-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/DataSet10000-100.csv -ds testdata/DataSet10000-100.info2 -sl 5 -p -t 100 -o nsl-forest

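-t number of decision trees

-p use the partial implementation: the dataset is split across mappers, and each mapper trains decision trees on its own part

-sl number of features selected at random at each tree node (5 here)

-o output path of the random forest model

-Dmapred.max.split.size maximum size, in bytes, of each split of the dataset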
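With -p, the split size decides how many mappers run, and the trees are shared among them. The sketch below estimates both. The split rule mirrors Hadoop's FileInputFormat for one file (the last split may be up to 10% larger), assuming the split size is below the HDFS block size. The even sharing of trees is my assumption about Mahout's partial builder, and the file size is made up.

```python
SPLIT_SLOP = 1.1  # FileInputFormat lets the last split be up to 10% oversized


def num_splits(file_size, max_split_size):
    """Approximate input splits (= mappers) for one file, assuming
    max_split_size <= HDFS block size and a small minimum split size."""
    count = 0
    remaining = file_size
    while remaining / max_split_size > SPLIT_SLOP:
        count += 1
        remaining -= max_split_size
    if remaining > 0:
        count += 1  # the leftover bytes form the final split
    return count


def trees_per_partition(num_trees, num_maps):
    """Assumed even sharing: the first `remainder` mappers build one extra tree."""
    base, remainder = divmod(num_trees, num_maps)
    return [base + (1 if p < remainder else 0) for p in range(num_maps)]


file_size = 18_742_310  # hypothetical size of the training CSV in bytes
maps = num_splits(file_size, 1_874_231)
print(maps, trees_per_partition(100, maps))  # 10 mappers, 10 trees each
```

Because each mapper sees only its own part of the data, a very small split size means many trees are trained on small samples.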

After training succeeds, a forest.seq file is generated under your output path.

3. Using the Decision Forest to Classify new data (classify new samples with the trained model)

hadoop jar mahout-examples-0.10.0-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i testdata/DataSet1000-1000.csv -ds testdata/DataSet10000-100.info2 -m nsl-forest -a -mr -o predictions

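-i test dataset

-ds dataset descriptor

-m path of the trained random forest model

-a analyze the results and print a confusion matrix

-o output path of the classification results

-mr run the classification as a Hadoop MapReduce job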

When the job finishes, the console prints the results.
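To read the matrix printed by -a, it helps to know how a confusion matrix is tallied. The sketch below uses made-up labels and predictions; it is not Mahout's code, and Mahout's console layout differs. Here row i is the true label and column j the predicted label, so the diagonal counts correct predictions.

```python
def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts samples whose
    true label is labels[i] and predicted label is labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return labels, matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


actual = ["A", "A", "B", "B", "B"]      # hypothetical true labels
predicted = ["A", "B", "B", "B", "A"]   # hypothetical forest output
labels, matrix = confusion_matrix(actual, predicted)
print(labels, matrix)      # ['A', 'B'] [[1, 1], [1, 2]]
print(accuracy(matrix))    # 0.6
```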
