Iris: Random Forest Classification Model (RandomForestClassifier)

Original

东汄

2019-05-01 00:51

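Train on the same Iris dataset again, this time with a random forest classifier (RandomForestClassifier). The requirements are as follows:

1. Read the dataset again with the pandas library to obtain the corresponding matrix, and carry out the necessary preprocessing, including feature standardization and encoding of the iris class labels.

2. Train a decision tree model on the Iris dataset, using 30% of the data as the test set and 70% as the training set.

3. Set the split criterion, criterion, to "entropy" and the number of trees in the forest, n_estimators, to 10, then print the test-set accuracy to the console. Analyze whether this accuracy is higher than that of the decision tree classifier.

4. To improve the model's ability to generalize, use 10-fold cross-validation to determine the best values of the random forest parameters max_depth (maximum depth of each tree) and n_estimators (number of trees), tuning each one separately. max_depth ranges over 1-5 and n_estimators over 1-20. Print the best value of each parameter to the console.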
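
Requirements 1 to 3 can be sketched as follows. This is a minimal illustration, not the author's solution: the helper names load_iris_frame and compare_models are hypothetical, and scikit-learn's bundled copy of the Iris data stands in for reading the iris.data file with pandas, so the sketch runs without the file. Fitting the scaler on the training split only is a choice made here, not something the task text prescribes.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier


def load_iris_frame():
    """Assumption: stand-in for pd.read_csv('iris.data', header=None).

    Returns a frame with the same layout: columns 0-3 hold the features,
    column 4 holds the class name as a string.
    """
    iris = load_iris()
    frame = pd.DataFrame(iris.data)
    frame[4] = iris.target_names[iris.target]
    return frame


def compare_models(data, seed=1):
    """Return (forest_accuracy, tree_accuracy) on a 30% test split."""
    x = data[list(range(4))]
    y = LabelEncoder().fit_transform(data[4])  # class names -> 0, 1, 2

    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=seed)

    # Standardize with statistics learned from the training split only,
    # so nothing about the test split leaks into preprocessing.
    scaler = StandardScaler().fit(x_train)
    x_train_std = scaler.transform(x_train)
    x_test_std = scaler.transform(x_test)

    forest = RandomForestClassifier(
        n_estimators=10, criterion='entropy', random_state=seed)
    tree = DecisionTreeClassifier(criterion='entropy', random_state=seed)

    forest.fit(x_train_std, y_train)
    tree.fit(x_train_std, y_train)

    forest_acc = accuracy_score(y_test, forest.predict(x_test_std))
    tree_acc = accuracy_score(y_test, tree.predict(x_test_std))
    return forest_acc, tree_acc


if __name__ == "__main__":
    forest_acc, tree_acc = compare_models(load_iris_frame())
    print('Random forest test accuracy:', forest_acc)
    print('Decision tree test accuracy:', tree_acc)
```

With only 45 test samples, a difference of one or two misclassified flowers can flip the comparison, so the two accuracies are best read together with the cross-validated scores rather than on their own.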

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import LabelEncoder

if __name__ == "__main__":
    path = 'iris.data'  # path to the data file
    data = pd.read_csv(path, header=None)
    x = data[list(range(4))]  # the four feature columns
    y = LabelEncoder().fit_transform(data[4])   # encode the iris class names as integers

    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)
    # Data preparation for 10-fold cross-validation (the loops below score on the full x and y)

    # Use 10-fold cross-validation to find the best max_depth (maximum depth of each tree)
    d_scores = []
    for i in range(1, 6):
        model = RandomForestClassifier(n_estimators=10, criterion='entropy', max_depth=i)
        scores = cross_val_score(model, x, y, cv=10, scoring='accuracy')
        d_scores.append(scores.mean())
    print('Accuracy for max_depth = 1, 2, 3, 4, 5:')
    print(d_scores)
    print('Best accuracy: ', max(d_scores))
    print('Best max_depth: ', d_scores.index(max(d_scores)) + 1)

    # Use 10-fold cross-validation to find the best n_estimators (number of trees)
    n_scores = []
    for i in range(1, 21):
        model = RandomForestClassifier(n_estimators=i, criterion='entropy', max_depth=3)
        scores = cross_val_score(model, x, y, cv=10, scoring='accuracy')
        n_scores.append(scores.mean())
    print('Accuracy for n_estimators = 1 to 20:')
    print(n_scores)
    print('Best accuracy: ', max(n_scores))
    print('Best n_estimators: ', n_scores.index(max(n_scores)) + 1)
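
The two loops above tune each parameter separately, as the task asks. As an alternative, both parameters could be searched jointly with scikit-learn's GridSearchCV. The sketch below is one possible way to do that, not the author's method: the helper names load_xy and search_best_params are hypothetical, the bundled Iris data again stands in for iris.data, and the fixed random_state and shuffled StratifiedKFold are assumptions added so repeated runs give the same answer.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def load_xy():
    """Assumption: bundled Iris data used in place of the iris.data file."""
    return load_iris(return_X_y=True)


def search_best_params(x, y, depths=range(1, 6), tree_counts=range(1, 21), seed=1):
    """Return (best_params, best_mean_accuracy) from a 10-fold grid search.

    Every (max_depth, n_estimators) pair is scored by 10-fold
    cross-validated accuracy; the pair with the highest mean wins.
    """
    param_grid = {
        'max_depth': list(depths),
        'n_estimators': list(tree_counts),
    }
    # Shuffled, stratified folds keep the three classes balanced in each fold.
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(criterion='entropy', random_state=seed),
        param_grid,
        cv=folds,
        scoring='accuracy',
    )
    search.fit(x, y)
    return search.best_params_, search.best_score_
```

Calling search_best_params(*load_xy()) with the default arguments covers the full ranges from requirement 4, which means 100 parameter pairs and 1000 model fits, so it takes noticeably longer than either loop. Passing shorter lists for depths and tree_counts is a quick way to try it out first.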

