How to Build a Decision Tree in Python

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"本文最初發佈於Medium網站,經原作者授權由InfoQ中文站翻譯並分享。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"決策樹是一個經久不衰的話題。本文要做的是將一系列決策樹組合成一個單一的預測模型;也就是說,我們將創建集成方法(Ensemble Methods)的模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"決策樹是準確度最高的預測模型之一。想象一下同時使用多棵樹能把預測能力提高到多高的水平!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一些集成方法算法的預測能力超過了當今機器學習領域的一流高級深度學習模型。此外,Kaggle參賽者廣泛使用集成方法來應對數據科學挑戰。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集成方法以相對較低的複雜性提供了更高水平的準確度。決策樹模型和決策樹組構建、理解和解釋起來都很容易。我們將在另一篇文章中談論宏偉的隨機森林,因爲它除了作爲機器學習的模型外,還廣泛用於執行變量選擇!我們可以爲機器學習模型的預測變量(predictor)選擇最佳的候選變量。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Jupyter筆記本"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"請查看"},{"type":"link","attrs":{"href":"https:\/\/github.com\/Anello92\/Machine_Learning_Python\/blob\/main\/DecisionTree-Python%20%282%29.ipynb","title":"","type":null},"content":[{"type":"text","text":"Jupyter筆記本"}]},{"type":"text","text":",瞭解我們接下來要介紹的構建機器學習模型的概念,也可以參閱我在"},{"type":"link","attrs":{"href":"https:\/\/medium.com\/@anello92","title":"","type":null},"conte
nt":[{"type":"text","text":"Medium"}]},{"type":"text","text":"中寫的其他數據科學文章和教程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在本文中要做的是用Python構建一個決策樹。實踐中會有兩棵樹:一棵樹基於熵,另一棵樹基於基尼係數。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"安裝包"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一步是安裝pydot和Graphviz包來查看決策樹。如果沒有這些包,我們就只有模型了——我們想更進一步,分別考慮用熵和基尼係數計算值的決策樹。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"命令!表示它將在操作系統上運行。它是一種快捷方式,所以我們不必離開Jupyter並打開終端。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"!pip install --upgrade pydot\nRequirement already satisfied: pip in c:\\users\\anell\\appdata\\local\\programs\\python\\python38\\lib\\site-packages (21.1.3)\n\n!pip install --upgrade graphviz\nRequirement already satisfied: graphviz in c:\\users\\anell\\appdata\\local\\programs\\python\\python38\\lib\\site-packages (0.16)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果我們在Windows上安裝Graphviz時遇到問題,可以在終端上運行!conda install python-Graphviz命令。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"#!pip install graphviz\n# You may need to run this command (CMD) for windows\n#!conda install python-graphviz\n# Documentation 
http:\/\/www.graphviz.org\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Graphviz是一個圖形可視化包。計算圖是一種具有節點和邊的結構;也就是說,實際的決策樹是一個計算圖。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/07\/48\/07479b47447c485cdde8a4b869c36148.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"導入很多包"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們需要Pandas來創建Datarame格式的結構。我們將使用DecisionTreeClassifier,即在Scikit Learn的樹包中實現的決策樹算法。另外,我們需要export_graphviz函數將決策樹導出爲Graphviz格式,然後使用Graphviz來可視化這個導出。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果我們想看到這棵樹,到這裏它還沒準備好!我們需要創建模型,以圖格式導出模型,並使用Graphviz才能看到它。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Importing packages\nimport pandas as pd\nfrom sklearn.tree import DecisionTreeClassifier\nfrom sklearn.tree import export_graphviz\nimport pydot\nimport 
graphviz\n"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"創建數據集"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來我們創建一個數據集,它實際上是一個字典列表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Creating a dataset\ninstances = [\n{'Best Friend': False, 'Species': 'Dog'},\n{'Best Friend': True, 'Species': 'Dog'},\n{'Best Friend': True, 'Species': 'Cat'},\n{'Best Friend': True, 'Species': 'Cat'},\n{'Best Friend': False, 'Species': 'Cat'},\n{'Best Friend': True, 'Species': 'Cat'},\n{'Best Friend': True, 'Species': 'Cat'},\n{'Best Friend': False, 'Species': 'Dog'},\n{'Best Friend': True, 'Species': 'Cat'},\n{'Best Friend': False, 'Species': 'Dog'},\n{'Best Friend': False, 'Species': 'Dog'},\n{'Best Friend': False, 'Species': 'Cat'},\n{'Best Friend': True, 'Species': 'Cat'},\n{'Best Friend': True, 'Species': 'Dog'}\n]\n"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"轉換爲數據幀"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們來轉換這些數據,將其格式化爲DataFrame。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Turning the Dictionary into DataFrame\ndf = 
pd.DataFrame(instances)\ndf\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/76\/fa\/7681d3d4000b8d765bb39deb58f70efa.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣我們就有了一個DataFrame,用它可以判斷某個物種是否適合成爲人類最好的朋友。一會兒我們將通過決策樹進行分類。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"拆分數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來我們劃分訓練數據和測試數據。本例中我們使用List Comprehension,根據括號內的條件將數據轉換爲0或1:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Preparing training and test data\nX_train = [[1] if a else [0] for a in df['Best Friend']]\ny_train = [1 if d == 'Dog' else 0 for d in df['Species']]\n\nlabels = ['Best Friend']\n\nprint(X_train)\n[[0], [1], [1], [1], [0], [1], [1], [0], [1], [0], [0], [0], [1], [1]]\n\nprint(y_train)\n[1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1]\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們將輸入數據和輸出值轉換爲0或1的表示——機器學習算法處理數字是最擅長的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣,我們將值轉換成了X輸入值和目標y目標值,因爲我們在一個循環結構中構造了變量X。接下來,我們遍歷Best 
Friend列的每個元素來表示輸入變量,並將其放入變量a和變量d中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們使用代碼將值轉換爲文本,並轉換爲數字表示以呈現給機器學習模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"構建機器學習模型並非易事。它涉及不同領域的一系列知識,以及構成這一切基礎的數學和統計學背景。此外,它還需要計算機編程知識和對我們正在研究的語言包、業務問題的知識、數據的預處理的理解。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"也就是說,構建機器學習模型的過程涉及多個領域。到目前爲止,我們已經準備好了數據,雖然我們還沒有準備好測試數據來製作熵和基尼係數的模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們不會在這裏評估模型,所以不需要測試數據。因此,我們獲取所有數據並將它們作爲X和y的訓練數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"必須記住,如果我們評估模型,就需要測試數據。由於我們這裏不會進行評估,因此我們只會使用訓練數據,也就是整個數據集。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"構建樹模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在這個階段,我們已經構建了模型,其中第一階段定義了model_v1對象,然後使用model_v1.fit()訓練模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"model_v1 = DecisionTreeClassifier(max_depth = None,\nmax_features = None,\ncriterion = 'entropy',\nmin_samples_leaf = 1,\nmin_samples_split = 
2)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Python是面向對象的編程。DecisionTreeClassifier函數實際上是一個類,它創建該類的一個實例——一個對象。當我們調用DecisionTreeClassifier類時,它將依賴幾個參數來定義要在sklearn文檔中查詢的​​算法行爲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在這裏使用熵作爲標準。至於我們未能指定的參數,算法都認爲是默認的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/eb\/dc\/eb20f159e80fe0d42701bbea20c176dc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼這個對象就會有方法和屬性,fit()是應用於model_v1對象進行模型訓練的方法:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Presenting the data to the Classifier\nmodel_v1.fit(X_train, 
y_train)\nDecisionTreeClassifier(criterion='entropy')\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/14\/6b\/14f99yy168644ec2f01ea9dfc19efa6b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"創建變量"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在這裏,我們定義了一個名爲tree_model_v1的變量,它位於我們現在所在的目錄中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Setting the file name with the decision tree\nfile = '\/Doc\/MachineLearning\/Python\/DecisionTree\/tree_model_v1.dot'\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"定義文件變量後,我們將調用從model_v1計算圖中提取的export_graphviz,即決策樹。我們打開整個文件,並記錄計算圖的所有元素:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Generating the decision tree graph\nexport_graphviz(model_v1, out_file = file, feature_names = labels)\nwith open(file) as f:\ndot_graph = 
f.read()\ngraphviz.Source(dot_graph)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/b4\/97\/b44af736c58a51234a9caf1f33e67097.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"將點文件轉換爲png"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在上面,我們有了計算圖格式的樹。如果你想以png格式寫入此樹:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"!dot -Tpng tree_model_v1.dot -o tree_model_v1.png\n"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"演繹"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"層次結構的頂部是Best 
Friend。在本例中我們只有一個變量。請注意,該算法計算的熵爲0.985,並且仍會計算其他集羣的熵。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"[8,6]的第一組和14個樣本通過熵呈現出最高的信息增益。基於此,我們有了一個位於熵計算層次結構頂部的節點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該算法遍歷了所有數據示例,進行了熵計算,並找到了最佳組合——具有最高熵的屬性到達頂部並創建了我們的決策樹級別。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"第二版模型"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另一種方法是使用另一個標準代替基於熵的決策樹創建相同的模型,我們將使用基尼係數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要只使用基尼係數構建相同的樹,就是要更改標準,刪除標準參數:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"model_v2 = DecisionTreeClassifier(max_depth = None,\nmax_features = None,\nmin_samples_leaf = 1,\nmin_samples_split = 
2)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當我們移除標準參數時,算法會考慮應用基尼係數。我們有一個有趣的參數叫做max_depth。在其中我們可以定義樹的最大深度。在我們的例子裏它沒有意義,因爲我們只有一個變量;但如果我們有幾十個輸入變量,max_depth會很有用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當我們有很多輸入變量時,樹的深度會很大,這會帶來過擬合的問題。因此我們在構建模型時要定義樹的深度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於我們有很多參數,找到構建算法的最佳參數組合是一項複雜的任務!在以自動化方式測試多個參數組合時,我們可以使用需要大量計算資源的交叉驗證。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們還有min_samples_leaf表示決策樹的最低級別:葉節點所需的最小樣本數,就是說它會考慮用多少次觀察來構建決策樹的最低節點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,參數min_samples_split是拆分一個內部節點所需的最小樣本數。在決策樹中,我們有作爲頂部的根節點、作爲樹基部的頂部節點,和中間節點。這樣我們就可以簡單地調整這些參數來定義如何構建所有節點。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"訓練基尼版本"}]},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Presenting the data to the Classifier\nmodel_v2.fit(X_train, 
y_train)\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們再次生成文件參數:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"# Setting the file name with the decision tree\nfile ='\/User\/Documents\/MachineLearning\/DecisionTree_tree_model_v1.dot''\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們提取了模型的計算圖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/77\/09\/779c935fd74536687a7b40df259a7f09.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文中我們用同樣的方式創建了兩顆樹。我們分別使用了熵和基尼係數來計算。兩棵樹的區別在於用於定義節點組織的標準。它們沒有高下之分,都可以很有趣,具體則取決於上下文、數據和業務問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們建議創建同一模型的多個版本並評估最佳性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"plain"},"content":[{"type":"text","text":"!dot -Tpng tree_model_v2.dot -o 
tree_model_v2.png\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,我們保存了決策樹的出口。我希望這篇文章對你有所幫助。感謝你的閱讀。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"原文鏈接:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/levelup.gitconnected.com\/how-to-build-a-decision-tree-model-in-python-75f6f3af159d","title":"","type":null},"content":[{"type":"text","text":"https:\/\/levelup.gitconnected.com\/how-to-build-a-decision-tree-model-in-python-75f6f3af159d"}]}]}]}
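The entropy of 0.985 that Graphviz renders at the root of the first tree can be checked by hand. Below is a minimal sketch in plain Python (the `entropy` helper and the variable names are ours, not part of the article's notebook) that recomputes the root entropy and the information gain of splitting on Best Friend:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Same encoding as in the article: y 1 = Dog, 0 = Cat; x 1 = Best Friend
y = [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1]
x = [0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1]

root = entropy(y)  # entropy of the full 14-sample set, ~0.985
left = [yi for xi, yi in zip(x, y) if xi == 0]   # Best Friend = False
right = [yi for xi, yi in zip(x, y) if xi == 1]  # Best Friend = True
# Weighted average entropy of the two child nodes after the split
children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
gain = root - children  # information gain of splitting on Best Friend

print(round(root, 3), round(gain, 3))
```

This difference between the parent's entropy and the weighted entropy of the children is exactly the quantity the algorithm maximizes when choosing which attribute goes at the top of the tree.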
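When the criterion parameter is removed, scikit-learn scores splits with Gini impurity instead. That value is even easier to reproduce by hand; this short sketch (the `gini` helper is our own name, using the same 1 = Dog / 0 = Cat encoding as above) computes the impurity shown at the root of the second tree:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: the chance of mislabeling a random sample when
    labels are drawn from the node's class distribution."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

y = [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1]  # 1 = Dog, 0 = Cat
root_gini = gini(y)  # 1 - (6/14)^2 - (8/14)^2, ~0.49
print(round(root_gini, 2))
```

Unlike entropy, no logarithms are involved, which is one reason Gini is the cheaper default criterion.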