A Panoramic View of a Sample

The reason learning is needed at all is that a sample is incomplete. If the training set were a complete sample, that is, an entire sample population, it would already determine the model's final form.

The decision tree as designed:

After many days of study, I have a vague feeling that the "decision tree" I build here is not quite the same concept as the decision tree in DecisionTreeClassifier. This shows up in some of DecisionTreeClassifier's parameters, such as max_features and min_samples_leaf. The tree my ID3 implementation generates corresponds one-to-one with the training samples: it is overfitted, and it faithfully reflects the training set, but it may make mistakes when predicting.
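
To make the contrast concrete, here is a minimal sketch (assuming scikit-learn is installed; the dataset is a synthetic stand-in, not the table from this post) comparing an unconstrained tree, which grows to purity the way an exhaustive ID3 tree does, against one regularized with min_samples_leaf and max_features:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 100 samples, 5 features.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Unconstrained tree: grows until every leaf is pure (fits the training set exactly).
full = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Regularized tree: every leaf must cover at least 5 samples, and only a
# random subset of 3 features is considered at each split.
pruned = DecisionTreeClassifier(
    criterion="entropy",
    min_samples_leaf=5,
    max_features=3,
    random_state=0,
).fit(X, y)

print(full.get_depth(), pruned.get_depth())  # the regularized tree is usually shallower
```

The regularized tree trades some training-set fidelity for better generalization, which is exactly the dimension along which it differs from an exhaustive ID3 tree.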

However, if the training sample is itself the population, that risk of error disappears. Below is a sample population for the tree above.

Features A, B, C, D, E and outcome RES:

A,B,C,D,E,RES
1,5,10,12,3,no
1,8,10,12,4,yes
0,8,9,13,4,yes
2,6,11,12,3,yes
0,6,9,13,3,no
1,5,10,13,3,no
2,8,9,12,3,yes
2,8,9,13,4,no
1,6,11,13,4,no
1,7,9,12,3,yes
1,7,11,12,4,no
1,6,9,12,4,yes
2,8,10,13,3,no
2,5,10,12,3,yes
0,8,10,12,4,yes
2,8,10,12,3,yes
1,5,9,12,3,yes
1,8,10,13,3,yes
1,5,11,12,3,no
2,7,11,12,4,yes
1,8,11,12,4,no
0,5,10,12,4,yes
0,6,10,12,4,yes
2,8,11,12,4,yes
0,7,10,13,3,yes
0,7,9,12,4,yes
2,7,10,12,4,yes
0,7,9,13,4,yes
1,6,11,13,3,no
2,6,10,12,4,yes
1,8,10,12,3,yes
0,8,10,13,4,yes
2,6,9,13,4,yes
0,7,9,12,3,yes
2,6,9,12,3,yes
1,6,11,12,3,no
2,6,11,13,4,yes
2,6,9,12,4,yes
2,8,9,13,3,no
1,5,9,13,3,yes
0,5,10,13,4,yes
2,6,9,13,3,yes
0,6,10,13,3,no
2,5,10,12,4,yes
1,7,11,13,4,no
0,7,11,13,4,yes
1,7,9,13,4,yes
1,5,10,13,4,no
0,5,11,13,4,yes
1,8,9,13,3,yes
1,5,11,13,3,no
0,5,9,13,4,yes
0,6,11,12,4,yes
2,7,11,12,3,yes
1,8,9,12,3,yes
0,7,10,12,3,yes
0,7,10,13,4,yes
0,6,10,12,3,no
1,5,11,13,4,no
1,6,9,13,3,yes
2,8,10,12,4,yes
2,5,9,12,3,yes
1,5,10,12,4,no
0,6,9,13,4,yes
0,7,10,12,4,yes
1,7,11,13,3,no
0,8,9,12,4,yes
1,8,10,13,4,yes
2,5,9,12,4,yes
0,7,11,12,4,yes
0,7,11,13,3,yes
0,8,11,13,4,yes
2,8,11,13,3,no
1,5,11,12,4,no
1,8,11,13,4,no
2,8,9,12,4,yes
2,7,10,12,3,yes
1,7,9,13,3,yes
0,6,9,12,3,no
1,7,9,12,4,yes
1,6,9,12,3,yes
2,6,11,13,3,yes
2,7,9,12,3,yes
0,7,9,13,3,yes
1,8,11,13,3,no
1,6,11,12,4,no
1,8,9,12,4,yes
2,5,11,12,4,yes
0,6,11,13,3,no
1,6,9,13,4,yes
2,8,11,12,3,yes
0,6,9,12,4,yes
2,6,10,13,3,yes
0,8,11,12,4,yes
1,7,11,12,3,no
0,6,11,13,4,yes
2,8,11,13,4,no
1,5,9,12,4,yes
1,8,11,12,3,no
2,7,9,12,4,yes
0,5,9,12,4,yes
0,7,11,12,3,yes
2,6,10,13,4,yes
2,5,11,12,3,yes
0,6,11,12,3,no
2,6,10,12,3,yes
1,5,9,13,4,yes
0,6,10,13,4,yes
1,8,9,13,4,yes
2,8,10,13,4,no
0,5,11,12,4,yes
2,6,11,12,4,yes

The result generated with sklearn:

For a sample population, overfitting is exactly what we want. The result generated by my own ID3 algorithm:

DecisionTreeClassifier produces a binary tree, whereas my algorithm produces an N-ary tree. I am still learning how to read DecisionTreeClassifier's output, so I will leave the differences between them for a later post.
