Scikit-learn AdaBoost Library: Summary and Practice

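In the previous article, a detailed summary of the principles of the AdaBoost algorithm, we examined how AdaBoost works and saw that during training it drives the training error down at an exponential rate, approaching 0. This article turns to the AdaBoost classes in Scikit-learn.
We first give an overview of the Scikit-learn AdaBoost classes, then walk through the parameters of AdaBoostClassifier and AdaBoostRegressor, then summarize practical experience with the library, and finally work through an AdaBoost example that visualizes how each parameter affects the model.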

1) Overview of the Scikit-learn AdaBoost classes

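AdaBoost can be used for both classification and regression: AdaBoostClassifier handles classification and AdaBoostRegressor handles regression. Tuning AdaBoost involves two parts: the first is the AdaBoost framework itself, and the second is the chosen weak learner. This article covers only the framework parameters. Below we look at the commonly used framework parameters of AdaBoostClassifier and AdaBoostRegressor.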
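
To make the two entry points concrete, here is a minimal sketch that fits both classes with their default framework parameters. The synthetic datasets and the variable names are assumptions made purely for illustration; they are not part of the original experiment.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor

# Synthetic data, used here only for illustration.
X_clf, y_clf = make_classification(n_samples=200, n_features=8, random_state=0)
X_reg, y_reg = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Framework parameters only; the default base learners are left untouched.
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
reg = AdaBoostRegressor(n_estimators=50, learning_rate=1.0, random_state=0)

clf.fit(X_clf, y_clf)
reg.fit(X_reg, y_reg)

clf_acc = clf.score(X_clf, y_clf)  # mean accuracy on the training data
reg_r2 = reg.score(X_reg, y_reg)   # R^2 on the training data
print("classifier accuracy:", clf_acc)
print("regressor R^2:", reg_r2)
```

Both classes share the same fit/predict/score interface, so the rest of the article focuses on what the framework parameters do rather than on the calling convention.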

2) Common framework parameters of AdaBoostClassifier

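class sklearn.ensemble.AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)
(This is the signature in the scikit-learn release used for this article. From scikit-learn 1.2 the base_estimator parameter is named estimator, and SAMME.R has since been deprecated.)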

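base_estimator: the base classifier, default None
base_estimator can be any kind of weak classifier. When it is None, the base classifier is a CART classification tree of depth 1. Note that when algorithm='SAMME.R', the base classifier must support class-probability prediction.
The parameters of the base classifier also have a large effect on the final classification performance.
algorithm: the boosting algorithm used for classification, default SAMME.R
A parameter specific to AdaBoostClassifier; it accepts SAMME and SAMME.R (R stands for real). The main difference between the two is how the weight of each base classifier is measured. SAMME follows the AdaBoost algorithm described in the previous article: the base classifier outputs class labels directly, and its weight is measured from how well it classifies the samples. SAMME.R measures the weight from the predicted class probabilities, so the base classifier outputs a probability for each class. The variant that relies on class probabilities usually performs better than the one that relies only on predicted labels.
SAMME.R typically converges faster than SAMME, reaching a lower test error with fewer boosting iterations, so in practice it is a good default. Note that the base classifier must support probability output.
n_estimators: the number of base classifiers, default 50
n_estimators is the maximum number of base classifiers and one of the hyperparameters we need to tune, usually together with learning_rate. Too small a value tends to underfit; too large a value tends to overfit.
learning_rate: the weight shrinkage coefficient of the base classifiers, default 1
learning_rate is the other important hyperparameter to tune. In the previous article on the principles of AdaBoost, following the forward stagewise algorithm, the model can be written as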
$g_t(x)=g_{t-1}(x)+\alpha_t f_t(x)$
To guard against overfitting, sklearn introduces a shrinkage coefficient $\mu,\mu\in (0,1]$, and the model becomes:
$g_t(x)=g_{t-1}(x)+\mu\alpha_t f_t(x)$
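For the same quality of fit, a smaller $\mu$ requires more base classifiers. learning_rate and n_estimators together determine how well the algorithm fits, so they need to be tuned jointly. In practice, start tuning from a relatively small $\mu$.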
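
The role of $\mu$ in the update $g_t(x)=g_{t-1}(x)+\mu\alpha_t f_t(x)$ can be sketched with a small hand-written boosting loop. This is not sklearn's implementation: it assumes binary labels in {-1, +1}, the classical weight $\alpha_t=\frac{1}{2}\ln\frac{1-e_t}{e_t}$, and depth-1 trees as $f_t$. The function name boost_with_shrinkage and the synthetic data are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def boost_with_shrinkage(X, y, n_rounds, mu):
    """Illustrative discrete AdaBoost for labels in {-1, +1} with shrinkage mu.

    Returns the training accuracy of g_t after each boosting round.
    """
    n_samples = len(y)
    w = np.full(n_samples, 1.0 / n_samples)  # sample weights, kept normalized
    g = np.zeros(n_samples)                  # g_t(x) evaluated on the training set
    accuracy_per_round = []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        f = stump.predict(X)                 # f_t(x) in {-1, +1}
        err = np.sum(w[f != y])              # weighted error e_t of f_t
        if err <= 0.0 or err >= 0.5:
            break                            # perfect or no-better-than-chance learner
        alpha = 0.5 * np.log((1.0 - err) / err)
        g += mu * alpha * f                  # g_t = g_{t-1} + mu * alpha_t * f_t
        w *= np.exp(-mu * alpha * y * f)     # up-weight the misclassified samples
        w /= w.sum()
        accuracy_per_round.append(float(np.mean(np.sign(g) == y)))
    return accuracy_per_round


# Synthetic binary problem, for illustration only.
X, y01 = make_classification(n_samples=300, n_features=10, random_state=0)
y = 2 * y01 - 1                              # map {0, 1} to {-1, +1}

acc_small_mu = boost_with_shrinkage(X, y, n_rounds=30, mu=0.1)
acc_large_mu = boost_with_shrinkage(X, y, n_rounds=30, mu=1.0)
print("mu=0.1:", acc_small_mu[-1], "after", len(acc_small_mu), "rounds")
print("mu=1.0:", acc_large_mu[-1], "after", len(acc_large_mu), "rounds")
```

With a small mu each round changes both the additive model and the sample weights only slightly, which is why more rounds are usually needed to reach the same training accuracy.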

3) Common framework parameters of AdaBoostRegressor

class sklearn.ensemble.AdaBoostRegressor(base_estimator=None, n_estimators=50, learning_rate=1.0, loss='linear', random_state=None)
The parameters of AdaBoostRegressor are largely the same as those of AdaBoostClassifier; the only difference is that algorithm is replaced by loss.

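base_estimator: the base regressor, default None
Similar to base_estimator in AdaBoostClassifier, except that a regression model is expected. When it is None, a CART regression tree of depth 3 is used by default.
loss: the type of loss function, default linear
A parameter specific to AdaBoostRegressor: the type of function used to update the weights after each iteration. It accepts 'linear', 'square', and 'exponential'. The three differ little in practice, and linear is a good default.
n_estimators: the maximum number of base regressors, default 50
See n_estimators in AdaBoostClassifier.
learning_rate: the weight shrinkage coefficient of the base regressors, default 1
See learning_rate in AdaBoostClassifier.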
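
As a quick way to check how much the choice of loss matters on a given problem, the three options can be compared side by side. The sketch below uses a synthetic regression dataset and test MSE as the yardstick; both are assumptions for illustration, and the ranking may differ on other data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=300, n_features=10, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

mse_by_loss = {}
for loss in ("linear", "square", "exponential"):
    # Default base regressor; only the loss option changes between runs.
    reg = AdaBoostRegressor(n_estimators=50, learning_rate=1.0,
                            loss=loss, random_state=0)
    reg.fit(X_tr, y_tr)
    mse_by_loss[loss] = mean_squared_error(y_te, reg.predict(X_te))

for loss, mse in mse_by_loss.items():
    print("%-12s test MSE = %.1f" % (loss, mse))
```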

4) Practical notes on using the Scikit-learn AdaBoost library

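The AdaBoost classes can also report feature importances;
The more base learners, the better the model fits the training data, and the more easily it overfits;
n_estimators and learning_rate need to be tuned together. For the same model performance, the smaller the learning_rate, the more base learners are needed;
Without prior experience, grid search with cross-validation is a reasonable way to choose the hyperparameters.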

5) AdaBoostClassifier in practice

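Next we apply AdaBoostClassifier to the breast cancer dataset to build a better feel for each of its parameters.
The complete code has been uploaded to my Github.
First, import the packages we need.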

from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

import matplotlib as mpl
import matplotlib.pyplot as plt

Load the breast cancer dataset bundled with sklearn via load_breast_cancer.

breast_cancer = load_breast_cancer()
X = pd.DataFrame(breast_cancer.data, columns = breast_cancer.feature_names)
# y= pd.Categorical.from_codes(breast_cancer.target, breast_cancer.target_names)
y = pd.Series(breast_cancer.target)

Split the dataset into a training set and a test set.

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state= 0)
print(x_train.shape)
print(x_test.shape)

We use a CART tree of depth 1 as the AdaBoost base classifier and look at how the number of base models affects the model.

base_estimator = DecisionTreeClassifier(max_depth=1)
n_estimators = np.arange(1,101,10)
accs_train = []
accs_test = []

for n_estimator in n_estimators:
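    # Base estimator passed positionally: the keyword is base_estimator in older scikit-learn releases and estimator in newer ones
    adaB = AdaBoostClassifier(base_estimator, n_estimators=n_estimator, learning_rate=0.5, random_state=0)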
    adaB.fit(x_train, y_train)

    y_train_pred = adaB.predict(x_train)
    acc_train = metrics.accuracy_score(y_train, y_train_pred)
    accs_train.append(acc_train)
    
    y_test_pred = adaB.predict(x_test)
    acc_test = metrics.accuracy_score(y_test, y_test_pred)
    accs_test.append(acc_test)
    
# Plot training and test accuracy against the number of base learners
plt.figure(figsize=(10,6))
plt.plot(n_estimators, accs_train, 'r-',lw=2, label='train')
plt.plot(n_estimators, accs_test, 'g-',lw=2, label='test')
plt.ylim(0.88,1.01)
plt.xlabel('n_estimators',fontsize = 12)
plt.ylabel('accuracy',fontsize =12)
plt.grid(True, ls= ':')  # positional flag; the b= keyword is not accepted by newer matplotlib
plt.title('Accuracy vs. n_estimators',fontsize = 16)
plt.legend(loc = 'best')
plt.show()

The results show that the more base learners there are, the better the model performs, although this comes with a risk of overfitting.
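
The loop above refits a fresh model for every value of n_estimators. As a cheaper alternative, a single fit with the largest ensemble size can be scored stage by stage using staged_score, which yields the score after each boosting iteration. The sketch below assumes the same data split and base learner as the experiment above; picking the best size on the test set is done here only for illustration, and a separate validation set would be the safer choice in practice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit once with the largest ensemble size we want to inspect.
model = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                           n_estimators=100, learning_rate=0.5, random_state=0)
model.fit(X_tr, y_tr)

# staged_score yields the accuracy after each boosting iteration,
# i.e. for ensembles of size 1, 2, ..., len(model.estimators_).
train_curve = np.array(list(model.staged_score(X_tr, y_tr)))
test_curve = np.array(list(model.staged_score(X_te, y_te)))

best_size = int(np.argmax(test_curve)) + 1
print("ensemble sizes evaluated:", len(test_curve))
print("best test accuracy %.4f at n_estimators=%d" % (test_curve.max(), best_size))
```

The two arrays can be passed to plt.plot in place of accs_train and accs_test to draw the same kind of curve at every ensemble size.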

Next, let's look at how learning_rate affects model performance.

base_estimator = DecisionTreeClassifier(max_depth=1)
learning_rates = [0.05, 0.1, 0.5, 0.9]
n_estimators = np.arange(1,41,1)


plt.figure(figsize=(20,15), facecolor='w')
for i, learning_rate in enumerate(learning_rates):
    accs_train = []
    accs_test = []
    for n_estimator in n_estimators:
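        # Base estimator passed positionally: the keyword is base_estimator in older scikit-learn releases and estimator in newer ones
        adaB = AdaBoostClassifier(base_estimator, n_estimators=n_estimator, learning_rate=learning_rate, random_state=0)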
        adaB.fit(x_train, y_train)

        y_train_pred = adaB.predict(x_train)
        acc_train = metrics.accuracy_score(y_train, y_train_pred)
        accs_train.append(acc_train)

        y_test_pred = adaB.predict(x_test)
        acc_test = metrics.accuracy_score(y_test, y_test_pred)
        accs_test.append(acc_test)
    
    plt.subplot(2, 2, i+1)
    plt.plot(n_estimators, accs_train, 'r-',lw=2, label='train')
    plt.plot(n_estimators, accs_test, 'g-',lw=2, label='test')
    plt.ylim(0.88,1.01)
    plt.xlabel('Num of n_estimators',fontsize = 14)
    plt.ylabel('accuracy',fontsize =14)
    plt.grid(True, ls= ':')
    plt.title('learning_rate=%.2f'%learning_rate, fontsize = 20)
    plt.legend(loc = 'best')

plt.suptitle('Accuracy vs. n_estimators for different learning rates', fontsize=30)
plt.tight_layout(pad=1.4)
plt.subplots_adjust(top=0.92)
plt.show()

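The plots above show that, for the same model performance, a larger learning_rate needs fewer base learners.

Next, let's look at how the depth of the decision tree affects the model.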

max_depths = [1, 5]
n_estimators = np.arange(1,41,1)


plt.figure(figsize=(20,8), facecolor='w')
for i, max_depth in enumerate(max_depths):
    accs_train = []
    accs_test = []
    for n_estimator in n_estimators:
        # Base estimator passed positionally so the call does not depend on the keyword name
        adaB = AdaBoostClassifier(DecisionTreeClassifier(max_depth=max_depth),
                                  n_estimators=n_estimator, learning_rate=0.01, random_state=0)
        adaB.fit(x_train, y_train)

        y_train_pred = adaB.predict(x_train)
        acc_train = metrics.accuracy_score(y_train, y_train_pred)
        accs_train.append(acc_train)

        y_test_pred = adaB.predict(x_test)
        acc_test = metrics.accuracy_score(y_test, y_test_pred)
        accs_test.append(acc_test)
    
    plt.subplot(1, 2, i+1)
    plt.plot(n_estimators, accs_train, 'r-',lw=2, label='train')
    plt.plot(n_estimators, accs_test, 'g-',lw=2, label='test')
    plt.ylim(0.88,1.01)
    plt.xlabel('Num of n_estimators',fontsize = 14)
    plt.ylabel('accuracy',fontsize =14)
    plt.grid(True, ls= ':')
    plt.title('max_depth=%s'%max_depth, fontsize = 20)
    plt.legend(loc = 'best')

plt.suptitle('Accuracy vs. n_estimators for different max_depth', fontsize=30)
plt.tight_layout(pad=1.4)
plt.subplots_adjust(top=0.85)
plt.show()

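The plots above show that the stronger the fitting ability of the base model, the better the overall model performs, although the improvement is not pronounced.

Now let's look at how SAMME.R and SAMME affect the model.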

base_estimator = DecisionTreeClassifier(max_depth=1)
learning_rates = [0.05, 0.1, 0.5, 0.9]
n_estimators = np.arange(1,41,1)


plt.figure(figsize=(20,15), facecolor='w')
for i, learning_rate in enumerate(learning_rates):
    accs_train_samme = []
    accs_test_samme = []
    accs_train_sammer = []
    accs_test_sammer = []
    for n_estimator in n_estimators:
        # SAMME.R needs a scikit-learn release that still supports it (it is deprecated in recent releases)
        adaB_sammer = AdaBoostClassifier(base_estimator, n_estimators=n_estimator,
                                        algorithm='SAMME.R', learning_rate=learning_rate, random_state=0)
        adaB_sammer.fit(x_train, y_train)

        y_train_pred = adaB_sammer.predict(x_train)
        acc_train = metrics.accuracy_score(y_train, y_train_pred)
        accs_train_sammer.append(acc_train)

        y_test_pred = adaB_sammer.predict(x_test)
        acc_test = metrics.accuracy_score(y_test, y_test_pred)
        accs_test_sammer.append(acc_test)
        
        adaB_samme = AdaBoostClassifier(base_estimator, n_estimators=n_estimator,
                                        algorithm='SAMME', learning_rate=learning_rate, random_state=0)
        adaB_samme.fit(x_train, y_train)

        y_train_pred = adaB_samme.predict(x_train)
        acc_train = metrics.accuracy_score(y_train, y_train_pred)
        accs_train_samme.append(acc_train)

        y_test_pred = adaB_samme.predict(x_test)
        acc_test = metrics.accuracy_score(y_test, y_test_pred)
        accs_test_samme.append(acc_test)
        
    
    plt.subplot(2, 2, i+1)
    plt.plot(n_estimators, accs_train_samme, 'r-',lw=2, label='train_samme')
    plt.plot(n_estimators, accs_test_samme, 'g-',lw=2, label='test_samme')
    plt.plot(n_estimators, accs_train_sammer, 'b--',lw=2, label='train_sammer')
    plt.plot(n_estimators, accs_test_sammer, 'm--',lw=2, label='test_sammer')
    plt.ylim(0.88,1.01)
    plt.xlabel('Num of n_estimators',fontsize = 14)
    plt.ylabel('accuracy',fontsize =14)
    plt.grid(True, ls= ':')
    plt.title('learning_rate=%.2f'%learning_rate, fontsize = 20)
    plt.legend(loc = 'best')

plt.suptitle('SAMME vs. SAMME.R for different learning rates', fontsize=30)
plt.tight_layout(pad=1.4)
plt.subplots_adjust(top=0.90)
plt.show()

The plots above show that SAMME.R and SAMME end up with similar performance, but SAMME.R improves faster.

Finally, we tune the model with GridSearchCV.

# Search learning_rate and n_estimators jointly with 3-fold cross-validation, scored by ROC AUC
param_grid = {'learning_rate':[0.05, 0.1, 0.5, 0.9], 'n_estimators' :np.arange(1,41,5)}
ada_grid = GridSearchCV(estimator=AdaBoostClassifier(DecisionTreeClassifier(max_depth=1)),
                       param_grid=param_grid, cv=3, scoring = 'roc_auc')
ada_grid.fit(x_train, y_train)
print(ada_grid.best_params_)

# Refit with learning_rate=0.9 and n_estimators=26 (both values lie on the grid above)
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), learning_rate=0.9, n_estimators=26, random_state=0)
ada.fit(x_train, y_train)

y_train_pred = ada.predict(x_train)
acc_train = metrics.accuracy_score(y_train, y_train_pred)
print('acc_train:',acc_train)

y_test_pred = ada.predict(x_test)
acc_test = metrics.accuracy_score(y_test, y_test_pred)
print('acc_test:',acc_test)
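
Rather than copying the selected values by hand, the refitted model can be taken straight from the search object. The following sketch assumes a smaller grid than the one above to keep it quick; the names search and best are introduced here for illustration. It relies on GridSearchCV's default refit=True, which retrains the best configuration on the whole training set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# A reduced grid, chosen only to keep the example fast.
param_grid = {"learning_rate": [0.1, 0.5, 0.9], "n_estimators": [10, 20, 40]}
search = GridSearchCV(
    AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X_tr, y_tr)

best = search.best_estimator_          # already refit on the full training set
test_acc = best.score(X_te, y_te)      # accuracy on the held-out test set
print("best params:", search.best_params_)
print("cv roc_auc: %.4f" % search.best_score_)
print("test accuracy: %.4f" % test_acc)
```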

Use the feature_importances_ attribute of AdaBoost to inspect feature importances.

# Inspect feature importances, sorted from most to least important
important_features = pd.DataFrame({'feature':X.columns,'importance':ada.feature_importances_})
important_features.sort_values(by = 'importance',ascending = False,inplace =True)
important_features['cum_importance'] = np.cumsum(important_features['importance'])
important_features

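We can use the feature importances reported by the model to optimize the model further.
That concludes this summary of AdaBoostClassifier; I hope it helps you use it. AdaBoostRegressor is used in much the same way as AdaBoostClassifier, so feel free to try it yourself.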
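
One possible way to act on the importances is to retrain on only the highest-ranked features and compare the result with the full model. The sketch below is an assumption about what "further optimization" could look like, not the author's procedure; the cutoff top_k = 10 and the helper make_model are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)


def make_model():
    # Depth-1 trees as base learners, passed positionally.
    return AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=26, learning_rate=0.9, random_state=0)


full = make_model().fit(X_tr, y_tr)

top_k = 10  # hypothetical cutoff, chosen only for illustration
# Column indices of the top_k most important features, most important first.
top_idx = np.argsort(full.feature_importances_)[::-1][:top_k]

reduced = make_model().fit(X_tr[:, top_idx], y_tr)

acc_full = full.score(X_te, y_te)
acc_reduced = reduced.score(X_te[:, top_idx], y_te)
print("all features : %.4f" % acc_full)
print("top %d only  : %.4f" % (top_k, acc_reduced))
```

Whether the reduced model is better, worse, or the same depends on the data, so the comparison is best repeated with cross-validation before dropping features for good.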

