How to Use GridSearchCV

This article is reposted from: http://blog.csdn.net/u012897374/article/details/74999940

1. Grid search is used to find the best hyperparameters for a model

First, import the dependencies

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV  # formerly sklearn.grid_search (removed in scikit-learn 0.20)
from sklearn import metrics
import numpy as np
import pandas as pd

2. Set the parameters to search over

params = {'learning_rate': np.linspace(0.05, 0.25, 5),
          'max_depth': list(range(1, 8)),
          'min_samples_leaf': list(range(1, 5)),
          'n_estimators': list(range(50, 100, 10))}
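
Before launching the search, it helps to know how many models this grid implies. The following is a minimal sketch in plain Python, not how scikit-learn implements the search internally: the helper name expand_grid is hypothetical, and the learning rates are written out as the five values that np.linspace(0.05, 0.25, 5) produces.

```python
from itertools import product

# Same grid as above, written with plain lists so that NumPy is not needed.
param_grid = {
    "learning_rate": [0.05, 0.10, 0.15, 0.20, 0.25],
    "max_depth": list(range(1, 8)),
    "min_samples_leaf": list(range(1, 5)),
    "n_estimators": list(range(50, 100, 10)),
}


def expand_grid(grid):
    """Yield one dict per combination of candidate values (hypothetical helper)."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(expand_grid(param_grid))
n_candidates = len(candidates)

cv_folds = 10  # the cv value passed to GridSearchCV in step 3
n_fits = n_candidates * cv_folds  # fits during the search, not counting the final refit

print(n_candidates, n_fits)  # prints: 700 7000
```

With 10-fold cross-validation this grid costs several thousand model fits, so it can be worth starting with a coarser grid and refining around the best region.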

3. Set up the model and the evaluation metric, then train the model with each parameter combination

clf = GradientBoostingClassifier()
grid = GridSearchCV(clf, params, cv=10, scoring="f1")
grid.fit(X, y)
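
The snippet above assumes that X and y already exist. As a rough end-to-end sketch, the version below substitutes synthetic data from make_classification and a deliberately tiny grid so that it runs in a few seconds. The data, the name small_params, and the reduced cv=3 are all assumptions made for illustration, not part of the original setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for X and y (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A tiny grid (2 x 2 = 4 candidates) so the example finishes quickly.
small_params = {"max_depth": [1, 2], "n_estimators": [20, 40]}

clf = GradientBoostingClassifier(random_state=0)
grid = GridSearchCV(clf, small_params, cv=3, scoring="f1")
grid.fit(X, y)  # 4 candidates x 3 folds, then one refit on all of X, y

print(grid.best_params_)
print(grid.best_score_)
```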

The predefined values for scoring include the following:

  • Classification

scoring                      function                           comment
accuracy                     metrics.accuracy_score
average_precision            metrics.average_precision_score
f1                           metrics.f1_score                   for binary targets
f1_micro                     metrics.f1_score                   micro-averaged
f1_macro                     metrics.f1_score                   macro-averaged
f1_weighted                  metrics.f1_score                   weighted average
f1_samples                   metrics.f1_score                   by multilabel sample
neg_log_loss                 metrics.log_loss                   requires predict_proba support
precision etc.               metrics.precision_score            suffixes apply as with "f1"
recall etc.                  metrics.recall_score               suffixes apply as with "f1"
roc_auc                      metrics.roc_auc_score

  • Clustering

scoring                      function                           comment
adjusted_rand_score          metrics.adjusted_rand_score

  • Regression

scoring                      function                           comment
neg_mean_absolute_error      metrics.mean_absolute_error
neg_mean_squared_error       metrics.mean_squared_error
neg_median_absolute_error    metrics.median_absolute_error
r2                           metrics.r2_score
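
The neg_ prefix on the loss-based names is worth a note: scikit-learn scorers follow a "greater is better" convention, so losses are reported with their sign flipped. The sketch below illustrates this with cross_val_score on synthetic regression data; the data and the LinearRegression model are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption for illustration).
X_reg, y_reg = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Each fold returns the negated mean squared error, so values are <= 0
# and a value closer to zero means a better model.
neg_mse = cross_val_score(
    LinearRegression(), X_reg, y_reg, cv=5, scoring="neg_mean_squared_error"
)

# Flip the sign to recover the ordinary (non-negative) MSE per fold.
mse = -neg_mse
print(mse)
```

The same applies inside GridSearchCV: with a neg_ scorer, best_score_ is the negated loss of the best candidate.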

4. View the best score and the best parameters

grid.best_score_    # best mean cross-validated score (here, the f1 score)
grid.best_params_   # parameter combination that achieved the best score
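
Beyond the single best result, a fitted search also exposes the scores of every candidate through its cv_results_ attribute, which loads directly into a pandas DataFrame. The sketch below is one way to rank the candidates; the synthetic data, the tiny grid, and the names demo_grid, results, and ranking are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data and a tiny grid (assumptions) so the example runs quickly.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
demo_grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"max_depth": [1, 2], "n_estimators": [20, 40]},
    cv=3,
    scoring="f1",
)
demo_grid.fit(X_demo, y_demo)

# One row per parameter combination, sorted so the best candidate comes first.
results = pd.DataFrame(demo_grid.cv_results_)
ranking = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(ranking)
```

Looking at std_test_score next to mean_test_score shows whether the winner is clearly ahead or just within fold-to-fold noise of the runners-up.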


5. Get the best model

grid.best_estimator_


6. Use the best model to make predictions

best_model = grid.best_estimator_
predict_y = best_model.predict(Test_X)
# Test_y is assumed to hold the true labels for Test_X
metrics.f1_score(Test_y, predict_y)
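
For the score in this last step to mean anything, the test data must be kept out of the search entirely. A minimal sketch of that workflow follows, assuming synthetic data split with train_test_split; the names train_X, train_y, Test_y, and search are introduced here for illustration and do not appear in the original.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (an assumption), split into a training part and a held-out test part.
X_all, y_all = make_classification(n_samples=300, n_features=10, random_state=0)
train_X, Test_X, train_y, Test_y = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

# Search on the training part only, with a tiny grid to keep the run short.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"max_depth": [1, 2], "n_estimators": [20, 40]},
    cv=3,
    scoring="f1",
)
search.fit(train_X, train_y)

# With the default refit=True, best_estimator_ is refit on all of train_X, train_y.
best_model = search.best_estimator_
predict_y = best_model.predict(Test_X)

# Compare predictions against the true labels of the same held-out samples.
test_f1 = metrics.f1_score(Test_y, predict_y)
print(test_f1)
```

Calling search.predict(Test_X) gives the same predictions, since the fitted search object delegates predict to best_estimator_.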