This article is reposted from: http://blog.csdn.net/u012897374/article/details/74999940
1. Grid search is used to find the best parameters for a model
First, import the required packages:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
import numpy as np
import pandas as pd
2. Set the parameters to search over
params = {'learning_rate': np.linspace(0.05, 0.25, 5), 'max_depth': list(range(1, 8)),
          'min_samples_leaf': list(range(1, 5)), 'n_estimators': list(range(50, 100, 10))}
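Before fitting anything, it can help to count how many combinations this grid contains, since every combination is trained once per cross-validation fold. The sketch below uses `ParameterGrid` from `sklearn.model_selection` on the same dictionary; the `cv_folds` value is an assumption chosen to match the `cv=10` setting used in step 3.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

params = {
    "learning_rate": np.linspace(0.05, 0.25, 5),
    "max_depth": list(range(1, 8)),
    "min_samples_leaf": list(range(1, 5)),
    "n_estimators": list(range(50, 100, 10)),
}

# Number of parameter combinations: 5 * 7 * 4 * 5
n_candidates = len(ParameterGrid(params))

cv_folds = 10  # assumed: same fold count as cv=10 in step 3
# Model fits performed during the search (not counting the final refit)
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)  # 700 7000
```

With 7000 fits, a coarser grid or fewer folds may be worth considering on large datasets.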
3. Set up the model and the evaluation metric, then train the model with each parameter combination
clf = GradientBoostingClassifier()
grid = GridSearchCV(clf, params, cv=10, scoring="f1")
grid.fit(X, y)
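The snippet above assumes that `X` and `y` already exist. As a minimal runnable sketch, the version below substitutes synthetic data from `make_classification` and a much smaller grid (`small_params`) so that the search finishes quickly; the data, the grid values, and `cv=3` are all illustrative assumptions, not values from the original article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary-classification data standing in for X and y (assumption)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Deliberately tiny grid so the example runs in seconds (assumption)
small_params = {"max_depth": [1, 2], "n_estimators": [10, 20]}

clf = GradientBoostingClassifier(random_state=0)
grid = GridSearchCV(clf, small_params, cv=3, scoring="f1")
grid.fit(X, y)  # trains 4 combinations x 3 folds, then refits the best one

print(grid.best_params_)
```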
The built-in values accepted by scoring include the following:
- Classification

| scoring | function | comment |
|---|---|---|
| accuracy | metrics.accuracy_score | |
| average_precision | metrics.average_precision_score | |
| f1 | metrics.f1_score | for binary targets |
| f1_micro | metrics.f1_score | micro-averaged |
| f1_macro | metrics.f1_score | macro-averaged |
| f1_weighted | metrics.f1_score | weighted average |
| f1_samples | metrics.f1_score | by multilabel sample |
| neg_log_loss | metrics.log_loss | requires predict_proba support |
| precision etc. | metrics.precision_score | suffixes apply as with "f1" |
| recall etc. | metrics.recall_score | suffixes apply as with "f1" |
| roc_auc | metrics.roc_auc_score | |

- Clustering

| scoring | function | comment |
|---|---|---|
| adjusted_rand_score | metrics.adjusted_rand_score | |

- Regression

| scoring | function | comment |
|---|---|---|
| neg_mean_absolute_error | metrics.mean_absolute_error | |
| neg_mean_squared_error | metrics.mean_squared_error | |
| neg_median_absolute_error | metrics.median_absolute_error | |
| r2 | metrics.r2_score | |
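The `neg_` prefix on the error-based names reflects the scorer convention that a higher value is always better, so error metrics are returned with their sign flipped. The following sketch checks this with `metrics.get_scorer`; the toy data and the `DummyRegressor` baseline are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import get_scorer, mean_squared_error

# Toy regression data (assumption): y = x for x in 0..9
X = np.arange(10).reshape(-1, 1)
y = np.arange(10, dtype=float)

# Baseline model that always predicts the mean of y
model = DummyRegressor(strategy="mean").fit(X, y)

# Look up the scorer by the same string that GridSearchCV accepts
scorer = get_scorer("neg_mean_squared_error")
score = scorer(model, X, y)

# The plain metric for comparison
mse = mean_squared_error(y, model.predict(X))

print(score, mse)  # the score is the negated MSE
```

This is why `grid.best_score_` is negative when one of the `neg_*` scorers is used.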
4. View the best score and the best parameters
grid.best_score_   # best mean cross-validated score (the F1 score here)
grid.best_params_  # parameter combination that achieved it
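Beyond the single best result, the fitted search object also stores per-combination results in `cv_results_`, which loads directly into a pandas DataFrame. The sketch below is an illustration on placeholder data with a small assumed grid; only the attribute names (`cv_results_`, `best_index_`, `best_score_`) come from scikit-learn itself.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data and a small grid (assumptions for illustration)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"max_depth": [1, 2], "n_estimators": [10, 20]},
    cv=3,
    scoring="f1",
)
grid.fit(X, y)

# One row per parameter combination
results = pd.DataFrame(grid.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score"))

# The row at best_index_ is the one reported by best_score_ / best_params_
best_row = results.loc[grid.best_index_]
```

Comparing `std_test_score` across rows gives a rough sense of whether the top candidates are meaningfully different or within noise of each other.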
5. Get the best model
grid.best_estimator_
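With the default `refit=True`, `best_estimator_` is a model retrained on the full data passed to `fit` using the best parameters, and calling `predict` on the search object delegates to it. The sketch below checks both points on placeholder data; the data and the small grid are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data and a small grid (assumptions for illustration)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"max_depth": [1, 2], "n_estimators": [10, 20]},
    cv=3,
    scoring="f1",
)
grid.fit(X, y)

best_model = grid.best_estimator_

# The refitted model carries the winning parameter values
uses_best_params = all(
    best_model.get_params()[name] == value
    for name, value in grid.best_params_.items()
)

# grid.predict(...) forwards to best_model.predict(...)
same_predictions = np.array_equal(grid.predict(X), best_model.predict(X))

print(uses_best_params, same_predictions)
```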
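6. Use the best model to make predictions
best_model = grid.best_estimator_
predict_y = best_model.predict(Test_X)
# Score against the true labels of the test set (Test_y), not the training labels y
metrics.f1_score(Test_y, predict_y)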
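The prediction step presumes a test set that was kept out of the grid search. One possible way to produce it, sketched below, is `train_test_split`; the synthetic data, the 25% test fraction, the small grid, and the name `Test_y` for the held-out labels are all assumptions made for this example.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (assumption), split into a search set and a held-out test set
X_all, y_all = make_classification(n_samples=300, n_features=8, random_state=0)
X, Test_X, y, Test_y = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

# Run the grid search on the training portion only
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"max_depth": [1, 2], "n_estimators": [10, 20]},
    cv=3,
    scoring="f1",
)
grid.fit(X, y)

# Evaluate the refitted best model on data it has never seen
best_model = grid.best_estimator_
predict_y = best_model.predict(Test_X)
test_f1 = metrics.f1_score(Test_y, predict_y)

print(test_f1)
```

The held-out F1 is usually a more honest estimate than `grid.best_score_`, because the best score was itself selected as the maximum over all candidates.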