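Partial dependence plots are a good way to extract insights from complex models.
A partial dependence plot shows the dependence between the target response and a set of features, marginalizing over the values of all the other features. Intuitively, partial dependence can be read as the expected target response as a function of the chosen features.
In other words, it is a tool for looking at the relationship between the target and individual variables.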
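To make the definition concrete, here is a minimal brute-force sketch of one-dimensional partial dependence: for each grid value, the chosen feature is forced to that value in every row, the model predicts, and the predictions are averaged. The helper `partial_dependence_1d`, the toy model `toy_predict`, and the sample rows are hypothetical and exist only for illustration; this follows the definition and is not necessarily how scikit-learn computes the curve internally.

```python
def partial_dependence_1d(predict, rows, feature_index, grid):
    """Brute-force partial dependence of `predict` on a single feature."""
    averages = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # force the chosen feature to the grid value
            total += predict(modified)
        averages.append(total / len(rows))  # average over all other feature values
    return averages


# Hypothetical toy model: price falls with distance and rises with building area.
def toy_predict(row):
    distance, building_area = row
    return 1000.0 - 20.0 * distance + 5.0 * building_area


# Hypothetical rows: [Distance, BuildingArea]
rows = [[2.0, 100.0], [10.0, 150.0], [25.0, 200.0]]
pd_distance = partial_dependence_1d(toy_predict, rows, feature_index=0,
                                    grid=[0.0, 10.0, 20.0])
print(pd_distance)  # [1750.0, 1550.0, 1350.0]
```

The resulting curve drops by 200 for every 10 units of distance, which is exactly the distance effect built into the toy model; the building-area effect has been averaged out.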
The main appeal is convenience: a single call produces several plots, showing how y relates to each variable.
The key code is as follows:
# Note: this import path matches the older scikit-learn release used in this post;
# newer releases expose the partial dependence tools under sklearn.inspection.
from sklearn.ensemble.partial_dependence import partial_dependence, plot_partial_dependence
my_plots = plot_partial_dependence(my_model,
                                   features=[0, 2],  # column numbers of plots we want to show
                                   X=X,  # raw predictors data
                                   feature_names=['Distance', 'Landsize', 'BuildingArea'],  # labels on graphs
                                   grid_resolution=10)  # number of values to plot on x axis
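The `grid_resolution` argument controls how many x values each curve is evaluated at. As a rough illustration, the sketch below builds an evenly spaced grid between the smallest and largest observed value of a feature. The helper `make_grid` and the sample distances are hypothetical, and the choice of end points is an assumption; scikit-learn may pick its grid end points differently, for example from percentiles of the data.

```python
def make_grid(values, grid_resolution):
    """Return `grid_resolution` evenly spaced points spanning the observed values."""
    lo, hi = float(min(values)), float(max(values))
    if grid_resolution < 2 or lo == hi:
        return [lo]  # nothing to space out
    step = (hi - lo) / (grid_resolution - 1)
    return [lo + i * step for i in range(grid_resolution)]


# Hypothetical distances, for illustration only.
distances = [2.5, 13.0, 7.0, 48.1, 0.0, 20.0]
grid = make_grid(distances, grid_resolution=5)
print(grid)
```

A smaller `grid_resolution` gives a coarser but cheaper curve, since the model is evaluated once per grid point.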
Here is an example:
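import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escape sequences
melb_data = pd.read_csv(r'G:\kaggle\melb_data.csv')
y = melb_data.Price
clo_to_use = ['Distance', 'Landsize', 'BuildingArea', 'Rooms']
X = melb_data[clo_to_use]
# Fill in missing values (Imputer defaults to the column mean)
from sklearn.preprocessing import Imputer
my_imputer = Imputer()
imputed_X = my_imputer.fit_transform(X)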
d:\python27\lib\site-packages\sklearn\utils\deprecation.py:58: DeprecationWarning: Class Imputer is deprecated; Imputer was deprecated in version 0.20 and will be removed in 0.22. Import impute.SimpleImputer from sklearn instead.
warnings.warn(msg, category=DeprecationWarning)
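For readers unsure what the imputation step does to the data, the following is a minimal pure-Python sketch of mean imputation, the default strategy of `Imputer`: each missing entry is replaced by the mean of the observed values in its column. The helper `impute_column_means` and the sample rows are hypothetical, and the sketch assumes every column has at least one observed value.

```python
import math


def impute_column_means(rows):
    """Return a copy of `rows` with NaN entries replaced by their column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        # Assumes each column has at least one non-missing value.
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(observed) / float(len(observed)))
    return [[means[j] if math.isnan(row[j]) else row[j] for j in range(n_cols)]
            for row in rows]


nan = float('nan')
# Hypothetical rows: [Landsize, BuildingArea]
raw_rows = [[200.0, 100.0], [400.0, nan], [nan, 140.0]]
filled_rows = impute_column_means(raw_rows)
print(filled_rows)  # [[200.0, 100.0], [400.0, 120.0], [300.0, 140.0]]
```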
# Gradient tree boosting
from sklearn.ensemble import GradientBoostingRegressor
my_model = GradientBoostingRegressor()
my_model.fit(imputed_X, y)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=100, n_iter_no_change=None, presort='auto',
random_state=None, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False)
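# Draw partial dependence plots to see how the target y relates to the variables
from sklearn.ensemble.partial_dependence import plot_partial_dependence
# features=[0, 2] selects columns 0 and 2 of clo_to_use: 'Distance' and 'BuildingArea'
my_plots = plot_partial_dependence(my_model,
                                   feature_names=clo_to_use,
                                   features=[0, 2],
                                   X=imputed_X)
# Same plots, evaluated at only 10 grid points per feature
my_plots1 = plot_partial_dependence(my_model,
                                    feature_names=clo_to_use,
                                    features=[0, 2],
                                    X=imputed_X,
                                    grid_resolution=10)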
At a glance, the plots show how Distance and BuildingArea affect the model's predicted Melbourne house prices.