Partial Dependence Plots

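Partial dependence plots are a good way to extract insights from complex models.
A partial dependence plot shows the dependence between the target response and a set of target features, marginalizing over the values of all other features. Intuitively, the partial dependence can be read as the expected target response as a function of the target features.
In other words, it shows how the model's prediction relates to each variable.
The main appeal is convenience: a single call can produce several plots, one for y against each selected variable.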
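To make the definition concrete, here is a minimal pure-Python sketch of a brute-force partial dependence computation. It is not how scikit-learn implements it; the function `partial_dependence_1d`, the toy model `toy_predict`, and the three sample rows are all made up for illustration. For each grid value, the chosen feature is overwritten in every row and the predictions are averaged, which is what "marginalizing over the other features" means in practice.

```python
def partial_dependence_1d(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    averages = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)              # copy so the data is untouched
            modified[feature_index] = value   # force the feature of interest
            total += predict(modified)
        averages.append(total / len(rows))    # average over the other features
    return averages


# Hypothetical toy model: price falls with distance and rises with rooms.
def toy_predict(row):
    distance, rooms = row
    return 100.0 - 2.0 * distance + 10.0 * rooms


# Made-up rows of (distance, rooms); not taken from the Melbourne data.
rows = [[5.0, 2], [10.0, 3], [20.0, 4]]
grid = [0.0, 10.0, 20.0]

pd_distance = partial_dependence_1d(toy_predict, rows, 0, grid)
print(pd_distance)  # [130.0, 110.0, 90.0]
```

Plotting `grid` against `pd_distance` gives the partial dependence curve for distance: the averaged prediction drops by 2 for every extra unit of distance, whatever the number of rooms.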

The key code is as follows:
from sklearn.ensemble.partial_dependence import partial_dependence, plot_partial_dependence

my_plots = plot_partial_dependence(my_model,       
                                   features=[0, 2], # column numbers of plots we want to show
                                   X=X,            # raw predictors data.
                                   feature_names=['Distance', 'Landsize', 'BuildingArea'], # labels on graphs
                                   grid_resolution=10) # number of values to plot on x axis

Here is an example:

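import pandas as pd
from sklearn.preprocessing import Imputer

# Raw string so the backslashes in the Windows path are taken literally
melb_data = pd.read_csv(r'G:\kaggle\melb_data.csv')
y = melb_data.Price
clo_to_use = ['Distance', 'Landsize', 'BuildingArea', 'Rooms']
X = melb_data[clo_to_use]

# Fill missing values (column mean by default) before fitting
my_imputer = Imputer()
imputed_X = my_imputer.fit_transform(X)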
d:\python27\lib\site-packages\sklearn\utils\deprecation.py:58: DeprecationWarning: Class Imputer is deprecated; Imputer was deprecated in version 0.20 and will be removed in 0.22. Import impute.SimpleImputer from sklearn instead.
  warnings.warn(msg, category=DeprecationWarning)
# Gradient tree boosting
from sklearn.ensemble import GradientBoostingRegressor

my_model= GradientBoostingRegressor()
my_model.fit(imputed_X, y)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=100, n_iter_no_change=None, presort='auto',
             random_state=None, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
# Draw partial dependence plots to see how the target y relates to the variables
from sklearn.ensemble.partial_dependence import plot_partial_dependence

my_plots= plot_partial_dependence(my_model,
                                  feature_names= clo_to_use,
                                  features= [0,2],
                                  X= imputed_X)
                        

[Figure: partial dependence plots for Distance and BuildingArea]

my_plots1= plot_partial_dependence(my_model,
                                  feature_names= clo_to_use,
                                  features= [0,2],
                                  X= imputed_X,
                                  grid_resolution= 10)

[Figure: the same two partial dependence plots drawn with grid_resolution=10]

At a glance you can see how Distance and BuildingArea affect Melbourne house prices.
