Usage of the sklearn.pipeline.Pipeline class

In this post I will summarize sklearn.pipeline.Pipeline.
1. The sklearn.pipeline.Pipeline class
The official documentation is here: http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html


class sklearn.pipeline.Pipeline(steps)


The official description is as follows:
Pipeline of transforms with a final estimator.
Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement fit.
The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a ‘__’, as in the example below.


Explanation: the purpose of a Pipeline is to assemble several steps so that they can be cross-validated together while setting different parameters. To do this, you can set the parameters of any step by combining the step's name and the parameter name, joined with a double underscore '__' (for example, anova__k).
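This 'name__parameter' naming is also what lets a grid search tune the parameters of several steps at once. A minimal sketch (GridSearchCV and these particular parameter values are not part of the original example; they are shown here only as an illustration):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
pipe = Pipeline([('anova', SelectKBest(f_regression)), ('svc', SVC(kernel='linear'))])

# keys follow the '<step name>__<parameter name>' convention
param_grid = {'anova__k': [5, 10], 'svc__C': [0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)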


Parameters:
steps : list
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.


Note: the steps parameter is a list of (name, transform) tuples. The last tuple must contain an estimator (the model we train); the preceding tuples are the intermediate transform steps applied before it.
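To see what "implementing fit and transform" means for an intermediate step, here is a minimal sketch of a hand-written transformer (the ColumnSubset class is hypothetical and only for illustration):

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

class ColumnSubset(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: keeps only the first n_cols columns."""
    def __init__(self, n_cols=5):
        self.n_cols = n_cols

    def fit(self, X, y=None):
        # nothing to learn here, but fit() must exist for an intermediate step
        return self

    def transform(self, X):
        return X[:, :self.n_cols]

# a list of (name, transform) tuples, with the last object an estimator
pipe = Pipeline([('subset', ColumnSubset(n_cols=5)), ('svc', SVC(kernel='linear'))])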


Here is the example from the official documentation:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline

# generate some data to play with
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
print(X)
print(y)

# ANOVA SVM-C
anova_filter = SelectKBest(f_regression, k=5)   # feature-selection step
print(anova_filter)
clf = svm.SVC(kernel='linear')                  # the final estimator
anova_svm = Pipeline([('anova', anova_filter), ('svc', clf)])
# You can set the parameters using the step names:
# for instance, fit using k=10 in SelectKBest and C=0.1 in the SVM.
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)  # step name and parameter joined by '__'
print(anova_svm.named_steps)   # actually a dict of the steps
print(type(anova_svm))
prediction = anova_svm.predict(X)
score = anova_svm.score(X, y)
print(prediction, type(prediction))
print(score)
           
The output is as follows:
X [[-2.70323229  0.67787532 -0.65407568 ...,  0.18958162  0.50109417
   2.41185611]
 [-0.30777823  0.21915033  0.24938368 ...,  0.64548418  0.74625357
   1.33408391]
 [-0.25737654 -1.66858407  0.39922312 ...,  0.61351797  0.12003133
  -0.22989455]
 ..., 
 [-0.01530985  0.5792915   0.11958037 ..., -1.47891157  0.39180401
   0.21434039]
 [-1.33123295 -1.83620537  0.50799133 ...,  0.95670232  0.70810868
  -2.14387014]
 [-1.31183623 -1.06511366 -0.3052247  ...,  0.55781031  1.39020755
  -1.58909265]]
Y [1 0 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 0 0 0 1 1 1
 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 1
 0 0 1 0 0 0 1 1 1 1 1 0 1 1 0 1 1 1 1 0 0 0 1 0 1 1]
anova_filter: SelectKBest(k=5, score_func=<function f_regression at 0xaa05e9c>)
anova_svm.named_steps: {'svc': SVC(C=0.1, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='linear', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False), 'anova': SelectKBest(k=10, score_func=<function f_regression at 0xaa05e9c>)}
type(anova_svm)= <class 'sklearn.pipeline.Pipeline'>
prediction= [0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1
 0 1 0 0 1 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 1 0 0 0
 1 0 1 1 0 0 1 1 1 1 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1] <type 'numpy.ndarray'>
score= 0.77


The example above uses several Pipeline methods:
set_params(**params)  Set the parameters of the named steps (using the 'name__parameter' syntax).
predict(*args, **kwargs)  Applies the transforms to the data, then the predict method of the final estimator.
score(*args, **kwargs)  Applies the transforms to the data, then the score method of the final estimator.
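To make the last two concrete: anova_svm.predict(X) is roughly equivalent to pushing X through the fitted intermediate steps and then calling the final estimator directly. A minimal sketch, reusing the fitted anova_svm from the example above (for illustration only; this is not the library's internal implementation):

# roughly what anova_svm.predict(X) does
X_reduced = anova_svm.named_steps['anova'].transform(X)       # apply the SelectKBest step
prediction = anova_svm.named_steps['svc'].predict(X_reduced)  # predict with the final SVC

# and set_params(anova__k=10, svc__C=.1) amounts to setting each step's own parameters:
anova_svm.named_steps['anova'].set_params(k=10)
anova_svm.named_steps['svc'].set_params(C=.1)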