I've recently been reading Hands-On Machine Learning with Scikit-Learn & TensorFlow. When I reached the section on pipelines, I wrote the following code, modeled on the book's example:
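    from sklearn.pipeline import Pipeline, FeatureUnion
    # Imputer was removed in scikit-learn 0.22 (newer versions: sklearn.impute.SimpleImputer)
    from sklearn.preprocessing import Imputer, LabelBinarizer, StandardScaler

    # DataFrameSelector and CombinedAttributesAdder are custom transformers defined
    # earlier in the book; housing_numerical is the numeric-only part of housing.
    num_attribs = list(housing_numerical)   # names of the numeric columns
    cat_attribs = ["ocean_proximity"]       # the single categorical column

    num_pipeline = Pipeline([
        ("selector", DataFrameSelector(num_attribs)),
        ("imputer", Imputer(strategy="median")),
        ("attribs_adder", CombinedAttributesAdder()),
        ("std_scaler", StandardScaler()),
    ])

    cat_pipeline = Pipeline([
        ("selector", DataFrameSelector(cat_attribs)),
        ("label_binarizer", LabelBinarizer()),
    ])

    full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", num_pipeline),
        ("cat_pipeline", cat_pipeline),
    ])

    # This call is where the error below is raised
    housing_prepared = full_pipeline.fit_transform(housing)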
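DataFrameSelector is not a scikit-learn class; the book defines it as a small custom transformer. The sketch below is a minimal version along those lines, assuming pandas and scikit-learn are installed (the two-row sample DataFrame is made up). Note that its fit accepts an optional y, the usual scikit-learn convention that LabelBinarizer does not follow.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(TransformerMixin, BaseEstimator):
    """Selects the given DataFrame columns and returns them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        # Nothing to learn; y is accepted so Pipeline/FeatureUnion can pass it in
        return self

    def transform(self, X):
        return X[self.attribute_names].values


# Made-up sample with one numeric and one categorical column
housing_sample = pd.DataFrame({
    "median_income": [8.3, 7.2],
    "ocean_proximity": ["NEAR BAY", "INLAND"],
})
selected = DataFrameSelector(["ocean_proximity"]).fit_transform(housing_sample)
print(selected)  # a (2, 1) array holding the two category strings
```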
But calling fit_transform on full_pipeline raises the following error:
    TypeError: fit_transform() takes 2 positional arguments but 3 were given
I suspected this was caused by a change between scikit-learn versions, and sure enough I found the answer here. Quoting:
The pipeline is assuming LabelBinarizer's fit_transform method is defined to take three positional arguments:

    def fit_transform(self, x, y):
        ...rest of the code

while it is defined to take only two:

    def fit_transform(self, x):
        ...rest of the code
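The mismatch can be reproduced without scikit-learn. This is a minimal pure-Python sketch: ToyLabelBinarizer, ToyLabelBinarizerFixed, pipeline_step and try_step are hypothetical names made up for illustration, and pipeline_step only mimics how a pipeline forwards (X, y) positionally to a step's fit_transform; it is not scikit-learn's actual code.

```python
class ToyLabelBinarizer:
    """Hypothetical stand-in: fit_transform takes one argument besides self."""

    def fit_transform(self, y):
        classes = sorted(set(y))
        # One row per sample, one column per class (1 where the label matches)
        return [[int(label == c) for c in classes] for label in y]


class ToyLabelBinarizerFixed(ToyLabelBinarizer):
    """Same encoder, but the second argument is optional, as in MyLabelBinarizer."""

    def fit_transform(self, x, y=None):
        return super().fit_transform(x)


def pipeline_step(step, X, y=None):
    # Mimics a pipeline: X and y are always passed positionally, even when y is None
    return step.fit_transform(X, y)


def try_step(step, X):
    try:
        return pipeline_step(step, X)
    except TypeError as exc:
        return str(exc)


labels = ["INLAND", "NEAR BAY", "INLAND"]
print(try_step(ToyLabelBinarizer(), labels))       # message ends in "takes 2 positional arguments but 3 were given"
print(try_step(ToyLabelBinarizerFixed(), labels))  # [[1, 0], [0, 1], [1, 0]]
```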
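So the fix is to write your own MyLabelBinarizer around LabelBinarizer, whose methods take three parameters: self, X and y=None. Because y is optional, the pipeline's call with (X, y) no longer fails; then use MyLabelBinarizer() instead of LabelBinarizer() in cat_pipeline.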
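    from sklearn.base import TransformerMixin  # gives fit_transform method for free
    from sklearn.preprocessing import LabelBinarizer

    class MyLabelBinarizer(TransformerMixin):
        def __init__(self, *args, **kwargs):
            # Wrap a real LabelBinarizer and forward constructor arguments to it
            self.encoder = LabelBinarizer(*args, **kwargs)

        def fit(self, x, y=None):
            # y is accepted (and ignored) so a pipeline can call fit(X, y)
            self.encoder.fit(x)
            return self

        def transform(self, x, y=None):
            return self.encoder.transform(x)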
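A quick way to check the wrapper is to make it the last step of a small Pipeline, which, as in cat_pipeline, hands (X, y) to that step's fit_transform. This sketch assumes scikit-learn and NumPy are installed; the category values are made-up samples, and MyLabelBinarizer is repeated so the snippet runs on its own.

```python
import numpy as np
from sklearn.base import TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer


class MyLabelBinarizer(TransformerMixin):
    def __init__(self, *args, **kwargs):
        self.encoder = LabelBinarizer(*args, **kwargs)

    def fit(self, x, y=None):
        self.encoder.fit(x)
        return self

    def transform(self, x, y=None):
        return self.encoder.transform(x)


# Made-up sample values for the ocean_proximity column
categories = np.array(["<1H OCEAN", "INLAND", "<1H OCEAN", "NEAR BAY"])

pipe = Pipeline([("label_binarizer", MyLabelBinarizer())])
# Pipeline calls the last step's fit_transform(X, y) with y=None; the optional y accepts it
encoded = pipe.fit_transform(categories)
print(encoded)  # one-hot rows; columns follow the sorted classes <1H OCEAN, INLAND, NEAR BAY
```

Each row has a single 1 in the column of its category, which is the same output LabelBinarizer would produce when called directly.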