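Origin of the problem: during a pipeline transformation, the one_hot output suddenly had size 1 along its 2nd dimension (axis=1). I had checked it several times before and it had always been 2, never 1. So where was the problem? It turned out that whether the 2nd dimension is 2 or 1 depends on the input data.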
case1:
y = np.array([1,0,1,1]).reshape(-1,1)
json_pritimive = "sklearn.preprocessing.OneHotEncoder"
obj = AIPrimitive()
obj.load_from_json(json_pritimive)
obj.fit(X_fit=y)
obj.produce(X_produce=y)
sklearn.preprocessing.OneHotEncoder fit start
sklearn.preprocessing.OneHotEncoder fit over
sklearn.preprocessing.OneHotEncoder produce start
sklearn.preprocessing.OneHotEncoder produce over
array([[0., 1.],
       [1., 0.],
       [0., 1.],
       [0., 1.]])
case2:
y = np.array([1,1,1,1]).reshape(-1,1)
json_pritimive = "sklearn.preprocessing.OneHotEncoder"
obj = AIPrimitive()
obj.load_from_json(json_pritimive)
obj.fit(X_fit=y)
obj.produce(X_produce=y)
sklearn.preprocessing.OneHotEncoder fit start
sklearn.preprocessing.OneHotEncoder fit over
sklearn.preprocessing.OneHotEncoder produce start
sklearn.preprocessing.OneHotEncoder produce over
array([[1.],
       [1.],
       [1.],
       [1.]])
case3:
y = np.array([1,1,2,0]).reshape(-1,1)
json_pritimive = "sklearn.preprocessing.OneHotEncoder"
obj = AIPrimitive()
obj.load_from_json(json_pritimive)
obj.fit(X_fit=y)
obj.produce(X_produce=y)
sklearn.preprocessing.OneHotEncoder fit start
sklearn.preprocessing.OneHotEncoder fit over
sklearn.preprocessing.OneHotEncoder produce start
sklearn.preprocessing.OneHotEncoder produce over
array([[0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.]])
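These cases show that the size of axis=1 is determined by the number of categories (distinct values) in the data passed to fit, which is exactly what a one-hot vector is. My understanding had drifted from this: I had fixed in my mind that the 2nd dimension was always 2, rather than any other value.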
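To check this conclusion without the AIPrimitive wrapper, here is a minimal NumPy sketch. `one_hot` is a hypothetical helper, not part of sklearn. It assumes a single column of labels and mimics OneHotEncoder's default behaviour (`categories='auto'`), where the categories are the sorted unique values seen during fit. It also shows one way to avoid the pipeline surprise: declare the category list up front. In sklearn itself, the corresponding option is the `categories` parameter, e.g. `OneHotEncoder(categories=[[0, 1]])`.

```python
import numpy as np

def one_hot(values, categories=None):
    """Hypothetical helper mimicking OneHotEncoder's default on one column.

    If categories is None they are inferred as the sorted unique values,
    so the output width (axis=1) equals the number of distinct values seen.
    Passing an explicit list fixes the width regardless of the input.
    """
    values = np.asarray(values).ravel()
    if categories is None:
        categories = np.unique(values)  # sorted unique values, like categories='auto'
    categories = np.asarray(categories)
    # Map each category to its column index
    index = {c: i for i, c in enumerate(categories.tolist())}
    # An unseen value raises KeyError, loosely like handle_unknown='error'
    cols = np.array([index[v] for v in values.tolist()])
    out = np.zeros((values.size, categories.size))
    out[np.arange(values.size), cols] = 1.0
    return out

# The three cases above: width follows the number of distinct labels
for y in ([1, 0, 1, 1], [1, 1, 1, 1], [1, 1, 2, 0]):
    print(one_hot(y).shape)
# (4, 2), then (4, 1), then (4, 3)

# Declaring the categories keeps the width at 2 even when only label 1 appears
print(one_hot([1, 1, 1, 1], categories=[0, 1]).shape)  # (4, 2)
```

In a pipeline this means the one-hot width is only stable if every possible label either appears in the fit data or is declared explicitly.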