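We have the Iris dataset in iris.csv. The Iris dataset, also called the Iris flower dataset, is a classic multivariate dataset for classification experiments, collected and curated by Fisher (1936). It contains 150 samples divided into 3 classes of 50 samples each, and each sample has 4 attributes. From these 4 attributes (sepal length, sepal width, petal length, petal width), one can predict which of the three species (Setosa, Versicolour, Virginica) an iris flower belongs to.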
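The figures above can be checked with a short sketch. It loads scikit-learn's bundled copy of the data through `load_iris`. This assumes the copy matches the iris.csv file used here; note that scikit-learn spells the second class "versicolor". The sketch then prints the data shape, the attribute names, the class names, and the number of samples per class.

```python
import numpy as np
from sklearn.datasets import load_iris

# scikit-learn's bundled Iris data (assumed equivalent to iris.csv)
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)                                     # (150, 4): 150 samples, 4 attributes
print(iris.feature_names)                          # sepal/petal length and width, in cm
print([str(name) for name in iris.target_names])  # ['setosa', 'versicolor', 'virginica']
print(np.bincount(y).tolist())                     # [50, 50, 50]: 50 samples per class
```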
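Requirements:
- Train a logistic regression model on the Iris dataset, with 20% of the data as the test set and 80% as the training set.
- Standardize the data first, then train with polynomial features of degree 1 through 9; choose `solver` and `multi_class` yourself.
- For each polynomial degree from 1 to 9, print to the console the model's classification accuracy on the test set.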
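Two quantities are fixed by these requirements: the sizes of the 80/20 split and the number of features that each polynomial degree produces. The sketch below checks both. It uses `load_iris` as a stand-in for iris.csv. With the default `include_bias=True`, `PolynomialFeatures(degree=d)` on 4 inputs yields every monomial of total degree at most d, which is C(4 + d, d) columns. At degree 9 the model therefore sees 715 features while training on only 120 samples.

```python
from math import comb

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

X, y = load_iris(return_X_y=True)

# 80/20 split as required: 120 training samples, 30 test samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Number of columns PolynomialFeatures(degree=d) generates from 4 inputs
n_out = {}
for d in range(1, 10):
    n_out[d] = int(PolynomialFeatures(degree=d).fit(X_train).n_output_features_)
    # Monomials of total degree <= d in 4 variables (bias column included)
    assert n_out[d] == comb(4 + d, d)
print(n_out)  # {1: 5, 2: 15, 3: 35, 4: 70, 5: 126, 6: 210, 7: 330, 8: 495, 9: 715}
```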
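```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

if __name__ == "__main__":
    path = 'D://Ml_Lab_Data/iris.csv'  # path to the data file
    data = pd.read_csv(path, header=None)
    # Columns 0-3 are the four attributes, column 4 is the species name
    X = data.iloc[:, :4].values
    Y = data.iloc[:, 4].values
    # Encode the species names as integers 0, 1, 2
    le = LabelEncoder()
    Y = le.fit_transform(Y)
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
    # Standardize the features: fit on the training set only, then apply to both sets
    sc = StandardScaler()
    sc.fit(X_train)
    X_train_std = sc.transform(X_train)
    X_test_std = sc.transform(X_test)
    for i in range(1, 10):
        # Degree-i polynomial expansion followed by multinomial logistic regression.
        # multi_class is deprecated since scikit-learn 1.5; on newer versions drop it,
        # since multinomial is already the default for multiclass targets with 'sag'.
        model = make_pipeline(PolynomialFeatures(degree=i),
                              LogisticRegression(solver='sag', multi_class='multinomial', max_iter=10000))
        # Train on the standardized training set, score on the standardized test set
        model.fit(X_train_std, Y_train)
        acc = model.score(X_test_std, Y_test)
        print(f"degree={i}: test accuracy = {acc * 100:.2f}%")
```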
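A variant worth considering is to put the scaler inside the pipeline. `model.fit` then fits `StandardScaler` on the training data, and `model.score` reapplies it to the test data automatically. This makes it impossible to train on raw features and score on standardized ones by accident. The sketch below makes two assumptions. It uses `load_iris` instead of the CSV file, and it swaps the solver to `lbfgs`, which the assignment leaves open. It also omits `multi_class`, because for a 3-class target both `lbfgs` and `sag` already fit a multinomial model by default. No particular accuracy values are claimed here.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

accuracies = {}
for degree in range(1, 10):
    # The scaler is the first step: fitted on X_train in fit(), reused on X_test in score()
    model = make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=degree),
        LogisticRegression(solver="lbfgs", max_iter=10000),
    )
    model.fit(X_train, y_train)
    accuracies[degree] = model.score(X_test, y_test)  # fraction of correct predictions
    print(f"degree={degree}: test accuracy = {accuracies[degree] * 100:.2f}%")
```

The polynomial terms are not rescaled after expansion. If high degrees cause convergence trouble, one option is to add a second `StandardScaler()` step after `PolynomialFeatures`.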