Code that raises the error:
#---------------------------------------------------modify the parameter------------------------------------------------
range_m = np.logspace(2, 6, 5, base=2).astype(int)
best_m = 0
min_scores = 10000
scores_m = []
for m in range_m:
    kf = KFold(n_splits=5, shuffle=True)
    clf = RandomForestClassifier(n_estimators=1000, max_depth=m, random_state=4)
    scores = 0
    for train_index, test_index in kf.split(X_train):
        # print("Train:", train_index, "Validation:", test_index)
        clf.fit(X_train[train_index], Y_train[train_index])  # the KeyError is raised here
        # pred = clf.predict(X_train[test_index])
        # scores += log_loss(Y_train[test_index], pred) / 5
    # scores_m.append(scores)
    # if scores < min_scores:
    #     min_scores = scores
    #     best_m = m
#
# print(best_m, min_scores)  # print the best max_depth and its loss
# print(scores_m)  # print the loss for each max_depth value tried
Error message:
KeyError: "None of [Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,\n ...\n 826, 828, 829, 830, 831, 833, 834, 835, 836, 837],\n dtype='int64', length=670)] are in the [columns]"
Solution:
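The problem is in the indexing. On a DataFrame, X_train[train_index] selects columns by label, so pandas looks for columns named 0, 1, 2, ... and raises the KeyError because none exist. To select rows, use one of the DataFrame's two explicit indexers:
.iloc[index, :], where index is an integer position (or an array of positions)
.loc[:, 'name'], where 'name' is a column label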
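As a small illustration of the difference between the two indexers, the sketch below builds a toy DataFrame and reproduces the same KeyError on a small scale. The column names `a` and `b` and the values are made up for this example and do not come from the original data.

```python
import numpy as np
import pandas as pd

# Toy data; the column names "a" and "b" are made up for illustration.
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1.0, 2.0, 3.0, 4.0]})
rows = np.array([0, 2])  # positional row indices, like those KFold.split returns

# .iloc selects by integer position: rows 0 and 2, all columns.
by_position = df.iloc[rows, :]

# .loc selects by label: all rows, the column named "a".
by_label = df.loc[:, "a"]

# Plain df[rows] is interpreted as a list of column labels, so pandas
# looks for columns named 0 and 2 and raises KeyError.
try:
    df[rows]
    raised = False
except KeyError:
    raised = True

print(by_position)
print(by_label.tolist())
print(raised)
```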
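KFold.split returns positional indices, so .iloc is the one to use here:
clf.fit(X_train.iloc[train_index, :], Y_train.iloc[train_index, :])
If Y_train is a Series rather than a DataFrame, drop the second axis: Y_train.iloc[train_index]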
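For reference, here is a minimal sketch of the whole cross-validation loop with the .iloc fix applied. It rests on several assumptions that are not in the original: the data is synthetic (the real X_train and Y_train are not shown), Y_train is taken to be a Series, n_estimators is reduced to 50 so the example runs quickly, and log_loss is computed from predict_proba rather than from the hard labels returned by predict, since log loss is defined on probabilities.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold

# Synthetic stand-in for the real data (assumption: X_train is a DataFrame,
# Y_train is a Series of class labels).
rng = np.random.RandomState(0)
X_train = pd.DataFrame(rng.rand(200, 4), columns=["f0", "f1", "f2", "f3"])
Y_train = pd.Series((X_train["f0"] + X_train["f1"] > 1.0).astype(int))

range_m = np.logspace(2, 6, 5, base=2).astype(int)  # candidate max_depth values
n_splits = 5
kf = KFold(n_splits=n_splits, shuffle=True, random_state=4)

best_m, min_scores, scores_m = None, np.inf, []
for m in range_m:
    scores = 0.0
    for train_index, test_index in kf.split(X_train):
        # n_estimators is kept small here so the sketch runs quickly.
        clf = RandomForestClassifier(n_estimators=50, max_depth=int(m), random_state=4)
        # .iloc selects rows by position, which is what KFold.split returns.
        clf.fit(X_train.iloc[train_index], Y_train.iloc[train_index])
        proba = clf.predict_proba(X_train.iloc[test_index])
        scores += log_loss(Y_train.iloc[test_index], proba, labels=clf.classes_) / n_splits
    scores_m.append(scores)
    if scores < min_scores:
        min_scores, best_m = scores, int(m)

print(best_m, min_scores)  # best max_depth and its mean validation loss
print(scores_m)            # mean validation loss for each max_depth tried
```

An alternative with the same effect is to convert the data to NumPy arrays first (for example X_train.to_numpy()), after which the original X_train[train_index] syntax selects rows as intended.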