4-5 The hyperparameters of KNN: k, method, p

Contents

Hyperparameters and model parameters

Finding the best k

KNN with distance-weighted voting --- the hyperparameter weights = [uniform, distance]

Manhattan distance and Euclidean distance --- the hyperparameter p, which defines the distance formula; p=1 is Manhattan distance, p=2 is Euclidean distance

2. Grid search and more kNN hyperparameters

Grid search


Hyperparameters and model parameters

Hyperparameters: parameters that must be decided before the algorithm runs

Model parameters: parameters learned during the training process

KNN has no model parameters

k is a hyperparameter of KNN

Ways to find good hyperparameters: domain knowledge, empirical rules of thumb, and experimental search (pick the value that performs best in experiments)
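To make the hyperparameter/model-parameter distinction concrete, here is a minimal sketch (with made-up toy data) contrasting a parametric model, whose fit learns model parameters, with KNN, whose fit merely stores the training set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy data: y = 2x exactly.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

# A parametric model learns model parameters (coef_, intercept_) during fit.
lin = LinearRegression().fit(X, y)
print(lin.coef_, lin.intercept_)  # learned: coef_ = [2.], intercept_ ~ 0.0

# KNN learns no model parameters: fit just stores the data.
# n_neighbors (k) is a hyperparameter, chosen before fitting.
knn = KNeighborsClassifier(n_neighbors=1).fit(X, np.array([0, 1, 1]))
print(knn.get_params()['n_neighbors'])  # 1
```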


Finding the best k

Set a search range for the hyperparameter.

Note: if the best hyperparameter value falls on the boundary of the range, expand the boundary and search again.

from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

digits = datasets.load_digits()

X = digits.data
y = digits.target

print(X.shape)
print(y.shape)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)


best_score = 0.0
best_k = -1
for k in range(1, 11):
    KNN_classifier = KNeighborsClassifier(n_neighbors=k)
    KNN_classifier.fit(X_train, y_train)
    score = KNN_classifier.score(X_test, y_test)
    if score > best_score:
        best_score = score
        best_k = k
print(best_k)
print(best_score)
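The boundary rule above can be sketched as follows: if the winning k lands on the edge of the range, widen the range and search again (a minimal sketch using the same digits split as the code above):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=666)

def best_k_in(k_range):
    # Return (best_k, best_score) over the given candidate range of k.
    scores = {}
    for k in k_range:
        knn = KNeighborsClassifier(n_neighbors=k)
        knn.fit(X_train, y_train)
        scores[k] = knn.score(X_test, y_test)
    best = max(scores, key=scores.get)
    return best, scores[best]

best_k, best_score = best_k_in(range(1, 11))
# If the winner sits on the boundary, expand the range and search again.
if best_k == 10:
    best_k, best_score = best_k_in(range(10, 21))
print(best_k, best_score)
```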

 


KNN with distance as the voting weight

Weighting votes by distance also resolves the problem of tied votes.

best_score = 0.0
best_k = -1
best_method = " "
for method in ["uniform", "distance"]:
    for k in range(1, 11):
        KNN_classifier = KNeighborsClassifier(n_neighbors=k, weights=method)
        KNN_classifier.fit(X_train, y_train)
        score = KNN_classifier.score(X_test, y_test)
        if score > best_score:
            best_score = score
            best_k = k
            best_method = method
print(best_k)
print(best_score)
print(best_method)
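To see why distance weighting breaks ties, here is a minimal sketch with hypothetical neighbors (the distances and labels are made up for illustration): four nearest neighbors split 2-2 between two classes, which is a tie under uniform voting but not under 1/distance voting:

```python
from collections import Counter

# Hypothetical 4-NN query result: (distance_to_query, label) pairs.
neighbors = [(1.0, 'A'), (3.0, 'A'), (1.5, 'B'), (2.0, 'B')]

# weights='uniform': plain majority vote -> A and B tie at 2 votes each.
uniform_votes = Counter(label for _, label in neighbors)

# weights='distance': each neighbor votes with weight 1/distance,
# so closer neighbors count for more and the tie is broken.
weighted_votes = Counter()
for d, label in neighbors:
    weighted_votes[label] += 1.0 / d

print(uniform_votes)   # A and B both get 2 votes
print(weighted_votes)  # A: 1/1 + 1/3 ~ 1.33 beats B: 1/1.5 + 1/2 ~ 1.17
```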

Manhattan distance and Euclidean distance

(Referring to the figure in the original post:) the red, purple and yellow polylines are all Manhattan-distance paths, and the three are equal in length.

The green line is the Euclidean distance.

Minkowski distance: dist(x, y) = (sum_i |x_i - y_i|^p)^(1/p)

This introduces a new hyperparameter p (in the grid below, p is searched only together with weights='distance'; note, though, that p determines the metric used to find the nearest neighbors, so it matters under uniform weights as well)
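The formula above can be checked directly; a minimal sketch showing that p=1 reduces to Manhattan distance and p=2 to Euclidean distance:

```python
import numpy as np

def minkowski(x, y, p):
    # Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])

print(minkowski(x, y, 1))  # Manhattan: |3| + |4| = 7
print(minkowski(x, y, 2))  # Euclidean: sqrt(9 + 16) = 5
```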


2. Grid search and more kNN hyperparameters

Grid search

(1) Define the grid as a list of dicts; key: parameter name; value: all candidate values of that parameter

(2) Define the algorithm (the estimator)

(3) Call sklearn's grid search, passing in the estimator and the grid

sklearn naming convention: attributes that are not supplied by the user but are computed by the class from the user-supplied parameters get a trailing underscore in their names (e.g. best_score_)

from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

digits = datasets.load_digits()

X = digits.data
y = digits.target

print(X.shape)
print(y.shape)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# The hand-written parameter search above is replaced below by sklearn's GridSearchCV.


"""sklearn中的網格搜索"""
# 定義網格
para_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': [i for i in range(1, 11)]
    },
    {
        'weights': ['distance'],
        'n_neighbors': [i for i in range(1, 11)],
        'p': [i for i in range(1, 6)]
    }
]

# Define the estimator
knn_classifier = KNeighborsClassifier()

# Grid search
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(knn_classifier, para_grid, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)  # search all parameter combinations for the best model

print(grid_search.best_estimator_)
print(grid_search.best_score_)
print(grid_search.best_params_)


# Retrieve the best model
knn_classifier = grid_search.best_estimator_
print(knn_classifier.score(X_test, y_test))
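One detail worth making explicit: GridSearchCV selects parameters by cross-validation on the training set, not by scoring on X_test. A minimal sketch with a deliberately smaller grid and an explicit cv=3 (the grid values here are chosen just for illustration):

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=666)

# Smaller illustrative grid; cv=3 makes the 3-fold cross-validation explicit.
param_grid = [{'weights': ['distance'], 'n_neighbors': [3, 4, 5], 'p': [1, 2]}]
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
grid_search.fit(X_train, y_train)  # best params chosen on validation folds

print(grid_search.best_params_)
print(grid_search.score(X_test, y_test))  # scores with best_estimator_ (refit on all of X_train)
```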

 

 

 
