KNN Algorithm: A Collection of Notes

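Mathematical background:
Li Hang's Statistical Learning Methods (《統計學習方法》) describes the k-nearest neighbor (k-NN) algorithm and the k-NN model with its three elements: the distance metric, the choice of k, and the classification decision rule. It then covers the data structure used to implement the algorithm, the kd tree, and the kd-tree search algorithm built on it.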
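The three elements map directly onto a brute-force implementation. A minimal sketch, assuming a small in-memory dataset and a linear scan over every training point (no kd tree); `minkowski_distance` and `knn_predict` are illustrative names, not taken from the book:

```python
from collections import Counter

import numpy as np


def minkowski_distance(a, b, p=2):
    """L_p distance: p=2 is Euclidean, p=1 is Manhattan."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return np.sum(diff ** p) ** (1.0 / p)


def knn_predict(X_train, y_train, x, k=3, p=2):
    """Classify x by majority vote among its k nearest training points."""
    # Element 1, distance metric: L_p distance from x to every training point
    distances = [minkowski_distance(xi, x, p) for xi in X_train]
    # Element 2, value of k: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Element 3, decision rule: majority vote among the k neighbours' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: two groups of 2-D points
    X = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
    y = ["A", "A", "B", "B"]
    print(knn_predict(X, y, [0.9, 0.8], k=3))  # prints "A"
```

Every prediction scans all training points, which is what the kd tree is meant to avoid.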

Supplementary mathematical details:
https://www.cnblogs.com/eyeszjwang/articles/2429382.html
Explains the principle of the kd tree, with a worked example and pseudocode.
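A compact sketch of kd-tree construction (split on axis `depth % k` at the median point) and single-nearest-neighbour search. It is a recursive formulation of the descend-then-backtrack idea, assuming points are tuples of numbers; `KDNode`, `build_kdtree` and `nearest` are hypothetical names:

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    def __init__(self, point: Point, axis: int, left=None, right=None):
        self.point = point
        self.axis = axis    # splitting dimension at this node
        self.left = left    # points with coordinate <= point[axis]
        self.right = right  # points with coordinate >= point[axis]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Recursively split on axis depth % k at the median point."""
    if not points:
        return None
    k = len(points[0])
    axis = depth % k
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def nearest(root: Optional[KDNode], target: Point):
    """Return (closest_point, distance) using depth-first search with backtracking."""
    best = [None, math.inf]

    def search(node):
        if node is None:
            return
        d = math.dist(node.point, target)
        if d < best[1]:
            best[0], best[1] = node.point, d
        diff = target[node.axis] - node.point[node.axis]
        # Visit the side of the splitting plane that contains the target first
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        search(near)
        # Backtrack: the far side can only hold a closer point if the
        # hypersphere of radius best[1] crosses the splitting plane
        if abs(diff) < best[1]:
            search(far)

    search(root)
    return best[0], best[1]


if __name__ == "__main__":
    tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(nearest(tree, (3, 4.5)))
```

The pruning happens at `abs(diff) < best[1]`: when the ball around the target with the current best radius does not reach the splitting plane, the whole far subtree is skipped.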

Implementation in Python:
https://zhuanlan.zhihu.com/p/23191325
Introduces the KNeighborsClassifier class in scikit-learn (sklearn), its parameters, its main methods, and so on.
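A short tour of the main `KNeighborsClassifier` methods on a hypothetical four-point dataset: `fit` stores the training data, `predict` returns majority-vote labels, `predict_proba` returns vote fractions, `kneighbors` returns neighbour distances and indices, and `score` returns mean accuracy:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy training data: two well-separated groups in 2-D
X_train = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
y_train = ["A", "A", "B", "B"]

clf = KNeighborsClassifier(n_neighbors=3)  # all other parameters keep their defaults
clf.fit(X_train, y_train)

X_new = [[0.9, 0.8], [0.1, 0.0]]
print(clf.predict(X_new))            # predicted labels
print(clf.predict_proba(X_new))      # vote fractions; columns follow clf.classes_
dist, idx = clf.kneighbors(X_new)    # distances to / indices of the 3 nearest training points
print(dist, idx)
print(clf.score(X_new, ["A", "B"]))  # mean accuracy on the given samples
```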

A Python example:
Run in Jupyter Notebook, using the dataset from the KNN chapter of Machine Learning in Action (《機器學習實戰》).

KNN algorithm

Converting a black-and-white image (a 32x32 text file of '0'/'1' characters) into a one-dimensional array

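import numpy as np

def re_shape(filename):
    # Flatten a 32x32 text image of '0'/'1' characters into a 1024-element vector
    return_matrix = np.zeros((1, 1024))
    with open(filename) as inf:
        for i in range(32):
            row = inf.readline()
            for n in range(32):
                # pixel (i, n) goes to position 32*i + n (row-major order)
                return_matrix[0, 32*i + n] = int(row[n])
    return return_matrix[0]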
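The index `32*i+n` is simply row-major flattening. A possible NumPy variant that builds the 32x32 grid first and lets `ravel()` do the flattening, assuming the same file layout; `img2vector` is a hypothetical name:

```python
import numpy as np


def img2vector(filename, size=32):
    """Hypothetical NumPy variant of re_shape.

    Assumes the file holds `size` lines of `size` '0'/'1' characters each.
    """
    with open(filename) as f:
        lines = [f.readline().strip() for _ in range(size)]
    # size x size grid of 0.0/1.0 values, one list of characters per line
    grid = np.array([list(line[:size]) for line in lines], dtype=float)
    # ravel() flattens in row-major (C) order, so pixel (i, n) lands at
    # index size*i + n, the same slot re_shape fills with 32*i+n
    return grid.ravel()
```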

Getting the class labels (the digit before '_' in each file name)

from sklearn.neighbors import KNeighborsClassifier
import os
labels = []
file_forder = "E:\\DataMining\\Project\\MLBook\\機器學習實戰源代碼\\machinelearninginaction\\Ch02\\digits\\trainingDigits"
trainingFileList = os.listdir(file_forder)
#print(trainingFileList)
for name in trainingFileList:
    # the part of the file name before '_' is the digit, i.e. the class label
    labels.append(name.split("_")[0])

Getting the training data

X_train = []
for name in trainingFileList:
    filename = os.path.join(file_forder, name)
    X_train.append(re_shape(filename))  # one 1024-element vector per image
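The label loop and the training-data loop walk the same directory, and the test section repeats both. One way to fold them into a single helper, assuming the `<label>_<index>` naming used above; `load_digits_dir` is hypothetical, and sorting the file names is an added choice that makes the sample order reproducible:

```python
import os

import numpy as np


def load_digits_dir(folder, vectorize):
    """Hypothetical helper: load every file in `folder` into (X, y).

    `vectorize` maps a file path to a 1-D feature array (for example re_shape);
    the label is the file-name prefix before '_' (e.g. "0_13.txt" -> "0").
    """
    X, y = [], []
    for name in sorted(os.listdir(folder)):
        X.append(vectorize(os.path.join(folder, name)))
        y.append(name.split("_")[0])
    return np.array(X), y
```

With it, each section reduces to one call such as `X_train, labels = load_digits_dir(file_forder, re_shape)`, and the same call on the testDigits folder yields `X_test, testLabels`.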

Getting the test labels

testLabels = []
file_forder = "E:\\DataMining\\Project\\MLBook\\機器學習實戰源代碼\\machinelearninginaction\\Ch02\\digits\\testDigits"
testFileList = os.listdir(file_forder)
#print(testFileList)
for name in testFileList:
    testLabels.append(name.split("_")[0])

Getting the test data

X_test = []
for name in testFileList:
    filename = os.path.join(file_forder, name)
    X_test.append(re_shape(filename))

Training the classifier and predicting

# n_neighbors=1; the remaining arguments are scikit-learn's defaults, written out explicitly
clf = KNeighborsClassifier(n_neighbors=1, weights='uniform', algorithm='auto', p=2, metric='minkowski', metric_params=None)
clf.fit(X_train, labels)
Y_prected = clf.predict(X_test)
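`algorithm='auto'` lets scikit-learn choose between a brute-force scan, a kd tree and a ball tree, and `p=2` with `metric='minkowski'` is the Euclidean distance. A quick check on random stand-in data (an assumption, not the digits set) that the search structure affects speed, not which neighbours are found:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Random stand-in data (assumption): 200 training and 50 query points, 8 features, 10 classes
rng = np.random.default_rng(0)
X_tr = rng.random((200, 8))
y_tr = rng.integers(0, 10, size=200)
X_q = rng.random((50, 8))

results = {}
for algo in ("brute", "kd_tree", "ball_tree"):
    # p=2 with metric='minkowski' is the Euclidean distance, as in the call above
    model = KNeighborsClassifier(n_neighbors=1, algorithm=algo, p=2, metric="minkowski")
    model.fit(X_tr, y_tr)
    results[algo] = model.kneighbors(X_q, return_distance=False)

# Every search structure should return the same neighbour indices
print(all(np.array_equal(results["brute"], r) for r in results.values()))
```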

Evaluation

from sklearn.metrics import accuracy_score

# Fit and score one model per value of k
for k in (5, 1):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, labels)
    Y_predicted = clf.predict(X_test)
    score = accuracy_score(testLabels, Y_predicted)  # argument order: (y_true, y_pred)
    print("When k is {}, the accuracy score is {}".format(k, score))
When k is 5, the accuracy score is 0.9809725158562368
When k is 1, the accuracy score is 0.9862579281183932
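Picking k by comparing test-set accuracy uses the test set for model selection. A common alternative is k-fold cross-validation on the training data. A sketch using scikit-learn's bundled 8x8 digits as stand-in data (an assumption; for the 32x32 set, pass `X_train` and `labels` instead):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not the 32x32 files above
X, y = load_digits(return_X_y=True)

cv_scores = {}
for k in (1, 3, 5, 7):
    clf = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation; in practice run this on the training set only
    cv_scores[k] = cross_val_score(clf, X, y, cv=5).mean()
    print("k = {}: mean CV accuracy {:.4f}".format(k, cv_scores[k]))

best_k = max(cv_scores, key=cv_scores.get)
print("best k:", best_k)
```

The chosen k is then fitted once on the full training set and evaluated a single time on the test set.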