KNN Algorithm: A Collection of Notes

Original

baoFeng_Li

2020-05-30 21:25

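Mathematical background:
Li Hang's "Statistical Learning Methods" (《統計學習方法》) describes the k-nearest neighbor algorithm, the k-nearest neighbor model and its three elements (distance metric, choice of k, and classification decision rule). It then explains the data structure used to implement the algorithm, the kd-tree, and the algorithm for searching a kd-tree.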
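The three elements can be made concrete with a brute-force version that does not use a kd-tree. This is only a minimal sketch: the function name `knn_predict` and the toy points are made up for illustration and do not come from the book.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3, p=2):
    """Classify one sample x by majority vote among its k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    x = np.asarray(x, dtype=float)
    # Element 1, distance metric: Minkowski distance of order p (p=2 is Euclidean).
    dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
    # Element 2, choice of k: keep the indices of the k smallest distances.
    nearest_idx = np.argsort(dists, kind="stable")[:k]
    # Element 3, decision rule: majority vote over the neighbours' labels.
    votes = Counter(y_train[i] for i in nearest_idx)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = ["a", "a", "a", "b", "b", "b"]
pred_near_origin = knn_predict(X_toy, y_toy, [0.5, 0.5], k=3)
pred_far_corner = knn_predict(X_toy, y_toy, [5.5, 5.5], k=3)
```

Every prediction scans all training points, which is the cost the kd-tree is meant to reduce.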

Supplementary mathematical details:
https://www.cnblogs.com/eyeszjwang/articles/2429382.html
Explains the principle of the kd-tree, with a worked example and pseudocode.
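As a rough companion to that pseudocode, here is a small sketch of kd-tree construction and nearest-neighbour search. It assumes median splits that cycle through the axes and Euclidean distance. The dict-based node layout, the function names, and the six sample points are illustrative choices, not code taken from the linked article.

```python
import numpy as np


def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested dicts; returns None for no points."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]  # cycle through the dimensions
    points = points[points[:, axis].argsort(kind="stable")]
    mid = len(points) // 2  # the median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return (point, distance) of the nearest neighbour of target."""
    if node is None:
        return best
    dist = np.linalg.norm(node["point"] - target)
    if best is None or dist < best[1]:
        best = (node["point"], dist)
    diff = target[node["axis"]] - node["point"][node["axis"]]
    # Descend first into the side of the splitting plane that contains the target.
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the other side only if the splitting plane is closer than the best so far.
    if abs(diff) < best[1]:
        best = nearest(far, target, best)
    return best


# Illustrative 2-D sample points.
sample_points = np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]], dtype=float)
tree = build_kdtree(sample_points)
found_point, found_dist = nearest(tree, np.array([2.0, 4.5]))
```

The pruning test in `nearest` is what saves work: a subtree is skipped whenever the splitting plane is farther away than the best distance found so far.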

Implementation in Python:
https://zhuanlan.zhihu.com/p/23191325
Introduces the KNeighborsClassifier class provided by scikit-learn (sklearn), its parameters, its main methods, and so on.
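A quick way to get a feel for the class is to run it on a handful of points. The sketch below uses made-up toy data (not the digits dataset) and only a few of the constructor parameters, with `algorithm="kd_tree"` chosen here just to tie in with the kd-tree notes above.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated clusters.
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = ["a", "a", "a", "b", "b", "b"]

# n_neighbors is k; weights="uniform" gives each neighbour one vote;
# p=2 with the default Minkowski metric means Euclidean distance.
clf_toy = KNeighborsClassifier(n_neighbors=3, weights="uniform", algorithm="kd_tree", p=2)
clf_toy.fit(X_toy, y_toy)

toy_pred = clf_toy.predict([[0.5, 0.5], [5.5, 5.5]])   # predicted labels
toy_dist, toy_idx = clf_toy.kneighbors([[0.5, 0.5]])   # distances and indices of the 3 neighbours
toy_acc = clf_toy.score(X_toy, y_toy)                  # mean accuracy on the given data
```

`fit`, `predict`, `kneighbors` and `score` are the methods that come up most often; `predict_proba` is also available when class probabilities are wanted.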

A Python example:
Run in a Jupyter notebook, using the dataset from the KNN chapter of "Machine Learning in Action" (《機器學習實戰》).

KNN algorithm

Convert a (black-and-white) image to a one-dimensional array

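import numpy as np

def re_shape(filename):
    # Each file stores a 32x32 image as 32 lines of 32 '0'/'1' characters.
    return_matrix = np.zeros((1,1024))
    with open(filename) as inf:
        for i in range(32):
            row = inf.readline()
            for n in range(32):
                # Row-major flattening: pixel (i, n) goes to index 32*i + n.
                return_matrix[0,32*i+n] = int(row[n])
    return return_matrix[0]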
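The same flattening can be tried without any files. The sketch below is a hypothetical variant (`text_image_to_vector` is not part of the original notebook) that takes the text rows directly and works for any rectangular image, shown here on a made-up 4x4 image.

```python
import numpy as np


def text_image_to_vector(lines):
    """Flatten rows of '0'/'1' characters into a 1-D float array (row-major)."""
    rows = [[int(ch) for ch in line.strip()] for line in lines if line.strip()]
    return np.array(rows, dtype=float).ravel()


# A made-up 4x4 "image"; the real files are 32x32.
tiny_image = ["0110", "1001", "1001", "0110"]
tiny_vector = text_image_to_vector(tiny_image)
```

For a 32x32 file this yields a vector of length 1024, the same layout the function above produces.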

Get the class labels (from the file names)

from sklearn.neighbors import KNeighborsClassifier
import os
labels = []
file_forder = "E:\\DataMining\\Project\\MLBook\\機器學習實戰源代碼\\machinelearninginaction\\Ch02\\digits\\trainingDigits"
trainingFileList = os.listdir(file_forder)
#print(trainingFileList)
# The class label is the part of the file name before the first underscore.
for name in trainingFileList:
    labels.append(name.split("_")[0])

Get the training data

X_train = []
for name in trainingFileList:
    filename = os.path.join(file_forder,name)
    row = re_shape(filename)
    X_train.append(list(row))
#print(X_train[:5])

Get the test data labels

testLabels = []
file_forder = "E:\\DataMining\\Project\\MLBook\\機器學習實戰源代碼\\machinelearninginaction\\Ch02\\digits\\testDigits"
testFileList = os.listdir(file_forder)
#print(testFileList)
for name in testFileList:
    testLabels.append(name.split("_")[0])

Get the test data

X_test = []
for name in testFileList:
    filename = os.path.join(file_forder,name)
    row = re_shape(filename)
    X_test.append(list(row))

# Fit a k-nearest-neighbor classifier (here k=1, Euclidean distance) and predict the test labels.
clf = KNeighborsClassifier(n_neighbors=1, weights='uniform', algorithm='auto', p=2, metric='minkowski', metric_params=None)
clf.fit(X_train,labels)
Y_prected = clf.predict(X_test)

Evaluation

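from sklearn.metrics import accuracy_score
# accuracy_score expects (y_true, y_pred).
# Note: this message refers to a run with n_neighbors=5; the classifier shown above uses n_neighbors=1.
score = accuracy_score(testLabels,Y_prected)
print("When k is 5,the accuracy score is {}".format(score))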

When k is 5,the accuracy score is 0.9809725158562368

# Same evaluation for a run with n_neighbors=1.
score = accuracy_score(testLabels,Y_prected)
print("When k is 1,the accuracy score is {}".format(score))

When k is 1,the accuracy score is 0.9862579281183932
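The two scores presumably come from fitting the classifier twice with different values of n_neighbors. A minimal sketch of doing that in one loop follows. The helper name `accuracy_by_k` is hypothetical, and the toy data only stands in for the digit vectors so the snippet runs on its own.

```python
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier


def accuracy_by_k(X_train, y_train, X_test, y_test, ks=(1, 5)):
    """Refit a KNeighborsClassifier for each k and return {k: test accuracy}."""
    results = {}
    for k in ks:
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train, y_train)
        results[k] = accuracy_score(y_test, clf.predict(X_test))
    return results


# Toy data standing in for the digit vectors (illustrative only).
X_demo = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_demo = ["0", "0", "0", "1", "1", "1"]
demo_scores = accuracy_by_k(X_demo, y_demo, [[0.2, 0.2], [5.8, 5.8]], ["0", "1"], ks=(1, 3))
```

With the variables built in this post, the call would be `accuracy_by_k(X_train, labels, X_test, testLabels)`.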
