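Characteristics

Advantages: high accuracy, insensitive to outliers, no assumptions about the input data
Disadvantages: high computational complexity, high space complexity
Applicable data types: numeric and nominal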

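Notes

k-NN has no explicit learning process and is a typical example of lazy learning: once it has the data set there is no "training phase", and the samples in the data set are not processed or learned from in advance. All computation is deferred until a test sample arrives.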
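
To make the "lazy" idea concrete, here is a minimal sketch. The class name `LazyKNN`, its methods, and the toy data are assumptions made for illustration and do not come from the original post. The point is that `fit` only memorizes the data, while every distance computation happens inside `predict`.

```python
from collections import Counter

import numpy as np


class LazyKNN:
    """Hypothetical illustration of a lazy learner (names are assumptions)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data set; no parameters are learned.
        self.X = np.asarray(X, dtype=float)
        self.y = list(y)
        return self

    def predict(self, t):
        # All the real work is done at query time.
        diffs = self.X - np.asarray(t, dtype=float)
        dists = np.sqrt((diffs ** 2).sum(axis=1))
        nearest = np.argsort(dists)[: self.k]
        votes = Counter(self.y[i] for i in nearest)
        return votes.most_common(1)[0][0]


# Toy data, chosen only for illustration.
model = LazyKNN(k=3).fit(
    [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]],
    ["A", "A", "B", "B"],
)
print(model.predict([0.0, 0.2]))  # B
```

This also shows where the cost goes: storing the whole training set explains the space cost, and scanning it on every query explains the time cost.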

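Principle

Suppose we have a data set made up of many samples, each consisting of a feature vector (its feature values) and a label:

    Sample data set (training set): {features: x, labels: y}

The dimension of the sample space is therefore the number of features, and each training sample has a corresponding coordinate (vector) in that space together with its label.
Given a test sample t that contains only feature information, we compute the distance between t and every training sample using some distance measure (the code below uses Euclidean distance), sort the distances from nearest to farthest, and pick the k nearest "neighbors". Those k neighbors vote with their labels, and the label with the most votes is assigned to t.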

import operator
from numpy import tile

def classify0(inX, dataSet, labels, k):  # the data set in this program is the pair {dataSet, labels}
    dataSetSize = dataSet.shape[0]  # m = number of rows of dataSet = number of training samples [1]
    # Tile the test sample inX into an m x n matrix matching dataSet, then subtract all training samples at once
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat**2  # Euclidean distance: square each difference
    sqDistances = sqDiffMat.sum(axis=1)  # Euclidean distance: sum along axis 1 (per row)
    distances = sqDistances**0.5  # square root of the sum
    sortedDistIndicies = distances.argsort()  # indices in ascending order of distance, to find the k nearest neighbors
    classCount = {}  # dict for tallying the votes of the k neighbors
    for i in range(k):  # voting
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1  # [2]
    # Python 3: dict.items() replaces Python 2's dict.iteritems()
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)  # [3]
    return sortedClassCount[0][0]
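
As a side note on the distance step, the sketch below compares the explicit `tile` approach with NumPy broadcasting, which gives the same differences without building the repeated matrix by hand. The variable names and the toy values are assumptions for illustration only.

```python
import numpy as np

# Toy data (illustrative values, not from the original post).
data_set = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
in_x = np.array([0.0, 0.2])

# Explicit tiling, as in classify0: repeat in_x once per training row.
diff_tiled = np.tile(in_x, (data_set.shape[0], 1)) - data_set

# Broadcasting: NumPy stretches in_x across the rows automatically.
diff_broadcast = in_x - data_set

same = np.array_equal(diff_tiled, diff_broadcast)

# Euclidean distance from in_x to every training sample.
distances = np.sqrt((diff_broadcast ** 2).sum(axis=1))

# Indices of the training samples, nearest first.
order = distances.argsort()
print(same, order)
```

With these values the nearest sample is row 3 and the farthest is row 0, so the first k entries of `order` are the neighbors that get to vote.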

Reference

[1] numpy.ndarray.shape and tile both come from the numpy package
[2] The get method of dict:
get(key[, default])

Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
If key is not in the dictionary, the supplied default value is returned.
classCount[voteIlabel] = classCount.get(voteIlabel,0) + 1
In this example, if voteIlabel is not yet a key in classCount, get returns the given default 0.
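
A short sketch of this counting idiom on its own, using an illustrative list of votes (the names `votes` and `class_count` are assumptions, not from the original code):

```python
votes = ["A", "B", "A", "A"]  # illustrative neighbor labels

class_count = {}
for label in votes:
    # get() returns 0 the first time a label is seen, so no KeyError is raised.
    class_count[label] = class_count.get(label, 0) + 1

print(class_count)  # {'A': 3, 'B': 1}
```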

[3] [python-sorted]
sorted(iterable, *, key=None, reverse=False)

Return a new sorted list from the items in iterable.
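The classCount dictionary is decomposed into a list of (label, count) tuples, which is sorted by the second element of each tuple using operator.itemgetter, in reverse order (largest first); the label that occurs most often is then returned.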
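
The sorting step can be tried in isolation. This is a minimal sketch with made-up vote counts; it also shows `max` with `key=dict.get` as one possible shortcut when only the winning label is needed.

```python
import operator

class_count = {"A": 1, "B": 3, "C": 2}  # illustrative vote counts

# Sort the (label, count) pairs by count, largest first.
sorted_class_count = sorted(
    class_count.items(), key=operator.itemgetter(1), reverse=True
)
print(sorted_class_count)  # [('B', 3), ('C', 2), ('A', 1)]

# If only the top label is needed, max() avoids sorting the whole list.
winner = max(class_count, key=class_count.get)
print(winner)  # B
```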

Reference books: Machine Learning in Action; the "watermelon book" (Zhou Zhihua, Machine Learning)
