Notes_on_MLIA_kNN


2018-08-27 03:39

# k-nearest neighbor algorithm
# function classify0
# arguments:
#	inX: the new observation to be labeled by the algorithm
#	dataSet: training samples, one row per sample
#	labels: labels of the training samples
#	k: number of nearest neighbors that vote
import operator
from numpy import tile

def classify0(inX, dataSet, labels, k):
	dataSetSize = dataSet.shape[0]
	# Euclidean distance from inX to every training row
	diffMat = tile(inX, (dataSetSize, 1)) - dataSet
	sqDiffMat = diffMat**2
	sqDistances = sqDiffMat.sum(axis=1)
	distances = sqDistances**0.5
	# indices of the training rows, nearest first
	sortedDistIndicies = distances.argsort()
	classCount = {}
	for i in range(k):
		voteIlabel = labels[sortedDistIndicies[i]]
		classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
	# items() replaces iteritems(), which exists only in Python 2
	sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
	return sortedClassCount[0][0]
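
A quick way to check classify0 is to run it on a tiny hand-made data set. The sketch below is a Python 3 restatement that uses items() and NumPy broadcasting in place of tile. The four sample points and the labels 'A' and 'B' are made up for illustration.

```python
import operator
import numpy as np

def classify0(inX, dataSet, labels, k):
    # Euclidean distance from inX to every row of dataSet (broadcasting)
    distances = np.sqrt(((dataSet - inX) ** 2).sum(axis=1))
    sortedDistIndicies = distances.argsort()
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# Hypothetical toy data: two points per class
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

near_origin = classify0(np.array([0.0, 0.0]), group, labels, 3)
near_ones = classify0(np.array([1.0, 1.0]), group, labels, 3)
print(near_origin, near_ones)
```

With k=3, each query point gets two votes from its own cluster and one from the other, so the majority label wins.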

.shape gives the length of each dimension of an array. Indexing starts at 0 in Python, so dataSet.shape[0] is the number of rows.
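
As a small illustration of .shape (the array below is made up and not part of the original listing):

```python
import numpy as np

a = np.zeros((4, 2))   # 4 rows, 2 columns
n_rows = a.shape[0]    # length of the first dimension
n_cols = a.shape[1]    # length of the second dimension
print(a.shape, n_rows, n_cols)
```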

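tile is a function in the numpy package that repeats an array. In the code above, tile(inX, (dataSetSize, 1)) repeats inX dataSetSize times along the rows and once along the columns (that is, no repetition across columns).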
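
The effect of the repetition tuple can be seen on a short vector. This is only a sketch. The values in inX are arbitrary.

```python
import numpy as np

inX = np.array([0.5, 2.0])
tiled = np.tile(inX, (3, 1))   # 3 copies stacked as rows, 1 copy across columns
wide = np.tile(inX, (1, 2))    # a single row with inX repeated twice
print(tiled.shape, wide.shape)
```

tiled has one copy of inX per row, which is what lets classify0 subtract the whole training matrix in one step.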

.sum is the numpy method that sums the entries of an array along an axis. axis=1 sums across the columns, i.e. it adds up the elements of each row.
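
A minimal check of the two axis values, using a made-up 2x3 matrix:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
row_sums = m.sum(axis=1)   # one value per row: [6, 15]
col_sums = m.sum(axis=0)   # one value per column: [5, 7, 9]
print(row_sums, col_sums)
```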

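.argsort is the numpy method that returns the indices that would sort an array in ascending order. It does not return the sorted values themselves.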
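
The difference between indices and sorted values is easy to see on three made-up distances:

```python
import numpy as np

distances = np.array([0.9, 0.1, 0.5])
order = distances.argsort()        # indices, smallest distance first
sorted_values = distances[order]   # index with the result to get the values
print(order, sorted_values)
```

In classify0 these indices are used to look up labels, which is why the indices are needed rather than the sorted distances.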

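classCount = {} creates an empty dictionary. The dictionary method get(key, default) returns the stored value, or default if the key is missing, so the assignment to classCount[voteIlabel] works as a counter.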
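
The counting idiom in isolation, with a made-up list of votes:

```python
votes = ['A', 'B', 'A']
classCount = {}
for label in votes:
    # get() returns 0 the first time a label is seen
    classCount[label] = classCount.get(label, 0) + 1
print(classCount)
```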

sorted is a built-in function. Run help(sorted) to see how it is used.

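itemgetter from the operator module does what its name suggests: itemgetter(n) returns a callable that extracts the item at index n. Here itemgetter(1) picks the count out of each (label, count) pair.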

This code shows the classic way to sort a dictionary by value. A lambda function can be used as the key instead. For more on sorting in Python, see: https://wiki.python.org/moin/HowTo/Sorting/
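
The two key styles side by side, as a Python 3 sketch (the counts are made up):

```python
import operator

classCount = {'A': 2, 'B': 5, 'C': 1}

# Sort the (label, count) pairs by count, largest first
by_itemgetter = sorted(classCount.items(),
                       key=operator.itemgetter(1), reverse=True)
by_lambda = sorted(classCount.items(),
                   key=lambda pair: pair[1], reverse=True)
print(by_itemgetter)
print(by_lambda)
```

Both calls give the same ordering, so the choice between them is mostly a matter of taste.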

2.2 The function that reads the txt file has a small bug

from numpy import zeros

def file2matrix(filename):
	# read every line, then close the file
	with open(filename) as fr:
		arrayOLines = fr.readlines()
	numberOfLines = len(arrayOLines)
	returnMat = zeros((numberOfLines, 3))
	classLabelVector = []
	index = 0
	for line in arrayOLines:
		line = line.strip()
		listFromLine = line.split('\t')
		# first three columns are the features
		returnMat[index,:] = listFromLine[0:3]
		# last column is the label
		classLabelVector.append(int(listFromLine[-1]))
		index += 1
	return returnMat, classLabelVector

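The function calls line.strip() with no argument, which removes all leading and trailing whitespace, '\t' included. If a line begins or ends with an empty field, that tab is lost and the later split on tab returns the wrong number of fields. Change the call to line.strip('\n'). The for loop in the original listing was also missing its colon.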
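
A short check of what strip() removes. The input line is hypothetical, with an empty first and last field so that the difference shows up.

```python
# Hypothetical tab-separated line whose first and last fields are empty
line = '\t40920\t8.3\t\n'

stripped_all = line.strip().split('\t')      # leading/trailing tabs are lost
stripped_nl = line.strip('\n').split('\t')   # only the newline is removed
print(stripped_all)
print(stripped_nl)
```

Tabs between non-empty fields survive either call. Only whitespace at the two ends of the line is affected.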

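There is another bug in building the label vector: the label should not be wrapped in int(), which raises ValueError when the last column holds a text label rather than a number.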
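
A sketch of file2matrix with both changes applied, assuming the last column holds text labels. The sample rows and the labels 'likes' and 'dislikes' are invented for the demonstration, and the temporary file only stands in for a real data file.

```python
import os
import tempfile
import numpy as np

def file2matrix(filename):
    with open(filename) as fr:
        arrayOLines = fr.readlines()
    returnMat = np.zeros((len(arrayOLines), 3))
    classLabelVector = []
    for index, line in enumerate(arrayOLines):
        # strip only the newline so that tabs are left alone
        listFromLine = line.strip('\n').split('\t')
        returnMat[index, :] = [float(x) for x in listFromLine[0:3]]
        classLabelVector.append(listFromLine[-1])   # keep the label as text
    return returnMat, classLabelVector

# Hypothetical two-line sample file with text labels
sample = '40920\t8.3\t0.95\tlikes\n14488\t7.1\t1.67\tdislikes\n'
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

mat, labels = file2matrix(path)
os.remove(path)
print(mat.shape, labels)
```

If the labels in a given file are integers after all, they can be converted after loading instead of inside the reader.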
