KNeighborsClassifier classification model

class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)

KNeighborsRegressor regression model

class sklearn.neighbors.KNeighborsRegressor(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)

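Parameters	n_neighbors	int, optional; the value of k
	weights	str or callable, optional. uniform: all neighbors are weighted equally; distance: weights are the inverse of the distance, so closer neighbors have more influence on the decision; [callable]: a user-defined function that takes an array of distances and returns an array of weights of the same shape
	algorithm	optional. ball_tree: use a ball tree; kd_tree: use a kd-tree; brute: brute-force search; auto: choose the most appropriate method based on the values passed to fit. Note: when the input data is sparse, this setting is ignored and brute is always used
	leaf_size	int, optional, default 30; the leaf size of the kd-tree and ball tree, which affects the speed of tree construction and querying as well as the memory needed to store the tree
	p	int, optional, default 2; the power parameter of the Minkowski distance: p=2 is the Euclidean distance, p=1 is the Manhattan distance
	metric	str or callable, default 'minkowski'; the distance metric used for the tree
	metric_params	additional keyword arguments for the metric function, default None
	n_jobs	number of parallel jobs for the neighbor search; default None, meaning 1; -1 uses all CPUs
Attributes	classes_	array of shape (n_classes,), the class labels; not available on the regression model
	effective_metric_	str or callable; the distance metric actually used, corresponding to metric
	effective_metric_params_	dict; keyword arguments for the metric function
	outputs_2d_	bool; False when y has shape (n_samples,) or (n_samples, 1) during fit, True otherwise; not available on the regression model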
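
To make the three `weights` options concrete, here is a minimal sketch on a tiny dataset. The data and the `inverse_square` helper are assumptions made up for this illustration; only `KNeighborsClassifier` and its `weights` parameter come from scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one class-1 point close to the query,
# two class-0 points far away from it.
X = [[0.0], [2.0], [2.2]]
y = [1, 0, 0]
query = [[0.1]]


def inverse_square(distances):
    """Hypothetical custom weighting: takes an array of distances and
    returns an array of weights with the same shape."""
    return 1.0 / (np.asarray(distances) ** 2 + 1e-12)


predictions = {}
for name, weights in [("uniform", "uniform"),
                      ("distance", "distance"),
                      ("callable", inverse_square)]:
    clf = KNeighborsClassifier(n_neighbors=3, weights=weights).fit(X, y)
    predictions[name] = int(clf.predict(query)[0])

print(predictions)
# {'uniform': 0, 'distance': 1, 'callable': 1}
```

With uniform weights the two distant class-0 points outvote the single nearby class-1 point; with either distance-based weighting the nearby point dominates.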

For guidance on choosing algorithm and leaf_size, see the Nearest Neighbors section of the scikit-learn user guide.
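
The `algorithm` setting changes how neighbors are searched, not which neighbors are found. A small sketch, assuming random dense data generated just for this check, compares the neighbor distances returned by the three explicit algorithms:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical random data, used only to compare the search strategies.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = (X[:, 0] > 0.5).astype(int)
queries = rng.rand(5, 3)

distances_by_algorithm = {}
for algorithm in ["brute", "kd_tree", "ball_tree"]:
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm,
                               leaf_size=30).fit(X, y)
    distances, _ = clf.kneighbors(queries)
    distances_by_algorithm[algorithm] = distances

# All three strategies should agree on the neighbor distances.
same_as_brute = {
    name: bool(np.allclose(dist, distances_by_algorithm["brute"]))
    for name, dist in distances_by_algorithm.items()
}
print(same_as_brute)
```

The practical difference is in speed and memory, which is where `leaf_size` comes into play for the tree-based options.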

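Note: in the k-nearest-neighbors algorithm, if the k-th and (k+1)-th nearest neighbors are at the same distance from the target x but have different labels, the result depends on the order of the training set.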
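
One way to notice such a tie is to ask for one extra neighbor and compare the last two distances. The sketch below is only an illustration; `has_boundary_tie` is a hypothetical helper, not a scikit-learn function, and it checks distances only, not labels.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
k = 3
clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)


def has_boundary_tie(model, query, k):
    """Hypothetical helper: True if the k-th and (k+1)-th neighbors
    of the query are at the same distance."""
    # kneighbors returns distances sorted in ascending order,
    # so columns k-1 and k hold the k-th and (k+1)-th neighbors.
    distances, _ = model.kneighbors(query, n_neighbors=k + 1)
    return bool(np.isclose(distances[0, k - 1], distances[0, k]))


tie_at_1_5 = has_boundary_tie(clf, [[1.5]], k)  # samples 0 and 3 are both 1.5 away
tie_at_1_1 = has_boundary_tie(clf, [[1.1]], k)  # distances 0.1, 0.9, 1.1, 1.9
print(tie_at_1_5, tie_at_1_1)
# True False
```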

Usage examples

Creating a model

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)  # k = 3
neigh.fit(X, y)
print(neigh.predict([[1.5]]))
# Output
[0]

print(neigh.classes_)
print(neigh.effective_metric_)
print(neigh.outputs_2d_)
Output:
[0 1]
euclidean
False

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import KNeighborsRegressor
neigh = KNeighborsRegressor(n_neighbors=2)  # k = 2
neigh.fit(X, y)
print(neigh.predict([[1.5]]))
# Output
[0.5]

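Both models above use the defaults for every other parameter:
the algorithm is auto, which chooses automatically between brute force, a kd-tree, and a ball tree
the leaf size is 30
the distance metric is the Euclidean distance
the parallelism is 1
every neighbor has the same weight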
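
The effect of `p` on the distance metric can be checked directly with `kneighbors`. This is a small sketch on made-up 2-D points: the query (3, 4) is 5 away from the origin in Euclidean distance and 7 away in Manhattan distance.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D data chosen so the distances are easy to verify by hand.
X = [[0, 0], [10, 10]]
y = [0, 1]
query = [[3, 4]]

nearest_distance = {}
for p in (1, 2):
    clf = KNeighborsClassifier(n_neighbors=1, p=p).fit(X, y)
    distances, indices = clf.kneighbors(query)
    nearest_distance[p] = float(distances[0, 0])

print(nearest_distance)
# {1: 7.0, 2: 5.0}
```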

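Model methods

`fit`(self, X, y)	Fit the model using X as training data and y as target values
`get_params`(self[, deep])	Get the model's parameters
`kneighbors`(self[, X, n_neighbors, …])	Find the k nearest neighbors of a point
`kneighbors_graph`(self[, X, n_neighbors, mode])	Compute the weighted graph of k-neighbors for the points in X
`predict`(self, X)	Predict the class of X (the target value for the regression model)
`predict_proba`(self, X)	Return probability estimates for X; not available on the regression model
`score`(self, X, y[, sample_weight])	Return the mean accuracy on the given test set (the R² coefficient of determination for the regression model)
`set_params`(self, **params)	Set the model's parameters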

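# The examples below use the KNeighborsClassifier (n_neighbors=3) fitted above, not the regressor
neigh.get_params()
Output: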
{'algorithm': 'auto',
 'leaf_size': 30,
 'metric': 'minkowski',
 'metric_params': None,
 'n_jobs': None,
 'n_neighbors': 3,
 'p': 2,
 'weights': 'uniform'}
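
`set_params` is the counterpart of `get_params`. As a minimal sketch, assuming the same four-point training set used in this article, parameters can be changed on an existing estimator before it is fitted:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3)
# Change two parameters in place; set_params returns the estimator itself.
clf.set_params(n_neighbors=1, weights="distance")
params = clf.get_params()

clf.fit(X, y)
prediction = clf.predict([[1.9]])  # the single nearest sample is 2, labelled 1
print(params["n_neighbors"], params["weights"], prediction)
```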

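print(neigh.kneighbors([[1.1]]))  # return the 3 nearest neighbors of the point 1.1 and their distances
Output:
(array([[0.1, 0.9, 1.1]]), array([[1, 2, 0]]))
# The first array holds the distances, the second the corresponding indices into the training set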

A = neigh.kneighbors_graph(X,mode='connectivity')
A.toarray()
Output:
array([[1., 1., 1., 0.],
       [1., 1., 1., 0.],
       [0., 1., 1., 1.],
       [0., 1., 1., 1.]])
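# Each row gives a sample's connectivity to its k nearest neighbors; the first row, [1., 1., 1., 0.], means the first sample is connected (value 1) to the 1st, 2nd, and 3rd samples, itself included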

A = neigh.kneighbors_graph(X,mode='distance')
A.toarray()
Output:
array([[0., 1., 2., 0.],
       [1., 0., 1., 0.],
       [0., 1., 0., 1.],
       [0., 2., 1., 0.]])
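# Each entry is the distance from a sample to one of its k nearest neighbors (0 also marks samples that are not neighbors)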
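
The graph is returned as a SciPy sparse matrix, which is why the examples call `toarray()` before printing. A short sketch, reusing the four-point training set, checks this and confirms that every row of the connectivity graph has exactly k ones:

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3).fit(X, y)

A = neigh.kneighbors_graph(X, mode="connectivity")
is_sparse = sparse.issparse(A)

# Each sample is connected to exactly k = 3 samples, so every row sums to 3.
row_sums = np.asarray(A.sum(axis=1)).ravel()
print(is_sparse, row_sums)
```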

neigh.predict([[1.1]])
# Output
array([0])

neigh.predict_proba([[1.1]])
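# Output
array([[0.66666667, 0.33333333]])
The 3 nearest neighbors of 1.1 are samples 0, 1, and 2; two of them have label 0 and one has label 1, so the probability of class 0 is 0.66666667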
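
The same probabilities can be reproduced by hand from the `kneighbors` output. This is a sketch of the uniform-weight vote only, and it relies on the labels here being the integers 0 and 1 so that `np.bincount` can count them:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = np.array([0, 0, 1, 1])
neigh = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Indices of the 3 nearest training samples for the query 1.1.
_, indices = neigh.kneighbors([[1.1]])
neighbor_labels = y[indices[0]]

# Count the votes per class and normalise them into probabilities.
votes = np.bincount(neighbor_labels, minlength=len(neigh.classes_))
manual_proba = votes / votes.sum()

print(votes, manual_proba)
```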

print(neigh.score([[1.1]],[1]))
print(neigh.score([[1.1]],[0]))
# Output
0.0
1.0
score computes the model's score on the given test set (mean accuracy for the classifier)
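
What `score` measures depends on the estimator: mean accuracy for the classifier and the R² coefficient of determination for the regressor. The sketch below compares `score` with the corresponding functions in `sklearn.metrics`; the three test points are made up for this illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, r2_score
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

# Hypothetical test set, chosen only for this comparison.
X_test = [[0.2], [1.1], [2.8]]
y_test = [0, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
reg = KNeighborsRegressor(n_neighbors=2).fit(X, y)

clf_score = clf.score(X_test, y_test)
clf_accuracy = accuracy_score(y_test, clf.predict(X_test))

reg_score = reg.score(X_test, y_test)
reg_r2 = r2_score(y_test, reg.predict(X_test))

print(clf_score, clf_accuracy)
print(reg_score, reg_r2)
```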

Using the KNN classification model on the iris dataset


import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

n_neighbors = 15

# import some data to play with
iris = datasets.load_iris()

# we only take the first two features. We could avoid this ugly
# slicing by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target

h = .02  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ListedColormap(['darkorange', 'c', 'darkblue'])

for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold,
                edgecolor='k', s=20)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
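
The plots show the decision regions but not how well each setting generalises. As a rough complement, the sketch below holds out part of the data and compares test accuracy for the two weightings. The 70/30 split and `random_state=0` are arbitrary assumptions, and the resulting numbers depend on them.

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, :2]  # same two features as in the plotting example
y = iris.target

# Hypothetical evaluation setup: stratified 70/30 split with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

accuracy = {}
for weights in ["uniform", "distance"]:
    clf = neighbors.KNeighborsClassifier(n_neighbors=15, weights=weights)
    accuracy[weights] = clf.fit(X_train, y_train).score(X_test, y_test)

print(accuracy)
```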

Using the KNN regression model


# Author: Alexandre Gramfort
#         Fabian Pedregosa
#
# License: BSD 3 clause (C) INRIA


# #############################################################################
# Generate sample data
import numpy as np
import matplotlib.pyplot as plt
from sklearn import neighbors

np.random.seed(0)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
T = np.linspace(0, 5, 500)[:, np.newaxis]
y = np.sin(X).ravel()

# Add noise to targets
y[::5] += 1 * (0.5 - np.random.rand(8))

# #############################################################################
# Fit regression model
n_neighbors = 5

for i, weights in enumerate(['uniform', 'distance']):
    knn = neighbors.KNeighborsRegressor(n_neighbors, weights=weights)
    y_ = knn.fit(X, y).predict(T)

    plt.subplot(2, 1, i + 1)
    plt.scatter(X, y, color='darkorange', label='data')
    plt.plot(T, y_, color='navy', label='prediction')
    plt.axis('tight')
    plt.legend()
    plt.title("KNeighborsRegressor (k = %i, weights = '%s')" % (n_neighbors,
                                                                weights))

plt.tight_layout()
plt.show()
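
One visible difference between the two panels can also be checked numerically: with `weights='distance'`, a training point is at distance zero from itself and so receives all of the weight, which means the curve passes through every training point, while `weights='uniform'` averages the 5 neighbors and smooths over the noise. A minimal sketch, regenerating the same sample data as above:

```python
import numpy as np
from sklearn import neighbors

# Same data generation as in the plotting example.
np.random.seed(0)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 1 * (0.5 - np.random.rand(8))

fits_training_exactly = {}
for weights in ["uniform", "distance"]:
    knn = neighbors.KNeighborsRegressor(n_neighbors=5, weights=weights).fit(X, y)
    # Compare predictions at the training inputs with the training targets.
    fits_training_exactly[weights] = bool(np.allclose(knn.predict(X), y))

print(fits_training_exactly)
# {'uniform': False, 'distance': True}
```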