Supervised machine learning problems fall into two main categories: classification and regression.
Supervised versus unsupervised learning, and the distinction between classification and regression within supervised learning, are not covered again here; see *Machine Learning* for details.
1. K-Nearest Neighbors Classification
Using sklearn's KNeighborsClassifier, we evaluate accuracy on the test set for different numbers of neighbors.
The dataset here is sklearn's built-in breast cancer dataset.
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
cancer = datasets.load_breast_cancer()
# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(cancer['data'], cancer['target'], random_state=1)
test_accuracy = []
# n_neighbors from 1 to 10
neighbors_settings = range(1, 11)
# evaluate the test accuracy for each number of neighbors
for n_neighbors in neighbors_settings:
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X_train, y_train)
    # record the test-set accuracy
    test_accuracy.append(knn.score(X_test, y_test))
plt.plot(neighbors_settings, test_accuracy)
plt.xlabel("n_neighbors")
plt.ylabel("test_accuracy")
plt.show()
The resulting test-accuracy curve shows that accuracy roughly peaks at 5 and 6 neighbors, while even the lowest value remains acceptable.
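Reading the best k directly off the test-accuracy curve effectively uses the test set for model selection. A common alternative is to choose k by cross-validation on the training set only; a minimal sketch, assuming the same data and split as above:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

cancer = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer['data'], cancer['target'], random_state=1)

# mean 5-fold cross-validation accuracy on the training set for each candidate k
mean_scores = {}
for k in range(1, 11):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_train, y_train, cv=5)
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print("best n_neighbors:", best_k)
```

The model would then be refit with best_k on the full training set and scored once on the held-out test set.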
2. K-Nearest Neighbors Regression
With KNeighborsRegressor, when more than one neighbor is used the prediction is the mean of those neighbors' target values.
Here we use a hand-constructed one-dimensional dataset relating distance from the coastline to house price:
house_price_dataSet = {
'data': np.array([0.5, 0.8, 1.0, 1.4, 1.6, 1.8, 2.0, 2.1, 2.3, 2.5, 2.9, 3.2, 3.5, 3.9, 4.6, 5.0]).reshape(-1, 1),
'target': np.array([3., 2.5, 2.8, 2.6, 2.7, 2.6, 2., 2., 1.6, 1.7, 1.6, 1.4, 1.2, 1.4, 1.3, 1.2]),
'target_names': np.array(['price']),
'feature_names': np.array(['distance'])
}
The goal is to predict house prices within 0–5 km of the coastline.
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsRegressor
# ['data', 'target', 'target_names', 'feature_names']
house_price_dataSet = {
'data': np.array([0.5, 0.8, 1.0, 1.4, 1.6, 1.8, 2.0, 2.1, 2.3, 2.5, 2.9, 3.2, 3.5, 3.9, 4.6, 5.0]).reshape(-1, 1),
'target': np.array([3., 2.5, 2.8, 2.6, 2.7, 2.6, 2., 2., 1.6, 1.7, 1.6, 1.4, 1.2, 1.4, 1.3, 1.2]),
'target_names': np.array(['price']),
'feature_names': np.array(['distance'])
}
# plt.scatter(house_price_dataSet['data'], house_price_dataSet['target'])
# plt.xlabel("Distance from the sea")
# plt.ylabel("housing price")
# plt.show()
# split into training and test sets with train_test_split
X_train, X_test, y_train, y_test = train_test_split(house_price_dataSet['data'], house_price_dataSet['target'],
random_state=0)
# K-nearest neighbors regression
knn = KNeighborsRegressor(n_neighbors=2)
knn.fit(X_train, y_train)
x_new = np.array([[1.5]])
prediction = knn.predict(x_new)
print("prediction price:", prediction)
Predicting the price at 1.5 km prints the following:
prediction price: [2.65]
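Because KNeighborsRegressor with the default uniform weights simply averages the targets of the k nearest training points, the value above can be checked by hand using the estimator's kneighbors method. A minimal sketch with the same data and split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X = np.array([0.5, 0.8, 1.0, 1.4, 1.6, 1.8, 2.0, 2.1, 2.3, 2.5,
              2.9, 3.2, 3.5, 3.9, 4.6, 5.0]).reshape(-1, 1)
y = np.array([3., 2.5, 2.8, 2.6, 2.7, 2.6, 2., 2., 1.6, 1.7,
              1.6, 1.4, 1.2, 1.4, 1.3, 1.2])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsRegressor(n_neighbors=2).fit(X_train, y_train)
x_new = np.array([[1.5]])

# kneighbors returns the distances and training-set indices of the k nearest points
dist, idx = knn.kneighbors(x_new)
# a plain average of the 2 nearest training targets should match predict()
manual = y_train[idx[0]].mean()
print(manual, knn.predict(x_new)[0])
```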
The fit can also be visualized over the full 0–5 km range.
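As a sketch of how such a figure can be produced, the model's predictions over a dense grid on 0–5 km can be plotted alongside the training and test points (the grid size of 200 points is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X = np.array([0.5, 0.8, 1.0, 1.4, 1.6, 1.8, 2.0, 2.1, 2.3, 2.5,
              2.9, 3.2, 3.5, 3.9, 4.6, 5.0]).reshape(-1, 1)
y = np.array([3., 2.5, 2.8, 2.6, 2.7, 2.6, 2., 2., 1.6, 1.7,
              1.6, 1.4, 1.2, 1.4, 1.3, 1.2])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsRegressor(n_neighbors=2).fit(X_train, y_train)

# predict on a dense grid spanning the whole 0-5 km range
line = np.linspace(0, 5, 200).reshape(-1, 1)
y_line = knn.predict(line)

plt.plot(line, y_line, label="prediction (k=2)")
plt.scatter(X_train, y_train, marker='o', label="training data")
plt.scatter(X_test, y_test, marker='^', label="test data")
plt.xlabel("Distance from the sea (km)")
plt.ylabel("housing price")
plt.legend()
plt.show()
```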