20200203_KNN Classification Algorithm

This was a small freelance job for an overseas client; overall there is nothing technically difficult about it.
In this homework, you will develop a model to predict whether a given car gets high or low gas mileage, based on the Auto data set.

import numpy as np
import pandas as pd
%matplotlib inline
# Read in the data
test=pd.read_csv('Auto.csv')
# Show the first 5 rows
test.head()
mpg cylinders displacement horsepower weight acceleration year origin name
0 18.0 8 307.0 130 3504 12.0 70 1 chevrolet chevelle malibu
1 15.0 8 350.0 165 3693 11.5 70 1 buick skylark 320
2 18.0 8 318.0 150 3436 11.0 70 1 plymouth satellite
3 16.0 8 304.0 150 3433 12.0 70 1 amc rebel sst
4 17.0 8 302.0 140 3449 10.5 70 1 ford torino
# Display DataFrame info
test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 397 entries, 0 to 396
Data columns (total 9 columns):
mpg             397 non-null float64
cylinders       397 non-null int64
displacement    397 non-null float64
horsepower      397 non-null object
weight          397 non-null int64
acceleration    397 non-null float64
year            397 non-null int64
origin          397 non-null int64
name            397 non-null object
dtypes: float64(3), int64(4), object(2)
memory usage: 28.0+ KB
# The horsepower column uses '?' for missing values; replace them with NaN and drop those rows
test.replace('?', np.nan, inplace=True)
test.dropna(inplace=True)
# Cast horsepower to integer
test['horsepower'] = test['horsepower'].astype('int')
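
As a quick sanity check (a minimal sketch reusing the test DataFrame above, not part of the original post), the cleanup can be verified by re-inspecting the dtype and the remaining row count:

# After cleaning, horsepower should be integer and no missing values should remain
print(test['horsepower'].dtype)   # expected: int64 (int32 on some platforms)
print(test.isnull().sum().sum())  # expected: 0
print(len(test))                  # 392 rows for the standard Auto.csv (5 rows with '?' dropped)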

(a) Create a binary variable, mpg01, that contains a 1 if mpg contains a value above its median, and a 0 if mpg contains a value below its median. You can compute the median using the median() function. (10 points)
(b) Explore the data graphically in order to investigate the association between mpg01 and the other features. Which of the other features seem most likely to be useful in predicting mpg01? Scatterplots and boxplots may be useful tools to answer this question. Describe your findings.

# Compute and display the median of mpg
median_mpg = test['mpg'].median()
print(median_mpg)
# Create the binary variable mpg01: 1 if mpg is above its median, 0 otherwise
def function(x):
    if x > median_mpg:
        return 1
    else:
        return 0
test['mpg01'] = test['mpg'].apply(function)
# Check the correlations between the features and mpg01
test.corr()
import seaborn as sns
g = sns.pairplot(test, hue='mpg01', palette='seismic', diag_kind = 'kde',diag_kws=dict(shade=True))
g.set()
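
Since the prompt also suggests boxplots, a short sketch along the following lines ranks the numeric features by their correlation with mpg01 and draws boxplots split by class. The choice of weight, displacement and horsepower as candidates is only illustrative, not taken from the original post.

# Rank numeric features by absolute correlation with mpg01
# (select_dtypes excludes the non-numeric 'name' column)
corr_with_target = test.select_dtypes('number').corr()['mpg01'].drop(['mpg', 'mpg01'])
print(corr_with_target.abs().sort_values(ascending=False))

# Boxplots of a few candidate features, grouped by mpg01
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, col in zip(axes, ['weight', 'displacement', 'horsepower']):
    test.boxplot(column=col, by='mpg01', ax=ax)
plt.tight_layout()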

(c) Split the data into a training set and a test set.

from sklearn.model_selection import train_test_split
# Use train_test_split to split the data: 80% training, 20% test
x=test.drop(['mpg01','mpg','name'],axis=1)
y=test['mpg01']
X_train, X_test, y_train, y_test = train_test_split(x,y, test_size=0.2)
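
Because train_test_split shuffles randomly, the test errors reported below will vary from run to run; passing a fixed random_state (for example random_state=42, an assumption not in the original code) would make them reproducible. A quick check of the resulting split:

# Check the split sizes and the class balance in the training set
print(X_train.shape, X_test.shape)           # roughly 80% / 20% of the rows
print(y_train.value_counts(normalize=True))  # both classes should be close to 50%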

(d) Perform LDA on the training data in order to predict mpg01 using the variables that seemed most associated with mpg01 in (b). What is the test error of the model obtained? (15 points)

test.info()
# Import the LDA class
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# Use weight, the feature most strongly associated with mpg01 in (b)
numerical = ['weight']
X_train1=X_train[numerical]
X_test1=X_test[numerical]
lda = LinearDiscriminantAnalysis(n_components=1)
lda.fit(X_train1, y_train)
print(lda.score(X_test1, y_test))  # score returns the classification accuracy on the test set
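
The question asks for the test error rather than the accuracy; since score() on a scikit-learn classifier returns mean accuracy, the test error is simply its complement:

# Test error of the LDA model = 1 - test accuracy
print('LDA test error: {:.3f}'.format(1 - lda.score(X_test1, y_test)))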

(e) Perform QDA on the training data in order to predict mpg01 using the variables that seemed most associated with mpg01 in (b). What is the test error of the model obtained? (15 points)

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
Qda = QuadraticDiscriminantAnalysis()
Qda.fit(X_train1, y_train)
print(Qda.score(X_test1, y_test))  # score returns the classification accuracy on the test set
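
As with LDA, the QDA test error is one minus the test accuracy; a confusion matrix (an addition not in the original post) also shows where the misclassifications occur:

# Test error and confusion matrix for the QDA model
from sklearn.metrics import confusion_matrix
print('QDA test error: {:.3f}'.format(1 - Qda.score(X_test1, y_test)))
print(confusion_matrix(y_test, Qda.predict(X_test1)))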

(f) Perform logistic regression on the training data in order to predict mpg01 using the variables that seemed most associated with mpg01 in (b). What is the test error of the model obtained?

# Use weight and cylinders as the predictors for logistic regression
numerical = ['weight', 'cylinders']
X_train1=X_train[numerical]
X_test1=X_test[numerical]
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr = lr.fit(X_train1,y_train)
from sklearn.metrics import classification_report
print('----------------Train Set----------------------')
print(classification_report(y_train, lr.predict(X_train1)))
print('----------------Test Set----------------------')
print(classification_report(y_test, lr.predict(X_test1)))
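
classification_report prints precision, recall and F1, but not the single test-error number the question asks for; accuracy_score from sklearn.metrics gives it directly:

# Test error of the logistic regression model
from sklearn.metrics import accuracy_score
print('Logistic regression test error: {:.3f}'.format(1 - accuracy_score(y_test, lr.predict(X_test1))))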

(g) Perform KNN on the training data, with several values of K, in order to predict mpg01. Use only the variables that seemed most associated with mpg01 in (b). What test errors do you obtain? Which value of K seems to perform the best on this data set?

from sklearn.neighbors import KNeighborsClassifier
# Candidate values of K
neighbors = range(1, 30)
# Again use weight as the predictor
numerical = ['weight']
X_train1 = X_train[numerical]
X_test1 = X_test[numerical]
# Test-set accuracy for each K
knn_acc = []
# Train a KNeighborsClassifier for every K in neighbors, compute its accuracy
# on the test set, and append each result to knn_acc.
for i in neighbors:
    model = KNeighborsClassifier(n_neighbors=i)
    model.fit(X_train1, y_train)
    knn_acc.append(model.score(X_test1, y_test))
print(knn_acc)
import matplotlib.pyplot as plt
# Plot test accuracy against K
plt.plot(neighbors, knn_acc, label='test accuracy')
plt.legend()
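
To answer which K performs best, it is enough to pick the K with the highest test accuracy (equivalently, the lowest test error); a short sketch:

# Report the best K and its test error (the exact value depends on the random split above)
best_k = neighbors[int(np.argmax(knn_acc))]
print('Best K: {}, test error: {:.3f}'.format(best_k, 1 - max(knn_acc)))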