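Development platform: Google Colab + Python 3.6
Packages: pandas, sklearn
pandas is used to process CSV files (tutorial link)
sklearn is a widely used third-party machine learning module for Python (tutorial link)
KNN explanation and sklearn usage (tutorial link)
As before, start by mounting the Google Drive folder in Colab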
from google.colab import drive
drive.mount('/content/drive/')
import os
os.chdir("/content/drive/My Drive/kaggle")
Next, import the packages we need (matplotlib and seaborn are not actually used here; I import them out of habit)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
# Render matplotlib figures inline in the notebook instead of in a pop-up window
%matplotlib inline
np.random.seed(2)
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
import itertools
Load the data
train_data = pd.read_csv("mnist/train.csv")
test_data = pd.read_csv("mnist/test.csv")
x_train = train_data.values[:,1:]
y_train = train_data.values[:,0]
test_value = test_data.values
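The slicing above relies on train.csv having the label in the first column and the pixel values in the remaining ones. A minimal sketch on a toy frame shows what the two slices return; the names toy_train, toy_features and toy_labels and the three-pixel rows are made up for illustration and are not part of the Kaggle data.

```python
import pandas as pd

# Toy stand-in for train.csv: first column is the label, the rest are pixels.
# (Hypothetical data; the real file has 784 pixel columns per row.)
toy_train = pd.DataFrame(
    [[5, 0, 255, 12], [0, 34, 0, 0]],
    columns=["label", "pixel0", "pixel1", "pixel2"],
)

toy_features = toy_train.values[:, 1:]  # every column except the first
toy_labels = toy_train.values[:, 0]     # first column only

print(toy_features.shape, toy_labels.shape)
```

test.csv has no label column, which is why test_data.values is used without slicing.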
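Define the KNN classifier
def knnClassfiyer(value,lable):
    # KNeighborsClassifier() with default settings uses n_neighbors=5
    knnclf = KNeighborsClassifier()
    # np.ravel flattens the labels into the 1-D array that fit expects
    knnclf.fit(value,np.ravel(lable))
    return knnclf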
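To make clear what the classifier is doing, here is a minimal NumPy sketch of k-nearest-neighbour voting: compute the Euclidean distance from a query point to every training point, take the k closest, and return the most common label among them. This is an illustration only, not how sklearn implements it internally; knn_predict and the toy data are hypothetical names introduced for this example.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=5):
    """Hypothetical helper: majority vote among the k nearest training points."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    query_x = np.asarray(query_x, dtype=float)
    preds = []
    for q in query_x:
        # Euclidean distance from this query point to every training point
        dists = np.sqrt(((train_x - q) ** 2).sum(axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Most frequent label among those neighbours
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Two well-separated toy clusters (made-up data)
toy_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
toy_y = np.array([0, 0, 0, 1, 1, 1])
toy_pred = knn_predict(toy_x, toy_y, [[0.2, 0.1], [5.5, 5.5]], k=3)
print(toy_pred)
```

Because every prediction compares the query against all training points, the cost grows with the size of the training set, which is why predicting on the full MNIST test set is slow.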
Train and predict
knnclf = knnClassfiyer(x_train,y_train)
test_label = knnclf.predict(test_value)
This gives the predicted test_label (the predict step takes a long time)
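Since the Kaggle test set has no labels and predict is slow, it may be worth estimating accuracy locally before submitting. train_test_split and confusion_matrix are imported above but never used; the sketch below shows one way they could be applied. To stay self-contained it uses sklearn's small built-in load_digits set (8x8 images) as a stand-in for the Kaggle data, so the score it prints says nothing about the real submission. All variable names here are assumptions for the example.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: sklearn's bundled 8x8 digits, NOT the Kaggle train.csv
digits = load_digits()

# Hold out 20% of the labelled data for validation
x_fit, x_val, y_fit, y_val = train_test_split(
    digits.data, digits.target,
    test_size=0.2, random_state=2, stratify=digits.target,
)

val_clf = KNeighborsClassifier()
val_clf.fit(x_fit, y_fit)
val_pred = val_clf.predict(x_val)

val_acc = accuracy_score(y_val, val_pred)
# Rows are true digits, columns are predicted digits
val_cm = confusion_matrix(y_val, val_pred)
print("validation accuracy:", val_acc)
print(val_cm)
```

With the Kaggle data, x_train and y_train would take the place of digits.data and digits.target. Passing n_jobs=-1 to KNeighborsClassifier lets the neighbour search use all CPU cores, which may shorten predict.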
Save the predictions (Kaggle expects a CSV with the columns ImageId and Label)
test_label = pd.Series(test_label,name="Label")
submission = pd.concat([pd.Series(range(1,28001),name = "ImageId"),test_label],axis = 1)
submission.to_csv("mnist/Result_sklearn_KNN.csv",index=False)
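The code above hard-codes 28001 because the test set has 28,000 rows. A minimal sketch of an alternative that derives the ImageId range from the number of predictions is shown here; make_submission is a hypothetical helper, not part of pandas or the Kaggle API, and the three labels are made-up values.

```python
import numpy as np
import pandas as pd

def make_submission(pred_labels):
    """Hypothetical helper: pair each prediction with a 1-based ImageId."""
    pred_labels = np.asarray(pred_labels)
    return pd.DataFrame({
        "ImageId": np.arange(1, len(pred_labels) + 1),
        "Label": pred_labels,
    })

# Made-up predictions, just to show the resulting layout
demo_submission = make_submission([3, 7, 1])
demo_csv = demo_submission.to_csv(index=False)  # returns a string when no path is given
print(demo_csv)
```

In the notebook this would be called as make_submission(test_label) before writing the file with to_csv.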
Submit with the Kaggle API (kaggle.json must first be placed under /root/.kaggle; see "Using Colab with Kaggle")
!mkdir -p /root/.kaggle
!cp /content/drive/'My Drive'/kaggle/kaggle.json /root/.kaggle
!kaggle competitions submit -c digit-recognizer -f mnist/Result_sklearn_KNN.csv -m "forth submit"
When the command finishes it prints "Successfully submitted to Digit Recognizer"; the result can then be checked under My Submissions on Kaggle