Python Machine Learning and Practice: Chapter 1

Original post

2019-03-03 21:10

Cancer prediction problem: code walkthrough
Reading the files

import pandas as pd
# pandas provides read_csv, which loads a .csv file into a DataFrame
df_train = pd.read_csv('../Desktop/python/Datasets/Breast-Cancer/breast-cancer-train.csv')
df_test = pd.read_csv('../Desktop/python/Datasets/Breast-Cancer/breast-cancer-test.csv')
# Two notes on the path: 1. use / rather than \  2. the path is relative to the Jupyter working directory, not to the local machine's layout
df_test_negtive = df_test.loc[df_test['Type'] == 0, ['Clump Thickness', 'Cell Size']]
df_test_positive = df_test.loc[df_test['Type'] == 1, ['Clump Thickness', 'Cell Size']]
# The first argument to loc can be a boolean mask, which keeps the rows where the mask is True; the second argument selects columns
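
To make the boolean indexing concrete, here is a minimal sketch on a small made-up DataFrame. The values are hypothetical and only reuse the column names from the post; they are not taken from the breast-cancer dataset.

```python
import pandas as pd

# Hypothetical toy data that only mimics the column names used above.
df = pd.DataFrame({
    'Clump Thickness': [1, 5, 8, 3],
    'Cell Size': [1, 4, 9, 2],
    'Type': [0, 1, 1, 0],
})

# Comparing a column with a value yields a boolean Series, one entry per row.
mask = df['Type'] == 0

# loc keeps the rows where the mask is True and the listed columns only.
negative = df.loc[mask, ['Clump Thickness', 'Cell Size']]
print(negative)
```

The rows keep their original index labels, so the result here is indexed by 0 and 3 rather than renumbered.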

Plotting

import matplotlib.pyplot as plt
plt.scatter(df_test_negtive['Clump Thickness'],df_test_negtive['Cell Size'],marker='o',s=200,c='red')
plt.scatter(df_test_positive['Clump Thickness'],df_test_positive['Cell Size'],marker='x',s=200,c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

Generating a random line

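import numpy as np  # needed for np.random and np.arange below
intercept = np.random.random([1])  # array with one random number drawn uniformly from [0, 1)
coef = np.random.random([2])  # array with two random numbers drawn uniformly from [0, 1)
lx = np.arange(0, 12)  # lx is a NumPy array holding the integers 0 through 11
ly = (-intercept - lx * coef[0]) / coef[1]  # y-values of the random line intercept + coef[0]*x + coef[1]*y = 0
plt.plot(lx, ly, c='yellow')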
plt.scatter(df_test_negtive['Clump Thickness'],df_test_negtive['Cell Size'],marker='o',s=200,c='red')
plt.scatter(df_test_positive['Clump Thickness'],df_test_positive['Cell Size'],marker='x',s=200,c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

Difference between scatter and plot: scatter draws each point as a separate marker, while plot joins the points with line segments.
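
A small sketch of that difference, using made-up coordinates. The 'Agg' backend line is an assumption so the snippet runs without a display; in a Jupyter notebook it can be dropped and plt.show() called instead.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample points, not taken from the dataset.
x = np.array([1, 2, 3, 4])
y = np.array([2, 1, 4, 3])

fig, ax = plt.subplots()
ax.scatter(x, y, marker='o', c='red')  # four separate markers, no connecting line
ax.plot(x, y, c='blue')                # one line that passes through the four points in order

# scatter adds a collection of markers; plot adds a Line2D object.
print(len(ax.collections), len(ax.lines))
```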

Logistic regression classification (using the first 10 training samples)

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()  # the LogisticRegression classifier from sklearn's linear_model module
lr.fit(df_train[['Clump Thickness','Cell Size']][:10],df_train['Type'][:10])
print('Testing accuracy(10 training samples):',lr.score(df_test[['Clump Thickness','Cell Size']],df_test['Type']))
intercept=lr.intercept_
coef=lr.coef_[0,:]
ly=(-intercept-lx*coef[0])/coef[1]
plt.plot(lx,ly,c='green')
plt.scatter(df_test_negtive['Clump Thickness'],df_test_negtive['Cell Size'],marker='o',s=200,c='red')
plt.scatter(df_test_positive['Clump Thickness'],df_test_positive['Cell Size'],marker='x',s=150,c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

The classification line is redrawn from the intercept and coefficients learned by the logistic regression model.
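
Why the expression for ly gives the classification line: the model's decision value is intercept + coef[0]*x + coef[1]*y, and the boundary is where that value equals zero. Solving for y gives the formula used in the code. The sketch below checks this on synthetic data; the two Gaussian clusters are an assumption made for illustration and stand in for the two features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data (assumption): one cluster per class.
rng = np.random.default_rng(0)
X_neg = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
X_pos = rng.normal(loc=7.0, scale=1.0, size=(50, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 50 + [1] * 50)

lr = LogisticRegression()
lr.fit(X, y)

intercept = lr.intercept_[0]
coef = lr.coef_[0, :]

# Same construction as in the post: solve intercept + coef[0]*x + coef[1]*y = 0 for y.
lx = np.arange(0, 12)
ly = (-intercept - lx * coef[0]) / coef[1]

# Every (lx, ly) pair should have a decision value of zero, up to rounding.
boundary_points = np.column_stack([lx, ly])
scores = lr.decision_function(boundary_points)
print(np.allclose(scores, 0.0))
```

Points with a positive decision value are predicted as class 1 and points with a negative value as class 0, which is why the line separates the two predicted regions.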

Logistic regression classification (using all training data)

lr=LogisticRegression()
lr.fit(df_train[['Clump Thickness','Cell Size']],df_train['Type'])
print('Testing accuracy(all training samples):',lr.score(df_test[['Clump Thickness','Cell Size']],df_test['Type']))
intercept=lr.intercept_
coef=lr.coef_[0,:]
ly=(-intercept-lx*coef[0])/coef[1]
plt.plot(lx, ly, c='blue')  # plot draws the line onto the current figure; show (below) displays the figure
plt.scatter(df_test_negtive['Clump Thickness'],df_test_negtive['Cell Size'],marker='o',s=200,c='red')
plt.scatter(df_test_positive['Clump Thickness'],df_test_positive['Cell Size'],marker='x',s=150,c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

The plot shows that the line fitted on all training samples separates the two classes reasonably well.
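
The accuracy printed by lr.score is the fraction of test samples whose predicted label matches the true label. A minimal sketch that confirms this by hand follows; the synthetic data and the 150/50 split are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, partly overlapping clusters (assumption), shuffled and split.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(2.0, 1.5, size=(100, 2)),
    rng.normal(6.0, 1.5, size=(100, 2)),
])
y = np.array([0] * 100 + [1] * 100)

order = rng.permutation(len(y))
X, y = X[order], y[order]
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

lr = LogisticRegression().fit(X_train, y_train)

# Accuracy computed by hand: share of correct predictions on the test split.
manual_accuracy = np.mean(lr.predict(X_test) == y_test)
print(lr.score(X_test, y_test), manual_accuracy)
```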

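Takeaways from Python Machine Learning and Practice:

1. When an if block is typed into the command-line interpreter, the ... continuation prompt appears, but the body still has to be indented by hand.

2. Python's three main container types are the tuple, the list, and the dict. A tuple is written L = (), a list L = [], and a dict L = {key: value}. A list can be modified after it is created, whereas a tuple cannot be changed once defined. The in operator checks the elements of a tuple or list and the keys of a dict, not its values. All three are indexed with [].

3. Functions are defined with def.

4. The sklearn library includes ready-made tools for regression, classification, clustering, and more, which can be called directly.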
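
The points about containers and def can be tried out with a short sketch. The variable names and the describe helper are hypothetical and exist only for this illustration.

```python
t = (1, 2, 3)           # tuple: immutable
l = [1, 2, 3]           # list: mutable
d = {'a': 1, 'b': 2}    # dict: key -> value

# A list element can be reassigned in place.
l[0] = 10

# Assigning to a tuple element raises TypeError.
try:
    t[0] = 10
except TypeError as err:
    print('tuples are immutable:', err)

# in checks elements for tuples and lists, and keys for dicts.
print(2 in t)           # True
print('a' in d)         # True: 'a' is a key
print(1 in d)           # False: 1 is a value, not a key
print(1 in d.values())  # True: check values explicitly

# All three containers are indexed with [].
print(t[1], l[1], d['b'])


def describe(container):
    """Hypothetical helper: return the container's type name and length."""
    return type(container).__name__, len(container)


print(describe(d))
```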
