Chapter1_Hands-On ML with sklearn & TF

Original post

2019-05-12 18:32

First, a quick test of basic data processing in Python with the pandas module.

import pandas as pd
import os
path=os.path.join("datasets","lifesat","")
path_oecd=path+"oecd_bli_2015.csv"
path_gdp=path+'gdp_per_capita.csv'
oecd_bli=pd.read_csv(path_oecd, thousands=',')
#Keep only the whole-population rows (INEQUALITY == "TOT")
oecd_bli=oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
#pivot() reshapes the table and sets the OECD data's index to Country
oecd_bli=oecd_bli.pivot(index="Country",columns="Indicator",values="Value")
gdp_per_capita=pd.read_csv(path_gdp,thousands=',',delimiter='\t',encoding='latin1',na_values="n/a")
gdp_per_capita.rename(columns={"2015":"GDP per capita"},inplace=True)
#Also set the GDP data's index to Country
gdp_per_capita.set_index("Country",inplace=True)
#Merge the two tables on their shared Country index
full_country_stats=pd.merge(left=oecd_bli,right=gdp_per_capita,left_index=True,right_index=True)
full_country_stats.sort_values(by="GDP per capita",inplace=True)
#print(full_country_stats)
#print(full_country_stats['Life satisfaction'])
#print(full_country_stats[["GDP per capita","Life satisfaction"]])
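To see what pivot() and the index-based merge do without the real CSV files, here is a small sketch on hypothetical toy data (the countries "A", "B", "C" and all values are made up). It also shows why the INEQUALITY filter matters: pivot() needs each (Country, Indicator) pair to be unique and raises a ValueError otherwise.

```python
import pandas as pd

# Hypothetical toy data in the same long layout as the OECD file:
# one row per (Country, Indicator, INEQUALITY) combination.
oecd_long = pd.DataFrame({
    "Country":    ["A", "A", "B", "B", "B"],
    "Indicator":  ["Life satisfaction", "Air pollution",
                   "Life satisfaction", "Air pollution", "Life satisfaction"],
    "INEQUALITY": ["TOT", "TOT", "TOT", "TOT", "MN"],
    "Value":      [7.0, 10.0, 6.0, 20.0, 5.5],
})

# Without the filter, B has two "Life satisfaction" rows and pivot() refuses to reshape.
try:
    oecd_long.pivot(index="Country", columns="Indicator", values="Value")
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True

# Keep only the totals, then reshape long -> wide: countries become rows, indicators columns.
wide = oecd_long[oecd_long["INEQUALITY"] == "TOT"].pivot(
    index="Country", columns="Indicator", values="Value")

# Hypothetical GDP table, indexed by Country like gdp_per_capita in the script.
gdp = pd.DataFrame({"Country": ["A", "B", "C"],
                    "GDP per capita": [50000.0, 20000.0, 10000.0]}).set_index("Country")

# Joining on both indexes is an inner join by default, so "C" (no OECD rows) is dropped.
merged = pd.merge(left=wide, right=gdp, left_index=True, right_index=True)
merged = merged.sort_values(by="GDP per capita")

print(duplicates_rejected)
print(merged)
```

Because pd.merge defaults to how="inner", only countries present in both tables survive, which is why the merged table can be shorter than either input.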

TEST1: The first simple machine learning exercise, modeling the relationship between GDP per capita and life satisfaction.

import os
path=os.path.join("datasets","lifesat","")

def prepare_country_stats(oecd_bli,gdp_per_capita):
    oecd_bli=oecd_bli[oecd_bli["INEQUALITY"]=="TOT"]
    oecd_bli=oecd_bli.pivot(index="Country",columns="Indicator",values="Value")
    gdp_per_capita.rename(columns={"2015":"GDP per capita"},inplace=True)
    gdp_per_capita.set_index("Country",inplace=True)
    full_country_stats=pd.merge(left=oecd_bli,right=gdp_per_capita,left_index=True,right_index=True)
    full_country_stats.sort_values(by="GDP per capita",inplace=True)
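    #Drop 7 countries by position (after sorting by GDP); range(36) assumes the merged table has 36 rows
    remove_indices=[0,1,6,8,33,34,35]
    keep_indices=sorted(set(range(36))-set(remove_indices))
    #Why two pairs of square brackets here? A single string key, e.g. full_country_stats["GDP per capita"], returns one column as a Series; a list of names, item=["GDP per capita","Life satisfaction"], full_country_stats[item], returns a DataFrame with those columns
    return full_country_stats[["GDP per capita","Life satisfaction"]].iloc[keep_indices]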
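The double-bracket question can be checked directly. This sketch uses a hypothetical three-row frame that reuses the column names from full_country_stats; it shows that a string key gives a 1-D Series, while any list key (even a one-item list) gives a 2-D DataFrame, and that .iloc selects rows by position rather than by label.

```python
import pandas as pd

# Hypothetical three-row frame reusing the column names from full_country_stats.
df = pd.DataFrame({"GDP per capita": [20000.0, 50000.0, 10000.0],
                   "Life satisfaction": [6.0, 7.0, 5.0],
                   "Air pollution": [20.0, 10.0, 30.0]},
                  index=["B", "A", "C"])

one_col = df["GDP per capita"]                          # string key -> Series, 1-D
one_col_df = df[["GDP per capita"]]                     # one-item list -> DataFrame, 2-D
two_cols = df[["GDP per capita", "Life satisfaction"]]  # list key -> DataFrame, 2-D

print(type(one_col).__name__, one_col.shape)        # Series (3,)
print(type(one_col_df).__name__, one_col_df.shape)  # DataFrame (3, 1)
print(type(two_cols).__name__, two_cols.shape)      # DataFrame (3, 2)

# .iloc picks rows by integer position (0 and 2), not by index label.
subset = two_cols.iloc[[0, 2]]
print(list(subset.index))                           # ['B', 'C']
```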

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn.linear_model
#load the data
path_oecd=path+"oecd_bli_2015.csv"
path_gdp=path+'gdp_per_capita.csv'
oecd_bli=pd.read_csv(path_oecd, thousands=',')
gdp_per_capita=pd.read_csv(path_gdp,thousands=',',delimiter='\t',encoding='latin1',na_values="n/a")
#prepare the data
country_stats=prepare_country_stats(oecd_bli,gdp_per_capita)
X=np.c_[country_stats["GDP per capita"]]
Y=np.c_[country_stats["Life satisfaction"]]

#Visualize the data
country_stats.plot(kind='scatter',x="GDP per capita",y="Life satisfaction")
#plt.show()

#select a linear model
model = sklearn.linear_model.LinearRegression()
#Train the model
model.fit(X,Y)
#Plot the model after training
#intercept_ has shape (1,) and coef_ has shape (1,1), so index down to plain scalars
b0,k0=model.intercept_[0],model.coef_[0][0]
x0=np.linspace(0,60000,500)
plt.plot(x0,b0+k0*x0,'k')
plt.show()
#Make a prediction for Cyprus
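#With only one pair of brackets, predict() raises ValueError: Expected 2D array, got 1D array instead:
#That makes sense: one pair of brackets is a 1D array, two pairs a 2D array. A 2D array is required because scikit-learn expects X with shape (n_samples, n_features); [[22587]] is one sample with one feature.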
X_new=[[22587]]
print(model.predict(X_new))

[[ 5.96242338]]
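A nested output like [[ 5.96242338]] comes from the 2-D target used during training. The sketch below repeats the fit on hypothetical data lying exactly on y = 4.0 + 0.0001 * x (not the OECD data) to show: how np.c_ turns a 1-D array into a column, that LinearRegression recovers the same line as an ordinary least-squares solve with np.linalg.lstsq, that predict() is just the fitted line b0 + k0 * x, and that a 1-D input is rejected with the ValueError quoted above.

```python
import numpy as np
import sklearn.linear_model

# Hypothetical data lying exactly on y = 4.0 + 0.0001 * x (not the OECD data).
gdp = np.array([10000.0, 20000.0, 30000.0, 40000.0])
X = np.c_[gdp]            # np.c_ turns the 1-D array of shape (4,) into a column of shape (4, 1)
Y = 4.0 + 0.0001 * X      # 2-D target, as in the script above

model = sklearn.linear_model.LinearRegression()
model.fit(X, Y)

# With a 2-D Y, intercept_ has shape (1,) and coef_ has shape (1, 1).
b0 = model.intercept_[0]
k0 = model.coef_[0][0]

# Cross-check with ordinary least squares: a column of ones in front of X
# makes the first solved coefficient the intercept.
A = np.c_[np.ones(len(X)), X]
theta, *_ = np.linalg.lstsq(A, Y, rcond=None)

# X must be (n_samples, n_features): [[22587]] is one sample with one feature.
x_new = 22587.0
pred = model.predict([[x_new]])
manual = b0 + k0 * x_new  # the prediction is the fitted line evaluated at x_new

# A flat 1-D input is rejected by scikit-learn's input validation.
try:
    model.predict([x_new])
    rejected_1d = False
except ValueError:
    rejected_1d = True

print(X.shape, pred.shape)            # (4, 1) (1, 1)
print(b0, k0, theta.ravel())
print(pred[0][0], manual, rejected_1d)
```

Fitting with a 1-D target instead (for example Y.ravel()) would make predict() return a flat array such as [5.96...] rather than a nested one.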

Summary:

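A typical machine learning workflow:

Preprocess the data into a consistent format;
Study the characteristics of the data;
Choose a suitable machine learning model based on those characteristics;
Train the model on the formatted data;
Use the trained model to make predictions.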
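The five steps above can be sketched end to end on a hypothetical four-row table. Using dropna() for step 1 and a Pearson correlation check (np.corrcoef) for step 2 are illustrative choices of mine, not steps taken from the script above.

```python
import numpy as np
import pandas as pd
import sklearn.linear_model

# Step 1 - preprocess: hypothetical raw table with one missing value, as na_values="n/a" can produce.
raw = pd.DataFrame({"GDP per capita": [10000.0, 20000.0, None, 40000.0],
                    "Life satisfaction": [5.0, 6.0, 6.5, 7.0]})
data = raw.dropna()

# Step 2 - study the data: a strong positive correlation hints that a linear fit is reasonable.
r = np.corrcoef(data["GDP per capita"], data["Life satisfaction"])[0, 1]

# Step 3 - choose a model.
model = sklearn.linear_model.LinearRegression()

# Step 4 - train it; X is 2-D (n_samples, n_features), y is 1-D here.
X = data[["GDP per capita"]].to_numpy()
y = data["Life satisfaction"].to_numpy()
model.fit(X, y)

# Step 5 - predict for a new, hypothetical country.
pred = model.predict([[30000.0]])
print(round(r, 3), pred)
```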
