Python Regression Analysis in Practice: What Factors Are Car Sales Related To?

Original

shiinerise

2020-02-21 15:50

Data used:
Car sales data: https://pan.baidu.com/s/1VlTy4nfvgXdDzgimVguZMg

1 Analyzing the data

1.1 Importing packages and reading the data

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression 
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error,r2_score
data = pd.read_csv('data.csv')
data.head()

	券代碼	日期	傳統汽車銷量	國內生產總值當季值(億元)x1	汽油價格（元/噸）x2	人民幣貸款基準利率%x3	汽車總產量（萬輛）x4	公路里程數	汽車整車股票指數	消費者信心指數
0	65	2003年Q1	102.1	29825.5	3020	5.49	102.1	177.6375	1696.81	97.700000
1	64	2003年Q2	110.0	32537.3	3020	5.49	110.0	178.7550	1912.54	87.666667
2	63	2003年Q3	112.1	35291.9	2920	5.49	112.1	179.8725	1803.71	92.333333
3	62	2003年Q4	122.8	39767.4	3010	5.49	122.8	180.9900	1922.48	94.666667
4	61	2004年Q1	131.1	34544.6	3210	5.49	131.1	182.5125	1930.71	95.333333

data.shape

(65, 10)

The data has 65 rows and 10 columns. The column names are kept in Chinese because the code below indexes the DataFrame by them. In English: 券代碼 = security code (here just a row ID), 日期 = date (quarter), 傳統汽車銷量 = conventional car sales, 國內生產總值當季值(億元)x1 = quarterly GDP (100 million yuan), 汽油價格（元/噸）x2 = gasoline price (yuan/ton), 人民幣貸款基準利率%x3 = RMB benchmark lending rate (%), 汽車總產量（萬輛）x4 = total car production (10,000 vehicles), 公路里程數 = highway mileage, 汽車整車股票指數 = automobile manufacturer stock index, 消費者信心指數 = consumer confidence index.

1.2 Checking for missing values

data.isnull().sum()

券代碼                0
日期                 0
傳統汽車銷量             0
國內生產總值當季值(億元)x1    0
汽油價格（元/噸）x2        0
人民幣貸款基準利率%x3       0
汽車總產量（萬輛）x4        0
公路里程數              1
汽車整車股票指數           0
消費者信心指數            0
dtype: int64

公路里程數 (highway mileage) has one missing value; fill it with the column mean:

data['公路里程數']=data['公路里程數'].fillna(data['公路里程數'].mean())
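
As a small illustration of what the mean fill does, the sketch below applies it to a toy series (the four values are made up, not taken from data.csv). Because highway mileage grows steadily over time, linear interpolation is shown next to it as one alternative that might suit a trending series better; that is an assumption about this dataset, not something the original analysis did.

```python
import numpy as np
import pandas as pd

# Toy series with one gap (illustrative values, not the real dataset)
mileage = pd.Series([177.0, 179.0, np.nan, 183.0])

# Mean fill: the gap receives the average of the observed values
mean_filled = mileage.fillna(mileage.mean())

# Linear interpolation: the gap receives the midpoint of its neighbours
interp_filled = mileage.interpolate()

print(mean_filled.tolist())
print(interp_filled.tolist())
```

With a single gap in 65 rows, either choice is unlikely to change the fitted model much.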

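1.3 Analyzing correlations between variables

Use pandas to compute the pairwise correlations. For readability, only the upper triangle of the matrix is shown: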

cormatrix = data.select_dtypes(include='number').corr()  # numeric columns only; 日期 is text
cormatrix *= np.tri(*cormatrix.values.shape,k=-1).T
cormatrix

	券代碼	傳統汽車銷量	國內生產總值當季值(億元)x1	汽油價格（元/噸）x2	人民幣貸款基準利率%x3	汽車總產量（萬輛）x4	公路里程數	汽車整車股票指數	消費者信心指數
券代碼	0.0	-0.943772	-0.983360	-0.730000	0.434063	-0.950615	-0.907244	-0.833242	-0.865064
傳統汽車銷量	-0.0	0.000000	0.926678	0.738939	-0.392945	0.999144	0.881898	0.819730	0.806348
國內生產總值當季值(億元)x1	-0.0	0.000000	0.000000	0.703095	-0.464662	0.937453	0.865311	0.807799	0.868392
汽油價格（元/噸）x2	-0.0	0.000000	0.000000	0.000000	-0.059720	0.733278	0.784530	0.552330	0.609621
人民幣貸款基準利率%x3	0.0	-0.000000	-0.000000	-0.000000	0.000000	-0.410893	-0.165263	-0.388479	-0.539812
汽車總產量（萬輛）x4	-0.0	0.000000	0.000000	0.000000	-0.000000	0.000000	0.878815	0.820630	0.822135
公路里程數	-0.0	0.000000	0.000000	0.000000	-0.000000	0.000000	0.000000	0.824552	0.724907
汽車整車股票指數	-0.0	0.000000	0.000000	0.000000	-0.000000	0.000000	0.000000	0.000000	0.735544
消費者信心指數	-0.0	0.000000	0.000000	0.000000	-0.000000	0.000000	0.000000	0.000000	0.000000
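
The masking line can be read as follows: np.tri(n, n, k=-1) builds a matrix of ones strictly below the diagonal, and transposing it gives ones strictly above the diagonal. Multiplying by this mask zeroes everything else, which is also why negative entries show up as -0.0 in the output. A minimal sketch on made-up data (the column names a, b, c are placeholders), with a NaN-based mask as one possible alternative:

```python
import numpy as np
import pandas as pd

# Made-up data, only to get a small correlation matrix
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 3)), columns=["a", "b", "c"])
corr = df.corr()
n = corr.shape[0]

# Mask used in the article: ones strictly above the diagonal
mask = np.tri(n, n, k=-1).T

# The same mask written with np.triu
same_mask = np.triu(np.ones((n, n)), k=1)

# Alternative: keep the upper triangle and blank the rest with NaN
upper_only = corr.where(mask.astype(bool))
print(upper_only)
```

Blanking with NaN instead of multiplying by zero avoids confusing a masked cell with a true zero correlation.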

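From the correlation matrix we can see that:

"券代碼" is just the row serial number, so it should be dropped.
"日期" (date) is a quarter label with no direct linear relationship to car sales, so it is dropped as well.
Car production is set according to car sales (the two have a correlation of 0.999144), so production cannot serve as a predictive feature for sales and is also discarded.

After this initial screening, the following features are used for modeling: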

X = data[['國內生產總值當季值(億元)x1', '汽油價格（元/噸）x2', '人民幣貸款基準利率%x3','公路里程數', '汽車整車股票指數', '消費者信心指數']]
Y = data['傳統汽車銷量']
X.head()

	國內生產總值當季值(億元)x1	汽油價格（元/噸）x2	人民幣貸款基準利率%x3	公路里程數	汽車整車股票指數	消費者信心指數
0	29825.5	3020	5.49	177.6375	1696.81	97.700000
1	32537.3	3020	5.49	178.7550	1912.54	87.666667
2	35291.9	2920	5.49	179.8725	1803.71	92.333333
3	39767.4	3010	5.49	180.9900	1922.48	94.666667
4	34544.6	3210	5.49	182.5125	1930.71	95.333333
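
As a complement to the reasoning used for the screening, a quick numeric check is to rank candidate features by the absolute value of their correlation with the target. The sketch below does this on synthetic data; the column names x_strong, x_weak, and y are hypothetical, and the ranking is only a screening aid, not the selection method used in this article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
x_strong = rng.normal(size=n)
x_weak = rng.normal(size=n)
# y depends strongly on x_strong and only slightly on x_weak
y = 3.0 * x_strong + 0.1 * x_weak + rng.normal(scale=0.5, size=n)
toy = pd.DataFrame({"x_strong": x_strong, "x_weak": x_weak, "y": y})

# Correlation of every feature with the target, sorted by magnitude
ranking = toy.corr()["y"].drop("y").abs().sort_values(ascending=False)
print(ranking)
```

On the real data the same idea would start from the 傳統汽車銷量 column of the correlation matrix.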

1.4 Splitting into training and test sets

x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size = 0.2,random_state = 66)
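
With 65 rows and test_size = 0.2, the split should leave 52 rows for training and 13 for testing, and fixing random_state makes it reproducible. A minimal sketch with placeholder arrays standing in for X and Y (X_demo and y_demo are not the real data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows (65) and features (6)
X_demo = np.arange(65 * 6).reshape(65, 6)
y_demo = np.arange(65)

x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=66
)
print(x_tr.shape, x_te.shape)

# The same random_state reproduces the same split
x_tr2, x_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=66
)
print(np.array_equal(x_te, x_te2))
```

Since the rows are quarterly observations, a random split mixes earlier and later quarters; passing shuffle=False to keep chronological order is an option worth considering if the goal is forecasting.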

2 Standardization

ss = StandardScaler()
ss.fit(x_train)
x_train_ss = ss.transform(x_train)
x_test_ss = ss.transform(x_test)
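
StandardScaler subtracts each column's mean and divides by its standard deviation, both estimated on the training set only; the test set is then transformed with those same training statistics. The sketch below reproduces this with NumPy on made-up numbers to show what ss.fit and ss.transform compute.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test matrices with two features on different scales
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
test = np.array([[2.5, 250.0], [5.0, 500.0]])

# Statistics come from the training data only
mu = train.mean(axis=0)
sigma = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

train_manual = (train - mu) / sigma
test_manual = (test - mu) / sigma

# Compare against scikit-learn
ss_demo = StandardScaler().fit(train)
print(np.allclose(train_manual, ss_demo.transform(train)))
print(np.allclose(test_manual, ss_demo.transform(test)))
```

Fitting the scaler on the training set alone keeps information from the test set out of the model.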

3 Building the model

Training

# Build the model
lr = LinearRegression()
lr.fit(x_train_ss,y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
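
Since the features were standardized, the fitted coefficients are on a comparable scale, so their magnitudes give a rough sense of which factors move with car sales. The sketch below shows one way to pair the coef_ attribute with feature names; it uses synthetic data and hypothetical feature names (f1, f2, f3). With strongly correlated predictors, as in this dataset, individual coefficients should be read with caution.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
features = ["f1", "f2", "f3"]  # hypothetical names
# Three features on very different raw scales
X_demo = pd.DataFrame(
    rng.normal(size=(100, 3)) * [1.0, 10.0, 100.0], columns=features
)
# Target driven mostly by f1, somewhat by f2, not at all by f3
y_demo = 5.0 * X_demo["f1"] + 0.2 * X_demo["f2"] + rng.normal(size=100)

X_ss = StandardScaler().fit_transform(X_demo)
model = LinearRegression().fit(X_ss, y_demo)

# Coefficients labelled by feature and ordered by absolute size
coefs = pd.Series(model.coef_, index=features)
coefs = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(coefs)
print(model.intercept_)
```

In the article's code the equivalent would be pairing lr.coef_ with X.columns.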

Prediction

y_pred = lr.predict(x_test_ss)
 
# Output the MSE
mean_squared_error(y_test,y_pred)

5017.725895684722

# Output the R-squared value
r2_score(y_test,y_pred)

0.8875379565909532
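
For reference, both metrics can be computed by hand: MSE is the mean of the squared residuals, and R-squared is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. Taking the square root of the MSE reported above gives an RMSE of roughly 70.8, in the same units as the sales figures. A small sketch on made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_hat = np.array([110.0, 140.0, 210.0, 240.0])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)   # mean squared error
rmse = np.sqrt(mse)             # back in the units of y

ss_res = np.sum(residuals ** 2)                  # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot

print(mse, rmse, r2)
# The hand-computed values should agree with scikit-learn
print(np.isclose(mse, mean_squared_error(y_true, y_hat)))
print(np.isclose(r2, r2_score(y_true, y_hat)))
```

RMSE is often easier to interpret than MSE because it can be compared directly with typical sales values.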
