Time Series Analysis 1: Analyzing and Forecasting Time Series with the AR (Autoregressive) Model in Python

Original

htuhxf

2020-04-19 18:21

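Autoregression (AR) is a regression model that uses earlier values of a series to predict later values of the same series, hence the name "autoregressive".

The logic is simple, yet on many time series problems it gives fairly accurate forecasts.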

1) The autoregression function

$\hat{y}_t = b_0 + b_1 y_{t-1} + \dots + b_n y_{t-n}, \quad \text{where } n < t$
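To make the formula concrete, here is a minimal sketch that evaluates it by hand. The helper name `ar_predict` and the AR(2) coefficients are made up for illustration; they are not fitted to any data.

```python
import numpy as np

def ar_predict(history, coefs):
    """One-step AR forecast: coefs[0] is b_0, coefs[k] multiplies y_{t-k}."""
    n = len(coefs) - 1
    # Reverse the last n values so that lags[k-1] is y_{t-k}
    lags = np.asarray(history[-n:], dtype=float)[::-1]
    return coefs[0] + float(np.dot(coefs[1:], lags))

# Hypothetical AR(2) coefficients b_0, b_1, b_2, chosen only for illustration
coefs = np.array([1.0, 0.5, 0.25])
history = [10.0, 12.0, 14.0]  # y_{t-3}, y_{t-2}, y_{t-1}

y_hat = ar_predict(history, coefs)
print(y_hat)  # 1.0 + 0.5 * 14.0 + 0.25 * 12.0 = 11.0
```

Only the last n values of the history matter, and the most recent value pairs with b_1.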

2) An example

First, a look at the data we will use. It can be loaded directly from the URL:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv',
                 index_col=0, parse_dates=True)
print(df.head())
df.plot()
plt.show()

# Output:
            Temp
Date            
1981-01-01  20.7
1981-01-02  17.9
1981-01-03  18.8
1981-01-04  14.6
1981-01-05  15.8

A quick check of whether the data suits an AR model:

from pandas.plotting import lag_plot
lag_plot(df)
plt.show()
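# Output: a scatter plot of y(t) against y(t+1)

The lag plot shows that $y_{t+1}$ and $y_t$ are clearly correlated.
We can also compute the correlation coefficient directly; it comes out at about 0.77: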

values = pd.DataFrame(df.values)
dataframe = pd.concat([values.shift(1), values], axis=1)
print(dataframe.corr())

# Output:
         0        0
0  1.00000  0.77487
0  0.77487  1.00000
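As a side note, pandas also offers `Series.autocorr(lag=...)`, which gives the same number without building the shifted frame by hand. The sketch below uses a synthetic series as a stand-in for the temperature data (an assumption, so that it runs without a download); the value 0.8 used to generate it is arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the temperature series (not the real data)
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
values = np.zeros(500)
for t in range(1, 500):
    values[t] = 0.8 * values[t - 1] + noise[t]
series = pd.Series(values)

# The shift/concat approach: correlate the series with its lag-1 copy
lagged = pd.concat([series.shift(1), series], axis=1).corr().iloc[0, 1]

# The same quantity through Series.autocorr
lag1 = series.autocorr(lag=1)

# Several lags in one go
by_lag = {k: series.autocorr(lag=k) for k in (1, 2, 7)}
print(lagged, lag1)
print(by_lag)
```

Both approaches compute a Pearson correlation over the overlapping observations, so the two lag-1 values should agree.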

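This is a handy one-off check. But if we want to check many lags, repeating it quickly gets tedious.
Below is how to plot the autocorrelation coefficients for many lags at once.

Two functions draw such plots: pandas.plotting.autocorrelation_plot() and statsmodels.graphics.tsaplots.plot_acf().

First: autocorrelation_plot puts the lag (here in days) on the x-axis and the autocorrelation coefficient, bounded by [-1, 1], on the y-axis. It also draws two solid and two dashed horizontal lines marking the 95% and 99% confidence bands; values that fall outside a band indicate statistically significant correlation.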

import pandas as pd

pd.plotting.autocorrelation_plot(df)
plt.show()

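As the plot shows, the correlation weakens as the lag grows. It also swings from positive to negative and back again, which reflects the temperature rising and falling with the seasons.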
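To get a feel for how narrow those confidence bands are, here is a rough sketch of the usual white-noise approximation, z / sqrt(n), which as far as I can tell is what pandas uses for the horizontal lines. The function name `confidence_bands` is my own, and the observation count of 3650 (ten years of daily readings) is an assumption about this dataset.

```python
import numpy as np

def confidence_bands(n_obs):
    """Approximate white-noise bands for a sample autocorrelation."""
    # Two-sided standard normal quantiles for 95% and 99%
    z95, z99 = 1.959963984540054, 2.5758293035489004
    return z95 / np.sqrt(n_obs), z99 / np.sqrt(n_obs)

n_obs = 3650  # assumed: ten years of daily readings
band95, band99 = confidence_bands(n_obs)
print(f"95% band: +/-{band95:.4f}, 99% band: +/-{band99:.4f}")
```

With a few thousand observations the bands sit very close to zero, so even a weak autocorrelation counts as significant.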

Second: statsmodels.graphics.tsaplots.plot_acf() offers more customization, for example setting the number of lags:

from statsmodels.graphics.tsaplots import plot_acf
fig, axes = plt.subplots(2,1)

plot_acf(df, lags=35, title='Autocorrelation Lag=35', ax=axes[0])
plot_acf(df, lags=365, title='Autocorrelation Lag=365', ax=axes[1])

plt.tight_layout()
plt.show()
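To see in numbers what a 365-lag plot of seasonal data looks like, here is a small sketch on a synthetic series: a yearly sine wave plus noise. The series and the helper `autocorr` are illustrative assumptions, not the real temperature data or a statsmodels function.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic "temperature": yearly sine wave plus noise (illustrative only)
rng = np.random.default_rng(1)
days = np.arange(10 * 365)
temps = 11 + 4 * np.sin(2 * np.pi * days / 365) + rng.normal(scale=1.0, size=days.size)

half_year = autocorr(temps, 182)  # opposite season
full_year = autocorr(temps, 365)  # same season, one year later
print(round(half_year, 2), round(full_year, 2))
```

The half-year lag comes out strongly negative and the full-year lag strongly positive, which is the same positive-then-negative wave that the autocorrelation plots show.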

So far we have seen how to inspect the autocorrelation of a time series. Next, let's build a model for it.

import pandas as pd
import numpy as np
# The AR class is deprecated/removed in newer statsmodels releases; AutoReg is its replacement
from statsmodels.tsa.ar_model import AutoReg
import matplotlib.pyplot as plt

df = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv',
                 index_col=0, parse_dates=True)
# Split the data into train and test: train fits the model, test checks its predictions
x = df['Temp'].values
train, test = x[:-7], x[-7:]  # hold out the last 7 days as the test set

# Fit the model to get the lag order p and the coefficients of the autoregression function.
# The old AR(train).fit() chose p=29 automatically for this data; AutoReg needs the lags set explicitly.
model_fit = AutoReg(train, lags=29).fit()
params = model_fit.params
p = len(params) - 1  # the usual p of AR(p), ARMA(p,q), ARIMA(p,d,q)
# Here p=29 means that each day's temperature is predicted from the previous 29 days.

# Plain Python lists, so that history can be extended with append(test[i]) below
history = train[-p:].tolist()
test = test.tolist()

predictions = []
for t in range(len(test)):
    lag = history[-p:]
    yhat = params[0]  # intercept b_0
    for i in range(p):
        yhat += params[i+1] * lag[p-1-i]  # params[i+1] multiplies y_{t-(i+1)}
    predictions.append(yhat)
    obs = test[t]
    history.append(obs)  # walk forward: add the true value before the next prediction
print(np.mean((np.array(test) - np.array(predictions))**2))  # mean squared error (MSE)
plt.plot(test)
plt.plot(predictions, color='r')
plt.show()
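For readers who want to see what the fit itself does, here is a minimal NumPy-only sketch that estimates the coefficients by ordinary least squares and then runs the same walk-forward forecast. It is not the statsmodels implementation; the helpers `fit_ar` and `rolling_forecast` and the synthetic AR(1) data (intercept 2.0, coefficient 0.6) are assumptions made for illustration.

```python
import numpy as np

def fit_ar(series, p):
    """Fit y_t = b_0 + b_1*y_{t-1} + ... + b_p*y_{t-p} by least squares."""
    y = np.asarray(series, dtype=float)
    # Column k holds y_{t-k}; the leading column of ones is the intercept
    lags = [y[p - k:len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p)] + lags)
    coefs, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coefs

def rolling_forecast(history, test, coefs):
    """One-step-ahead forecasts, appending each true value after predicting."""
    p = len(coefs) - 1
    history = list(history)
    predictions = []
    for obs in test:
        recent = history[-p:][::-1]  # recent[0] is y_{t-1}
        predictions.append(coefs[0] + float(np.dot(coefs[1:], recent)))
        history.append(obs)
    return predictions

# Synthetic AR(1) data with known coefficients (illustrative only)
rng = np.random.default_rng(42)
y = np.zeros(2000)
for t in range(1, 2000):
    y[t] = 2.0 + 0.6 * y[t - 1] + rng.normal(scale=0.5)

train, test = y[:-7], y[-7:]
coefs = fit_ar(train, p=1)
predictions = rolling_forecast(train, test, coefs)
mse = float(np.mean((test - np.array(predictions)) ** 2))
print(coefs, mse)
```

Because the data were generated with known coefficients, the fitted values should land close to 2.0 and 0.6, which is a quick way to check that the lag alignment is right.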
