Determining the ARIMA (p, d, q) parameters (Python)

Parameter d:

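The AR and MA parts of an ARIMA model assume a stationary time series. So when you have a non-stationary series, the first step is to difference it until the result is stationary. If the series has to be differenced d times before it becomes stationary, you can use an ARIMA(p,d,q) model, where d is the number of differences.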

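The parameters p and q are determined from the ACF and PACF, using the usual identification rules:

AR(p): the ACF tails off gradually; the PACF cuts off after lag p
MA(q): the ACF cuts off after lag q; the PACF tails off gradually
ARMA(p,q): both the ACF and the PACF tail off gradually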

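Introduction to statsmodels

statsmodels (http://www.statsmodels.org) is a Python library for fitting many kinds of statistical models, running statistical tests, and exploring and visualizing data. statsmodels focuses on "classical" frequentist statistical methods; Bayesian methods and machine-learning models are found in other libraries.
Some of the models included in statsmodels:

Linear models, generalized linear models and robust linear models
Linear mixed-effects models
Analysis of variance (ANOVA) methods
Time series processes and state space models
Generalized method of moments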

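We use its time-series functions to build the model. The data are historical values of the US consumer sentiment index (column UMCSENT); the code is as follows: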

%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format='retina'
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import statsmodels.tsa.api as smt
# Display and plotting
import matplotlib.pyplot as plt
import seaborn as sns
# pandas and numpy display settings
pd.set_option('display.float_format', lambda x: '%.5f' % x)  # pandas
np.set_printoptions(precision=5, suppress=True)  # numpy
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)
# seaborn plotting style
sns.set(style='ticks', context='poster')

# Load the data: first column is the date index
Sentiment = pd.read_csv('sentiment.csv', index_col=0, parse_dates=[0])

Sentiment.head()

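# Select the 2005-2016 period; .copy() avoids pandas' SettingWithCopyWarning when columns are added below
sentiment_short = Sentiment.loc['2005':'2016'].copy()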
sentiment_short.plot(figsize=(12,8))
plt.legend(bbox_to_anchor=(1.25,0.5))
plt.title('Consumer Sentiment')
sns.despine()

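#help(sentiment_short['UMCSENT'].diff(1))
# What diff() does: https://blog.csdn.net/qq_32618817/article/details/80653841#
# https://blog.csdn.net/You_are_my_dream/article/details/70022464 : first- and second-order differencing remove trends and level shifts
# First and second differences (the second difference is the first difference applied again to the first-difference result)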
sentiment_short['diff_1']=sentiment_short['UMCSENT'].diff(1)
sentiment_short['diff_2']=sentiment_short['diff_1'].diff(1)
sentiment_short.plot(subplots=True,figsize=(18,12))

# Drop the helper columns so only UMCSENT is passed to the ACF/PACF plots below
del sentiment_short['diff_2']
del sentiment_short['diff_1']
sentiment_short.head()
print(type(sentiment_short))
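A tiny worked example of what `diff(1)` and a second `diff(1)` do to a pandas Series; the numbers are made up purely for illustration. The second difference equals x_t - 2x_{t-1} + x_{t-2}.

```python
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])  # toy data: squares of 1..5

diff_1 = s.diff(1)        # x_t - x_{t-1}
diff_2 = diff_1.diff(1)   # first difference of the first difference

# The same thing written out explicitly: x_t - 2*x_{t-1} + x_{t-2}
explicit = s - 2 * s.shift(1) + s.shift(2)

print(diff_1.tolist())  # [nan, 3.0, 5.0, 7.0, 9.0]
print(diff_2.tolist())  # [nan, nan, 2.0, 2.0, 2.0]
```

The first difference turns the quadratic into a linear sequence and the second turns it into a constant; each difference also costs one observation at the start, which is where the NaN values come from.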

Autocorrelation function (ACF)

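The autocorrelation function compares an ordered sequence of random variables with itself: it measures the correlation between values of the same series at different time lags.
Formula: $ACF(k)=\rho_{k}=\frac{\mathrm{Cov}(y_{t},y_{t-k})}{\mathrm{Var}(y_{t})}$
$\rho_{k}$ takes values in $[-1, 1]$.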

Partial autocorrelation function (PACF)

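For a stationary AR(p) model, the lag-k autocorrelation coefficient ρ(k) does not measure a pure relationship between x(t) and x(t-k).
x(t) is also influenced by the k-1 intermediate random variables x(t-1), x(t-2), …, x(t-k+1), and these k-1 variables are themselves correlated with x(t-k), so ρ(k) mixes in the influence of these other variables on the relationship between x(t) and x(t-k).
The PACF measures how strongly x(t-k) is related to x(t) after removing the influence of the intermediate k-1 random variables x(t-1), x(t-2), …, x(t-k+1).
In short, the ACF still contains the influence of other variables, whereas the PACF measures strictly the correlation between these two variables.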
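One standard way to make "removing the intermediate variables" concrete is regression: the lag-k PACF is the coefficient on x(t-k) when x(t) is regressed by OLS on x(t-1), …, x(t-k) plus an intercept (statsmodels offers this variant as `pacf(..., method='ols')`). The sketch below implements that idea directly on a simulated AR(1); the helper names, the coefficient 0.7 and the seed are assumptions for illustration.

```python
import numpy as np


def pacf_by_regression(x, nlags):
    """PACF at lag k = OLS coefficient on x_{t-k} when x_t is regressed on
    an intercept and x_{t-1}, ..., x_{t-k}. Index 0 is set to 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    values = [1.0]
    for k in range(1, nlags + 1):
        target = x[k:]                                    # x_t for t = k..n-1
        lags = [x[k - j:n - j] for j in range(1, k + 1)]  # x_{t-1}..x_{t-k}
        design = np.column_stack([np.ones(n - k)] + lags)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        values.append(coef[-1])                           # last column is x_{t-k}
    return np.array(values)


def lag_corr(x, k):
    """Plain Pearson correlation between x_t and x_{t-k}."""
    return np.corrcoef(x[k:], x[:-k])[0, 1]


# Simulated AR(1): x_t = 0.7 x_{t-1} + e_t
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.zeros_like(e)
for t in range(1, len(e)):
    x[t] = 0.7 * x[t - 1] + e[t]

print("lag-2 ACF :", round(lag_corr(x, 2), 3))               # about 0.7**2 = 0.49
print("lag-2 PACF:", round(pacf_by_regression(x, 2)[2], 3))  # close to 0
```

In an AR(1), x(t) and x(t-2) are correlated only through x(t-1), so the plain lag-2 correlation is sizeable while the lag-2 PACF drops to about zero once x(t-1) is in the regression.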

fig = plt.figure(figsize=(12, 8))
ax1 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(sentiment_short['UMCSENT'], lags=20, ax=ax1)  # autocorrelation
ax1.xaxis.set_ticks_position('bottom')
ax2 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(sentiment_short['UMCSENT'], lags=20, ax=ax2)  # partial autocorrelation
ax2.xaxis.set_ticks_position('bottom')
fig.tight_layout()

# A more intuitive combined view: time plot, histogram, ACF and PACF
def tsplot(y, lags=None, title='', figsize=(14, 8)):
    plt.figure(figsize=figsize)
    layout = (2, 2)
    ts_ax = plt.subplot2grid(layout, (0, 0))
    hist_ax = plt.subplot2grid(layout, (0, 1))
    acf_ax = plt.subplot2grid(layout, (1, 0))
    pacf_ax = plt.subplot2grid(layout, (1, 1))
    y.plot(ax=ts_ax)
    ts_ax.set_title(title)
    y.plot(ax=hist_ax, kind='hist', bins=25)
    hist_ax.set_title('Histogram')
    smt.graphics.plot_acf(y, lags=lags, ax=acf_ax)
    smt.graphics.plot_pacf(y, lags=lags, ax=pacf_ax)
    for ax in (acf_ax, pacf_ax):
        ax.set_xlim(0)  # start the lag axis at 0
    sns.despine()
    plt.tight_layout()
    return ts_ax, acf_ax, pacf_ax

tsplot(sentiment_short, title='Consumer Sentiment', lags=36);
