ARIMA Time Series 2: Evaluation and Parameter Selection

  • ARIMA -> SARIMA -> SARIMAX:
    S stands for Seasonal, i.e. seasonality or periodic behavior.
    X stands for eXogenous, i.e. external information supplied as extra regressors.

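  • Seasonal parameters:
    P: seasonal autoregressive order.
    D: seasonal differencing order.
    Q: seasonal moving-average order.
    m: number of time steps in a single seasonal period (e.g. m = 12 for monthly data with a yearly cycle).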

import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
import seaborn as sns
%matplotlib inline
sns.set(style = 'ticks', context = 'poster')

pd.set_option('display.float_format', lambda x: '%.5f' % x)
np.set_printoptions(precision = 5, suppress = True)

filename_ts = 'data/series1.csv'
ts_df = pd.read_csv(filename_ts, index_col = 0, parse_dates = [0])
n_sample = ts_df.shape[0]

print(ts_df.shape)
ts_df.head(6)


# Training and test sets (chronological split, no shuffling)

n_train = int(0.95 * n_sample) + 1
n_test = n_sample - n_train

ts_train = ts_df.iloc[:n_train]['value']
ts_test = ts_df.iloc[n_train:]['value']
print(ts_train.shape)
print(ts_test.shape)
print('Training Series:', '\n', ts_train.tail(), '\n')
print('Testing Series:', '\n', ts_test.head())


import statsmodels.tsa.api as smt

def tsplot(y, lags=None, title='', figsize=(20, 12)):
    """Plot a series with its histogram, ACF and PACF in a 2x2 grid."""
    fig = pl.figure(figsize = figsize)
    layout = (2, 2)
    ts_ax = pl.subplot2grid(layout, (0, 0))
    hist_ax = pl.subplot2grid(layout, (0, 1))
    acf_ax = pl.subplot2grid(layout, (1, 0))
    pacf_ax = pl.subplot2grid(layout, (1, 1))

    y.plot(ax = ts_ax)
    ts_ax.set_title(title)

    y.plot(ax = hist_ax, kind = 'hist', bins = 25)
    hist_ax.set_title('Histogram')

    smt.graphics.plot_acf(y, lags = lags, ax = acf_ax)
    smt.graphics.plot_pacf(y, lags = lags, ax = pacf_ax)
    for ax in [acf_ax, pacf_ax]:
        ax.set_xlim(0)
    sns.despine()  # remove the top and right spines
    fig.tight_layout()
    return ts_ax, acf_ax, pacf_ax

tsplot(ts_train, title='A Given Training Series', lags=20)


# Model evaluation
import statsmodels.api as sm

# AR(2) model: order = (p, d, q) = (2, 0, 0)
arima200 = sm.tsa.SARIMAX(ts_train, order = (2, 0, 0))
model_results = arima200.fit()

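Model selection with AIC and BIC: both criteria reward goodness of fit through the likelihood and penalize the number of parameters. The model with the lower value is preferred, which favors simpler models.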

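  • AIC: Akaike Information Criterion
    $\mathrm{AIC} = 2k - 2\ln(L)$
  • BIC: Bayesian Information Criterion
    $\mathrm{BIC} = k\ln(n) - 2\ln(L)$
  • k is the number of model parameters, n is the number of samples, and L is the maximized value of the likelihood function. Because $\ln(n) > 2$ once $n \ge 8$, BIC penalizes each extra parameter more heavily than AIC and tends to select smaller models.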
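The two criteria share the fit term $-2\ln(L)$ and differ only in the penalty. The sketch below evaluates both formulas by hand. The log-likelihoods and parameter counts are made-up numbers, chosen to show a case where AIC prefers a larger model and BIC a smaller one.

```python
import math

def aic(log_lik, k):
    """AIC = 2k - 2 ln(L), where log_lik = ln(L)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """BIC = k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik

n = 500
# Hypothetical fits: a small model and a larger one with a slightly better likelihood
models = {'small (k=3)': (-700.0, 3), 'large (k=7)': (-694.0, 7)}

for name, (log_lik, k) in models.items():
    print(f'{name}: AIC={aic(log_lik, k):.2f}  BIC={bic(log_lik, k, n):.2f}')

best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n))
print('AIC picks', best_aic, '| BIC picks', best_bic)
```

With n = 500, each extra parameter costs about 6.21 under BIC versus 2 under AIC. The larger model lowers $-2\ln(L)$ by 12 using 4 extra parameters. That gain outweighs the AIC penalty of 8 but not the BIC penalty of about 24.86.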
import itertools  # iterator tools, used to build the (p, d, q) grid
p_min = 0
d_min = 0
q_min = 0
p_max = 4
d_max = 0
q_max = 4

# BIC table: rows are AR orders p, columns are MA orders q (d is fixed at 0)
results_bic = pd.DataFrame(index=['AR{}'.format(i) for i in range(p_min, p_max+1)],
                           columns=['MA{}'.format(i) for i in range(q_min, q_max+1)])
for p, d, q in itertools.product(range(p_min, p_max+1), range(d_min, d_max+1), range(q_min, q_max+1)):
    if p == 0 and d == 0 and q == 0:
        # skip the empty model
        results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = np.nan
        continue
    try:
        model = sm.tsa.SARIMAX(ts_train, order=(p, d, q))
        results = model.fit(disp=False)
        results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = results.bic
    except Exception:
        # leave the cell empty if this order fails to fit
        continue

results_bic = results_bic.astype(float)
results_bic


fig, ax = pl.subplots(figsize = (10, 8))
ax = sns.heatmap(results_bic, mask=results_bic.isnull(), ax=ax, annot=True, fmt='.2f')
ax.set_title('BIC')


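# Let statsmodels run the same ARMA(p, q) search for both criteria.
# trend='n' fits models without a constant (older statsmodels versions spelled this 'nc').
train_results = sm.tsa.arma_order_select_ic(ts_train, ic=['aic', 'bic'], trend='n', max_ar=4, max_ma=4)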
print('AIC', train_results.aic_min_order)
print('BIC', train_results.bic_min_order)

AIC (4, 2)
BIC (1, 1)

train_results


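Model residual diagnostics:

  • Check whether the ARIMA model's residuals follow a normal distribution with mean 0 and constant variance.
  • QQ plot: if the points fall along a straight line, the residuals are approximately normal.
  • Correlogram: residual autocorrelations should stay close to zero at every lag, i.e. the residuals should look like white noise.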
import statsmodels.api as sm

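# Note: BIC selected ARMA(1, 1) for the undifferenced series, i.e. order=(1, 0, 1);
# order=(1, 1, 1) below additionally takes one first difference.
arima111 = sm.tsa.SARIMAX(ts_train, order=(1,1,1))
model_results = arima111.fit()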

model_results.plot_diagnostics(figsize = (16, 12));
