ARIMA Time Series 2: Evaluation and Parameter Selection

  • ARIMA -> SARIMA -> SARIMAX:
    S stands for Seasonal: the model captures seasonal (periodic) patterns.
    X stands for eXogenous: the model can include external regressors.

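  • Seasonal parameters:
    P: seasonal autoregressive order.
    D: seasonal differencing order.
    Q: seasonal moving-average order.
    m: number of time steps in one seasonal period.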

import numpy as np
import pandas as pd
import matplotlib.pylab as pl
import seaborn as sns
%matplotlib inline
sns.set(style = 'ticks', context = 'poster')

pd.set_option('display.float_format', lambda x: '%.5f' % x)
np.set_printoptions(precision = 5, suppress = True)

filename_ts = 'data/series1.csv'
ts_df = pd.read_csv(filename_ts, index_col = 0, parse_dates = [0])
n_sample = ts_df.shape[0]

print(ts_df.shape)
ts_df.head(6)

# Training and test sets

n_train = int(0.95 * n_sample) + 1
n_test = n_sample - n_train

ts_train = ts_df.iloc[:n_train]['value']
ts_test = ts_df.iloc[n_train:]['value']
print(ts_train.shape)
print(ts_test.shape)
print('Training Series:', '\n', ts_train.tail(), '\n')
print('Testing Series:', '\n', ts_test.head())

import statsmodels.tsa.api as smt

def tsplot(y, lags=None, title='', figsize=(20, 12)):
    fig = pl.figure(figsize = figsize)
    layout = (2, 2)
    ts_ax = pl.subplot2grid(layout, (0, 0))
    hist_ax = pl.subplot2grid(layout, (0, 1))
    acf_ax = pl.subplot2grid(layout, (1, 0))
    pacf_ax = pl.subplot2grid(layout, (1, 1))
    
    y.plot(ax = ts_ax)
    ts_ax.set_title(title)
    
    y.plot(ax = hist_ax, kind = 'hist', bins = 25)
    hist_ax.set_title('Histogram')

    smt.graphics.plot_acf(y, lags = lags, ax = acf_ax)
    smt.graphics.plot_pacf(y, lags = lags, ax = pacf_ax)
    for ax in [acf_ax, pacf_ax]:
        ax.set_xlim(0)
    sns.despine()  # remove the top and right spines
    fig.tight_layout()
    return ts_ax, acf_ax, pacf_ax

tsplot(ts_train, title='A Given Training Series', lags=20)

# Model evaluation
import statsmodels.api as sm

arima200 = sm.tsa.SARIMAX(ts_train, order = (2, 0, 0))
model_results = arima200.fit()

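Model selection with AIC and BIC: prefer the simpler model

  • AIC: Akaike Information Criterion
    AIC = 2k − 2 ln(L)
  • BIC: Bayesian Information Criterion
    BIC = k ln(n) − 2 ln(L)
  • k is the number of model parameters, n is the number of samples, and L is the maximized value of the likelihood function. For both criteria, lower is better.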
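
The two formulas can be evaluated by hand to see how the penalties differ. The sketch below uses two hypothetical fits (the log-likelihoods, parameter counts, and n = 200 are invented for illustration) in which the larger model fits slightly better:

```python
import math

def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), where log_likelihood is ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """BIC = k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits on n = 200 observations (numbers invented for illustration).
n = 200
small = {'k': 3, 'llf': -500.0}
large = {'k': 6, 'llf': -495.0}

for name, fit in [('small', small), ('large', large)]:
    print(name, aic(fit['llf'], fit['k']), round(bic(fit['llf'], fit['k'], n), 2))
```

Here AIC favors the larger model (1002 vs. 1006) while BIC favors the smaller one (about 1015.9 vs. 1021.8). BIC charges ln(n) per parameter instead of 2, so it penalizes complexity more heavily whenever n is larger than about 7.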

import itertools  # iterator utilities
p_min = 0
d_min = 0
q_min = 0
p_max = 4
d_max = 0
q_max = 4

results_bic = pd.DataFrame(index=['AR{}'.format(i) for i in range(p_min, p_max+1)],
                          columns=['MA{}'.format(i) for i in range(q_min,q_max+1)])
for p, d, q in itertools.product(range(p_min, p_max+1), range(d_min, d_max+1), range(q_min, q_max+1)):
    if p == 0 and d == 0 and q == 0:
        # skip the trivial (0, 0, 0) model and leave its cell empty
        results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = np.nan
        continue
    try:
        model = sm.tsa.SARIMAX(ts_train, order=(p, d, q))
        results = model.fit()
        results_bic.loc['AR{}'.format(p), 'MA{}'.format(q)] = results.bic
    except Exception:
        # if a fit fails, leave that cell as NaN and move on
        continue

results_bic = results_bic[results_bic.columns].astype(float)
results_bic

fig, ax = pl.subplots(figsize = (10, 8))
ax = sns.heatmap(results_bic, mask=results_bic.isnull(), ax=ax, annot=True, fmt='.2f')
ax.set_title('BIC')
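
The lowest-BIC cell can also be located programmatically rather than read off the heatmap. A minimal sketch, using a small made-up table (bic_table, with invented numbers) in place of results_bic; NaN cells are ignored:

```python
import numpy as np
import pandas as pd

# Hypothetical BIC table standing in for results_bic (values are invented).
bic_table = pd.DataFrame(
    [[np.nan, 310.2, 305.9],
     [301.4, 298.7, 300.1],
     [299.9, 300.3, 302.8]],
    index=['AR0', 'AR1', 'AR2'],
    columns=['MA0', 'MA1', 'MA2'],
)

# nanargmin returns the flat position of the smallest non-NaN value;
# unravel_index converts it back to (row, column).
row, col = np.unravel_index(np.nanargmin(bic_table.to_numpy()), bic_table.shape)
best = (bic_table.index[row], bic_table.columns[col])
print(best)
```

The same two lines should work on results_bic once it has been cast to float.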

# trend='n' fits without a constant (older statsmodels releases spelled this 'nc')
train_results = sm.tsa.arma_order_select_ic(ts_train, ic=['aic', 'bic'], trend='n', max_ar=4, max_ma=4)
print('AIC', train_results.aic_min_order)
print('BIC', train_results.bic_min_order)

AIC (4, 2)
BIC (1, 1)

train_results

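Residual diagnostics:

  • Check whether the ARIMA model's residuals are normally distributed with zero mean and constant variance.
  • Q-Q plot: points falling along a straight line indicate a normal distribution.

import statsmodels.api as sm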

arima111 = sm.tsa.SARIMAX(ts_train, order=(1,1,1))
model_results = arima111.fit()

model_results.plot_diagnostics(figsize = (16, 12));
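
The "straight line means normal" reading of a Q-Q plot can also be summarized as a number: the correlation between the sorted sample and the matching standard-normal quantiles. The helper qq_correlation below is a hypothetical illustration (not a statsmodels function), and it is run on simulated samples rather than on the model's residuals:

```python
import random
from statistics import NormalDist

import numpy as np

def qq_correlation(sample):
    """Correlation between the sorted sample and standard-normal quantiles."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Plotting positions (i - 0.5) / n, one common convention (assumed here).
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    return np.corrcoef(theoretical, x)[0, 1]

random.seed(0)
normal_like = [random.gauss(0, 1) for _ in range(500)]
skewed = [random.expovariate(1.0) for _ in range(500)]

print('normal sample:', qq_correlation(normal_like))
print('skewed sample:', qq_correlation(skewed))
```

A value very close to 1 corresponds to points hugging the Q-Q line; the skewed sample scores visibly lower. For a fitted model, the residuals (for example model_results.resid) would be passed in place of the simulated data.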
