ARIMA Time Series Modeling

https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/

Step 0: Create a ts Object

To create a ts object you only need a (single) time series, a frequency, and a start date. The examples at the bottom of the ?ts help page are a good reference. For instance, if your data are daily and start on the 153rd day of 1980, you would write something like ts(your_timeseries_data, frequency = 365, start = c(1980, 153)).
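As a minimal sketch of the same idea for monthly data, the block below strips the time attributes from R's built-in AirPassengers dataset and rebuilds the ts object from the plain numeric vector, with frequency = 12 and a January 1949 start. In your own code, the vector `values` would hold your data. The name `ap` is only illustrative.

```r
# Sketch: build a monthly ts object from a plain numeric vector.
# The values come from R's built-in AirPassengers dataset (monthly, 1949-1960).
values <- as.numeric(AirPassengers)                   # plain vector, no time attributes

ap <- ts(values, frequency = 12, start = c(1949, 1))  # 12 obs per year, starting Jan 1949

start(ap)                        # first (year, period)
end(ap)                          # last (year, period)
frequency(ap)                    # observations per cycle
window(ap, start = c(1960, 1))   # the final year of data
```

The same pattern works for quarterly (frequency = 4) or daily (frequency = 365) data. Only the frequency and the meaning of the second element of start change.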

Step 1: Visualize the Time Series

plot(AirPassengers)                                    # plot the time series

abline(reg = lm(AirPassengers ~ time(AirPassengers)))  # overlay a fitted linear trend line

cycle(AirPassengers)                                   # position of each observation within its year (1 = Jan, ..., 12 = Dec)

plot(aggregate(AirPassengers, FUN = mean))             # average each year's values to show the year-on-year trend

boxplot(AirPassengers ~ cycle(AirPassengers))          # one box per month gives a sense of the seasonal effect
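The box plot can be complemented by a numeric summary. The sketch below averages each calendar month across all years with tapply(), using cycle() as the grouping index and R's built-in month.abb as labels. The variable names are illustrative.

```r
# Sketch: average passengers per calendar month, a numeric companion to the box plot.
month_id      <- as.integer(cycle(AirPassengers))          # 1 = Jan, ..., 12 = Dec
monthly_means <- tapply(as.numeric(AirPassengers), month_id, mean)
names(monthly_means) <- month.abb                          # label with month abbreviations

round(monthly_means, 1)
names(which.max(monthly_means))                            # busiest month on average
```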

Step 2: Stationarize the Series

Before testing for stationarity, we need to address two issues. First, the variance is unequal (it grows with the level of the series), so we stabilize it by taking the log of the series. Second, there is a trend, which we remove by differencing the series. We then test the resulting series with the Dickey-Fuller test, adf.test() from the tseries package.

library(tseries)  # provides adf.test()
adf.test(diff(log(AirPassengers)), alternative = "stationary", k = 0)

Output:
data: diff(log(AirPassengers))
Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01
alternative hypothesis: stationary

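The p-value is 0.01, which is the smallest value adf.test() reports. We therefore reject the null hypothesis of a unit root, and the log-differenced series is stationary enough for ARIMA modelling.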
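To see what the statistic measures, here is a minimal sketch of the underlying regression in base R. It assumes that adf.test(..., k = 0) regresses the first difference of the tested series on a constant, a linear trend and the lagged level, with no lagged differences, and reports the t-statistic of the lagged level as "Dickey-Fuller". If that reading of tseries is right, `df_stat` should match the printed value. The variable names are illustrative.

```r
# Sketch of the (non-augmented) Dickey-Fuller regression, assuming the form
# tseries::adf.test uses with k = 0:  dy_t = a + b*t + gamma*y_{t-1} + e_t
y  <- as.numeric(diff(log(AirPassengers)))   # the series under test
dy <- diff(y)                                # y_t - y_{t-1}
n  <- length(dy)
y_lag <- y[1:n]                              # lagged level y_{t-1}
trend <- 1:n                                 # deterministic linear trend

df_fit  <- lm(dy ~ y_lag + trend)
df_stat <- summary(df_fit)$coefficients["y_lag", "t value"]
df_stat
# A strongly negative t-statistic rejects the unit-root null. For the
# constant-plus-trend case the 5% critical value is roughly -3.4 to -3.45 here.
```

Note that the critical values are not the usual Student-t ones. They come from the Dickey-Fuller distribution, which is why adf.test() interpolates its p-value from a table.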

If a series is not stationary, three techniques are commonly used to make it stationary:

1. Detrending: Here we simply remove the trend component from the time series. For instance, suppose the series follows

x(t) = (mean + trend * t) + error

We remove the part in parentheses and build a model for the remaining error term.

2. Differencing: This is the most commonly used technique for removing non-stationarity. Instead of modelling the observations themselves, we model the differences between consecutive terms. For instance,

x(t) - x(t-1) = ARMA(p, q)

This differencing is the Integrated (I) part of AR(I)MA. We now have three parameters:

p: order of the autoregressive (AR) part

d: number of differences taken (the I part)

q: order of the moving-average (MA) part

3. Seasonality: Seasonality can be incorporated directly into the ARIMA model as a seasonal component. This is done when the model is built in Step 4 below.
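The techniques above can be contrasted in a few lines. The following sketch applies them to log(AirPassengers). Detrending fits (mean + trend * t) by ordinary least squares and keeps the residuals. Differencing uses diff(). Seasonal differencing at lag 12 removes a stable yearly pattern, and a seasonal order of c(0, 1, 1) in arima() includes one such lag-12 difference. The variable names are illustrative.

```r
# Sketch: ways of removing non-stationarity from log(AirPassengers).
y <- log(AirPassengers)

# 1. Detrending: estimate (mean + trend * t) by least squares, keep the residuals
t_index   <- as.numeric(time(y))
trend_fit <- lm(as.numeric(y) ~ t_index)
detrended <- ts(residuals(trend_fit), start = start(y), frequency = frequency(y))

# 2. Differencing: model x(t) - x(t-1) instead of x(t); one observation is lost
differenced <- diff(y)

# 3. Seasonal differencing: x(t) - x(t-12); twelve observations are lost
seasonal_diff <- diff(y, lag = 12)

c(original = length(y), detrended = length(detrended),
  differenced = length(differenced), seasonal = length(seasonal_diff))
```

Detrending keeps every observation but assumes the trend is deterministic. Differencing loses observations but also handles stochastic trends, which is why it is the default route in ARIMA.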

Step 3: Find Optimal Parameters

The parameters p, d and q can be found using ACF and PACF plots.

The ACF plot is a bar chart of the correlation coefficients between a time series and lagged copies of itself.
The PACF plot shows the partial correlation coefficients between the series and lagged copies of itself.

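The ACF is the plain autocorrelation coefficient and does not control for any other variables. The partial autocorrelation (PACF) is the autocorrelation computed after controlling for the other variables, i.e. the intermediate lags. Because it strips out their influence, the two values generally differ.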
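The difference can be checked numerically. At lag 1 there are no intermediate lags to control for, so the PACF equals the ACF. At lag 2 the PACF equals (r2 - r1^2) / (1 - r1^2), where r1 and r2 are the lag-1 and lag-2 autocorrelations; this is the first step of the Durbin-Levinson recursion. The sketch below verifies both relations on the differenced log series. The variable names are illustrative.

```r
# Sketch: ACF vs PACF on diff(log(AirPassengers)).
y <- as.numeric(diff(log(AirPassengers)))    # plain vector, so lags are in months

r <- acf(y, lag.max = 2, plot = FALSE)$acf   # r[1] = lag 0 (always 1), r[2] = lag 1, r[3] = lag 2
p <- pacf(y, lag.max = 2, plot = FALSE)$acf  # p[1] = lag 1, p[2] = lag 2 (no lag 0)

r1 <- r[2]
r2 <- r[3]
# Lag-2 partial autocorrelation: the lag-2 correlation after removing
# the part already explained by lag 1
pacf2_manual <- (r2 - r1^2) / (1 - r1^2)

round(c(acf_lag1 = r1, pacf_lag1 = p[1], acf_lag2 = r2,
        pacf_lag2 = p[2], pacf_lag2_manual = pacf2_manual), 4)
```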

To find p and q, read the ACF and PACF plots as follows:

AR(p) model: the ACF tails off* but the PACF cuts off** after lag p.
MA(q) model: the PACF tails off but the ACF cuts off after lag q.
ARMA(p, q) model: both the ACF and the PACF tail off. Try different combinations of p and q, starting with small values.
ARIMA(p, d, q) model: an ARMA(p, q) model fitted after the series has been differenced d times to make it stationary.

Use AIC and BIC to find the most appropriate model. Lower values of AIC and BIC are desirable.

*Tails off means the plot decays slowly, i.e. it has significant spikes at higher lags too.
**Cuts off means the bar is significant at lag p and not significant at any higher-order lag.
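To see what "tails off" and "cuts off" look like, the following sketch simulates an AR(1) and an MA(1) process with arima.sim() and prints the first three lags of each ACF and PACF. The coefficient 0.7, the seed and the sample size of 5000 are arbitrary choices. With n = 5000 the approximate 95% band is about ±0.028. The helper `first_lags` is hypothetical, written only for this illustration.

```r
set.seed(42)                                       # arbitrary seed for reproducibility
n <- 5000
ar1 <- arima.sim(model = list(ar = 0.7), n = n)    # AR(1): x_t = 0.7 x_{t-1} + e_t
ma1 <- arima.sim(model = list(ma = 0.7), n = n)    # MA(1): x_t = e_t + 0.7 e_{t-1}

# Hypothetical helper: first k lags of the sample ACF (lag 0 dropped) and PACF
first_lags <- function(x, k = 3) {
  list(acf  = acf(x, lag.max = k, plot = FALSE)$acf[2:(k + 1)],
       pacf = pacf(x, lag.max = k, plot = FALSE)$acf[1:k])
}
ar_lags <- first_lags(ar1)
ma_lags <- first_lags(ma1)

round(rbind(ar_acf = ar_lags$acf, ar_pacf = ar_lags$pacf,
            ma_acf = ma_lags$acf, ma_pacf = ma_lags$pacf), 2)
```

In theory the AR(1) ACF decays as 0.7, 0.49, 0.343 (tails off) while its PACF is zero after lag 1 (cuts off). For the MA(1), the ACF is 0.7 / 1.49 ≈ 0.47 at lag 1 and zero afterwards, while its PACF alternates in sign and decays. The sample values should come out close to these.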

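First, use the ACF and PACF plots to judge whether the series is stationary. If it is not, difference it. If it is still non-stationary after first-order differencing, take a second-order difference, and so on. (This determines d.)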
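A slowly decaying ACF is the usual visual sign that differencing is still needed. The sketch below compares the first 12 autocorrelations of log(AirPassengers) before and after one difference. The variable names are illustrative.

```r
# Sketch: a slowly decaying ACF signals non-stationarity (so d > 0 is needed).
level <- as.numeric(log(AirPassengers))   # trending series
d1    <- diff(level)                      # after one difference

acf_level <- acf(level, lag.max = 12, plot = FALSE)$acf[-1]  # lags 1..12
acf_d1    <- acf(d1,    lag.max = 12, plot = FALSE)$acf[-1]

round(rbind(level = acf_level, diff1 = acf_d1), 2)
```

The undifferenced series stays strongly positive across all 12 lags. After one difference, the trend-driven decay disappears and what remains is mostly the seasonal pattern.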

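Once differencing has made the series stationary, we need to determine p and q. There are many order-selection methods, because choosing p and q is complicated and rarely settled in one step. The main ones are:

(1) Inspection: look at the plots directly. If the ACF cuts off after lag q (dropping off abruptly at lag q+1), the series is MA(q). Likewise, if the PACF cuts off after lag p, it is AR(p). Otherwise it is ARMA(p, q), and the two plots are used together to decide further.

(2) Parameter tests: use statistical tests to check whether the extra parameters of a higher-order model are approximately zero, and examine the correlation structure of the model residuals.

(3) Information criteria: use an order-dependent criterion such as AIC or BIC, which weighs goodness of fit against the number of parameters.

In practice several methods are usually combined to choose the most suitable p, d and q.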
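Methods (2) and (3) can be illustrated together. The following sketch fits two nested models on log(AirPassengers) with the same differencing and the same seasonal part used later in Step 4. The larger model adds one MA term. It then computes the approximate z-ratio (estimate / standard error) of that added coefficient and compares AICs. As a rough rule, |z| well below 2 suggests the extra term can be dropped. The names `fit_small` and `fit_big` are illustrative.

```r
# Sketch of method (2): is the extra coefficient of a larger model close to zero?
y    <- log(AirPassengers)
seas <- list(order = c(0, 1, 1), period = 12)      # seasonal part, as in Step 4

fit_small <- arima(y, order = c(0, 1, 1), seasonal = seas)
fit_big   <- arima(y, order = c(0, 1, 2), seasonal = seas)  # one extra MA term

# Approximate z-ratio (estimate / standard error) of the added coefficient
se_big <- sqrt(diag(fit_big$var.coef))
z_ma2  <- coef(fit_big)[["ma2"]] / se_big[["ma2"]]
z_ma2

# Method (3): AIC of the two nested models (same differencing, so comparable)
c(small = AIC(fit_small), big = AIC(fit_big))
```

Note that AIC/BIC comparisons are only meaningful between models with the same differencing (d and seasonal D), because differencing changes the data the likelihood is computed on.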

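For the same plot, one person may see a cut-off at lag 5 and another at lag 7. Opinions differ, but the basic principle is the same: the lag at which the plot cuts off gives the order. Because there is often more than one plausible cut-off point, compare the fits of the candidate models in the model-diagnostics stage to make the choice more convincing.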

Here is a link that might help you understand the concept further http://people.duke.edu/~rnau/arimrule.htm

acf(diff(log(AirPassengers)))   # x-axis lags are in years for this monthly series: lag 1.0 = 12 months
pacf(diff(log(AirPassengers)))
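Reading cut-offs by eye can be supplemented by listing which lags fall outside the dashed significance band. The sketch below recomputes that band (qnorm(0.975) / sqrt(n), the default 95% band drawn by the acf and pacf plots) and converts the lags from years to months. The variable names are illustrative.

```r
# Sketch: list which lags exceed the approximate 95% band drawn by acf()/pacf()
y <- diff(log(AirPassengers))              # ts with frequency 12
n <- length(y)
bound <- qnorm(0.975) / sqrt(n)            # same band as the dashed lines in the plots

a <- acf(y, lag.max = 24, plot = FALSE)
p <- pacf(y, lag.max = 24, plot = FALSE)

# acf()/pacf() report lags in years for a monthly ts; convert them to months
acf_lags  <- round(a$lag[-1] * frequency(y))   # drop lag 0
pacf_lags <- round(p$lag * frequency(y))

sig_acf  <- acf_lags[abs(a$acf[-1]) > bound]
sig_pacf <- pacf_lags[abs(p$acf) > bound]
sig_acf
sig_pacf
```

A significant spike at lag 12 (and its multiples) points to a seasonal pattern that a non-seasonal ARIMA(p, d, q) does not capture.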

In addition, if both the ACF and the PACF decrease gradually, the series still needs to be made stationary, which means giving d a (larger) value. The next step is to find the right parameters to use in the ARIMA model.

We already know that the d component is 1, because one difference is enough to make the series stationary. (We differenced the series once and saw that the trend was removed. Had the trend still been there, we would have differenced the series again. This series did not need to be differenced more than once; hence d = 1.)

The ACF plot cuts off after the first lag. Because it is the ACF (not the PACF) that cuts off, p should be 0 and q should be 1 or 2. After a few iterations, (p, d, q) = (0, 1, 1) came out as the combination with the lowest AIC and BIC.
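The "few iterations" can be automated with a small grid search. This sketch fits every non-seasonal ARIMA(p, 1, q) with p, q in 0:2 to log(AirPassengers) and ranks the fits by BIC. Fits that error out are left as NA. Whether (0, 1, 1) comes out on top depends on the candidate set and on whether a seasonal part is included, so treat the ranking as a guide. The grid bounds are an arbitrary choice.

```r
# Sketch: compare small non-seasonal ARIMA(p, 1, q) models on log(AirPassengers)
y <- log(AirPassengers)
grid <- expand.grid(p = 0:2, q = 0:2)
grid$AIC <- NA_real_
grid$BIC <- NA_real_

for (i in seq_len(nrow(grid))) {
  fit_i <- tryCatch(
    arima(y, order = c(grid$p[i], 1, grid$q[i])),
    error = function(e) NULL            # leave AIC/BIC as NA if a fit fails
  )
  if (!is.null(fit_i)) {
    grid$AIC[i] <- AIC(fit_i)
    grid$BIC[i] <- BIC(fit_i)
  }
}

grid[order(grid$BIC), ]                 # lowest BIC first
```

BIC penalizes extra parameters more heavily than AIC, so when the two disagree BIC tends to favour the smaller model.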

Step 4: Build ARIMA Model

With the parameters in hand, we can now build the ARIMA model. The values found in the previous step may only be approximate, so it is worth exploring nearby (p, d, q) combinations and choosing the one with the lowest AIC and BIC. If the ACF/PACF plots show seasonality, we can also try models with a seasonal component.

Let's fit an ARIMA model that also includes a seasonal component, using the log of the series. In the next step we will forecast the next 10 years and plot the forecast alongside the training data.

# ARIMA(0,1,1) with a seasonal (0,1,1) part of period 12, fitted on the log scale
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
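Before forecasting, it is worth checking the fitted model. The sketch below refits the same model so the block runs on its own. It prints each coefficient with its standard error and z-ratio, then applies a Ljung-Box test (Box.test) to the residuals. A large p-value means no evidence of leftover autocorrelation. The choice of 24 lags is arbitrary, and fitdf = 2 accounts for the two estimated MA coefficients.

```r
# Sketch: inspect the fitted seasonal ARIMA model (refit here so the block runs alone)
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

est <- coef(fit)
se  <- sqrt(diag(fit$var.coef))
round(cbind(estimate = est, std_error = se, z = est / se), 3)
c(AIC = AIC(fit), BIC = BIC(fit))

# Ljung-Box test on the residuals; fitdf = number of estimated ARMA coefficients
lb <- Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 2)
lb$p.value
```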

Step 5: Make Predictions

Once we have the final ARIMA model, we can forecast future time points. Plotting the forecast against the observed data is a quick visual check that the model behaves sensibly.

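pred <- predict(fit, n.ahead = 10*12)                             # forecast 120 months (10 years) ahead, on the log scale
ts.plot(AirPassengers, exp(pred$pred), log = "y", lty = c(1, 3))  # exp() undoes the log; the forecast is drawn dotted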
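The point forecast alone hides how uncertain a 10-year forecast is. The sketch below adds approximate 95% prediction intervals. It builds them on the log scale from the standard errors returned by predict() and then back-transforms with exp(), assuming roughly normal errors on the log scale. Under that assumption, exp() of the log-scale forecast estimates the median rather than the mean on the passenger scale. The model is refit so the block runs on its own.

```r
# Sketch: point forecasts and approximate 95% intervals back on the passenger scale
fit  <- arima(log(AirPassengers), order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit, n.ahead = 10 * 12)

z <- qnorm(0.975)
point <- exp(pred$pred)                  # exp() undoes the log transform
lower <- exp(pred$pred - z * pred$se)    # interval computed on the log scale,
upper <- exp(pred$pred + z * pred$se)    # then transformed back

ts.plot(AirPassengers, point, lower, upper, log = "y",
        lty = c(1, 3, 2, 2))
window(cbind(lower, point, upper), end = c(1961, 6))  # first six forecast months
```

The intervals widen steadily with the horizon, which is a useful reminder that the later years of a 10-year forecast rest heavily on the assumption that the trend and seasonal pattern continue unchanged.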
