Time Series Analysis [Using R]

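[I will need time series analysis soon, so I am organizing my notes here for future reference.]

Basic workflow of time series analysis

  Hands-on time series analysis in R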

  1. Import the data
# Show the current working directory
getwd()
# Option 1: import data from a local file
Data <- read.csv('~/Documents/data.csv', fill = TRUE, header = TRUE)
# Option 2: use a dataset that ships with R
Data <- AirPassengers
# Option 3: generate data (30 daily observations starting 2016-09-01)
Value <- ts(seq(1, 30))
Date_List <- seq(from = as.Date('2016-09-01'), by = 1, length.out = 30)
Data <- data.frame(Date_List, Value)
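
When the values come from a file or are generated by hand, they usually need to be turned into a `ts` object with an explicit start and frequency before the later steps work as expected. The snippet below is a minimal sketch; the variable names and the assumption of monthly data starting in September 2016 are illustrative only.

```r
# Hypothetical raw values: 30 observations, assumed to be monthly
values <- seq(1, 30)

# Build a monthly time series starting in September 2016
monthly <- ts(values, start = c(2016, 9), frequency = 12)

# Inspect the time attributes
start(monthly)      # first period (year, month)
end(monthly)        # last period (year, month)
frequency(monthly)  # observations per year
```

Setting `frequency = 12` is what lets later functions recognize a yearly seasonal cycle.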
  1. Visualize the data

    The purpose of visualizing time series data is to examine its trend, its seasonality, and its random behavior.

    plot(AirPassengers)
    abline(reg=lm(AirPassengers~time(AirPassengers)))
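
To look at the trend, seasonal, and random parts separately, one option is base R's `decompose()`. This is only a sketch: the multiplicative type is an assumption based on the seasonal swings of `AirPassengers` growing with the level of the series.

```r
# Split the series into trend, seasonal, and random components
parts <- decompose(AirPassengers, type = "multiplicative")

# One panel each for observed, trend, seasonal, and random
plot(parts)

# The estimated seasonal factors, one per month
parts$figure
```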

  2. Make the time series stationary

    There are three basic criteria for judging whether a time series is stationary:

    1. The mean of the series should not be a function of time; it should be constant.

    2. The variance of the series should not be a function of time. This property is known as homoscedasticity.

    3. The covariance of the i-th term and the (i + m)-th term should not be a function of time; it should depend only on the lag m.
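
A rough, informal way to check the first two criteria is to compare the mean and variance of different parts of the series. The sketch below splits `AirPassengers` into two halves; the split point and variable names are arbitrary choices for illustration, and this is not a substitute for a formal test.

```r
# Split the series into two six-year halves
first_half  <- window(AirPassengers, end = c(1954, 12))
second_half <- window(AirPassengers, start = c(1955, 1))

# For a stationary series these would be roughly equal
c(mean_first = mean(first_half), mean_second = mean(second_half))
c(var_first  = var(as.numeric(first_half)),
  var_second = var(as.numeric(second_half)))
```

Both the mean and the variance are clearly larger in the second half, which already suggests the raw series is not stationary.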

    
    # Dickey-Fuller test of stationarity
    # (AR and MA models are not applicable to non-stationary series)
    install.packages('fUnitRoots')
    library(fUnitRoots)
    adfTest(AirPassengers)
    
    
    # Result
    
    Title:
    Augmented Dickey-Fuller Test
    
    Test Results:
     PARAMETER:
       Lag Order: 1
     STATISTIC:
       Dickey-Fuller: -0.3524
     P VALUE:
       0.5017 

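    The p-value of 0.5017 is far above 0.05, so the null hypothesis of a unit root cannot be rejected: the raw series is not stationary. There are three basic techniques for making a time series stationary: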

    1. Detrending

      Here, we simply remove the trend component from the time series (if the trend component is known).

    2. Differencing

    3. Seasonality

      Seasonality can easily be incorporated in the ARIMA model directly
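
The first two techniques can be sketched in base R as follows. The linear trend is an assumption made purely for illustration (the notes do not say which trend to remove), and all variable names are hypothetical.

```r
log_ap <- log(AirPassengers)

# Detrending: fit a linear trend (an assumed form) and keep the residuals
t_idx <- as.numeric(time(log_ap))
y     <- as.numeric(log_ap)
trend_fit <- lm(y ~ t_idx)
detrended <- ts(residuals(trend_fit),
                start = start(log_ap), frequency = frequency(log_ap))

# Differencing: subtract the previous observation from each observation
differenced <- diff(log_ap)

# Compare the two transformed series side by side
par(mfrow = c(2, 1))
plot(detrended, main = "Detrended (linear trend removed)")
plot(differenced, main = "First difference")
```

Differencing does not require knowing the shape of the trend, which is why it is the more common choice in practice.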

    # Take logs to stabilize the variance, difference once to remove the trend, then re-test
    adfTest(diff(log(AirPassengers)))
    
    
    # Result
    
    Title:
    Augmented Dickey-Fuller Test
    
    Test Results:
     PARAMETER:
       Lag Order: 1
     STATISTIC:
       Dickey-Fuller: -8.8157
     P VALUE:
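       0.01

    With a p-value of 0.01, the null hypothesis of a unit root is rejected at the 5% level, so the log-differenced series can be treated as stationary.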

  3. Find suitable parameters from the ACF and PACF

    Once we have got the stationary time series, we must answer two primary questions:

    Q1. Is it an AR or MA process?

    Q2. What order of AR or MA process do we need to use?

    Simple Example:

    • AR : x(t) = alpha * x(t - 1) + error(t)
    • MA : x(t) = beta * error(t - 1) + error(t)

    acf(diff(log(AirPassengers)))  # ACF: suggests the MA order q

    pacf(diff(log(AirPassengers))) # PACF: suggests the AR order p

    Clearly, the ACF plot cuts off after the first lag. That pattern indicates an MA process, so p should be 0, while q should be 1 or 2. After a few iterations, we found that (0,1,1) as (p,d,q) gives the lowest AIC and BIC.
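
The "cut off" rule can be illustrated with theoretical correlations from `ARMAacf()` in the stats package. This is a sketch on idealized AR(1) and MA(1) processes with an arbitrarily chosen coefficient of 0.7, not on the `AirPassengers` data.

```r
# MA(1) with beta = 0.7: the ACF is non-zero at lag 1, then exactly zero
ma_acf <- ARMAacf(ma = 0.7, lag.max = 5)
round(ma_acf, 4)

# AR(1) with alpha = 0.7: the ACF decays gradually instead of cutting off
ar_acf <- ARMAacf(ar = 0.7, lag.max = 5)
round(ar_acf, 4)

# AR(1): the PACF is non-zero at lag 1, then (numerically) zero
ar_pacf <- ARMAacf(ar = 0.7, lag.max = 5, pacf = TRUE)
round(ar_pacf, 4)
```

So a sharp cut-off in the ACF points to the MA order q, and a sharp cut-off in the PACF points to the AR order p.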

  4. Build the ARIMA model

    The value found in the previous section might be an approximate estimate and we need to explore more (p,d,q) combinations. The one with the lowest BIC and AIC should be our choice.

    fit <- arima(log(AirPassengers), order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
    
    # d = 1 because the series was differenced once
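
Exploring several (p,d,q) combinations can be automated with a small loop. The candidate list below is an arbitrary illustration rather than the set tried in the original tutorial, and the seasonal part is held fixed at (0,1,1) with period 12.

```r
log_ap <- log(AirPassengers)

# Hypothetical candidate orders (p, d, q) to compare
candidates <- list(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1), c(0, 1, 2))

# Fit each candidate and collect its AIC and BIC
scores <- do.call(rbind, lapply(candidates, function(ord) {
  fit <- arima(log_ap, order = ord,
               seasonal = list(order = c(0, 1, 1), period = 12))
  data.frame(p = ord[1], d = ord[2], q = ord[3],
             AIC = AIC(fit), BIC = BIC(fit))
}))

# Lowest AIC first
scores[order(scores$AIC), ]
```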
    
  5. Forecast with the model

    pred <- predict(fit, n.ahead = 10*12)
    # Back-transform from the log scale with exp() rather than the rounded constant 2.718
    ts.plot(AirPassengers, exp(pred$pred), log = "y", lty = c(1,3))
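
`predict()` on an `arima` fit also returns standard errors in `$se`, which can be used to draw an approximate interval around the forecast. The sketch below assumes normally distributed forecast errors on the log scale; the two-year horizon and the 1.96 multiplier (roughly a 95% interval) are illustrative choices.

```r
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
pred <- predict(fit, n.ahead = 24)

# Back-transform the point forecast and the interval bounds from the log scale
point_forecast <- exp(pred$pred)
lower <- exp(pred$pred - 1.96 * pred$se)
upper <- exp(pred$pred + 1.96 * pred$se)

# Observed series, forecast (dashed), and interval bounds (dotted)
ts.plot(AirPassengers, point_forecast, lower, upper,
        log = "y", lty = c(1, 2, 3, 3))
```

The interval widens as the horizon grows, which is a useful reminder that a ten-year forecast like the one above is very uncertain.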


Reference: A Complete Tutorial on Time Series Modeling in R
