[Time Series - 01] Monthly Trend and Seasonal Factor

Introduction

Many workforce management software systems also use this time-series process as a base for forecasting, so understanding this process may help you better understand how your software works and more easily explain the results to others.

  • trend: simply the rate of change of call history.

    While most trend rates are positive, the trend can also be a declining (negative) rate. The standard way to calculate trend is to first look at an annual trend rate and then to break the annual trend into monthly numbers.

  • seasonality: because any given month’s data will contain the effects of both trend and seasonality, it is important to remove the trend as an influence so that seasonal influences can be isolated and analyzed on their own.
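As a minimal sketch of the trend calculation described above: assuming two years of monthly totals (the numbers and the helper name `monthly_trend_rate` are made up for illustration), the annual rate can be taken as the mean year-over-year change and then spread evenly over 12 months. This is the same simple division by 12 that the script later in this post uses.

```python
def monthly_trend_rate(prev_year, curr_year):
    """Return (annual_rate, monthly_factor) from two lists of monthly values.

    annual_rate is the mean of the month-by-month year-over-year changes.
    monthly_factor spreads it evenly over 12 months: annual_rate / 12 + 1.
    """
    if len(prev_year) != len(curr_year) or not prev_year:
        raise ValueError("both years need the same, non-zero number of months")
    changes = [(c - p) / p for p, c in zip(prev_year, curr_year)]
    annual_rate = sum(changes) / len(changes)
    return annual_rate, annual_rate / 12 + 1


# Hypothetical data: every month grew by 12% year over year.
prev = [100.0, 200.0, 400.0]
curr = [112.0, 224.0, 448.0]
annual, monthly = monthly_trend_rate(prev, curr)
print(annual, monthly)  # roughly 0.12 and 1.01
```

A declining series gives a monthly factor below 1. Dividing by 12 is a linear approximation; a compounding alternative would be `(1 + annual_rate) ** (1 / 12)`, which is not what this post uses.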

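The “Monthly Trend and Seasonal Factor” algorithm is fairly simple, and so is the code in this post. It is implemented step by step; the aim is to record the process in as much detail as possible, although that makes it somewhat verbose.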

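Steps

1) Calculating trend: compute the trend growth rate (positive or negative) from the historical data to obtain monthly_rate, e.g. 2016 relative to 2015.

2) Detrend: cancel out the trend in the historical data so that the seasonal effects can later be compared on their own, and obtain avg_detrended_value. This detrending happens by factoring in a monthly trend factor to each month of data to bring all call history up to current levels.

3) Seasonal factors: once detrending is complete, compute the seasonal factors to obtain the seasonal_factors value for each month.

4) Forecast from monthly_rate, avg_detrended_value, and seasonal_factors. For example, to forecast June 2018: predict_6_month = avg_detrended_value * (monthly_rate)^6 * seasonal_factors(June). Note that the formula uses avg_detrended_value, the mean of the detrended values over all months, rather than the actual value of the month in question (e.g. June).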
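Steps 2 to 4 can be sketched for a single year of history as follows. This is an illustrative simplification, not the script below: the function name `forecast_next_year` and the sample numbers are assumptions, and the sketch omits the script's extra averaging of avg_detrended_value with the previous year's value.

```python
def forecast_next_year(history, monthly_rate):
    """Detrend one year of monthly values, derive seasonal factors, and
    forecast the same months of the following year.

    Returns (detrended, seasonal, forecast), each a list as long as history.
    """
    n = len(history)
    if n == 0:
        raise ValueError("history must not be empty")

    # Step 2: bring every month up to the level of the last month in history.
    detrended = [v * monthly_rate ** (n - i - 1) for i, v in enumerate(history)]
    avg_detrended = sum(detrended) / n

    # Step 3: seasonal factor = detrended month relative to the detrended mean.
    seasonal = [d / avg_detrended for d in detrended]

    # Step 4: grow the detrended mean by (i + 1) months, then reapply seasonality.
    forecast = [
        avg_detrended * monthly_rate ** (i + 1) * s
        for i, s in enumerate(seasonal)
    ]
    return detrended, seasonal, forecast


# Hypothetical two-month history growing 10% per month, with no seasonality.
detrended, seasonal, forecast = forecast_next_year([100.0, 110.0], 1.1)
print(seasonal)   # both factors are close to 1.0
print(forecast)   # close to [121.0, 133.1]
```

By construction the seasonal factors average to 1. In this unsmoothed form the forecast for month i also reduces to history[i] * monthly_rate ** n, which is a quick sanity check on the arithmetic.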

 

Script

# -*- coding: utf-8 -*-
# @Date     : 20180620 - afternoon
# @Language : Python3.6
# @author   : 初類

from matplotlib import pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False
import xlrd
import xlwt
import numpy as np

# =============================================================================
# plot
# =============================================================================
def plot_results_multiple(data_year_list, class_name):
    
    fig = plt.figure(facecolor='white')
    
    
    ## real data
    ax11 = fig.add_subplot(321)
    ax11.plot(x_lable, data_year_list[0], label='2013-real data')
    ax11.plot(x_lable, data_year_list[1], label='2014-real data')
    ax11.plot(x_lable, data_year_list[2], label='2015-real data')
    ax11.plot(x_lable, data_year_list[3], label='2016-real data')
    ax11.plot(x_lable, data_year_list[4], label='2017-real data')
#    ax11.plot(np.arange(1, len(data_year_list[5])+1), data_year_list[5], label='2018-real data')
    
    ax11.set_ylabel('GMV: Gross Merchandise Volume')
    ax11.set_title(u'{}'.format(class_name))
    ax11.xaxis.grid()
    plt.legend()
    
    ## error
    ax12 = fig.add_subplot(322)
    ax12.scatter(x_lable, data_year_list[10], label='2015-error')
    ax12.scatter(x_lable, data_year_list[11], label='2016-error')
    ax12.scatter(x_lable, data_year_list[12], label='2017-error')
#    ax12.plot(np.arange(1, len(data_year_list[13])+1), data_year_list[13], label='2018-error')
    ax12.set_xlabel('Month from 1 to 12')
    ax12.set_ylabel('predict error')
    ax12.xaxis.grid()
    plt.legend()
    
    ## predict-real:2015
    ax21 = fig.add_subplot(323)
    ax21.bar(x_lable, data_year_list[2], alpha=0.8, label='2015-real data')
    ax21.bar(x_lable, data_year_list[6], alpha=0.5, label='2015-predict data')
    plt.legend(loc='upper left')
    ax21_2 = ax21.twinx()
    ax21_2.plot(x_lable, data_year_list[10], 'r', label='2015-error')
    ax21.xaxis.grid()
    plt.legend(loc='upper right')
    
    ## predict-real:2016
    ax22 = fig.add_subplot(324)
    ax22.bar(x_lable, data_year_list[3], alpha=0.8, label='2016-real data')
    ax22.bar(x_lable, data_year_list[7], alpha=0.5, label='2016-predict data')
    plt.legend(loc='upper left')
    ax22_2 = ax22.twinx()
    ax22_2.plot(x_lable, data_year_list[11], 'r', label='2016-error')
    ax22.xaxis.grid()
    plt.legend(loc='upper right')
    
    ## predict-real:2017
    ax31 = fig.add_subplot(325)
    ax31.bar(x_lable, data_year_list[4], alpha=0.8, label='2017-real data')
    ax31.bar(x_lable, data_year_list[8], alpha=0.5, label='2017-predict data')
    plt.legend(loc='upper left')
    ax31_2 = ax31.twinx()
    ax31_2.plot(x_lable, data_year_list[12], 'r', label='2017-error')
    ax31.xaxis.grid()
    plt.legend(loc='upper right')
    
# =============================================================================
#     ## predict-real:2018
#     ax32 = fig.add_subplot(326)
#     ax32.bar(np.arange(1, len(data_year_list[5])+1), data_year_list[5], alpha=0.8, label='2018-real data')
#     ax32.bar(x_lable, data_year_list[9], alpha=0.5, label='2018-predict data')
#     plt.legend(loc='upper left')
#     ax32_2 = ax32.twinx()
#     ax32_2.plot(np.arange(1, len(data_year_list[5])+1), data_year_list[13], 'r', label='2018-error')
#     ax32.xaxis.grid()
#     plt.legend(loc='upper right')
# =============================================================================
    

    
    fig.set_size_inches(16, 18)
    
    
# =============================================================================
#     show table
# =============================================================================
#    col_labels = ['2017_pred','2017_real', '2017_error', '2018_pred', '2018_real', '2018_error']
#    table_vals = [data_year_list[3], data_year_list[2], data_year_list[6], 
#                  data_year_list[4], data_year_list[5], data_year_list[7]]

    
    plt.savefig("./xxx/{}".format(class_name))



# =============================================================================
# meta-parameters
# =============================================================================

all_leaf_class_name_list = ['毛針織衫', '休閒運動套裝']

#month_list = ["Jan","Feb","Mar","Apr","May","June",
#              "July","Aug","Sept","Oct","Nov","Dec"]
month_list = ["Apr","May","June",
              "July","Aug","Sept","Oct","Nov","Dec"]

save_file_name = "result_MonthlyRate_SeasonalFactor_by_month_0625.xls"
file_path = './dataset/target_leaf_level_0625_by_month.csv'
file = xlrd.open_workbook(file_path)
class_name_list = all_leaf_class_name_list

start_month, end_month = 9, 12 + 1  ## months 9 to 12 (end_month is exclusive)
x_lable = np.arange(start_month, end_month)
    
data_year_2013 = []
data_year_2014 = []
data_year_2015 = []
data_year_2016 = []
data_year_2017 = []
data_year_2018 = []
data_year_list = []

Annual_trend_2014_2013 = []
Annual_trend_2015_2014 = []
Annual_trend_2016_2015 = []
Annual_trend_2017_2016 = []


detrend_2014 = []
detrend_2015 = []
detrend_2016 = []
detrend_2017 = []

seasonal_factor_2014 = []
seasonal_factor_2015 = []
seasonal_factor_2016 = []
seasonal_factor_2017 = []

predict_2015 = []
predict_2016 = []
predict_2017 = []
predict_2018 = []

error_2015_predict = []
error_2016_predict = []
error_2017_predict = []
error_2018_predict = []

target_data = xlwt.Workbook(encoding="utf-8")

GMV_index = 1  ## column index of the GMV values

def main():
    
    global start_month, end_month
    
    for class_name in class_name_list:
        
        data_year_2013.clear()
        data_year_2014.clear()
        data_year_2015.clear()
        data_year_2016.clear()
        data_year_2017.clear()
        data_year_2018.clear()
        data_year_list.clear()
        
        try:
# =============================================================================
#             store result
# =============================================================================
            target_sheet = target_data.add_sheet(u'{}'.format(class_name))
            
            target_sheet.write(0, 0, "month")
            for mon_index in range(len(month_list)):
                target_sheet.write(mon_index+1, 0, month_list[mon_index])
            
            target_sheet.write(0, 1, "2013-real")
            target_sheet.write(0, 2, "2014-real")
            target_sheet.write(0, 3, "2015-real")
            target_sheet.write(0, 4, "2016-real")
            target_sheet.write(0, 5, "2017-real")
            target_sheet.write(0, 6, "2018-real")
            
            ## predict 2015
            start_index = 7
            target_sheet.write(0, start_index, "Annual_trend_2014_2013")
            target_sheet.write(0, start_index+1, "detrended_2014")
            target_sheet.write(0, start_index+2, "seasonal factor 2014")
            target_sheet.write(0, start_index+3, "predict_2015")
            target_sheet.write(0, start_index+4, "error_2015")
            
            ## predict 2016
            start_index = 12
            target_sheet.write(0, start_index, "Annual_trend_2015_2014")
            target_sheet.write(0, start_index+1, "detrended_2015")
            target_sheet.write(0, start_index+2, "seasonal factor 2015")
            target_sheet.write(0, start_index+3, "predict_2016")
            target_sheet.write(0, start_index+4, "error_2016")
            
            ## predict 2017
            start_index = 17
            target_sheet.write(0, start_index, "Annual_trend_2016_2015")
            target_sheet.write(0, start_index+1, "detrended_2016")
            target_sheet.write(0, start_index+2, "seasonal factor 2016")
            target_sheet.write(0, start_index+3, "predict_2017")
            target_sheet.write(0, start_index+4, "error_2017")
            
            ## predict 2018
            start_index = 22
            target_sheet.write(0, start_index, "Annual_trend_2017_2016")
            target_sheet.write(0, start_index+1, "detrended_2017")
            target_sheet.write(0, start_index+2, "seasonal factor 2017")
            target_sheet.write(0, start_index+3, "predict_2018")
            target_sheet.write(0, start_index+4, "error_2018")
            
            
    # =============================================================================
    #             ## step1: load data
    #             ## 
    # =============================================================================
            file = xlrd.open_workbook(file_path)
            table = file.sheet_by_name(class_name)
            
            for i in range(start_month-1, end_month-1):
                
                temp_2013 = float(table.col(GMV_index)[i+1].value)
                temp_2014 = float(table.col(GMV_index)[i+1+12*1].value)
                temp_2015 = float(table.col(GMV_index)[i+1+12*2].value)
                temp_2016 = float(table.col(GMV_index)[i+1+12*3].value)
                temp_2017 = float(table.col(GMV_index)[i+1+12*4].value)
                
                target_sheet.write(i+1, 1, temp_2013)    ## 2013
                target_sheet.write(i+1, 2, temp_2014)    ## 2014
                target_sheet.write(i+1, 3, temp_2015)    ## 2015
                target_sheet.write(i+1, 4, temp_2016)    ## 2016
                target_sheet.write(i+1, 5, temp_2017)    ## 2017
                
                data_year_2013.append(temp_2013)
                data_year_2014.append(temp_2014)
                data_year_2015.append(temp_2015)
                data_year_2016.append(temp_2016)
                data_year_2017.append(temp_2017)
                
            ## As for 2018, only 5 months until 20180621
            num_month_2018 = 5
            start = 62  ## points to January 2018
            for i in range(num_month_2018):
                target_sheet.write(i+1, 6, float(table.col(GMV_index)[start + i - 1].value))
                data_year_2018.append(float(table.col(GMV_index)[start + i - 1].value))
            
    # =============================================================================
    #             ## step2: calculate Annual trend
    #             ## 
    # =============================================================================
            Annual_trend_2014_2013.clear()
            Annual_trend_2015_2014.clear()
            Annual_trend_2016_2015.clear()
            Annual_trend_2017_2016.clear()
            
            for i in range(len(data_year_2015)):
                
                temp_2014_2013 = (data_year_2014[i] - data_year_2013[i]) / data_year_2013[i]
                temp_2015_2014 = (data_year_2015[i] - data_year_2014[i]) / data_year_2014[i]
                temp_2016_2015 = (data_year_2016[i] - data_year_2015[i]) / data_year_2015[i]
                temp_2017_2016 = (data_year_2017[i] - data_year_2016[i]) / data_year_2016[i]
                
                target_sheet.write(i+1, 7, temp_2014_2013)
                target_sheet.write(i+1, 12, temp_2015_2014)
                target_sheet.write(i+1, 17, temp_2016_2015)
                target_sheet.write(i+1, 22, temp_2017_2016)
                
                Annual_trend_2014_2013.append(temp_2014_2013)
                Annual_trend_2015_2014.append(temp_2015_2014)
                Annual_trend_2016_2015.append(temp_2016_2015)
                Annual_trend_2017_2016.append(temp_2017_2016)
            
            avg_Annual_trend_2014_2013 = sum(Annual_trend_2014_2013)/len(Annual_trend_2014_2013)
            avg_Annual_trend_2015_2014 = sum(Annual_trend_2015_2014)/len(Annual_trend_2015_2014)
            avg_Annual_trend_2016_2015 = sum(Annual_trend_2016_2015)/len(Annual_trend_2016_2015)
            avg_Annual_trend_2017_2016 = sum(Annual_trend_2017_2016)/len(Annual_trend_2017_2016)
            
            
            ## 20180630
            ## take the average: each year's rate is averaged with the previous year's smoothed rate
            monthly_rate_2014_2013 = (avg_Annual_trend_2014_2013/12)+1
            monthly_rate_2015_2014 = (((avg_Annual_trend_2015_2014/12)+1) + monthly_rate_2014_2013)/2
            monthly_rate_2016_2015 = (((avg_Annual_trend_2016_2015/12)+1) + monthly_rate_2015_2014)/2
            monthly_rate_2017_2016 = (((avg_Annual_trend_2017_2016/12)+1) + monthly_rate_2016_2015)/2
#            print(monthly_rate_2016_2015, monthly_rate_2017_2016)
            
            
    # =============================================================================
    #             ## step3: calculate detrend value
    #             ##
    # =============================================================================
            detrend_2014.clear()
            detrend_2015.clear()
            detrend_2016.clear()
            detrend_2017.clear()
            
            for i in range(len(Annual_trend_2016_2015)):
                
                temp_detrend_2014 = data_year_2014[i]*(monthly_rate_2014_2013)**(len(Annual_trend_2014_2013)-i-1)
                temp_detrend_2015 = data_year_2015[i]*(monthly_rate_2015_2014)**(len(Annual_trend_2015_2014)-i-1)
                temp_detrend_2016 = data_year_2016[i]*(monthly_rate_2016_2015)**(len(Annual_trend_2016_2015)-i-1)
                temp_detrend_2017 = data_year_2017[i]*(monthly_rate_2017_2016)**(len(Annual_trend_2017_2016)-i-1)
                
                target_sheet.write(i+1, 8, temp_detrend_2014)
                target_sheet.write(i+1, 13, temp_detrend_2015)
                target_sheet.write(i+1, 18, temp_detrend_2016)
                target_sheet.write(i+1, 23, temp_detrend_2017)
                
                detrend_2014.append(temp_detrend_2014)
                detrend_2015.append(temp_detrend_2015)
                detrend_2016.append(temp_detrend_2016)
                detrend_2017.append(temp_detrend_2017)
                
            avg_detrend_2014 = sum(detrend_2014)/len(Annual_trend_2014_2013)
            
            avg_detrend_2015 = sum(detrend_2015)/len(Annual_trend_2015_2014)
#            print(avg_detrend_2015)
            avg_detrend_2015 = (avg_detrend_2015 + avg_detrend_2014) / 2
#            print(avg_detrend_2015)
            
            avg_detrend_2016 = sum(detrend_2016)/len(Annual_trend_2016_2015)
            avg_detrend_2016 = (avg_detrend_2016 + avg_detrend_2015) / 2
            
            avg_detrend_2017 = sum(detrend_2017)/len(Annual_trend_2017_2016)
            avg_detrend_2017 = (avg_detrend_2017 + avg_detrend_2016) / 2
    #            print(avg_detrend_2016, avg_detrend_2017)
            
    # =============================================================================
    #             ## step4: calculate seasonal factor
    #             ##
    # =============================================================================
            seasonal_factor_2014.clear()
            seasonal_factor_2015.clear()
            seasonal_factor_2016.clear()
            seasonal_factor_2017.clear()
            
            for i in range(len(Annual_trend_2016_2015)):
                
                target_sheet.write(i+1, 9,   detrend_2014[i]/avg_detrend_2014)
                target_sheet.write(i+1, 14,  detrend_2015[i]/avg_detrend_2015)
                target_sheet.write(i+1, 19,  detrend_2016[i]/avg_detrend_2016)
                target_sheet.write(i+1, 24,  detrend_2017[i]/avg_detrend_2017)
                
                seasonal_factor_2014.append(detrend_2014[i]/avg_detrend_2014)
                seasonal_factor_2015.append(detrend_2015[i]/avg_detrend_2015)
                seasonal_factor_2016.append(detrend_2016[i]/avg_detrend_2016)
                seasonal_factor_2017.append(detrend_2017[i]/avg_detrend_2017)
                
                
    # =============================================================================
    #             ## step5: predict
    #             ## 
    # =============================================================================
            predict_2015.clear()
            predict_2016.clear()
            predict_2017.clear()
            predict_2018.clear()
            
            for i in range(len(Annual_trend_2016_2015)):
    #                print("{}*({}^{})*{}".format(
    #                        avg_detrend_2016, monthly_rate_2016_2015, (i+1), seasonal_factor_2016[i]))
                
                temp_predict_2015 = avg_detrend_2014*(monthly_rate_2014_2013**(i+1))*seasonal_factor_2014[i]
                temp_predict_2016 = avg_detrend_2015*(monthly_rate_2015_2014**(i+1))*seasonal_factor_2015[i]
                temp_predict_2017 = avg_detrend_2016*(monthly_rate_2016_2015**(i+1))*seasonal_factor_2016[i]
                temp_predict_2018 = avg_detrend_2017*(monthly_rate_2017_2016**(i+1))*seasonal_factor_2017[i]
                
                target_sheet.write(i+1, 10,  temp_predict_2015)
                target_sheet.write(i+1, 15,  temp_predict_2016)
                target_sheet.write(i+1, 20,  temp_predict_2017)
                target_sheet.write(i+1, 25,  temp_predict_2018)
                
                predict_2015.append(temp_predict_2015)
                predict_2016.append(temp_predict_2016)
                predict_2017.append(temp_predict_2017)
                predict_2018.append(temp_predict_2018)
            
    # =============================================================================
    #             ## step6: calculate error
    #             ##
    # =============================================================================
            error_2015_predict.clear()
            for i in range(len(Annual_trend_2014_2013)):
                target_sheet.write(i+1, 11, (predict_2015[i] - data_year_2015[i]) / data_year_2015[i])
                error_2015_predict.append((predict_2015[i] - data_year_2015[i]) / data_year_2015[i])
            
            error_2016_predict.clear()
            for i in range(len(Annual_trend_2015_2014)):
                target_sheet.write(i+1, 16, (predict_2016[i] - data_year_2016[i]) / data_year_2016[i])
                error_2016_predict.append((predict_2016[i] - data_year_2016[i]) / data_year_2016[i])
                
            error_2017_predict.clear()
            for i in range(len(Annual_trend_2016_2015)):
                target_sheet.write(i+1, 21, (predict_2017[i] - data_year_2017[i]) / data_year_2017[i])
                error_2017_predict.append((predict_2017[i] - data_year_2017[i]) / data_year_2017[i])
            
# =============================================================================
#             error_2018_predict.clear()
#             for i in range(len(data_year_2018)):
#                 target_sheet.write(i+1, 26, (data_year_2018[i] - predict_2018[i]) / data_year_2018[i])
#                 error_2018_predict.append((data_year_2018[i] - predict_2018[i]) / data_year_2018[i])
# =============================================================================
                
            
            # =============================================================================
            # step7: show data     
            # =============================================================================
            data_year_list.append(data_year_2013)
            data_year_list.append(data_year_2014)
            data_year_list.append(data_year_2015)
            data_year_list.append(data_year_2016)
            data_year_list.append(data_year_2017)
            data_year_list.append(data_year_2018)
            
            data_year_list.append(predict_2015)
            data_year_list.append(predict_2016)
            data_year_list.append(predict_2017)
            data_year_list.append(predict_2018)
            
            data_year_list.append(error_2015_predict)
            data_year_list.append(error_2016_predict)
            data_year_list.append(error_2017_predict)
            data_year_list.append(error_2018_predict)
            
            plot_results_multiple(data_year_list, class_name)
            print("Done: {}".format(class_name))
                        
        except Exception as e:
            import traceback
            print("[except] class_name = {}: {}".format(class_name, e))
            print('str(Exception):\t', str(Exception))
            print('str(e):\t\t', str(e))
            print('repr(e):\t', repr(e))
#            print('e.message:\t', e.message)
#            print('traceback.print_exc():', traceback.print_exc())
            print('traceback.format_exc():\n%s' % traceback.format_exc())
            
            
    target_data.save(save_file_name)

if __name__ == '__main__':
    main()
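Two small pieces of the script are easier to see in isolation: the running average applied to the monthly rates (each year's raw rate is averaged with the previous year's already-smoothed rate) and the signed relative error (predict - real) / real. The sketch below assumes nothing beyond those two formulas; the helper names `smooth_rates` and `relative_errors` and the sample numbers are made up for illustration.

```python
def smooth_rates(raw_rates):
    """Average each raw rate with the previous smoothed rate.

    The first rate is kept as is, matching how the script treats 2014/2013.
    """
    smoothed = []
    for rate in raw_rates:
        if smoothed:
            smoothed.append((rate + smoothed[-1]) / 2)
        else:
            smoothed.append(rate)
    return smoothed


def relative_errors(predicted, actual):
    """Signed relative error per month: positive means over-forecast."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [(p - a) / a for p, a in zip(predicted, actual)]


# Hypothetical raw monthly factors for three consecutive year pairs.
rates = smooth_rates([1.0, 1.2, 1.0])
errors = relative_errors([110.0, 90.0], [100.0, 100.0])
print(rates)   # close to [1.0, 1.1, 1.05]
print(errors)  # close to [0.1, -0.1]
```

Averaging with the previous smoothed value behaves like exponential smoothing with a weight of 0.5, so older years still influence the rate but with halving weight each year.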

 

Reference

http://swpp.org/winter-2016-ontarget/a-step-by-step-guide-for-creating-monthly-forecasts/
