M5 Forecasting - Accuracy：TimeSeries_Seq2seq

Source

https://github.com/JEddy92/TimeSeries_Seq2Seq/blob/master/notebooks/TS_Seq2Seq_Conv_Full_Exog.ipynb

Assumptions

The dataset has 145,063 samples (one time series per page), each covering 2015-07-01 to 2016-12-31, 550 days in total.

	Page	2015-07-01	2015-07-02	2015-07-03	2015-07-04	2015-07-05	2015-07-06	2015-07-07	2015-07-08	2015-07-09	…	2016-12-22	2016-12-23	2016-12-24	2016-12-25	2016-12-26	2016-12-27	2016-12-28	2016-12-29	2016-12-30	2016-12-31
0	2NE1_zh.wikipedia.org_all-access_spider	18.0	11.0	5.0	13.0	14.0	9.0	9.0	22.0	26.0	…	32.0	63.0	15.0	26.0	14.0	20.0	22.0	19.0	18.0	20.0
1	2PM_zh.wikipedia.org_all-access_spider	11.0	14.0	15.0	18.0	11.0	13.0	22.0	11.0	10.0	…	17.0	42.0	28.0	15.0	9.0	30.0	52.0	45.0	26.0	20.0

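1. Time feature transformation

Look for periodic patterns such as weekly, monthly, daily, or yearly cycles. The simplest approach is to encode the dates themselves.

Encoding:

There are 550 days in total; extract the day of the week for each one.
One-hot encoding it gives a DataFrame of shape [550, 7].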

	0	1	2	3	4	5	6
0	0	0	1	0	0	0	0
1	0	0	0	1	0	0	0
2	0	0	0	0	1	0	0
…	…	…	…	…	…	…	…
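
The post uses `dow_ohe` later without showing how it is built. A minimal sketch of one way to produce it, assuming the date columns run daily from 2015-07-01 to 2016-12-31 (the variable names `dates` and `dow` are illustrative, not from the original):

```python
import pandas as pd

# Assumed: one column per day from 2015-07-01 to 2016-12-31 (550 days).
dates = pd.date_range("2015-07-01", "2016-12-31", freq="D")

# dayofweek is 0 for Monday through 6 for Sunday.
dow = pd.Series(dates.dayofweek)

# One-hot encode; astype(int) gives 0/1 instead of booleans on newer pandas.
dow_ohe = pd.get_dummies(dow).astype(int)

print(dow_ohe.shape)
print(dow_ohe.head(3))
```

2015-07-01 was a Wednesday, which is why the first row of the table has its 1 in column 2.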

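Formatting the input:

Keras expects input arrays (tensors) of shape [n_samples, n_timesteps, n_features].

What this means:

The first sample is the [550, 7] encoding, and the second sample is identical to it (every value is the same), because all samples cover the same dates. In other words, every sample shares one time encoding.

What needs to be done:

Build the basic encoding above.
Add a sample dimension at axis 0.
Repeat the same encoding for every sample.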

# Add a sample dimension at axis 0; the array now has three dimensions
dow_array = np.expand_dims(dow_ohe.values, axis=0)
dow_array.shape
### (1, 550, 7)

# Repeat the encoding for every sample to get shape [n_samples, n_timesteps, n_features]
dow_array = np.tile(dow_array,(df.shape[0],1,1))
dow_array.shape
### (145063, 550, 7)

2. Non-time feature transformation

Split the Page column into several features and one-hot encode them.

page_df = df['Page'].str.rsplit('_', n=3, expand=True) # split page string and expand to multiple columns 
page_df.columns = ['name','project','access','agent']
page_df.head()

	name	project	access	agent
0	2NE1	zh.wikipedia.org	all-access	spider
…	…	…	…	…

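Drop the name column, one-hot encode the remaining split columns, add a timesteps dimension at axis 1, and repeat the encoding along that axis so every time step of a sample carries the same values.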

page_df = page_df.drop('name', axis=1)

page_array = pd.get_dummies(page_df).values
page_array = np.expand_dims(page_array, axis=1) # add timesteps dimension
page_array = np.tile(page_array,(1,dow_array.shape[1],1)) # repeat OHE array along timesteps dimension 
page_array.shape
### (145063, 550, 14)
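
Why `rsplit` with `n=3` rather than a plain `split`: page titles can contain underscores themselves. A small sketch on two page strings, the second of which is a made-up example for illustration:

```python
import pandas as pd

# The second entry is a hypothetical page whose title contains an underscore.
pages = pd.Series([
    "2NE1_zh.wikipedia.org_all-access_spider",
    "The_Beatles_en.wikipedia.org_desktop_all-agents",
])

# rsplit with n=3 splits at most three times, starting from the right,
# so any extra underscores stay inside the first column.
parts = pages.str.rsplit("_", n=3, expand=True)
parts.columns = ["name", "project", "access", "agent"]
print(parts)
```

Splitting from the left would break the title apart and shift the project, access, and agent fields into the wrong columns.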

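3. Merging features

Concatenate the time features and the non-time features along the feature axis (the number of columns grows).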

exog_array = np.concatenate([dow_array, page_array], axis=-1)
exog_array.shape
### (145063, 550, 21)

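The final exogenous feature array has this layout:

## Notes:
## 1. Within one sample, only the day-of-week columns change over time
## 2. Across samples at the same time step, only the page columns (project, access, agent) differ
## 3. Axis 0 is the sample count, axis 1 the number of time steps, axis 2 the number of features

[
    # Sample 1 ###############################################
    [
        # 0/1 value of each feature on 2015-07-01 (a Wednesday)
        [0(Mon), 0(Tue), 1(Wed), 0(Thu), ..., 1(project_1), ..., 1(access_1), ..., 1(agent_1), ...],
        # 0/1 value of each feature on 2015-07-02 (a Thursday)
        [0(Mon), 0(Tue), 0(Wed), 1(Thu), ..., 1(project_1), ..., 1(access_1), ..., 1(agent_1), ...],
        ...
    ],

    # Sample 2 ###############################################
    [
        # 0/1 value of each feature on 2015-07-01 (a Wednesday)
        [0(Mon), 0(Tue), 1(Wed), 0(Thu), ..., 0(project_1), ..., 1(access_1), ..., 0(agent_1), ...],
        # 0/1 value of each feature on 2015-07-02 (a Thursday)
        [0(Mon), 0(Tue), 0(Wed), 1(Thu), ..., 0(project_1), ..., 1(access_1), ..., 0(agent_1), ...],
        ...
    ],

    ...
]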
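
The layout can be checked on a toy example. This is a sketch with small made-up sizes (3 samples, 5 time steps, 4 page columns), not the real data:

```python
import numpy as np

n_samples, n_timesteps = 3, 5

# Toy stand-ins (hypothetical values): a shared day-of-week one-hot block
# and a per-sample one-hot block of page attributes.
dow_ohe = np.eye(7, dtype=int)[np.arange(n_timesteps) % 7]   # (5, 7)
page_ohe = np.eye(4, dtype=int)[[0, 2, 3]]                   # (3, 4)

# Same construction as in the post: expand, tile, concatenate.
dow_array = np.tile(dow_ohe[np.newaxis, :, :], (n_samples, 1, 1))      # (3, 5, 7)
page_array = np.tile(page_ohe[:, np.newaxis, :], (1, n_timesteps, 1))  # (3, 5, 4)
exog_array = np.concatenate([dow_array, page_array], axis=-1)          # (3, 5, 11)

# Within one sample, the page columns are identical at every time step.
assert (exog_array[0, :, 7:] == exog_array[0, 0, 7:]).all()
# At one time step, the day-of-week columns are identical across samples.
assert (exog_array[:, 0, :7] == exog_array[0, 0, :7]).all()
print(exog_array.shape)
```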

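4. Preparing the model data

Let's combine the exogenous feature array with the endogenous time series data to get ready for training and prediction.

Unfortunately, we can't just hand the time series DataFrame and the exogenous array to Keras and let it work its magic. We need a few more transformation steps to extract exactly the numpy arrays that we will pass to Keras. And before that, we have to decide how to split the time series into encoding and prediction intervals for training and validation. Note that our simple convolutional model does not use an encoder-decoder architecture like the first notebook in this repo does, but we keep the "encoding" and "decoding" (prediction) terminology for consistency. Here the encoding interval is the full series history that the network uses for feature learning, and no predictions are output for it.

We will use walk-forward validation: the validation set covers the same length of time as the training set but is shifted forward in time (by 60 days here). This simulates how the model performs on unseen future data.

Artur Suilin made a very nice figure that visualizes this validation style and contrasts it with traditional validation. I strongly recommend looking through his whole repo, since he implemented a truly state-of-the-art (and competition-winning) seq2seq model on this dataset.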

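4.1 Train and validation time splits:

We need four data segments.

Training encoding period
Training decoding period (training target, 60 days)
Validation encoding period
Validation decoding period (validation target, 60 days)

Walk-forward versus traditional validation:

Walk-forward: the training and validation sets predict the same number of days (A days each), but the validation windows start A days later than the training windows.
Traditional: the samples (series) themselves are divided into a training set and a validation set, and both use time windows with the same start and end dates.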


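from datetime import timedelta
import pandas as pd

# Assumed definitions (not shown in this post): first and last date columns of df
data_start_date = df.columns[1]
data_end_date = df.columns[-1]

pred_steps = 60
pred_length = timedelta(days=pred_steps)

first_day = pd.to_datetime(data_start_date)
last_day = pd.to_datetime(data_end_date)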

val_pred_start = last_day - pred_length + timedelta(1)
val_pred_end = last_day

train_pred_start = val_pred_start - pred_length
train_pred_end = val_pred_start - timedelta(days=1)


enc_length = train_pred_start - first_day

train_enc_start = first_day
train_enc_end = train_enc_start + enc_length - timedelta(1)

val_enc_start = train_enc_start + pred_length
val_enc_end = val_enc_start + enc_length - timedelta(1)

print('Train encoding:', train_enc_start, '-', train_enc_end)
print('Train prediction:', train_pred_start, '-', train_pred_end, '\n')
print('Val encoding:', val_enc_start, '-', val_enc_end)
print('Val prediction:', val_pred_start, '-', val_pred_end)

print('\nEncoding interval:', enc_length.days)
print('Prediction interval:', pred_length.days)

### Train encoding: 2015-07-01 00:00:00 - 2016-09-02 00:00:00
### Train prediction: 2016-09-03 00:00:00 - 2016-11-01 00:00:00 
### 
### Val encoding: 2015-08-30 00:00:00 - 2016-11-01 00:00:00
### Val prediction: 2016-11-02 00:00:00 - 2016-12-31 00:00:00
### 
### Encoding interval: 430
### Prediction interval: 60

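4.2 Formatting data for Keras

With the date splits in place, we can define the data that goes into Keras:

Pull the time series into an array and keep a date_to_index mapping as a utility for indexing into that array.
Create a function that extracts a specified time interval from all the series.
Create functions that transform all the series.
- Here we smooth each series with log1p and de-mean it using the mean of its encoder interval, then reshape to the [n_series (samples in the batch), n_timesteps (length in time), n_features (number of features)] tensor format that Keras expects.
- Note that to produce real-valued forecasts, the predictions must be passed through the inverse transformation.
Use the functions above to create a final function that extracts the complete encoding and target arrays.
- This acts as a one-shot function that grabs whatever we need for training or prediction.
- It extracts the (transformed) endogenous series data and combines it with our exogenous features.

The first code block below covers the first 3 steps and is unchanged from the earlier notebooks in this series.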


date_to_index = pd.Series(index=pd.Index([pd.to_datetime(c) for c in df.columns[1:]]),
                          data=[i for i in range(len(df.columns[1:]))])

series_array = df[df.columns[1:]].values



def get_time_block_series(series_array, date_to_index, start_date, end_date):
    """
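    Select the values of every series (the target Y, e.g. sales) within the given date range.
    Label-based slicing on date_to_index includes both start_date and end_date.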
    """
    inds = date_to_index[start_date:end_date]
    return series_array[:,inds]
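
The date-based slice in `get_time_block_series` includes both end points, because pandas slices a `DatetimeIndex` by label. A small sketch on toy data (2 series, 6 days, arbitrary values); `.values` is used here only to make the integer positions explicit:

```python
import numpy as np
import pandas as pd

# Toy data: 2 series over 6 days (values are arbitrary).
dates = pd.date_range("2015-07-01", periods=6, freq="D")
date_to_index = pd.Series(index=dates, data=np.arange(len(dates)))
series_array = np.arange(12).reshape(2, 6)

# Slicing a Series with a DatetimeIndex by label includes both end points.
inds = date_to_index[pd.Timestamp("2015-07-02"):pd.Timestamp("2015-07-04")]
block = series_array[:, inds.values]
print(block)
```

Three days are requested and three columns come back, which is why the interval end dates in section 4.1 are computed with a one-day subtraction.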



def transform_series_encode(series_array):
    """
    Purpose: build the encoder-side series data in the format Keras expects, for training.
    Result: smoothed, de-meaned series reshaped to [n_series, n_timesteps, n_features=1].
    Args:
        series_array: raw series values to encode, shape [n_series, n_timesteps]
    Steps:
        1. Fill NaN with 0 and smooth with log1p
        2. Subtract each series' own mean
        3. Reshape the 2D array to [n_series, n_timesteps, 1]
    Returns the transformed array and the per-series means, shape [n_series, 1].
    """
    # nan_to_num: replace NaN with 0 and inf with large finite numbers
    # log1p: log(x + 1)
    series_array = np.log1p(np.nan_to_num(series_array)) # fill NaN with 0, then smooth
    series_mean = series_array.mean(axis=1).reshape(-1,1) # per-series mean as a column
    # Broadcasting subtracts each series' mean from that series
    series_array = series_array - series_mean
    # Reshape [n_series, n_timesteps] to [n_series, n_timesteps, 1]
    series_array = series_array.reshape((series_array.shape[0],series_array.shape[1], 1))

    return series_array, series_mean



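def transform_series_decode(series_array, encode_series_mean):
    """
    Purpose: build the decoder-side (target) series data in the format Keras expects.
    Same transformation as transform_series_encode().
    Args:
        series_array: raw series values of the prediction interval, shape [n_series, n_timesteps]
        encode_series_mean: per-series means returned by transform_series_encode()
    Note:
        The prediction interval is unknown at forecast time, so its own mean cannot be used;
        the mean of the encoding interval is subtracted instead.
    """
    # nan_to_num: replace NaN with 0 and inf with large finite numbers
    # log1p: log(x + 1)
    series_array = np.log1p(np.nan_to_num(series_array)) # fill NaN with 0, then smooth
    series_array = series_array - encode_series_mean
    series_array = series_array.reshape((series_array.shape[0],series_array.shape[1], 1))

    return series_array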
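
To turn model outputs back into real-valued forecasts, the transformation has to be inverted: add the encoder mean back, then undo log1p with expm1. A minimal sketch, assuming predictions have shape [n_series, n_timesteps, 1]; the helper name `inverse_transform_series` is hypothetical and does not appear in the original notebook:

```python
import numpy as np

def inverse_transform_series(pred_array, encode_series_mean):
    """Hypothetical helper: undo the de-meaning and the log1p transform.

    pred_array: [n_series, n_timesteps, 1] predictions in transformed space
    encode_series_mean: [n_series, 1] means from the encoder transform
    """
    # Drop the trailing feature axis, add the mean back, then invert log1p.
    restored = pred_array[:, :, 0] + encode_series_mean
    return np.expm1(restored)

# Round trip on toy values, mirroring the forward transform.
raw = np.array([[3.0, 0.0, 8.0], [10.0, 20.0, 30.0]])
logged = np.log1p(raw)
mean = logged.mean(axis=1).reshape(-1, 1)
transformed = (logged - mean).reshape(2, 3, 1)

recovered = inverse_transform_series(transformed, mean)
print(np.allclose(recovered, raw))
```

The round trip is exact only for values that were present in the raw data; NaN entries were replaced by 0 in the forward step and cannot be recovered.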

The final endogenous (time series) feature array has this layout:

[
    # Sample 1 ###############################################
    [
        [ sales at time 1 ],
        [ sales at time 2 ],
        ...
    ],

    # Sample 2 ###############################################
    [
        [ sales at time 1 ],
        [ sales at time 2 ],
        ...
    ],

    ...
]

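Now we can build on the first three processing steps to create a one-shot preprocessing function that extracts the encoder/input data (with the correct exogenous features attached) and the decoder/target data. It takes parameters that let us choose how many time series to extract and which periods to sample from. With this function written, we are ready to build the model!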


def get_data_encode_decode(series_array, exog_array, first_n_samples,
                           date_to_index, enc_start, enc_end, pred_start, pred_end):
    """
    Args:
        series_array: raw time series values, shape [n_series, n_timesteps]
        exog_array: exogenous features, shape [n_series, n_timesteps, n_features]
        first_n_samples: number of series to keep, counted from the first
        date_to_index: Series mapping each date to its column index
        enc_start: first day of the encoding interval
        enc_end: last day of the encoding interval
        pred_start: first day of the prediction interval
        pred_end: last day of the prediction interval
    """
    # Column indices of every day from the encoding start to the prediction end
    exog_inds = date_to_index[enc_start:pred_end]

    ############## Encode: series data for the encoding interval ##############
    # Values of the first first_n_samples series within [enc_start, enc_end]
    encoder_input_data = get_time_block_series(series_array, date_to_index,
                                               enc_start, enc_end)[:first_n_samples]
    # Smooth, de-mean, and reshape to [n_series, n_timesteps, n_features=1]
    encoder_input_data, encode_series_mean = transform_series_encode(encoder_input_data)

    ############## Decode: target data for the prediction interval ##############
    # Values of the first first_n_samples series within [pred_start, pred_end]
    decoder_target_data = get_time_block_series(series_array, date_to_index,
                                                pred_start, pred_end)[:first_n_samples]
    # Smooth, de-mean with the encoder mean, and reshape to
    # [n_series, n_timesteps, n_features=1]
    decoder_target_data = transform_series_decode(decoder_target_data,
                                                  encode_series_mean)

    ############## Append the lagged target history ##############
    # We append a lagged history of the target series to the input data
    # so that we can train with teacher forcing. The last target day is
    # dropped, so the input runs in time order from enc_start to the day
    # before pred_end.
    lagged_target_history = decoder_target_data[:,:-1,:1]
    encoder_input_data = np.concatenate([encoder_input_data, lagged_target_history],
                                        axis=1)

    ############## Append the exogenous features ##############
    # We add the exogenous features of the day after each input series value
    # (exog should match the day we are predicting), so the first day of the
    # interval is dropped.
    exog_input_data = exog_array[:first_n_samples,exog_inds,:][:,1:,:]
    encoder_input_data = np.concatenate([encoder_input_data, exog_input_data], axis=-1)

    return encoder_input_data, decoder_target_data
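
The one-day offset between the series values and the exogenous features is easy to get wrong, so here is a sketch of the alignment alone. It skips the log1p and de-meaning steps and uses made-up sizes (2 series, 6 encoding days, 3 prediction days, 4 exogenous features); all variable names are illustrative:

```python
import numpy as np

n_series, enc_len, pred_len, n_exog = 2, 6, 3, 4
total = enc_len + pred_len

# Toy inputs (hypothetical): each series value equals its day index, and
# exogenous feature 0 also stores the day index so the alignment is visible.
series = np.tile(np.arange(total, dtype=float), (n_series, 1))   # (2, 9)
exog = np.zeros((n_series, total, n_exog))
exog[:, :, 0] = np.arange(total)

enc = series[:, :enc_len, np.newaxis]       # days 0..5
target = series[:, enc_len:, np.newaxis]    # days 6..8

# Lagged target history: every target day except the last one.
history = np.concatenate([enc, target[:, :-1, :]], axis=1)   # days 0..7

# Exogenous features shifted by one day: days 1..8.
model_input = np.concatenate([history, exog[:, 1:, :]], axis=-1)

# Shape is (n_series, enc_len + pred_len - 1, 1 + n_exog).
print(model_input.shape)
# At every step the exogenous day is one ahead of the series day.
assert (model_input[:, :, 1] - model_input[:, :, 0] == 1).all()
```

Each input step therefore pairs the series value of day t with the calendar and page features of day t+1, the day whose value is being predicted.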

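In the figure, imagine that $y_0,...,y_7$ are the predicted outputs for the time steps that follow the series values $x_0,...,x_7$. There is a clear problem: since $x_1$ influences the output $y_0$, we would be using the future to predict the past, which is cheating! Letting the future of a sequence influence our interpretation of its past makes sense in a setting like text classification, where a fully known sequence is used to predict an outcome, but not in our time series setting, where we must generate the future values of a sequence.

To solve this problem, we adjust the convolution design to explicitly prohibit the future from influencing the past. In other words, we only allow inputs to connect to future time step outputs, in a causal structure that the WaveNet paper illustrates with a figure. In practice, this causal 1D structure is easy to implement by shifting the output of a traditional convolution by a number of time steps. Keras handles it when you set padding='causal'.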
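
The shifting idea can be sketched in plain NumPy. This is an illustration of the causal structure only, not the Keras implementation; the function name `causal_conv1d` and the kernel values are made up for the example:

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1D convolution: output[t] depends only on x[t-k+1 .. t]."""
    k = len(kernel)
    # Left-pad with k-1 zeros so the output has the same length as x.
    padded = np.concatenate([np.zeros(k - 1), x])
    # kernel[-1] multiplies the current step, kernel[0] the oldest one.
    return np.array([padded[t:t + k] @ kernel for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.5, 0.5])
y = causal_conv1d(x, kernel)

# Changing a later input must leave all earlier outputs untouched.
x_changed = x.copy()
x_changed[3] = 100.0
y_changed = causal_conv1d(x_changed, kernel)
assert np.allclose(y[:3], y_changed[:3])
print(y)
```

Here output t is read as the prediction for the step that follows input t, so no output ever sees the value it is supposed to predict.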

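Dilated (causal) convolutions
With causal convolutions we have the right tool for handling the flow of time, but we need a further modification to handle long-term dependencies properly. In the simple causal convolution figure, only the 5 most recent time steps can influence the highlighted output. In fact, we would need one extra layer per time step to reach further back in the series (in proper terms, to increase the output's receptive field). For a time series that spans more than a year, using simple causal convolutions to learn from the full history would quickly make the model far too complex, both computationally and statistically.

Instead of making that mistake, WaveNet uses dilated convolutions, which let the receptive field grow exponentially with the number of convolutional layers. In a dilated convolution layer, the filters are not applied to the inputs in simple sequential fashion; they skip a constant dilation-rate number of inputs between each input they process, as the WaveNet figure shows. By increasing the dilation rate multiplicatively at each layer (e.g. 1, 2, 4, 8, ...), we get the exponential relationship between layer depth and receptive field size that we want. In that figure, only 4 layers are needed to connect all 16 input series values to the highlighted output (say, the 17th time step value).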
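
The receptive field sizes quoted above can be checked with a short calculation. This sketch assumes a kernel size of 2 in every layer, which is an assumption chosen to match the numbers in the text; the function name is illustrative:

```python
def receptive_field(kernel_size, dilation_rates):
    """Number of input steps that can influence one output of a stack of
    dilated causal convolutions (one layer per dilation rate)."""
    # Each layer extends the reach by (kernel_size - 1) * dilation steps.
    return 1 + (kernel_size - 1) * sum(dilation_rates)

# Four layers with dilation doubling at each layer.
dilated = receptive_field(2, [1, 2, 4, 8])
# The same four layers without dilation.
plain = receptive_field(2, [1, 1, 1, 1])
print(dilated, plain)
```

With doubling dilation rates the reach grows exponentially with depth, while the undilated stack gains only one step per layer.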
