The Evolution of Computational Advertising Inventory Forecasting at FreeWheel

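{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To better guide customers in setting their advertising sales strategies, we need to accurately forecast the inventory available to each customer over a future period. Against this background, the FreeWheel machine learning team set out to build a model for accurate forecasting. This article describes how we built a model to solve this business problem and how we iterated on the algorithm to improve forecast quality."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Business Requirements and Challenges"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The business requirement for inventory forecasting is to predict, at daily granularity and along dimensions such as customer (network id) and site (site id), the available inventory for each of the next 500 days (that is, video viewing traffic, referred to below simply as traffic). Because traffic is a real-valued quantity, the problem is a regression problem. Although traffic must be forecast for every one of the next 500 days, the business does not require the same accuracy across the whole horizon: the required accuracy decreases step by step as the forecast moves further into the future. For example, customers demand the highest accuracy for the next 7 days, with progressively lower requirements after that."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To keep engineering and model-tuning costs down while preserving forecast quality, we decided to train a single unified model that forecasts traffic for all customer sites. The model therefore has to learn the different traffic patterns of every network and site at once. For consumers, online digital media offers a vast catalog of video and free choice of what to watch, in contrast to the pre-scheduled lineup and limited options of traditional television. Digital video is therefore much more prone to sudden traffic spikes. Celebrity gossip, for example, often triggers a surge of visits from curious onlookers, and the surge usually subsides within a short time. In other words, traffic patterns in digital video are more diverse and harder to capture. Capturing the traffic patterns of all customer sites with a single model is a major challenge in our setting."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure below shows, I picked four sites at random and plotted their daily traffic over the last few months as line charts. It is easy to see that their daily traffic often jumps or drops several-fold starting on a particular day, with no obvious trend."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/c6\/65\/c6afb8a8d8055a54aa898ffb2e51a865.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, because customer sites offer different types of video content and attract audiences of very different sizes, traffic varies enormously from site to site. Some sites average traffic on the order of ten million per day, while others receive only a few hundred ad requests per day. The table below summarizes the distribution of site traffic on a single day. Across these more than six thousand sites the gap is huge: the smallest sites are in the single digits, the largest reach the hundreds of millions, and the standard deviation is in the millions. Moreover, most sites have little traffic: just 10% of the sites contribute 96.7% of the traffic, which means the remaining 90% of the sites contribute less than 4%."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/ea\/24\/ea0f9105e85398a93e06aee021ce0924.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below is a histogram of site traffic on a single day. The horizontal axis is a site's traffic for that day; because traffic differs so much between sites that it cannot be shown sensibly on a linear scale, the horizontal axis is log-transformed. The vertical axis is the number of sites in each bucket. In this figure, the more than six thousand sites are assigned to one hundred buckets by traffic volume. The figure makes the huge differences in traffic between sites even easier to see."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/15\/32\/153994e9541ca1be5d536873d7ff8d32.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this setting, for a high-traffic site even a small fluctuation in the model's prediction produces a very large absolute error, whereas errors on low-traffic sites are far less significant in the overall error. The model therefore needs to favor capturing the traffic patterns of high-traffic sites over those of low-traffic sites."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In summary, the problem poses the following challenges:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1. Every day we must forecast the traffic of each site for the next 500 days, which is a long forecast horizon;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2. A single unified model must handle the forecasting task for all customer sites;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3. Digital video traffic patterns are diverse and prone to sudden bursts;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4. Traffic patterns differ between sites and traffic volumes differ by orders of magnitude, so the model needs to favor high-traffic sites."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Defining the Evaluation Metric"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the business requirements clear, the next step is to define an evaluation metric so that model performance can be quantified. MAPE (Mean Absolute Percentage Error) is easy to compare and easy to interpret, so we decided to use it to evaluate the model. MAPE is the average, over all samples, of the ratio between the absolute deviation of the predicted value from the actual value and the actual value:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MAPE = \\frac{1}{n} \\sum_{i=1}^{n} \\frac{|y_i - \\hat{y}_i|}{y_i + \\epsilon}"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here y_i is the actual traffic of site i, \\hat{y}_i is the predicted traffic of site i, \\epsilon is a small positive number (0.01) that handles the case where y_i is 0, and n is the total number of sites."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The formula shows that the smaller a site's actual traffic, the less tolerant the metric is of prediction error: a small absolute error can produce a large MAPE. In our setting, low-traffic sites make up the vast majority, so the many small sites have a large influence on the overall MAPE. Yet these small sites contribute little to total traffic. As noted earlier, just 10% of the sites contribute 96.7% of the traffic, and it is the forecast accuracy for this top 10% that we care about most. Evaluating the model directly with the original definition of MAPE therefore does not accurately reflect the business requirement."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For this setting and optimization goal, we designed a new way of computing MAPE to serve as our evaluation metric:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"§ Step 1: Bucketing. Within each customer, sort the sites by traffic in ascending order and assign them to four buckets;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"§ Step 2: Per-bucket computation. Compute the MAPE within each bucket and the bucket's share of the customer's total traffic; the traffic share serves as the bucket's weight;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"§ Step 3: Customer MAPE. Take the weighted sum of the bucket MAPE values for each customer, that is, the sum over buckets of traffic share times bucket MAPE, as the customer's MAPE;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"§ Step 4: Overall MAPE. Take the sum over customers of each customer's MAPE times its traffic share as the overall MAPE."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/34\/dc\/3497cf066faaee29aa72b34b5166a9dc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With this metric we obtain both a MAPE for each customer and an overall score. The per-customer MAPE lets us discuss forecast accuracy with each customer individually, and when forecasts are published to downstream modules, those modules can make decisions that depend on each customer's forecast accuracy. Because our goal is to forecast traffic for all customers with a single model, the overall MAPE is used to compare different models side by side and serves as the basis for model selection."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the evaluation metric defined, the next step was to survey candidate models and analyze them experimentally."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Preliminary Research"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is a typical time series forecasting problem: given each site's historical daily traffic, predict its daily traffic over a future period. Common approaches include the classical ARIMA (Autoregressive Integrated Moving Average) and Prophet, as well as deep neural networks such as RNNs and DNNs."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We first investigated ARIMA and Prophet and ran test experiments with each. ARIMA, Prophet and similar methods must be tuned and trained separately for every time series. Testing on our data showed that the hyperparameters giving the best forecasts differ from one site's traffic curve to another, so the best hyperparameters would have to be found by hand for each site. We currently have several thousand sites. Designing and training a model per site in this way would be an extremely heavy workload and is not feasible. We therefore gave up on using models such as ARIMA and Prophet to forecast traffic for all customer sites. However, the forecasting capability of ARIMA or Prophet could be exposed to customers, who could tune it to the characteristics of their own sites and forecast their own traffic."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Below is an example of fitting one site's traffic curve with Prophet. For reasons of space, this article does not cover ARIMA and Prophet in detail; interested readers can visit:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ARIMA models for time series forecasting: "},{"type":"link","attrs":{"href":"https:\/\/people.duke.edu\/~rnau\/411arim.htm?fileGuid=gCJv9qJVjhpVK6g3","title":"","type":null},"content":[{"type":"text","text":"https:\/\/people.duke.edu\/~rnau\/411arim.htm"}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Facebook prophet: "},{"type":"link","attrs":{"href":"https:\/\/facebook.github.io\/prophet\/?fileGuid=gCJv9qJVjhpVK6g3","title":"","type":null},"content":[{"type":"text","text":"https:\/\/facebook.github.io\/prophet\/"}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/ac\/69\/ac0cbcf0c0406a36772c6c7419c72169.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Prophet fit on the training data"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/2b\/63\/2b2e2ebcd16009d3a14e753e13c28563.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Prophet time series decomposition (trend, major events, short-period seasonality, long-period seasonality)"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For these reasons, our subsequent research focused on deep models, and we experimented extensively with them, including DeepAR and DeepGLO. Based on the experimental results, we ultimately chose a forecasting model that combines CNN, RNN and DNN components. Before describing the model architecture, we first explain how the training samples are generated and how the overall forecasting pipeline works."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Feature Engineering and Forecasting Pipeline"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below uses site a as an example to illustrate feature engineering and the overall forecasting pipeline. In the figure, actual data is extracted for the period from 2020-06-10 to 2020-09-08. Suppose 2020-09-08 is the most recent day with actual data and we need to forecast daily traffic for the next seven days."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/8f\/ca\/8f2ddc8825e04126bf0e05d32e08fcca.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, historical traffic is aggregated per site at daily granularity. A window of fixed size is then slid over the series to generate training, validation and inference samples (collectively, samples), each of which contains the fields X_list and Y_list."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"X_list is the list of historical traffic values in the look-back window. Each element is the traffic on the corresponding date, and the number of elements equals the number of look-back days. The elements of X_list are sorted in descending order of time. X_list is the most important time series feature of each sample. Besides the time series feature, two other kinds of features are added: ID features and time features. The ID features are Network ID (customer ID) and Site ID. The time features are Day of week and Is weekend."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Y_list is the list of traffic values to be predicted. Each element is the traffic on the corresponding date, the number of elements equals the number of forecast days, and Y_list is the label of each sample. During training, the model converges by fitting Y_list. During inference, the model predicts Y_list from the feature values of each sample in the inference set."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The original problem requires forecasting traffic for the next 500 days. Forecasting such a long horizon in one shot would require Y_list to have length 500, and the last day covered by X_list would accordingly be 500 days in the past. This causes three problems:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. The most recent part of the series can appear only in Y_list and never in X_list, so the model cannot learn from the most recent time series features;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Preparing enough training samples requires a much longer history. For example, with a strategy of looking back 500 days to forecast 500 days, a span of one thousand days yields just one sample for a site;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. The sequence is too long, so the model fails to converge, let alone forecast near-term traffic accurately."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Forecasting 500 days directly has the problems above, while customers require the highest accuracy for the next 7 days, with lower requirements further out. In addition, the previous forecasting strategy was the four-week median method. We therefore decided to train the model to forecast the next 7 days directly. This both secures accuracy in the near term and makes a direct comparison with the four-week median straightforward."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With a 7-day forecast, that is, with Y_list of length 7, our experiments with different X_list lengths showed that a length of 14 gave comparatively good results. We therefore set the length of X_list to 14 and the length of Y_list to 7, and extended the time span of the training set so that the model can learn more of the historical traffic behavior."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To reach the 500-day horizon, we feed the predictions back into the model as features and repeat this loop until 500 days of forecasts have been produced."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Baseline Model"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/8f\/f9\/8f1818b88b66cb06befa143dc1d596f9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows the network architecture of the baseline model. In the baseline model, we extract two kinds of features from X_list:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Because X_list contains two weeks of traffic data, a one-dimensional dilated convolution computes a weighted sum across the two weeks. For example, the Monday of the first week and the Monday of the second week are combined into one element, producing a Dense output of 7 elements. The weight of each week is learned by the model;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. A two-layer fully connected network extracts temporal features from X_list."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The ID features and time features are all categorical, so each is passed through an Embedding layer that turns the category into a vector and lets the model learn the characteristics of each customer and site. Two fully connected layers on top of the embeddings cross these categorical features, and the result serves as the ID attribute feature. The ID attribute feature and the temporal feature are then fed together into two fully connected layers, which produce a residual for each of the next 7 days. Finally, the residual is added to the two-week weighted sum produced by the dilated convolution to form the model output."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before the addition, the residual passes through a PReLU (Parametric ReLU) activation, whose parameter is the slope in the negative domain. PReLU is used here so that the output can be negative: the residual can then be positive or negative, which lets the model fit both rising and falling traffic. However, using PReLU to control whether traffic rises or falls across the entire data space is not appropriate. This issue is discussed in detail under Optimized Model 2 below."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The hyperparameters of the baseline model are:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"§ Loss Function: MAE;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"§ Learning Rate: 0.0001;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"§ Optimizer: Adam;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"§ Batch size: 2048;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"§ Epoch: 3."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To prevent overfitting, the parameters of some layers are regularized."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the baseline model, the traffic values in X_list are not standardized. The reason is that we want the model to favor fitting the traffic of high-traffic sites, that is, we want samples from high-traffic sites to carry more weight during training. When the traffic values in X_list are large, the gradient updates to the model parameters are larger, which gives high-traffic samples more weight. However, this amplification of the gradient updates comes from larger forward propagation values. The ID and time features have no such large forward propagation values, so the parameters associated with them do not give extra weight to high-traffic samples. In the figure below, the parameters inside the red box are the ones that receive insufficient gradient updates. We designed Optimized Model 1 to address this problem."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/0e\/af\/0ed732dcyy19eabc6497cc0343bc95af.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Optimized Model 1"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To address the insufficient gradient updates for the parameters associated with ID and time features in the baseline model, we designed the network architecture below. It differs from the baseline model in only two places."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/ab\/2d\/aba4701ac06f6a09af87e9a06270952d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, the input feature X_list is Z-score standardized. Note that the standardization is performed per site over the time range from which the training and validation sets are drawn. In the figure in the Feature Engineering and Forecasting Pipeline section, this means computing the mean and standard deviation of site a over the period 2020-06-10 to 2020-09-08, then subtracting the mean from each day's traffic and dividing by the standard deviation. After Z-score standardization, every site's traffic has mean 0 and standard deviation 1, so there is no longer any difference in scale between sites."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, the mean and standard deviation are fed to the model as features. At the output of the model, each sample's mean and standard deviation are used to reverse the standardization, and the de-standardized value is the final output. The Y_list values in the training and validation sets therefore do not need to be Z-score standardized."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With these two changes, the weights across the whole model are updated more evenly, while samples with a large standard deviation still get “extra attention” (high-traffic samples tend to have relatively large standard deviations). The reason is that samples with a larger standard deviation receive relatively larger gradients during back propagation. Both this model and the baseline aim to make the model fit high-traffic samples better, but they do so in different ways. The baseline model relies on the larger forward propagation values produced by X_list, which at the same time weakens the contribution of the ID and time features. Optimized Model 1 relies on the larger back propagation signal produced by Y_list, so the standardized X_list no longer overshadows the ID and time features, and the experimental results are better."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Optimized Model 2"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The baseline model and Optimized Model 1 share one characteristic: both fit a residual on top of the weighted sum of the past two weeks. The main problem with the residual approach lies in the PReLU activation that outputs the residual. PReLU was designed to prevent neurons from dying in a neural network. Although its slope in the negative domain can be updated through back propagation, letting PReLU control whether the residual increases or decreases runs into the problem illustrated in the figure below in our use case."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our training samples are generated by a sliding window, so the day of the week represented by each element of X_list and Y_list is not fixed. As the figure below shows, the first element of Y_list in the first sample is a Monday, the first element in the second sample is a Tuesday, and so on. Whether traffic rises or falls differs between Monday and Tuesday, and it also differs from site to site. Using only the PReLU parameter in a single model to control whether traffic rises or falls across the entire data space is therefore both insufficient and inappropriate."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/c6\/d1\/c6fe6d1138abf59d10f2627319d677d1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The advantage of fitting a residual is that it effectively reduces the complexity of the model space and makes the model easier to converge. Once the residual approach is dropped, the model space becomes larger: the model can fit more complex situations but is also harder to converge. To strengthen the model's fitting capacity and extract temporal features better, we introduced an RNN. The extracted time series cross features and the categorical cross features are finally fed into a two-layer fully connected network that predicts the seven days of traffic directly. The figure below shows the structure of Optimized Model 2."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/ab\/2d\/aba4701ac06f6a09af87e9a06270952d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The table below shows the per-day and overall performance of the three models above and the four-week median on the test set. Underlined values are the best, that is, the smallest, in each comparison. Optimized Model 2 achieves the best overall performance, a 2.96% improvement over the baseline model."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/resource\/image\/75\/16\/7528bb45c31888c46bde77d5781e7416.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary and Future Directions"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article has given an overview of our thinking and experiments on the inventory forecasting problem. Starting from the basic problem definition, we defined an evaluation metric suited to the business requirements, built forecasting models against that metric, and iterated on the models according to experimental results. The model has now been deployed to the production environment and continuously provides inventory forecasts to downstream consumers. Every day it forecasts future traffic from historical traffic, supporting other modules inside the company and helping customers adjust their advertising sales strategies in time. As for model evolution, more features related to holidays and major events can be introduced so that the model copes better with sudden traffic changes. How to better capture the long-period characteristics of traffic is another topic for future research."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"About the Author:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wang Zhenya (王振亞) is a senior engineer on the FreeWheel machine learning team, working on algorithm research and model development for machine learning in computational advertising, with a strong interest in big data processing and analytics. The aim is to combine algorithms with engineering so that more machine learning reaches production in computational advertising."}]}
]}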