Problem
When an RNN iteratively applies the input-to-hidden state-transition operation to build a fixed-length representation of an arbitrarily long sequence, it runs into a difficulty: it is overly sensitive to perturbations of the hidden state. Because the same transition is reused at every time step, noise injected into the hidden state compounds across time; this is why naively applying dropout to the recurrent connections tends to destroy long-term memory, and it motivates the RNN-specific dropout variants below.
dropout
Mathematical formalization of dropout:
$$y = f(W \cdot d(x)), \qquad d(x) = \begin{cases} \text{mask} \odot x, & \text{training phase} \\ (1-p)\, x, & \text{otherwise} \end{cases}$$

where $p$ is the dropout rate and $\text{mask}$ is a binary vector sampled from a Bernoulli distribution with keep probability $1-p$.
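A minimal NumPy sketch of this formalization (the function names `dropout` and `layer` are illustrative, not from any library):

```python
import numpy as np

def dropout(x, p, train):
    """d(x) from the formula above; p is the dropout rate."""
    if train:
        # Each unit is kept with probability 1 - p.
        mask = np.random.binomial(1, 1 - p, size=x.shape)
        return mask * x
    # Test phase: no masking, scale by the keep probability instead.
    return (1 - p) * x

def layer(x, W, p, train, f=np.tanh):
    """y = f(W . d(x))"""
    return f(W @ dropout(x, p, train))
```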
rnnDrop
Departing from the conventional practice of sampling a different mask at every time step to drop hidden units, this work proposes a new strategy (illustrated in the paper's figure, and sketched below) with two features: 1) it generates the dropout mask only at the beginning of each training sequence and fixes it through the sequence; 2) it drops both the non-recurrent and recurrent connections.
Reference: Moon T, Choi H, Lee H, et al. RnnDrop: A Novel Dropout for RNNs in ASR. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2016: 65-70.
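A minimal sketch of the per-sequence mask idea, assuming a simple tanh RNN rather than the paper's ASR models; all names here are illustrative:

```python
import numpy as np

def rnn_forward_rnndrop(xs, h0, Wx, Wh, b, p):
    """Run one training sequence with a per-sequence dropout mask.
    The mask is sampled once at the start of the sequence and fixed
    through it; applying it to the hidden state drops both the
    recurrent and the non-recurrent connections leaving that state."""
    mask = np.random.binomial(1, 1 - p, size=h0.shape)  # one mask per sequence
    h, hs = h0, []
    for x in xs:
        h = mask * np.tanh(Wx @ x + Wh @ h + b)  # same mask at every step
        hs.append(h)
    return hs
```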
recurrent dropout
- Idea: apply dropout to the input/update gates of the LSTM/GRU, which prevents the loss of long-term memories built up in the states/cells.
A simple RNN and its dropout variants (equations following Semeniuta et al.):

RNN: $h_t = f(W [x_t, h_{t-1}] + b)$

dropout: masking the recurrent input gives $h_t = f(W [x_t, d(h_{t-1})] + b)$; since a vanilla RNN rewrites its entire state at every step, there is no separate memory path to protect.

LSTM: dropout is applied only to the candidate cell update, $c_t = f_t \odot c_{t-1} + i_t \odot d(g_t)$, $h_t = o_t \odot \tanh(c_t)$, so the additive path $f_t \odot c_{t-1}$ that carries long-term memory is never dropped.

GRU: analogously, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot d(g_t)$.
In principle, masks can be applied to any subset of the gates, cells, and states.
Reference: Semeniuta S, Severyn A, Barth E. Recurrent Dropout without Memory Loss. COLING 2016.
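A sketch of one LSTM step with recurrent dropout on the cell update $g_t$, matching the LSTM equation above; the fused-weight layout and all names are assumptions of this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_recurrent_dropout(x, h, c, W, b, p, train=True):
    """One LSTM step where dropout hits only the candidate update g_t,
    so the additive memory path f_t * c_{t-1} is never dropped."""
    z = W @ np.concatenate([x, h]) + b
    H = h.size
    i = sigmoid(z[:H])            # input gate
    f = sigmoid(z[H:2*H])         # forget gate
    o = sigmoid(z[2*H:3*H])       # output gate
    g = np.tanh(z[3*H:])          # candidate cell update
    if train:
        g = g * np.random.binomial(1, 1 - p, size=g.shape)
    else:
        g = (1 - p) * g           # test-time scaling, as in d(x) above
    c = f * c + i * g             # long-term memory path stays intact
    h = o * np.tanh(c)
    return h, c
```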
Dropout on vertical connections
For multi-layer LSTM networks, random dropout is applied to the vertical (layer-to-layer) connections only, i.e., it randomly decides whether the output of layer $l-1$ at time $t$ is allowed to flow into layer $l$, while the recurrent (horizontal) connections are left untouched.
In the figure, the dashed lines mark the connections on which random dropout operates.

(Figure: information flow after the dropout operation.)
Reference: Zaremba W, Sutskever I, Vinyals O. Recurrent Neural Network Regularization. arXiv:1409.2329, 2014.
Source code: https://github.com/wojzaremba/lstm
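A sketch of one time step of a stacked LSTM with dropout on the vertical connections only, in the spirit of Zaremba et al.; the helper names and fused-weight layout are assumptions of this sketch, not the repository's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """Plain LSTM step (no dropout inside the cell)."""
    z = W @ np.concatenate([x, h]) + b
    H = h.size
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c = f * c + i * g
    return o * np.tanh(c), c

def stacked_step(x, states, params, p, train=True):
    """One time step of a multi-layer LSTM. Dropout is applied only to
    the vertical input of each layer (the dashed lines in the figure);
    the recurrent connections h_{t-1} -> h_t are never dropped."""
    inp, new_states = x, []
    for (h, c), (W, b) in zip(states, params):
        if train:
            # Fresh mask per layer and per time step.
            inp = inp * np.random.binomial(1, 1 - p, size=inp.shape)
        else:
            inp = (1 - p) * inp
        h, c = lstm_step(inp, h, c, W, b)
        new_states.append((h, c))
        inp = h  # vertical connection into the next layer
    return new_states
```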
Dropout based on variational inference
In the figure, dashed lines represent connections without dropout, while solid lines of different colors represent different dropout masks.
Conventional dropout for RNNs: different masks are sampled at different time steps, and no dropout is applied to the recurrent connections.

Dropout based on variational inference: the same dropout mask is used at every time step, including on the recurrent layers.
Concrete implementation of variational-inference-based dropout (as the solid-line colors in panel (b) of the figure show): sample a Bernoulli mask once for each weight matrix, then reuse that same mask at every subsequent time step, as sketched below.
Reference: Gal Y, Ghahramani Z. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. NIPS 2016.
Source code: http://yarin.co/BRNN
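A sketch of this implementation for a simple tanh RNN: Bernoulli masks are sampled once per sequence, one for the input connections and one for the recurrent connections, and reused at every time step (all names here are illustrative):

```python
import numpy as np

def rnn_forward_variational(xs, h0, Wx, Wh, b, p):
    """Variational dropout for a simple tanh RNN: the same two masks
    are applied at every time step of the sequence."""
    mx = np.random.binomial(1, 1 - p, size=xs[0].shape)  # input mask
    mh = np.random.binomial(1, 1 - p, size=h0.shape)     # recurrent mask
    h, hs = h0, []
    for x in xs:
        # Identical masks at every step, including the recurrent path.
        h = np.tanh(Wx @ (mx * x) + Wh @ (mh * h) + b)
        hs.append(h)
    return hs
```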
Zoneout
$$c_t = d_t^c \odot c_{t-1} + (1 - d_t^c) \odot \left( f_t \odot c_{t-1} + i_t \odot g_t \right)$$
$$h_t = d_t^h \odot h_{t-1} + (1 - d_t^h) \odot \left( o_t \odot \tanh( f_t \odot c_{t-1} + i_t \odot g_t ) \right)$$
where $d_t^c$ and $d_t^h$ are binary (0/1) random vectors; a unit whose mask value is 1 keeps its previous activation instead of being updated.
Reference: Krueger D, Maharaj T, Kramár J, et al. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. ICLR 2017.
Source code: http://github.com/teganmaharaj/zoneout
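A sketch of one LSTM step with zoneout, following the equations above; `zc` and `zh` stand for the cell and hidden zoneout probabilities, and all names are assumptions of this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_zoneout(x, h, c, W, b, zc, zh, train=True):
    """One LSTM step with zoneout: with probability zc (resp. zh) a
    cell (resp. hidden) unit keeps its previous value instead of
    taking the new one, implementing the d_t^c / d_t^h masks above."""
    z = W @ np.concatenate([x, h]) + b
    H = h.size
    i, f = sigmoid(z[:H]), sigmoid(z[H:2*H])
    o, g = sigmoid(z[2*H:3*H]), np.tanh(z[3*H:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    if train:
        dc = np.random.binomial(1, zc, size=c.shape)  # d_t^c
        dh = np.random.binomial(1, zh, size=h.shape)  # d_t^h
        c_out = dc * c + (1 - dc) * c_new
        h_out = dh * h + (1 - dh) * h_new
    else:
        # At test time, use the expected value of the zoneout masks.
        c_out = zc * c + (1 - zc) * c_new
        h_out = zh * h + (1 - zh) * h_new
    return h_out, c_out
```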