Tricks for training RNNs

1. Initialization

Orthogonal initialization works better than all-zero initialization.
The following initializes a bidirectional RNN:

    import tensorflow as tf
    # forward and backward LSTM cells, both with orthogonally initialized weights
    lstm_fw_cell = tf.nn.rnn_cell.LSTMCell(num_units=nhidden, forget_bias=1.0, initializer=tf.orthogonal_initializer())
    lstm_bw_cell = tf.nn.rnn_cell.LSTMCell(num_units=nhidden, forget_bias=1.0, initializer=tf.orthogonal_initializer())
    # hiddens is a (forward, backward) tuple of per-step outputs; state holds the final states
    hiddens, state = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
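
The point of orthogonal initialization is that an orthogonal matrix W satisfies W Wᵀ = I, so repeatedly multiplying the hidden state by W neither shrinks nor inflates it, which counteracts vanishing and exploding gradients through the recurrent weights. Below is a quick numpy check of this property (my own sketch, not from the original post; tf.orthogonal_initializer builds its matrix from a QR decomposition in the same spirit):

    import numpy as np

    # build a random orthogonal matrix via QR, the same idea behind tf.orthogonal_initializer
    W, _ = np.linalg.qr(np.random.randn(64, 64))

    # orthogonality: W @ W.T is (numerically) the identity matrix
    print(np.allclose(W @ W.T, np.eye(64), atol=1e-6))    # True

    # norm preservation: repeated multiplication by W neither shrinks nor blows up the vector
    h = np.random.randn(64)
    h_t = h.copy()
    for _ in range(100):
        h_t = W @ h_t
    print(np.linalg.norm(h), np.linalg.norm(h_t))         # approximately equal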

Zhihu: YJango's answer
https://zhuanlan.zhihu.com/p/28981495

Zhihu: 無非's answer
Use orthogonal initialization together with PReLU/ELU activations.
Explaining and illustrating orthogonal initialization for recurrent neural networks
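
One way to read the suggested combination (this is my interpretation and my own snippet, not code from the answer) is to keep the orthogonal initializer and swap the cell's default tanh for ELU through the `activation` argument of LSTMCell in TF 1.x:

    # assumption: ELU as the cell activation; LSTMCell's `activation` argument defaults to tanh
    lstm_cell = tf.nn.rnn_cell.LSTMCell(
        num_units=nhidden,
        forget_bias=1.0,
        initializer=tf.orthogonal_initializer(),
        activation=tf.nn.elu)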

2. Dropout

Dropout is applied where activations are passed from cell to cell between layers; the memory (cell state) carried across time steps is not dropped out. Only the outputs passed between cells of different layers are dropped out, as the following snippet shows:

    from tensorflow.contrib import rnn

    # Step 1: the RNN input has shape (batch_size, timestep_size, input_size)
    X = tf.reshape(_X, [-1, 28, 28])
    # Steps 2-3: define an LSTM cell (only hidden_size needs to be specified; the cell adapts
    # to the dimensionality of X) and wrap it with dropout; usually only output_keep_prob is set.
    # A fresh cell must be created for every layer, so construction goes in a small factory.
    def lstm_cell():
        cell = rnn.BasicLSTMCell(num_units=hidden_size, forget_bias=1.0, state_is_tuple=True)
        return rnn.DropoutWrapper(cell=cell, input_keep_prob=1.0, output_keep_prob=keep_prob)
    # Step 4: stack layer_num LSTM layers with MultiRNNCell
    mlstm_cell = rnn.MultiRNNCell([lstm_cell() for _ in range(layer_num)], state_is_tuple=True)

Remember to set keep_prob to the desired dropout keep probability during training, and set it to 1.0 at test time so that dropout is disabled.
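
A minimal sketch of how this is usually wired up, assuming TF 1.x; the names train_op, accuracy, y, batch_x, batch_y, test_x, test_y are hypothetical and not from the original post. keep_prob is a placeholder, so the same graph runs with dropout during training and without it at test time.

    # keep_prob (used in the DropoutWrapper above) is typically a placeholder, created
    # before the cells are built:
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

    # run the stacked cell over the input sequence
    outputs, final_state = tf.nn.dynamic_rnn(mlstm_cell, X, dtype=tf.float32)

    # training step: dropout active, e.g.
    #   sess.run(train_op, feed_dict={_X: batch_x, y: batch_y, keep_prob: 0.5})
    # evaluation: dropout disabled
    #   sess.run(accuracy, feed_dict={_X: test_x, y: test_y, keep_prob: 1.0})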
