TensorFlow in Practice: LSTM Structure and the Parameters Inside a Cell

Some parameters

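Training is usually done batch by batch, that is, batch_size sentences are trained at the same time.
Each sentence has num_steps words. Since sentence length is the same as the number of time steps, num_steps stands for the sentence length.
In NLP problems a word is represented by a word vector (a single number can hardly represent a word; look up word embeddings if this is unfamiliar). We set the word vector length to wordvec_size.
An LSTM unit is itself built from neural networks: the structure in the figure below is one LSTM unit, and each yellow box in it is a neural network layer. We set the number of hidden units of each such layer to hidden_size, so one LSTM unit contains 4*hidden_size hidden units.
Every LSTM output is a vector, including $C_{t}$ and $h_{t}$, and their length is the hidden_size of the current LSTM unit (explained later).
The number of words in the corpus vocabulary is vocab_size.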
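To make these sizes concrete, here is a minimal sketch in plain Python that looks up word vectors for a batch of word ids and checks that the result has shape [batch_size, num_steps, wordvec_size]. The tiny sizes, the random embedding table, and the `embed` helper are illustrative assumptions, not part of the original code.

```python
import random

# Illustrative sizes (assumptions, kept tiny so the result is easy to inspect).
batch_size, num_steps, wordvec_size, vocab_size = 2, 3, 4, 10

random.seed(0)
# Embedding table: one word vector of length wordvec_size per vocabulary entry.
embedding = [[random.uniform(-0.1, 0.1) for _ in range(wordvec_size)]
             for _ in range(vocab_size)]

def embed(token_ids):
    """Map a [batch_size, num_steps] grid of word ids to word vectors."""
    return [[embedding[word_id] for word_id in sentence] for sentence in token_ids]

# A batch of batch_size sentences, each with num_steps word ids.
token_ids = [[1, 5, 7], [2, 2, 9]]
inputs = embed(token_ids)

shape = (len(inputs), len(inputs[0]), len(inputs[0][0]))
print(shape)  # (2, 3, 4) == (batch_size, num_steps, wordvec_size)
```

The same word id always maps to the same vector, so the two occurrences of id 2 in the second sentence share one row of the table.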

Single-layer LSTM

Let's walk through concrete code. Below is the most basic structure of a single-layer LSTM.

import tensorflow as tf  # TensorFlow 1.x API (tf.contrib was removed in 2.x)

cell = tf.contrib.rnn.LSTMBlockCell(hidden_size, forget_bias=0.0)
outputs = []
state = self._initial_state  # initial state of the cell
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        # After the first step, reuse the variables so all steps share one set of weights.
        if time_step > 0:
            tf.get_variable_scope().reuse_variables()
        # cell_output: [batch_size, hidden_size]
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        # outputs: a list of num_steps tensors, each of shape [batch_size, hidden_size]
        outputs.append(cell_output)

# Concatenate to [batch_size, num_steps*hidden_size]; the first row holds the first sentence.
# Then reshape to [batch_size*num_steps, hidden_size]; the first num_steps rows are one sentence.
output = tf.reshape(tf.concat(outputs, 1), [-1, hidden_size])

# Softmax layer: project each output vector onto the vocabulary to score every word;
# these scores feed the cross-entropy loss.
softmax_w = tf.get_variable("softmax_w", [hidden_size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())

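LSTMBlockCell builds one LSTM unit with hidden_size hidden units. The unit contains four neural network layers, each taking $h_{t-1}$ and $x_{t}$ as input. The two are concatenated into a vector of dimension $hidden\_size+wordvec\_size$, so the weight matrix of each yellow box in the LSTM has shape [ $hidden\_size+wordvec\_size, hidden\_size$ ].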
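The shape arithmetic above can be sketched in a few lines of plain Python. The helper name `lstm_weight_shapes` is hypothetical, the sizes 200 and 128 are illustrative, and the bias vectors are an addition based on the standard LSTM formulation (the text only discusses the weight matrices).

```python
def lstm_weight_shapes(input_size, hidden_size):
    """Return (per-gate kernel shape, fused kernel shape, total parameter count)."""
    rows = hidden_size + input_size          # length of concat(h_{t-1}, x_t)
    per_gate = (rows, hidden_size)           # one "yellow box"
    fused = (rows, 4 * hidden_size)          # the four boxes stored side by side
    # Assumption: each box also has a bias vector of length hidden_size.
    total_params = 4 * (rows * hidden_size + hidden_size)
    return per_gate, fused, total_params

per_gate, fused, total = lstm_weight_shapes(input_size=200, hidden_size=128)
print(per_gate)  # (328, 128)
print(fused)     # (328, 512)
print(total)     # 168448
```

Storing the four matrices as one fused [328, 512] matrix lets a single matrix multiplication compute all four boxes at once.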

Note that the LSTM at all num_steps time steps shares one set of parameters. We may speak of num_steps LSTM units, but there is really only one, executed num_steps times.

The for loop in the code above unrolls the network over time; each iteration processes the word at the current time step.
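The idea of one cell executed num_steps times can be illustrated without TensorFlow. The sketch below uses a scalar tanh recurrent cell as a deliberately simplified stand-in for an LSTM cell; the class `TinyRecurrentCell` and its weights are assumptions made for illustration only.

```python
import math

class TinyRecurrentCell:
    """A scalar tanh RNN cell; a simplified stand-in for an LSTM cell (assumption)."""
    def __init__(self, w_x, w_h):
        self.w_x, self.w_h = w_x, w_h   # created once, reused at every time step
        self.calls = 0

    def __call__(self, x, state):
        self.calls += 1
        new_state = math.tanh(self.w_x * x + self.w_h * state)
        return new_state, new_state     # (output, state), like cell(x, state)

cell = TinyRecurrentCell(w_x=0.5, w_h=0.8)   # the only cell object
sentence = [1.0, -2.0, 0.5, 3.0]             # num_steps = 4 "words"

outputs, state = [], 0.0
for x_t in sentence:                          # unroll over time
    output, state = cell(x_t, state)
    outputs.append(output)

print(len(outputs), cell.calls)  # 4 4
```

There is a single pair of weights, yet the cell runs once per word, which is what "sharing parameters across time steps" means.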

Example
For example, take a sentence of 20 words, with word vectors of length 200 and 128 hidden units.

To train on this sentence, the input tensor has shape $[20,200]$, $h_{t}$ and $c_{t}$ have shape $[128]$, and the weight matrix of the LSTM unit has shape [ $128+200,4*128$ ].

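At time step 2, the second word of the sentence is the input, a vector of shape $[200]$. It is concatenated with $h_{1}$, giving an input of shape $[200+128]$, which is multiplied by the weight matrix $[200+128,4*128]$ to produce an output of shape $[4*128]$; that is, each yellow box outputs $[128]$. The box outputs are then combined by element-wise operations that do not change the dimension, so the result is still $[128]$: after the LSTM unit, $C_{2}$ and $h_{2}$ both have dimension 128. This explains the earlier statement that every LSTM output is a vector whose length is the hidden_size of the current LSTM unit.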
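Written out, this step corresponds to the standard LSTM update. The gate names ( $f_t$, $i_t$, $o_t$, $\tilde{C}_t$ ) and the bias terms follow the common formulation and are not spelled out in the text above; row vectors are assumed so that the shapes match the example, with each $W \in \mathbb{R}^{328 \times 128}$ and each $b \in \mathbb{R}^{128}$.

```latex
\begin{aligned}
z_t &= [h_{t-1}, x_t] \in \mathbb{R}^{128+200} \\
f_t &= \sigma(z_t W_f + b_f), \quad
i_t = \sigma(z_t W_i + b_i), \quad
o_t = \sigma(z_t W_o + b_o) \\
\tilde{C}_t &= \tanh(z_t W_C + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```

The last two lines contain only element-wise products and sums, which is why $C_t$ and $h_t$ keep the dimension 128.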

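Training one sentence at a time is too inefficient, however, so in practice networks are trained in batches, for example 64 sentences at a time. For these 64 sentences the input tensor has shape $[64,20,200]$. For each sentence $h_{t}$ and $c_{t}$ have shape [128], and the weight matrix of the LSTM unit is still [ $128+200,4*128$ ].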

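At time step 2, the second word of each of the 64 sentences is the input, a matrix of shape $[64,200]$. Concatenated with $h_{1}$ it becomes $[64, 200+128]$, which is multiplied by the weight matrix [ $200+128,4*128$ ] to give $[64,4*128]$; that is, each yellow box outputs $[64,128]$, its activations for the second word of the 64 sentences. As in the single-sentence case, the four box outputs are combined element-wise into $C_{2}$ and $h_{2}$, so cell_output (which is $h_{2}$ ) has shape $[64, 128]$, matching the comment in the code. This is the output at time step 2; continuing through time step 20, outputs ends up with shape $[20,64,128]$.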
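The batched shapes can be traced with a small NumPy sketch. This is not the TensorFlow implementation: the random weights, the `lstm_step` helper, and the order in which the fused output is split into gates are assumptions made for illustration. Note that the four gate blocks are combined into a single $h_t$ of shape [batch_size, hidden_size], which is what the cell returns.

```python
import numpy as np

batch_size, num_steps, wordvec_size, hidden_size = 64, 20, 200, 128
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One fused weight matrix and bias, shared by all time steps.
kernel = rng.normal(scale=0.1, size=(hidden_size + wordvec_size, 4 * hidden_size))
bias = np.zeros(4 * hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM step for a whole batch; returns (h_t, c_t)."""
    z = np.concatenate([h_prev, x_t], axis=1)     # [64, 128 + 200]
    gates = z @ kernel + bias                     # [64, 4 * 128]
    i, g, f, o = np.split(gates, 4, axis=1)       # four blocks of [64, 128] (order assumed)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

inputs = rng.normal(size=(batch_size, num_steps, wordvec_size))
h = np.zeros((batch_size, hidden_size))
c = np.zeros((batch_size, hidden_size))

outputs = []
for t in range(num_steps):
    h, c = lstm_step(inputs[:, t, :], h, c)
    outputs.append(h)                             # each element: [64, 128]

stacked = np.stack(outputs)
print(stacked.shape)  # (20, 64, 128)
```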

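The softmax layer acts as a fully connected layer that maps outputs onto the vocab_size words, after which the cross-entropy loss is computed. The loss is then used to update the LSTM weight matrix and the parameters of the fully connected layer.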
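A minimal NumPy sketch of this projection and loss follows. The tiny sizes, the random values, and the use of next-word ids as `targets` are assumptions for illustration; only the names softmax_w and softmax_b come from the code above.

```python
import numpy as np

batch_size, num_steps, hidden_size, vocab_size = 4, 3, 8, 10  # tiny illustrative sizes
rng = np.random.default_rng(0)

# Flattened LSTM outputs: one row per (sentence, time step) pair.
output = rng.normal(size=(batch_size * num_steps, hidden_size))
softmax_w = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
softmax_b = np.zeros(vocab_size)
targets = rng.integers(0, vocab_size, size=batch_size * num_steps)  # assumed word ids

logits = output @ softmax_w + softmax_b            # [12, 10]: one score per vocabulary word
logits -= logits.max(axis=1, keepdims=True)        # shift for numerical stability
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# Cross-entropy: negative log-probability of the correct word, averaged over all rows.
loss = -log_probs[np.arange(len(targets)), targets].mean()
print(logits.shape, float(loss))
```

Each row of `log_probs` is a log-distribution over the vocabulary, so exponentiating a row and summing gives 1.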

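From the above we can see that, for the LSTM:
The input tensor is 3-dimensional, $[batch\_size, num\_steps, wordvec\_size]$: the batch size, the number of time steps, and the dimension of the input vector.
The output tensor is also 3-dimensional, $[num\_steps, batch\_size, hidden\_size]$: the number of time steps (better thought of here as the number of words in a sentence), the batch size, and the number of hidden units.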
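The code then flattens the per-step outputs with a concat followed by a reshape. The NumPy sketch below mimics that with small labelled arrays to show the resulting row order; the sizes and the labelling scheme (value 10*b + t for sentence b at step t) are assumptions for illustration.

```python
import numpy as np

batch_size, num_steps, hidden_size = 2, 3, 4  # tiny illustrative sizes

# outputs[t] has shape [batch_size, hidden_size]; row b is filled with 10*b + t.
outputs = [np.full((batch_size, hidden_size), float(t))
           + 10.0 * np.arange(batch_size)[:, None]
           for t in range(num_steps)]

concatenated = np.concatenate(outputs, axis=1)   # [batch_size, num_steps * hidden_size]
flat = concatenated.reshape(-1, hidden_size)     # [batch_size * num_steps, hidden_size]

print(flat[:, 0].tolist())  # [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
```

The first num_steps rows all belong to sentence 0 and the next num_steps rows to sentence 1, so the targets must be flattened in the same sentence-major order.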

Two-layer / multi-layer LSTM

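A two-layer or multi-layer LSTM is much like the single-layer one. Extending the example above: what if we add a second LSTM layer? The first LSTM layer does not need to change. What, then, is the shape of the second layer's weight matrix?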

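We saw that the output of the first LSTM layer has dimension $[128]$, and this output serves as the input $x_{t}$ of the second layer. If the second layer has hidden_size2 hidden units, then the weight matrix of each yellow box in the second-layer LSTM unit has shape $[hidden\_size2+128,hidden\_size2]$.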

The shape of softmax_w must also change to $[hidden\_size2, vocab\_size]$.
The code is:
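The same bookkeeping extends to any number of stacked layers. The helper `stacked_lstm_shapes` below is hypothetical, and hidden_size2 = 256 and vocab_size = 10000 are arbitrary values picked for illustration.

```python
def stacked_lstm_shapes(input_size, hidden_sizes):
    """Per-gate weight matrix shape for each layer in a stack of LSTM layers."""
    shapes = []
    layer_input = input_size
    for hidden in hidden_sizes:
        shapes.append((hidden + layer_input, hidden))
        layer_input = hidden          # this layer's h_t is the next layer's x_t
    return shapes

# Assumed values: hidden_size2 = 256, vocab_size = 10000.
vocab_size = 10000
shapes = stacked_lstm_shapes(input_size=200, hidden_sizes=[128, 256])
softmax_w_shape = (256, vocab_size)   # softmax reads the top layer's output

print(shapes)           # [(328, 128), (384, 256)]
print(softmax_w_shape)  # (256, 10000)
```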

cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size, forget_bias=0.0, state_is_tuple=True)
cell2 = tf.nn.rnn_cell.BasicLSTMCell(hidden_size2, forget_bias=0.0, state_is_tuple=True)
cell = tf.contrib.rnn.MultiRNNCell([cell, cell2], state_is_tuple=True)

outputs = []
state = self._initial_state  # initial state of the stacked cell
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        # After the first step, reuse the variables so all steps share one set of weights.
        if time_step > 0:
            tf.get_variable_scope().reuse_variables()
        # cell_output: [batch_size, hidden_size2] (output of the top layer)
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        # outputs: a list of num_steps tensors, each of shape [batch_size, hidden_size2]
        outputs.append(cell_output)

# Concatenate to [batch_size, num_steps*hidden_size2]; the first row holds the first sentence.
# Then reshape to [batch_size*num_steps, hidden_size2]; the first num_steps rows are one sentence.
output = tf.reshape(tf.concat(outputs, 1), [-1, hidden_size2])

# Softmax layer: project each output vector onto the vocabulary to score every word;
# these scores feed the cross-entropy loss.
softmax_w = tf.get_variable("softmax_w", [hidden_size2, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())

Important note

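If the first and second LSTM layers have the same configuration, build a separate cell object for each layer, for example:

cell = tf.contrib.rnn.MultiRNNCell([tf.nn.rnn_cell.BasicLSTMCell(hidden_size, forget_bias=0.0, state_is_tuple=True) for _ in range(2)], state_is_tuple=True)

But never write

cell = tf.contrib.rnn.MultiRNNCell([cell]*2, state_is_tuple=True)

because [cell]*2 puts the same cell object into the list twice (as does [cell, cell] when cell is a single object). Depending on the TensorFlow version, this either raises a variable-reuse error or makes the two layers share one set of weights.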
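The reason is plain Python list semantics rather than anything specific to TensorFlow. The sketch below uses a hypothetical `DummyCell` class as a stand-in for an RNN cell to show that `[cell] * 2` repeats one object, while a list comprehension creates independent objects.

```python
class DummyCell:
    """Stand-in for an RNN cell object (hypothetical, for illustration only)."""
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

cell = DummyCell(128)
repeated = [cell] * 2                             # the same object, listed twice
separate = [DummyCell(128) for _ in range(2)]     # two independent objects

print(repeated[0] is repeated[1])   # True
print(separate[0] is separate[1])   # False
```

With real cells, the list comprehension gives each layer its own object and therefore its own variables.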

Multi-layer LSTMs/RNNs may also need dropout

Dropout is a very effective regularization method. How it is applied in an RNN differs from a CNN, and I recommend reading the paper Recurrent Neural Network Regularization. Here I only give the conclusion.

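As the figure below shows, dropout is not applied horizontally: when the state is passed from time step t-1 to time step t, no dropout is applied to the memory. Dropout is applied only vertically, within the same time step t, that is, at the vertical dashed lines.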

So in the code, after defining the cell, we wrap it in dropout. The class is called DropoutWrapper, and with it our cell gains dropout.

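As the official documentation shows, it has input_keep_prob and output_keep_prob. With the cell wrapped in DropoutWrapper, set input_keep_prob if you want part of the input dropped as it enters the cell, so that only part of the input reaches the cell; set output_keep_prob if you want only part of this cell's output to serve as the input of the next layer's cell. Very convenient.
According to Zaremba's description in the paper, output_keep_prob is the one to set for the cell here.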

if is_training and config.keep_prob < 1:
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(
        lstm_cell, output_keep_prob=config.keep_prob)
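What output-only dropout does can be sketched in NumPy. The `toy_cell` below is a simplified stand-in for an LSTM cell and the `dropout` helper is a generic inverted-dropout implementation; both are assumptions for illustration, not the DropoutWrapper source.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, keep_prob):
    """Inverted dropout: zero each unit with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

def toy_cell(x_t, state):
    """Stand-in for an LSTM cell (assumption): returns (output, new_state)."""
    new_state = np.tanh(x_t + state)
    return new_state, new_state

keep_prob = 0.5
state = np.zeros(4)
for x_t in np.ones((3, 4)):                # three time steps
    output, state = toy_cell(x_t, state)   # the state flows to the next step untouched
    dropped = dropout(output, keep_prob)   # only the output going upward is masked

print(dropped.shape)  # (4,)
```

The recurrent state never passes through `dropout`, which mirrors the rule of applying dropout only on the vertical connections.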

Reposted from:
https://blog.csdn.net/wjc1182511338/article/details/79689409#commentBox
https://blog.csdn.net/mydear_11000/article/details/52414342
Some parts have been supplemented.
