TensorFlow: characteristics and usage of the LSTMCell family

First of all, thanks to this blog post, https://www.cnblogs.com/hrlnw/p/10748990.html, for the inspiration.

The original post used TF 1.10 and I used 1.15; in my tests the version made no difference.

tf.nn.rnn_cell, tf.compat.v1.nn.rnn_cell and tf.contrib.rnn expose the same cell classes (tf.contrib.rnn additionally re-exports the optimized variants). The RNN code is split into two packages:

1. tf.contrib.rnn
2. tf.contrib.cudnn_rnn

I. tf.compat.v1.nn.rnn_cell.LSTMCell(num_units=hidden_num, initializer=weight_initializer)

= tf.nn.rnn_cell.LSTMCell(); tf.nn.rnn_cell.BasicLSTMCell() is the same kind of plain, non-optimized in-graph implementation (it just lacks the peephole/projection options).

The docstring says it plainly: “Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU, or tf.contrib.rnn.LSTMBlockCell and tf.contrib.rnn.LSTMBlockFusedCell for better performance on CPU.”

tf.nn.rnn_cell.BasicLSTMCell should be your last resort. For the less common RNN cell types (variants of tf.contrib.rnn.BasicLSTMCell such as tf.contrib.rnn.NASCell, tf.contrib.rnn.PhasedLSTMCell, tf.contrib.rnn.UGRNNCell, tf.contrib.rnn.GLSTMCell, tf.contrib.rnn.Conv1DLSTMCell, tf.contrib.rnn.Conv2DLSTMCell, tf.contrib.rnn.LayerNormBasicLSTMCell, etc.), be aware that, like tf.contrib.rnn.BasicLSTMCell, they run as ordinary ops inside the graph, so they are slow and memory-hungry. Before using one of these cells, consider whether the trade-off is worth it: for example, although layer normalization speeds up convergence, cuDNN without layer normalization can be roughly 20x faster.
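To make the comparison concrete, here is a minimal sketch (my own, assuming TF 1.15) of the cell-level interface that every cell in this post shares: a cell maps one time step of input plus the previous (c, h) state to an output and a new state.

import tensorflow as tf

# The non-optimized reference cell from this section; the optimized cells below expose the same interface.
cell = tf.compat.v1.nn.rnn_cell.LSTMCell(num_units=512)
x_t = tf.compat.v1.placeholder(tf.float32, [1, 512])       # input for a single time step
state = cell.zero_state(batch_size=1, dtype=tf.float32)    # LSTMStateTuple(c=..., h=...)
output, new_state = cell(x_t, state)                        # one step of the recurrence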

II. tf.contrib.rnn.LSTMBlockCell(num_units=hidden_num) inherits from LayerRNNCell and is meant for running one RNN cell per time step.

If you are using just a single RNN cell rather than a whole RNN layer, tf.nn.dynamic_rnn should be your first choice. Neither dynamic_rnn nor static_rnn has an edge in raw performance, but dynamic_rnn brings these benefits (a sketch follows this list):

1. If the inputs are large, tf.nn.static_rnn inflates the graph and increases compile time.

2. tf.nn.dynamic_rnn handles long sequences well: it can swap memory from the GPU to the CPU.

Where possible, you can run several tf.nn.dynamic_rnn instances in parallel inside a tf.while_loop, but this is rarely useful for RNNs because they are inherently sequential.
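Here is a minimal sketch of that advice (my own example, assuming TF 1.15): a single tf.contrib.rnn.LSTMBlockCell driven by tf.nn.dynamic_rnn, with swap_memory=True enabling the GPU-to-CPU memory swapping mentioned in point 2.

import tensorflow as tf

cell = tf.contrib.rnn.LSTMBlockCell(num_units=512)
inputs = tf.compat.v1.placeholder(tf.float32, [None, 70, 512])   # batch-major: (batch, time, features)
seq_len = tf.compat.v1.placeholder(tf.int32, [None])             # true length of each sequence
outputs, final_state = tf.compat.v1.nn.dynamic_rnn(
    cell, inputs,
    sequence_length=seq_len,   # stop computing past each sequence's end
    swap_memory=True,          # allow swapping activations to host memory for long sequences
    dtype=tf.float32)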

III. tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell(num_units=hidden_num) inherits from LSTMBlockCell and is likewise meant for running one cell per time step.

1. If the network will only ever run on NVIDIA GPUs, consider tf.contrib.cudnn_rnn (sketched after this list): it is usually an order of magnitude faster than tf.contrib.rnn.BasicLSTMCell and tf.contrib.rnn.LSTMBlockCell, and it uses three to four times less memory than tf.contrib.rnn.BasicLSTMCell.
2. If the network needs layer normalization, do not use tf.contrib.cudnn_rnn.
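For completeness, a hedged sketch (my own, assuming TF 1.15 with GPU support) of the fused tf.contrib.cudnn_rnn.CudnnLSTM layer that this advice points to; tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell, used in the benchmark below, is the cell you would use to run weights trained with CudnnLSTM when cuDNN is not available.

import tensorflow as tf

# Fused cuDNN LSTM layer: consumes the whole sequence at once, time-major input.
lstm = tf.contrib.cudnn_rnn.CudnnLSTM(num_layers=1, num_units=512)
inputs = tf.compat.v1.placeholder(tf.float32, [70, None, 512])   # (time, batch, features)
outputs, output_states = lstm(inputs)   # per-step outputs plus the final hidden/cell states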

IV. tf.contrib.rnn.LSTMBlockFusedCell(num_units=hidden_num) inherits from LSTMBlockWrapper and behaves like a whole RNN layer: it cannot be wrapped with tf.nn.dynamic_rnn or tf.contrib.rnn.FusedRNNCellAdaptor; you instantiate it and call it on the full sequence directly (sketched below). Use tf.contrib.rnn.LSTMBlockFusedCell when you only have CPUs, when tf.contrib.cudnn_rnn is unavailable on your GPU machine, or on mobile devices. It is the speed champion, but it cannot be driven one time step at a time.
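A minimal sketch of calling the fused cell directly on a whole time-major sequence (my own example, assuming TF 1.15); the benchmark code below does the same thing in its i == 2 branch.

import tensorflow as tf

fused = tf.contrib.rnn.LSTMBlockFusedCell(num_units=512)
inputs = tf.compat.v1.placeholder(tf.float32, [70, None, 512])   # (time, batch, features)
# Called directly on the full sequence; no dynamic_rnn or FusedRNNCellAdaptor wrapper.
outputs, final_state = fused(inputs, dtype=tf.float32)           # outputs: (time, batch, units)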

V. A hand-written LSTM cell, which appears in the sample code below. Since it has not been professionally optimized its speed is only so-so, but it is still far faster than the likes of tf.nn.rnn_cell.LSTMCell() / tf.nn.rnn_cell.BasicLSTMCell().

The code follows (a bit messy, bear with it):

import numpy as np
import tensorflow as tf
import time
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

batch_size = 1
time_step = 70
hidden_num = 512
weight_initializer = tf.truncated_normal_initializer(stddev=0.01)
w_h = tf.compat.v1.get_variable('w_h', [hidden_num, hidden_num * 4], initializer=tf.initializers.orthogonal())
w_in = tf.compat.v1.get_variable('w_in', [hidden_num, hidden_num * 4], initializer=tf.initializers.orthogonal())
def lstm_cell(input_t, hidden_state, cell_state, w_in=w_in, w_h=w_h):
    # Hand-written LSTM step; note that no bias (and no forget bias) is added here.
    gates = tf.add(tf.matmul(input_t, w_in), tf.matmul(hidden_state, w_h))
    i, f, o, g = tf.split(gates, num_or_size_splits=4, axis=1)  # each (batch_size, hidden_num)
    i = tf.sigmoid(i)
    f = tf.sigmoid(f)  # a forget bias could be added here, e.g. tf.sigmoid(f + 1.0)
    o = tf.sigmoid(o)
    g = tf.tanh(g)
    c = tf.add(tf.multiply(f, cell_state), tf.multiply(i, g))
    h = tf.multiply(o, tf.tanh(c))

    return h, c


# Benchmarked below as i = 0, 1, 2, 3 respectively.
rnn_cell1 = tf.compat.v1.nn.rnn_cell.LSTMCell(num_units=hidden_num, initializer=weight_initializer)  # i = 0
rnn_cell2 = tf.contrib.rnn.LSTMBlockCell(num_units=hidden_num)                                        # i = 1
rnn_cell3 = tf.contrib.rnn.LSTMBlockFusedCell(num_units=hidden_num)                                   # i = 2
rnn_cell4 = tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell(num_units=hidden_num)                        # i = 3


np_input_data = np.random.randn(batch_size, time_step, hidden_num).astype(np.float32)
np_hidden_state = np.random.randn(batch_size, hidden_num).astype(np.float32)
np_cell_state = np.random.randn(batch_size, hidden_num).astype(np.float32)
np_input_len = [time_step]*batch_size
input_data = tf.placeholder(dtype=tf.float32, shape=[batch_size, time_step, hidden_num], name='input_data')
hidden_data = tf.placeholder(dtype=tf.float32, shape=[batch_size, hidden_num], name='hidden')
cell_data = tf.placeholder(dtype=tf.float32, shape=[batch_size, hidden_num], name='cell')
trans_data = tf.transpose(input_data, [1, 0, 2])
state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_data, h=hidden_data)  # initial (c, h) state for the per-step benchmarks
rnn_cell_list = [rnn_cell1, rnn_cell2, rnn_cell3, rnn_cell4]


outputs = []
output_array = tf.TensorArray(dtype=tf.float32, size=time_step)
def wl_t_func(i, trans_data, output_array, h, c):
    # One while_loop step of the hand-written cell: advance (h, c) and record h for step i.
    h, c = lstm_cell(trans_data[i, :, :], h, c)
    output_array = output_array.write(i, h)
    return i+1, trans_data, output_array, h, c
_, _, output_array, _, _ = tf.while_loop(cond=lambda i, *_: i<time_step, body=wl_t_func, loop_vars=(tf.constant(0, tf.int32), tf.transpose(tf.convert_to_tensor(np_input_data),[1, 0, 2]), output_array,
                                                                                                    tf.convert_to_tensor(np_hidden_state), tf.convert_to_tensor(np_cell_state)))
output_array = output_array.stack()

# for t in range(time_step):
#     h, c = lstm_cell(trans_data[t,:,:], hidden_data, cell_data)
#     hidden_data, cell_data = h, c
#     outputs.append(h)
# outputs = tf.stack(outputs, axis=0)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    start = time.time()
    # result = sess.run([outputs], feed_dict={input_data: np_input_data, hidden_data:np_hidden_state, cell_data:np_cell_state})[0]

    result = sess.run(output_array)
    end = time.time()
print('lstm_cell', '*', end - start, '*', result.shape)


for i in range(4):
    outputs = [trans_data]
    rnn_cell = rnn_cell_list[i]
    if i == 0:
        # i == 0: reference LSTMCell wrapped with FusedRNNCellAdaptor (statically unrolled).
        fw_rnn = tf.contrib.rnn.FusedRNNCellAdaptor(rnn_cell, use_dynamic_rnn=False)
        outputs1, state1 = fw_rnn(outputs[-1], sequence_length=np_input_len, dtype=tf.float32)  # time_len, batch, output_size
        outputs.append(outputs1)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            start = time.time()
            result = sess.run([outputs[-1]], feed_dict={input_data: np_input_data})[0]
            end = time.time()
        print(i, '*', end - start, '*', result.shape)

    elif i == 2:
        # i == 2: LSTMBlockFusedCell is itself a whole fused layer; call it directly on the time-major input.
        fw_rnn = rnn_cell
        outputs_t = fw_rnn(outputs[-1], dtype=tf.float32)

        outputs.append(outputs_t)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            start = time.time()
            result = sess.run([outputs[-1]], feed_dict={input_data: np_input_data})[0][0]
            end = time.time()
        print(i, '*', end - start, '*', result.shape)

    else:
        # i == 1 / i == 3: LSTMBlockCell and CudnnCompatibleLSTMCell, stepped manually,
        # first inside tf.while_loop and then with a Python for loop.
        fw_rnn = rnn_cell

        output_array = tf.TensorArray(dtype=tf.float32, size=time_step)

        def wl_t_fw_rnn(i, trans_data, output_array, state):
            # One while_loop step: advance the cell state and record h for step i.
            h, new_state = fw_rnn(trans_data[i, :, :], state)
            output_array = output_array.write(i, h)
            return i + 1, trans_data, output_array, new_state

        init_state = tf.nn.rnn_cell.LSTMStateTuple(c=tf.convert_to_tensor(np_cell_state), h=tf.convert_to_tensor(np_hidden_state))
        _, _, output_array, _ = tf.while_loop(cond=lambda i, *_: i < time_step, body=wl_t_fw_rnn, loop_vars=(
        tf.constant(0, tf.int32), tf.transpose(tf.convert_to_tensor(np_input_data), [1, 0, 2]), output_array, init_state))
        output_array = output_array.stack()
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            start = time.time()
            result = sess.run(output_array)
            end = time.time()
        print(i, '*while_loop*', end - start, '*', result.shape)

        # Reset the state here, otherwise the second pass (i == 3) chains onto the i == 1 graph.
        state = tf.nn.rnn_cell.LSTMStateTuple(c=cell_data, h=hidden_data)
        outputs = []
        for t in range(time_step):
            h, new_state = fw_rnn(trans_data[t, :, :], state)
            state = new_state
            outputs.append(h)
        outputs = tf.stack(outputs, axis=0)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            start = time.time()
            result = sess.run([outputs], feed_dict={input_data: np_input_data, cell_data:np_cell_state, hidden_data:np_hidden_state})[0]
            end = time.time()
        print(i, '*for*', end - start, '*', result.shape)


The running-time statistics are as follows:

Cell type                                                   Time (s)   Time, for loop (s)
hand-written lstm_cell (while_loop)                         0.227
tf.compat.v1.nn.rnn_cell.LSTMCell                           0.772
tf.contrib.rnn.LSTMBlockCell (while_loop)                   0.075      0.156
tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell (while_loop)   0.086      0.247
tf.contrib.rnn.LSTMBlockFusedCell                           0.066

 
