Learning Tensorflow(5)---LSTM

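An LSTM is essentially an RNN. The main difference is that, on top of the basic RNN structure, it adds an information "conveyor belt" called the cell state. The cell state stores information, which lets the network handle long-range contextual dependencies.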

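LSTM network structure

Cell state

The core of the LSTM is the cell state, the horizontal line running along the top of the figure below. It can be thought of as the memory of the whole model, and it changes over time. The conveyor belt itself cannot decide which information is remembered. That is controlled by the gate structures beneath it: the forget gate, the input gate, the candidate gate and the output gate.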

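Forget gate:
The forget gate controls which information is forgotten. It is implemented with the conventional sigmoid activation function.

Here $h_{t-1}$ is the output of the previous time step and $x_t$ is the current input. The two are combined linearly and passed through a sigmoid, which yields a value between 0 and 1: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$. The closer the value is to 0, the more of the stored information is discarded.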
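To make the formula concrete, here is a minimal plain-Python sketch of the forget gate. Lists stand in for tensors, and the helper names `sigmoid` and `forget_gate` as well as the hand-picked weights are hypothetical illustrations, not TensorFlow APIs or trained values.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(W_f, b_f, h_prev, x_t):
    """Compute f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f).

    W_f has one row per cell-state unit; each row has len(h_prev) + len(x_t)
    entries. Returns one value in (0, 1) per unit.
    """
    concat = h_prev + x_t  # list concatenation plays the role of [h_{t-1}, x_t]
    return [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(W_f, b_f)]

# Toy setting: 2 cell-state units, 2 hidden values, 1 input feature.
W_f = [[0.0, 0.0, 0.0],     # unit 0 ignores its inputs, so f = sigmoid(0) = 0.5
       [0.0, 0.0, -10.0]]   # unit 1 reacts strongly (negatively) to x_t
b_f = [0.0, 0.0]
f_t = forget_gate(W_f, b_f, h_prev=[0.5, -0.5], x_t=[1.0])
print(f_t)  # unit 0 keeps half of its memory, unit 1 forgets almost all of it
```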

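Input gate and candidate gate:
The input gate decides which information will be stored in the cell state; the candidate gate combines the information carried by the current input and by the past memory.

This step has two parts. A sigmoid layer, called the input gate, decides which values to update: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$. A tanh layer then creates a vector of candidate values that will be added to the cell state: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$.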
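A plain-Python sketch of the input gate and the candidate layer follows. Lists stand in for tensors, and the helper names (`affine`, `input_and_candidate`) and toy weights are hypothetical, chosen only so the result is easy to check by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W . v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def input_and_candidate(W_i, b_i, W_C, b_C, h_prev, x_t):
    concat = h_prev + x_t                                        # [h_{t-1}, x_t]
    i_t = [sigmoid(z) for z in affine(W_i, b_i, concat)]         # how much to write, in (0, 1)
    c_tilde = [math.tanh(z) for z in affine(W_C, b_C, concat)]   # what to write, in (-1, 1)
    return i_t, c_tilde

# Toy setting: 1 cell-state unit, 1 hidden value, 1 input feature.
i_t, c_tilde = input_and_candidate(
    W_i=[[0.0, 0.0]], b_i=[0.0],   # input gate half open: sigmoid(0) = 0.5
    W_C=[[0.0, 1.0]], b_C=[0.0],   # candidate = tanh(x_t)
    h_prev=[0.3], x_t=[2.0])
print(i_t, c_tilde)
```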

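Next, the cell state is updated. With the two steps above, the old information chosen for forgetting is dropped and the new information chosen for remembering is added: $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, where $\odot$ denotes element-wise multiplication.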
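The update rule is just element-wise arithmetic, sketched here in plain Python. The function name is hypothetical, and the gate values are set to exactly 0 or 1 purely for illustration; a real sigmoid gate only approaches these extremes.

```python
def update_cell_state(f_t, c_prev, i_t, c_tilde):
    """C_t = f_t * C_{t-1} + i_t * C~_t, applied element by element."""
    return [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]

# Unit 0: forget gate fully open, input gate closed -> old memory is kept unchanged.
# Unit 1: forget gate closed, input gate fully open -> memory is overwritten by the candidate.
c_t = update_cell_state(f_t=[1.0, 0.0], c_prev=[2.0, -1.0],
                        i_t=[0.0, 1.0], c_tilde=[0.5, 0.8])
print(c_t)  # [2.0, 0.8]
```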

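Output gate:
The output gate decides what information is output.

First a sigmoid layer decides which parts of the cell state to output, $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$. The cell state is then passed through tanh, and the two are multiplied to give the output: $h_t = o_t \odot \tanh(C_t)$.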
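Putting the four gates together, one LSTM time step can be sketched in plain Python as below. This is a minimal illustration, assuming a hypothetical parameter layout (a dict mapping each gate to a `(W, b)` pair); real implementations work on batches and typically compute all four gates with a single matrix multiplication.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps gate names to (W, b) pairs (hypothetical layout)."""
    concat = h_prev + x_t                                        # [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(*p["f"], concat)]          # forget gate
    i_t = [sigmoid(z) for z in affine(*p["i"], concat)]          # input gate
    c_tilde = [math.tanh(z) for z in affine(*p["c"], concat)]    # candidate values
    o_t = [sigmoid(z) for z in affine(*p["o"], concat)]          # output gate
    c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]           # h_t = o_t * tanh(C_t)
    return h_t, c_t

# With all-zero weights and biases every gate equals sigmoid(0) = 0.5 and the
# candidate equals tanh(0) = 0, so C_t = 0.5 * C_{t-1} and h_t = 0.5 * tanh(C_t).
zero = ([[0.0, 0.0]], [0.0])   # 1 unit; the gate input [h_{t-1}, x_t] has length 2
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step(x_t=[1.0], h_prev=[0.0], c_prev=[2.0], p=params)
print(h_t, c_t)
```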

Building an LSTM in TensorFlow

lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0,
               state_is_tuple=True, activation=None, reuse=None, name=None)
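As far as we understand TensorFlow 1.x's BasicLSTMCell, it keeps all four gates in one fused kernel of shape [input_depth + num_units, 4 * num_units] plus a bias of length 4 * num_units, and with state_is_tuple=True its state is a pair (c, h) of [batch_size, num_units] tensors. The helper below (a hypothetical name, not a TensorFlow function) only does that arithmetic, which is handy for sanity-checking model size.

```python
def basic_lstm_param_count(input_depth, num_units):
    """Weights + biases of one LSTM layer whose four gates share a fused kernel."""
    kernel = (input_depth + num_units) * (4 * num_units)  # [x_t, h_{t-1}] -> 4 gates
    bias = 4 * num_units
    return kernel + bias

# MNIST rows fed straight into the cell: 28 features per step, 128 units.
print(basic_lstm_param_count(28, 128))   # 80384
# Rows first projected to 128 dimensions by a linear layer, as the complete MNIST example does.
print(basic_lstm_param_count(128, 128))  # 131584
```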

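For example, in the MNIST experiment each input image is 28 × 28. Each row of the image is treated as one input and the LSTM is unrolled over 28 time steps; at every step the LSTM cell receives one row, a 28-dimensional vector.

The parameter num_units is the size of each step's output: with num_units = 128, the cell outputs a 128 × 1 vector at every step.
As shown in the figure below, the LSTM cell maps each 28-dimensional input to 128 dimensions. At the next step it takes the previous 128-dimensional output together with the new 28-dimensional input and maps them to a new 128-dimensional output. This continues until the last step, which outputs a 128-dimensional vector. The same cell, with the same weights, is applied at every step.
The LSTM input has the shape [batch_size, n_steps, n_inputs] (with time_major=False).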
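MNIST images usually arrive as flat 784-pixel vectors, so they must be cut into 28 rows before being fed to the LSTM. The plain-Python sketch below shows what a row-major reshape to [batch_size, n_steps, n_inputs] (the NumPy `reshape` used in the training loop) does; the helper names are hypothetical and the "images" are fake data made of pixel indices.

```python
N_STEPS, N_INPUTS = 28, 28  # 28 rows per image, 28 pixels per row

def image_to_sequence(flat_pixels):
    """Split a flat 784-pixel MNIST image into 28 time steps of 28 features."""
    assert len(flat_pixels) == N_STEPS * N_INPUTS
    return [flat_pixels[t * N_INPUTS:(t + 1) * N_INPUTS] for t in range(N_STEPS)]

def batch_to_lstm_input(flat_batch):
    """[batch_size, 784] -> nested lists shaped [batch_size, n_steps, n_inputs]."""
    return [image_to_sequence(img) for img in flat_batch]

# Two fake "images" whose pixel values are just their flat indices.
batch = batch_to_lstm_input([list(range(784)), list(range(784))])
print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 28 28
print(batch[0][1][0])  # 28: time step 1 starts with pixel 28, i.e. the second row
```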

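import tensorflow as tf  # TensorFlow 1.x

def inference(input_tensor):
    # input_tensor: [batch_size, n_steps, n_inputs]
    # n_hidden_units, n_classes and batch_size are globals defined in the full script
    with tf.variable_scope('lstm1'):
        # fully connected output layer: maps the last hidden state to class logits
        lstm1_weights_out = tf.get_variable("weight_out", [n_hidden_units, n_classes], initializer=tf.random_normal_initializer())
        lstm1_biases_out = tf.get_variable("bias_out", [n_classes], initializer=tf.constant_initializer(0.1))

        lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=True)

        # a single-layer stack; for several layers, create a new cell object per layer
        stack = tf.contrib.rnn.MultiRNNCell([lstm_cell], state_is_tuple=True)
        _init_state = stack.zero_state(batch_size, dtype=tf.float32)

        # outputs: [batch_size, n_steps, n_hidden_units]
        outputs, states = tf.nn.dynamic_rnn(stack, input_tensor, initial_state=_init_state, time_major=False)

        # transpose to [n_steps, batch_size, n_hidden_units], split by time step, keep the last one
        outputs = tf.unstack(tf.transpose(outputs, [1, 0, 2]))
        results = tf.matmul(outputs[-1], lstm1_weights_out) + lstm1_biases_out

    return results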

outputs,states = tf.nn.dynamic_rnn(stack, x_in, initial_state=_init_state, time_major=False)

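The resulting outputs tensor has shape [batch_size, 28, 128] (for example [1, 28, 128] for a single sample), i.e. one 128-dimensional output per time step. When cross entropy is used as the loss, only the output of the last time step (a 1 × 128 vector per sample) is fed into the fully connected layer.
When CTC is used as the loss, the outputs of all time steps take part in the computation.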
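Taking "the last time step" means indexing -1 on the time axis. The plain-Python sketch below checks that the transpose-then-unstack trick used in `inference` picks the same values as indexing the time axis directly (in TensorFlow, `outputs[:, -1, :]`). The helper names and numbers are hypothetical. One caveat: when variable-length batches are run with sequence_length, outputs past an example's length are zero, so the final state's h is the safer choice there.

```python
def last_step_via_transpose(outputs):
    """[batch][time][units] -> transpose to [time][batch][units] -> take time step -1."""
    time_major = [list(step) for step in zip(*outputs)]
    return time_major[-1]

def last_step_direct(outputs):
    """Same result without transposing: pick index -1 on the time axis per example."""
    return [example[-1] for example in outputs]

# batch_size=2, n_steps=3, units=2; values encode (example, step) for readability
outputs = [[[0, 0], [0, 1], [0, 2]],
           [[1, 0], [1, 1], [1, 2]]]
print(last_step_via_transpose(outputs))  # [[0, 2], [1, 2]]
print(last_step_direct(outputs))         # [[0, 2], [1, 2]]
```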

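MultiRNNCell: stacking multiple RNN layers
A single RNN layer often has limited capacity, so we need several layers. The input x goes through the first RNN layer to produce the hidden state h; that hidden state is the input to the second RNN layer, whose hidden state is in turn the input to the third layer, and so on.

In TensorFlow this is done with tf.nn.rnn_cell.MultiRNNCell:
declare the cells,
pass MultiRNNCell a list of num_layers cells, with a separate cell object for each layer,
and, for LSTM cells, set state_is_tuple=True.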

# one separate GRU cell object per layer
layers = [tf.nn.rnn_cell.GRUCell(num_hidden) for _ in range(num_layers)]
# Stack the RNN cells
stack = tf.nn.rnn_cell.MultiRNNCell(layers, state_is_tuple=True)
init_state = stack.zero_state(batch_size, dtype=tf.float32)
# The second return value is the final state; it is not used here
outputs, states = tf.nn.dynamic_rnn(stack, x_in, initial_state=init_state, time_major=False)
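The stacking idea can be sketched without TensorFlow: each layer turns a sequence into a sequence of hidden states, and the next layer consumes that sequence. The scalar tanh RNN and the helper names below are hypothetical simplifications. MultiRNNCell advances all layers one time step at a time, whereas this sketch runs each layer over the whole sequence; for a plain stack the results are the same.

```python
import math

def rnn_layer(xs, w_x, w_h):
    """Run a scalar tanh RNN over a sequence and return its hidden state at every step."""
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs

def stacked_rnn(xs, layer_weights):
    """Each layer consumes the full hidden-state sequence of the layer below it."""
    seq = xs
    for w_x, w_h in layer_weights:   # one (w_x, w_h) pair per layer: weights are NOT shared
        seq = rnn_layer(seq, w_x, w_h)
    return seq                       # hidden states of the top layer

xs = [0.5, -1.0, 2.0]
# With w_h = 0 each layer is just tanh(input), so the top layer outputs tanh(tanh(x)).
top = stacked_rnn(xs, layer_weights=[(1.0, 0.0), (1.0, 0.0)])
print(top)
```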

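Note that layers must not be built by repeating a single cell object:
cell = tf.nn.rnn_cell.GRUCell(num_hidden)
layers = [cell] * num_layers   # wrong: every layer is the same cell object
Every list element is then the same cell, which either ties the weights of all layers together or, when the layers' input sizes differ, fails with a variable-shape error.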
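The root cause is Python's list repetition: `[x] * n` repeats a reference to one object rather than creating copies. The sketch below uses a hypothetical `ToyCell` class in place of a TensorFlow cell to show the difference between repetition and a list comprehension.

```python
class ToyCell:
    """Hypothetical stand-in for an RNN cell; each instance would own its own weights."""
    def __init__(self, num_hidden):
        self.num_hidden = num_hidden

num_layers = 3

shared = [ToyCell(128)] * num_layers                  # one object, repeated 3 times
separate = [ToyCell(128) for _ in range(num_layers)]  # 3 distinct objects

print(all(c is shared[0] for c in shared))            # True
print(len({id(c) for c in separate}))                 # 3
```

A list comprehension (or a factory function called once per layer) is therefore the reliable way to get one cell, and one set of weights, per layer.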

It is better to wrap cell creation in a function, so that every call returns a new cell:

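def lstm_cell(is_training):
    # a new cell object (and hence its own weights) on every call; num_hidden is defined elsewhere
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden)

    if is_training:
        # the 2nd positional argument of DropoutWrapper is input_keep_prob: keep 50% of the inputs
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=0.5)
    return cell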

Then call this function to build the layers of the stack:

layers = [lstm_cell(is_training) for _ in range(num_layers)]
# Stack the RNN cells
stack = tf.nn.rnn_cell.MultiRNNCell(layers, state_is_tuple=True)

# No initial_state is given, so dtype is required and the state starts at zero.
# The second return value is the final state; it is not used here.
outputs, _ = tf.nn.dynamic_rnn(stack, lstm_input, sequence_length=seq_len, dtype=tf.float32)

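tf.nn.dynamic_rnn: running many steps at once
Calling a single RNNCell (its call method) advances the sequence by only one time step, for example computing h1 from x1 and h0, then h2 from x2 and h1. If the sequence has length 10, the cell has to be called 10 times, which is tedious. TensorFlow therefore provides tf.nn.dynamic_rnn, which is equivalent to calling the cell n times: from {h0, x1, x2, …, xn} it directly produces {h1, h2, …, hn}.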

# inputs: shape = (batch_size, time_steps, input_size)
# cell: an RNNCell instance
# initial_state: shape = (batch_size, cell.state_size); the initial state, usually all zeros
outputs, state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
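Conceptually, "calling the cell n times" is just a loop. The plain-Python sketch below is not how dynamic_rnn is implemented internally (it builds a TensorFlow while loop); the function `unroll` is hypothetical, and the "cell" is any function (h_prev, x_t) -> h_t rather than a TensorFlow RNNCell.

```python
def unroll(cell, h0, inputs):
    """Call a one-step cell once per time step: {h0, x1..xn} -> {h1..hn}."""
    h, hs = h0, []
    for x_t in inputs:
        h = cell(h, x_t)   # one call = one step forward in time
        hs.append(h)
    return hs, h           # like (outputs, final_state)

# A toy "cell" whose state is the running sum of its inputs.
outputs, state = unroll(lambda h, x: h + x, h0=0, inputs=[1, 2, 3])
print(outputs, state)  # [1, 3, 6] 6
```

For this toy cell the output equals the state; for an LSTM the outputs are the h values, while the state is the pair (c, h).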

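When an RNN processes sequences, each batch is usually represented as a 3-D tensor of shape (batch_size, time_steps, input_dim), so every sample in the batch is a (time_steps, input_dim) matrix and all samples must have the same number of time steps. When the input sequences have different lengths, they must be padded to a common length.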

Handling variable-length sequences

import numpy as np

def pad_sequences(sequences, maxlen=None, dtype=np.float32,
                  padding='post', truncating='post', value=0.):
    """Pad (or truncate) variable-length sequences to a common length.

    Returns the padded array of shape (nb_samples, maxlen) + sample_shape and the
    effective length of each sequence (usable as sequence_length in dynamic_rnn).
    """
    lengths = np.asarray([len(s) for s in sequences], dtype=np.int64)

    nb_samples = len(sequences)
    if maxlen is None:
        maxlen = np.max(lengths)

    # take the sample shape from the first non-empty sequence;
    # consistency is checked in the main loop below
    sample_shape = tuple()
    for s in sequences:
        if len(s) > 0:
            sample_shape = np.asarray(s).shape[1:]
            break

    x = (np.ones((nb_samples, maxlen) + sample_shape) * value).astype(dtype)
    for idx, s in enumerate(sequences):
        if len(s) == 0:
            continue  # empty sequence: leave the row filled with the padding value
        if truncating == 'pre':
            trunc = s[-maxlen:]
        elif truncating == 'post':
            trunc = s[:maxlen]
        else:
            raise ValueError('Truncating type "%s" not understood' % truncating)

        # check that `trunc` has the expected shape
        trunc = np.asarray(trunc, dtype=dtype)
        if trunc.shape[1:] != sample_shape:
            raise ValueError('Shape of sample %s of sequence at position %s is different from expected shape %s' %
                             (trunc.shape[1:], idx, sample_shape))

        if padding == 'post':
            x[idx, :len(trunc)] = trunc
        elif padding == 'pre':
            x[idx, -len(trunc):] = trunc
        else:
            raise ValueError('Padding type "%s" not understood' % padding)

    # sequences longer than maxlen were truncated, so cap the reported lengths
    return x, np.minimum(lengths, maxlen)



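dynamic_rnn has a parameter, sequence_length, that specifies the effective length of each example. For instance, for a batch of two examples padded to length 20, setting sequence_length to [20, 13] means the first example has an effective length of 20 and the second an effective length of 13. When this parameter is passed, TensorFlow skips the computation for the second example's padding after step 13: its state (last_states) keeps the value from step 13 all the way to step 20, and its outputs after step 13 are set to zero.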
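The behavior described in that paragraph can be simulated in plain Python. This is only a sketch of the semantics (state frozen and outputs zeroed past each example's length), not TensorFlow's implementation; the function name is hypothetical and the running-sum "cell" is a toy stand-in.

```python
def unroll_with_lengths(cell, h0, batch, lengths):
    """Simulate sequence_length: stop updating the state after each example's
    length and write zeros to the outputs of the padded steps."""
    all_outputs, final_states = [], []
    for xs, n in zip(batch, lengths):
        h, outs = h0, []
        for t, x_t in enumerate(xs):
            if t < n:
                h = cell(h, x_t)
                outs.append(h)
            else:
                outs.append(0)  # padded step: output set to zero, state h left unchanged
        all_outputs.append(outs)
        final_states.append(h)  # state of the last valid step
    return all_outputs, final_states

# Two sequences padded to length 4; the second one really has only 2 steps.
batch = [[1, 2, 3, 4],
         [1, 2, 0, 0]]
outputs, states = unroll_with_lengths(lambda h, x: h + x, 0, batch, lengths=[4, 2])
print(outputs)  # [[1, 3, 6, 10], [1, 3, 0, 0]]
print(states)   # [10, 3]
```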


from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf  # TensorFlow 1.x

# parameters init
l_r = 0.001
training_iters = 100000
batch_size = 128

n_inputs = 28         # features per time step (one image row of 28 pixels)
n_steps = 28          # sequence length (28 rows per image)
n_hidden_units = 128  # size of the LSTM output / hidden state
n_classes = 10
tf.reset_default_graph()

def inference(input_tensor):
    with tf.variable_scope('lstm1'):
        # A linear layer before the LSTM maps each 28-dim input row to n_hidden_units dimensions
        lstm1_weights_in = tf.get_variable("weight_in", [n_inputs, n_hidden_units], initializer=tf.random_normal_initializer())
        lstm1_biases_in = tf.get_variable("bias_in", [n_hidden_units], initializer=tf.constant_initializer(0.1))

        lstm1_weights_out = tf.get_variable("weight_out", [n_hidden_units, n_classes], initializer=tf.random_normal_initializer())
        lstm1_biases_out = tf.get_variable("bias_out", [n_classes], initializer=tf.constant_initializer(0.1))

        # read the batch size from the input so the graph also accepts batches of other sizes
        current_batch_size = tf.shape(input_tensor)[0]

        # [batch, n_steps, n_inputs] -> [batch * n_steps, n_inputs] -> linear layer -> back to 3-D
        input_tensor = tf.reshape(input_tensor, [-1, n_inputs])
        x_in = tf.matmul(input_tensor, lstm1_weights_in) + lstm1_biases_in
        x_in = tf.reshape(x_in, [-1, n_steps, n_hidden_units])

        # Define one LSTM cell as the basic recurrent unit
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=True)
        _init_state = lstm_cell.zero_state(current_batch_size, dtype=tf.float32)
        outputs, states = tf.nn.dynamic_rnn(lstm_cell, x_in, initial_state=_init_state, time_major=False)

        # Fully connected output layer on the last time step.
        # Equivalent alternative: states is an LSTMStateTuple (c, h) and states[1] (= h) is the last output:
        # results = tf.matmul(states[1], lstm1_weights_out) + lstm1_biases_out
        outputs = tf.unstack(tf.transpose(outputs, [1, 0, 2]))
        results = tf.matmul(outputs[-1], lstm1_weights_out) + lstm1_biases_out

    return results
    

# load MNIST data
mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)

# define placeholders for the input images and the one-hot labels
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y_ = tf.placeholder(tf.float32, [None, n_classes])

y = inference(x)
# the sparse loss expects class indices, so the one-hot labels are converted with argmax
cost = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=y))

train_op = tf.train.AdamOptimizer(l_r).minimize(cost)

correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# create a session and initialize all variables
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# start training
for i in range(training_iters):
    # fetch the next batch and reshape each 784-pixel image into 28 steps of 28 features
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    batch_x = batch_x.reshape([batch_size, n_steps, n_inputs])
    sess.run(train_op, feed_dict={x: batch_x, y_: batch_y})
    if i % 50 == 0:
        # accuracy on the current training batch
        print(sess.run(accuracy, feed_dict={x: batch_x, y_: batch_y}))

# Test-set evaluation: labels are fed to y_ (not to the logits y). This feeds all
# 10,000 test images at once, so the initial state must not be built with the fixed batch_size.
#test_data = mnist.test.images.reshape([-1, n_steps, n_inputs])
#test_label = mnist.test.labels
#print("Testing Accuracy: ", sess.run(accuracy, feed_dict={x: test_data, y_: test_label}))

