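An LSTM is essentially an RNN. The main difference is that it adds a cell state to the basic RNN structure: an information conveyor belt that stores memory and lets the network handle long-range context dependencies.
LSTM network structure
Cell state
The core of the LSTM is the cell state, the horizontal line along the top of the figure below. It can be thought of as the memory of the whole model, and it changes over time. The conveyor belt itself cannot decide which information is remembered; that is controlled by the gate structures beneath it: the forget gate, the input gate, the candidate gate, and the output gate.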
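Forget gate:
The forget gate controls which information is forgotten, and is implemented with a standard sigmoid activation.
Here h_{t-1} is the output of the previous time step and x_t is the current input. The two are combined linearly and passed through the sigmoid, giving an output between 0 and 1. The closer the value is to 0, the more of the stored memory is discarded; the closer it is to 1, the more is kept.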
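Input gate and candidate gate:
The input gate determines what information will be stored in the cell state. The candidate gate computes a combination of the information in the current input and in the past memory.
This step has two parts. A sigmoid layer, called the input gate, decides which values to update. A tanh layer then creates a vector of candidate values that may be added to the cell state.
Next, the cell state is updated: the old state is scaled by the forget gate, and the candidate values scaled by the input gate are added to it. After these two steps, the old information chosen for forgetting is gone and the new information chosen for remembering has been added.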
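Output gate:
The output gate decides what information is output.
First a sigmoid layer decides which parts of the cell state to output. The cell state is then passed through tanh, and the two results are multiplied element-wise to give the output.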
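The four gates described above can be put together into a single time step. The following is a minimal NumPy sketch, not TensorFlow's implementation: the function name `lstm_step`, the packing of all parameters into one matrix `W`, and the order of the four blocks are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step (illustrative sketch).

    W has shape (hidden_dim + input_dim, 4 * hidden_dim), b has shape
    (4 * hidden_dim,). The four column blocks are assumed to hold the
    forget, input, candidate and output parameters, in that order.
    """
    n = h_prev.shape[-1]
    # Linear combination of the previous output and the current input
    z = np.concatenate([h_prev, x_t], axis=-1) @ W + b
    f = sigmoid(z[..., 0 * n:1 * n])   # forget gate: what to drop from c_prev
    i = sigmoid(z[..., 1 * n:2 * n])   # input gate: which values to update
    g = np.tanh(z[..., 2 * n:3 * n])   # candidate values
    o = sigmoid(z[..., 3 * n:4 * n])   # output gate: what to expose
    c_t = f * c_prev + i * g           # updated cell state
    h_t = o * np.tanh(c_t)             # output of this step
    return h_t, c_t

rng = np.random.default_rng(0)
batch, input_dim, hidden_dim = 2, 28, 128
W = rng.normal(scale=0.1, size=(hidden_dim + input_dim, 4 * hidden_dim))
b = np.zeros(4 * hidden_dim)
h0 = np.zeros((batch, hidden_dim))
c0 = np.zeros((batch, hidden_dim))
x1 = rng.normal(size=(batch, input_dim))
h1, c1 = lstm_step(x1, h0, c0, W, b)
```

With all parameters set to zero, both sigmoid gates evaluate to 0.5 and the candidate to 0, so the step simply halves the previous cell state. This is a quick way to see the forget gate acting as a multiplier on memory.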
Building an LSTM in TensorFlow
lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0,
                                         state_is_tuple=True, activation=None, reuse=None, name=None)
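For example, in the MNIST experiment each input image is 28 × 28. Each row of the image is treated as one input, so the LSTM is unrolled over 28 time steps, and at each step the cell receives one row, a 28-dimensional vector.
The num_units parameter sets the size of the cell's hidden state and output. With num_units=128, every step outputs a 128-dimensional vector.
As shown in the figure below, the LSTM cell maps each 28-dimensional input to 128 dimensions. At the next step, the cell receives the previous 128-dimensional output together with the new 28-dimensional input and maps them to a new 128-dimensional output. This continues until the last step, which outputs the final 128-dimensional vector.
With time_major=False, the LSTM input has the shape [batch_size, n_steps, n_inputs].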
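As a small illustration of that layout, the sketch below reshapes flattened 784-pixel images into [batch_size, n_steps, n_inputs]. The array `flat_images` is placeholder data standing in for a real MNIST batch.

```python
import numpy as np

batch_size, n_steps, n_inputs = 4, 28, 28

# Placeholder for a batch of flattened 28 x 28 images (784 values each)
flat_images = np.arange(batch_size * 784, dtype=np.float32).reshape(batch_size, 784)

# Row t of each image becomes the input vector at time step t
batch_x = flat_images.reshape(batch_size, n_steps, n_inputs)

first_row = batch_x[0, 0]    # pixels 0..27 of the first image
second_row = batch_x[0, 1]   # pixels 28..55 of the first image
```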
def inference(input_tensor):
    with tf.variable_scope('lstm1'):
        # Output projection: hidden state -> class logits
        lstm1_weights_out = tf.get_variable("weight_out", [n_hidden_units, n_classes], initializer=tf.random_normal_initializer())
        lstm1_biases_out = tf.get_variable("bias_out", [n_classes], initializer=tf.constant_initializer(0.1))
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=True)
        stack = tf.contrib.rnn.MultiRNNCell([lstm_cell] * 1,
                                            state_is_tuple=True)
        _init_state = stack.zero_state(batch_size, dtype=tf.float32)
        # outputs has shape [batch_size, n_steps, n_hidden_units]
        outputs, states = tf.nn.dynamic_rnn(stack, input_tensor, initial_state=_init_state, time_major=False)
        # Switch to time-major and split per step, so outputs[-1] is the last step
        outputs = tf.unstack(tf.transpose(outputs, [1, 0, 2]))
        results = tf.matmul(outputs[-1], lstm1_weights_out) + lstm1_biases_out
    return results
outputs,states = tf.nn.dynamic_rnn(stack, x_in, initial_state=_init_state, time_major=False)
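The resulting outputs tensor has shape [batch_size, 28, 128], that is [1, 28, 128] for a single sample. With cross-entropy as the loss function, only the output of the last time step (a 1 × 128 vector per sample) is fed to the fully connected layer.
With CTC as the loss function, the output of every time step takes part in the computation.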
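The last-step selection can be checked in NumPy. This sketch uses random placeholder data in place of real LSTM outputs and compares the transpose-then-unstack route used in the code above with a direct slice along the time axis.

```python
import numpy as np

batch_size, n_steps, n_hidden_units = 3, 28, 128
rng = np.random.default_rng(0)
# Placeholder for the outputs tensor, shape [batch_size, n_steps, n_hidden_units]
outputs = rng.normal(size=(batch_size, n_steps, n_hidden_units))

# Route 1: mirrors tf.unstack(tf.transpose(outputs, [1, 0, 2]))[-1]
time_major = np.transpose(outputs, (1, 0, 2))   # [n_steps, batch_size, n_hidden_units]
last_via_unstack = list(time_major)[-1]

# Route 2: slice the last time step directly
last_via_slice = outputs[:, -1, :]
```

Both routes give a [batch_size, n_hidden_units] array. For CTC the whole [batch_size, n_steps, n_hidden_units] tensor would be kept instead.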
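MultiRNNCell: stacking multiple RNN layers
A single RNN layer often has limited capacity, so we need several layers. Feeding x into the first RNN layer yields a hidden state h. This hidden state serves as the input of the second layer, whose hidden state in turn serves as the input of the third layer, and so on.
In TensorFlow this is implemented with tf.nn.rnn_cell.MultiRNNCell:
Create the cells, one per layer.
Pass the list of num_layers cells to MultiRNNCell.
For LSTM cells, set state_is_tuple=True.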
layers = [tf.nn.rnn_cell.GRUCell(num_hidden) for _ in range(num_layers)]
# Stack the RNN cells
stack = tf.contrib.rnn.MultiRNNCell(layers,
                                    state_is_tuple=True)
init_state = stack.zero_state(batch_size, dtype=tf.float32)
# The second return value is the final state, which is not used here
outputs, states = tf.nn.dynamic_rnn(stack, x_in, initial_state=init_state, time_major=False)
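Here layers must not be built by repeating a single cell object:
cell = tf.nn.rnn_cell.GRUCell(num_hidden)
layers = [cell] * num_layers
because every entry of that list would be the same cell object.
It is better to wrap the construction of a cell in a function: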
def lstm_cell(is_trainning):
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden)
    if is_trainning:
        # Apply dropout (input keep probability 0.5) only during training
        cell = tf.nn.rnn_cell.DropoutWrapper(cell, input_keep_prob=0.5)
    return cell
Then call this function to build the layers:
layers = [lstm_cell(is_trainning) for _ in range(num_layers)]
# Stack the RNN cells
stack = tf.nn.rnn_cell.MultiRNNCell(layers,
                                    state_is_tuple=True)
# init_state = stack.zero_state(batch_size, dtype=tf.float32)
# The second return value is the final state, which is not used here
outputs, _ = tf.nn.dynamic_rnn(stack, lstm_input, sequence_length=seq_len, dtype=tf.float32)
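The reason for building each layer with its own call comes down to plain Python list semantics, which can be shown without TensorFlow. In the sketch below, `DummyCell` is a hypothetical stand-in for an RNN cell that owns some weights; it is not a TensorFlow class.

```python
class DummyCell:
    """Hypothetical stand-in for an RNN cell; it only owns a weight list."""
    def __init__(self, num_hidden):
        self.weights = [0.0] * num_hidden

num_layers = 3

# List multiplication repeats a reference to ONE object
shared = [DummyCell(4)] * num_layers
# A comprehension calls the constructor once per layer
separate = [DummyCell(4) for _ in range(num_layers)]

shared[0].weights[0] = 1.0     # visible through every entry of `shared`
separate[0].weights[0] = 1.0   # affects the first layer only

shared_ids = len({id(c) for c in shared})        # number of distinct objects
separate_ids = len({id(c) for c in separate})
```

Here `shared` holds one object three times, while `separate` holds three independent objects, which is what a stack of distinct layers needs.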
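tf.nn.dynamic_rnn: running many steps in one call
For a single RNNCell, invoking its call function advances only one step along the sequence: x1 and h0 give h1, x2 and h1 give h2, and so on. With a sequence of length 10 we would have to invoke call 10 times, which is tedious. TensorFlow therefore provides tf.nn.dynamic_rnn, which is equivalent to invoking call n times: from {h0, x1, x2, ..., xn} it directly produces {h1, h2, ..., hn}.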
# inputs: shape = (batch_size, time_steps, input_size)
# cell: an RNNCell
# initial_state: shape = (batch_size, cell.state_size); the initial state, usually all zeros
outputs, state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
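Conceptually, the call above replaces a hand-written loop over time steps. The NumPy sketch below shows such a loop for a plain tanh RNN cell. The names `rnn_cell` and `unroll` are made up for this illustration, and the real function does considerably more (graph-side looping, sequence lengths, memory swapping).

```python
import numpy as np

def rnn_cell(x_t, h_prev, W_x, W_h, b):
    """One step of a plain tanh RNN cell (stand-in for an RNNCell's call)."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def unroll(inputs, h0, W_x, W_h, b):
    """Apply the cell at every time step of inputs (batch, time, input_size)."""
    h = h0
    step_outputs = []
    for t in range(inputs.shape[1]):
        h = rnn_cell(inputs[:, t, :], h, W_x, W_h, b)   # h_{t-1}, x_t -> h_t
        step_outputs.append(h)
    # outputs: (batch, time, hidden_size); h: state after the last step
    return np.stack(step_outputs, axis=1), h

rng = np.random.default_rng(0)
batch_size, time_steps, input_size, hidden_size = 2, 10, 5, 8
W_x = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)
inputs = rng.normal(size=(batch_size, time_steps, input_size))
h0 = np.zeros((batch_size, hidden_size))
outputs, state = unroll(inputs, h0, W_x, W_h, b)
```

For this simple cell the final state equals the last slice of outputs along the time axis.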
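When an RNN processes sequences, each input batch is usually represented as a 3-D tensor of shape (batch_size, timestep, input_dim), so every sample in the batch is a (timestep, input_dim) matrix and all samples must have the same number of time steps. When the input sequences have different lengths, they must be padded to a common length.
Handling variable-length sequences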
import numpy as np

def pad_sequences(sequences, maxlen=None, dtype=np.float32,
                  padding='post', truncating='post', value=0.):
    """Pad every sequence to the same length; return the padded array and the original lengths."""
    lengths = np.asarray([len(s) for s in sequences], dtype=np.int64)
    nb_samples = len(sequences)
    if maxlen is None:
        maxlen = np.max(lengths)
    # take the sample shape from the first non empty sequence
    # checking for consistency in the main loop below.
    sample_shape = tuple()
    for s in sequences:
        if len(s) > 0:
            sample_shape = np.asarray(s).shape[1:]
            break
    # output array, pre-filled with the padding value
    x = (np.ones((nb_samples, maxlen) + sample_shape) * value).astype(dtype)
    for idx, s in enumerate(sequences):
        if len(s) == 0:
            continue  # empty list was found
        if truncating == 'pre':
            trunc = s[-maxlen:]
        elif truncating == 'post':
            trunc = s[:maxlen]
        else:
            raise ValueError('Truncating type "%s" not understood' % truncating)
        # check `trunc` has expected shape
        trunc = np.asarray(trunc, dtype=dtype)
        if trunc.shape[1:] != sample_shape:
            raise ValueError('Shape of sample %s of sequence at position %s is different from expected shape %s' %
                             (trunc.shape[1:], idx, sample_shape))
        if padding == 'post':
            x[idx, :len(trunc)] = trunc
        elif padding == 'pre':
            x[idx, -len(trunc):] = trunc
        else:
            raise ValueError('Padding type "%s" not understood' % padding)
    return x, lengths
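dynamic_rnn has a sequence_length parameter that specifies the valid length of each example. For instance, for a batch of two examples padded to 20 steps, sequence_length=[20, 13] means the first example has a valid length of 20 and the second a valid length of 13. When this parameter is passed, TensorFlow does not compute anything for the padding after step 13 of the second example: its final state is the state at step 13, carried unchanged through step 20, and its outputs beyond step 13 are set to zero.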
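The two effects described above, zeroed outputs and a carried-through state, can be imitated in a few lines of NumPy. This is only a sketch of the semantics with a plain tanh RNN cell; the function name `unroll_with_lengths` and all weights are assumptions, and it says nothing about how TensorFlow implements the behaviour internally.

```python
import numpy as np

def unroll_with_lengths(inputs, lengths, h0, W_x, W_h, b):
    """Unroll a tanh RNN, honouring a per-example valid length."""
    h = h0
    step_outputs = []
    for t in range(inputs.shape[1]):
        h_new = np.tanh(inputs[:, t, :] @ W_x + h @ W_h + b)
        active = (t < lengths)[:, None]                    # (batch, 1) mask
        h = np.where(active, h_new, h)                     # past the end: keep old state
        step_outputs.append(np.where(active, h_new, 0.0))  # past the end: output zeros
    return np.stack(step_outputs, axis=1), h

rng = np.random.default_rng(0)
batch_size, max_len, input_size, hidden_size = 2, 20, 5, 8
W_x = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.full(hidden_size, 0.1)
inputs = rng.normal(size=(batch_size, max_len, input_size))
inputs[1, 13:] = 0.0                                       # padding of the second example
lengths = np.array([20, 13])
h0 = np.zeros((batch_size, hidden_size))
outputs, last_state = unroll_with_lengths(inputs, lengths, h0, W_x, W_h, b)
```

For the second example, `outputs[1, 13:]` is all zeros and `last_state[1]` equals `outputs[1, 12]`, the output of its last valid step.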
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
# parameters init
l_r = 0.001
training_iters = 100000
batch_size = 128
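n_inputs = 28  # size of the feature vector at each time step (one image row)
n_steps = 28  # length of the time sequence (number of image rows)
n_hidden_units = 128  # size of the LSTM hidden state / output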
n_classes = 10
tf.reset_default_graph()
def inference(input_tensor):
    with tf.variable_scope('lstm1'):
        # A linear layer before the LSTM maps the input to the same dimension as hidden_units
        lstm1_weights_in = tf.get_variable("weight_in", [n_inputs, n_hidden_units], initializer=tf.random_normal_initializer())
        lstm1_biases_in = tf.get_variable("bias_in", [n_hidden_units], initializer=tf.constant_initializer(0.1))
        lstm1_weights_out = tf.get_variable("weight_out", [n_hidden_units, n_classes], initializer=tf.random_normal_initializer())
        lstm1_biases_out = tf.get_variable("bias_out", [n_classes], initializer=tf.constant_initializer(0.1))
        input_tensor = tf.reshape(input_tensor, [-1, n_inputs])
        x_in = tf.matmul(input_tensor, lstm1_weights_in) + lstm1_biases_in
        x_in = tf.reshape(x_in, [-1, n_steps, n_hidden_units])
        # Define one LSTM cell as the basic unit of the recurrence
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units, forget_bias=1.0, state_is_tuple=True)
        _init_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
        outputs, states = tf.nn.dynamic_rnn(lstm_cell, x_in, initial_state=_init_state, time_major=False)
        #hidden layer for output as the final results
        #results = tf.matmul(states[1], lstm1_weights_out) + lstm1_biases_out
        # or
        outputs = tf.unstack(tf.transpose(outputs, [1, 0, 2]))
        results = tf.matmul(outputs[-1], lstm1_weights_out) + lstm1_biases_out
    return results
#load mnist data
mnist = input_data.read_data_sets("../../../datasets/MNIST_data", one_hot=True)
#define placeholder for input
x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y_ = tf.placeholder(tf.float32, [None, n_classes])
y = inference(x)
cost = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_, 1), logits=y))
train_op = tf.train.AdamOptimizer(l_r).minimize(cost)
correct_pred = tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32))
#init session
sess = tf.Session()
#init all variables
sess.run(tf.global_variables_initializer())
#start training
for i in range(training_iters):
    #get batch to learn easily
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    batch_x = batch_x.reshape([batch_size, n_steps, n_inputs])
    sess.run(train_op, feed_dict={x: batch_x, y_: batch_y})
    if i % 50 == 0:
        print(sess.run(accuracy, feed_dict={x: batch_x, y_: batch_y}))
#test_data = mnist.test.images.reshape([-1, n_steps, n_inputs])
#test_label = mnist.test.labels
#print("Testing Accuracy: ", sess.run(accuracy, feed_dict={x: test_data, y_: test_label}))