1. LSTM networks
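An LSTM can be thought of as an upgraded RNN.
Long Short Term Memory networks (LSTMs for short) are a special kind of RNN designed to solve the long-term dependency problem. They were introduced by Hochreiter & Schmidhuber (1997), and many people have since refined and popularized them. LSTMs have been applied to a wide variety of problems and are still widely used today.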
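All recurrent neural networks take the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer. The standard RNN is shown in the figure below.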
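To make the "single tanh layer" concrete, here is a minimal sketch of one repeating module and the chain built from it. It uses plain Python lists rather than NumPy so that it runs without dependencies. The names `rnn_step`, `rnn_forward`, `W_x`, `W_h` and `b`, and the toy weights, are illustrative assumptions, not part of the original tutorial.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One repeating module: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        z += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_t.append(math.tanh(z))
    return h_t

def rnn_forward(xs, h0, W_x, W_h, b):
    """Apply the same module at every time step, passing the hidden state along the chain."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# Toy, hand-picked weights (hypothetical): 1 input feature, 2 hidden units.
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]
states = rnn_forward([[1.0], [0.0], [-1.0]], [0.0, 0.0], W_x, W_h, b)
```

The same weights are reused at every step; only the hidden state changes as it moves along the chain.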
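LSTMs also have this chain-like structure, but the repeating unit is different: where the unit in a standard RNN has only one network layer, the LSTM unit contains four. The structure of an LSTM is shown in the figure below.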
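The text does not name the four layers. In the commonly used formulation they are the forget gate, the input gate, the candidate layer and the output gate, and the sketch below assumes that formulation. It is written in plain Python for illustration only; `dense`, `make_params`, `lstm_step` and the parameter names `W_f`, `W_i`, `W_g`, `W_o` are hypothetical, and the constant weights are chosen only to keep the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, b, v, act):
    """One network layer: act(W v + b), computed row by row."""
    return [act(sum(w * x for w, x in zip(row, v)) + b_i) for row, b_i in zip(W, b)]

def make_params(hidden, inputs, value):
    """Hypothetical helper: fill every weight with one constant and every bias with 0."""
    params = {}
    for name in ("f", "i", "g", "o"):
        params["W_" + name] = [[value] * (hidden + inputs) for _ in range(hidden)]
        params["b_" + name] = [0.0] * hidden
    return params

def lstm_step(x_t, h_prev, c_prev, params):
    v = h_prev + x_t  # list concatenation of [h_{t-1}, x_t]
    f = dense(params["W_f"], params["b_f"], v, sigmoid)    # layer 1: forget gate
    i = dense(params["W_i"], params["b_i"], v, sigmoid)    # layer 2: input gate
    g = dense(params["W_g"], params["b_g"], v, math.tanh)  # layer 3: candidate values
    o = dense(params["W_o"], params["b_o"], v, sigmoid)    # layer 4: output gate
    # Pointwise operations: forget part of the old cell state, add the gated candidates.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

# Run the same cell over a short sequence: 1 input feature, 2 hidden units.
params = make_params(2, 1, 0.1)
h, c = [0.0, 0.0], [0.0, 0.0]
for x_t in [[1.0], [0.5], [-1.0]]:
    h, c = lstm_step(x_t, h, c, params)
```

Unlike the plain RNN unit, the cell carries two vectors between steps: the hidden state `h` and the cell state `c`.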
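Before explaining the detailed structure of LSTMs, we first define the meaning of each symbol in the figure. The symbols are the following:
In the figure, yellow boxes denote network layers that apply an activation function (comparable to the activation operations in a CNN), pink circles denote pointwise operations, single arrows show the direction of data flow, merging arrows denote vector concatenation (concat), and forking arrows denote copying a vector.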
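2. As mentioned earlier, RNNs have achieved good results. Many of those results were obtained with LSTMs, which suggests that LSTMs suit most sequence-modeling applications.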
3. Code implementation
# Note: all tutorial code runs under Python 3.5.
# If you use another version, such as Python 2.7, modify the code accordingly.
# 8 - RNN LSTM Regressor example
# To try the TensorFlow backend, uncomment the following two lines
# import os
# os.environ['KERAS_BACKEND']='tensorflow'
import numpy as np
np.random.seed(1337) # for reproducibility
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense
from keras.optimizers import Adam
BATCH_START = 0
TIME_STEPS = 20
BATCH_SIZE = 50
INPUT_SIZE = 1
OUTPUT_SIZE = 1
CELL_SIZE = 20
LR = 0.006
def get_batch():
    global BATCH_START, TIME_STEPS
    # xs shape (50batch, 20steps)
    xs = np.arange(BATCH_START, BATCH_START+TIME_STEPS*BATCH_SIZE).reshape((BATCH_SIZE, TIME_STEPS)) / (10*np.pi)
    seq = np.sin(xs)
    res = np.cos(xs)
    BATCH_START += TIME_STEPS
    # plt.plot(xs[0, :], res[0, :], 'r', xs[0, :], seq[0, :], 'b--')
    # plt.show()
    return [seq[:, :, np.newaxis], res[:, :, np.newaxis], xs]
model = Sequential()
# build a LSTM RNN
model.add(LSTM(
    batch_input_shape=(BATCH_SIZE, TIME_STEPS, INPUT_SIZE),  # Or: input_dim=INPUT_SIZE, input_length=TIME_STEPS,
    units=CELL_SIZE,            # this argument was called output_dim in Keras 1.x
    return_sequences=True,      # True: output at every step. False: output at the last step only.
    stateful=True,              # True: the final state of batch 1 is fed in as the initial state of batch 2
))
# add output layer
model.add(TimeDistributed(Dense(OUTPUT_SIZE)))
adam = Adam(LR)
model.compile(optimizer=adam,
              loss='mse',)
print('Training ------------')
for step in range(501):
    # data shape = (batch_num, steps, inputs/outputs)
    X_batch, Y_batch, xs = get_batch()
    cost = model.train_on_batch(X_batch, Y_batch)
    pred = model.predict(X_batch, BATCH_SIZE)
    plt.plot(xs[0, :], Y_batch[0].flatten(), 'r', xs[0, :], pred.flatten()[:TIME_STEPS], 'b--')
    plt.ylim((-1.2, 1.2))
    plt.draw()
    plt.pause(0.1)
    if step % 10 == 0:
        print('train cost: ', cost)
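The comment on `stateful=True` says that the final state of one batch becomes the initial state of the next. That only helps if sample i of each batch continues sample i of the previous batch. The sketch below checks whether the index layout used in `get_batch()` has this property. It rebuilds only the integer time indices in plain Python (no Keras or NumPy), and `batch_indices` is a hypothetical helper introduced for this check.

```python
TIME_STEPS, BATCH_SIZE = 20, 50

def batch_indices(batch_start):
    """Integer time indices for one batch, laid out like
    np.arange(start, start + TIME_STEPS * BATCH_SIZE).reshape((BATCH_SIZE, TIME_STEPS))."""
    return [
        list(range(batch_start + row * TIME_STEPS, batch_start + (row + 1) * TIME_STEPS))
        for row in range(BATCH_SIZE)
    ]

first = batch_indices(0)            # BATCH_START = 0
second = batch_indices(TIME_STEPS)  # BATCH_START after one call to get_batch()

# Row i of the second batch starts exactly where row i of the first batch ended.
continues = all(second[i][0] == first[i][-1] + 1 for i in range(BATCH_SIZE))

# Side effect of this layout: row i of the second batch repeats row i + 1 of the first.
overlaps = all(second[i] == first[i + 1] for i in range(BATCH_SIZE - 1))

print(continues, overlaps)
```

Both checks hold for this layout. Each row continues in time from one batch to the next, which matches what `stateful=True` expects, but consecutive batches also share all but one of their rows.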