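In this section, we update the vanilla LSTM to use an encoder-decoder model. This means the model no longer outputs a vector sequence directly. Instead, it is composed of two sub-models: an encoder that reads and encodes the input sequence, and a decoder that reads the encoded input sequence and makes a one-step prediction for each element of the output sequence.
The difference is subtle, because in practice both approaches predict a sequence output. The important difference is that the decoder is itself an LSTM, which lets it accumulate internal state while emitting the output sequence, so the prediction for each day can draw on what was computed for the previous days.
Let's take a closer look at how the model is defined. As before, we define an LSTM hidden layer with 200 units. This is the encoder: it reads the input sequence and outputs a 200-element vector (one output per unit) that captures features from the input sequence.
We will use 14 days of total power consumption as input.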
# define model
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
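We will use a simple encoder-decoder architecture that is easy to implement in Keras and closely resembles the architecture of an LSTM autoencoder. First, the internal representation of the input sequence is repeated, once for each time step in the output sequence. This sequence of vectors is then presented to the LSTM decoder.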
model.add(RepeatVector(7))
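To make the repetition step concrete, here is a minimal NumPy sketch of what RepeatVector does to the encoder output. It is an illustration under assumed toy shapes (2 samples and 4 encoder units instead of 200), not the Keras implementation; the names `encoded` and `repeated` are hypothetical.

```python
import numpy as np

# Toy stand-in for the encoder output: [samples, units].
# Assumption: 2 samples and 4 units (the model above uses 200 units).
encoded = np.arange(8, dtype=float).reshape(2, 4)

# Repeat each sample's vector once per output time step (7 days),
# giving [samples, timesteps, units], the shape the decoder consumes.
n_outputs = 7
repeated = np.repeat(encoded[:, np.newaxis, :], n_outputs, axis=1)

print(repeated.shape)  # (2, 7, 4)
```

Every one of the seven time steps holds the same encoded vector, so any difference between the daily predictions has to come from the decoder's internal state.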
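We then define the decoder as an LSTM hidden layer with 200 units. Importantly, the decoder outputs the entire sequence (return_sequences=True), not just the output at the end of the sequence as the encoder does. This means each of the 200 units outputs a value for each of the seven days, forming the basis for what to predict for each day in the output sequence.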
model.add(LSTM(200, activation='relu', return_sequences=True))
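The effect of return_sequences can be sketched with a much simpler recurrent layer. The code below is not an LSTM: it is a plain tanh recurrent cell written in NumPy, with a hypothetical function `simple_rnn` and assumed toy shapes, used only to show the difference between returning the last state and returning the state at every time step.

```python
import numpy as np

def simple_rnn(x, w_x, w_h, return_sequences=False):
    """Run a plain tanh recurrent layer over x with shape [samples, timesteps, features]."""
    n_samples, n_steps, _ = x.shape
    h = np.zeros((n_samples, w_h.shape[0]))
    states = []
    for t in range(n_steps):
        # The new state depends on the current input and the previous state.
        h = np.tanh(x[:, t, :] @ w_x + h @ w_h)
        states.append(h)
    if return_sequences:
        return np.stack(states, axis=1)  # [samples, timesteps, units]
    return h  # [samples, units]

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 7, 4))       # assumed toy input: 2 samples, 7 steps, 4 features
w_x = rng.normal(size=(4, 3)) * 0.1  # input weights for 3 units
w_h = rng.normal(size=(3, 3)) * 0.1  # recurrent weights

last_only = simple_rnn(x, w_x, w_h)
all_steps = simple_rnn(x, w_x, w_h, return_sequences=True)
print(last_only.shape, all_steps.shape)  # (2, 3) (2, 7, 3)
```

The encoder corresponds to the first call (one vector per sample), and the decoder to the second (one vector per sample per day).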
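We then use a fully connected layer to interpret each time step in the output sequence before the final output layer. Importantly, the output layer predicts a single step of the output sequence, not all seven days at once. This means the same fully connected layer and output layer are applied to every time step provided by the decoder. To achieve this, we wrap the interpretation layer and the output layer in a TimeDistributed wrapper, which applies the wrapped layers to each time step from the decoder.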
model.add(TimeDistributed(Dense(100, activation='relu')))
model.add(TimeDistributed(Dense(1)))
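The weight sharing described above can be illustrated in NumPy. This is a sketch under assumed toy shapes (5 decoder units instead of 200) with hypothetical names `decoded`, `weights` and `bias`; it mimics what wrapping a Dense(1) layer in TimeDistributed computes and is not the Keras code itself.

```python
import numpy as np

rng = np.random.default_rng(1)
decoded = rng.normal(size=(2, 7, 5))  # assumed decoder output: [samples, timesteps, units]
weights = rng.normal(size=(5, 1))     # one shared weight matrix for a Dense(1) layer
bias = np.zeros(1)

# Apply the same weights to each time step separately...
per_step = np.stack(
    [decoded[:, t, :] @ weights + bias for t in range(decoded.shape[1])],
    axis=1,
)

# ...which is equivalent to a single matrix product over the last axis.
broadcast = decoded @ weights + bias

print(per_step.shape)  # (2, 7, 1)
```

Because there is only one `weights` matrix, the number of parameters does not grow with the length of the output sequence.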
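This lets the LSTM decoder work out the context needed for each step of the output sequence, while the wrapped dense layers interpret each time step separately, reusing the same weights for every step. An alternative would be to flatten all of the structure created by the LSTM decoder and output the vector directly. You can try this as an extension to see how it compares.
The network therefore outputs a three-dimensional array with the same structure as the input, with dimensions [samples, timesteps, features]. There is a single feature, the daily total power consumed, and there are always seven time steps. A single one-week prediction therefore has the shape [1, 7, 1]. Consequently, when training the model we must reshape the output data (y) to have this three-dimensional structure instead of the two-dimensional [samples, features] structure used in the previous section.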
# reshape output into [samples, timesteps, features]
train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
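As a quick check of the reshape, the following sketch applies it to a small made-up target array. The toy values are an assumption for illustration only; the reshape call is the same one used above.

```python
import numpy as np

# Assumed toy targets: 3 training samples, each a 7-day forecast window.
train_y = np.arange(21, dtype=float).reshape(3, 7)

# Add a trailing feature axis: [samples, timesteps] -> [samples, timesteps, features].
train_y_3d = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))

print(train_y.shape, train_y_3d.shape)  # (3, 7) (3, 7, 1)
```

The values themselves are unchanged; only the shape now matches the [samples, timesteps, features] output of the TimeDistributed layers.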
# train the model
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 0, 20, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
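build_model() depends on to_supervised() to turn the weekly history into overlapping input/output windows. The following is a small sketch of that windowing on toy data, written independently of the listing; `make_windows` and the toy series are hypothetical, and the sketch keeps every window that fits in the series.

```python
import numpy as np

def make_windows(series, n_input, n_out):
    """Slide over a 1-D series, pairing n_input observations with the next n_out."""
    X, y = [], []
    for start in range(len(series) - n_input - n_out + 1):
        in_end = start + n_input
        X.append(series[start:in_end])
        y.append(series[in_end:in_end + n_out])
    # Inputs get a trailing feature axis: [samples, timesteps, features].
    return np.array(X)[:, :, np.newaxis], np.array(y)

series = np.arange(28, dtype=float)  # assumed toy data: 4 weeks of daily values
X, y = make_windows(series, n_input=14, n_out=7)
print(X.shape, y.shape)  # (8, 14, 1) (8, 7)
```

Moving the window one day at a time yields far more training samples than splitting the history into non-overlapping weeks would.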
# univariate multi-step encoder-decoder lstm
from math import sqrt
from numpy import split
from numpy import array
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import LSTM
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
# split a univariate dataset into train/test sets
def split_dataset(data):
    # split into standard weeks
    train, test = data[1:-328], data[-328:-6]
    # restructure into windows of weekly data (integer number of weeks)
    train = array(split(train, len(train) // 7))
    test = array(split(test, len(test) // 7))
    return train, test
# evaluate one or more weekly forecasts against expected values
def evaluate_forecasts(actual, predicted):
    scores = list()
    # calculate an RMSE score for each day
    for i in range(actual.shape[1]):
        # calculate mse
        mse = mean_squared_error(actual[:, i], predicted[:, i])
        # calculate rmse
        rmse = sqrt(mse)
        # store
        scores.append(rmse)
    # calculate overall RMSE
    s = 0
    for row in range(actual.shape[0]):
        for col in range(actual.shape[1]):
            s += (actual[row, col] - predicted[row, col])**2
    score = sqrt(s / (actual.shape[0] * actual.shape[1]))
    return score, scores
# summarize scores
def summarize_scores(name, score, scores):
    s_scores = ', '.join(['%.1f' % s for s in scores])
    print('%s: [%.3f] %s' % (name, score, s_scores))
# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=7):
    # flatten data
    data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
    X, y = list(), list()
    in_start = 0
    # step over the entire history one time step at a time
    for _ in range(len(data)):
        # define the end of the input sequence
        in_end = in_start + n_input
        out_end = in_end + n_out
        # ensure we have enough data for this instance
        # (<= so the final complete window is not skipped)
        if out_end <= len(data):
            x_input = data[in_start:in_end, 0]
            x_input = x_input.reshape((len(x_input), 1))
            X.append(x_input)
            y.append(data[in_end:out_end, 0])
        # move along one time step
        in_start += 1
    return array(X), array(y)
# train the model
def build_model(train, n_input):
    # prepare data
    train_x, train_y = to_supervised(train, n_input)
    # define parameters
    verbose, epochs, batch_size = 0, 20, 16
    n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
    # reshape output into [samples, timesteps, features]
    train_y = train_y.reshape((train_y.shape[0], train_y.shape[1], 1))
    # define model
    model = Sequential()
    model.add(LSTM(200, activation='relu', input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_outputs))
    model.add(LSTM(200, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(100, activation='relu')))
    model.add(TimeDistributed(Dense(1)))
    model.compile(loss='mse', optimizer='adam')
    # fit network
    model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
    return model
# make a forecast
def forecast(model, history, n_input):
    # flatten data
    data = array(history)
    data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
    # retrieve last observations for input data
    input_x = data[-n_input:, 0]
    # reshape into [1, n_input, 1]
    input_x = input_x.reshape((1, len(input_x), 1))
    # forecast the next week
    yhat = model.predict(input_x, verbose=0)
    # we only want the vector forecast
    yhat = yhat[0]
    return yhat
# evaluate a single model
def evaluate_model(train, test, n_input):
    # fit model
    model = build_model(train, n_input)
    # history is a list of weekly data
    history = [x for x in train]
    # walk-forward validation over each week
    predictions = list()
    for i in range(len(test)):
        # predict the week
        yhat_sequence = forecast(model, history, n_input)
        # store the predictions
        predictions.append(yhat_sequence)
        # get real observation and add to history for predicting the next week
        history.append(test[i, :])
    # evaluate predictions days for each week
    predictions = array(predictions)
    score, scores = evaluate_forecasts(test[:, :, 0], predictions)
    return score, scores
# load the new file
dataset = read_csv('household_power_consumption_days.csv', header=0, infer_datetime_format=True, parse_dates=['datetime'], index_col=['datetime'])
# split into train and test
train, test = split_dataset(dataset.values)
# evaluate model and get scores
n_input = 14
score, scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('lstm', score, scores)
# plot scores
days = ['sun', 'mon', 'tue', 'wed', 'thr', 'fri', 'sat']
pyplot.plot(days, scores, marker='o', label='lstm')
pyplot.show()
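The two kinds of score reported by evaluate_forecasts() (one RMSE per day of the week and one overall RMSE) can also be computed with vectorized NumPy operations. The sketch below uses made-up actual and predicted values purely for illustration; it is an equivalent formulation under that assumption, not the code from the listing.

```python
import numpy as np

# Assumed toy data: 2 weeks of actual and predicted values, 7 days each.
actual = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
                   [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]])
# The forecast is off by 1 on the first day and by 2 on the last day of each week.
predicted = actual + np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0])

squared_error = (actual - predicted) ** 2

# One RMSE per day of the week: average over weeks (axis 0).
daily_rmse = np.sqrt(squared_error.mean(axis=0))

# One overall RMSE: average over every week and every day.
overall_rmse = np.sqrt(squared_error.mean())

print(daily_rmse)
print(overall_rmse)
```

The per-day scores show which days of the forecast horizon are hardest to predict, while the overall score gives a single number for comparing models.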