This article shows how to use grid search to find the best hyperparameter configuration for a neural network.
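Code environment:
python 3.7.6
tensorflow 2.1.0
Suppose the network model has already been defined and its hyperparameters now need tuning. Two common approaches are exhaustive grid search (Exhaustive Grid Search) and randomized parameter optimization (Randomized Parameter Optimization).
As the name suggests, exhaustive grid search builds every combination of the candidate hyperparameter values and evaluates each one. Its advantage is that the performance of every combination is measured; its drawback is that it is expensive in compute and time, so it may not be practical when training deep learning models on large datasets. Randomized parameter optimization evaluates randomly chosen combinations instead. Its advantage is that the cost is set by the number of combinations sampled, so adding more parameters does not make the search slower; its drawback is that it may miss the best combination.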
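To make the trade-off concrete, here is a minimal sketch that enumerates a small grid exhaustively and then draws a fixed budget of random combinations from it. The search space `param_grid` and the budget of 3 are illustrative assumptions, not the values tuned later in this article.

```python
import itertools
import random

# Hypothetical search space, for illustration only.
param_grid = {
    "n_input": [12, 24],
    "n_epochs": [50, 100],
    "n_batch": [1, 32, 150],
}

# Exhaustive grid search: every combination would be evaluated.
names = list(param_grid)
grid = [dict(zip(names, values)) for values in itertools.product(*param_grid.values())]
print("grid size:", len(grid))  # 2 * 2 * 3 = 12 combinations

# Randomized search: only a fixed budget of combinations would be evaluated.
rng = random.Random(0)  # seeded so the draw is repeatable
budget = 3
sampled = rng.sample(grid, budget)
print("sampled size:", len(sampled))
```

The grid size is the product of the list lengths, so each extra hyperparameter multiplies the cost of the exhaustive search, while the random budget stays whatever you set it to.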
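For ease of demonstration, this article walks through the grid search workflow using hyperparameter search for univariate time series modelling. Once the workflow is clear, it extends readily to multivariate time series problems.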
1. Preparing the dataset
1. First, download the data file airline-passengers.csv from GitHub. When prompted, save it in .csv format.
2. Load the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 150
series = pd.read_csv('airline-passengers.csv', header=0, index_col=0)
print(series.shape) # (144, 1)
series.head()
Output:
Passengers
Month
1949-01 112
1949-02 118
1949-03 132
1949-04 129
1949-05 121
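The dataset was collected monthly and covers 12 years, for a total of 144 observations. In the experiments, the 12 observations of the final year are used as the test set.
3. Take a quick look at the data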
series.plot()
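Output: [figure: line plot of the passenger series]
The plot shows that the data vary in a roughly regular pattern. A closer look: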
plt.figure(figsize=(20,10))
plt.plot(series)
plt.xticks(rotation=90)
plt.tight_layout()
plt.grid()
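Output: [figure: enlarged line plot of the passenger series with grid lines]
The plot makes it clear that the dataset has a pronounced trend and a seasonal component, and that the seasonal period is 12 months.
This tutorial introduces the tooling for grid search but does not fully optimize the model hyperparameters for this problem. Instead, it demonstrates how to grid search the hyperparameters of a deep learning model and how to compare against a naive model to find configurations that perform well.
2. Problem framing
The code in this section is for demonstration only; its purpose is to make the modelling workflow familiar. The complete CNN and LSTM code can be run from the complete-code subsections of Sections 3 and 4. The step-by-step snippets do not need to be run; it is enough to follow the modelling approach.
2.1 Train/test split
The first step is to split the loaded series into a training set and a test set. The first 11 years (132 observations) are used for training and the 12 observations of the final year for testing.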
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]
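As a quick check of the slicing, the sketch below applies the same function to a stand-in list of 144 integers (an assumption in place of the real passenger counts) and confirms the 132/12 split.

```python
def train_test_split(data, n_test):
    # Everything except the last n_test items, then the last n_test items.
    return data[:-n_test], data[-n_test:]

# Stand-in series: 144 monthly positions instead of real passenger counts.
data = list(range(144))
train, test = train_test_split(data, 12)
print(len(train), len(test))  # 132 12
print(test[0], test[-1])      # 132 143
```

Note that `n_test` must be positive: with `n_test=0` the slice `data[:-0]` is empty, so the training set would be empty rather than the whole series.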
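2.2 Transforming into a supervised learning problem
Next, the univariate series is framed as a supervised learning problem so that a neural network can be trained on it.
Supervised learning means the data must be divided into samples, each consisting of inputs and a label. The inputs are a fixed number of prior observations, for example 36 values covering three years. The label is the observation (or observations) that follows: a single value gives one-step forecasting, and a sequence gives multi-step forecasting. The usual method is a sliding window: a window of a given width moves along the original series with a given stride, and each window position yields one sample. For univariate time series, this can be implemented with the shift() function of a pandas DataFrame. Define a function to process the data in bulk: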
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    agg.dropna(inplace=True)
    return agg.values
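To see what the windowing produces, here is a minimal pure-Python sketch of the same idea. The helper `sliding_window` is a hypothetical stand-in written for this illustration; it uses list slicing rather than pandas `shift()`, and assumes a stride of 1.

```python
def sliding_window(series, n_in=1, n_out=1):
    """Return rows of n_in inputs followed by n_out outputs (stride 1)."""
    width = n_in + n_out
    return [series[i:i + width] for i in range(len(series) - width + 1)]

rows = sliding_window([10, 20, 30, 40, 50, 60], n_in=3)
for row in rows:
    print(row[:-1], "->", row[-1])
# [10, 20, 30] -> 40
# [20, 30, 40] -> 50
# [30, 40, 50] -> 60
```

A series of length 6 with 3 inputs and 1 output yields 3 samples, because the first 3 values have no complete input window.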
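2.3 Walk-forward validation
A time series forecasting model can be evaluated on the test set using walk-forward validation.
Walk-forward validation is a common approach in which the model makes a forecast for each observation in the test set, one at a time. After each forecast is made, the true observation for that time step is appended to the history and made available to the model for the next forecast.
First, define a generic model_fit() function to fit a model; it can later be overridden for a specific type of neural network. The function takes the training dataset and the model configuration and returns a fitted model ready to make predictions.
def model_fit(train, config):
    return None
Then define a model_predict() function that takes the fitted model, the history of observations, and the model configuration, and makes a single one-step forecast.
def model_predict(model, history, config):
    return 0.0
The root mean squared error (RMSE) between the forecasts and the true values is computed with the mean_squared_error function from scikit-learn.
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))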
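RMSE is the square root of the mean of the squared differences between each true value and its forecast. The sketch below computes it by hand, without scikit-learn, on three made-up forecasts; the helper name `rmse` and the numbers are assumptions for illustration.

```python
from math import sqrt

def rmse(actual, predicted):
    # Mean of squared errors, then square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Errors are 2, -2 and -3, so the squared errors are 4, 4 and 9.
print(rmse([112, 118, 132], [110, 120, 135]))
```

Because the errors are squared before averaging, one large miss raises the score more than several small ones, and the result is in the same units as the series (passengers).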
The complete walk_forward_validation() function, which ties all of these together, is listed below.
It takes the dataset, the number of observations to use as the test set, and the model configuration, and returns the model's RMSE on the test set.
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # fit model
    model = model_fit(train, cfg)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # make a forecast from the history so far
        yhat = model_predict(model, history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    print(' > %.3f' % error)
    return error
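The loop is easier to follow on a tiny example. The sketch below is a simplified stand-in that hard-codes a persistence forecast (predict the last known value) instead of calling model_fit() and model_predict(); the function name and the toy series are assumptions.

```python
from math import sqrt

def walk_forward_persistence(data, n_test):
    train, test = data[:-n_test], data[-n_test:]
    history = list(train)
    predictions = []
    for actual in test:
        predictions.append(history[-1])  # forecast = last known value
        history.append(actual)           # reveal the true value afterwards
    mse = sum((a - p) ** 2 for a, p in zip(test, predictions)) / len(test)
    return predictions, sqrt(mse)

predictions, error = walk_forward_persistence([10, 12, 14, 16, 18, 20], n_test=3)
print(predictions)  # [14, 16, 18]
print(error)        # 2.0
```

Each forecast only sees values that were available before its time step, which is what keeps the evaluation honest for time series.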
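2.4 Repeated evaluation
Neural network models are stochastic. Given the same model configuration and the same training dataset, each training run produces different internal weights and therefore different performance.
To get a reliable estimate, each model configuration is evaluated several times with walk-forward validation and the mean error across the runs is reported. The repeat_evaluate() function implements this: the number of repeats is an optional argument (10 by default), and it returns the mean RMSE over all repeats.
def repeat_evaluate(data, config, n_test, n_repeats=10):
    # convert config to a key
    key = str(config)
    # fit and evaluate the model n times
    scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
    # summarize score
    result = mean(scores)
    print('> Model[%s] %.3f' % (key, result))
    return (key, result)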
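2.5 Grid search
Define a grid_search() function that takes the dataset, the list of configurations to search, and the number of observations to use as the test set, and then performs the search. Once the mean score has been computed for every configuration, the list of configurations is sorted in ascending order of error, so the best configuration comes first.
# grid search configs
def grid_search(data, cfg_list, n_test):
    # evaluate configs
    scores = [repeat_evaluate(data, cfg, n_test) for cfg in cfg_list]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores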
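2.6 A worked example
For ease of demonstration, no model is fitted here.
# fit a model
def model_fit(train, config):
    return None
The configuration is an index offset into the prior observations, relative to the time being forecast; the observation at that offset is used as the forecast. For example, 12 means the observation from 12 months before the forecast time (-12) is used.
cfg_list = [1, 6, 12, 24, 36]
The model_predict() function uses this configuration to return the value at the negative relative offset.
def model_predict(model, history, offset):
    return history[-offset]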
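To illustrate why some offsets suit seasonal data better than others, the sketch below runs the same naive forecast on a synthetic monthly series. The series (a mild linear trend plus a repeating 12-month pattern) and the helper `naive_rmse` are assumptions made up for this example, not the airline data.

```python
from math import sqrt

# Synthetic monthly series: mild trend plus a repeating 12-month pattern.
pattern = [0, 2, 5, 9, 12, 14, 15, 13, 10, 6, 3, 1]
series = [100 + 0.5 * t + 3 * pattern[t % 12] for t in range(72)]

def naive_rmse(data, n_test, offset):
    # Walk forward over the test set, forecasting the value `offset` steps back.
    history = list(data[:-n_test])
    test = data[-n_test:]
    squared = []
    for actual in test:
        squared.append((actual - history[-offset]) ** 2)
        history.append(actual)
    return sqrt(sum(squared) / len(squared))

# Evaluate every offset and sort by error, best first.
scores = sorted((naive_rmse(series, 12, offset), offset) for offset in [1, 6, 12, 24, 36])
for error, offset in scores:
    print(offset, round(error, 3))
```

On this synthetic series the offset of 12 comes first, because it lines up with the seasonal period and only misses by the trend accumulated over one year.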
The complete code is as follows:
from math import sqrt
from numpy import mean
from pandas import read_csv
from sklearn.metrics import mean_squared_error
# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]
# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))
# fit a model
def model_fit(train, config):
    return None
# forecast with a pre-fit model
def model_predict(model, history, offset):
    return history[-offset]
# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # fit model
    model = model_fit(train, cfg)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # make a forecast from the history so far
        yhat = model_predict(model, history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    print(' > %.3f' % error)
    return error
# evaluate a config n_repeats times, return (key, mean RMSE)
def repeat_evaluate(data, config, n_test, n_repeats=10):
    # convert config to a key
    key = str(config)
    # fit and evaluate the model n times
    scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
    # summarize score
    result = mean(scores)
    print('> Model[%s] %.3f' % (key, result))
    return (key, result)
# grid search configs
def grid_search(data, cfg_list, n_test):
    # evaluate configs
    scores = [repeat_evaluate(data, cfg, n_test) for cfg in cfg_list]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores
series = read_csv('airline-passengers.csv', header=0, index_col=0)
data = series.values
n_test = 12 # data split
cfg_list = [1, 6, 12, 24, 36] # model configs
scores = grid_search(data, cfg_list, n_test) # grid search
# list top 10 configs
for cfg, error in scores[:10]:
    print(cfg, error)
Output:
·
·
·
12 50.708316214732804
1 53.1515129919491
24 97.10990337413241
36 110.27352356753639
6 126.73495965991387
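3. Using grid search to find the best CNN hyperparameters
3.1 Hyperparameter configuration
The hyperparameters of the CNN model chosen for the grid search are:
n_input: the number of prior observations used as input to the model (e.g. 12 months).
n_filters: the number of filter maps in the convolutional layer (e.g. 32).
n_kernel: the kernel size of the convolutional layer (e.g. 3).
n_epochs: the number of training epochs (e.g. 1000).
n_batch: the number of samples in each mini-batch (e.g. 32).
n_diff: the difference order (e.g. 0 or 12).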
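With six positional values, it is easy to unpack a configuration in the wrong order. One optional way to guard against that is a named tuple, sketched below. The class name `CNNConfig` is hypothetical and not part of the original code; the field names follow the list above.

```python
from typing import NamedTuple

class CNNConfig(NamedTuple):
    """Hypothetical container for one CNN grid-search configuration."""
    n_input: int
    n_filters: int
    n_kernel: int
    n_epochs: int
    n_batch: int
    n_diff: int

cfg = CNNConfig(n_input=12, n_filters=64, n_kernel=3, n_epochs=100, n_batch=1, n_diff=12)

# A named tuple still unpacks positionally, like the plain lists used in this article.
n_input, n_filters, n_kernel, n_epochs, n_batch, n_diff = cfg
print(cfg.n_kernel, n_diff)  # 3 12
print(str(cfg))              # readable key for the results table
```

Because it remains a tuple, such a config could be passed to functions that unpack six values without changing them, while `str(cfg)` gives a self-describing key.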
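3.2 Removing seasonality
A time series may contain trend and seasonality, which should be removed before modelling. Trend can produce a mean that changes over time, and seasonality can produce a variance that changes over time; either makes the series non-stationary. A stationary dataset has a stable mean and variance and is therefore easier to model. Differencing is a popular and widely used transform for making a time series stationary.
It is implemented with the following function:
# difference dataset
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]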
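A small example shows what the difference order does. The toy series below is an assumption: two repetitions of a 4-step seasonal pattern, with the second repetition shifted up by 10.

```python
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]

# Two "seasons" of length 4; the second is the first plus 10.
data = [10, 30, 20, 40, 20, 40, 30, 50]
print(difference(data, 4))  # [10, 10, 10, 10]
print(difference(data, 1))  # [20, -10, 20, -20, 20, -10, 20]
```

Differencing at the seasonal period (4 here, 12 for the monthly airline data) removes the repeating pattern and leaves a flat series, while first-order differencing does not. The output is shorter than the input by `order` values.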
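3.3 Problem framing
First, unpack the list of hyperparameters.
n_input, n_filters, n_kernel, n_epochs, n_batch, n_diff = config
Then prepare the data: difference it, transform it into supervised format, and separate the inputs from the outputs of each sample.
# prepare data
if n_diff > 0:
    train = difference(train, n_diff)
# transform series into supervised format
data = series_to_supervised(train, n_in=n_input)
# separate inputs and outputs
train_x, train_y = data[:, :-1], data[:, -1]
For ease of demonstration, a very simple CNN model is defined, with one convolutional layer and one max pooling layer.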
# define model
model = Sequential()
model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_input, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
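It can help to work out how many values reach the final Dense layer. The sketch below does the arithmetic for this layer stack, assuming the Keras defaults used above (Conv1D with 'valid' padding and stride 1, and MaxPooling1D with non-overlapping windows). The helper `cnn_flatten_size` is hypothetical and only does the shape calculation; it does not build a model.

```python
def cnn_flatten_size(n_input, n_filters, n_kernel, pool_size=2):
    # Conv1D with 'valid' padding and stride 1 shortens the sequence.
    conv_steps = n_input - n_kernel + 1
    if conv_steps < pool_size:
        raise ValueError("kernel too large for this input length")
    # MaxPooling1D with non-overlapping windows divides the length.
    pooled_steps = conv_steps // pool_size
    # Flatten yields one value per (time step, filter) pair.
    return pooled_steps * n_filters

print(cnn_flatten_size(12, 64, 3))  # 12 -> 10 -> 5 steps, 5 * 64 = 320
print(cnn_flatten_size(12, 64, 5))  # 12 -> 8 -> 4 steps, 4 * 64 = 256
```

This also shows a constraint on the grid: the kernel size has to be small enough relative to n_input for at least one pooling window to remain.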
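A one-dimensional CNN expects input data of shape [samples, timesteps, features], where the features play the role of the channels of an image. In this example the number of features is 1, because each sample contains a single time series.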
# reshape input data into [samples, timesteps, features]
n_features = 1
train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], n_features))
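After training, the model is used to make forecasts. If the data were differenced, the differencing must be inverted on the model's forecast.
# invert difference
correction = 0.0
if n_diff > 0:
    correction = history[-n_diff]
...
# correct forecast if it was differenced
return correction + yhat[0]
It also means that history must be differenced, so that the input data used for the forecast has the expected form.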
# calculate difference
history = difference(history, n_diff)
Likewise, the sample must have a three-dimensional shape at prediction time:
x_input = array(history[-n_input:]).reshape((1, n_input, 1))
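The prediction steps above can be traced end to end on toy data. In the sketch below, `predict_with_inversion` and `mean_model` are hypothetical names: the "model" is a stand-in that returns the mean of its inputs, used here only so the example runs without Keras, and the reshape step is omitted for the same reason.

```python
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]

def mean_model(window):
    # Stand-in for model.predict(): forecast the mean of the recent inputs.
    return sum(window) / len(window)

def predict_with_inversion(history, n_input, n_diff, predict_change):
    correction = 0.0
    if n_diff > 0:
        correction = history[-n_diff]          # value one season before the target
        history = difference(history, n_diff)  # the model sees differenced inputs
    x_input = history[-n_input:]
    # Add the correction back to return to the original scale.
    return correction + predict_change(x_input)

history = [10, 30, 20, 40, 20, 40, 30, 50]
print(predict_with_inversion(history, n_input=2, n_diff=4, predict_change=mean_model))  # 30.0
print(predict_with_inversion(history, n_input=2, n_diff=0, predict_change=mean_model))  # 40.0
```

With n_diff=4 the stand-in model predicts a change of 10, and adding the value from one season earlier (20) gives 30. With n_diff=0 no correction is applied and the model works on the raw values.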
Finally, define the list of configurations for the model to evaluate. This is done by defining a list of candidate values for each hyperparameter and combining them into configurations. For ease of demonstration, only a few values are used here:
# create a list of configs to try
def model_configs():
    # define scope of configs
    n_input = [12]
    n_filters = [64]
    n_kernels = [3, 5]
    n_epochs = [100]
    n_batch = [1, 150]
    n_diff = [0, 12]
    # create configs
    configs = list()
    for a in n_input:
        for b in n_filters:
            for c in n_kernels:
                for d in n_epochs:
                    for e in n_batch:
                        for f in n_diff:
                            cfg = [a,b,c,d,e,f]
                            configs.append(cfg)
    print('Total configs: %d' % len(configs))
    return configs
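The six nested loops can also be written with `itertools.product` from the standard library, which yields the combinations in the same order. The sketch below is an alternative formulation under the same value lists; the function name `model_configs_product` is made up here to avoid clashing with the original.

```python
from itertools import product

def model_configs_product():
    # Same candidate values as model_configs().
    n_input = [12]
    n_filters = [64]
    n_kernels = [3, 5]
    n_epochs = [100]
    n_batch = [1, 150]
    n_diff = [0, 12]
    # product() varies the last list fastest, like the innermost loop.
    configs = [list(cfg) for cfg in product(n_input, n_filters, n_kernels, n_epochs, n_batch, n_diff)]
    print('Total configs: %d' % len(configs))
    return configs

configs = model_configs_product()
print(configs[0])   # [12, 64, 3, 100, 1, 0]
print(configs[-1])  # [12, 64, 5, 100, 150, 12]
```

Adding another hyperparameter then means adding one list and one argument rather than another level of indentation.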
3.4 Complete code
The complete example is as follows:
# grid search cnn for airline passengers
from math import sqrt
from numpy import array, mean
from pandas import DataFrame, concat, read_csv
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv1D, MaxPooling1D
# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]
# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    agg.dropna(inplace=True)
    return agg.values
# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))
# difference dataset
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]
# fit a model
def model_fit(train, config):
    # unpack config
    n_input, n_filters, n_kernel, n_epochs, n_batch, n_diff = config
    # prepare data
    if n_diff > 0:
        train = difference(train, n_diff)
    # transform series into supervised format
    data = series_to_supervised(train, n_in=n_input)
    # separate inputs and outputs
    train_x, train_y = data[:, :-1], data[:, -1]
    # reshape input data into [samples, timesteps, features]
    n_features = 1
    train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], n_features))
    # define model
    model = Sequential()
    model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_input, n_features)))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    # fit
    model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
    return model
# forecast with the fit model
def model_predict(model, history, config):
    # unpack config
    n_input, _, _, _, _, n_diff = config
    # prepare data
    correction = 0.0
    if n_diff > 0:
        correction = history[-n_diff]
        history = difference(history, n_diff)
    x_input = array(history[-n_input:]).reshape((1, n_input, 1))
    # forecast
    yhat = model.predict(x_input, verbose=0)
    return correction + yhat[0]
# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # fit model
    model = model_fit(train, cfg)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # make a forecast from the history so far
        yhat = model_predict(model, history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    print(' > %.3f' % error)
    return error
# evaluate a config n_repeats times, return (key, mean RMSE)
def repeat_evaluate(data, config, n_test, n_repeats=10):
    # convert config to a key
    key = str(config)
    # fit and evaluate the model n times
    scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
    # summarize score
    result = mean(scores)
    print('> Model[%s] %.3f' % (key, result))
    return (key, result)
# grid search configs
def grid_search(data, cfg_list, n_test):
    # evaluate configs
    scores = [repeat_evaluate(data, cfg, n_test) for cfg in cfg_list]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores
# create a list of configs to try
def model_configs():
    # define scope of configs
    n_input = [12]
    n_filters = [64]
    n_kernels = [3, 5]
    n_epochs = [100]
    n_batch = [1, 150]
    n_diff = [0, 12]
    # create configs
    configs = list()
    for a in n_input:
        for b in n_filters:
            for c in n_kernels:
                for d in n_epochs:
                    for e in n_batch:
                        for f in n_diff:
                            cfg = [a,b,c,d,e,f]
                            configs.append(cfg)
    print('Total configs: %d' % len(configs))
    return configs
# define dataset
series = read_csv('airline-passengers.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# model configs
cfg_list = model_configs()
# grid search
scores = grid_search(data, cfg_list, n_test)
print('done')
# list top 3 configs
for cfg, error in scores[:3]:
    print(cfg, error)
Output:
Total configs: 8
> 19.830
> 36.538
> 20.831
> 14.879
> 15.430
> 27.859
> 18.334
> 19.638
> 38.763
> 17.863
> Model[[12, 64, 3, 100, 1, 0]] 22.996
> 21.689
> 22.089
> 19.215
> 19.673
> 19.034
> 20.625
> 21.101
> 22.956
> 21.040
> 20.172
> Model[[12, 64, 3, 100, 1, 12]] 20.759
> 76.625
> 92.361
> 74.684
> 79.868
> 68.876
> 67.716
> 69.328
> 84.085
> 83.975
> 81.078
> Model[[12, 64, 3, 100, 150, 0]] 77.860
> 20.030
> 19.528
> 18.952
> 19.272
> 18.609
> 18.864
> 18.208
> 18.807
> 18.917
> 19.947
> Model[[12, 64, 3, 100, 150, 12]] 19.113
> 20.138
> 19.909
> 17.370
> 18.406
> 22.221
> 28.303
> 19.581
> 22.390
> 16.829
> 19.677
> Model[[12, 64, 5, 100, 1, 0]] 20.482
> 19.413
> 19.652
> 17.216
> 17.231
> 18.810
> 20.773
> 18.362
> 19.963
> 18.021
> 20.275
> Model[[12, 64, 5, 100, 1, 12]] 18.972
> 76.514
> 80.932
> 87.354
> 86.463
> 100.270
> 91.679
> 80.649
> 82.810
> 82.999
> 80.925
> Model[[12, 64, 5, 100, 150, 0]] 85.059
> 19.242
> 20.007
> 19.265
> 18.623
> 19.383
> 19.418
> 19.389
> 20.158
> 18.342
> 18.605
> Model[[12, 64, 5, 100, 150, 12]] 19.243
done
[12, 64, 5, 100, 1, 12] 18.97153329122043
[12, 64, 3, 100, 150, 12] 19.113480219230794
[12, 64, 5, 100, 150, 12] 19.243291915179842
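The mean alone hides how much the individual runs vary. As a small follow-up, the sketch below recomputes the mean and also the sample standard deviation for two of the configurations, using per-run RMSE values copied from the output above. Reporting the spread is a suggestion added here, not something the original code does.

```python
from statistics import mean, stdev

# Per-run RMSE values copied from the printed output above.
runs = {
    "[12, 64, 3, 100, 1, 0]": [19.830, 36.538, 20.831, 14.879, 15.430,
                               27.859, 18.334, 19.638, 38.763, 17.863],
    "[12, 64, 5, 100, 1, 12]": [19.413, 19.652, 17.216, 17.231, 18.810,
                                20.773, 18.362, 19.963, 18.021, 20.275],
}

for key, scores in runs.items():
    # Mean matches the "> Model[...]" line; stdev shows run-to-run spread.
    print(key, round(mean(scores), 3), round(stdev(scores), 3))
```

The configuration without differencing has runs ranging from about 15 to about 39, while the differenced one stays between about 17 and 21, so the differenced configuration is both better on average and more consistent. Both are well below the best naive result of about 50.7 from Section 2.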
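Other networks, such as an LSTM, can be handled the same way; the input needs to be adapted, and the number of hidden units becomes a hyperparameter.
Reference: https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/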