This article shows how to use grid search to find the best hyperparameter configuration for a neural network.
Code environment:
Python 3.7.6
TensorFlow 2.1.0
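Suppose the network model has already been defined, but its hyperparameters still need tuning. The two common approaches are exhaustive grid search and randomized parameter optimization.
As the name suggests, exhaustive grid search forms every combination of the candidate hyperparameter values and evaluates each one. The advantage is that the performance of every combination is measured; the disadvantage is that it consumes a lot of compute and time, so it may be impractical when training deep learning models on large datasets. Randomized parameter optimization evaluates randomly selected combinations instead. The advantage is that the search budget can be fixed independently of how many parameters are added; the disadvantage is that the best combination may be missed.
For ease of demonstration, this article walks through the grid search workflow on a univariate time series forecasting problem. Once the workflow is clear, it extends easily to multivariate time series problems.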
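To make the trade-off concrete, here is a minimal sketch in plain Python. The search space and the helper names grid_size and sample_configs are hypothetical, chosen only for illustration. The cost of a grid search is the product of the list lengths, while the random-search budget n_iter is set by hand.

```python
import random

# Hypothetical search space (illustrative values only)
space = {
    'n_input': [12, 24, 36],
    'n_filters': [32, 64, 128],
    'n_kernel': [3, 5],
    'n_batch': [1, 16, 150],
}

def grid_size(space):
    # number of configurations an exhaustive grid search must evaluate
    size = 1
    for values in space.values():
        size *= len(values)
    return size

def sample_configs(space, n_iter, seed=0):
    # draw n_iter random configurations; duplicates are possible
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_iter)]

print(grid_size(space))  # 54 = 3 * 3 * 2 * 3
for cfg in sample_configs(space, n_iter=5):
    print(cfg)
```

Adding one more parameter with three values would triple the grid to 162 evaluations, whereas the random search would still run exactly n_iter evaluations.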
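1. Preparing the dataset
1. First, download the file from GitHub: click here
In the dialog that pops up, save it in .csv format (the code below reads it as airline-passengers.csv).
2. Load the data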
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 150
series = pd.read_csv('airline-passengers.csv', header=0, index_col=0)
print(series.shape) # (144, 1)
series.head()
Output:
Passengers
Month
1949-01 112
1949-02 118
1949-03 132
1949-04 129
1949-05 121
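The data was collected monthly over 12 years, for a total of 144 observations. In the experiments, the 12 observations of the final year are used as the test set.
3. Take a quick look at the data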
series.plot()
Output:
The plot shows that the data varies in a roughly regular pattern. A closer look:
plt.figure(figsize=(20,10))
plt.plot(series)
plt.xticks(rotation=90)
plt.tight_layout()
plt.grid()
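Output:
The plot makes it clear that the dataset has an obvious trend and a seasonal component, and that the seasonal period is 12 months.
This tutorial introduces the tooling for grid search but does not try to find the optimal hyperparameters for this problem. Instead, it demonstrates how to grid search the hyperparameters of a deep learning model, and how to compare against a naive model to identify configurations that perform well.
2. Problem modeling
The code in this chapter is for demonstration only and has no practical significance; the goal is to become familiar with the modeling workflow. The complete CNN and LSTM code can be run from the "complete code" subsections of chapters 3 and 4. The step-by-step snippets do not need to be run, as long as the modeling idea is clear.
2.1 Train/test split
The first step is to split the loaded series into a training set and a test set. The first 11 years (132 observations) are used for training and the 12 observations of the final year for testing.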
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]
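As a quick check of the slicing, the sketch below applies the same function to a stand-in list of 144 values (the list itself is hypothetical; only its length matches the dataset).

```python
def train_test_split(data, n_test):
    # keep the last n_test observations for testing, the rest for training
    return data[:-n_test], data[-n_test:]

months = list(range(1, 145))  # stand-in for 144 monthly observations
train, test = train_test_split(months, 12)
print(len(train), len(test))  # 132 12
print(test[0], test[-1])      # 133 144
```

Note that n_test must be at least 1: with n_test=0 the slice data[:-0] is empty, so the training set would be empty rather than the full series.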
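2.2 Transforming to a supervised learning problem
Next, the univariate series is framed as a supervised learning problem so that a neural network can be trained on it.
Supervised learning means the data must be divided into samples, each consisting of inputs and a label. The inputs are a fixed number of prior observations, for example 36 values covering three years. The label is the observation or observations that follow: a single value gives one-step forecasting, a sequence gives multi-step forecasting. The usual approach is a sliding window: a window of a given width is moved along the original series with a given stride, and each slice becomes one sample. For a univariate series, this can be done with the shift() function of the pandas DataFrame. The following function processes the data in bulk: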
from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    agg.dropna(inplace=True)
    return agg.values
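To see what the sliding window produces, here is a minimal pure-Python sketch. The helper sliding_window is hypothetical and is not the pandas-based function above; for a plain list of numbers it should yield the same rows.

```python
def sliding_window(series, n_in=1, n_out=1):
    # each row holds n_in past values followed by n_out values to predict
    rows = []
    for t in range(n_in, len(series) - n_out + 1):
        rows.append(series[t - n_in:t + n_out])
    return rows

rows = sliding_window([10, 20, 30, 40, 50, 60], n_in=3)
for row in rows:
    print(row[:-1], '->', row[-1])
# [10, 20, 30] -> 40
# [20, 30, 40] -> 50
# [30, 40, 50] -> 60
```

A series of 6 values with a window of 3 inputs and 1 output gives only 3 samples, because the first n_in positions have no complete history.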
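2.3 Walk-forward validation
A time series forecasting model can be evaluated on the test set with walk-forward validation.
Walk-forward validation is a common approach in which the model makes a prediction for each observation in the test set, one at a time. After each prediction, the true observation for that time step is appended to the history and made available to the model.
First, define a generic model_fit() function to fit the model; it can be overridden later for a given type of neural network. The function takes the training dataset and the model configuration and returns a fitted model that is ready to make predictions.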
def model_fit(train, config):
    return None
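Next, define a model_predict() function that takes the fitted model, the history of observations, and the model configuration, and makes a one-step prediction.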
def model_predict(model, history, config):
    return 0.0
Compute the root mean squared error (RMSE) between the predictions and the true values, using the mean_squared_error function from scikit-learn.
from math import sqrt
from sklearn.metrics import mean_squared_error

def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))
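For intuition, the same quantity can be computed by hand. The sketch below is a plain-Python stand-in (the function name rmse and the numbers are made up for illustration) and does not use scikit-learn.

```python
from math import sqrt

def rmse(actual, predicted):
    # square each error, average the squares, then take the square root
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

actual = [10, 20, 30, 40]
predicted = [16, 12, 30, 40]
print(rmse(actual, predicted))  # 5.0
```

The errors are -6, 8, 0 and 0, so the mean squared error is (36 + 64) / 4 = 25 and the RMSE is 5. Because the errors are squared, the result is in the same unit as the data (passengers, in this dataset) and large misses are penalized more than small ones.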
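The complete walk_forward_validation() function below ties all of this together.
It takes the dataset, the number of observations to use as the test set, and the model configuration, and returns the model's RMSE on the test set.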
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # fit model
    model = model_fit(train, cfg)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # make a forecast from the history seen so far
        yhat = model_predict(model, history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    print(' > %.3f' % error)
    return error
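The following trace is a minimal sketch of the loop on a short toy series. The function walk_forward_trace and the placeholder "model" (repeat the last known value) are assumptions for illustration only; the point is that the model is fitted once, while the history grows by one true observation per step.

```python
def walk_forward_trace(data, n_test):
    train, test = data[:-n_test], data[-n_test:]
    history = list(train)
    predictions = []
    for actual in test:
        yhat = history[-1]  # placeholder "model": repeat the last known value
        predictions.append(yhat)
        print('history size=%d forecast=%s actual=%s' % (len(history), yhat, actual))
        history.append(actual)  # the true value becomes available for the next step
    return predictions

preds = walk_forward_trace([112, 118, 132, 129, 121, 135], n_test=3)
print(preds)  # [132, 129, 121]
```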
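2.4 Repeated evaluation
Neural network models are stochastic. Given the same model configuration and the same training dataset, each training run produces different internal weights and therefore different performance.
To get a more reliable estimate, each model configuration is evaluated several times with walk-forward validation and the mean error across the runs is reported. The repeat_evaluate() function implements this: the number of repeats is an optional argument (default 10), and it returns the configuration key together with the mean RMSE over all repeats.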
from numpy import mean

def repeat_evaluate(data, config, n_test, n_repeats=10):
    # convert config to a key
    key = str(config)
    # fit and evaluate the model n times
    scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
    # summarize score
    result = mean(scores)
    print('> Model[%s] %.3f' % (key, result))
    return (key, result)
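The mean alone hides how much the runs disagree. As a hedged sketch, the snippet below simulates repeated runs with a made-up noisy scorer (noisy_score is a stand-in, not a real model) and reports the standard deviation next to the mean; reporting the spread is an addition for illustration, not part of the function above.

```python
import random
from statistics import mean, stdev

def noisy_score(rng):
    # stand-in for one walk-forward evaluation of a stochastic model
    return 20.0 + rng.gauss(0, 2.0)

def repeat_scores(n_repeats=10, seed=1):
    rng = random.Random(seed)
    return [noisy_score(rng) for _ in range(n_repeats)]

scores = repeat_scores()
print('mean=%.3f std=%.3f (n=%d)' % (mean(scores), stdev(scores), len(scores)))
```

When two configurations have close means, a large spread suggests that more repeats are needed before ranking one above the other.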
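2.5 Grid search
Define a grid_search() function that takes the dataset, the list of configurations to search, and the number of observations to use as the test set, and performs the search. Once the mean score has been computed for each configuration, the list of configurations is sorted in ascending order of error.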
# grid search configs
def grid_search(data, cfg_list, n_test):
    # evaluate configs
    scores = [repeat_evaluate(data, cfg, n_test) for cfg in cfg_list]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores
2.6 Instantiation
For ease of demonstration, no model is fitted here.
# fit a model
def model_fit(train, config):
    return None
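The configuration here is a list of index offsets into the prior observations, relative to the time being forecast; the value at that offset is used as the prediction. For example, 12 means that the observation from 12 months before (-12) the forecast time is used.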
cfg_list = [1, 6, 12, 24, 36]
The model_predict() function uses this configuration to return the value at the negative relative offset.
def model_predict(model, history, offset):
    return history[-offset]
The complete code is as follows:
from math import sqrt
from numpy import mean
from pandas import read_csv
from sklearn.metrics import mean_squared_error

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]

# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))

# fit a model (nothing to fit for the naive forecast)
def model_fit(train, config):
    return None

# forecast with a pre-fit model: reuse the value `offset` steps back
def model_predict(model, history, offset):
    return history[-offset]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # fit model
    model = model_fit(train, cfg)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # make a forecast from the history seen so far
        yhat = model_predict(model, history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    print(' > %.3f' % error)
    return error

# evaluate a config n_repeats times, return (key, mean RMSE)
def repeat_evaluate(data, config, n_test, n_repeats=10):
    # convert config to a key
    key = str(config)
    # fit and evaluate the model n times
    scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
    # summarize score
    result = mean(scores)
    print('> Model[%s] %.3f' % (key, result))
    return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test):
    # evaluate configs
    scores = [repeat_evaluate(data, cfg, n_test) for cfg in cfg_list]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores

series = read_csv('airline-passengers.csv', header=0, index_col=0)
data = series.values
n_test = 12  # data split
cfg_list = [1, 6, 12, 24, 36]  # model configs
scores = grid_search(data, cfg_list, n_test)  # grid search
# list top 10 configs
for cfg, error in scores[:10]:
    print(cfg, error)
Output (per-run lines omitted):
...
12 50.708316214732804
1 53.1515129919491
24 97.10990337413241
36 110.27352356753639
6 126.73495965991387
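3. Using grid search to find the best CNN hyperparameters
3.1 Hyperparameter configuration
The hyperparameters of the CNN model chosen for the grid search are: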
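n_input: the number of prior observations used as input to the model (e.g. 12 months).
n_filters: the number of filter maps in the convolutional layer (e.g. 32).
n_kernel: the kernel size in the convolutional layer (e.g. 3).
n_epochs: the number of training epochs (e.g. 1000).
n_batch: the number of samples in each mini-batch (e.g. 32).
n_diff: the difference order (e.g. 0 or 12).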
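3.2 Removing seasonality
A time series dataset may contain trend and seasonality, which may need to be removed before modeling. Trend can cause the mean to change over time, and seasonality can cause the variance to change over time; either makes the series non-stationary. A stationary dataset has a stable mean and variance and is in turn easier to model. Differencing is a popular and widely used data transform for making time series data stationary.
Implemented as a function: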
# difference dataset
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]
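To illustrate why the order matters, here is a small sketch on a synthetic series with a linear trend and a repeating pattern of period 4 (the series is made up for illustration). Differencing at the seasonal period cancels the pattern and leaves only the constant trend increment, while first-order differencing does not.

```python
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]

season = [0, 5, 2, -3]  # repeating pattern with period 4
series = [10 + 2 * t + season[t % 4] for t in range(12)]

print(series)
print(difference(series, 4))  # [8, 8, 8, 8, 8, 8, 8, 8]
print(difference(series, 1))  # still shows the seasonal pattern
```

Each differenced value at order 4 equals 2 * 4 = 8, the trend accumulated over one season. The result is also `order` values shorter than the input, so fewer training samples remain after differencing.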
3.3 Problem modeling
First, unpack the list of hyperparameters.
n_input, n_filters, n_kernel, n_epochs, n_batch, n_diff = config
Then prepare the data: difference it, transform it into the supervised format, and separate the inputs and outputs of the samples.
# prepare data
if n_diff > 0:
    train = difference(train, n_diff)
# transform series into supervised format
data = series_to_supervised(train, n_in=n_input)
# separate inputs and outputs
train_x, train_y = data[:, :-1], data[:, -1]
For ease of demonstration, define a very simple CNN model with one convolutional layer and one max pooling layer.
# define model
model = Sequential()
model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_input, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
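It can help to work out how the hyperparameters change the layer sizes. The sketch below is plain arithmetic, assuming the Keras defaults used in this model: Conv1D with 'valid' padding and stride 1, and MaxPooling1D with a stride equal to pool_size. The helper cnn_flatten_size is hypothetical and does not build a model.

```python
def cnn_flatten_size(n_input, n_filters, n_kernel, pool_size=2):
    # Conv1D, 'valid' padding, stride 1: one output step per full kernel position
    conv_steps = n_input - n_kernel + 1
    # MaxPooling1D, stride equal to pool_size: incomplete windows are dropped
    pool_steps = conv_steps // pool_size
    # Flatten: every remaining step carries n_filters values
    return conv_steps, pool_steps, pool_steps * n_filters

print(cnn_flatten_size(12, 64, 3))  # (10, 5, 320)
print(cnn_flatten_size(12, 64, 5))  # (8, 4, 256)
```

Under these assumptions, n_kernel must not exceed n_input, and n_input - n_kernel + 1 must be at least pool_size, otherwise no time steps are left for the Dense layer.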
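A one-dimensional CNN expects input data with the shape [samples, timesteps, features], where features corresponds to the number of channels in an image. In this example the number of features is 1, because each sample contains only one time series.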
# reshape input data into [samples, timesteps, features]
n_features = 1
train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], n_features))
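After training, the model is used to make predictions. If the data was differenced, the differencing must be inverted on the model's prediction.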
# invert difference
correction = 0.0
if n_diff > 0:
    correction = history[-n_diff]
...
# correct forecast if it was differenced
return correction + yhat[0]
It also means the history must be differenced, so that the input data used for prediction has the expected form.
# calculate difference
history = difference(history, n_diff)
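The two steps fit together as in the following sketch. The function forecast_with_inversion and the stand-in predict_diff (which simply repeats the last differenced value) are hypothetical; a trained network would take the place of predict_diff.

```python
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]

def forecast_with_inversion(history, n_diff, predict_diff):
    correction = 0.0
    if n_diff > 0:
        # value one season before the step being forecast
        correction = history[-n_diff]
        # the model only ever sees differenced values
        history = difference(history, n_diff)
    yhat_diff = predict_diff(history)
    # add the correction back to return to the original scale
    return correction + yhat_diff

history = [10, 12, 15, 11, 13, 17]
forecast = forecast_with_inversion(history, 3, lambda diffs: diffs[-1])
print(forecast)  # 13
```

With n_diff=3 the differenced history is [1, 1, 2], the stand-in predicts 2, and the correction history[-3] = 11 gives 13. The index -n_diff works because the next observation is differenced against the value n_diff steps before it, which is the n_diff-th element from the end of the current history.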
Likewise, the sample must have a three-dimensional shape at prediction time:
x_input = array(history[-n_input:]).reshape((1, n_input, 1))
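Finally, define the list of configurations for the model to evaluate. This can be done by defining a list of values for each hyperparameter and combining them. For ease of demonstration, only a few values are used: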
# create a list of configs to try
def model_configs():
    # define scope of configs
    n_input = [12]
    n_filters = [64]
    n_kernels = [3, 5]
    n_epochs = [100]
    n_batch = [1, 150]
    n_diff = [0, 12]
    # create configs
    configs = list()
    for a in n_input:
        for b in n_filters:
            for c in n_kernels:
                for d in n_epochs:
                    for e in n_batch:
                        for f in n_diff:
                            cfg = [a, b, c, d, e, f]
                            configs.append(cfg)
    print('Total configs: %d' % len(configs))
    return configs
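As an alternative sketch, the six nested loops can be written with itertools.product from the standard library. This is a stylistic variant rather than the version used in the rest of the article; it produces the same configurations in the same order, because product varies the rightmost list fastest.

```python
from itertools import product

def model_configs():
    # define scope of configs
    n_input = [12]
    n_filters = [64]
    n_kernels = [3, 5]
    n_epochs = [100]
    n_batch = [1, 150]
    n_diff = [0, 12]
    # one config per element of the Cartesian product
    configs = [list(cfg) for cfg in
               product(n_input, n_filters, n_kernels, n_epochs, n_batch, n_diff)]
    print('Total configs: %d' % len(configs))
    return configs

configs = model_configs()
print(configs[0])   # [12, 64, 3, 100, 1, 0]
print(configs[-1])  # [12, 64, 5, 100, 150, 12]
```

The total is 1 * 1 * 2 * 1 * 2 * 2 = 8 configurations, and each is trained n_repeats times, so the search performs 80 training runs with the default of 10 repeats.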
3.4 Complete code
The complete example is as follows:
# grid search cnn for airline passengers
from math import sqrt
from numpy import array, mean
from pandas import DataFrame, concat, read_csv
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv1D, MaxPooling1D

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    agg.dropna(inplace=True)
    return agg.values

# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))

# difference dataset
def difference(data, order):
    return [data[i] - data[i - order] for i in range(order, len(data))]

# fit a model
def model_fit(train, config):
    # unpack config
    n_input, n_filters, n_kernel, n_epochs, n_batch, n_diff = config
    # prepare data
    if n_diff > 0:
        train = difference(train, n_diff)
    # transform series into supervised format
    data = series_to_supervised(train, n_in=n_input)
    # separate inputs and outputs
    train_x, train_y = data[:, :-1], data[:, -1]
    # reshape input data into [samples, timesteps, features]
    n_features = 1
    train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], n_features))
    # define model
    model = Sequential()
    model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_input, n_features)))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    # fit
    model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
    return model

# forecast with the fit model
def model_predict(model, history, config):
    # unpack config
    n_input, _, _, _, _, n_diff = config
    # prepare data
    correction = 0.0
    if n_diff > 0:
        correction = history[-n_diff]
        history = difference(history, n_diff)
    x_input = array(history[-n_input:]).reshape((1, n_input, 1))
    # forecast
    yhat = model.predict(x_input, verbose=0)
    # invert the differencing, if any
    return correction + yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # fit model
    model = model_fit(train, cfg)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # make a forecast from the history seen so far
        yhat = model_predict(model, history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    print(' > %.3f' % error)
    return error

# evaluate a config n_repeats times, return (key, mean RMSE)
def repeat_evaluate(data, config, n_test, n_repeats=10):
    # convert config to a key
    key = str(config)
    # fit and evaluate the model n times
    scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
    # summarize score
    result = mean(scores)
    print('> Model[%s] %.3f' % (key, result))
    return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test):
    # evaluate configs
    scores = [repeat_evaluate(data, cfg, n_test) for cfg in cfg_list]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores

# create a list of configs to try
def model_configs():
    # define scope of configs
    n_input = [12]
    n_filters = [64]
    n_kernels = [3, 5]
    n_epochs = [100]
    n_batch = [1, 150]
    n_diff = [0, 12]
    # create configs
    configs = list()
    for a in n_input:
        for b in n_filters:
            for c in n_kernels:
                for d in n_epochs:
                    for e in n_batch:
                        for f in n_diff:
                            cfg = [a, b, c, d, e, f]
                            configs.append(cfg)
    print('Total configs: %d' % len(configs))
    return configs

# define dataset
series = read_csv('airline-passengers.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# model configs
cfg_list = model_configs()
# grid search
scores = grid_search(data, cfg_list, n_test)
print('done')
# list top 3 configs
for cfg, error in scores[:3]:
    print(cfg, error)
Output:
Total configs: 8
> 19.830
> 36.538
> 20.831
> 14.879
> 15.430
> 27.859
> 18.334
> 19.638
> 38.763
> 17.863
> Model[[12, 64, 3, 100, 1, 0]] 22.996
> 21.689
> 22.089
> 19.215
> 19.673
> 19.034
> 20.625
> 21.101
> 22.956
> 21.040
> 20.172
> Model[[12, 64, 3, 100, 1, 12]] 20.759
> 76.625
> 92.361
> 74.684
> 79.868
> 68.876
> 67.716
> 69.328
> 84.085
> 83.975
> 81.078
> Model[[12, 64, 3, 100, 150, 0]] 77.860
> 20.030
> 19.528
> 18.952
> 19.272
> 18.609
> 18.864
> 18.208
> 18.807
> 18.917
> 19.947
> Model[[12, 64, 3, 100, 150, 12]] 19.113
> 20.138
> 19.909
> 17.370
> 18.406
> 22.221
> 28.303
> 19.581
> 22.390
> 16.829
> 19.677
> Model[[12, 64, 5, 100, 1, 0]] 20.482
> 19.413
> 19.652
> 17.216
> 17.231
> 18.810
> 20.773
> 18.362
> 19.963
> 18.021
> 20.275
> Model[[12, 64, 5, 100, 1, 12]] 18.972
> 76.514
> 80.932
> 87.354
> 86.463
> 100.270
> 91.679
> 80.649
> 82.810
> 82.999
> 80.925
> Model[[12, 64, 5, 100, 150, 0]] 85.059
> 19.242
> 20.007
> 19.265
> 18.623
> 19.383
> 19.418
> 19.389
> 20.158
> 18.342
> 18.605
> Model[[12, 64, 5, 100, 150, 12]] 19.243
done
[12, 64, 5, 100, 1, 12] 18.97153329122043
[12, 64, 3, 100, 150, 12] 19.113480219230794
[12, 64, 5, 100, 150, 12] 19.243291915179842
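Other networks, such as LSTM, follow the same approach; only the input needs adjusting, and the number of hidden units becomes a hyperparameter.
Reference: https://machinelearningmastery.com/how-to-grid-search-deep-learning-models-for-time-series-forecasting/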