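Hyperparameter optimization is a big part of deep learning. Neural networks are notoriously difficult to configure, and there are a lot of parameters that need to be set. On top of that, individual models can be very slow to train.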

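In this post you will discover how to use the grid search capability of the scikit-learn Python machine learning library to tune the hyperparameters of Keras deep learning models.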

After reading this post you will know:

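How to wrap Keras models for use in scikit-learn and how to use grid search.
How to grid search common neural network parameters such as learning rate, dropout rate, epochs and number of neurons.
How to define your own hyperparameter tuning experiments.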

Overview

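In this post I want to show you how to use the scikit-learn grid search capability, and give you a suite of examples that you can copy and paste into your own project as a starting point.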

Below is a list of the topics we are going to cover:

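How to use Keras models in scikit-learn.
How to use grid search in scikit-learn.
How to tune batch size and number of epochs.
How to tune the optimization algorithm.
How to tune learning rate and momentum.
How to tune network weight initialization.
How to tune the neuron activation function.
How to tune dropout regularization.
How to tune the number of neurons in the hidden layer.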

How to use Keras models in scikit-learn

Keras models can be used in scikit-learn by wrapping them with the KerasClassifier or KerasRegressor class.

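To use these wrappers you must define a function that creates and returns your Keras Sequential model, then pass this function to the build_fn argument when constructing the KerasClassifier class.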

For example:

def create_model():
    ...
    return model

model = KerasClassifier(build_fn=create_model)

The constructor of the KerasClassifier class can take default arguments that are passed on to the calls to model.fit(), such as the number of epochs and the batch size.

For example:

def create_model():
    ...
    return model

model = KerasClassifier(build_fn=create_model, nb_epoch=10)
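
The constructor of the KerasClassifier class can also take new arguments that are passed on to your custom create_model() function. These new arguments must also be declared, with default values, in the signature of your create_model() function.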

For example:

def create_model(dropout_rate=0.0):
    ...
    return model

model = KerasClassifier(build_fn=create_model, dropout_rate=0.2)

You can learn more about the scikit-learn wrapper in the Keras API documentation.
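The routing of constructor arguments described above can be pictured with a small stdlib sketch. It illustrates the idea only and is not the wrapper's actual source: `route_params` and `FIT_ARG_NAMES` are hypothetical names, and the set of fit arguments is an assumption chosen to match the Keras 1 style `nb_epoch` used in this article. A keyword goes to `create_model()` if it appears in that function's signature, otherwise to `fit()` if it is a known fit argument.

```python
import inspect

# Hypothetical set of keyword names treated as fit() arguments in this sketch.
FIT_ARG_NAMES = {"nb_epoch", "epochs", "batch_size", "verbose"}


def route_params(build_fn, sk_params):
    """Split constructor kwargs into build_fn kwargs and fit kwargs.

    Illustrative only: mimics the idea behind the Keras wrapper,
    not its real implementation.
    """
    build_arg_names = set(inspect.signature(build_fn).parameters)
    build_kwargs, fit_kwargs = {}, {}
    for name, value in sk_params.items():
        if name in build_arg_names:
            build_kwargs[name] = value
        elif name in FIT_ARG_NAMES:
            fit_kwargs[name] = value
        else:
            # In this sketch, a name neither function accepts is an error.
            raise ValueError(f"{name!r} is not accepted by build_fn or fit()")
    return build_kwargs, fit_kwargs


def create_model(dropout_rate=0.0):
    # Stand-in for the article's create_model(); returns a dict, not a Keras model.
    return {"dropout_rate": dropout_rate}


build_kwargs, fit_kwargs = route_params(
    create_model, {"dropout_rate": 0.2, "nb_epoch": 10, "batch_size": 20}
)
print(build_kwargs)  # {'dropout_rate': 0.2}
print(fit_kwargs)    # {'nb_epoch': 10, 'batch_size': 20}
print(create_model(**build_kwargs))
```

Because `create_model()` may end up being called with only some of its keywords, giving every parameter a default value keeps each call valid, which is why the defaults in the signature matter.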

How to use grid search in scikit-learn

Grid search is a model hyperparameter optimization technique.

In scikit-learn this technique is provided by the GridSearchCV class.

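When constructing this class you must provide a dictionary of hyperparameters to evaluate in the param_grid argument. This is a map from each model parameter name to a list of values to try.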

By default, accuracy is the score that is optimized, but other scores can be specified with the scoring argument of the GridSearchCV constructor.

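By default, the grid search uses only one thread. Setting the n_jobs argument of the GridSearchCV constructor to -1 makes the process use all cores on your machine. Depending on your Keras backend, this may interfere with the main neural network training process.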

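GridSearchCV then constructs and evaluates one model for each combination of parameters. Each individual model is evaluated with cross validation. The scikit-learn versions used in this post default to 3-fold cross validation (scikit-learn 0.22 and later default to 5 folds), and this can be overridden by passing the cv argument to the GridSearchCV constructor.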
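To see how quickly a grid grows, here is a minimal sketch that expands a dict-of-lists grid into one candidate per combination using `itertools.product`. `expand_grid` is a hypothetical helper, not scikit-learn's own `ParameterGrid`, and the fold count of 3 assumes the older default mentioned above. The grid is the batch-size/epoch grid from the first example below.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield every combination in a dict-of-lists grid, one dict per candidate.

    A stdlib stand-in for the candidates GridSearchCV enumerates; not scikit-learn code.
    """
    names = sorted(param_grid)  # fixed key order so the output is deterministic
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


param_grid = dict(batch_size=[10, 20, 40, 60, 80, 100], nb_epoch=[10, 50, 100])
candidates = list(expand_grid(param_grid))
n_folds = 3  # the cv default assumed in this article

print(len(candidates))            # 18 candidate settings
print(len(candidates) * n_folds)  # 54 cross-validation fits
print(candidates[0])              # {'batch_size': 10, 'nb_epoch': 10}
```

With GridSearchCV's default `refit=True`, one more model is trained on all the data with the best parameters, so that first example trains 55 networks in total.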

Below is an example of defining a simple grid search:

param_grid = dict(nb_epoch=[10, 20, 30])
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)

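Once completed, you can access the outcome of the grid search through the result object returned by grid.fit(). The best_score_ member holds the best score observed during the optimization procedure, and best_params_ describes the combination of parameters that achieved it.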
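The sketch below shows how those two attributes relate to the per-fold scores: each candidate's fold scores are averaged, and the candidate with the highest mean wins. It is a simplified stand-in, not scikit-learn's implementation (some older versions weighted folds by their size instead of averaging them plainly), and the fold scores are made-up numbers, not results from this article. The print format matches the summary loop used in the examples below.

```python
from statistics import mean, pstdev


def summarize(fold_scores):
    """Return (best_mean, best_params, rows) from {params_tuple: [fold scores]}.

    best_mean plays the role of best_score_, best_params that of best_params_.
    """
    rows = []
    for params, scores in fold_scores.items():
        # pstdev is the population std, matching numpy's default ddof=0
        rows.append((mean(scores), pstdev(scores), dict(params)))
    best_mean, _, best_params = max(rows, key=lambda row: row[0])
    return best_mean, best_params, rows


# Made-up fold scores for illustration; NOT results from the article.
fold_scores = {
    (("batch_size", 10),): [0.60, 0.62, 0.64],
    (("batch_size", 20),): [0.70, 0.66, 0.68],
}
best_mean, best_params, rows = summarize(fold_scores)
print("Best: %f using %s" % (best_mean, best_params))
for m, s, p in rows:
    print("%f (%f) with: %r" % (m, s, p))
```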

You can learn more about the GridSearchCV class in the scikit-learn API documentation.

Problem description
Now that we know how to use Keras models with scikit-learn and how to use grid search in scikit-learn, let's look at a series of examples.

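All examples are demonstrated on a small standard machine learning dataset, the Pima Indians onset of diabetes classification dataset. It is a small dataset with all-numerical attributes that is easy to work with.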

Download the dataset and place it in your current working directory with the name pima-indians-diabetes.csv.

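As we work through the examples in this post, we carry the best parameters found so far into the next example. This is not the best way to grid search, because parameters can interact, but it is fine for demonstration purposes.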

Note on parallelizing grid search
All examples are configured to run in parallel (n_jobs=-1).

If you get an error like the one below:

INFO (theano.gof.compilelock): Waiting for existing lock by process ‘55614’ (I am process ‘55613’)
INFO (theano.gof.compilelock): To manually release the lock, delete …

kill the process and change the code so that the grid search does not run in parallel, by setting n_jobs=1.

How to tune batch size and number of epochs

In this first simple example, we look at tuning the batch size and the number of epochs used when fitting the network.

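The batch size in iterative gradient descent is the number of patterns shown to the network before the weights are updated. It is also an optimization in the training of the network, defining how many patterns to read at a time and keep in memory.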

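The number of epochs is the number of times the entire training dataset is shown to the network during training. Some networks are sensitive to the batch size, such as LSTM recurrent neural networks and convolutional neural networks.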
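Batch size and epochs together fix how many weight updates the network receives, which is one reason to search them jointly. A small arithmetic sketch, assuming about 512 training rows per split (the Pima dataset's 768 rows under 3-fold cross validation; actual split sizes can differ by a row or so) and assuming the last, smaller batch of each epoch still triggers an update. `weight_updates` is a hypothetical helper.

```python
import math


def weight_updates(n_train, batch_size, epochs):
    """Return (updates per epoch, total updates): one update per mini-batch."""
    per_epoch = math.ceil(n_train / batch_size)  # the last batch may be smaller
    return per_epoch, per_epoch * epochs


# Assumption: ~512 training rows per split (768 rows, 3-fold CV).
for batch_size in [10, 20, 40, 60, 80, 100]:
    per_epoch, total = weight_updates(512, batch_size, epochs=100)
    print(batch_size, per_epoch, total)  # first line: 10 52 5200
```

A batch size of 10 gives 52 updates per epoch while 100 gives only 6, so at a fixed number of epochs the larger batch receives roughly a ninth as many updates.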

Here we evaluate a suite of mini-batch sizes from 10 to 100 (10, 20, 40, 60, 80 and 100), each combined with 10, 50 and 100 epochs.

The full code listing is provided below:

# Use scikit-learn to grid search the batch size and epochs
import numpy
from sklearn.grid_search import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, nb_epoch=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for params, mean_score, scores in grid_result.grid_scores_:
    print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))

Running this example produces the following output:

Best: 0.686198 using {'nb_epoch': 100, 'batch_size': 20}
0.348958 (0.024774) with: {'nb_epoch': 10, 'batch_size': 10}
0.348958 (0.024774) with: {'nb_epoch': 50, 'batch_size': 10}
0.466146 (0.149269) with: {'nb_epoch': 100, 'batch_size': 10}
0.647135 (0.021236) with: {'nb_epoch': 10, 'batch_size': 20}
0.660156 (0.014616) with: {'nb_epoch': 50, 'batch_size': 20}
0.686198 (0.024774) with: {'nb_epoch': 100, 'batch_size': 20}
0.489583 (0.075566) with: {'nb_epoch': 10, 'batch_size': 40}
0.652344 (0.019918) with: {'nb_epoch': 50, 'batch_size': 40}
0.654948 (0.027866) with: {'nb_epoch': 100, 'batch_size': 40}
0.518229 (0.032264) with: {'nb_epoch': 10, 'batch_size': 60}
0.605469 (0.052213) with: {'nb_epoch': 50, 'batch_size': 60}
0.665365 (0.004872) with: {'nb_epoch': 100, 'batch_size': 60}
0.537760 (0.143537) with: {'nb_epoch': 10, 'batch_size': 80}
0.591146 (0.094954) with: {'nb_epoch': 50, 'batch_size': 80}
0.658854 (0.054904) with: {'nb_epoch': 100, 'batch_size': 80}
0.402344 (0.107735) with: {'nb_epoch': 10, 'batch_size': 100}
0.652344 (0.033299) with: {'nb_epoch': 50, 'batch_size': 100}
0.542969 (0.157934) with: {'nb_epoch': 100, 'batch_size': 100}

We can see that a batch size of 20 with 100 epochs achieved the best result, an accuracy of about 68%.

How to tune the training optimization algorithm

Keras offers a suite of different state-of-the-art optimization algorithms.

In this example, we tune the optimization algorithm used to train the network, using each algorithm with its default parameters.

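This is an odd example, because often you will choose one approach up front and instead focus on tuning its parameters for your problem (see the next example).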

Here we evaluate the whole suite of optimization algorithms supported by the Keras API.

The full code listing is provided below:

# Use scikit-learn to grid search the optimization algorithm
import numpy
from sklearn.grid_search import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(optimizer='adam'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for params, mean_score, scores in grid_result.grid_scores_:
    print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))

Running this example produces the following output:

Best: 0.704427 using {'optimizer': 'Adam'}
0.348958 (0.024774) with: {'optimizer': 'SGD'}
0.348958 (0.024774) with: {'optimizer': 'RMSprop'}
0.471354 (0.156586) with: {'optimizer': 'Adagrad'}
0.669271 (0.029635) with: {'optimizer': 'Adadelta'}
0.704427 (0.031466) with: {'optimizer': 'Adam'}
0.682292 (0.016367) with: {'optimizer': 'Adamax'}
0.703125 (0.003189) with: {'optimizer': 'Nadam'}

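The results suggest that the Adam optimization algorithm is the best, with an accuracy of about 70%, closely followed by Nadam.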

How to tune learning rate and momentum

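It is common to pre-select an optimization algorithm to train your network and then tune its parameters. By far the most common optimization algorithm is plain stochastic gradient descent (SGD), because it is so well understood. In this example, we look at optimizing the SGD learning rate and momentum parameters.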

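The learning rate controls how much the weights are updated at the end of each batch, and the momentum controls how much the previous update influences the current weight update.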
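The update rule can be written out on a one-weight toy problem. This sketch assumes the classical (non-Nesterov) momentum form documented for Keras's SGD optimizer, `velocity = momentum * velocity - learning_rate * g` followed by `w = w + velocity`; the quadratic loss f(w) = w**2 and the function name `sgd_momentum` are illustrative choices, not part of the article.

```python
def sgd_momentum(grad_fn, w0, learn_rate, momentum, steps):
    """Run classical momentum SGD on a single scalar weight and return the final weight."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # velocity is a decaying sum of past gradient steps
        velocity = momentum * velocity - learn_rate * grad_fn(w)
        w = w + velocity
    return w


def grad(w):
    """Gradient of the toy loss f(w) = w**2, whose minimum is at w = 0."""
    return 2.0 * w


plain = sgd_momentum(grad, w0=1.0, learn_rate=0.01, momentum=0.0, steps=50)
heavy = sgd_momentum(grad, w0=1.0, learn_rate=0.01, momentum=0.9, steps=50)
print("momentum 0.0:", plain)
print("momentum 0.9:", heavy)
```

With momentum 0.0 the rule reduces to plain SGD, which is why 0.0 is a useful baseline in the momentum grid. On this toy loss the momentum run ends much closer to the minimum after the same 50 steps, although it overshoots zero and oscillates on the way; whether momentum helps a real network is what the grid search is meant to find out.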

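We try a suite of small standard learning rates (0.001, 0.01, 0.1, 0.2 and 0.3) and momentum values from 0.2 to 0.8 in steps of 0.2, plus 0.0 (no momentum) and 0.9 (a popular value in practice).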

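Generally, it is a good idea to also include the number of epochs in an optimization like this, because the amount of learning per batch (learning rate), the number of updates per epoch (batch size) and the number of epochs all depend on one another. To keep the grid small, this example fixes the number of epochs at 100 and the batch size at 10.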

The full code listing is provided below:

# Use scikit-learn to grid search the learning rate and momentum
import numpy
from sklearn.grid_search import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import SGD
# Function to create model, required for KerasClassifier
def create_model(learn_rate=0.01, momentum=0):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = SGD(lr=learn_rate, momentum=momentum)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
# define the grid search parameters
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for params, mean_score, scores in grid_result.grid_scores_:
    print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))

Running this example produces the following output:

Best: 0.680990 using {'learn_rate': 0.01, 'momentum': 0.0}
0.348958 (0.024774) with: {'learn_rate': 0.001, 'momentum': 0.0}
0.348958 (0.024774) with: {'learn_rate': 0.001, 'momentum': 0.2}
0.467448 (0.151098) with: {'learn_rate': 0.001, 'momentum': 0.4}
0.662760 (0.012075) with: {'learn_rate': 0.001, 'momentum': 0.6}
0.669271 (0.030647) with: {'learn_rate': 0.001, 'momentum': 0.8}
0.666667 (0.035564) with: {'learn_rate': 0.001, 'momentum': 0.9}
0.680990 (0.024360) with: {'learn_rate': 0.01, 'momentum': 0.0}
0.677083 (0.026557) with: {'learn_rate': 0.01, 'momentum': 0.2}
0.427083 (0.134575) with: {'learn_rate': 0.01, 'momentum': 0.4}
0.427083 (0.134575) with: {'learn_rate': 0.01, 'momentum': 0.6}
0.544271 (0.146518) with: {'learn_rate': 0.01, 'momentum': 0.8}
0.651042 (0.024774) with: {'learn_rate': 0.01, 'momentum': 0.9}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.0}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.2}
0.572917 (0.134575) with: {'learn_rate': 0.1, 'momentum': 0.4}
0.572917 (0.134575) with: {'learn_rate': 0.1, 'momentum': 0.6}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.8}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.9}
0.533854 (0.149269) with: {'learn_rate': 0.2, 'momentum': 0.0}
0.427083 (0.134575) with: {'learn_rate': 0.2, 'momentum': 0.2}
0.427083 (0.134575) with: {'learn_rate': 0.2, 'momentum': 0.4}
0.651042 (0.024774) with: {'learn_rate': 0.2, 'momentum': 0.6}
0.651042 (0.024774) with: {'learn_rate': 0.2, 'momentum': 0.8}
0.651042 (0.024774) with: {'learn_rate': 0.2, 'momentum': 0.9}
0.455729 (0.146518) with: {'learn_rate': 0.3, 'momentum': 0.0}
0.455729 (0.146518) with: {'learn_rate': 0.3, 'momentum': 0.2}
0.455729 (0.146518) with: {'learn_rate': 0.3, 'momentum': 0.4}
0.348958 (0.024774) with: {'learn_rate': 0.3, 'momentum': 0.6}
0.348958 (0.024774) with: {'learn_rate': 0.3, 'momentum': 0.8}
0.348958 (0.024774) with: {'learn_rate': 0.3, 'momentum': 0.9}

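We can see that SGD does relatively poorly on this problem; its best result, an accuracy of about 68%, was achieved with a learning rate of 0.01 and a momentum of 0.0.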

How to tune network weight initialization

Neural network weight initialization used to be simple: use small random values.

Now there is a suite of different techniques to choose from; the Keras documentation lists the initializers it provides.

In this example, we tune the choice of network weight initialization by evaluating all of the available techniques.

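We use the same weight initialization method on every layer. Ideally, it may be better to choose the initialization scheme per layer according to that layer's activation function. In the example below we use a rectifier (ReLU) in the hidden layer and, because the predictions are binary, a sigmoid in the output layer.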
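Most of the listed schemes draw weights from a distribution whose width depends on the layer's fan-in and fan-out. The sketch below computes the half-width of the uniform variants for the 8-input, 12-unit hidden layer used in these examples. It assumes the standard formulas (LeCun: sqrt(3/fan_in), Glorot: sqrt(6/(fan_in + fan_out)), He: sqrt(6/fan_in)) and assumes that plain `uniform` uses Keras's fixed default range of ±0.05; `init_limits` is a hypothetical helper.

```python
import math


def init_limits(fan_in, fan_out):
    """Half-width `limit` of U(-limit, limit) for several uniform initializers."""
    return {
        "uniform": 0.05,  # assumed fixed default scale, independent of layer size
        "lecun_uniform": math.sqrt(3.0 / fan_in),
        "glorot_uniform": math.sqrt(6.0 / (fan_in + fan_out)),
        "he_uniform": math.sqrt(6.0 / fan_in),
    }


# Hidden layer in the examples: 8 inputs -> 12 units.
limits = init_limits(fan_in=8, fan_out=12)
for name, limit in limits.items():
    print(f"{name:15s} +/-{limit:.4f}")
```

For this layer the fan-based ranges come out at roughly ±0.61 (LeCun), ±0.55 (Glorot) and ±0.87 (He), an order of magnitude wider than ±0.05. The `zero` scheme is the degenerate case: every hidden unit starts identical, receives the same gradient and so stays identical. Its score of 0.651042 equals 500/768, the share of the majority (non-diabetic) class in the Pima data, which is consistent with a model that predicts the same class for every input.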

The full code listing is provided below:

# Use scikit-learn to grid search the weight initialization
import numpy
from sklearn.grid_search import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(init_mode='uniform'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, init=init_mode, activation='relu'))
    model.add(Dense(1, init=init_mode, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
param_grid = dict(init_mode=init_mode)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for params, mean_score, scores in grid_result.grid_scores_:
    print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))

Running this example produces the following output:

Best: 0.720052 using {'init_mode': 'uniform'}
0.720052 (0.024360) with: {'init_mode': 'uniform'}
0.348958 (0.024774) with: {'init_mode': 'lecun_uniform'}
0.712240 (0.012075) with: {'init_mode': 'normal'}
0.651042 (0.024774) with: {'init_mode': 'zero'}
0.700521 (0.010253) with: {'init_mode': 'glorot_normal'}
0.674479 (0.011201) with: {'init_mode': 'glorot_uniform'}
0.661458 (0.028940) with: {'init_mode': 'he_normal'}
0.678385 (0.004872) with: {'init_mode': 'he_uniform'}

We can see that the uniform weight initialization scheme achieved the best result, an accuracy of about 72%.

How to tune the neuron activation function

The activation function controls the nonlinearity of individual neurons and when they fire.

Generally, the rectifier activation function is the most popular, but the sigmoid and tanh functions may still suit some problems better.

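In this example, we evaluate the suite of activation functions available in Keras. We use these functions only in the hidden layer, because the binary classification problem requires a sigmoid activation function in the output layer.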
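For reference, the candidate activations can be written as plain scalar functions. This sketch is illustrative, not Keras code; `hard_sigmoid` is assumed to be the piecewise-linear `clip(0.2 * x + 0.5, 0, 1)` used by older Keras backends (newer Keras releases may define it differently). Softmax is the odd one out: it normalizes across all units of a layer, so it is written as a function of a list.

```python
import math

# Scalar versions of the hidden-layer activations compared below.
# Inputs are assumed to be moderate: math.exp overflows for very large |x|.
ACTIVATIONS = {
    "linear": lambda x: x,
    "relu": lambda x: max(0.0, x),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "hard_sigmoid": lambda x: min(1.0, max(0.0, 0.2 * x + 0.5)),  # assumed form
    "tanh": math.tanh,
    "softsign": lambda x: x / (1.0 + abs(x)),
    "softplus": lambda x: math.log1p(math.exp(x)),
}


def softmax(xs):
    """Softmax acts on a whole layer: outputs are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


for name, fn in ACTIVATIONS.items():
    print(name, [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)])
print("softmax", [round(v, 3) for v in softmax([-2.0, 0.0, 2.0])])
```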

Generally, it is a good idea to scale the input data to suit the range of each transfer function, but we do not do so in this example.
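A minimal min-max scaling sketch, assuming rows are lists of numbers: `[0, 1]` suits sigmoid-like units and `[-1, 1]` suits tanh. `fit_min_max` and `transform` are hypothetical helpers; the minima and maxima are computed on training rows only so that, under cross validation, the held-out fold does not leak into the transform. The two-column rows are illustrative values only.

```python
def fit_min_max(rows):
    """Column-wise minima and maxima, computed on training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def transform(rows, mins, maxs, low=0.0, high=1.0):
    """Map each column linearly so that [min, max] becomes [low, high]."""
    scaled = []
    for row in rows:
        new_row = []
        for x, lo, hi in zip(row, mins, maxs):
            span = hi - lo
            unit = (x - lo) / span if span else 0.0  # constant column -> lower bound
            new_row.append(low + unit * (high - low))
        scaled.append(new_row)
    return scaled


train_rows = [[6, 148], [1, 85], [8, 183]]  # illustrative values only
mins, maxs = fit_min_max(train_rows)
print(transform(train_rows, mins, maxs))             # target range [0, 1]
print(transform(train_rows, mins, maxs, -1.0, 1.0))  # target range [-1, 1]
print(transform([[10, 200]], mins, maxs))            # unseen rows can fall outside [0, 1]
```

The last line shows why the training range has to be stored and reused: new rows may land outside the target interval.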

The full code listing is provided below:

# Use scikit-learn to grid search the activation function
import numpy
from sklearn.grid_search import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(activation='relu'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, init='uniform', activation=activation))
    model.add(Dense(1, init='uniform', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for params, mean_score, scores in grid_result.grid_scores_:
    print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))

Running this example produces the following output:

Best: 0.722656 using {'activation': 'linear'}
0.649740 (0.009744) with: {'activation': 'softmax'}
0.720052 (0.032106) with: {'activation': 'softplus'}
0.688802 (0.019225) with: {'activation': 'softsign'}
0.720052 (0.018136) with: {'activation': 'relu'}
0.691406 (0.019401) with: {'activation': 'tanh'}
0.680990 (0.009207) with: {'activation': 'sigmoid'}
0.691406 (0.014616) with: {'activation': 'hard_sigmoid'}
0.722656 (0.003189) with: {'activation': 'linear'}

Surprisingly (to me at least), the linear activation function achieved the best result, an accuracy of about 72%.
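One possible explanation, offered as an interpretation rather than a claim from the original experiment: with a linear hidden layer feeding a single sigmoid unit, the whole network computes exactly a logistic regression on the 8 inputs, because two stacked affine maps collapse into one. The sketch below checks this numerically with random weights; the layer sizes match the examples, while the helper names and the ±0.05 weight range are illustrative assumptions.

```python
import math
import random


def dense(x, weights, bias):
    """Affine map; `weights` holds one row of incoming weights per unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


random.seed(0)
n_in, n_hidden = 8, 12
W1 = [[random.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [random.uniform(-0.05, 0.05) for _ in range(n_hidden)]
w2 = [random.uniform(-0.05, 0.05) for _ in range(n_hidden)]
b2 = random.uniform(-0.05, 0.05)


def two_layer(x):
    hidden = dense(x, W1, b1)  # 'linear' activation: no nonlinearity applied
    return sigmoid(dense(hidden, [w2], [b2])[0])


# Collapse into one logistic-regression layer: w' = W1^T w2, b' = w2 . b1 + b2.
w_eff = [sum(w2[j] * W1[j][i] for j in range(n_hidden)) for i in range(n_in)]
b_eff = sum(w2[j] * b1[j] for j in range(n_hidden)) + b2


def logistic(x):
    return sigmoid(dense(x, [w_eff], [b_eff])[0])


x = [random.uniform(0, 10) for _ in range(n_in)]
print(two_layer(x), logistic(x))
```

If this reading is right, the result suggests that a linear decision boundary does about as well on this dataset as anything this small network found with the other activations.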

How to tune dropout regularization

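In this example, we tune the dropout rate used for regularization, in an effort to limit overfitting and improve the model's ability to generalize. Dropout tends to work best when combined with a weight constraint such as the max norm constraint.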

For more on using dropout in deep learning models with Keras, see the post:

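Dropout Regularization in Deep Learning Models With Keras

This involves tuning both the dropout rate and the weight constraint. We try dropout rates between 0.0 and 0.9 (1.0 does not make sense) and max norm weight constraint values between 1 and 5.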
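Both mechanisms are simple to state in code. This sketch assumes the common "inverted dropout" formulation, in which surviving activations are scaled by 1/(1 - rate) during training so nothing needs rescaling at prediction time, and a max norm constraint that rescales each hidden unit's incoming weight vector whenever its L2 norm exceeds `c` (Keras adds a small epsilon in the denominator, omitted here). The function names are hypothetical, and each inner list holds one unit's incoming weights.

```python
import math
import random


def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1/(1-rate) (training only)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


def max_norm(unit_weights, c):
    """Rescale each unit's incoming weight vector so its L2 norm is at most c."""
    constrained = []
    for w in unit_weights:
        norm = math.sqrt(sum(x * x for x in w))
        scale = min(1.0, c / norm) if norm > 0 else 1.0
        constrained.append([x * scale for x in w])
    return constrained


rng = random.Random(7)
print(inverted_dropout([1.0] * 10, rate=0.2, rng=rng))
print(max_norm([[3.0, 4.0], [0.3, 0.4]], c=4.0))
```

With c = 4 the first vector (norm 5) is shrunk to norm 4 while the second (norm 0.5) is left alone, so the constraint only takes effect once weights grow large.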

The full code listing is provided below:

# Use scikit-learn to grid search the dropout rate
import numpy
from sklearn.grid_search import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from keras.constraints import maxnorm
# Function to create model, required for KerasClassifier
def create_model(dropout_rate=0.0, weight_constraint=0):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, init='uniform', activation='linear', W_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, init='uniform', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
# define the grid search parameters
weight_constraint = [1, 2, 3, 4, 5]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
param_grid = dict(dropout_rate=dropout_rate, weight_constraint=weight_constraint)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for params, mean_score, scores in grid_result.grid_scores_:
    print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))

Running this example produces the following output:

Best: 0.723958 using {'dropout_rate': 0.2, 'weight_constraint': 4}
0.696615 (0.031948) with: {'dropout_rate': 0.0, 'weight_constraint': 1}
0.696615 (0.031948) with: {'dropout_rate': 0.0, 'weight_constraint': 2}
0.691406 (0.026107) with: {'dropout_rate': 0.0, 'weight_constraint': 3}
0.708333 (0.009744) with: {'dropout_rate': 0.0, 'weight_constraint': 4}
0.708333 (0.009744) with: {'dropout_rate': 0.0, 'weight_constraint': 5}
0.710937 (0.008438) with: {'dropout_rate': 0.1, 'weight_constraint': 1}
0.709635 (0.007366) with: {'dropout_rate': 0.1, 'weight_constraint': 2}
0.709635 (0.007366) with: {'dropout_rate': 0.1, 'weight_constraint': 3}
0.695312 (0.012758) with: {'dropout_rate': 0.1, 'weight_constraint': 4}
0.695312 (0.012758) with: {'dropout_rate': 0.1, 'weight_constraint': 5}
0.701823 (0.017566) with: {'dropout_rate': 0.2, 'weight_constraint': 1}
0.710938 (0.009568) with: {'dropout_rate': 0.2, 'weight_constraint': 2}
0.710938 (0.009568) with: {'dropout_rate': 0.2, 'weight_constraint': 3}
0.723958 (0.027126) with: {'dropout_rate': 0.2, 'weight_constraint': 4}
0.718750 (0.030425) with: {'dropout_rate': 0.2, 'weight_constraint': 5}
0.721354 (0.032734) with: {'dropout_rate': 0.3, 'weight_constraint': 1}
0.707031 (0.036782) with: {'dropout_rate': 0.3, 'weight_constraint': 2}
0.707031 (0.036782) with: {'dropout_rate': 0.3, 'weight_constraint': 3}
0.694010 (0.019225) with: {'dropout_rate': 0.3, 'weight_constraint': 4}
0.709635 (0.006639) with: {'dropout_rate': 0.3, 'weight_constraint': 5}
0.704427 (0.008027) with: {'dropout_rate': 0.4, 'weight_constraint': 1}
0.717448 (0.031304) with: {'dropout_rate': 0.4, 'weight_constraint': 2}
0.718750 (0.030425) with: {'dropout_rate': 0.4, 'weight_constraint': 3}
0.718750 (0.030425) with: {'dropout_rate': 0.4, 'weight_constraint': 4}
0.722656 (0.029232) with: {'dropout_rate': 0.4, 'weight_constraint': 5}
0.720052 (0.028940) with: {'dropout_rate': 0.5, 'weight_constraint': 1}
0.703125 (0.009568) with: {'dropout_rate': 0.5, 'weight_constraint': 2}
0.716146 (0.029635) with: {'dropout_rate': 0.5, 'weight_constraint': 3}
0.709635 (0.008027) with: {'dropout_rate': 0.5, 'weight_constraint': 4}
0.703125 (0.011500) with: {'dropout_rate': 0.5, 'weight_constraint': 5}
0.707031 (0.017758) with: {'dropout_rate': 0.6, 'weight_constraint': 1}
0.701823 (0.018688) with: {'dropout_rate': 0.6, 'weight_constraint': 2}
0.701823 (0.018688) with: {'dropout_rate': 0.6, 'weight_constraint': 3}
0.690104 (0.027498) with: {'dropout_rate': 0.6, 'weight_constraint': 4}
0.695313 (0.022326) with: {'dropout_rate': 0.6, 'weight_constraint': 5}
0.697917 (0.014382) with: {'dropout_rate': 0.7, 'weight_constraint': 1}
0.697917 (0.014382) with: {'dropout_rate': 0.7, 'weight_constraint': 2}
0.687500 (0.008438) with: {'dropout_rate': 0.7, 'weight_constraint': 3}
0.704427 (0.011201) with: {'dropout_rate': 0.7, 'weight_constraint': 4}
0.696615 (0.016367) with: {'dropout_rate': 0.7, 'weight_constraint': 5}
0.680990 (0.025780) with: {'dropout_rate': 0.8, 'weight_constraint': 1}
0.699219 (0.019401) with: {'dropout_rate': 0.8, 'weight_constraint': 2}
0.701823 (0.015733) with: {'dropout_rate': 0.8, 'weight_constraint': 3}
0.684896 (0.023510) with: {'dropout_rate': 0.8, 'weight_constraint': 4}
0.696615 (0.017566) with: {'dropout_rate': 0.8, 'weight_constraint': 5}
0.653646 (0.034104) with: {'dropout_rate': 0.9, 'weight_constraint': 1}
0.677083 (0.012075) with: {'dropout_rate': 0.9, 'weight_constraint': 2}
0.679688 (0.013902) with: {'dropout_rate': 0.9, 'weight_constraint': 3}
0.669271 (0.017566) with: {'dropout_rate': 0.9, 'weight_constraint': 4}
0.669271 (0.012075) with: {'dropout_rate': 0.9, 'weight_constraint': 5}

We can see that a dropout rate of 0.2 (20%) with a max norm weight constraint of 4 gave the best result, an accuracy of about 72%.

How to tune the number of neurons in the hidden layer

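The number of neurons in a layer is an important parameter to tune. Generally, the number of neurons in a layer controls the representational capacity of the network, at least at that point in the topology.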

Also, a large enough network with a single hidden layer can, at least in theory, approximate any other neural network.

In this example, we tune the number of neurons in a single hidden layer, trying 1 and then 5 to 30 in steps of 5.

A larger network needs more training, so ideally at least the batch size and the number of epochs should be optimized together with the number of neurons.
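Capacity is easiest to compare as a parameter count. For the 8-input network used here, a hidden Dense layer of n units has 8n weights plus n biases, and the single sigmoid output adds n weights plus 1 bias, for 10n + 1 in total (dropout and the max norm constraint add none). A quick sketch with a hypothetical helper:

```python
def count_params(n_inputs, n_hidden):
    """Trainable weights + biases for Dense(n_hidden) -> Dense(1)."""
    hidden = n_inputs * n_hidden + n_hidden  # weight matrix + biases
    output = n_hidden * 1 + 1                # one sigmoid output unit
    return hidden + output


for n in [1, 5, 10, 15, 20, 25, 30]:
    print(n, count_params(8, n))  # 10n + 1: 11, 51, 101, ..., 301
```

The searched range therefore spans 11 to 301 trainable parameters, set against roughly 512 training rows per fold in the 3-fold setup.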

The full code listing is provided below:

# Use scikit-learn to grid search the number of neurons
import numpy
from sklearn.grid_search import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from keras.constraints import maxnorm
# Function to create model, required for KerasClassifier
def create_model(neurons=1):
    # create model
    model = Sequential()
    model.add(Dense(neurons, input_dim=8, init='uniform', activation='linear', W_constraint=maxnorm(4)))
    model.add(Dropout(0.2))
    model.add(Dense(1, init='uniform', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
# define the grid search parameters
neurons = [1, 5, 10, 15, 20, 25, 30]
param_grid = dict(neurons=neurons)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
for params, mean_score, scores in grid_result.grid_scores_:
    print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))

Running this example produces the following output:

Best: 0.714844 using {'neurons': 5}
0.700521 (0.011201) with: {'neurons': 1}
0.714844 (0.011049) with: {'neurons': 5}
0.712240 (0.017566) with: {'neurons': 10}
0.705729 (0.003683) with: {'neurons': 15}
0.696615 (0.020752) with: {'neurons': 20}
0.713542 (0.025976) with: {'neurons': 25}
0.705729 (0.008027) with: {'neurons': 30}

We can see that the best result, an accuracy of about 71%, was achieved with 5 neurons in the hidden layer.

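Tips for hyperparameter optimization
This section lists some handy tips to consider when tuning the hyperparameters of your neural network.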

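k-fold cross validation. The results of the examples in this post show some variance. The default 3-fold cross validation was used, but perhaps k=5 or k=10 would be more stable. Choose your cross-validation configuration carefully to ensure your results are stable.
Review the whole grid. Do not focus only on the best result; review the whole grid of results and look for trends that support your configuration decisions.
Parallelize. Use all your cores if you can. Neural networks are slow to train and we often want to try many different parameters. Consider using AWS instances.
Use a sample of your dataset. Because networks are slow to train, try training them on a smaller sample of your training data, just to get a sense of the general direction of the parameters rather than the optimal configuration.
Start with coarse grids. Start with coarse-grained grids and zoom into finer-grained grids once you can narrow the scope.
Do not transfer results. Results are generally problem-specific. Avoid reusing a favorite configuration on every new problem; the optimal results you discover on one problem are unlikely to transfer to the next. Instead, look for broader trends, such as the number of layers or relationships between parameters.
Reproducibility is a problem. Although we set the seed of the NumPy random number generator, the results are not 100% reproducible. There is more to reproducibility when grid searching wrapped Keras models than is covered in this post.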
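The coarse-to-fine tip can be sketched as a helper that, given the best value from a coarse grid, lays a finer, evenly spaced grid between that value's two neighbours. `refine_grid` is hypothetical; linear spacing is an assumption (for learning rates a log-spaced grid is often a better fit), and integer parameters such as neuron counts are rounded and de-duplicated (Python's `round` sends halves to the nearest even number).

```python
def refine_grid(coarse, best, n_points=5, integer=False):
    """Return a finer grid spanning the neighbours of `best` in a coarse grid."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    values = sorted(coarse)
    i = values.index(best)
    lo = values[max(i - 1, 0)]                 # left neighbour (or best itself at the edge)
    hi = values[min(i + 1, len(values) - 1)]   # right neighbour (or best itself at the edge)
    step = (hi - lo) / (n_points - 1)
    fine = [lo + k * step for k in range(n_points)]
    if integer:
        fine = sorted({round(v) for v in fine})
    return fine


# Zoom in around the best neuron count from the coarse search above (5).
print(refine_grid([1, 5, 10, 15, 20, 25, 30], best=5, n_points=10, integer=True))
```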
Summary
In this post, you discovered how to tune the hyperparameters of your deep learning networks in Python using Keras and scikit-learn.

Specifically, you learned:

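How to wrap Keras models for use in scikit-learn and how to use grid search.
How to grid search a suite of standard neural network parameters for Keras models.
How to design your own hyperparameter optimization experiments.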

Source: http://geek.csdn.net/news/detail/95494?ref=myread
