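1. GPU memory usage
By default, TensorFlow reserves the memory of every visible GPU during training, and Keras does the same when it runs on the TensorFlow backend.
Note: although memory is reserved on all GPUs, computation only runs on the GPU you specify (the memory on the others is held but left idle).
(1) Disable the GPU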
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
(2) Select a specific GPU
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
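The two snippets above differ only in the value written to CUDA_VISIBLE_DEVICES. A minimal sketch that wraps both cases in one helper; `set_visible_gpus` is a hypothetical name used for illustration, not part of TensorFlow or Keras, and it is assumed to be called before the framework initializes CUDA:

```python
import os

def set_visible_gpus(gpu_ids):
    """Restrict CUDA to the given GPU indices; an empty list hides all GPUs.

    Hypothetical helper for illustration only. Call it before importing
    tensorflow or keras so the setting is in place when CUDA starts up.
    """
    # "-1" is the value used above to disable the GPU entirely.
    value = ",".join(str(i) for i in gpu_ids) if gpu_ids else "-1"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value

set_visible_gpus([2])  # same effect as the "select a specific GPU" snippet
```

Passing several indices produces a comma-separated list such as "0,1", which makes more than one GPU visible to the process.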
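(3) Select a GPU and cap the fraction of its memory
import os
# Set the visible device before TensorFlow is imported
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
# TensorFlow 1.x session API
config = tf.ConfigProto()
# Let this process use at most 80% of the GPU's memory
config.gpu_options.per_process_gpu_memory_fraction = 0.8
set_session(tf.Session(config=config))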
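2. Save the training history as a CSV file
import pandas as pd
from keras.callbacks import ModelCheckpoint

# Keep only the best weights, judged by validation accuracy
checkpoint = ModelCheckpoint('weights/imdb_indrnn_mnist.h5', monitor='val_acc',
                             save_best_only=True, save_weights_only=True, mode='max')
hist = model.fit(x_train, y_train,
                 batch_size=batch_size,
                 epochs=epochs,
                 verbose=1,
                 validation_data=(x_test, y_test),
                 callbacks=[checkpoint])
# hist.history maps each metric name to a list with one value per epoch
log = pd.DataFrame(hist.history)
log.to_csv('log.csv')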
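The snippet above relies on hist.history being a dict that maps each metric name to a list of per-epoch values. As a rough sketch of what the resulting CSV looks like, here is the same idea with only the standard library; `history_to_csv` is a hypothetical helper and the numbers are made up, not real training results:

```python
import csv
import os
import tempfile

def history_to_csv(history, path):
    """Write a dict of equal-length per-epoch lists to CSV, one row per epoch."""
    keys = list(history)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch"] + keys)
        # zip(*columns) turns the per-metric columns into per-epoch rows
        for epoch, row in enumerate(zip(*(history[k] for k in keys))):
            writer.writerow([epoch, *row])

# Made-up values standing in for hist.history
example_history = {"loss": [0.9, 0.6], "val_acc": [0.70, 0.81]}
out_path = os.path.join(tempfile.gettempdir(), "log_example.csv")
history_to_csv(example_history, out_path)
```

The pandas version does the same job in two lines; the index column it writes plays the role of the `epoch` column here.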
3. Learning rate decay
See the official Keras documentation.
ReduceLROnPlateau
keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, verbose=0, mode='auto', epsilon=0.0001, cooldown=0, min_lr=0)
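from keras.callbacks import ReduceLROnPlateau
# Multiply the learning rate by 0.2 after 5 epochs without val_loss improvement
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.001)
model.fit(X_train, Y_train, callbacks=[reduce_lr])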
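To make the roles of factor, patience and min_lr concrete, here is a simplified pure-Python sketch of the plateau rule. It is an illustration under stated assumptions, not the Keras implementation: it handles only a loss that should decrease, and it ignores cooldown and the minimum-change threshold. The name `reduce_on_plateau` and the loss values are hypothetical:

```python
def reduce_on_plateau(val_losses, lr, factor=0.2, patience=5, min_lr=0.001):
    """Return the learning rate in effect after each epoch.

    Simplified sketch: 'min' mode only, no cooldown, no improvement threshold.
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    lrs = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never drop below min_lr
                wait = 0
        lrs.append(lr)
    return lrs

# Made-up losses: one improvement, then three flat epochs with patience=3
lrs = reduce_on_plateau([1.0, 0.8, 0.8, 0.8, 0.8], lr=0.1, factor=0.2, patience=3)
print(lrs)
```

In this sketch the rate stays at 0.1 for the first four epochs and is multiplied by 0.2 on the fifth, once three consecutive epochs have failed to improve on the best loss.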
Custom learning rate (see https://blog.csdn.net/xiaojiajia007/article/details/77278315)
from keras import backend as K
from keras.callbacks import LearningRateScheduler

def scheduler(epoch):
    # Multiply the learning rate by 0.9 every 2 epochs, skipping epoch 0
    if epoch % 2 == 0 and epoch != 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * .9)
        print("lr changed to {}".format(lr * .9))
    return K.get_value(model.optimizer.lr)

lr_decay = LearningRateScheduler(scheduler)
# Keras 1 API (samples_per_epoch, nb_epoch, nb_val_samples); Keras 2 uses
# steps_per_epoch, epochs and validation_steps, which are counted in batches
model.fit_generator(train_gen, (nb_train_samples//batch_size)*batch_size,
                    nb_epoch=100, verbose=1,
                    validation_data=valid_gen, nb_val_samples=val_size,
                    callbacks=[lr_decay])
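Because the scheduler above multiplies the rate by 0.9 at every even epoch other than 0, the rate at a given epoch has a closed form. A small sketch, assuming 0-based epoch numbers and a starting rate of 0.01 chosen purely for illustration; `lr_at_epoch` is a hypothetical helper, not a Keras function:

```python
def lr_at_epoch(initial_lr, epoch, decay=0.9, every=2):
    """Learning rate during `epoch` (0-based) when the rate is multiplied by
    `decay` at the start of every `every`-th epoch except epoch 0."""
    # epoch // every counts how many decay steps have happened so far
    return initial_lr * decay ** (epoch // every)

for epoch in range(6):
    print(epoch, lr_at_epoch(0.01, epoch))
```

Epochs 0 and 1 use the initial rate, epochs 2 and 3 use 0.9 times it, epochs 4 and 5 use 0.81 times it, and so on.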
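4. Saving weights vs. saving the whole model
Saving the whole model (weights included) often produces a very large file, so we usually save only the weights.
(1) Save the model together with its weights
You can use model.save(filepath)
to store a Keras model and its weights in a single HDF5 file, which contains:
- the model architecture, so the model can be rebuilt
- the model weights
- the training configuration (loss function, optimizer, etc.)
- the optimizer state, so training can resume where it stopped
Use keras.models.load_model(filepath)
to re-instantiate the model. If the file includes the training configuration, this function also compiles the model.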
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
(2) Save the weights only
model.save_weights('my_model_weights.h5')
model.load_weights('my_model_weights.h5')