1 Preface
This article is a summary of the parameter tuning I do in day-to-day training, and it will be refined over time. In past training runs I kept forgetting parameters I had never written down, so this also serves as a personal reference. Corrections for errors or improper citations are welcome.
Last modified date: 2019-03-01
2 Optimizers
Caffe provides six optimization methods:
- Stochastic Gradient Descent (type: "SGD")
- AdaDelta (type: "AdaDelta")
- Adaptive Gradient (type: "AdaGrad")
- Adam (type: "Adam")
- Nesterov's Accelerated Gradient (type: "Nesterov")
- RMSprop (type: "RMSProp")
2.1 SGD
The commented-out lines can be omitted since they have default values, but when tuning you should still set them yourself.
type: "SGD" #default
#momentum: 0.9
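Caffe's momentum SGD keeps a velocity term V that accumulates decayed past gradients: V is set to momentum * V - lr * grad, then W is updated to W + V. A minimal sketch in Python (illustrative only, not Caffe code):

```python
# Sketch of momentum SGD: the velocity accumulates a decayed sum of
# past gradients, damping oscillation across iterations.
def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    v = momentum * v - lr * grad   # update the velocity
    return w + v, v                # apply it to the weight

w, v = sgd_momentum_step(1.0, 0.0, grad=0.5)  # w -> 0.995, v -> -0.005
```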
2.2 AdaDelta
type: "AdaDelta"
#momentum: 0.95
#delta: 1e-6
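AdaDelta keeps running averages of squared gradients and squared updates so the step size adapts per parameter; `momentum` above is the decay rho of those averages and `delta` is the numerical-stability constant. A rough sketch of one update (illustrative, not Caffe's implementation):

```python
# Sketch of the AdaDelta update (Zeiler, 2012): the ratio of the two
# running averages replaces a hand-tuned learning rate.
def adadelta_step(w, g, acc_g2, acc_dx2, rho=0.95, delta=1e-6):
    acc_g2 = rho * acc_g2 + (1 - rho) * g * g           # E[g^2]
    dx = -(((acc_dx2 + delta) / (acc_g2 + delta)) ** 0.5) * g
    acc_dx2 = rho * acc_dx2 + (1 - rho) * dx * dx       # E[dx^2]
    return w + dx, acc_g2, acc_dx2

w, acc_g2, acc_dx2 = adadelta_step(1.0, 0.5, 0.0, 0.0)
```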
2.3 AdaGrad
type: "AdaGrad"
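AdaGrad divides each step by the root of that weight's accumulated squared gradients, so rarely-updated weights keep a larger effective learning rate. A sketch (illustrative only):

```python
# Sketch of AdaGrad: the per-weight history of squared gradients
# shrinks the step for frequently-updated weights.
def adagrad_step(w, g, hist, lr=0.01, eps=1e-8):
    hist = hist + g * g
    return w - lr * g / (hist ** 0.5 + eps), hist

w, hist = adagrad_step(1.0, 0.5, 0.0)  # hist -> 0.25, w -> ~0.99
```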
2.4 Adam
type: "Adam"
#momentum: 0.9
#momentum2: 0.999
#delta: 1e-8
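`momentum` and `momentum2` above are Adam's beta1 and beta2, the decay rates of the first- and second-moment estimates, and `delta` is the epsilon in the denominator. A sketch of one bias-corrected Adam step (illustrative only):

```python
def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g          # first moment (mean of grads)
    v = b2 * v + (1 - b2) * g * g      # second moment (uncentered var)
    m_hat = m / (1 - b1 ** t)          # bias correction at step t
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (v_hat ** 0.5 + eps), m, v

w, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)  # w -> ~0.999
```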
2.5 Nesterov
type: "Nesterov"
#momentum: 0.95
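Nesterov momentum differs from plain momentum SGD in that the gradient is evaluated at the look-ahead point W + momentum * V rather than at W. A sketch (illustrative only):

```python
def nesterov_step(w, v, grad_fn, lr=0.01, momentum=0.95):
    g = grad_fn(w + momentum * v)   # gradient at the look-ahead point
    v = momentum * v - lr * g
    return w + v, v

# minimizing f(w) = w^2, whose gradient is 2w
w, v = nesterov_step(1.0, 0.0, lambda x: 2 * x)  # w -> 0.98
```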
2.6 RMSProp
type: "RMSProp"
#rms_decay: 0.98
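`rms_decay` controls the running average of squared gradients that normalizes each RMSProp step. A sketch of one update (illustrative only):

```python
# Sketch of RMSProp: each step is divided by the root of a decayed
# average of squared gradients.
def rmsprop_step(w, g, ms, lr=0.01, rms_decay=0.98, eps=1e-8):
    ms = rms_decay * ms + (1 - rms_decay) * g * g
    return w - lr * g / (ms ** 0.5 + eps), ms

w, ms = rmsprop_step(1.0, 0.5, 0.0)
```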
2.7 Practical notes
- Adam usually gives good results and converges much faster than SGD
- L-BFGS is suited to full-batch optimization
- Optimizers can sometimes be combined, e.g. warm up with SGD, then switch to Adam
- For unusual requirements, such as deepbit where the convergence of two losses must be controlled, the slower SGD is the better fit
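The SGD warm-up followed by Adam can be done in Caffe with two solver files and a snapshot hand-off; the file names and values below are hypothetical:

```
# solver_sgd.prototxt (hypothetical warm-up solver)
type: "SGD"
base_lr: 0.001
max_iter: 5000
snapshot: 5000
snapshot_prefix: "warmup"

# solver_adam.prototxt (hypothetical main solver,
# initialized from the warm-up weights)
type: "Adam"
base_lr: 0.0001
```

Training would then run in two stages, e.g. `caffe train -solver solver_sgd.prototxt` followed by `caffe train -solver solver_adam.prototxt -weights warmup_iter_5000.caffemodel`.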
3 Learning rate policies
// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
// - fixed: always return base_lr.
// - step: return base_lr * gamma ^ (floor(iter / step))
// - exp: return base_lr * gamma ^ iter
// - inv: return base_lr * (1 + gamma * iter) ^ (- power)
// - multistep: similar to step but it allows non uniform steps defined by
// stepvalue
// - poly: the effective learning rate follows a polynomial decay, to be
// zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
// - sigmoid: the effective learning rate follows a sigmod decay
// return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.
(The caffe.proto comment above is quoted via cuijyer's CSDN post: https://blog.csdn.net/cuijyer/article/details/78195178)
lr_policy can be set to the following values; the corresponding learning rate is computed as:
- fixed: keep base_lr constant.
- step: additionally requires a stepsize; returns base_lr * gamma ^ (floor(iter / stepsize)), where iter is the current iteration
- exp: returns base_lr * gamma ^ iter, where iter is the current iteration
- inv: additionally requires a power; returns base_lr * (1 + gamma * iter) ^ (-power)
- multistep: additionally requires stepvalue entries. Similar to step, but where step drops the rate at uniform intervals, multistep drops it at the given stepvalue iterations
- poly: polynomial decay of the learning rate; returns base_lr * (1 - iter/max_iter) ^ (power)
- sigmoid: sigmoid decay of the learning rate; returns base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
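These formulas can be collected into one helper to preview a schedule before launching a run; a sketch following the definitions above (not Caffe source code):

```python
from math import exp, floor

def caffe_lr(policy, base_lr, it, gamma=0.1, power=1.0,
             stepsize=1, max_iter=1, stepvalues=()):
    # Learning rate at iteration `it` for each lr_policy listed above.
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** floor(it / stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1 + gamma * it) ** (-power)
    if policy == "multistep":
        return base_lr * gamma ** sum(1 for sv in stepvalues if it >= sv)
    if policy == "poly":
        return base_lr * (1 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1 + exp(-gamma * (it - stepsize)))
    raise ValueError("unknown lr_policy: " + policy)

print(caffe_lr("step", 0.01, 30, gamma=0.1, stepsize=30))  # ~0.001
```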
3.1 fixed
base_lr: 0.01
lr_policy: "fixed"
max_iter: 400000
3.2 step
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 30
max_iter: 100
3.3 exp
base_lr: 0.01
lr_policy: "exp"
gamma: 0.1
max_iter: 100
3.4 inv
base_lr: 0.01
lr_policy: "inv"
gamma: 0.1
power: 0.75
max_iter: 10000
3.5 multistep
base_lr: 0.01
lr_policy: "multistep"
gamma: 0.5
stepvalue: 1000
stepvalue: 3000
stepvalue: 4000
stepvalue: 4500
stepvalue: 5000
max_iter: 6000
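With this configuration the rate starts at 0.01 and halves each time a stepvalue is reached, ending at 0.01 * 0.5^5 = 0.0003125 after iteration 5000. A quick check of the schedule (sketch, not Caffe code):

```python
def multistep_lr(it, base_lr=0.01, gamma=0.5,
                 stepvalues=(1000, 3000, 4000, 4500, 5000)):
    # The number of stepvalues already passed sets the exponent.
    return base_lr * gamma ** sum(1 for sv in stepvalues if it >= sv)

for it in (0, 1000, 3000, 5000):
    print(it, multistep_lr(it))
```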
3.6 poly
base_lr: 0.01
lr_policy: "poly"
power: 0.5
max_iter: 10000
3.7 sigmoid
base_lr: 0.01
lr_policy: "sigmoid"
gamma: -0.001
stepsize: 5000
max_iter: 10000
4 Solver file configuration summary
4.1 classify sample
For classification training, the solver file configures five main groups of parameters:
- net
- test
- lr_policy
- snapshot
- solver
A concrete example is shown below:
# ------- 1. config net ---------
net: "examples/mnist/lenet_train_test.prototxt"
# ------- 2. config test --------
test_iter: 100
test_interval: 500
# ------ 3. config lr_policy --------
base_lr: 1.0
lr_policy: "fixed"
weight_decay: 0.0005
display: 100
max_iter: 10000
# ------4. config snapshot --------
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet_adadelta"
# ----- 5. config solver type -------
solver_mode: GPU
type: "AdaDelta"
delta: 1e-6
momentum: 0.95
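One sanity check on the test settings above: test_iter multiplied by the test batch size should cover the whole test set. The batch size lives in the net prototxt, not the solver; the value of 100 below is assumed from the standard lenet_train_test.prototxt:

```python
# test_iter * test batch size should equal the test-set size.
# The batch size of 100 is an assumption taken from the MNIST
# LeNet example net, not from this solver file.
def covered_samples(test_iter, test_batch_size):
    return test_iter * test_batch_size

print(covered_samples(100, 100))  # 10000, the full MNIST test set
```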
4.2 detection sample
For detection training, the solver file configures six main groups of parameters:
- net
- test
- lr_policy
- snapshot
- solver
- other
A concrete example is shown below:
# -------1. config net ----------
train_net: "example/MobileNetSSD_train.prototxt"
test_net: "example/MobileNetSSD_test.prototxt"
# -------2. config test --------
test_iter: 673
test_interval: 10000
# -------3. config lr_policy ------
base_lr: 0.0005
display: 10
max_iter: 120000
lr_policy: "multistep"
gamma: 0.5
weight_decay: 0.00005
stepvalue: 20000
stepvalue: 40000
# -------4. config snapshot ------
snapshot: 1000
snapshot_prefix: "snapshot/mobilenet"
snapshot_after_train: true
# -------5. config solver -------
solver_mode: GPU
type: "RMSProp"
# -------6. other -------
eval_type: "detection"
ap_version: "11point"
test_initialization: false
debug_info: false
average_loss: 10
iter_size: 1
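Among the "other" settings, `iter_size` accumulates gradients over several forward/backward passes before each weight update, so the effective batch size is the train-prototxt batch_size times iter_size; `average_loss: 10` only smooths the displayed loss over the last 10 iterations. A sketch with a hypothetical batch_size:

```python
# Effective batch = batch_size (from the train prototxt; the value
# here is a hypothetical example) * iter_size (from the solver).
def effective_batch(batch_size, iter_size):
    return batch_size * iter_size

print(effective_batch(24, 1))  # 24
print(effective_batch(24, 4))  # 96, emulates a 4x larger batch
```

Raising iter_size is a common way to emulate a large batch when GPU memory is tight.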
5 Classification
To be completed.
6 Detection
To be completed.
6.1 YOLO
6.2 SSD
References
[1] Caffe learning series (10): command-line parsing [https://www.cnblogs.com/denny402/p/5076285.html]
[2] The complete Caffe deep-learning training process [https://www.infoq.cn/article/whole-process-of-caffe-depth-learning-training]
[3] Caffe learning series (8): solver optimization methods [https://www.cnblogs.com/denny402/p/5074212.html]
[4] Illustrated learning rate policies (lr_policy) in Caffe's solver [https://blog.csdn.net/cuijyer/article/details/78195178]
[5] Pretrained models (Caffe Model Zoo) [https://github.com/BVLC/caffe/wiki/Model-Zoo]
[6] Fine-tuning a Pretrained Network for Style Recognition [https://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/02-fine-tuning.ipynb]
[7] http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
[8] Three stages of fine-tuning a model in Caffe (function details, framework overview) + fine-tuning tips [https://blog.csdn.net/sinat_26917383/article/details/54999868]
[9] Common optimization algorithms (corresponding parameters in Caffe and TensorFlow) [https://blog.csdn.net/csyanbin/article/details/53460627]