If the images fail to load, download the .doc version of this article from Baidu Cloud:
Link: https://pan.baidu.com/s/1mhCxQsg  Password: igfo
In summary:
0. Data: http://machinelearningmastery.com/improve-deep-learning-performance/
(1. Feature selection is a discipline in its own right.)
① The learning rate matters a great deal; it can be set fairly small, but training will then take longer.
② Make the batch size as large as your machine allows (and set it to a power of 2).
③ Inspect the accuracy plots and extract information from them:
④ Sec. 8: Ensembles (how to combine multiple network architectures; see Zhou Zhihua's book Machine Learning for details).
⑤ General principles of fine-tuning:
⑥ https://research.fb.com/wp-content/uploads/2017/06/imagenet1kin1h5.pdf (distributed GPU training):
Linear Scaling Rule: when the minibatch size is multiplied by k, multiply the learning rate by k. (All other hyper-parameters (weight decay, momentum, etc.) are kept unchanged.)
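The rule from the paper above can be sketched in a few lines. This is an illustrative helper (the names `base_lr`, `base_batch` are mine, not from the paper or from Caffe):

```python
def scale_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: multiply the learning rate by k = new_batch / base_batch."""
    k = new_batch / base_batch
    return base_lr * k

# Example: growing the minibatch from 256 to 1024 means k = 4,
# so a reference lr of 0.1 becomes 0.4 (all other hyper-parameters unchanged).
print(scale_lr(0.1, 256, 1024))
```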
The details:
I. Caffe structure and parameter tuning tips
0. During data preparation:
① import numpy as np
from sklearn.utils import shuffle

X = X.astype(np.float32)
X, y = shuffle(X, y, random_state=42)  # shuffle train data
y = y.astype(np.float32)
② Normalization, etc.:
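A minimal normalization sketch, assuming 8-bit pixel data in [0, 255] (the concrete values here are illustrative):

```python
import numpy as np

# Scale pixel values into [0, 1], then standardize each column
# to zero mean and unit variance -- a common preprocessing step.
X = np.array([[0., 128.], [255., 64.]], dtype=np.float32)
X /= 255.0                                  # scale into [0, 1]
X = (X - X.mean(axis=0)) / X.std(axis=0)    # per-feature standardization
```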
1. http://blog.csdn.net/u011762313/article/details/47399981
In Solver initialization (Caffe provides three solver methods: Stochastic Gradient Descent (SGD), Adaptive Gradient (ADAGRAD), and Nesterov's Accelerated Gradient (NESTEROV)).
Good default parameters for SGD:
base_lr: 0.01        # initial learning rate: α = 0.01
lr_policy: "step"    # learning policy: multiply α by gamma every stepsize iterations
gamma: 0.1           # learning rate decay factor
stepsize: 100000     # drop the learning rate every 100K iterations
max_iter: 350000     # maximum number of training iterations
momentum: 0.9        # momentum: μ = 0.9
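The "step" policy above can be reproduced with one line of arithmetic. A sketch mirroring the solver settings (the function name is mine):

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    # Caffe lr_policy "step": lr = base_lr * gamma ^ floor(iter / stepsize)
    return base_lr * gamma ** (iteration // stepsize)

# With the values above: 0.01 for the first 100K iterations,
# then 0.001, then 0.0001, and so on.
lr_at_150k = step_lr(0.01, 0.1, 100000, 150000)
```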
The other two methods also work well.
2. https://corpocrat.com/2015/02/24/facial-keypoints-extraction-using-deep-learning-with-caffe/
① We specify ReLU (to allow values > 0, plus faster convergence) and a Dropout layer to prevent overfitting.
(http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html:)
In conclusion, all three ReLU variants consistently outperform the original ReLU on these three data sets, and PReLU and RReLU seem the better choices. Moreover, He et al. also reported similar conclusions in [4].
② Add to the fully connected layers (the xavier filler by default initializes the Blob coefficients x from a uniform distribution x ∼ U(−a, +a)):
weight_filler {
  type: "xavier"
}
bias_filler {
  type: "constant"
  value: 0.1
}
layer {
  name: "relu22"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
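The xavier filler above can be sketched in NumPy. Caffe's default "xavier" filler draws from U(−a, +a) with a = sqrt(3 / fan_in); the function name and shapes here are illustrative:

```python
import numpy as np

def xavier_fill(fan_in, fan_out, seed=0):
    # Caffe "xavier" filler (default FAN_IN variance norm):
    # weights drawn uniformly from [-a, +a] with a = sqrt(3 / fan_in).
    rng = np.random.default_rng(seed)
    a = np.sqrt(3.0 / fan_in)
    return rng.uniform(-a, a, size=(fan_out, fan_in))

W = xavier_fill(fan_in=256, fan_out=128)
```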
③ How to initialize each layer's specific parameters: http://blog.csdn.net/wenlin33/article/details/53378613
3. Examining plots and other results:
① During training it is best to visualize some feature maps.
② Save the training/testing logs and plot them; judge the effect of each parameter setting from the accuracy shown in the charts!
① (An overfitting net can generally be made to perform better by using more training data.)
Ways to get more data: image rotation, flipping, and similar augmentation tricks.
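The rotation/flip tricks above can be sketched with plain NumPy (a toy 3×4 array stands in for an image; real pipelines would also crop, jitter color, etc.):

```python
import numpy as np

img = np.arange(12).reshape(3, 4)   # stand-in for an image
h_flip = np.fliplr(img)             # horizontal (left-right) flip
v_flip = np.flipud(img)             # vertical (up-down) flip
rot90 = np.rot90(img)               # rotate 90 degrees counter-clockwise
```

Each transform yields a new labeled sample at essentially zero cost, which is why augmentation is the standard answer to overfitting from too little data.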
② Speed up network training: Remember that in our previous model, we initialized learning rate and momentum with a static 0.01 and 0.9 respectively. Let's change that such that the learning rate decreases linearly with the number of epochs, while we let the momentum increase.
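A sketch of that schedule: learning rate falls linearly over the epochs while momentum rises. The endpoint values (0.03 → 0.0001 and 0.9 → 0.999) are illustrative, taken from common practice rather than from the source:

```python
import numpy as np

epochs = 30
lrs = np.linspace(0.03, 0.0001, epochs)       # lr decreases linearly per epoch
momentums = np.linspace(0.9, 0.999, epochs)   # momentum increases linearly per epoch
```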
③ Load the weights of a pre-trained model to speed up the current training.
④ Put the BatchNorm layer immediately after fully connected layers (or convolutional layers), and before the activation.
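In Caffe prototxt that ordering looks roughly like this (a sketch; the layer names are illustrative, and in Caffe a BatchNorm layer is typically followed by a Scale layer to learn the affine parameters):

```
layer { name: "fc6"    type: "InnerProduct" bottom: "pool5" top: "fc6" }
layer { name: "bn6"    type: "BatchNorm"    bottom: "fc6"   top: "fc6" }
layer { name: "scale6" type: "Scale"        bottom: "fc6"   top: "fc6" }
layer { name: "relu6"  type: "ReLU"         bottom: "fc6"   top: "fc6" }
```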
II. Fine-tuning some deep networks
①http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
②https://zhuanlan.zhihu.com/p/22624331
This approach is especially suitable when we have relatively little data: it exploits the strong generalization ability of deep neural networks while sparing us from designing complex models and from long training runs. The most powerful model at present is ResNet, and many vision tasks can reach very good performance by fine-tuning ResNet!
③ General principles of fine-tuning:
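Following the Caffe flickr_style example linked above, the typical fine-tuning invocation passes the pre-trained weights to the trainer via -weights (paths shown are illustrative and depend on your checkout):

```
./build/tools/caffe train \
    -solver models/finetune_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
```

Layers whose names match between the new prototxt and the .caffemodel are initialized from the pre-trained weights; renamed layers (usually the final classifier) are trained from scratch.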
Excellent blogs:
① http://machinelearningmastery.com/improve-deep-learning-performance/
② http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html
③ https://github.com/hwdong/deep-learning/blob/master/deep%20learning%20papers.md