This post has two parts: first an introductory tutorial on adding a layer to Caffe, then my own method for adding maxout and NIN layers.

Part 1

There is actually already an answer on GitHub (https://github.com/BVLC/caffe/issues/684):
Here's roughly the process I follow.

- Add a class declaration for your layer to the appropriate one of common_layers.hpp, data_layers.hpp, loss_layers.hpp, neuron_layers.hpp, or vision_layers.hpp. Include an inline implementation of type and the *Blobs() methods to specify blob number requirements. Omit the *_gpu declarations if you'll only be implementing CPU code.
- Implement your layer in layers/your_layer.cpp: SetUp for initialization (reading parameters, allocating buffers, etc.), Forward_cpu for the function your layer computes, and Backward_cpu for its gradient.
- (Optional) Implement the GPU versions Forward_gpu and Backward_gpu in layers/your_layer.cu.
- Add your layer to proto/caffe.proto, updating the next available ID. Also declare parameters, if needed, in this file.
- Make your layer creatable by adding it to layer_factory.cpp.
- Write tests in test/test_your_layer.cpp. Use test/test_gradient_check_util.hpp to check that your Forward and Backward implementations are in numerical agreement.
1. First decide which category the new layer belongs to: common_layer, data_layer, loss_layer, neuron_layer, or vision_layer. The Wtf_Layer here is clearly a vision_layer, so open vision_layers.hpp, copy the declaration of convolution_layer, and rename the class and constructor to WtfLayer. If GPU computation is not used, delete all the GPU-related functions.
2. Add Wtf_layer.cpp to the src/caffe/layers folder, copy the contents of convolution_layer.cpp into it, and rename the class accordingly (search for the keyword "conv" and change it to "Wtf").
3. If there is GPU code, add a corresponding Wtf_layer.cu (not added here).
4. Edit the proto/caffe.proto file: find LayerType, add WTF, and update the next available ID (the new ID would be 34). If Wtf_Layer has parameters, as Convolution certainly does, also add a WtfParameter message.
5. Add the corresponding code in layer_factory.cpp, i.e. in that block of if ... else statements.
6. This step is optional, but worth doing for confidence in the results: write a test file that checks the values computed by the forward and backward passes. The principle behind the gradient check is explained in the corresponding chapter of the UFLDL tutorial.
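The check in step 6 compares the analytic gradient against a centered finite difference. A minimal standalone sketch of the idea (plain C++, independent of Caffe's test_gradient_check_util.hpp; the function names here are mine, chosen for illustration):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// f(x) = max_i x_i, the core operation of a maxout group.
double f(const std::vector<double>& x) {
  double m = x[0];
  for (double v : x) m = (v > m) ? v : m;
  return m;
}

// Analytic gradient of f: 1 at the argmax, 0 elsewhere.
std::vector<double> analytic_grad(const std::vector<double>& x) {
  std::vector<double> g(x.size(), 0.0);
  std::size_t arg = 0;
  for (std::size_t i = 1; i < x.size(); ++i)
    if (x[i] > x[arg]) arg = i;
  g[arg] = 1.0;
  return g;
}

// Compare the analytic gradient against a centered finite difference,
// which is what a layer gradient check does for every input element.
bool gradient_check(std::vector<double> x, double eps) {
  std::vector<double> g = analytic_grad(x);
  for (std::size_t i = 0; i < x.size(); ++i) {
    double orig = x[i];
    x[i] = orig + eps;
    double fp = f(x);
    x[i] = orig - eps;
    double fm = f(x);
    x[i] = orig;                             // restore the perturbed element
    double numeric = (fp - fm) / (2.0 * eps);
    if (std::fabs(numeric - g[i]) > 1e-6) return false;
  }
  return true;
}
```

If the analytic and numeric gradients disagree beyond tolerance, the Backward implementation (or the Forward one) has a bug.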
I will post the demo of my own maxout_layer later; planting a flag here to push myself to finish it ╮(╯▽╰)╭
Part 2: how to add a maxout_layer

I was thoroughly confused by Bengio's maxout at first: the paper lays out a formula and discusses it at length, then uses a different scheme for the convolutional case. After some thought, though, I concluded the problem is just that Bengio didn't state it clearly.

My maxout algorithm goes like this. First fix a group_size variable: the maximum is selected from a set of group_size values; in short, given group_size numbers, take the largest. With group_size fixed, multiply the preceding convolution layer's output_num by group_size, so the number of output feature maps becomes group_size times larger. Then partition those feature maps into groups of group_size and, within each group, pick the largest response at every position to form a new feature map. Those maps are the maxout layer's output.

If that is still unclear, take the figure above: it shows nine images, i.e. the convolution layer outputs nine feature maps, and we group them three at a time, so the maxout layer outputs 9/3 = 3 feature maps. For each group of feature maps, say the three green ones of size w*h each, declare a new output feature map of size w*h; at every position of the output map, assign the maximum of the three corresponding values in the green maps, i.e. the largest of three numbers. That produces one output feature map, and the remaining groups are handled the same way.
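The grouping described above can also be written down as a small standalone sketch (plain C++ over raw vectors, independent of Caffe; maxout_forward and its memory layout are illustrative, not Caffe's API):

```cpp
#include <algorithm>
#include <vector>

// Maxout across channels: `input` holds `channels` feature maps of size
// height*width stored contiguously; every `group_size` consecutive maps
// are reduced to one output map by an element-wise maximum.
std::vector<float> maxout_forward(const std::vector<float>& input,
                                  int channels, int height, int width,
                                  int group_size) {
  const int map_size = height * width;
  const int num_output = channels / group_size;  // channels must divide evenly
  std::vector<float> output(num_output * map_size);
  for (int o = 0; o < num_output; ++o) {
    for (int p = 0; p < map_size; ++p) {
      // Initialize with the first map of the group, then fold in the rest.
      float best = input[o * group_size * map_size + p];
      for (int g = 1; g < group_size; ++g)
        best = std::max(best, input[(o * group_size + g) * map_size + p]);
      output[o * map_size + p] = best;
    }
  }
  return output;
}
```

With nine input maps and group_size = 3, this yields exactly the three output maps of the example above.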
That should make the principle and the algorithm of maxout clear, so here is the code.

Create a maxout_layer.cpp and put it in the src/caffe/layers folder:
```cpp
#include <cstdio>
#include <vector>

#include "caffe/filler.hpp"
#include "caffe/layer.hpp"
#include "caffe/util/im2col.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

namespace caffe {

template <typename Dtype>
void MaxoutLayer<Dtype>::SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  Layer<Dtype>::SetUp(bottom, top);
  printf("MaxoutLayer: entered SetUp\n");  // debugging output
  num_output_ = this->layer_param_.maxout_param().num_output();
  CHECK_GT(num_output_, 0) << "output number cannot be zero.";
  // bottom is the feature map
  num_ = bottom[0]->num();
  channels_ = bottom[0]->channels();
  height_ = bottom[0]->height();
  width_ = bottom[0]->width();
  // This loop probably never executes (there is a single bottom).
  // TODO: generalize to handle inputs of different shapes.
  for (int bottom_id = 1; bottom_id < bottom.size(); ++bottom_id) {
    CHECK_EQ(num_, bottom[bottom_id]->num()) << "Inputs must have same num.";
    CHECK_EQ(channels_, bottom[bottom_id]->channels())
        << "Inputs must have same channels.";
    CHECK_EQ(height_, bottom[bottom_id]->height())
        << "Inputs must have same height.";
    CHECK_EQ(width_, bottom[bottom_id]->width())
        << "Inputs must have same width.";
  }
  // Set the parameters.
  CHECK_EQ(channels_ % num_output_, 0)
      << "Number of channel should be multiples of output number.";
  group_size_ = channels_ / num_output_;
  // Figure out the output dimensions. Bengio's paper is vague about the
  // size of K_: for images it only gives an example and never says whether
  // channels are really compared directly. Comparing channels does not fit
  // the theoretical formula perfectly, but it explains the image case, and
  // image vs. non-image maxout differ too much otherwise; for compatibility
  // the previous layer's output_num has to be matched to maxout.
  (*top)[0]->Reshape(num_, num_output_, height_, width_);  // only the channel count changes
  max_idx_.Reshape(num_, num_output_, height_, width_);
}

template <typename Dtype>
Dtype MaxoutLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  int featureSize = height_ * width_;
  Dtype* mask = max_idx_.mutable_cpu_data();
  const int top_count = (*top)[0]->count();
  caffe_set(top_count, Dtype(0), mask);
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = (*top)[i]->mutable_cpu_data();
    for (int n = 0; n < num_; ++n) {
      // process the n-th image
      for (int o = 0; o < num_output_; ++o) {
        for (int g = 0; g < group_size_; ++g) {
          if (g == 0) {
            // First map of the group: copy it as the initial maximum.
            for (int h = 0; h < height_; ++h) {
              for (int w = 0; w < width_; ++w) {
                int index = w + h * width_;
                top_data[index] = bottom_data[index];
                mask[index] = index;
              }
            }
          } else {
            // Remaining maps: keep the element-wise maximum and its index.
            for (int h = 0; h < height_; ++h) {
              for (int w = 0; w < width_; ++w) {
                int index0 = w + h * width_;
                int index1 = index0 + g * featureSize;
                if (top_data[index0] < bottom_data[index1]) {
                  top_data[index0] = bottom_data[index1];
                  mask[index0] = index1;
                }
              }
            }
          }
        }
        bottom_data += featureSize * group_size_;
        top_data += featureSize;
        mask += featureSize;
      }
    }
  }
  return Dtype(0.);
}

template <typename Dtype>
void MaxoutLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  if (!propagate_down[0]) {
    return;
  }
  const Dtype* mask = max_idx_.cpu_data();
  int featureSize = height_ * width_;
  for (int i = 0; i < top.size(); ++i) {
    const Dtype* top_diff = top[i]->cpu_diff();
    Dtype* bottom_diff = (*bottom)[i]->mutable_cpu_diff();
    caffe_set((*bottom)[i]->count(), Dtype(0), bottom_diff);
    for (int n = 0; n < num_; ++n) {
      // process the n-th image
      for (int o = 0; o < num_output_; ++o) {
        // Route each top gradient back to the bottom element that won the max.
        for (int h = 0; h < height_; ++h) {
          for (int w = 0; w < width_; ++w) {
            int index = w + h * width_;
            int bottom_index = mask[index];
            bottom_diff[bottom_index] += top_diff[index];
          }
        }
        bottom_diff += featureSize * group_size_;
        top_diff += featureSize;
        mask += featureSize;
      }
    }
  }
}

//#ifdef CPU_ONLY
//STUB_GPU(MaxoutLayer);
//#endif

INSTANTIATE_CLASS(MaxoutLayer);

}  // namespace caffe
```
The comments in this file were originally written in Chinese and turned into mojibake under Linux; harmless either way. The printf is a leftover debug statement (yes, I know, printf of all things = =).
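To see the backward logic in isolation: each top gradient flows only to the single bottom element that produced the maximum, whose index was recorded in the mask during the forward pass. A standalone sketch of that routing (plain C++, illustrative names, not Caffe's API):

```cpp
#include <cstddef>
#include <vector>

// Backward for maxout: each output gradient is scattered to the input
// element that won the element-wise max; `mask[i]` holds that element's
// index in the bottom blob, recorded during the forward pass.
std::vector<float> maxout_backward(const std::vector<float>& top_diff,
                                   const std::vector<int>& mask,
                                   int bottom_count) {
  std::vector<float> bottom_diff(bottom_count, 0.0f);  // non-winners get zero
  for (std::size_t i = 0; i < top_diff.size(); ++i)
    bottom_diff[mask[i]] += top_diff[i];
  return bottom_diff;
}
```

This is why maxout needs the max_idx_ blob: without the recorded argmax, the backward pass would have to recompute the winners from the bottom data.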
Add the following declaration to vision_layers.hpp:

```cpp
/* MaxoutLayer */
template <typename Dtype>
class MaxoutLayer : public Layer<Dtype> {
 public:
  explicit MaxoutLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  // SetUp takes bottom and top because it must initialize their shapes.
  virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual inline LayerParameter_LayerType type() const {
    return LayerParameter_LayerType_MAXOUT;
  }

 protected:
  virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  //virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
  //    vector<Blob<Dtype>*>* top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
  //virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
  //    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

  int num_output_;
  int num_;
  int channels_;
  int height_;
  int width_;
  int group_size_;
  Blob<Dtype> max_idx_;  // records the argmax indices for the backward pass
};
```
What remains is the change to layer_factory.cpp, which I won't detail, and then the change to the proto file:
```protobuf
message MaxoutParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
}
```
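With this message added (and MAXOUT added to LayerType), a maxout layer could then be declared in a network definition roughly as follows; the exact field names depend on your proto edits, and num_output: 64 is just a placeholder:

```
layers {
  name: "maxout1"
  type: MAXOUT
  bottom: "conv1"
  top: "maxout1"
  maxout_param {
    num_output: 64
  }
}
```

Here conv1 must produce group_size * 64 feature maps for the shape check in SetUp to pass.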
Of course there are other proto file changes as well, which I also won't detail. I didn't write a test file; I ran my own demo and it looked fine, so the code can be considered correct.

One caveat: the current code cannot be placed after a fully connected layer, because a few lines in it are written incorrectly. I'll fix that later; it's a minor issue.

Next comes the NIN implementation. My code is honestly rough and its efficiency looks low. Also, everything here runs on the CPU; I'm not comfortable with GPU code yet and haven't planned to write it.

Implementation of NIN_layer

I had long assumed the network-in-network implementation on GitHub was wrong; as it turned out, mine ended up looking exactly like it = = So just search for caffe + network in network yourselves... Since downloading it may require getting around the firewall, I'll paste the network definition directly (the network structure for the cifar10 dataset):
```
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "cifar-train-leveldb"
    batch_size: 128
  }
  include: { phase: TRAIN }
}
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "cifar-test-leveldb"
    batch_size: 100
  }
  include: { phase: TEST }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 2
    kernel_size: 5
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "cccp1"
  type: CONVOLUTION
  bottom: "conv1"
  top: "cccp1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 160
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp1"
  type: RELU
  bottom: "cccp1"
  top: "cccp1"
}
layers {
  name: "cccp2"
  type: CONVOLUTION
  bottom: "cccp1"
  top: "cccp2"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp2"
  type: RELU
  bottom: "cccp2"
  top: "cccp2"
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "cccp2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "drop3"
  type: DROPOUT
  bottom: "pool1"
  top: "pool1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 2
    kernel_size: 5
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "cccp3"
  type: CONVOLUTION
  bottom: "conv2"
  top: "cccp3"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp3"
  type: RELU
  bottom: "cccp3"
  top: "cccp3"
}
layers {
  name: "cccp4"
  type: CONVOLUTION
  bottom: "cccp3"
  top: "cccp4"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp4"
  type: RELU
  bottom: "cccp4"
  top: "cccp4"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "cccp4"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "drop6"
  type: DROPOUT
  bottom: "pool2"
  top: "pool2"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "pool2"
  top: "conv3"
  blobs_lr: 1.
  blobs_lr: 2.
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "cccp5"
  type: CONVOLUTION
  bottom: "conv3"
  top: "cccp5"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp5"
  type: RELU
  bottom: "cccp5"
  top: "cccp5"
}
layers {
  name: "cccp6"
  type: CONVOLUTION
  bottom: "cccp5"
  top: "cccp6"
  blobs_lr: 0.1
  blobs_lr: 0.1
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 10
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp6"
  type: RELU
  bottom: "cccp6"
  top: "cccp6"
}
layers {
  name: "pool3"
  type: POOLING
  bottom: "cccp6"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 8
    stride: 1
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "pool3"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "pool3"
  bottom: "label"
  top: "loss"
}
```
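A note on the cccp layers above: a convolution with kernel_size: 1 is simply a fully connected transform across channels applied independently at every pixel, which is the core idea of NIN. A standalone sketch of what such a layer computes (plain C++; the function and its layout are illustrative, not Caffe's implementation):

```cpp
#include <vector>

// 1x1 convolution: each output channel is a weighted sum of the input
// channels at the same spatial position (plus a bias), i.e. a fully
// connected layer applied per pixel across the channel dimension.
std::vector<float> conv1x1(const std::vector<float>& input,   // [in_ch][h*w]
                           const std::vector<float>& weight,  // [out_ch][in_ch]
                           const std::vector<float>& bias,    // [out_ch]
                           int in_ch, int out_ch, int map_size) {
  std::vector<float> output(out_ch * map_size, 0.0f);
  for (int o = 0; o < out_ch; ++o) {
    for (int p = 0; p < map_size; ++p) {
      float sum = bias[o];
      for (int c = 0; c < in_ch; ++c)
        sum += weight[o * in_ch + c] * input[c * map_size + p];
      output[o * map_size + p] = sum;
    }
  }
  return output;
}
```

This is why the network above needs no custom layer for NIN: CONVOLUTION with kernel_size: 1 already does the "mlpconv" channel mixing.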
Solver (training) parameters:
```
test_iter: 100
test_interval: 500
base_lr: 0.1
momentum: 0.9
weight_decay: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 100
max_iter: 120000
snapshot: 10000
snapshot_prefix: "cifar10_nin"
```
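With lr_policy: "step", Caffe decays the learning rate as base_lr * gamma^floor(iter / stepsize); a tiny sketch of the schedule:

```cpp
#include <cmath>

// Caffe's "step" learning-rate policy:
//   lr = base_lr * gamma^floor(iter / stepsize)
double step_lr(double base_lr, double gamma, int stepsize, int iter) {
  return base_lr * std::pow(gamma, iter / stepsize);  // int division = floor
}
```

With the values above, the rate stays at 0.1 for the first 100000 iterations and drops to 0.01 afterwards.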
That wraps up this post; the remaining task is figuring out how to train it to state-of-the-art accuracy.

Reposted from http://blog.csdn.net/kuaitoukid/article/details/41865803.