This post has two parts: first an introductory tutorial on adding a layer to Caffe, then my own method for adding maxout and NIN layers.

Part 1

There is actually already an answer on GitHub (https://github.com/BVLC/caffe/issues/684):
Here's roughly the process I follow.

- Add a class declaration for your layer to the appropriate one of common_layers.hpp, data_layers.hpp, loss_layers.hpp, neuron_layers.hpp, or vision_layers.hpp. Include an inline implementation of type and the *Blobs() methods to specify blob number requirements. Omit the *_gpu declarations if you'll only be implementing CPU code.
- Implement your layer in layers/your_layer.cpp: SetUp for initialization (reading parameters, allocating buffers, etc.), Forward_cpu for the function your layer computes, and Backward_cpu for its gradient.
- (Optional) Implement the GPU versions Forward_gpu and Backward_gpu in layers/your_layer.cu.
- Add your layer to proto/caffe.proto, updating the next available ID. Also declare parameters, if needed, in this file.
- Make your layer creatable by adding it to layer_factory.cpp.
- Write tests in test/test_your_layer.cpp. Use test/test_gradient_check_util.hpp to check that your Forward and Backward implementations are in numerical agreement.
1. First decide which category the new layer belongs to: common_layer, data_layer, loss_layer, neuron_layer, or vision_layer. The Wtf_Layer here is clearly a vision_layer, so open vision_layers.hpp, copy the declaration of convolution_layer, and rename the class and constructor to WtfLayer. If GPU computation is not used, delete all the GPU-related functions.
2. Add Wtf_layer.cpp to the src/caffe/layers folder, copy the contents of convolution_layer.cpp into it, and rename the class accordingly (search for the keyword "conv" and change it to "Wtf").
3. If there is GPU code, add a corresponding Wtf_layer.cu (not added here).
4. Edit the proto/caffe.proto file: find LayerType, add WTF, and update the next available ID (the new ID would be 34). If Wtf_Layer has parameters, as Convolution certainly does, also add a WtfParameter message.
5. Add the corresponding code in layer_factory.cpp, i.e. in that block of if ... else statements.
6. This step is optional, but worth doing for confidence in the results: write a test file that checks the values computed by the forward and backward passes. The principle behind the gradient check is explained in the corresponding chapter of the UFLDL tutorial.
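The check in step 6 compares the analytic gradient against a centered finite difference. A minimal standalone sketch of the idea (plain C++, independent of Caffe's test_gradient_check_util.hpp; the function names here are mine, chosen for illustration):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// f(x) = max_i x_i, the core operation of a maxout group.
double f(const std::vector<double>& x) {
  double m = x[0];
  for (double v : x) m = (v > m) ? v : m;
  return m;
}

// Analytic gradient of f: 1 at the argmax, 0 elsewhere.
std::vector<double> analytic_grad(const std::vector<double>& x) {
  std::vector<double> g(x.size(), 0.0);
  std::size_t arg = 0;
  for (std::size_t i = 1; i < x.size(); ++i)
    if (x[i] > x[arg]) arg = i;
  g[arg] = 1.0;
  return g;
}

// Compare the analytic gradient against a centered finite difference,
// which is what a layer gradient check does for every input element.
bool gradient_check(std::vector<double> x, double eps) {
  std::vector<double> g = analytic_grad(x);
  for (std::size_t i = 0; i < x.size(); ++i) {
    double orig = x[i];
    x[i] = orig + eps;
    double fp = f(x);
    x[i] = orig - eps;
    double fm = f(x);
    x[i] = orig;                             // restore the perturbed element
    double numeric = (fp - fm) / (2.0 * eps);
    if (std::fabs(numeric - g[i]) > 1e-6) return false;
  }
  return true;
}
```

If the analytic and numeric gradients disagree beyond tolerance, the Backward implementation (or the Forward one) has a bug.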
I will post the demo of my own maxout_layer later; planting a flag here to push myself to finish it ╮(╯▽╰)╭
Part 2: how to add a maxout_layer

I was thoroughly confused by Bengio's maxout at first: the paper lays out a formula and discusses it at length, then uses a different scheme for the convolutional case. After some thought, though, I concluded the problem is just that Bengio didn't state it clearly.

My maxout algorithm goes like this. First fix a group_size variable: the maximum is selected from a set of group_size values; in short, given group_size numbers, take the largest. With group_size fixed, multiply the preceding convolution layer's output_num by group_size, so the number of output feature maps becomes group_size times larger. Then partition those feature maps into groups of group_size and, within each group, pick the largest response at every position to form a new feature map. Those maps are the maxout layer's output.

If that is still unclear, take the figure above: it shows nine images, i.e. the convolution layer outputs nine feature maps, and we group them three at a time, so the maxout layer outputs 9/3 = 3 feature maps. For each group of feature maps, say the three green ones of size w*h each, declare a new output feature map of size w*h; at every position of the output map, assign the maximum of the three corresponding values in the green maps, i.e. the largest of three numbers. That produces one output feature map, and the remaining groups are handled the same way.
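The grouping described above can also be written down as a small standalone sketch (plain C++ over raw vectors, independent of Caffe; maxout_forward and its memory layout are illustrative, not Caffe's API):

```cpp
#include <algorithm>
#include <vector>

// Maxout across channels: `input` holds `channels` feature maps of size
// height*width stored contiguously; every `group_size` consecutive maps
// are reduced to one output map by an element-wise maximum.
std::vector<float> maxout_forward(const std::vector<float>& input,
                                  int channels, int height, int width,
                                  int group_size) {
  const int map_size = height * width;
  const int num_output = channels / group_size;  // channels must divide evenly
  std::vector<float> output(num_output * map_size);
  for (int o = 0; o < num_output; ++o) {
    for (int p = 0; p < map_size; ++p) {
      // Initialize with the first map of the group, then fold in the rest.
      float best = input[o * group_size * map_size + p];
      for (int g = 1; g < group_size; ++g)
        best = std::max(best, input[(o * group_size + g) * map_size + p]);
      output[o * map_size + p] = best;
    }
  }
  return output;
}
```

With nine input maps and group_size = 3, this yields exactly the three output maps of the example above.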
That should make the principle and the algorithm of maxout clear, so here is the code.

Create a maxout_layer.cpp and put it in the src/caffe/layers folder:
```cpp
#include <cstdio>
#include <vector>

#include "caffe/filler.hpp"
#include "caffe/layer.hpp"
#include "caffe/util/im2col.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

namespace caffe {

template <typename Dtype>
void MaxoutLayer<Dtype>::SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  Layer<Dtype>::SetUp(bottom, top);
  printf("MaxoutLayer: entered SetUp\n");  // debugging output
  num_output_ = this->layer_param_.maxout_param().num_output();
  CHECK_GT(num_output_, 0) << "output number cannot be zero.";
  // bottom is the feature map
  num_ = bottom[0]->num();
  channels_ = bottom[0]->channels();
  height_ = bottom[0]->height();
  width_ = bottom[0]->width();
  // This loop probably never executes (there is a single bottom).
  // TODO: generalize to handle inputs of different shapes.
  for (int bottom_id = 1; bottom_id < bottom.size(); ++bottom_id) {
    CHECK_EQ(num_, bottom[bottom_id]->num()) << "Inputs must have same num.";
    CHECK_EQ(channels_, bottom[bottom_id]->channels())
        << "Inputs must have same channels.";
    CHECK_EQ(height_, bottom[bottom_id]->height())
        << "Inputs must have same height.";
    CHECK_EQ(width_, bottom[bottom_id]->width())
        << "Inputs must have same width.";
  }
  // Set the parameters.
  CHECK_EQ(channels_ % num_output_, 0)
      << "Number of channel should be multiples of output number.";
  group_size_ = channels_ / num_output_;
  // Figure out the output dimensions. Bengio's paper is vague about the
  // size of K_: for images it only gives an example and never says whether
  // channels are really compared directly. Comparing channels does not fit
  // the theoretical formula perfectly, but it explains the image case, and
  // image vs. non-image maxout differ too much otherwise; for compatibility
  // the previous layer's output_num has to be matched to maxout.
  (*top)[0]->Reshape(num_, num_output_, height_, width_);  // only the channel count changes
  max_idx_.Reshape(num_, num_output_, height_, width_);
}

template <typename Dtype>
Dtype MaxoutLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  int featureSize = height_ * width_;
  Dtype* mask = max_idx_.mutable_cpu_data();
  const int top_count = (*top)[0]->count();
  caffe_set(top_count, Dtype(0), mask);
  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = (*top)[i]->mutable_cpu_data();
    for (int n = 0; n < num_; ++n) {
      // process the n-th image
      for (int o = 0; o < num_output_; ++o) {
        for (int g = 0; g < group_size_; ++g) {
          if (g == 0) {
            // First map of the group: copy it as the initial maximum.
            for (int h = 0; h < height_; ++h) {
              for (int w = 0; w < width_; ++w) {
                int index = w + h * width_;
                top_data[index] = bottom_data[index];
                mask[index] = index;
              }
            }
          } else {
            // Remaining maps: keep the element-wise maximum and its index.
            for (int h = 0; h < height_; ++h) {
              for (int w = 0; w < width_; ++w) {
                int index0 = w + h * width_;
                int index1 = index0 + g * featureSize;
                if (top_data[index0] < bottom_data[index1]) {
                  top_data[index0] = bottom_data[index1];
                  mask[index0] = index1;
                }
              }
            }
          }
        }
        bottom_data += featureSize * group_size_;
        top_data += featureSize;
        mask += featureSize;
      }
    }
  }
  return Dtype(0.);
}

template <typename Dtype>
void MaxoutLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  if (!propagate_down[0]) {
    return;
  }
  const Dtype* mask = max_idx_.cpu_data();
  int featureSize = height_ * width_;
  for (int i = 0; i < top.size(); ++i) {
    const Dtype* top_diff = top[i]->cpu_diff();
    Dtype* bottom_diff = (*bottom)[i]->mutable_cpu_diff();
    caffe_set((*bottom)[i]->count(), Dtype(0), bottom_diff);
    for (int n = 0; n < num_; ++n) {
      // process the n-th image
      for (int o = 0; o < num_output_; ++o) {
        // Route each top gradient back to the bottom element that won the max.
        for (int h = 0; h < height_; ++h) {
          for (int w = 0; w < width_; ++w) {
            int index = w + h * width_;
            int bottom_index = mask[index];
            bottom_diff[bottom_index] += top_diff[index];
          }
        }
        bottom_diff += featureSize * group_size_;
        top_diff += featureSize;
        mask += featureSize;
      }
    }
  }
}

//#ifdef CPU_ONLY
//STUB_GPU(MaxoutLayer);
//#endif

INSTANTIATE_CLASS(MaxoutLayer);

}  // namespace caffe
```
The comments in this file were originally written in Chinese and turned into mojibake under Linux; harmless either way. The printf is a leftover debug statement (yes, I know, printf of all things = =).
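To see the backward logic in isolation: each top gradient flows only to the single bottom element that produced the maximum, whose index was recorded in the mask during the forward pass. A standalone sketch of that routing (plain C++, illustrative names, not Caffe's API):

```cpp
#include <cstddef>
#include <vector>

// Backward for maxout: each output gradient is scattered to the input
// element that won the element-wise max; `mask[i]` holds that element's
// index in the bottom blob, recorded during the forward pass.
std::vector<float> maxout_backward(const std::vector<float>& top_diff,
                                   const std::vector<int>& mask,
                                   int bottom_count) {
  std::vector<float> bottom_diff(bottom_count, 0.0f);  // non-winners get zero
  for (std::size_t i = 0; i < top_diff.size(); ++i)
    bottom_diff[mask[i]] += top_diff[i];
  return bottom_diff;
}
```

This is why maxout needs the max_idx_ blob: without the recorded argmax, the backward pass would have to recompute the winners from the bottom data.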
Add the following declaration to vision_layers.hpp:

```cpp
/* MaxoutLayer */
template <typename Dtype>
class MaxoutLayer : public Layer<Dtype> {
 public:
  explicit MaxoutLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  // SetUp takes bottom and top because it must initialize their shapes.
  virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual inline LayerParameter_LayerType type() const {
    return LayerParameter_LayerType_MAXOUT;
  }

 protected:
  virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  //virtual Dtype Forward_gpu(const vector<Blob<Dtype>*>& bottom,
  //    vector<Blob<Dtype>*>* top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
  //virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
  //    const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

  int num_output_;
  int num_;
  int channels_;
  int height_;
  int width_;
  int group_size_;
  Blob<Dtype> max_idx_;  // records the argmax indices for the backward pass
};
```
What remains is the change to layer_factory.cpp, which I won't detail, and then the change to the proto file:
```protobuf
message MaxoutParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
}
```
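With this message added (and MAXOUT added to LayerType), a maxout layer could then be declared in a network definition roughly as follows; the exact field names depend on your proto edits, and num_output: 64 is just a placeholder:

```
layers {
  name: "maxout1"
  type: MAXOUT
  bottom: "conv1"
  top: "maxout1"
  maxout_param {
    num_output: 64
  }
}
```

Here conv1 must produce group_size * 64 feature maps for the shape check in SetUp to pass.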
Of course there are other proto file changes as well, which I also won't detail. I didn't write a test file; I ran my own demo and it looked fine, so the code can be considered correct.

One caveat: the current code cannot be placed after a fully connected layer, because a few lines in it are written incorrectly. I'll fix that later; it's a minor issue.

Next comes the NIN implementation. My code is honestly rough and its efficiency looks low. Also, everything here runs on the CPU; I'm not comfortable with GPU code yet and haven't planned to write it.

Implementation of NIN_layer

I had long assumed the network-in-network implementation on GitHub was wrong; as it turned out, mine ended up looking exactly like it = = So just search for caffe + network in network yourselves... Since downloading it may require getting around the firewall, I'll paste the network definition directly (the network structure for the cifar10 dataset):
```
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "cifar-train-leveldb"
    batch_size: 128
  }
  include: { phase: TRAIN }
}
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "cifar-test-leveldb"
    batch_size: 100
  }
  include: { phase: TEST }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 2
    kernel_size: 5
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}
layers {
  name: "cccp1"
  type: CONVOLUTION
  bottom: "conv1"
  top: "cccp1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 160
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp1"
  type: RELU
  bottom: "cccp1"
  top: "cccp1"
}
layers {
  name: "cccp2"
  type: CONVOLUTION
  bottom: "cccp1"
  top: "cccp2"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp2"
  type: RELU
  bottom: "cccp2"
  top: "cccp2"
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "cccp2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "drop3"
  type: DROPOUT
  bottom: "pool1"
  top: "pool1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 2
    kernel_size: 5
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "cccp3"
  type: CONVOLUTION
  bottom: "conv2"
  top: "cccp3"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp3"
  type: RELU
  bottom: "cccp3"
  top: "cccp3"
}
layers {
  name: "cccp4"
  type: CONVOLUTION
  bottom: "cccp3"
  top: "cccp4"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp4"
  type: RELU
  bottom: "cccp4"
  top: "cccp4"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "cccp4"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "drop6"
  type: DROPOUT
  bottom: "pool2"
  top: "pool2"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "pool2"
  top: "conv3"
  blobs_lr: 1.
  blobs_lr: 2.
  weight_decay: 1.
  weight_decay: 0.
  convolution_param {
    num_output: 192
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "cccp5"
  type: CONVOLUTION
  bottom: "conv3"
  top: "cccp5"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 192
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp5"
  type: RELU
  bottom: "cccp5"
  top: "cccp5"
}
layers {
  name: "cccp6"
  type: CONVOLUTION
  bottom: "cccp5"
  top: "cccp6"
  blobs_lr: 0.1
  blobs_lr: 0.1
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 10
    group: 1
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  name: "relu_cccp6"
  type: RELU
  bottom: "cccp6"
  top: "cccp6"
}
layers {
  name: "pool3"
  type: POOLING
  bottom: "cccp6"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 8
    stride: 1
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "pool3"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "pool3"
  bottom: "label"
  top: "loss"
}
```
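A note on the cccp layers above: a convolution with kernel_size: 1 is simply a fully connected transform across channels applied independently at every pixel, which is the core idea of NIN. A standalone sketch of what such a layer computes (plain C++; the function and its layout are illustrative, not Caffe's implementation):

```cpp
#include <vector>

// 1x1 convolution: each output channel is a weighted sum of the input
// channels at the same spatial position (plus a bias), i.e. a fully
// connected layer applied per pixel across the channel dimension.
std::vector<float> conv1x1(const std::vector<float>& input,   // [in_ch][h*w]
                           const std::vector<float>& weight,  // [out_ch][in_ch]
                           const std::vector<float>& bias,    // [out_ch]
                           int in_ch, int out_ch, int map_size) {
  std::vector<float> output(out_ch * map_size, 0.0f);
  for (int o = 0; o < out_ch; ++o) {
    for (int p = 0; p < map_size; ++p) {
      float sum = bias[o];
      for (int c = 0; c < in_ch; ++c)
        sum += weight[o * in_ch + c] * input[c * map_size + p];
      output[o * map_size + p] = sum;
    }
  }
  return output;
}
```

This is why the network above needs no custom layer for NIN: CONVOLUTION with kernel_size: 1 already does the "mlpconv" channel mixing.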
Solver (training) parameters:
```
test_iter: 100
test_interval: 500
base_lr: 0.1
momentum: 0.9
weight_decay: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 100
max_iter: 120000
snapshot: 10000
snapshot_prefix: "cifar10_nin"
```
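With lr_policy: "step", Caffe decays the learning rate as base_lr * gamma^floor(iter / stepsize); a tiny sketch of the schedule:

```cpp
#include <cmath>

// Caffe's "step" learning-rate policy:
//   lr = base_lr * gamma^floor(iter / stepsize)
double step_lr(double base_lr, double gamma, int stepsize, int iter) {
  return base_lr * std::pow(gamma, iter / stepsize);  // int division = floor
}
```

With the values above, the rate stays at 0.1 for the first 100000 iterations and drops to 0.01 afterwards.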
That wraps up this post; the remaining task is figuring out how to train it to state-of-the-art accuracy.

Reposted from http://blog.csdn.net/kuaitoukid/article/details/41865803.