This post has two parts: first a beginner-level tutorial, then the method for adding your own maxout and NIN layers.
(Part 1)
Actually this has already been answered on GitHub (https://github.com/BVLC/caffe/issues/684):
Here's roughly the process I follow.

- Add a class declaration for your layer to the appropriate one of common_layers.hpp, data_layers.hpp, loss_layers.hpp, neuron_layers.hpp, or vision_layers.hpp. Include an inline implementation of type and the *Blobs() methods to specify blob number requirements. Omit the *_gpu declarations if you'll only be implementing CPU code.
- Implement your layer in layers/your_layer.cpp.
  - SetUp for initialization: reading parameters, allocating buffers, etc.
  - Forward_cpu for the function your layer computes
  - Backward_cpu for its gradient
- (Optional) Implement the GPU versions Forward_gpu and Backward_gpu in layers/your_layer.cu.
- Add your layer to proto/caffe.proto, updating the next available ID. Also declare parameters, if needed, in this file.
- Make your layer createable by adding it to layer_factory.cpp.
- Write tests in test/test_your_layer.cpp. Use test/test_gradient_check_util.hpp to check that your Forward and Backward implementations are in numerical agreement.
That is the rough process; I'll just translate it directly, since the steps I worked out on my own turned out to be the same. Here we'll add a Wtf_Layer whose behavior is identical to Convolution_Layer. Note the naming convention: the first letter of Wtf is capitalized and the rest are lowercase. Call it a naming norm; my inner perfectionist approves.
1. First decide which type the new layer is: common_layer, data_layer, loss_layer, neuron_layer, or vision_layer. Our Wtf_Layer clearly belongs to vision_layer, so open vision_layers.hpp, copy the convolution_layer declaration, and rename the class and constructor to WtfLayer. If you won't be using GPU computation, delete all the GPU functions in it.
2. Add Wtf_layer.cpp to the src/caffe/layers folder. Copy the contents of convolution_layer.cpp and rename the class accordingly (search for the conv keyword and change it to Wtf).
3. If you have GPU code, add the corresponding Wtf_layer.cu (skipped here).
4. Edit proto/caffe.proto: find LayerType, add WTF, and update the ID (the new ID should be 34). If Wtf_Layer has parameters — Convolution certainly does — also add a WtfParameter message (see the sketch after this list).
5. Add the corresponding code to layer_factory.cpp — the big pile of if ... else dispatch code (also sketched below).
6. This one is optional, but worth doing for peace of mind: write a test file that checks the forward and backward passes agree numerically (again, see below). For the theory behind the gradient check, see the corresponding chapter of the UFLDL tutorial.
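To make steps 4–6 concrete, here are minimal sketches. They are not from the GitHub answer above: WtfParameter and all IDs are just our running example, and the exact shape of these files varies between Caffe revisions, so treat them as illustrations rather than drop-in code.

Step 4, in proto/caffe.proto (34 is the "next available ID" from the step above; the field number for wtf_param must likewise be the next free one in your copy):

    // inside message LayerParameter
    enum LayerType {
      // ... existing entries ...
      WTF = 34;
    }
    optional WtfParameter wtf_param = 42;  // illustrative field number

    // at file scope, the parameter message itself
    message WtfParameter {
      optional uint32 num_output = 1;  // copy ConvolutionParameter's fields if cloning Convolution
    }

Step 5, in layer_factory.cpp, extend GetLayer's dispatch (a switch in some revisions, an if/else chain in others):

    case LayerParameter_LayerType_WTF:
      return new WtfLayer<Dtype>(param);

Step 6, the shape of a gradient-check test in test/test_wtf_layer.cpp; the GradientChecker call follows the era of Caffe this post targets, but double-check it against an existing test such as test_pooling_layer.cpp:

    #include "caffe/test/test_gradient_check_util.hpp"

    TYPED_TEST(WtfLayerTest, TestGradient) {
      LayerParameter layer_param;
      WtfLayer<TypeParam> layer(layer_param);
      // stepsize and threshold for the numerical check
      GradientChecker<TypeParam> checker(1e-2, 1e-3);
      checker.CheckGradientExhaustive(&layer, &(this->blob_bottom_vec_),
          &(this->blob_top_vec_));
    }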
I'll post my own maxout_layer demo later — planting a flag here to push myself to finish it ╮(╯▽╰)╭
(Part 2) How to add a maxout_layer
Bengio's maxout paper left me frustrated at first: it lays out one formula with much fanfare, and then uses a different scheme for maxout on convolutional layers. After some more thought, though, I concluded the paper simply doesn't state it clearly.
My maxout algorithm works like this. First fix a group_size variable, which says that each maximum is taken over a set of group_size values — given group_size numbers, pick the largest. Then multiply the convolution layer's output_num by group_size, so it produces group_size times as many feature maps as before. Partition these feature maps into groups of group_size; within each group, take the element-wise maximum response to form one new feature map. Those maps are the maxout layer's output.
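Equivalently, in index notation, with k = group_size, sample n, output map o, and position (h, w):

    out[n][o][h][w] = max{ in[n][o*k + g][h][w] : g = 0, ..., k-1 }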
[Figure: nine feature maps arranged in three groups of three; each group collapses to one output map]
If that's still unclear, look at the figure above. There are nine images in all, meaning the convolution layer outputs nine feature maps. With groups of 3, the maxout layer outputs 9/3 = 3 feature maps. Take one group, say the three green feature maps, each of size w*h. Declare a new output_featuremap of size w*h and walk over its points; each point gets the maximum of the three corresponding points in the green maps — the largest of three numbers. That yields one output_featuremap, and the remaining groups are handled the same way.
That should make both the idea and the algorithm of maxout clear = =, so let's go straight to the code.
Create a maxout_layer.cpp and put it in the src/caffe/layers folder:
#include <cstdio>
#include <vector>

#include "caffe/filler.hpp"
#include "caffe/layer.hpp"
#include "caffe/util/im2col.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

namespace caffe {

template <typename Dtype>
void MaxoutLayer<Dtype>::SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  Layer<Dtype>::SetUp(bottom, top);
  printf("=============== has gone into setup! ===============\n");  // debug trace only
  MaxoutParameter maxout_param = this->layer_param_.maxout_param();

  num_output_ = this->layer_param_.maxout_param().num_output();
  CHECK_GT(num_output_, 0) << "output number cannot be zero.";

  num_ = bottom[0]->num();
  channels_ = bottom[0]->channels();
  height_ = bottom[0]->height();
  width_ = bottom[0]->width();

  // All bottom blobs must share the same shape.
  for (int bottom_id = 1; bottom_id < bottom.size(); ++bottom_id) {
    CHECK_EQ(num_, bottom[bottom_id]->num()) << "Inputs must have same num.";
    CHECK_EQ(channels_, bottom[bottom_id]->channels())
        << "Inputs must have same channels.";
    CHECK_EQ(height_, bottom[bottom_id]->height())
        << "Inputs must have same height.";
    CHECK_EQ(width_, bottom[bottom_id]->width())
        << "Inputs must have same width.";
  }

  // The input channels split into groups of group_size_; each group is
  // reduced to one output channel by taking the element-wise max.
  CHECK_EQ(channels_ % num_output_, 0)
      << "Number of channels should be a multiple of the output number.";
  group_size_ = channels_ / num_output_;

  (*top)[0]->Reshape(num_, num_output_, height_, width_);
  // max_idx_ records, for every output element, which input element won
  // the max; the backward pass routes gradients through these indices.
  max_idx_.Reshape(num_, num_output_, height_, width_);
}

template <typename Dtype>
Dtype MaxoutLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  int featureSize = height_ * width_;
  Dtype* mask = max_idx_.mutable_cpu_data();

  const int top_count = (*top)[0]->count();
  caffe_set(top_count, Dtype(0), mask);

  for (int i = 0; i < bottom.size(); ++i) {
    const Dtype* bottom_data = bottom[i]->cpu_data();
    Dtype* top_data = (*top)[i]->mutable_cpu_data();

    for (int n = 0; n < num_; n++) {
      for (int o = 0; o < num_output_; o++) {
        for (int g = 0; g < group_size_; g++) {
          if (g == 0) {
            // First map of the group: copy it as the initial maximum.
            for (int h = 0; h < height_; h++) {
              for (int w = 0; w < width_; w++) {
                int index = w + h * width_;
                top_data[index] = bottom_data[index];
                mask[index] = index;
              }
            }
          } else {
            // Remaining maps of the group: keep the element-wise maximum.
            for (int h = 0; h < height_; h++) {
              for (int w = 0; w < width_; w++) {
                int index0 = w + h * width_;
                int index1 = index0 + g * featureSize;
                if (top_data[index0] < bottom_data[index1]) {
                  top_data[index0] = bottom_data[index1];
                  mask[index0] = index1;
                }
              }
            }
          }
        }
        // Advance to the next group / output map.
        bottom_data += featureSize * group_size_;
        top_data += featureSize;
        mask += featureSize;
      }
    }
  }
  return Dtype(0.);
}

template <typename Dtype>
void MaxoutLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom) {
  if (!propagate_down[0]) {
    return;
  }
  caffe_set((*bottom)[0]->count(), Dtype(0),
      (*bottom)[0]->mutable_cpu_diff());
  const Dtype* mask = max_idx_.cpu_data();
  int featureSize = height_ * width_;

  for (int i = 0; i < top.size(); i++) {
    const Dtype* top_diff = top[i]->cpu_diff();
    Dtype* bottom_diff = (*bottom)[i]->mutable_cpu_diff();

    for (int n = 0; n < num_; n++) {
      for (int o = 0; o < num_output_; o++) {
        // Route each output gradient back to the input element that won the max.
        for (int h = 0; h < height_; h++) {
          for (int w = 0; w < width_; w++) {
            int index = w + h * width_;
            int bottom_index = mask[index];
            bottom_diff[bottom_index] += top_diff[index];
          }
        }
        bottom_diff += featureSize * group_size_;
        top_diff += featureSize;
        mask += featureSize;
      }
    }
  }
}

INSTANTIATE_CLASS(MaxoutLayer);

}  // namespace caffe
My original file had Chinese comments that turned into mojibake under Linux — harmless. And yes, there's a printf left in for testing (go ahead and laugh at my printf debugging = =).
Add the following code to vision_layers.hpp:
template <typename Dtype>
class MaxoutLayer : public Layer<Dtype> {
 public:
  explicit MaxoutLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);

  virtual inline LayerParameter_LayerType type() const {
    return LayerParameter_LayerType_MAXOUT;
  }

 protected:
  virtual Dtype Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

  int num_output_;
  int num_;
  int channels_;
  int height_;
  int width_;
  int group_size_;
  Blob<Dtype> max_idx_;
};
What remains is the change to layer_factory.cpp — the same pattern as the Wtf sketch in part 1, so I won't spell it out — and then the proto file change:
message MaxoutParameter {
  optional uint32 num_output = 1;
}
Well, of course there are other proto changes too, which I also won't walk through, and a test file, which I didn't write — I ran my own demo and it behaved, so the code can be called correct.
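For the record, those remaining proto changes follow the same pattern as in part 1; the enum value and field number below are illustrative, so use the next free IDs in your copy of caffe.proto:

    // inside message LayerParameter
    enum LayerType {
      // ... existing entries ...
      MAXOUT = 35;  // next available ID in your copy
    }
    optional MaxoutParameter maxout_param = 43;  // illustrative field number

With that in place, a maxout layer drops into a net definition right after a convolution whose num_output is group_size times the desired output channels, for example:

    layers {
      name: "maxout1"
      type: MAXOUT
      bottom: "conv1"   # conv1 must output num_output * group_size channels
      top: "maxout1"
      maxout_param { num_output: 96 }
    }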
One caveat: the current code cannot be placed after a fully connected layer — a few lines in it are written incorrectly. I'll fix them later; it's not a big deal.
Next up is the NIN implementation. My code is, frankly, rubbish — efficiency looks low. Oh, and all of this runs on the CPU; I'm not great with GPU code and haven't planned to write it.
Implementing NIN_layer
I had long assumed the network-in-network implementation on GitHub was wrong; as it turns out, what I wrote ended up looking exactly like it = = So just search for caffe + network in network yourselves... downloading it needs a proxy, though, so I'll paste the network-definition code directly here (the network structure for the cifar10 dataset).
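The full definition is long, so here is just its key building block, the mlpconv: an ordinary convolution followed by 1x1 convolutions (the "cccp", cascaded cross channel parametric pooling, layers). The sizes below follow the published cifar10 NIN model and should be treated as illustrative:

    layers {
      name: "conv1"
      type: CONVOLUTION
      bottom: "data"
      top: "conv1"
      convolution_param {
        num_output: 192
        kernel_size: 5
        pad: 2
      }
    }
    layers { name: "relu1" type: RELU bottom: "conv1" top: "conv1" }
    layers {
      name: "cccp1"
      type: CONVOLUTION
      bottom: "conv1"
      top: "cccp1"
      convolution_param {
        num_output: 160
        kernel_size: 1   # 1x1 convolution: a per-pixel MLP across channels
      }
    }
    layers { name: "relu_cccp1" type: RELU bottom: "cccp1" top: "cccp1" }
    layers {
      name: "cccp2"
      type: CONVOLUTION
      bottom: "cccp1"
      top: "cccp2"
      convolution_param {
        num_output: 96
        kernel_size: 1
      }
    }
    layers { name: "relu_cccp2" type: RELU bottom: "cccp2" top: "cccp2" }
    layers {
      name: "pool1"
      type: POOLING
      bottom: "cccp2"
      top: "pool1"
      pooling_param { pool: MAX kernel_size: 3 stride: 2 }
    }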
Training parameters:
test_iter: 100
test_interval: 500
base_lr: 0.1
momentum: 0.9
weight_decay: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 100
max_iter: 120000
snapshot: 10000
snapshot_prefix: "cifar10_nin"
That wraps up this post; the remaining task is working out how to train it to state-of-the-art.