Caffe Source Code Walkthrough (7): The LRN Layer Implementation

LRN stands for Local Response Normalization. The layer is implemented in CAFFE_ROOT/src/caffe/layers/lrn_layer.cpp and in lrn_layer.cu in the same directory.


The layer takes the following parameters:

norm_region: chooses between normalizing across neighboring channels and normalizing over a spatial region within each channel; the default is ACROSS_CHANNELS, i.e. cross-channel normalization;

local_size: has two meanings: (1) for cross-channel normalization, the number of channels to sum over; (2) for within-channel normalization, the side length of the square summing region; the default is 5;

alpha: scaling factor (see the formula below); the default is 1;

beta: exponent (see the formula below); the default is 0.75;
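
These parameters live in LRNParameter and are read through lrn_param() in LayerSetUp() below. As a rough sketch of how they fit together (the concrete values are only an illustration, the common AlexNet-style setting, not something this article prescribes):

#include "caffe/proto/caffe.pb.h"   // generated LayerParameter / LRNParameter
#include "caffe/vision_layers.hpp"  // LRNLayer declaration, discussed below

// Sketch: constructing an LRN layer programmatically with explicit parameters.
void LrnParamExample() {
  caffe::LayerParameter param;
  caffe::LRNParameter* lrn = param.mutable_lrn_param();
  lrn->set_norm_region(caffe::LRNParameter_NormRegion_ACROSS_CHANNELS);
  lrn->set_local_size(5);
  lrn->set_alpha(0.0001f);  // illustrative AlexNet-style value
  lrn->set_beta(0.75f);     // illustrative AlexNet-style value
  caffe::LRNLayer<float> layer(param);
  // layer.SetUp(bottom, &top) would follow, given bottom/top Blob vectors.
}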


The local response normalization layer performs a kind of "lateral inhibition" by normalizing over local input regions.


In the cross-channel mode the local region spans neighboring channels but has no spatial extent (i.e. it has shape local_size x 1 x 1);

In the within-channel mode the local region extends spatially but stays inside a single channel (i.e. it has shape 1 x local_size x local_size);

Each input value is divided by

(1 + (alpha / n) * sum_i(x_i^2))^beta

[卜居's note: the Caffe version at the time of writing was old; newer Caffe adds a parameter k, so the expression becomes (k + (alpha / n) ……). Thanks to @雲峯 for pointing this out.]

where n is the local size local_size, and alpha and beta are the parameters defined above.

The sum is taken over the local region centered on the current value (with zero padding where necessary).
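
To make the formula concrete, here is a minimal standalone sketch (plain C++, not Caffe code; the function name and types are mine) that applies cross-channel LRN to the values of one spatial position across all channels, with the same zero padding at the channel borders that the layer uses:

#include <cmath>
#include <vector>

// Minimal sketch: ACROSS_CHANNELS LRN for a single pixel position.
// x[c] is the input value of channel c at that position.
std::vector<float> LrnAcrossChannels(const std::vector<float>& x,
                                     int local_size, float alpha, float beta) {
  const int channels = static_cast<int>(x.size());
  const int pre_pad = (local_size - 1) / 2;
  std::vector<float> y(channels);
  for (int c = 0; c < channels; ++c) {
    float sum_sq = 0.0f;
    for (int i = c - pre_pad; i <= c + pre_pad; ++i) {
      if (i >= 0 && i < channels) sum_sq += x[i] * x[i];  // zero padding outside the valid range
    }
    // denominator from the formula: (1 + (alpha / n) * sum_i(x_i^2))^beta
    y[c] = x[c] / std::pow(1.0f + alpha / local_size * sum_sq, beta);
  }
  return y;
}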


Now let's look at how Caffe implements this. Open CAFFE_ROOT/include/caffe/vision_layers.hpp and start reading at line 242:

// Forward declare PoolingLayer and SplitLayer for use in LRNLayer.
template <typename Dtype> class PoolingLayer;
template <typename Dtype> class SplitLayer;


/**
 * @brief Normalize the input in a local region across or within feature maps.
 *
 * TODO(dox): thorough documentation for Forward, Backward, and proto params.
 */
template <typename Dtype>
class LRNLayer : public Layer<Dtype> {
 public:
  explicit LRNLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);


  virtual inline LayerParameter_LayerType type() const {
    return LayerParameter_LayerType_LRN;
  }
  virtual inline int ExactNumBottomBlobs() const { return 1; }
  virtual inline int ExactNumTopBlobs() const { return 1; }


 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

  virtual void CrossChannelForward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void CrossChannelForward_gpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void WithinChannelForward(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual void CrossChannelBackward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
  virtual void CrossChannelBackward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);
  virtual void WithinChannelBackward(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, vector<Blob<Dtype>*>* bottom);

  int size_;
  int pre_pad_;
  Dtype alpha_;
  Dtype beta_;
  int num_;
  int channels_;
  int height_;
  int width_;

  // Fields used for normalization ACROSS_CHANNELS
  // scale_ stores the intermediate summing results
  Blob<Dtype> scale_;

  // Fields used for normalization WITHIN_CHANNEL
  shared_ptr<SplitLayer<Dtype> > split_layer_;
  vector<Blob<Dtype>*> split_top_vec_;
  shared_ptr<PowerLayer<Dtype> > square_layer_;
  Blob<Dtype> square_input_;
  Blob<Dtype> square_output_;
  vector<Blob<Dtype>*> square_bottom_vec_;
  vector<Blob<Dtype>*> square_top_vec_;
  shared_ptr<PoolingLayer<Dtype> > pool_layer_;
  Blob<Dtype> pool_output_;
  vector<Blob<Dtype>*> pool_top_vec_;
  shared_ptr<PowerLayer<Dtype> > power_layer_;
  Blob<Dtype> power_output_;
  vector<Blob<Dtype>*> power_top_vec_;
  shared_ptr<EltwiseLayer<Dtype> > product_layer_;
  Blob<Dtype> product_input_;
  vector<Blob<Dtype>*> product_bottom_vec_;
};

That is a lot of code, and you probably won't remember every member variable and function after one read. But keep one thing in mind: every Layer type has Forward() and Backward(), plus LayerSetUp() and Reshape(), so those don't need close attention in the header. What deserves attention are the member variables ending in "_", because they are the ones tied to the algorithm.

It is nice to see num_, height_, width_, channels_: these four variables describe the shape of the layer's input, a four-dimensional num_ x channels_ x height_ x width_ Blob (hard to picture? Just think of it as a video stream: two dimensions for width and height, a third for color, and a fourth for time).
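
A Blob lays its data out contiguously in that num_ x channels_ x height_ x width_ order, with width varying fastest. The offset() calls used later in CrossChannelForward_cpu() map (n, c, h, w) indices to positions in that flat array; the sketch below is my paraphrase of that indexing (check Blob::offset() in blob.hpp for the authoritative version):

// Paraphrase of how an N x C x H x W row-major Blob maps coordinates
// to a linear index; omitted coordinates are treated as 0, as in
// offset(n) or offset(n, c) in the forward code below.
inline int BlobOffset(int n, int c, int h, int w,
                      int channels, int height, int width) {
  return ((n * channels + c) * height + h) * width + w;
}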

We also see alpha_ and beta_, the two parameters from the formula above.

The n (local_size) in the formula is stored in the class as size_.

As mentioned above, zero padding may be needed, hence the pre_pad_ variable.

In ACROSS_CHANNELS mode we only need the scale_ Blob, so everything declared after it can be ignored ~~ what a relief ~~


Having read through the declarations in the header, doesn't it look fairly simple? Let's dig into the implementation. Open CAFFE_ROOT/src/caffe/layers/lrn_layer.cpp and start from the top; the first function implemented is LayerSetUp():

template <typename Dtype>
void LRNLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  size_ = this->layer_param_.lrn_param().local_size();
  CHECK_EQ(size_ % 2, 1) << "LRN only supports odd values for local_size";
  pre_pad_ = (size_ - 1) / 2;
  alpha_ = this->layer_param_.lrn_param().alpha();
  beta_ = this->layer_param_.lrn_param().beta();
  if (this->layer_param_.lrn_param().norm_region() ==
      LRNParameter_NormRegion_WITHIN_CHANNEL) {
    // Set up split_layer_ to use inputs in the numerator and denominator.
    split_top_vec_.clear();
    split_top_vec_.push_back(&product_input_);
    split_top_vec_.push_back(&square_input_);
    LayerParameter split_param;
    split_layer_.reset(new SplitLayer<Dtype>(split_param));
    split_layer_->SetUp(bottom, &split_top_vec_);
    // Set up square_layer_ to square the inputs.
    square_bottom_vec_.clear();
    square_top_vec_.clear();
    square_bottom_vec_.push_back(&square_input_);
    square_top_vec_.push_back(&square_output_);
    LayerParameter square_param;
    square_param.mutable_power_param()->set_power(Dtype(2));
    square_layer_.reset(new PowerLayer<Dtype>(square_param));
    square_layer_->SetUp(square_bottom_vec_, &square_top_vec_);
    // Set up pool_layer_ to sum over square neighborhoods of the input.
    pool_top_vec_.clear();
    pool_top_vec_.push_back(&pool_output_);
    LayerParameter pool_param;
    pool_param.mutable_pooling_param()->set_pool(
        PoolingParameter_PoolMethod_AVE);
    pool_param.mutable_pooling_param()->set_pad(pre_pad_);
    pool_param.mutable_pooling_param()->set_kernel_size(size_);
    pool_layer_.reset(new PoolingLayer<Dtype>(pool_param));
    pool_layer_->SetUp(square_top_vec_, &pool_top_vec_);
    // Set up power_layer_ to compute (1 + alpha_/N^2 s)^-beta_, where s is
    // the sum of a squared neighborhood (the output of pool_layer_).
    power_top_vec_.clear();
    power_top_vec_.push_back(&power_output_);
    LayerParameter power_param;
    power_param.mutable_power_param()->set_power(-beta_);
    power_param.mutable_power_param()->set_scale(alpha_);
    power_param.mutable_power_param()->set_shift(Dtype(1));
    power_layer_.reset(new PowerLayer<Dtype>(power_param));
    power_layer_->SetUp(pool_top_vec_, &power_top_vec_);
    // Set up a product_layer_ to compute outputs by multiplying inputs by the
    // inverse demoninator computed by the power layer.
    product_bottom_vec_.clear();
    product_bottom_vec_.push_back(&product_input_);
    product_bottom_vec_.push_back(&power_output_);
    LayerParameter product_param;
    EltwiseParameter* eltwise_param = product_param.mutable_eltwise_param();
    eltwise_param->set_operation(EltwiseParameter_EltwiseOp_PROD);
    product_layer_.reset(new EltwiseLayer<Dtype>(product_param));
    product_layer_->SetUp(product_bottom_vec_, top);
  }
}

This function initializes the parameters. It first reads size_ from the layer_param_ object and checks that the value is odd, failing otherwise; it then derives pre_pad_ from size_, i.e. the number of zeros to pad on each side (half of size_ - 1, so local_size = 5 gives pre_pad_ = 2). Next alpha_ and beta_ are initialized. In WITHIN_CHANNEL mode a chain of intermediate sub-layers also has to be set up; we don't need to dwell on that, because we are using ACROSS_CHANNELS mode. Still simple, still happy~~
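
For reference, the WITHIN_CHANNEL chain composes split -> square (PowerLayer) -> AVE pooling -> power -> elementwise product. Since AVE pooling divides the windowed sum by size_^2, the overall effect (matching the comment above power_layer_) is roughly:

top = bottom .* (1 + (alpha_ / size_^2) * sum_over_local_size_x_local_size_window(bottom^2))^(-beta_)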

Next, the Reshape() function:

template <typename Dtype>
void LRNLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  num_ = bottom[0]->num();
  channels_ = bottom[0]->channels();
  height_ = bottom[0]->height();
  width_ = bottom[0]->width();
  switch (this->layer_param_.lrn_param().norm_region()) {
  case LRNParameter_NormRegion_ACROSS_CHANNELS:
    (*top)[0]->Reshape(num_, channels_, height_, width_);
    scale_.Reshape(num_, channels_, height_, width_);
    break;
  case LRNParameter_NormRegion_WITHIN_CHANNEL:
    split_layer_->Reshape(bottom, &split_top_vec_);
    square_layer_->Reshape(square_bottom_vec_, &square_top_vec_);
    pool_layer_->Reshape(square_top_vec_, &pool_top_vec_);
    power_layer_->Reshape(pool_top_vec_, &power_top_vec_);
    product_layer_->Reshape(product_bottom_vec_, top);
    break;
  }
}

It first fills in the four shape parameters num_, channels_, height_, width_ from the bottom blob, then branches on the normalization mode. In ACROSS_CHANNELS mode the top blob is reshaped to the same size as bottom (num_, channels_, height_, width_), and scale_ gets the same shape as well. That way, during normalization we only have to raise each scale_ entry to the power -beta and multiply it elementwise with the corresponding bottom value to obtain top. The scale_ values themselves come from the formula at the beginning of the article; let's look at how that is computed.
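
Spelled out, the forward pass below will fill scale_ and compute top as (channels outside the valid range count as zero):

scale_(n, c, h, w) = 1 + (alpha_ / size_) * sum over c' in [c - pre_pad_, c + pre_pad_] of bottom(n, c', h, w)^2
top(n, c, h, w)    = bottom(n, c, h, w) * scale_(n, c, h, w)^(-beta_)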

On to the next function:

template <typename Dtype>
void LRNLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  switch (this->layer_param_.lrn_param().norm_region()) {
  case LRNParameter_NormRegion_ACROSS_CHANNELS:
    CrossChannelForward_cpu(bottom, top);
    break;
  case LRNParameter_NormRegion_WITHIN_CHANNEL:
    WithinChannelForward(bottom, top);
    break;
  default:
    LOG(FATAL) << "Unknown normalization region.";
  }
}

Simple enough: it dispatches to the appropriate Forward function according to the normalization mode. We will follow CrossChannelForward_cpu(), whose code is:

template <typename Dtype>
void LRNLayer<Dtype>::CrossChannelForward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  Dtype* scale_data = scale_.mutable_cpu_data();  // raw pointers to each Blob's memory for the operations below
  // start with the constant value
  for (int i = 0; i < scale_.count(); ++i) {  // initialize every scale_ entry to 1.0
    scale_data[i] = 1.;
  }
  Blob<Dtype> padded_square(1, channels_ + size_ - 1, height_, width_);  // zero-padded buffer; its channel dimension is size_ - 1 larger than bottom's
  Dtype* padded_square_data = padded_square.mutable_cpu_data();
  caffe_set(padded_square.count(), Dtype(0), padded_square_data);  // clear it first
  Dtype alpha_over_size = alpha_ / size_;  // precompute alpha / n from the formula
  // go through the images
  for (int n = 0; n < num_; ++n) {  // process the num_ images in the batch one at a time
    // compute the padded square
    caffe_sqr(channels_ * height_ * width_,
        bottom_data + bottom[0]->offset(n),
        padded_square_data + padded_square.offset(0, pre_pad_));  // square bottom into padded_square; the pre_pad_ channels at each end stay zero
    // Create the first channel scale
    for (int c = 0; c < size_; ++c) {  // sum the squares over size_ channels, scaled by the precomputed alpha / n, accumulating into scale_ (this builds 1 + (alpha/n) * sum_i(x_i^2))
      caffe_axpy<Dtype>(height_ * width_, alpha_over_size,
          padded_square_data + padded_square.offset(0, c),
          scale_data + scale_.offset(n, 0));
    }
    for (int c = 1; c < channels_; ++c) {  // compute the remaining scale_ values FIFO-style: slide the window by one channel, add the new head and subtract the old tail to avoid re-summing
      // copy previous scale
      caffe_copy<Dtype>(height_ * width_,
          scale_data + scale_.offset(n, c - 1),
          scale_data + scale_.offset(n, c));
      // add head
      caffe_axpy<Dtype>(height_ * width_, alpha_over_size,
          padded_square_data + padded_square.offset(0, c + size_ - 1),
          scale_data + scale_.offset(n, c));
      // subtract tail
      caffe_axpy<Dtype>(height_ * width_, -alpha_over_size,
          padded_square_data + padded_square.offset(0, c - 1),
          scale_data + scale_.offset(n, c));
    }
  }

  // In the end, compute output
  caffe_powx<Dtype>(scale_.count(), scale_data, -beta_, top_data);  // raise to the power; the division is turned into a multiplication, hence the negative exponent
  caffe_mul<Dtype>(scale_.count(), top_data, bottom_data, top_data);  // top = bottom .* scale_^(-beta)
}
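
The add-head/subtract-tail trick in that second channel loop is just a sliding-window update. For a fixed image and pixel position, writing x_c for the bottom value in channel c (with x_c = 0 outside [0, channels_ - 1]), the loop maintains

scale_(c) = scale_(c - 1) + (alpha_ / size_) * (x_(c + pre_pad_)^2 - x_(c - pre_pad_ - 1)^2)

so each additional channel costs O(1) work instead of re-summing size_ terms.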

If caffe_axpy, caffe_sqr, caffe_powx, and caffe_mul are unfamiliar, don't worry: they are all simple math routines, declared in CAFFE_ROOT/include/caffe/util/math_functions.hpp.

template <typename Dtype>
void caffe_axpy(const int N, const Dtype alpha, const Dtype* X,
    Dtype* Y);

It performs Y = alpha * X + Y, where X and Y are vectors of N elements.

template <typename Dtype>
void caffe_powx(const int n, const Dtype* a, const Dtype b, Dtype* y);

It performs y[i] = a[i]^b elementwise, where a and y are vectors of n elements and b is a scalar.

The remaining two you can work out the same way — or see the sketch below.
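
From memory, the declarations of the other two helpers used above look roughly like this (treat the exact signatures as an assumption and verify against math_functions.hpp):

// Elementwise square: y[i] = a[i] * a[i]
template <typename Dtype>
void caffe_sqr(const int N, const Dtype* a, Dtype* y);

// Elementwise product: y[i] = a[i] * b[i]
template <typename Dtype>
void caffe_mul(const int N, const Dtype* a, const Dtype* b, Dtype* y);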


