Caffe Source Code Reading (2): layer

First of all, Layer is a base class, so its cpp file contains essentially no implementation. Have a look at its cpp code:

#include "caffe/layer.hpp"

namespace caffe {

INSTANTIATE_CLASS(Layer);

}  // namespace caffe

Apart from the class-instantiation macro there is nothing here, so all the treasure is in the hpp file; let's go straight to the include directory and look at the hpp.

Opening the hpp file, the first thing we see is a rather interesting comment:

/**
 Forward declare boost::thread instead of including boost/thread.hpp
 to avoid a boost/NVCC issues (#1009, #1010) on OSX.
 */

It says boost::thread is forward-declared instead of including the boost/thread.hpp header, to work around the boost/NVCC issues (#1009, #1010) on OSX. If you are interested, you can look up those two issues on GitHub.
Right after that comes

namespace boost { class mutex; }

It declares this class, which is exactly what the issue above is about; if you ever run into it and are curious, it is worth digging into.
Now to the main topic. There is a short introduction here:

/**
 * @brief An interface for the units of computation which can be composed into a
 *        Net.
 *
 * Layer%s must implement a Forward function, in which they take their input
 * (bottom) Blob%s (if any) and compute their output Blob%s (if any).
 * They may also implement a Backward function, in which they compute the error
 * gradients with respect to their input Blob%s, given the error gradients with
 * their output Blob%s.
 */
  • This introduces the Layer class: it is the interface for the units of computation that are composed into a Net. Subclasses of Layer must implement a Forward function,
  • which takes the bottom Blobs as input (if any) and computes the output (top) Blobs (if any).
  • They may also implement a Backward function. "May" means a layer is free not to implement backpropagation, because not every layer needs it; the most typical example is a data/input layer, which sits at the very start of the net with no layer before it, so there is nothing further to propagate gradients back to.
  • Backward computes the error gradients with respect to the input (bottom) blobs, given the error gradients with respect to the output (top) blobs.

Reading further down, we get to the class itself.

template <typename Dtype>

This declares a template parameter called Dtype. In practice Dtype is the floating-point type fixed when the class is instantiated and registered earlier in the framework; it can be float or double. This is part of Caffe's factory pattern, which will be covered separately later.
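
For reference, a rough sketch of what INSTANTIATE_CLASS(Layer) boils down to (see the macro definition in caffe/common.hpp): explicit template instantiations for the two supported Dtypes.

template class Layer<float>;   // Dtype = float
template class Layer<double>;  // Dtype = double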

The Layer constructor

  explicit Layer(const LayerParameter& param)

This declares an explicit constructor. Without explicit, C++ allows implicit conversion from a single constructor argument: you could write a Layer object directly assigned from a LayerParameter, and the compiler would silently construct the object from it. If no constructor is declared explicit, they all permit such implicit conversions, and when several constructors take types that overlap, this quickly becomes ambiguous or misleading. int and char, for example, overlap a great deal: assign a character to such a class and it is not obvious whether the char should go through its ASCII code into an int constructor or straight into a char constructor. Adding explicit forbids these implicit conversions and forces the honest form: ClassName object(argument).
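
A minimal, self-contained illustration of the difference explicit makes for single-argument constructors (Widget and Gadget are hypothetical types, not Caffe code):

#include <iostream>

struct Widget {
  Widget(int n) : n_(n) {}           // implicit: "Widget w = 42;" compiles
  int n_;
};

struct Gadget {
  explicit Gadget(int n) : n_(n) {}  // explicit: "Gadget g = 42;" is rejected
  int n_;
};

int main() {
  Widget w = 42;     // OK: implicit conversion from int
  // Gadget g = 42;  // error: the constructor is explicit
  Gadget g(42);      // OK: direct initialization, spelled out
  std::cout << w.n_ << " " << g.n_ << std::endl;
  return 0;
}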

Its implementation:

Copy the phase:

phase_ = param.phase();

Copy the blobs:

      if (layer_param_.blobs_size() > 0) {
        blobs_.resize(layer_param_.blobs_size());
        for (int i = 0; i < layer_param_.blobs_size(); ++i) {
          blobs_[i].reset(new Blob<Dtype>());
          blobs_[i]->FromProto(layer_param_.blobs(i));
        }
      }

This copies the layer's own parameter blobs (its learnable weights), as stored in the LayerParameter message, into blobs_.

The SetUp function

The comment says:

  /**
   * @brief Implements common layer setup functionality.
   *
   * @param bottom the preshaped input blobs
   * @param top
   *     the allocated but unshaped output blobs, to be shaped by Reshape
   *
   * Checks that the number of bottom and top blobs is correct.
   * Calls LayerSetUp to do special layer setup for individual layer types,
   * followed by Reshape to set up sizes of top blobs and internal buffers.
   * Sets up the loss weight multiplier blobs for any non-zero loss weights.
   * This method may not be overridden.
   */

It implements the setup functionality common to all layers, and takes two parameters:

  • bottom: the input blobs, already preshaped
  • top: the output blobs, allocated but not yet shaped; they are shaped here (by Reshape)
    In other words, the input shape is fixed, but the output shape is determined by the input together with this layer's parameters; a convolution layer, for example, uses its stride to decide how much smaller the output becomes.
    The concrete steps are also spelled out in the comment:
  • Check that the numbers of bottom and top blobs are correct.
  • Call LayerSetUp to do the setup specific to each layer type, then Reshape to size the top blobs and internal buffers according to the bottom blobs.
  • Set up the loss weight multiplier blobs for any non-zero loss weights.
  • This function is just the common workflow and may not be overridden.
    Let's look at its implementation:
  void SetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    CheckBlobCounts(bottom, top);
    LayerSetUp(bottom, top);
    Reshape(bottom, top);
    SetLossWeights(top);
  }

This function is just the outline, a template method: if you want custom behavior, implement the concrete steps it calls (LayerSetUp, Reshape, and so on) and leave the outline itself alone.
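
As a hedged sketch (the layer name MySimpleLayer and its behavior are hypothetical, not part of Caffe), a subclass typically overrides LayerSetUp, Reshape, and the device-specific Forward/Backward functions, and does not touch SetUp:

#include <vector>
#include "caffe/layer.hpp"

namespace caffe {

// Hypothetical elementwise layer, for illustration only.
template <typename Dtype>
class MySimpleLayer : public Layer<Dtype> {
 public:
  explicit MySimpleLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}

  // One-time, layer-specific setup: read fields from this->layer_param_ here.
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
                          const vector<Blob<Dtype>*>& top) {}

  // Shape the top blob from the bottom blob (elementwise layer: same shape).
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
                       const vector<Blob<Dtype>*>& top) {
    top[0]->ReshapeLike(*bottom[0]);
  }

  virtual inline const char* type() const { return "MySimple"; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                           const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
                            const vector<bool>& propagate_down,
                            const vector<Blob<Dtype>*>& bottom);
};

}  // namespace caffe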

The LayerSetUp function

Again, start with the comment:

  /**
   * @brief Does layer-specific setup: your layer should implement this function
   *        as well as Reshape.
   *
   * @param bottom
   *     the preshaped input blobs, whose data fields store the input data for
   *     this layer
   * @param top
   *     the allocated but unshaped output blobs
   *
   * This method should do one-time layer specific setup. This includes reading
   * and processing relevent parameters from the <code>layer_param_</code>.
   * Setting up the shapes of top blobs and internal buffers should be done in
   * <code>Reshape</code>, which will be called before the forward pass to
   * adjust the top blob sizes.
   */

It first says that it does the layer-specific setup, and that your layer should implement this function (along with Reshape). In other words, what we have here is only the declaration of a virtual function; the concrete work is implemented in each concrete layer class.
Its two parameters are the same as SetUp's: the preshaped bottom blobs come in, and the allocated but still unshaped top blobs are there to be set up.

The Reshape function

Reshape is also an abstract function, left to each layer to implement. Its comment:

  /**
   * @brief Adjust the shapes of top blobs and internal buffers to accommodate
   *        the shapes of the bottom blobs.
   *
   * @param bottom the input blobs, with the requested input shapes
   * @param top the top blobs, which should be reshaped as needed
   *
   * This method should reshape top blobs as needed according to the shapes
   * of the bottom (input) blobs, as well as reshaping any internal buffers
   * and making any other necessary adjustments so that the layer can
   * accommodate the bottom blobs.
  */

The parameters are the same again. Its job is to adjust the shapes of the top blobs, and to resize any internal buffers, so that the layer can accommodate the bottom blobs.
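
For a layer whose output shape differs from its input, Reshape is where the new shape is computed. A hedged sketch for a hypothetical fully-connected-style layer (MyFcLikeLayer and its num_output_ member are illustrative assumptions, not Caffe code):

template <typename Dtype>
void MyFcLikeLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
                                   const vector<Blob<Dtype>*>& top) {
  // The batch size comes from the bottom blob; the output width comes from a
  // (hypothetical) num_output_ member read from layer_param_ in LayerSetUp.
  vector<int> top_shape(2);
  top_shape[0] = bottom[0]->shape(0);
  top_shape[1] = num_output_;
  top[0]->Reshape(top_shape);
}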

The Forward function

The forward-pass function; this is the centerpiece.

  /**
   * @brief Given the bottom blobs, compute the top blobs and the loss.
   *
   * @param bottom
   *     the input blobs, whose data fields store the input data for this layer
   * @param top
   *     the preshaped output blobs, whose data fields will store this layers'
   *     outputs
   * \return The total loss from the layer.
   *
   * The Forward wrapper calls the relevant device wrapper function
   * (Forward_cpu or Forward_gpu) to compute the top blob values given the
   * bottom blobs.  If the layer has any non-zero loss_weights, the wrapper
   * then computes and returns the loss.
   *
   * Your layer should implement Forward_cpu and (optionally) Forward_gpu.
   */

Its job is to compute the top blobs and the loss, given the bottom blobs.
The parameters are the same two again, and the return value is the total loss from this layer.
The Forward wrapper dispatches to the function for the current device: in CPU mode it calls the CPU function, in GPU mode the GPU function, computing the outputs from the inputs. If the layer has any non-zero loss weights, the wrapper also computes and returns the loss. Your layer must implement Forward_cpu, and optionally Forward_gpu.
So the device-specific functions are virtual functions, waiting for you to implement them.
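
A hedged sketch of what a Forward_cpu might look like for the hypothetical MySimpleLayer above, here simply computing y = 2x as a stand-in for a real computation:

template <typename Dtype>
void MySimpleLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                                       const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = top[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  for (int i = 0; i < count; ++i) {
    top_data[i] = Dtype(2) * bottom_data[i];  // y = 2x, a stand-in computation
  }
}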

The Backward function

Next comes the backward function. Like Forward, the real work lives in virtual device-specific functions; let's look at the comment first:

  /**
   * @brief Given the top blob error gradients, compute the bottom blob error
   *        gradients.
   *
   * @param top
   *     the output blobs, whose diff fields store the gradient of the error
   *     with respect to themselves
   * @param propagate_down
   *     a vector with equal length to bottom, with each index indicating
   *     whether to propagate the error gradients down to the bottom blob at
   *     the corresponding index
   * @param bottom
   *     the input blobs, whose diff fields will store the gradient of the error
   *     with respect to themselves after Backward is run
   *
   * The Backward wrapper calls the relevant device wrapper function
   * (Backward_cpu or Backward_gpu) to compute the bottom blob diffs given the
   * top blob diffs.
   *
   * Your layer should implement Backward_cpu and (optionally) Backward_gpu.
   */

Summary: given the error gradients of the top blobs, compute the error gradients of the bottom blobs.

Parameters

  • top: the output blobs, whose diff fields hold the error gradients with respect to themselves
  • propagate_down: a vector of bools with the same length as bottom, each entry saying whether the error gradients should be propagated down to the corresponding bottom blob (see the sketch below)
  • bottom: the input blobs, whose diff fields will hold the gradients with respect to themselves after Backward has run
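
And the matching Backward_cpu sketch for the hypothetical y = 2x layer above, where propagate_down gates whether the bottom diff is written at all; since y = 2x, the gradient is dL/dx = 2 * dL/dy:

template <typename Dtype>
void MySimpleLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
                                        const vector<bool>& propagate_down,
                                        const vector<Blob<Dtype>*>& bottom) {
  if (!propagate_down[0]) { return; }  // this bottom blob does not need gradients
  const Dtype* top_diff = top[0]->cpu_diff();
  Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
  const int count = bottom[0]->count();
  for (int i = 0; i < count; ++i) {
    bottom_diff[i] = Dtype(2) * top_diff[i];  // dL/dx = 2 * dL/dy
  }
}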

Member accessor functions

There are many functions that simply return member variables; their names are just the variable names, for example blobs() and layer_param():

  /**
   * @brief Returns the vector of learnable parameter blobs.
   */
  vector<shared_ptr<Blob<Dtype> > >& blobs() {
    return blobs_;
  }

  /**
   * @brief Returns the layer parameter.
   */
  const LayerParameter& layer_param() const { return layer_param_; }

The ToProto function

It is a virtual function as well:

  /**
   * @brief Writes the layer parameter to a protocol buffer
   */
  virtual void ToProto(LayerParameter* param, bool write_diff = false);

Its job is to write the layer's parameters into a protocol buffer message.

The loss function

It returns the loss weight associated with the top blob at a given index:

  /**
   * @brief Returns the scalar loss associated with a top blob at a given index.
   */
  inline Dtype loss(const int top_index) const {
    return (loss_.size() > top_index) ? loss_[top_index] : Dtype(0);
  }

Notice the range check: if top_index is beyond the size of the loss_ vector, it simply returns 0 instead of reading out of bounds.

The set_loss function

As the name suggests, this function sets the loss weight associated with a given top blob.

  /**
   * @brief Sets the loss associated with a top blob at a given index.
   */
  inline void set_loss(const int top_index, const Dtype value) {
    if (loss_.size() <= top_index) {
      loss_.resize(top_index + 1, Dtype(0));
    }
    loss_[top_index] = value;
  }

It first resizes the loss_ vector if necessary so that top_index is a valid position (new entries default to 0), and then assigns the value.

Functions returning various layer properties

  /**
   * @brief Returns the layer type.
   */
  virtual inline const char* type() const { return ""; }

  /**
   * @brief Returns the exact number of bottom blobs required by the layer,
   *        or -1 if no exact number is required.
   *
   * This method should be overridden to return a non-negative value if your
   * layer expects some exact number of bottom blobs.
   */
  virtual inline int ExactNumBottomBlobs() const { return -1; }
  /**
   * @brief Returns the minimum number of bottom blobs required by the layer,
   *        or -1 if no minimum number is required.
   *
   * This method should be overridden to return a non-negative value if your
   * layer expects some minimum number of bottom blobs.
   */
  virtual inline int MinBottomBlobs() const { return -1; }
  /**
   * @brief Returns the maximum number of bottom blobs required by the layer,
   *        or -1 if no maximum number is required.
   *
   * This method should be overridden to return a non-negative value if your
   * layer expects some maximum number of bottom blobs.
   */
  virtual inline int MaxBottomBlobs() const { return -1; }
  /**
   * @brief Returns the exact number of top blobs required by the layer,
   *        or -1 if no exact number is required.
   *
   * This method should be overridden to return a non-negative value if your
   * layer expects some exact number of top blobs.
   */
  virtual inline int ExactNumTopBlobs() const { return -1; }
  /**
   * @brief Returns the minimum number of top blobs required by the layer,
   *        or -1 if no minimum number is required.
   *
   * This method should be overridden to return a non-negative value if your
   * layer expects some minimum number of top blobs.
   */
  virtual inline int MinTopBlobs() const { return -1; }
  /**
   * @brief Returns the maximum number of top blobs required by the layer,
   *        or -1 if no maximum number is required.
   *
   * This method should be overridden to return a non-negative value if your
   * layer expects some maximum number of top blobs.
   */
  virtual inline int MaxTopBlobs() const { return -1; }
  /**
   * @brief Returns true if the layer requires an equal number of bottom and
   *        top blobs.
   *
   * This method should be overridden to return true if your layer expects an
   * equal number of bottom and top blobs.
   */
  virtual inline bool EqualNumBottomTopBlobs() const { return false; }

  /**
   * @brief Return whether "anonymous" top blobs are created automatically
   *        by the layer.
   *
   * If this method returns true, Net::Init will create enough "anonymous" top
   * blobs to fulfill the requirement specified by ExactNumTopBlobs() or
   * MinTopBlobs().
   */
  virtual inline bool AutoTopBlobs() const { return false; }

  /**
   * @brief Return whether to allow force_backward for a given bottom blob
   *        index.
   *
   * If AllowForceBackward(i) == false, we will ignore the force_backward
   * setting and backpropagate to blob i only if it needs gradient information
   * (as is done when force_backward == false).
   */
  virtual inline bool AllowForceBackward(const int bottom_index) const {
    return true;
  }
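
For example, a hypothetical layer that consumes exactly one bottom blob and produces exactly one top blob would override just these two; CheckBlobCounts, shown below, enforces the declared numbers during SetUp:

  // Hypothetical overrides inside a custom layer class (illustrative only).
  virtual inline int ExactNumBottomBlobs() const { return 1; }
  virtual inline int ExactNumTopBlobs() const { return 1; }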

The param_propagate_down function

A function that says whether gradients should be computed with respect to a given parameter blob:

  /**
   * @brief Specifies whether the layer should compute gradients w.r.t. a
   *        parameter at a particular index given by param_id.
   *
   * You can safely ignore false values and always compute gradients
   * for all parameters, but possibly with wasteful computation.
   */
  inline bool param_propagate_down(const int param_id) {
    return (param_propagate_down_.size() > param_id) ?
        param_propagate_down_[param_id] : false;
  }

It returns true when the gradient for that parameter is needed. Later, when gradients are computed, each parameter is checked individually: if its gradient is needed it is computed, otherwise the work is skipped.
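
Inside a layer's Backward_cpu, a typical pattern is to check this flag before spending time on the weight gradient (a hedged, illustrative snippet; blobs_[0] is assumed here to be the layer's weight blob):

  if (this->param_propagate_down_[0]) {
    Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
    // ... accumulate the gradient with respect to the weights into weight_diff ...
  }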

The set_param_propagate_down function

Sets whether gradients should be computed for a given parameter:

  /**
   * @brief Sets whether the layer should compute gradients w.r.t. a
   *        parameter at a particular index given by param_id.
   */
  inline void set_param_propagate_down(const int param_id, const bool value) {
    if (param_propagate_down_.size() <= param_id) {
      param_propagate_down_.resize(param_id + 1, true);
    }
    param_propagate_down_[param_id] = value;
  }

The previous function queries the flag; this one sets it.

The CheckBlobCounts function

Checks that the numbers of bottom and top blobs match what the layer declares via the {ExactNum,Min,Max}{Bottom,Top}Blobs() functions:


  /**
   * Called by the parent Layer's SetUp to check that the number of bottom
   * and top Blobs provided as input match the expected numbers specified by
   * the {ExactNum,Min,Max}{Bottom,Top}Blobs() functions.
   */
  virtual void CheckBlobCounts(const vector<Blob<Dtype>*>& bottom,
                               const vector<Blob<Dtype>*>& top) {
    if (ExactNumBottomBlobs() >= 0) {
      CHECK_EQ(ExactNumBottomBlobs(), bottom.size())
          << type() << " Layer takes " << ExactNumBottomBlobs()
          << " bottom blob(s) as input.";
    }
    if (MinBottomBlobs() >= 0) {
      CHECK_LE(MinBottomBlobs(), bottom.size())
          << type() << " Layer takes at least " << MinBottomBlobs()
          << " bottom blob(s) as input.";
    }
    if (MaxBottomBlobs() >= 0) {
      CHECK_GE(MaxBottomBlobs(), bottom.size())
          << type() << " Layer takes at most " << MaxBottomBlobs()
          << " bottom blob(s) as input.";
    }
    if (ExactNumTopBlobs() >= 0) {
      CHECK_EQ(ExactNumTopBlobs(), top.size())
          << type() << " Layer produces " << ExactNumTopBlobs()
          << " top blob(s) as output.";
    }
    if (MinTopBlobs() >= 0) {
      CHECK_LE(MinTopBlobs(), top.size())
          << type() << " Layer produces at least " << MinTopBlobs()
          << " top blob(s) as output.";
    }
    if (MaxTopBlobs() >= 0) {
      CHECK_GE(MaxTopBlobs(), top.size())
          << type() << " Layer produces at most " << MaxTopBlobs()
          << " top blob(s) as output.";
    }
    if (EqualNumBottomTopBlobs()) {
      CHECK_EQ(bottom.size(), top.size())
          << type() << " Layer produces one top blob as output for each "
          << "bottom blob input.";
    }
  }

The SetLossWeights function

Sets the loss weights: for every top blob that has a non-zero loss_weight in the layer parameter, it records the weight via set_loss and writes it into that top blob's diff, where the Forward wrapper later uses it as a multiplier when computing the weighted loss.

  /**
   * Called by SetUp to initialize the weights associated with any top blobs in
   * the loss function. Store non-zero loss weights in the diff blob.
   */
  inline void SetLossWeights(const vector<Blob<Dtype>*>& top) {
    const int num_loss_weights = layer_param_.loss_weight_size();
    if (num_loss_weights) {
      CHECK_EQ(top.size(), num_loss_weights) << "loss_weight must be "
          "unspecified or specified once per top blob.";
      for (int top_id = 0; top_id < top.size(); ++top_id) {
        const Dtype loss_weight = layer_param_.loss_weight(top_id);
        if (loss_weight == Dtype(0)) { continue; }
        this->set_loss(top_id, loss_weight);
        const int count = top[top_id]->count();
        Dtype* loss_multiplier = top[top_id]->mutable_cpu_diff();
        caffe_set(count, loss_weight, loss_multiplier);
      }
    }
  }

Notice that it writes into top[top_id]->mutable_cpu_diff(): the diff buffer of a loss-producing top blob is used here to store the loss-weight multiplier. That value stays there and is later dotted with the top data in the Forward wrapper to produce this layer's contribution to the loss.

The Forward function (wrapper)

Below the class body, the header also provides the basic Forward wrapper:

// Forward and backward wrappers. You should implement the cpu and
// gpu specific implementations instead, and should not change these
// functions.

As the comment explains, this is again just the skeleton: you implement the CPU and/or GPU specific versions yourself, but these wrapper functions in the base class must not be changed.
Let's look at the function:

  Dtype loss = 0;
  Reshape(bottom, top);
  switch (Caffe::mode()) {

It first calls Reshape. Why pass in bottom as well? Because the top shapes, and the sizes of any internal buffers, are computed from the bottom shapes together with this layer's parameters, so Reshape has to look at the bottom blobs right before each forward pass in case the input shape has changed.

  case Caffe::CPU:
    Forward_cpu(bottom, top);
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      const int count = top[top_id]->count();
      const Dtype* data = top[top_id]->cpu_data();
      const Dtype* loss_weights = top[top_id]->cpu_diff();
      loss += caffe_cpu_dot(count, data, loss_weights);
    }
    break;

In CPU mode it then calls Forward_cpu, and afterwards accumulates the loss: for each top blob with a non-zero loss weight, the top data is dotted with the loss weights stored in that blob's diff (put there by SetLossWeights) and the result is added to loss.
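
As a hedged illustration (Caffe's actual caffe_cpu_dot goes through BLAS), the quantity being accumulated for each such top blob is simply:

  // Equivalent, simplified form of the loss accumulation above.
  Dtype blob_loss = 0;
  for (int i = 0; i < count; ++i) {
    blob_loss += data[i] * loss_weights[i];  // top data weighted by the stored loss weight
  }
  loss += blob_loss;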

  case Caffe::GPU:
    Forward_gpu(bottom, top);
#ifndef CPU_ONLY
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      const int count = top[top_id]->count();
      const Dtype* data = top[top_id]->gpu_data();
      const Dtype* loss_weights = top[top_id]->gpu_diff();
      Dtype blob_loss = 0;
      caffe_gpu_dot(count, data, loss_weights, &blob_loss);
      loss += blob_loss;
    }
#endif
    break;
  default:
    LOG(FATAL) << "Unknown caffe mode.";
  }
  return loss;
}

The GPU branch is the same, except that it calls Forward_gpu and caffe_gpu_dot; finally the total loss is returned.

The Backward function (wrapper)

After the forward wrapper, the backward wrapper is implemented as well:

template <typename Dtype>
inline void Layer<Dtype>::Backward(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  switch (Caffe::mode()) {
  case Caffe::CPU:
    Backward_cpu(top, propagate_down, bottom);
    break;
  case Caffe::GPU:
    Backward_gpu(top, propagate_down, bottom);
    break;
  default:
    LOG(FATAL) << "Unknown caffe mode.";
  }
}

This backward wrapper is also just an outline: it does nothing but dispatch to Backward_cpu or Backward_gpu. Backpropagation differs too much from layer to layer, so the base class keeps it completely abstract and leaves the whole job to the subclasses.

The ToProto function (implementation)

// Serialize LayerParameter to protocol buffer
template <typename Dtype>
void Layer<Dtype>::ToProto(LayerParameter* param, bool write_diff) {
  param->Clear();
  param->CopyFrom(layer_param_);
  param->clear_blobs();
  for (int i = 0; i < blobs_.size(); ++i) {
    blobs_[i]->ToProto(param->add_blobs(), write_diff);
  }
}

This function copies the layer parameter and then writes each of the layer's blobs into the protobuf message.
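
A hedged usage sketch (here layer is assumed to be a Layer<float>* obtained elsewhere), roughly what happens conceptually when the Net snapshots its weights:

  LayerParameter saved_param;
  layer->ToProto(&saved_param, false);  // false: store the weights only, not the diffs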
