Reading the Caffe code: the Layer class and the Net class

These two classes are the cornerstones of the Caffe framework; as the names suggest, deep learning in Caffe is organized around them. As usual, let's look at the concrete implementation in the code.


1. Layer

Layers come in five major categories, each further subdivided by function, but all of them inherit from the single base class Layer. The five categories are:

Data Layers

Common Layers

Activation / Neuron Layers

Loss Layers

Vision Layers
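Whatever the category, the rest of the framework only ever deals with the base type. As a quick illustration (this assumes mainline Caffe's layer factory, LayerRegistry; the particular branch read in this post may instantiate layers through a different helper), every layer described in a prototxt comes back as a shared_ptr to the base class:

  #include "caffe/layer.hpp"
  #include "caffe/layer_factory.hpp"
  #include "caffe/proto/caffe.pb.h"

  using namespace caffe;

  // Whether the layer type is "Convolution", "ReLU", "SoftmaxWithLoss" or "Data",
  // the factory returns the same base type, so Net can treat all five categories
  // uniformly through the Layer<Dtype> interface.
  shared_ptr<Layer<float> > MakeLayer(const LayerParameter& layer_param) {
    return LayerRegistry<float>::CreateLayer(layer_param);
  }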


The main member variables and functions of the base class Layer; read them together with Caffe's own English comments.

 protected:
  /** The protobuf that stores the layer parameters */
  LayerParameter layer_param_;
  /** The phase: TRAIN or TEST */
  Phase phase_;
  /** The vector that stores the learnable parameters as a set of blobs. */
  vector<shared_ptr<Blob<Dtype> > > blobs_;  // blobs_[0] holds the weights, blobs_[1] the bias
  /** Vector indicating whether to compute the diff of each param blob. */
  vector<bool> param_propagate_down_;  // whether to compute the gradient for each parameter blob during backward

  /** The vector that indicates whether each top blob has a non-zero weight in
   *  the objective function. */
  vector<Dtype> loss_;  // typically non-zero only for loss layers such as the final softmax; stores the loss weight assigned to each top blob

  /** Device context */
  DeviceContext *device_context_;

  /** @brief Using the CPU device, compute the layer output. */
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                           const vector<Blob<Dtype>*>& top) = 0;
  /**
   * @brief Using the GPU device, compute the layer output.
   *        Fall back to Forward_cpu() if unavailable.
   */
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
                           const vector<Blob<Dtype>*>& top) {
    // LOG(WARNING) << "Using CPU code as backup.";
    Forward_cpu(bottom, top);
  }

  /**
   * @brief Using the CPU device, compute the gradients for any parameters and
   *        for the bottom blobs if propagate_down is true.
   */
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
                            const vector<bool>& propagate_down,
                            const vector<Blob<Dtype>*>& bottom) = 0;
  /**
   * @brief Using the GPU device, compute the gradients for any parameters and
   *        for the bottom blobs if propagate_down is true.
   *        Fall back to Backward_cpu() if unavailable.
   */
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
                            const vector<bool>& propagate_down,
                            const vector<Blob<Dtype>*>& bottom) {
    // LOG(WARNING) << "Using CPU code as backup.";
    Backward_cpu(top, propagate_down, bottom);
  }

  /**
   * Called by the parent Layer's SetUp to check that the number of bottom
   * and top Blobs provided as input match the expected numbers specified by
   * the {ExactNum,Min,Max}{Bottom,Top}Blobs() functions.
   */
  virtual void CheckBlobCounts(const vector<Blob<Dtype>*>& bottom,
                               const vector<Blob<Dtype>*>& top) {
    // just checks that the numbers of bottom and top blobs are correct; the actual code is omitted here
  }

  /**
   * Called by SetUp to initialize the weights associated with any top blobs in
   * the loss function. Store non-zero loss weights in the diff blob.
   */
  inline void SetLossWeights(const vector<Blob<Dtype>*>& top) {
    const int num_loss_weights = layer_param_.loss_weight_size();
    if (num_loss_weights) {
      CHECK_EQ(top.size(), num_loss_weights) << "loss_weight must be "
      "unspecified or specified once per top blob.";
      for (int top_id = 0; top_id < top.size(); ++top_id) {
        const Dtype loss_weight = layer_param_.loss_weight(top_id);
        if (loss_weight == Dtype(0)) {continue;}
        this->set_loss(top_id, loss_weight);
        const int count = top[top_id]->count();
        Dtype* loss_multiplier = top[top_id]->mutable_cpu_diff();
        caffe_set(count, loss_weight, loss_multiplier);
      }
    }
  }
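
The listing above shows the per-device hooks (Forward_cpu / Forward_gpu) and SetLossWeights, but not where they are called from. The following is a simplified paraphrase of the non-virtual Forward() wrapper in mainline Caffe (the DeviceContext-based branch quoted here may differ in detail): it dispatches on Caffe::mode() and then turns the loss-weight multipliers that SetLossWeights wrote into the top diffs into a scalar loss.

  template <typename Dtype>
  Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
                              const vector<Blob<Dtype>*>& top) {
    Dtype loss = 0;
    Reshape(bottom, top);
    if (Caffe::mode() == Caffe::CPU) {
      Forward_cpu(bottom, top);
    } else {
      Forward_gpu(bottom, top);  // falls back to Forward_cpu() if there is no GPU code
    }
    // For each top blob with a non-zero loss weight, add dot(data, loss multiplier);
    // the multiplier was written into the diff by SetLossWeights() above.
    // (The real code uses the GPU buffers in GPU mode; this is the CPU-only sketch.)
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      const int count = top[top_id]->count();
      loss += caffe_cpu_dot(count, top[top_id]->cpu_data(), top[top_id]->cpu_diff());
    }
    return loss;
  }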

The concrete implementation of SetUp:

  /**
   * @brief Implements common layer setup functionality.
   *
   * @param bottom the preshaped input blobs
   * @param top
   *     the allocated but unshaped output blobs, to be shaped by Reshape
   *
   * Checks that the number of bottom and top blobs is correct.
   * Calls LayerSetUp to do special layer setup for individual layer types,
   * followed by Reshape to set up sizes of top blobs and internal buffers.
   * Sets up the loss weight multiplier blobs for any non-zero loss weights.
   * This method may not be overridden.
   */  
  void SetUp(const vector<Blob<Dtype>*>& bottom,
             const vector<Blob<Dtype>*>& top) {
    CheckBlobCounts(bottom, top);  // check that the bottom/top blob counts are correct
    LayerSetUp(bottom, top);  // layer-type-specific setup implemented by the subclass
    // A note on LayerSetUp: taking BaseConvolutionLayer::LayerSetUp as an example,
    // it mainly does two things: (1) reads pad size, kernel size, etc. from
    // layer_param_; (2) if blobs_ is not yet initialized (size == 0), fills blobs_
    // with a filler (whose parameters also come from layer_param_).
    Reshape(bottom, top);  // shape the already-allocated top blobs according to the
                           // already-allocated and shaped bottom blobs
    SetLossWeights(top);  // set up the loss weights (only for top blobs whose loss weight is non-zero)
  }
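
To make the division of labour concrete, here is a minimal sketch of a custom layer built on this interface: an element-wise "multiply by two" layer. The class name is made up for illustration, the constructor follows mainline Caffe (the DeviceContext-based branch above may need an extra argument), and only the virtual functions discussed so far are implemented:

  template <typename Dtype>
  class ScaleByTwoLayer : public Layer<Dtype> {
   public:
    explicit ScaleByTwoLayer(const LayerParameter& param) : Layer<Dtype>(param) {}

    virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
                            const vector<Blob<Dtype>*>& top) {}  // nothing to configure

    virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
                         const vector<Blob<Dtype>*>& top) {
      top[0]->ReshapeLike(*bottom[0]);  // output has the same shape as the input
    }

   protected:
    virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                             const vector<Blob<Dtype>*>& top) {
      // top = 2 * bottom, element-wise
      caffe_cpu_scale(bottom[0]->count(), Dtype(2),
                      bottom[0]->cpu_data(), top[0]->mutable_cpu_data());
    }

    virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
                              const vector<bool>& propagate_down,
                              const vector<Blob<Dtype>*>& bottom) {
      if (propagate_down[0]) {
        // d(loss)/d(bottom) = 2 * d(loss)/d(top)
        caffe_cpu_scale(top[0]->count(), Dtype(2),
                        top[0]->cpu_diff(), bottom[0]->mutable_cpu_diff());
      }
    }
  };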




2. Net

The basic member variables; the comments speak for themselves.

  /// @brief The network name
  string name_;
  /// @brief The phase: TRAIN or TEST
  Phase phase_;
  /// @brief Individual layers in the net
  vector<shared_ptr<Layer<Dtype> > > layers_;
  vector<string> layer_names_;
  map<string, int> layer_names_index_;
  vector<bool> layer_need_backward_;
  /// @brief the blobs storing intermediate results between the layers.
  vector<shared_ptr<Blob<Dtype> > > blobs_;
  vector<string> blob_names_;
  map<string, int> blob_names_index_;
  vector<bool> blob_need_backward_;
  /// bottom_vecs stores the vectors containing the input for each layer.
  /// They don't actually host the blobs (blobs_ does), so we simply store
  /// pointers.
  vector<vector<Blob<Dtype>*> > bottom_vecs_;
  vector<vector<int> > bottom_id_vecs_;
  vector<vector<bool> > bottom_need_backward_;
  /// top_vecs stores the vectors containing the output for each layer
  vector<vector<Blob<Dtype>*> > top_vecs_;
  vector<vector<int> > top_id_vecs_;
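
To see how these containers fit together, the following is a simplified paraphrase of the forward pass (modeled on mainline Caffe's Net<Dtype>::ForwardFromTo(); the exact signature may differ in the version this post is based on):

  template <typename Dtype>
  Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
    Dtype loss = 0;
    for (int i = start; i <= end; ++i) {
      // layers_[i] reads bottom_vecs_[i] and writes top_vecs_[i]; both vectors hold
      // only raw pointers into the blobs owned by blobs_, so no data is copied here.
      loss += layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
    }
    return loss;
  }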

Among the remaining member functions the most important are the forward and backward passes, which need no separate write-up; the key is understanding Init().

Below is an excerpt describing this function; the code itself is not pasted here (taken from http://blog.csdn.net/u014114990/article/details/47415051).


Init(const NetParameter& in_param)
Purpose: initialize the network
Input: NetParameter& in_param
Output: none
Steps:
<1> Call InsertSplits() to read the new network from in_param into param
<2> Define name_, blob_name_to_idx, available_blobs, num_layers
<3> param.input_size() returns the number of input-layer blobs;
param.input(i) is the name of the i-th blob;
param.layers_size() returns the number of layers in the network.
<4> For each input-layer blob:

  1. Allocate a blob of the same size as the current one, e.g. input_dim = [12 55 66 39 20 24 48 64] means the first blob has the four dimensions 12 55 66 39 and the second 20 24 48 64; blob_pointer then points at this blob
  2. Push blob_pointer into blobs_: vector<shared_ptr<Blob<Dtype>>> blobs_
  3. Push blob_name into blob_names_: vector<string> blob_names_
  4. Push param.force_backward() into blob_need_backward_:
    vector<bool> blob_need_backward_
  5. Push i into net_input_blob_indices_: net_input_blob_indices_ -> vector
  6. Push blob_pointer.get() into net_input_blobs_
    Note the difference from blobs_:
    vector<shared_ptr<Blob<Dtype>>> blobs_
    vector<Blob<Dtype>*> net_input_blobs_
    calling .get() on a shared_ptr yields a raw Blob* pointer
  7. Record the blob's name and index in map<string, int> blob_name_to_idx
  8. Initialize set<string> available_blobs with the name of each input-layer blob
  9. Accumulate the required memory: memory_used += blob_pointer->count()

<5> vector<vector<Blob<Dtype>*> > bottom_vecs_ stores, for each layer, pointers to its input blobs;
vector<vector<int> > bottom_id_vecs_ stores, for each layer, the ids of its inputs (bottom);
vector<vector<Blob<Dtype>*> > top_vecs_ stores, for each layer, its output (top) blobs;
vector<vector<int> > top_id_vecs_ stores the corresponding top ids.
All four variables are sized with the number of layers, param.layers_size().
<6> For the i-th layer (one big for loop):

  1. param.layers(i) returns the parameters of the current layer:
    layer_param = param.layers(i)
  2. Convert the current layer's parameters into a shared_ptr<Layer<Dtype>> and push it into layers_
  3. Push the current layer's name into layer_names_: vector<string> layer_names_
  4. Decide whether the current layer needs backward: need_backward = param.force_backward()
  5. Now set up the current layer, in two parts: handle the bottom blobs, then the top blobs
    For the j-th bottom blob:

    • layer_param.bottom_size() is the number of input blobs of the current layer
    • layer_param.bottom(j) is the name of the j-th input blob
    • Look up the current blob's id; blob_name_to_idx was already filled for the input layer:
      blob_name_to_idx[blob_name] = i
    • Print the current blob's name
    • Store the pointer to the j-th input blob: bottom_vecs_[i].push_back(blobs_[blob_id].get())
    • Store the id of the j-th input blob: bottom_id_vecs_[i].push_back(blob_id)
    • Update need_backward
    • Remove the j-th blob's name from available_blobs

    For the j-th top blob:

    • layer_param.top_size() is the number of output blobs of the current layer
    • layer_param.top(j) is the name of the j-th output blob
    • Check whether in-place computation is being used
    • Print the current blob's name
    • Allocate a new blob and point blob_pointer at it
    • Push this pointer into blobs_
    • Push blob_name, force_backward, and idx into their corresponding containers
    • Insert the current blob's name into available_blobs
    • top_vecs_[i]: for the i-th layer, push the pointer to the current blob
    • top_id_vecs_[i]: for the i-th layer, push the id of the current blob
  6. Print information about the current layer's top blobs
  7. Accumulate the required memory
  8. Decide whether layer i needs backward

<7> Every blob whose name is still in available_blobs is an output blob of the network;
store these in net_output_blobs_
<8> Build the name-to-index map for every blob: blob_names_index_
<9> Build the name-to-index map for every layer: layer_names_index_
<10> Call GetLearningRateAndWeightDecay()
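
Putting steps <4>–<7> together, the wiring logic amounts to something like the condensed sketch below (my own paraphrase, not the real Init() code: the layer factory call follows recent mainline Caffe, and the handling of in-place blobs, splits, and loss weights is omitted):

  for (int i = 0; i < param.layers_size(); ++i) {
    const LayerParameter& layer_param = param.layers(i);
    layers_.push_back(LayerRegistry<Dtype>::CreateLayer(layer_param));
    layer_names_.push_back(layer_param.name());

    // bottom blobs: look up, by name, blobs that earlier layers (or the input) produced
    for (int j = 0; j < layer_param.bottom_size(); ++j) {
      const int blob_id = blob_name_to_idx[layer_param.bottom(j)];
      bottom_vecs_[i].push_back(blobs_[blob_id].get());  // raw pointer, owned by blobs_
      bottom_id_vecs_[i].push_back(blob_id);
      available_blobs.erase(layer_param.bottom(j));       // consumed, no longer an output
    }

    // top blobs: allocate new blobs and register them by name
    for (int j = 0; j < layer_param.top_size(); ++j) {
      shared_ptr<Blob<Dtype> > blob_pointer(new Blob<Dtype>());
      blobs_.push_back(blob_pointer);
      blob_names_.push_back(layer_param.top(j));
      blob_name_to_idx[layer_param.top(j)] = blobs_.size() - 1;
      available_blobs.insert(layer_param.top(j));
      top_vecs_[i].push_back(blob_pointer.get());
      top_id_vecs_[i].push_back(blobs_.size() - 1);
    }

    // shape the new top blobs and initialize the layer's own blobs_ (weights/bias)
    layers_[i]->SetUp(bottom_vecs_[i], top_vecs_[i]);
  }
  // <7>: anything still in available_blobs was never consumed by a later layer,
  // so it is a network output and goes into net_output_blobs_.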

3. Solver

There is not much to it, so rather than starting a separate post I will write it here.


The Solver drives the whole training process. Its responsibilities include:

     (1) creating the training network and evaluating it;

     (2) calling forward/backward to iteratively optimize and update the parameters;

     (3) periodically evaluating the test network.

Each Solver iteration:

      (1) calls the network's forward pass to compute the output and the loss;

      (2) calls the network's backward pass to compute the gradients;

      (3) incorporates the gradients into parameter updates according to the solver method;

      (4) updates the solver state according to the learning rate, the history, and the method.
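
As a rough sketch of what one iteration looks like in code (the function names Net::ForwardBackward() and Solver::ApplyUpdate() follow recent mainline Caffe; the version this post reads organizes the same steps slightly differently):

  template <typename Dtype>
  void Solver<Dtype>::Step(int iters) {
    for (int i = 0; i < iters; ++i) {
      // periodically evaluate the test network(s), cf. responsibility (3) above
      if (param_.test_interval() && iter_ % param_.test_interval() == 0) {
        TestAll();
      }
      // (1) forward to compute outputs and the loss, (2) backward to compute gradients
      Dtype loss = net_->ForwardBackward();
      LOG(INFO) << "Iteration " << iter_ << ", loss = " << loss;
      // (3) turn the gradients into parameter updates according to the solver method;
      // (4) the learning rate and update history are maintained inside this call
      ApplyUpdate();
      ++iter_;
    }
  }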

