These two classes are the cornerstones of the Caffe framework; as the names suggest, deep learning revolves around them. As before, let's look at the concrete implementation in the code.
1. Layer
The Layer class comes in five broad categories, each further subdivided by function, but all of them inherit from a single base class, Layer. The five categories are:
Data Layers
Common Layers
Activation / Neuron Layers
Loss Layers
Vision Layers
The main member variables and functions of the base Layer class are listed below; read them together with Caffe's English comments.
protected:
/** The protobuf that stores the layer parameters */
LayerParameter layer_param_;
/** The phase: TRAIN or TEST */
Phase phase_;
/** The vector that stores the learnable parameters as a set of blobs. */
vector<shared_ptr<Blob<Dtype> > > blobs_;  // blobs_[0] holds the weights, blobs_[1] the bias
/** Vector indicating whether to compute the diff of each param blob. */
vector<bool> param_propagate_down_;  // whether each param blob should be updated from the backward pass
/** The vector that indicates whether each top blob has a non-zero weight in
* the objective function. */
vector<Dtype> loss_;  // presumably set only for the final layers (softmax loss and the like): the loss weight associated with each top blob
/** Device context */
DeviceContext *device_context_;
/** @brief Using the CPU device, compute the layer output. */
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) = 0;
/**
* @brief Using the GPU device, compute the layer output.
* Fall back to Forward_cpu() if unavailable.
*/
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
// LOG(WARNING) << "Using CPU code as backup.";
Forward_cpu(bottom, top);
}
/**
* @brief Using the CPU device, compute the gradients for any parameters and
* for the bottom blobs if propagate_down is true.
*/
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) = 0;
/**
* @brief Using the GPU device, compute the gradients for any parameters and
* for the bottom blobs if propagate_down is true.
* Fall back to Backward_cpu() if unavailable.
*/
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
// LOG(WARNING) << "Using CPU code as backup.";
Backward_cpu(top, propagate_down, bottom);
}
/**
* Called by the parent Layer's SetUp to check that the number of bottom
* and top Blobs provided as input match the expected numbers specified by
* the {ExactNum,Min,Max}{Bottom,Top}Blobs() functions.
*/
virtual void CheckBlobCounts(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
  // just checks that the sizes of bottom and top are correct; the body is omitted here
}
/**
* Called by SetUp to initialize the weights associated with any top blobs in
* the loss function. Store non-zero loss weights in the diff blob.
*/
inline void SetLossWeights(const vector<Blob<Dtype>*>& top) {
const int num_loss_weights = layer_param_.loss_weight_size();
if (num_loss_weights) {
CHECK_EQ(top.size(), num_loss_weights) << "loss_weight must be "
"unspecified or specified once per top blob.";
for (int top_id = 0; top_id < top.size(); ++top_id) {
const Dtype loss_weight = layer_param_.loss_weight(top_id);
if (loss_weight == Dtype(0)) {continue;}
this->set_loss(top_id, loss_weight);
const int count = top[top_id]->count();
Dtype* loss_multiplier = top[top_id]->mutable_cpu_diff();
caffe_set(count, loss_weight, loss_multiplier);
}
}
}
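For context, here is a condensed sketch of how the public Forward() wrapper ties these pieces together: it dispatches to Forward_cpu()/Forward_gpu() according to Caffe::mode(), and accumulates the loss via the per-element loss weights that SetLossWeights() wrote into the top blobs' diffs. This is a simplified paraphrase of the Caffe source, not a verbatim copy:
template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Dtype loss = 0;
  Reshape(bottom, top);
  switch (Caffe::mode()) {
  case Caffe::CPU:
    Forward_cpu(bottom, top);
    // Each top blob with a non-zero loss weight contributes
    // dot(data, diff): the diff still holds the per-element loss
    // weight written by SetLossWeights().
    for (int top_id = 0; top_id < top.size(); ++top_id) {
      if (!this->loss(top_id)) { continue; }
      const int count = top[top_id]->count();
      loss += caffe_cpu_dot(count, top[top_id]->cpu_data(),
                            top[top_id]->cpu_diff());
    }
    break;
  case Caffe::GPU:
    Forward_gpu(bottom, top);
    // The GPU path accumulates the same dot products with caffe_gpu_dot().
    break;
  default:
    LOG(FATAL) << "Unknown caffe mode.";
  }
  return loss;
}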
The concrete implementation of SetUp:
/**
* @brief Implements common layer setup functionality.
*
* @param bottom the preshaped input blobs
* @param top
* the allocated but unshaped output blobs, to be shaped by Reshape
*
* Checks that the number of bottom and top blobs is correct.
* Calls LayerSetUp to do special layer setup for individual layer types,
* followed by Reshape to set up sizes of top blobs and internal buffers.
* Sets up the loss weight multiplier blobs for any non-zero loss weights.
* This method may not be overridden.
*/
void SetUp(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  CheckBlobCounts(bottom, top);  // check the blob counts
  LayerSetUp(bottom, top);  // call the hook implemented by the subclass
  // A note on LayerSetUp, taking BaseConvolutionLayer as an example: it mainly
  // does two things. (1) It reads pad size, kernel size, etc. from layer_param_.
  // (2) If blobs_ is uninitialized (size() == 0), it fills blobs_ with a Filler
  //     (whose parameters also come from layer_param_).
  Reshape(bottom, top);  // shape the allocated top blobs from the already-shaped bottom blobs
  SetLossWeights(top);  // set loss weights (only for top blobs with a non-zero weight)
}
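To make the division of labor concrete, here is a minimal pass-through layer showing which hooks SetUp() invokes and in what order. The class name and its behavior (copying bottom to top) are hypothetical, made up purely for illustration:
// A hypothetical pass-through layer: SetUp() calls our LayerSetUp(),
// then our Reshape(), in that order.
template <typename Dtype>
class MyPassThroughLayer : public Layer<Dtype> {
 public:
  explicit MyPassThroughLayer(const LayerParameter& param)
      : Layer<Dtype>(param) {}
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    // One-time setup: read settings from layer_param_, initialize blobs_
    // if the layer had learnable parameters (this one has none).
  }
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    top[0]->ReshapeLike(*bottom[0]);  // top mirrors bottom's shape
  }
 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    caffe_copy(bottom[0]->count(), bottom[0]->cpu_data(),
               top[0]->mutable_cpu_data());
  }
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) {
    if (propagate_down[0]) {
      caffe_copy(top[0]->count(), top[0]->cpu_diff(),
                 bottom[0]->mutable_cpu_diff());
    }
  }
};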
2. Net
Basic members; the comments explain them:
/// @brief The network name
string name_;
/// @brief The phase: TRAIN or TEST
Phase phase_;
/// @brief Individual layers in the net
vector<shared_ptr<Layer<Dtype> > > layers_;
vector<string> layer_names_;
map<string, int> layer_names_index_;
vector<bool> layer_need_backward_;
/// @brief the blobs storing intermediate results between the layer.
vector<shared_ptr<Blob<Dtype> > > blobs_;
vector<string> blob_names_;
map<string, int> blob_names_index_;
vector<bool> blob_need_backward_;
/// bottom_vecs stores the vectors containing the input for each layer.
/// They don't actually host the blobs (blobs_ does), so we simply store
/// pointers.
vector<vector<Blob<Dtype>*> > bottom_vecs_;
vector<vector<int> > bottom_id_vecs_;
vector<vector<bool> > bottom_need_backward_;
/// top_vecs stores the vectors containing the output for each layer
vector<vector<Blob<Dtype>*> > top_vecs_;
vector<vector<int> > top_id_vecs_;
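As a quick illustration of how these containers are consumed, here is a condensed sketch (simplified from the Caffe source) of the forward pass: each layer i reads its inputs from bottom_vecs_[i] and writes its outputs to top_vecs_[i]:
template <typename Dtype>
Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
  Dtype loss = 0;
  for (int i = start; i <= end; ++i) {
    // Layer i consumes bottom_vecs_[i] and fills top_vecs_[i];
    // Layer::Forward() returns the layer's weighted loss contribution.
    loss += layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
  }
  return loss;
}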
Beyond these members, the most important functions are the forward and backward passes (the forward side is sketched above), which need no further explanation here; the key is understanding Init().
Here is an excerpt describing that function, with the code itself omitted (taken from http://blog.csdn.net/u014114990/article/details/47415051):
Init(const NetParameter& in_param)
Function: initialize the network
Input: NetParameter& in_param
Output: none
Steps:
<1> Call InsertSplits() to read the network from in_param into a new param
<2> Define name_, blob_name_to_idx, available_blobs, num_layers
<3> param.input_size() returns the number of input blobs;
    param.input(i) is the name of the i-th blob;
    param.layers_size() returns the number of layers in the network.
<4> For each input blob:
    - Allocate a block of memory the same size as the current blob, e.g.
      input_dim=[12 55 66 39 20 24 48 64] means the first blob has the four
      dimensions 12 55 66 39 and the second has 20 24 48 64; blob_pointer then
      points to this block
    - Push blob_pointer into blobs_
      vector<shared_ptr<Blob<Dtype>>> blobs_
    - Push blob_name into blob_names_
      vector<string> blob_names_
    - Push param.force_backward() into blob_need_backward_
      vector<bool> blob_need_backward_
    - Push i into net_input_blob_indices_ (a vector<int>)
    - Push blob_pointer.get() into net_input_blobs_
      Note the difference from blobs_: blobs_ is vector<shared_ptr<Blob<Dtype>>>,
      while net_input_blobs_ is vector<Blob<Dtype>*>; calling .get() on a
      shared_ptr yields a raw Blob*
    - Initialize map<string, int> blob_name_to_idx and set<string>
      available_blobs with the name of each input-layer blob
    - Accumulate the required memory:
      memory_used += blob_pointer->count()
<5> bottom_vecs_ stores the input blob pointers of each layer: vector<vector<Blob<Dtype>*>> bottom_vecs_
    bottom_id_vecs_ stores the input (bottom) ids of each layer: vector<vector<int>> bottom_id_vecs_
    top_vecs_ stores the output (top) blobs of each layer: vector<vector<Blob<Dtype>*>> top_vecs_
    top_id_vecs_ stores the output (top) ids of each layer: vector<vector<int>> top_id_vecs_
    All four are sized with the layer count, param.layers_size()
<6> For the i-th layer (one big for loop):
    - param.layers(i) returns the parameters of the current layer:
      layer_param = param.layers(i)
    - Convert the current layer's parameters to shared_ptr<Layer<Dtype>> and
      push the result into layers_
    - Push the current layer's name into layer_names_:
      vector<string> layer_names_
    - Decide whether the current layer needs backward:
      need_backward = param.force_backward()
    - Now build the current layer, in two steps: handle the bottom blobs,
      then the top blobs.
    For the j-th bottom blob:
    - layer_param.bottom_size() is the number of input blobs of the current layer
    - layer_param.bottom(j) is the name of the j-th input blob
    - Read the current blob's id; blob_name_to_idx was already initialized
      at the input layer: blob_name_to_idx[blob_name] = i
    - Log the current blob's name
    - Store the pointer to the j-th input blob:
      bottom_vecs_[i].push_back(blobs_[blob_id].get())
    - Store the id of the j-th input blob:
      bottom_id_vecs_[i].push_back(blob_id)
    - Update need_backward
    - Remove the j-th blob's name from available_blobs
    For the j-th top blob:
    - layer_param.top_size() is the number of output blobs of the current layer
    - layer_param.top(j) is the name of the j-th output blob
    - Check whether computation is in-place (bottom and top share a blob)
    - Log the current blob's name
    - Allocate a new blob and point blob_pointer at it
    - Store this pointer in blobs_
    - Store blob_name, force_backward, and idx in their respective containers
    - Insert the current blob's name into available_blobs
    - Push the current blob's pointer into top_vecs_[i]
    - Push the current blob's id into top_id_vecs_[i]
    - Log the information of the current layer's top blobs
    - Accumulate the required memory
    - Decide whether the current layer i needs backward
<7> Every blob whose name is still in available_blobs is an output blob of
    the whole network; store them in net_output_blobs_
<8> Build the name-to-index map for each blob: blob_names_index_
<9> Build the name-to-index map for each layer: layer_names_index_
<10> Call GetLearningRateAndWeightDecay()
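The two maps built in steps <8> and <9> back Net's by-name accessors (has_blob()/blob_by_name(), has_layer()/layer_by_name()). A small usage sketch; the model file "deploy.prototxt" and the name "conv1" are hypothetical, and the Net constructor signature varies across Caffe versions:
// Look up a blob and a layer by name; both accessors consult the
// blob_names_index_ / layer_names_index_ maps built during Init().
Net<float> net("deploy.prototxt", caffe::TEST);  // hypothetical model file
if (net.has_blob("conv1")) {
  const shared_ptr<Blob<float> > feat = net.blob_by_name("conv1");
  LOG(INFO) << "conv1 output count: " << feat->count();
}
if (net.has_layer("conv1")) {
  const shared_ptr<Layer<float> > conv = net.layer_by_name("conv1");
  LOG(INFO) << "conv1 weight count: " << conv->blobs()[0]->count();
}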
3. Solver
There is comparatively little to say here, so rather than starting a new post I'll write it in this one.
The solver:
(1) creates the training network and the test networks used for evaluation;
(2) iteratively optimizes by calling forward/backward and updating the parameters;
(3) periodically evaluates the test networks.
In each iteration it:
(1) calls the network's forward to compute the output and the loss;
(2) calls the network's backward to compute the gradients;
(3) incorporates the gradients into the parameter update according to the solver method;
(4) updates the solver state according to the learning rate, history, and method.
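To tie these steps together, here is a condensed sketch of the solver's training loop. It is a paraphrase, not verbatim: names and structure vary across Caffe versions (older ones use ComputeUpdateValue() plus net_->Update(), newer ones a single ApplyUpdate(), and ForwardBackward()'s signature has changed over time as well):
// One training iteration, mirroring steps (1)-(4) above.
while (iter_ < param_.max_iter()) {
  // (1)+(2): forward computes the output and loss,
  //          backward computes the gradients.
  Dtype loss = net_->ForwardBackward();
  // (3): compute the solver-specific update (e.g. SGD with momentum)
  //      into the parameter diffs, then apply it (data -= diff).
  ComputeUpdateValue();
  net_->Update();
  // (4): the learning-rate schedule and the update history are advanced
  //      inside ComputeUpdateValue() as part of the above.
  ++iter_;
}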