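It has been almost half a year since I first encountered deep learning. I started with the LeNet network, then studied Fast R-CNN and Faster R-CNN, which were extremely popular in 2015-2016, and have recently been building things of my own with deep learning. My main takeaway is that working through examples has given me a rough understanding of deep learning, but I am still a long way from real fluency. That reminded me of an old saying: read the fxxx source code. So I have started studying the Caffe source code, and I plan to share what I learn in a series of blog posts.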
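As a beginner, I will inevitably make mistakes and omissions in these posts. Corrections in the comments are welcome; I will reply and discuss them gladly. Your support and enthusiasm are what keep me going. Now, on to the main content!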
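When beginners train a neural network with Caffe, whether by following an existing example or by configuring things on their own, they usually go through the same steps: prepare the dataset -> configure the network structure -> configure the training parameters -> train the network -> use the trained network through a model interface file. Because Caffe is so cleanly encapsulated and layered, we can get through all of these steps without really understanding its source code. For more advanced tasks, however, such as implementing a custom layer or a custom loss function, knowing how to configure Caffe is no longer enough; we need to understand Caffe's structure and source code before we can use it in depth.
So where should we start? I suggest starting from caffe.cpp, located at ./tools/caffe.cpp under the Caffe directory. Whenever we launch a script or type a command to start training a deep neural network, the command is parsed and executed starting from this file. caffe.cpp is the entrance to the maze, so that is where I begin. The code is pasted below together with my comments.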
#ifdef WITH_PYTHON_LAYER
#include "boost/python.hpp"
namespace bp = boost::python;
#endif
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <cstring>
#include <map>
#include <string>
#include <vector>
#include "boost/algorithm/string.hpp"
#include "caffe/caffe.hpp"
#include "caffe/util/signal_handler.h"
using caffe::Blob;
using caffe::Caffe;
using caffe::Net;
using caffe::Layer;
using caffe::Solver;
using caffe::shared_ptr;
using caffe::string;
using caffe::Timer;
using caffe::vector;
using std::ostringstream;
DEFINE_string(gpu, "",
"Optional; run in GPU mode on given device IDs separated by ','."
"Use '-gpu all' to run on all available GPUs. The effective training "
"batch size is multiplied by the number of devices.");
DEFINE_string(solver, "",
"The solver definition protocol buffer text file.");
DEFINE_string(model, "",
"The model definition protocol buffer text file.");
DEFINE_string(phase, "",
"Optional; network phase (TRAIN or TEST). Only used for 'time'.");
DEFINE_int32(level, 0,
"Optional; network level.");
DEFINE_string(stage, "",
"Optional; network stages (not to be confused with phase), "
"separated by ','.");
DEFINE_string(snapshot, "",
"Optional; the snapshot solver state to resume training.");
DEFINE_string(weights, "",
"Optional; the pretrained weights to initialize finetuning, "
"separated by ','. Cannot be set simultaneously with snapshot.");
DEFINE_int32(iterations, 50,
"The number of iterations to run.");
DEFINE_string(sigint_effect, "stop",
"Optional; action to take when a SIGINT signal is received: "
"snapshot, stop or none.");
DEFINE_string(sighup_effect, "snapshot",
"Optional; action to take when a SIGHUP signal is received: "
"snapshot, stop or none.");
// A simple registry for caffe commands.
typedef int (*BrewFunction)();
typedef std::map<caffe::string, BrewFunction> BrewMap;
BrewMap g_brew_map;
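/* The RegisterBrewFunction(func) macro below turns the argument func into a string (#func) and uses it as the key
under which the function's address is stored in the g_brew_map container. func takes one of four values:
train / test / time / device_query. These four functions are the entry points of the four commands; note the
RegisterBrewFunction call that follows the closing brace of each of them. */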
#define RegisterBrewFunction(func) \
namespace { \
class __Registerer_##func { \
public: /* NOLINT */ \
__Registerer_##func() { \
g_brew_map[#func] = &func; \
} \
}; \
__Registerer_##func g_registerer_##func; \
}
static BrewFunction GetBrewFunction(const caffe::string& name) {
if (g_brew_map.count(name)) {
return g_brew_map[name];  // Return the function pointer registered under this name.
} else {
LOG(ERROR) << "Available caffe actions:";
for (BrewMap::iterator it = g_brew_map.begin();
it != g_brew_map.end(); ++it) {
LOG(ERROR) << "\t" << it->first;
}
LOG(FATAL) << "Unknown action: " << name;
return NULL; // not reachable, just to suppress old compiler warnings.
}
}
// Parse GPU ids or use all available devices
static void get_gpus(vector<int>* gpus) {  // Resolve the -gpu flag into a list of device IDs.
if (FLAGS_gpu == "all") {
int count = 0;
#ifndef CPU_ONLY
CUDA_CHECK(cudaGetDeviceCount(&count));
#else
NO_GPU;
#endif
for (int i = 0; i < count; ++i) {
gpus->push_back(i);
}
} else if (FLAGS_gpu.size()) {
vector<string> strings;
boost::split(strings, FLAGS_gpu, boost::is_any_of(","));
for (int i = 0; i < strings.size(); ++i) {
gpus->push_back(boost::lexical_cast<int>(strings[i]));
}
} else {
CHECK_EQ(gpus->size(), 0);
}
}
// Parse phase from flags
caffe::Phase get_phase_from_flags(caffe::Phase default_value) {
if (FLAGS_phase == "")
return default_value;
if (FLAGS_phase == "TRAIN")
return caffe::TRAIN;
if (FLAGS_phase == "TEST")
return caffe::TEST;
LOG(FATAL) << "phase must be \"TRAIN\" or \"TEST\"";
return caffe::TRAIN; // Avoid warning
}
// Parse stages from flags
vector<string> get_stages_from_flags() {
vector<string> stages;
boost::split(stages, FLAGS_stage, boost::is_any_of(","));
return stages;
}
// caffe commands to call by
// caffe <command> <args>
//
// To add a command, define a function "int command()" and register it with
// RegisterBrewFunction(action);
// Device Query: show diagnostic information for a GPU device.
int device_query() {
LOG(INFO) << "Querying GPUs " << FLAGS_gpu;
vector<int> gpus;
get_gpus(&gpus);
for (int i = 0; i < gpus.size(); ++i) {
caffe::Caffe::SetDevice(gpus[i]);
caffe::Caffe::DeviceQuery();
}
return 0;
}
RegisterBrewFunction(device_query);  // As described above, this adds the function's address to g_brew_map.
// Load the weights from the specified caffemodel(s) into the train and
// test nets.
void CopyLayers(caffe::Solver<float>* solver, const std::string& model_list) {
std::vector<std::string> model_names;
boost::split(model_names, model_list, boost::is_any_of(",") );
for (int i = 0; i < model_names.size(); ++i) {
LOG(INFO) << "Finetuning from " << model_names[i];
solver->net()->CopyTrainedLayersFrom(model_names[i]);
for (int j = 0; j < solver->test_nets().size(); ++j) {
solver->test_nets()[j]->CopyTrainedLayersFrom(model_names[i]);
}
}
}
// Translate the signal effect the user specified on the command-line to the
// corresponding enumeration.
caffe::SolverAction::Enum GetRequestedAction(
const std::string& flag_value) {
if (flag_value == "stop") {
return caffe::SolverAction::STOP;
}
if (flag_value == "snapshot") {
return caffe::SolverAction::SNAPSHOT;
}
if (flag_value == "none") {
return caffe::SolverAction::NONE;
}
LOG(FATAL) << "Invalid signal effect \""<< flag_value << "\" was specified";
}
// Train / Finetune a model.
// Walkthrough of train()
int train() {
// train() first checks that FLAGS_solver is non-empty; an empty value means the user did not pass a solver file.
CHECK_GT(FLAGS_solver.size(), 0) << "Need a solver definition to train.";
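/* Next it checks that --weights and --snapshot are not both given. --weights initializes the network from
pretrained weights and starts a new training run (finetuning), whereas --snapshot restores a saved solver state
and resumes a run that was interrupted earlier, so it has no use for --weights. */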
CHECK(!FLAGS_snapshot.size() || !FLAGS_weights.size())
<< "Give a snapshot to resume training or weights to finetune "
"but not both.";
vector<string> stages = get_stages_from_flags();
// The next two lines read and parse the user's solver.prototxt.
caffe::SolverParameter solver_param;
caffe::ReadSolverParamsFromTextFileOrDie(FLAGS_solver, &solver_param);
solver_param.mutable_train_state()->set_level(FLAGS_level);
for (int i = 0; i < stages.size(); i++) {
solver_param.mutable_train_state()->add_stage(stages[i]);
}
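/* The following determines which GPU to use. The user can select GPUs on the command line (-gpu) or in
solver.prototxt. If the -gpu flag is empty and solver.prototxt selects GPU mode, then: if solver.prototxt specifies
a device_id, that id is written into FLAGS_gpu; if it only selects GPU mode without a device id, the id defaults to 0. */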
// If the gpus flag is not provided, allow the mode and device to be set
// in the solver prototxt.
if (FLAGS_gpu.size() == 0
&& solver_param.solver_mode() == caffe::SolverParameter_SolverMode_GPU) {
if (solver_param.has_device_id()) {
FLAGS_gpu = "" +
boost::lexical_cast<string>(solver_param.device_id());
} else { // Set default GPU if unspecified
FLAGS_gpu = "" + boost::lexical_cast<string>(0);
}
}
/* Check the result: if no GPU was selected, train on the CPU; otherwise do the initialization needed for GPU training. */
vector<int> gpus;
get_gpus(&gpus);
if (gpus.size() == 0) {
LOG(INFO) << "Use CPU.";
Caffe::set_mode(Caffe::CPU);
} else {
ostringstream s;
for (int i = 0; i < gpus.size(); ++i) {
s << (i ? ", " : "") << gpus[i];
}
LOG(INFO) << "Using GPUs " << s.str();
#ifndef CPU_ONLY
cudaDeviceProp device_prop;
for (int i = 0; i < gpus.size(); ++i) {
cudaGetDeviceProperties(&device_prop, gpus[i]);
LOG(INFO) << "GPU " << gpus[i] << ": " << device_prop.name;
}
#endif
solver_param.set_device_id(gpus[0]);
Caffe::SetDevice(gpus[0]);
Caffe::set_mode(Caffe::GPU);
Caffe::set_solver_count(gpus.size());
}
caffe::SignalHandler signal_handler(
GetRequestedAction(FLAGS_sigint_effect),
GetRequestedAction(FLAGS_sighup_effect));
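/* Now construct the solver by calling SolverRegistry's CreateSolver function. The solver is initialized from the
solver.prototxt parsed above and is responsible for training the whole network; its structure will be analyzed in a later post. */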
shared_ptr<caffe::Solver<float> >
solver(caffe::SolverRegistry<float>::CreateSolver(solver_param));
solver->SetActionFunction(signal_handler.GetActionFunction());
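/* Check whether the user supplied --snapshot or --weights. The former means resuming from where an earlier run
stopped; the latter means initializing the network from another model. Both cases are handled through the solver pointer. */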
if (FLAGS_snapshot.size()) {
LOG(INFO) << "Resuming from " << FLAGS_snapshot;
solver->Restore(FLAGS_snapshot.c_str());
} else if (FLAGS_weights.size()) {
CopyLayers(solver.get(), FLAGS_weights);
}
/* If more than one GPU takes part in training, switch to multi-GPU training. */
if (gpus.size() > 1) {
caffe::P2PSync<float> sync(solver, NULL, solver->param());
sync.Run(gpus);
} else {
LOG(INFO) << "Starting Optimization";
/* Start optimizing the network through the Solve() interface. */
solver->Solve();
}
LOG(INFO) << "Optimization Done.";
return 0;
}
RegisterBrewFunction(train);
// Test: score a model.
int test() {
CHECK_GT(FLAGS_model.size(), 0) << "Need a model definition to score.";
CHECK_GT(FLAGS_weights.size(), 0) << "Need model weights to score.";
vector<string> stages = get_stages_from_flags();
// Set device id and mode
vector<int> gpus;
get_gpus(&gpus);
if (gpus.size() != 0) {
LOG(INFO) << "Use GPU with device ID " << gpus[0];
#ifndef CPU_ONLY
cudaDeviceProp device_prop;
cudaGetDeviceProperties(&device_prop, gpus[0]);
LOG(INFO) << "GPU device name: " << device_prop.name;
#endif
Caffe::SetDevice(gpus[0]);
Caffe::set_mode(Caffe::GPU);
} else {
LOG(INFO) << "Use CPU.";
Caffe::set_mode(Caffe::CPU);
}
// Instantiate the caffe net.
Net<float> caffe_net(FLAGS_model, caffe::TEST, FLAGS_level, &stages);
caffe_net.CopyTrainedLayersFrom(FLAGS_weights);
LOG(INFO) << "Running for " << FLAGS_iterations << " iterations.";
vector<int> test_score_output_id;
vector<float> test_score;
float loss = 0;
for (int i = 0; i < FLAGS_iterations; ++i) {
float iter_loss;
const vector<Blob<float>*>& result =
caffe_net.Forward(&iter_loss);
loss += iter_loss;
int idx = 0;
for (int j = 0; j < result.size(); ++j) {
const float* result_vec = result[j]->cpu_data();
for (int k = 0; k < result[j]->count(); ++k, ++idx) {
const float score = result_vec[k];
if (i == 0) {
test_score.push_back(score);
test_score_output_id.push_back(j);
} else {
test_score[idx] += score;
}
const std::string& output_name = caffe_net.blob_names()[
caffe_net.output_blob_indices()[j]];
LOG(INFO) << "Batch " << i << ", " << output_name << " = " << score;
}
}
}
loss /= FLAGS_iterations;
LOG(INFO) << "Loss: " << loss;
for (int i = 0; i < test_score.size(); ++i) {
const std::string& output_name = caffe_net.blob_names()[
caffe_net.output_blob_indices()[test_score_output_id[i]]];
const float loss_weight = caffe_net.blob_loss_weights()[
caffe_net.output_blob_indices()[test_score_output_id[i]]];
std::ostringstream loss_msg_stream;
const float mean_score = test_score[i] / FLAGS_iterations;
if (loss_weight) {
loss_msg_stream << " (* " << loss_weight
<< " = " << loss_weight * mean_score << " loss)";
}
LOG(INFO) << output_name << " = " << mean_score << loss_msg_stream.str();
}
return 0;
}
RegisterBrewFunction(test);
// Time: benchmark the execution time of a model.
int time() {
CHECK_GT(FLAGS_model.size(), 0) << "Need a model definition to time.";
caffe::Phase phase = get_phase_from_flags(caffe::TRAIN);
vector<string> stages = get_stages_from_flags();
// Set device id and mode
vector<int> gpus;
get_gpus(&gpus);
if (gpus.size() != 0) {
LOG(INFO) << "Use GPU with device ID " << gpus[0];
Caffe::SetDevice(gpus[0]);
Caffe::set_mode(Caffe::GPU);
} else {
LOG(INFO) << "Use CPU.";
Caffe::set_mode(Caffe::CPU);
}
// Instantiate the caffe net.
Net<float> caffe_net(FLAGS_model, phase, FLAGS_level, &stages);
// Do a clean forward and backward pass, so that memory allocation are done
// and future iterations will be more stable.
LOG(INFO) << "Performing Forward";
// Note that for the speed benchmark, we will assume that the network does
// not take any input blobs.
float initial_loss;
caffe_net.Forward(&initial_loss);
LOG(INFO) << "Initial loss: " << initial_loss;
LOG(INFO) << "Performing Backward";
caffe_net.Backward();
const vector<shared_ptr<Layer<float> > >& layers = caffe_net.layers();
const vector<vector<Blob<float>*> >& bottom_vecs = caffe_net.bottom_vecs();
const vector<vector<Blob<float>*> >& top_vecs = caffe_net.top_vecs();
const vector<vector<bool> >& bottom_need_backward =
caffe_net.bottom_need_backward();
LOG(INFO) << "*** Benchmark begins ***";
LOG(INFO) << "Testing for " << FLAGS_iterations << " iterations.";
Timer total_timer;
total_timer.Start();
Timer forward_timer;
Timer backward_timer;
Timer timer;
std::vector<double> forward_time_per_layer(layers.size(), 0.0);
std::vector<double> backward_time_per_layer(layers.size(), 0.0);
double forward_time = 0.0;
double backward_time = 0.0;
for (int j = 0; j < FLAGS_iterations; ++j) {
Timer iter_timer;
iter_timer.Start();
forward_timer.Start();
for (int i = 0; i < layers.size(); ++i) {
timer.Start();
layers[i]->Forward(bottom_vecs[i], top_vecs[i]);
forward_time_per_layer[i] += timer.MicroSeconds();
}
forward_time += forward_timer.MicroSeconds();
backward_timer.Start();
for (int i = layers.size() - 1; i >= 0; --i) {
timer.Start();
layers[i]->Backward(top_vecs[i], bottom_need_backward[i],
bottom_vecs[i]);
backward_time_per_layer[i] += timer.MicroSeconds();
}
backward_time += backward_timer.MicroSeconds();
LOG(INFO) << "Iteration: " << j + 1 << " forward-backward time: "
<< iter_timer.MilliSeconds() << " ms.";
}
LOG(INFO) << "Average time per layer: ";
for (int i = 0; i < layers.size(); ++i) {
const caffe::string& layername = layers[i]->layer_param().name();
LOG(INFO) << std::setfill(' ') << std::setw(10) << layername <<
"\tforward: " << forward_time_per_layer[i] / 1000 /
FLAGS_iterations << " ms.";
LOG(INFO) << std::setfill(' ') << std::setw(10) << layername <<
"\tbackward: " << backward_time_per_layer[i] / 1000 /
FLAGS_iterations << " ms.";
}
total_timer.Stop();
LOG(INFO) << "Average Forward pass: " << forward_time / 1000 /
FLAGS_iterations << " ms.";
LOG(INFO) << "Average Backward pass: " << backward_time / 1000 /
FLAGS_iterations << " ms.";
LOG(INFO) << "Average Forward-Backward: " << total_timer.MilliSeconds() /
FLAGS_iterations << " ms.";
LOG(INFO) << "Total Time: " << total_timer.MilliSeconds() << " ms.";
LOG(INFO) << "*** Benchmark ends ***";
return 0;
}
RegisterBrewFunction(time);
int main(int argc, char** argv) {
// Entry point. First set up gflags: the version string and the usage message.
// Print output to stderr (while still logging).
FLAGS_alsologtostderr = 1;
// Set version
gflags::SetVersionString(AS_STRING(CAFFE_VERSION));
// Usage message.
gflags::SetUsageMessage("command line brew\n"
"usage: caffe <command> <args>\n\n"
"commands:\n"
" train train or finetune a model\n"
" test score a model\n"
" device_query show GPU diagnostic information\n"
" time benchmark model execution time");
// Run tool or show usage.
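/* Next, gflags and glog are initialized. GlobalInit is defined in ./src/caffe/common.cpp under the Caffe directory;
its code is reproduced here:
void GlobalInit(int* pargc, char*** pargv) {
// Google flags.
::gflags::ParseCommandLineFlags(pargc, pargv, true);
// Google logging.
::google::InitGoogleLogging(*(pargv)[0]);
// Provide a backtrace on segfault.
::google::InstallFailureSignalHandler();
}
In this function, ParseCommandLineFlags parses the command line and fills in the gflags FLAGS_* variables. Because its
third argument (remove_flags) is true, the parsed flags are removed from argv, leaving only the program name and the
command, which is why the check below is argc == 2. InitGoogleLogging initializes the Google logging system, and
InstallFailureSignalHandler installs a signal handler that prints a backtrace when the program crashes. */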
caffe::GlobalInit(&argc, &argv);
if (argc == 2) {
#ifdef WITH_PYTHON_LAYER
try {
#endif
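/* The initialization is now done. The real dispatch happens in the GetBrewFunction call below: it looks in g_brew_map
for the function registered under caffe::string(argv[1]) and returns its address. What does g_brew_map contain? For that,
see the #define RegisterBrewFunction(func) above. */
/* After reading #define RegisterBrewFunction(func), scroll up and read the definition of GetBrewFunction. */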
return GetBrewFunction(caffe::string(argv[1]))();
#ifdef WITH_PYTHON_LAYER
} catch (bp::error_already_set) {
PyErr_Print();
return 1;
}
#endif
} else {
gflags::ShowUsageWithFlagsRestrict(argv[0], "tools/caffe");
}
}
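The listing above shows the overall structure of caffe.cpp. The file reads the parameter files we define, performs the necessary initialization, and provides the most important entry point, the one that starts training. Stripped to its essentials, the code has the following structure: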
main function -> GetBrewFunction function -> train function
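Execution starts in main. After briefly initializing gflags and glog, main calls GetBrewFunction, where Caffe works out what the user wants to do: train a network, test a network, benchmark its execution time, or query the devices. Once that is known, the corresponding function is returned and called. In the code above we analyzed the most important of them, train, which performs a series of initialization steps for training and then optimizes the network through the Solve() interface, according to the settings the user configured in solver.prototxt.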
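To make the registration and lookup steps concrete, here is a minimal sketch of the same pattern using only the standard library. It is not Caffe's code: the macro name REGISTER_BREW_FUNCTION, the helper RunCommand, and the two stub commands with their return values are hypothetical, and an unknown command returns -1 here instead of calling LOG(FATAL).

```cpp
#include <iostream>
#include <map>
#include <string>

// Same shape as in caffe.cpp: a command takes no arguments and returns an
// exit code; the map goes from command name to function pointer.
typedef int (*BrewFunction)();
typedef std::map<std::string, BrewFunction> BrewMap;
BrewMap g_brew_map;

// #func turns the function name into a string key. The constructor of the
// global registerer object runs during static initialization, so the map is
// already filled before any lookup happens.
// (Hypothetical macro name; Caffe's is RegisterBrewFunction.)
#define REGISTER_BREW_FUNCTION(func)                   \
  namespace {                                          \
  class Registerer_##func {                            \
   public:                                             \
    Registerer_##func() { g_brew_map[#func] = &func; } \
  };                                                   \
  Registerer_##func g_registerer_##func;               \
  }

// Hypothetical stubs. They return distinct codes only so that the dispatch
// is observable; the real commands return 0 on success.
int train() { return 10; }
REGISTER_BREW_FUNCTION(train)

int device_query() { return 20; }
REGISTER_BREW_FUNCTION(device_query)

// Look up a command by name and run it. Returns -1 for an unknown command
// (an assumption of this sketch; caffe.cpp aborts with LOG(FATAL)).
int RunCommand(const std::string& name) {
  BrewMap::const_iterator it = g_brew_map.find(name);
  if (it == g_brew_map.end()) {
    std::cerr << "Unknown action: " << name << std::endl;
    return -1;
  }
  return it->second();
}
```

Calling RunCommand("train") runs the train stub, in the same way that main() in caffe.cpp calls GetBrewFunction(caffe::string(argv[1]))(). With this pattern, adding a command only takes a new function and one registration line; the dispatch code does not change.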
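Another point to note is that Caffe makes heavy use of gflags and glog. The former parses command-line arguments, and the latter is an effective logging tool. It is worth getting reasonably familiar with both before reading the Caffe source code.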
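As an illustration of what happens to a flag once gflags has parsed it, the sketch below mirrors the three branches of get_gpus() with the standard library only. ParseGpuFlag and its device_count parameter are hypothetical names: device_count stands in for the value that cudaGetDeviceCount() would report, and std::stoi replaces boost::lexical_cast, so malformed input is handled a little differently than in Caffe.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper that follows the branches of get_gpus():
//   "all"        -> every device index from 0 to device_count - 1
//   "0,2" etc.   -> the listed ids, split on ','
//   ""           -> no GPUs, which train() treats as CPU mode
std::vector<int> ParseGpuFlag(const std::string& flag, int device_count) {
  std::vector<int> gpus;
  if (flag == "all") {
    for (int i = 0; i < device_count; ++i) {
      gpus.push_back(i);
    }
  } else if (!flag.empty()) {
    std::istringstream in(flag);
    std::string token;
    while (std::getline(in, token, ',')) {
      // std::stoi throws std::invalid_argument if the token has no leading
      // number; it is more lenient than boost::lexical_cast about trailing
      // characters.
      gpus.push_back(std::stoi(token));
    }
  }
  return gpus;
}
```

For example, ParseGpuFlag("0,2", 4) yields the ids 0 and 2, and an empty flag yields an empty vector, which is the case where train() logs "Use CPU.".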
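That concludes this walkthrough of caffe.cpp. In short, caffe.cpp provides the interface for operating on the network as a whole. Like the spark that starts a prairie fire, this file is where the whole Caffe architecture begins to become transparent.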
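As a beginner, I am bound to make mistakes and omissions in this blog. Criticism and corrections are welcome, and I especially hope readers will offer candid suggestions, which I will gladly accept!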
To close, here are two blogs that helped me a great deal:
1) 一路顛簸
2) Study notes by 湯旭
You are welcome to read my follow-up posts analyzing the Caffe source code. Your support and encouragement are my greatest motivation!
written by jiong
Only those who are not working hard enough in the present long for the past.