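MXNet is one of the recently popular deep learning frameworks and is pleasant to use, but programs for it are usually written through the Python interface. This article describes how to build MXNet from source on Linux and program against its C++ interface.
The environment used here is Ubuntu 14.04 with g++ 4.8; the same steps apply to other Unix-like systems (Fedora, macOS).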
Goals
- Build libmxnet.a and libmxnet.so; this article builds the CPU-only version
- Link against the shared library and call the C++ interface
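Build dependencies
Minimum requirements:
- A recent compiler with C++11 support
- A BLAS library (for example libblas, ATLAS, or OpenBLAS) and the OpenCV library
Optional:
- CUDA Toolkit >= 7.0 (NVIDIA GPU with compute capability >= 2.0)
- cuDNN >= 3, to accelerate GPU computation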
Build steps
Get the source:
git clone --recursive https://github.com/dmlc/mxnet
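Optional: the clone is on the master branch by default; you can switch to a stable branch before continuing. This article is based on v1.6.x.
Method 1: build directly with make
Install the dependencies: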
sudo apt-get update
sudo apt-get install -y build-essential git libatlas-base-dev libopencv-dev
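For a GPU build, cuDNN and CUDA must also be installed.
Edit the configuration:
Open the config.mk file in the mxnet/make directory of the source tree and make the following changes.
Required: this generates the op.h file, and the C++ interface cannot be used without it.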
USE_CPP_PACKAGE=1
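Optional, CUDA-related:
USE_CUDA = 1
# Adjust to the actual CUDA install path. The comment is kept on its own line so that make does not append trailing whitespace to the value.
USE_CUDA_PATH = /usr/local/cuda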
USE_CUDNN = 1
Build MXNet:
cd mxnet
cp make/config.mk .
make -j4
When the build finishes, libmxnet.a and libmxnet.so are generated in the lib directory under the mxnet root.
Method 2: build with cmake
Install the dependencies:
sudo apt-get install libopenblas-dev libopencv-dev
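Edit the build options:
Open CMakeLists.txt in the root of the mxnet source tree.
Find the options below, turn off the CUDA-related ones, and turn on the C++ package option (this generates op.h and is essential), so that they read: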
mxnet_option(USE_CUDA "Build with CUDA support" OFF)
mxnet_option(USE_OLDCMAKECUDA "Build with old cmake cuda" OFF)
mxnet_option(USE_NCCL "Use NVidia NCCL with CUDA" OFF)
mxnet_option(USE_OPENCV "Build with OpenCV support" ON)
mxnet_option(USE_OPENMP "Build with Openmp support" ON)
mxnet_option(USE_CUDNN "Build with cudnn support" OFF) # one could set CUDNN_ROOT for search path
mxnet_option(USE_CPP_PACKAGE "Build C++ Package" ON)
Build MXNet:
cd mxnet
mkdir build
cd build
cmake ..
make -j4
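When the build finishes, libmxnet.a and libmxnet.so are generated in the build directory.
OpenBLAS and OpenCV can also be built and installed from source. As long as their install paths are specified so that find_package in MXNet's CMake build succeeds, this avoids installing the dependencies into system directories.
Notes:
- Be sure to fetch the code with git clone --recursive, because many directories are populated from submodules. Downloading the archive from the GitHub web page yields an incomplete source tree that is missing many libraries.
- Before building, make sure Python is installed and the python command is on the PATH.
- If the build fails with internal compiler error: Killed (program cc1plus), the machine ran out of memory. For a fix, see: stackoverflow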
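Writing the program
Once the libraries (static and shared) have been built, create a C++ project, add the MXNet headers and the shared library, and write a test program.
The example here is a machine learning classic: training a handwritten digit classifier on MNIST. The MNIST dataset must be downloaded beforehand (download: mnist).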
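Because the training program reads the raw MNIST files directly, it can help to confirm that the downloaded files are intact and uncompressed before running it. The following is a minimal sketch in plain C++ (no MXNet calls), assuming the files use the standard IDX layout: a big-endian 32-bit magic number (0x00000803 for images, 0x00000801 for labels) followed by one big-endian 32-bit size per dimension. The function names read_be32, parse_idx_header and read_idx_dims are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Decode a big-endian 32-bit unsigned integer starting at bytes[offset].
std::uint32_t read_be32(const std::string &bytes, std::size_t offset) {
    assert(offset + 4 <= bytes.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(bytes[offset + i]);
    }
    return v;
}

// Parse an IDX header held in memory.
// Returns the dimension sizes, or an empty vector if the bytes do not
// look like an unsigned-byte IDX header.
std::vector<std::uint32_t> parse_idx_header(const std::string &bytes) {
    std::vector<std::uint32_t> dims;
    if (bytes.size() < 4) return dims;
    const std::uint32_t magic = read_be32(bytes, 0);
    // Bytes 0-1 must be zero, byte 2 is the data type (0x08 = unsigned byte),
    // byte 3 is the number of dimensions.
    const std::uint32_t ndim = magic & 0xFF;
    if ((magic >> 8) != 0x08 || ndim == 0) return dims;
    if (bytes.size() < 4 + 4 * ndim) return dims;
    for (std::uint32_t i = 0; i < ndim; ++i) {
        dims.push_back(read_be32(bytes, 4 + 4 * i));
    }
    return dims;
}

// Read the first 16 bytes of a file (enough for up to three dimensions)
// and parse them. Returns an empty vector if the file cannot be read.
std::vector<std::uint32_t> read_idx_dims(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    std::string head(16, '\0');
    in.read(&head[0], static_cast<std::streamsize>(head.size()));
    head.resize(static_cast<std::size_t>(in.gcount()));
    return parse_idx_header(head);
}
```

Under these assumptions, calling read_idx_dims on train-images.idx3-ubyte should return three sizes (image count, rows, columns) and on train-labels.idx1-ubyte a single size (label count); an empty result suggests a missing, truncated, or still-compressed file.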
Project layout:
mxnet_cpp_test project directory
.
├── CMakeLists.txt
└── src
    └── main.cpp
CMakeLists.txt
cmake_minimum_required(VERSION 2.8)
project(mxnet_cpp_test)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -std=c++11 -W")
include_directories(
    /home/user/codetest/mxnet/include
    /home/user/codetest/mxnet/cpp-package/include
    # The next two include paths are not needed with newer MXNet versions
    /home/user/codetest/mxnet/dmlc-core/include
    /home/user/codetest/mxnet/nnvm/include
)
aux_source_directory(./src DIR_SRCS)
link_directories(/home/user/codetest/mxnet/lib) # directory containing the MXNet libraries
add_executable(mxnet_cpp_test ${DIR_SRCS})
target_link_libraries(mxnet_cpp_test mxnet)
main.cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>
#include "mxnet-cpp/MxNetCpp.h"

using namespace std;
using namespace mxnet::cpp;

// Build a multilayer perceptron: one FullyConnected layer per entry in
// `layers`, ReLU after every layer except the last, softmax on the output.
Symbol mlp(const vector<int> &layers)
{
    auto x = Symbol::Variable("X");
    auto label = Symbol::Variable("label");
    vector<Symbol> weights(layers.size());
    vector<Symbol> biases(layers.size());
    vector<Symbol> outputs(layers.size());
    for (size_t i = 0; i < layers.size(); ++i)
    {
        weights[i] = Symbol::Variable("w" + to_string(i));
        biases[i] = Symbol::Variable("b" + to_string(i));
        Symbol fc = FullyConnected(
            i == 0 ? x : outputs[i - 1], // data
            weights[i],
            biases[i],
            layers[i]);
        outputs[i] = i == layers.size() - 1 ? fc : Activation(fc, ActivationActType::kRelu);
    }
    return SoftmaxOutput(outputs.back(), label);
}

int main(int argc, char** argv)
{
    const int image_size = 28;
    const vector<int> layers{ 128, 64, 10 };
    const int batch_size = 100;
    const int max_epoch = 10;
    const float learning_rate = 0.1;
    const float weight_decay = 1e-2;

    auto train_iter = MXDataIter("MNISTIter")
        .SetParam("image", "./mnist_data/train-images.idx3-ubyte")
        .SetParam("label", "./mnist_data/train-labels.idx1-ubyte")
        .SetParam("batch_size", batch_size)
        .SetParam("flat", 1)
        .CreateDataIter();
    auto val_iter = MXDataIter("MNISTIter")
        .SetParam("image", "./mnist_data/t10k-images.idx3-ubyte")
        .SetParam("label", "./mnist_data/t10k-labels.idx1-ubyte")
        .SetParam("batch_size", batch_size)
        .SetParam("flat", 1)
        .CreateDataIter();

    auto net = mlp(layers);

    Context ctx = Context::cpu(); // Use CPU for training
    //Context ctx = Context::gpu();

    std::map<string, NDArray> args;
    args["X"] = NDArray(Shape(batch_size, image_size*image_size), ctx);
    args["label"] = NDArray(Shape(batch_size), ctx);
    // Let MXNet infer the shapes of the other parameters, such as the weights
    net.InferArgsMap(ctx, &args, args);

    // Initialize all parameters with uniform distribution U(-0.01, 0.01)
    auto initializer = Uniform(0.01);
    for (auto& arg : args)
    {
        // arg.first is parameter name, and arg.second is the value
        initializer(arg.first, &arg.second);
    }

    // Create sgd optimizer
    Optimizer* opt = OptimizerRegistry::Find("sgd");
    opt->SetParam("rescale_grad", 1.0 / batch_size)
        ->SetParam("lr", learning_rate)
        ->SetParam("wd", weight_decay);

    // Create executor by binding parameters to the model
    auto *exec = net.SimpleBind(ctx, args);
    auto arg_names = net.ListArguments();

    // Start training
    for (int iter = 0; iter < max_epoch; ++iter)
    {
        int samples = 0;
        train_iter.Reset();

        auto tic = chrono::system_clock::now();
        while (train_iter.Next())
        {
            samples += batch_size;
            auto data_batch = train_iter.GetDataBatch();
            // Set data and label
            data_batch.data.CopyTo(&args["X"]);
            data_batch.label.CopyTo(&args["label"]);

            // Compute gradients
            exec->Forward(true);
            exec->Backward();
            // Update parameters, skipping the input data and label arrays
            for (size_t i = 0; i < arg_names.size(); ++i)
            {
                if (arg_names[i] == "X" || arg_names[i] == "label") continue;
                opt->Update(i, exec->arg_arrays[i], exec->grad_arrays[i]);
            }
        }
        auto toc = chrono::system_clock::now();

        // Evaluate on the validation set
        Accuracy acc;
        val_iter.Reset();
        while (val_iter.Next())
        {
            auto data_batch = val_iter.GetDataBatch();
            data_batch.data.CopyTo(&args["X"]);
            data_batch.label.CopyTo(&args["label"]);
            // Forward pass is enough as no gradient is needed when evaluating
            exec->Forward(false);
            acc.Update(data_batch.label, exec->outputs[0]);
        }
        float duration = chrono::duration_cast<chrono::milliseconds>(toc - tic).count() / 1000.0;
        LG << "Epoch: " << iter << " " << samples / duration << " samples/sec Accuracy: " << acc.Get();
    }

    delete exec;
    MXNotifyShutdown();
    return 0;
}
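The program above only supplies shapes for X and label and leaves the rest to InferArgsMap. As a rough illustration of the shapes that should result for this network, the standalone sketch below (plain C++, no MXNet calls) applies the FullyConnected convention that a layer with num_hidden units and input width input_dim has a weight of shape (num_hidden, input_dim) and a bias of shape (num_hidden). The names ParamShape, mlp_param_shapes and count_parameters are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Name and shape of one trainable parameter.
struct ParamShape {
    std::string name;
    std::vector<std::size_t> shape;
};

// List the weight and bias shapes of an MLP whose input has `input_dim`
// features and whose i-th fully connected layer has layers[i] units.
std::vector<ParamShape> mlp_param_shapes(std::size_t input_dim,
                                         const std::vector<int> &layers) {
    std::vector<ParamShape> out;
    std::size_t in = input_dim;
    for (std::size_t i = 0; i < layers.size(); ++i) {
        assert(layers[i] > 0);
        const std::size_t units = static_cast<std::size_t>(layers[i]);
        out.push_back({"w" + std::to_string(i), {units, in}});
        out.push_back({"b" + std::to_string(i), {units}});
        in = units;  // this layer's output width feeds the next layer
    }
    return out;
}

// Total number of scalar values across all listed parameters.
std::size_t count_parameters(const std::vector<ParamShape> &params) {
    std::size_t total = 0;
    for (const auto &p : params) {
        std::size_t n = 1;
        for (std::size_t d : p.shape) n *= d;
        total += n;
    }
    return total;
}
```

For the 784-feature input and layers {128, 64, 10} used in main.cpp, this yields w0 = (128, 784), w1 = (64, 128), w2 = (10, 64) with matching biases, 109386 trainable values in total.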
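Generate the Makefile with cmake and build. The executable links against libmxnet.so by default (libmxnet.a can be specified instead). Copy the MNIST data into the build directory so that the program can load it for training.
Build directory layout: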
.
├── Makefile
├── mnist_data
│   ├── t10k-images.idx3-ubyte
│   ├── t10k-labels.idx1-ubyte
│   ├── train-images.idx3-ubyte
│   └── train-labels.idx1-ubyte
└── mxnet_cpp_test
Output:
MNISTIter: load 60000 images, shuffle=1, shape=(100,784)
MNISTIter: load 10000 images, shuffle=1, shape=(100,784)
Epoch: 0 25178.3 samples/sec Accuracy: 0.1135
Epoch: 1 24340.8 samples/sec Accuracy: 0.536
Epoch: 2 25031.3 samples/sec Accuracy: 0.8278
Epoch: 3 25466.9 samples/sec Accuracy: 0.8729
Epoch: 4 25370 samples/sec Accuracy: 0.9042
Epoch: 5 23819 samples/sec Accuracy: 0.9159
Epoch: 6 24067.4 samples/sec Accuracy: 0.9229
Epoch: 7 26513.5 samples/sec Accuracy: 0.9287
Epoch: 8 25575.4 samples/sec Accuracy: 0.9335
Epoch: 9 25619.1 samples/sec Accuracy: 0.9371
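The Accuracy column in the log is accumulated batch by batch over the validation set. The sketch below shows one way such a running metric can be computed, assuming it is plain top-1 accuracy: the predicted class is the index of the largest output score, and the metric is the fraction of samples where that index equals the label. It is a standalone illustration in plain C++, not MXNet's own Accuracy class; argmax and RunningAccuracy are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the largest score in one row of network outputs.
std::size_t argmax(const std::vector<float> &scores) {
    assert(!scores.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i) {
        if (scores[i] > scores[best]) best = i;
    }
    return best;
}

// Accumulates top-1 accuracy over any number of batches.
class RunningAccuracy {
public:
    // labels[i] is the true class of sample i; scores[i] holds one score
    // per class for sample i.
    void update(const std::vector<int> &labels,
                const std::vector<std::vector<float>> &scores) {
        assert(labels.size() == scores.size());
        for (std::size_t i = 0; i < labels.size(); ++i) {
            if (static_cast<int>(argmax(scores[i])) == labels[i]) ++correct_;
            ++total_;
        }
    }

    // Fraction of samples seen so far that were classified correctly.
    float get() const {
        return total_ == 0 ? 0.0f : static_cast<float>(correct_) / total_;
    }

private:
    std::size_t correct_ = 0;
    std::size_t total_ = 0;
};
```

Read this way, the final value of 0.9371 in the log corresponds to 9371 of the 10000 validation images being classified correctly.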
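As the log shows, the training program written directly against MXNet's C++ interface runs fast: it sustains roughly 24,000 to 26,500 samples per second on the CPU and reaches about 93.7% validation accuracy after 10 epochs.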