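Everything I have written about TensorFlow so far ran on the CPU. A machine at work with an Nvidia GTX 1060 recently became free, so I reinstalled Ubuntu on it, then built and used the GPU version of libtensorflow_cc.so. The whole process has six steps, A to F:
A---install bazel (see https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu)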
1 apt-get install pkg-config zip g++ zlib1g-dev unzip python3
2 Download the Bazel installer manually from https://github.com/bazelbuild/bazel/releases
3 chmod +x bazel-<version>-installer-linux-x86_64.sh
./bazel-<version>-installer-linux-x86_64.sh --user
4 Add Bazel to the PATH environment variable in /etc/bash.bashrc as follows:
export PATH="$PATH:$HOME/bin"
source /etc/bash.bashrc
B---GPU support
0 I ran everything as root, not as a guest user, and suggest you do the same.
1 lspci | grep -i nvidia
2 gcc --version
3 Find the CUDA Toolkit release that fits your system at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html. I downloaded https://developer.nvidia.com/cuda-10.1-download-archive-base?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal
4 uname -r
5 apt-get install linux-headers-$(uname -r)
During this step I hit the error: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
My fix: https://blog.csdn.net/u011596455/article/details/60322568
6 Disable the nouveau driver: https://blog.csdn.net/wd1603926823/article/details/77473746
7 Stop the X server, which is simple: sudo service lightdm stop, then switch to tty1 with Ctrl+Alt+F1
8 sh cuda_10.1.105_418.39_linux.run
9 service lightdm start
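10 nvidia-smi should now show your GPU information
11 Add the CUDA tools to the environment variables in /etc/bash.bashrc (the paths below assume the default install location of the CUDA 10.1 runfile)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64
export PATH=$PATH:/usr/local/cuda-10.1/bin
export CUDA_HOME=/usr/local/cuda-10.1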
source /etc/bash.bashrc
nvcc --version
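12 If all of the above succeeded, the Nvidia GPU driver and the CUDA Toolkit (including CUPTI) are installed. To check the Nvidia driver version: cat /proc/driver/nvidia/version
13 https://www.cnblogs.com/eugene0/p/11587987.html has a reference table
14 https://tensorflow.google.cn/install/gpu : according to this page, cuDNN still has to be installed.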
15 dpkg -i libcudnn7_7.6.4.38-1+cuda10.1_amd64.deb
16 dpkg -i libcudnn7-dev_7.6.4.38-1+cuda10.1_amd64.deb
17 dpkg -i libcudnn7-doc_7.6.4.38-1+cuda10.1_amd64.deb (following https://blog.csdn.net/dudu815110/article/details/88592558). cuDNN is now installed.
C---tensorflow
1 Download the TensorFlow version you want (the releases are listed at https://github.com/tensorflow/tensorflow/releases)
2 ./configure
Answer y to the following question and accept the defaults for everything else:
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
3 apt install git
4 bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so
I hit this error: ERROR: Analysis of target '//tensorflow:libtensorflow_cc.so' failed; build aborted: no such package '@local_config_git//': Traceback (most recent call last):
File "/home/jumper/workspace/tensorflow-2.0.0/third_party/git/git_configure.bzl", line 61
_fail(result.stderr)
File "/home/jumper/workspace/tensorflow-2.0.0/third_party/git/git_configure.bzl", line 14, in _fail
fail(("%sGit Configuration Error:%s %...)))
Git Configuration Error: Traceback (most recent call last):File "/root/.cache/bazel/_bazel_root/ec80c569286571968027dca7bea4db07/external/org_tensorflow/tensorflow/tools/git/gen_git_source.py", line 29, in <module>
from builtins import bytes # pylint: disable=redefined-builtin
ImportError: No module named builtins
Fix 1: https://blog.csdn.net/sinat_28442665/article/details/85325232 did not work for me
Fix 2: apt install python-pip
pip install future
bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so
5 From your TensorFlow source directory, run: ./tensorflow/contrib/makefile/build_all_linux.sh
Error message: tensorflow/contrib/makefile/download_dependencies.sh: line 75: curl: command not found
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Fix: apt-get install curl
Then run it again from your TensorFlow source directory: ./tensorflow/contrib/makefile/build_all_linux.sh
Another error: ./autogen.sh: 37: ./autogen.sh: autoreconf: not found
Fix: apt-get install autoconf
apt-get install automake
apt-get install libtool
6 If you need the Eigen library, go to tensorflow/contrib/makefile/downloads/eigen and run:
mkdir build
cd build
cmake ..
make
sudo make install
After installation, an eigen3 folder appears under /usr/local/include.
7 Collect the libraries and headers
a: The 5 .so files are in the bazel-bin/tensorflow folder: libtensorflow_cc.so libtensorflow_cc.so.2 libtensorflow_cc.so.2.0.0 libtensorflow_framework.so.2 libtensorflow_framework.so.2.0.0 (or similarly named files).
b: Collect the 5 header folders:
<your_path>/tensorflow (only the tensorflow and third_party folders under this directory are needed)
<your_path>/tensorflow/bazel-genfiles
<your_path>/tensorflow/tensorflow/contrib/makefile/downloads/nsync/public
<your_path>/tensorflow/tensorflow/contrib/makefile/gen/protobuf/include
8 Add the library path to the system by appending it to /etc/ld.so.conf (my library path is /home/jumper/workspace/tensorflow-gpu/lib)
9 ldconfig
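10 ldconfig -v (the output should list the library path you just added)
If you see '/home/jumper/workspace/tensorflow-gpu/lib...libtensorflow_cc.so:No such file or directory', step 7-a was not done correctly; try again with the right libraries. Finding out which libraries to put there is a matter of trial and error.
11 Copy all the libraries you collected (mine are under /home/jumper/workspace/tensorflow-gpu/lib) to /usr/lib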
12 ldconfig
13 gcc -ltensorflow_cc --verbose
If no error such as 'can not find libtensorflow_cc' appears, this step succeeded.
D--install eclipse (there are plenty of guides online)
E--run a demo testing tensorflow-gpu (the demo follows)
#include "tensorflow/core/framework/graph.pb.h"
#include <tensorflow/core/public/session_options.h>
#include <tensorflow/core/protobuf/meta_graph.pb.h>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
#include <Eigen/Core>
#include <Eigen/Dense>
#include "tensorflow/cc/ops/const_op.h"
#include "tensorflow/cc/ops/image_ops.h"
//#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/graph/graph_def_builder.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/lib/io/path.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/util/command_line_flags.h"
//using namespace tensorflow;
#define MODELGRAPHRECT_PATH "/home/SSD/EcologyAnalysis/math/font/cnnmodel/model.meta"
#define MODELRECT_PATH "/home/SSD/EcologyAnalysis/math/font/cnnmodel/model"
//#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;
using namespace std;
// Argmax over the class probabilities of the first batch item.
// Writes the winning class id and its probability to the output parameters.
int getPredictLabel(tensorflow::Tensor &probabilities, int &output_class_id, double &output_prob)
{
    auto tmap = probabilities.tensor<float, 2>();        // Tensor shape: [batch_size, target_class_num]
    int output_dim = probabilities.shape().dim_size(1);  // target_class_num is the size of dimension 1
    // Argmax: get the final predicted label and its probability
    for (int j = 0; j < output_dim; j++)
    {
        //std::cout << "Class " << j << " prob:" << tmap(0, j) << "," << std::endl;
        if (tmap(0, j) >= output_prob) {
            output_class_id = j;
            output_prob = tmap(0, j);
        }
    }
    return 0;
}
int mainnet()
{
    tensorflow::Session* session_rect;
    // CNN initialization -- Wang Dan 20190710, for rect algae
    tensorflow::Status statusrect = tensorflow::NewSession(tensorflow::SessionOptions(), &session_rect);
    if (!statusrect.ok())
    {
        std::cout << "ERROR: NewSession() for rect algaes failed..." << std::endl;
        return -1;
    }
    // Read the graph from the .meta file
    tensorflow::MetaGraphDef graphdefrect;
    tensorflow::Status status_loadrect = tensorflow::ReadBinaryProto(tensorflow::Env::Default(), MODELGRAPHRECT_PATH, &graphdefrect);
    if (!status_loadrect.ok()) {
        std::cout << "ERROR: Loading model for rect algaes failed..." << std::endl;
        std::cout << status_loadrect.ToString() << "\n";
        return -1;
    }
    // Import the graph into the session
    tensorflow::Status status_createrect = session_rect->Create(graphdefrect.graph_def());
    if (!status_createrect.ok()) {
        std::cout << "ERROR: Creating graph for rect algaes in session failed..." << status_createrect.ToString() << std::endl;
        return -1;
    }
    // Restore the weights of the pre-trained model from the checkpoint
    tensorflow::Tensor checkpointPathTensorRect(tensorflow::DT_STRING, tensorflow::TensorShape());
    checkpointPathTensorRect.scalar<std::string>()() = MODELRECT_PATH;
    statusrect = session_rect->Run(
        {{ graphdefrect.saver_def().filename_tensor_name(), checkpointPathTensorRect },},
        {}, {graphdefrect.saver_def().restore_op_name()}, nullptr);
    if (!statusrect.ok())
    {
        throw runtime_error("Error loading checkpoint for rect algaes ...");
    }
    int rectzao_rows = 96;   //48;
    int rectzao_cols = 224;  //80;
    char srcfile[200];
    for (int index = 1; index < 1001; index++)
    {
        snprintf(srcfile, sizeof(srcfile), "/media/root/Windows3/projects/Ecology/images/resultimgs/temp1/%d.jpg", index);
        Mat src = imread(srcfile, 0);  // 0 = load as grayscale
        if (src.empty())
        {
            continue;  // skip missing or unreadable images
        }
        // CNN start...20190710 wd
        tensorflow::Tensor resized_tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({1, rectzao_rows, rectzao_cols, 1}));
        // The cv::Mat below wraps the tensor's buffer, so writing to it fills the tensor
        float *imgdata = resized_tensor.flat<float>().data();
        cv::Mat cnninputImg(rectzao_rows, rectzao_cols, CV_32FC1, imgdata);
        cv::Mat srccnn(rectzao_rows, rectzao_cols, CV_8UC1);
        cv::resize(src, srccnn, cv::Size(rectzao_cols, rectzao_rows));
        srccnn.convertTo(cnninputImg, CV_32FC1);
        // Preprocess the image: scale pixel values to [0, 1]
        cnninputImg = cnninputImg / 255;
        // CNN input
        vector<std::pair<string, tensorflow::Tensor> > inputs;
        std::string Input1Name = "input";
        inputs.push_back(std::make_pair(Input1Name, resized_tensor));
        tensorflow::Tensor is_training_val(tensorflow::DT_BOOL, tensorflow::TensorShape());
        is_training_val.scalar<bool>()() = false;
        std::string Input2Name = "is_training";
        inputs.push_back(std::make_pair(Input2Name, is_training_val));
        // CNN predict
        vector<tensorflow::Tensor> outputs;
        string output = "output";
        cv::TickMeter timer;
        timer.start();
        tensorflow::Status status_run = session_rect->Run(inputs, {output}, {}, &outputs);
        timer.stop();
        //cout<<"time is "<<timer.getTimeMilli()<<" ms!"<<endl;
        if (!status_run.ok()) {
            std::cout << "ERROR: RUN failed in PreAlgaeRecognitionProcess()..." << std::endl;
            std::cout << status_run.ToString() << "\n";
            continue;  // outputs is not valid after a failed run
        }
        int label = -1;
        double prob = 0.0;
        getPredictLabel(outputs[0], label, prob);
        // CNN end...
        cout << "image " << index << " label is " << label << " ; time is " << timer.getTimeMilli() << " ms!" << endl;
        timer.reset();
    }
    return 0;
}
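The two parts of the demo that do not depend on TensorFlow, scaling the pixels to [0, 1] and picking the most probable class, can be tried on their own. What follows is only a minimal sketch under my own assumptions: normalize_pixels and argmax_label are hypothetical helper names that are not part of the demo, and plain std::vector stands in for cv::Mat and tensorflow::Tensor. Note that getPredictLabel uses >= and therefore keeps the last maximum on ties, while std::max_element returns the first one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: scale 8-bit grayscale pixels to floats in [0, 1],
// mirroring the `cnninputImg / 255` step of the demo.
std::vector<float> normalize_pixels(const std::vector<std::uint8_t>& pixels) {
    std::vector<float> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        out.push_back(static_cast<float>(p) / 255.0f);
    }
    return out;
}

// Hypothetical helper: return {class id, probability} of the largest entry
// in one row of class probabilities. On ties the first maximum wins.
std::pair<int, float> argmax_label(const std::vector<float>& probs) {
    assert(!probs.empty());  // an empty row has no prediction
    auto it = std::max_element(probs.begin(), probs.end());
    return {static_cast<int>(std::distance(probs.begin(), it)), *it};
}
```

In the real demo the input row would come from outputs[0] and the pixels from the resized image; here both are passed in as vectors so the logic can be checked in isolation.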
You can also write your own demo. Add the headers and libraries you collected to the project.
The following errors appeared:
tensorflow/core/framework/device_attributes.pb.h: No such file or directory
google/protobuf/port_def.inc: No such file or directory
tensorflow/core/framework/graph.pb.h: No such file or directory
tensorflow/core/framework/node_def.pb.h: No such file or directory
tensorflow/core/framework/attr_value.pb.h: No such file or directory
tensorflow/core/framework/tensor.pb.h: No such file or directory
tensorflow/core/framework/resource_handle.pb.h: No such file or directory
tensorflow/core/framework/tensor_shape.pb.h: No such file or directory
tensorflow/core/framework/types.pb.h: No such file or directory
tensorflow/core/framework/function.pb.h: No such file or directory
tensorflow/core/framework/op_def.pb.h: No such file or directory
tensorflow/core/framework/versions.pb.h: No such file or directory
unsupported/Eigen/CXX11/Tensor: No such file or directory
These errors mean that the headers you collected are incomplete. The fix is to copy each missing header from the TensorFlow source tree you built into the matching location in your collected headers. For the last error, for example, I copied the unsupported folder from /usr/local/include/eigen3 to my /home/jumper/workspace/tensorflow-gpu/include.
3 /home/jumper/workspace/tensorflow-gpu/include/unsupported/Eigen/CXX11/Tensor:14:31: fatal error: ../../../Eigen/Core: No such file or directory
solution: copy the Eigen folder from the source tree (mine is /home/jumper/workspace/tensorflow-2.0.0/tensorflow/contrib/makefile/downloads/eigen) to your include path (mine is /home/jumper/workspace/tensorflow-gpu/include)
4 /home/jumper/workspace/tensorflow-gpu/include/tensorflow/core/framework/allocator.h:24:38: fatal error: absl/strings/string_view.h: No such file or directory
solution: copy the absl folder from the source tree (mine is /home/jumper/workspace/tensorflow-2.0.0/tensorflow/contrib/makefile/downloads/absl) to your include path (mine is /home/jumper/workspace/tensorflow-gpu/include)
5 tensorflow/core/lib/core/error_codes.pb.h: No such file or directory
solution: copy error_codes.pb.h from the source tree (mine is /home/jumper/workspace/tensorflow-2.0.0/tensorflow/contrib/makefile/gen/host_obj/tensorflow/core/lib/core) to your include path (mine is /home/jumper/workspace/tensorflow-gpu/include/tensorflow/core/lib/core)
6 tensorflow/core/protobuf/config.pb.h: No such file or directory
solution: copy from /home/jumper/workspace/tensorflow-2.0.0/tensorflow/contrib/makefile/gen/host_obj/tensorflow/core/protobuf to /home/jumper/workspace/tensorflow-gpu/include/tensorflow/core/protobuf
7 tensorflow/core/framework/cost_graph.pb.h: No such file or directory
solution:...
...and so on, in the same way as above.
Every further error of this kind is handled the same way: I copied the files as described in the solutions above.
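Instead of finding the missing headers one compiler error at a time, the collected include folder can be checked in one pass. The following is a rough sketch, not something I used at the time: missing_headers is a hypothetical helper, and it only tests whether each file can be opened, not whether its contents are right.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: return the entries of `headers` (paths relative to
// `include_root`) that cannot be opened for reading.
std::vector<std::string> missing_headers(const std::string& include_root,
                                         const std::vector<std::string>& headers) {
    assert(!include_root.empty());  // an empty root would silently check "/"
    std::vector<std::string> missing;
    for (const std::string& rel : headers) {
        std::ifstream f(include_root + "/" + rel);
        if (!f.good()) {
            missing.push_back(rel);
        }
    }
    return missing;
}
```

For my layout the call would be something like missing_headers("/home/jumper/workspace/tensorflow-gpu/include", {"absl/strings/string_view.h", "unsupported/Eigen/CXX11/Tensor", "tensorflow/core/protobuf/config.pb.h"}); whatever it returns still has to be copied over by hand as in the solutions above.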
8 Add -std=c++11 to the project settings; TensorFlow 2.0 seems to need it in order to compile.
Also add the libtensorflow_cc.so library to the project.
9 Linker error during the build:
Building target: testTensorflowGpu
Invoking: GCC C++ Linker
g++ -std=c++11 -L/home/jumper/workspace/tensorflow-gpu/lib -o "testTensorflowGpu" ./src/testgpu.o -lpthread -ltensorflow_cc
/usr/bin/ld: ./src/testgpu.o: undefined reference to symbol '_ZN10tensorflow15ReadBinaryProtoEPNS_3EnvERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPN6google8protobuf11MessageLiteE'
//home/jumper/workspace/tensorflow-gpu/lib/libtensorflow_framework.so.2: error adding symbols: DSO missing from command line
makefile:45: recipe for target 'testTensorflowGpu' failed
collect2: error: ld returned 1 exit status
make: *** [testTensorflowGpu] Error 1
Fix:
ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
ldconfig
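Add the newly symlinked libtensorflow_framework.so to the project as well (that is, link with -ltensorflow_framework in addition to -ltensorflow_cc). The error goes away and the build succeeds.
F--the build succeeds; run the demo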
2019-10-28 14:08:16.082950: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2019-10-28 14:08:16.083187: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1e5a0d0 executing computations on platform Host. Devices:
2019-10-28 14:08:16.083200: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-10-28 14:08:16.084910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-10-28 14:08:16.138433: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.138899: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f0ad30 executing computations on platform CUDA. Devices:
2019-10-28 14:08:16.138913: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
2019-10-28 14:08:16.139006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.139385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2019-10-28 14:08:16.139592: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-10-28 14:08:16.140711: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-10-28 14:08:16.141492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-10-28 14:08:16.141673: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-10-28 14:08:16.142814: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-10-28 14:08:16.143635: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-10-28 14:08:16.146235: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-28 14:08:16.146303: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.146719: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.147077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-10-28 14:08:16.147099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2019-10-28 14:08:16.147663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-28 14:08:16.147673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-10-28 14:08:16.147677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-10-28 14:08:16.147848: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.148226: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-28 14:08:16.148627: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5454 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-10-28 14:08:19.537588: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
image 1 label is 0 ; time is 2120.38 ms!
image 2 label is 10 ; time is 3.45043 ms!
image 3 label is 25 ; time is 3.21582 ms!
image 4 label is 10 ; time is 3.46827 ms!
image 6 label is 10 ; time is 3.21907 ms!
image 7 label is 0 ; time is 3.10698 ms!
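As the log shows, CUDA has an initialization phase, so the first prediction usually takes much longer than the rest.
With this build the GPU load reached 74%. With the CPU build the GPU load fluctuates but always stays below 10%.
I compared the running time of the same program, images and model on the same machine with the tensorflow-cpu and tensorflow-gpu shared libraries.
Classification time with tensorflow-cpu: 4.13 min
Classification time with tensorflow-gpu: 2.83 min
So the GPU build does give some speedup, roughly 1.46x.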
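Because the first call includes the CUDA initialization, it makes sense to leave it out when judging per-image latency. Here is a small sketch of that arithmetic; steady_state_mean_ms and speedup are hypothetical helper names of my own, and the inputs are simply the numbers printed above.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: mean of the timings after skipping the first
// `warmup` samples (the calls that pay for CUDA initialization).
double steady_state_mean_ms(const std::vector<double>& times_ms, std::size_t warmup) {
    assert(times_ms.size() > warmup);  // at least one sample must remain
    double sum = std::accumulate(
        times_ms.begin() + static_cast<std::ptrdiff_t>(warmup), times_ms.end(), 0.0);
    return sum / static_cast<double>(times_ms.size() - warmup);
}

// Hypothetical helper: how many times faster the accelerated run is.
double speedup(double baseline, double accelerated) {
    assert(accelerated > 0.0);
    return baseline / accelerated;
}
```

With the six timings from the log (2120.38, 3.45043, 3.21582, 3.46827, 3.21907, 3.10698) and warmup = 1, the mean is about 3.29 ms per image, and speedup(4.13, 2.83) is about 1.46.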
Some of the links I referred to:
CUDA and cuDNN versions required by each TensorFlow version
https://blog.csdn.net/qq_27825451/article/details/89082978
https://blog.csdn.net/dragonchow123/article/details/80682787
https://tensorflow.google.cn/install/source
Using the GPU from the TensorFlow C++ API
https://blog.csdn.net/luoyexuge/article/details/81877069
https://www.cnblogs.com/lvchaoshun/p/6614048.html
https://www.jianshu.com/p/31b00ec5bc74
https://blog.csdn.net/wanzhen4330/article/details/81699769
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#pre-installation-actions
https://blog.csdn.net/u014475479/article/details/81702392
https://blog.csdn.net/caroline_wendy/article/details/80868120