Building PyTorch 1.0.0 from Source on the Jetson TX2 (a True Story of Blood and Tears)

This post walks through the installation for Python 3.5. Before installing, note the following:

Key point 1: the platform and the CUDA/cuDNN installation

Jetson TX2 platform: JetPack 3.3, CUDA 9.0.252, cuDNN 7.1.5, TensorRT 4.0.2, Python 2.7/Python 3.5

Kernel: tegra-ubuntu 4.4.38-tegra aarch64

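Linux version: Ubuntu 16.04, CMake 3.15.6 (a freshly flashed TX2 ships with CMake 3.5.1; later, while tinkering, I read that CMake 3.9.0 or newer is recommended, so I upgraded straight to a recent release)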

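Building PyTorch from source uses CUDA and cuDNN, so first check whether the CUDA and cuDNN on your Jetson TX2 were installed through JetPack. If they were not, be careful! The Jetson TX2's CPU is ARM-based, so the CUDA and cuDNN you install must be the ARM (aarch64) builds. For installing CUDA and cuDNN on the Jetson TX2, see: Installing CUDA 9.0 and cuDNN 7 on the Jetson TX2, in full detail (tested for real)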


Key point 2: downloading the PyTorch source

1. Different PyTorch versions require different CUDA versions

Cloning the PyTorch GitHub repository directly gives you the latest PyTorch. At the time of writing (2020-01-14), running

git clone http://github.com/pytorch/pytorch

downloads PyTorch 1.4.0a0, which needs CUDA 9.2 or later and therefore does not match my platform. The PyTorch website lists which CUDA versions each PyTorch release supports:
PyTorch previous versions: https://pytorch.org/get-started/previous-versions/
PyTorch latest version: https://pytorch.org/get-started/locally/

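If your platform is the same as mine, I recommend PyTorch 1.0.0. How do you download a specific PyTorch version? I suggest reading this link carefully: How to download the PyTorch version you want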

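2. The third_party folder in the PyTorch GitHub source holds links, not files

On GitHub, the folders inside third_party appear to have content, but they are actually links to submodules, so downloading the PyTorch source directly does not fetch these third-party libraries. Keep this in mind when downloading PyTorch. I recommend the following command: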

git clone --recursive --branch v1.0.0 http://github.com/pytorch/pytorch

Be sure to include --recursive, so that git also clones the submodules (recursively).
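If you already cloned without --recursive, the submodule folders under third_party will be empty. A minimal Python sketch to spot this, assuming you pass in the path of your source directory; empty_submodules is a hypothetical helper, not part of PyTorch, and it sticks to Python 3.5 syntax so it runs with the TX2's python3:

```python
from pathlib import Path


def empty_submodules(repo_root):
    """List third_party sub-directories of repo_root that contain nothing.

    Hypothetical helper (not part of PyTorch). An empty directory here
    usually means the repository was cloned without --recursive, or
    `git submodule update --init --recursive` has not been run yet.
    """
    third_party = Path(repo_root) / "third_party"
    if not third_party.is_dir():
        raise FileNotFoundError("{} does not exist".format(third_party))
    empty = []
    for sub in sorted(third_party.iterdir()):
        # any() stops at the first entry, so populated submodules are cheap to check
        if sub.is_dir() and not any(sub.iterdir()):
            empty.append(sub.name)
    return empty
```

For example, print(empty_submodules("pytorch")) right after cloning; a non-empty list means you should run git submodule update --init --recursive inside the source directory.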

Key point 3: we should turn off NCCL support, since it is only available on desktop GPUs.

See https://devtalk.nvidia.com/default/topic/1042821/jetson-tx2/pytorch-install-broken/
If the following error appears during compilation, it is because NCCL was not disabled; how to disable it is explained below.

nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_i328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f168ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_u328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_u648ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_u88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
Makefile:83: recipe for target '/home/nvidia/jetson-reinforcement/build/pytorch/third_party/build/nccl/obj/collectives/device/devlink.o' failed
make[5]: *** [/home/nvidia/jetson-reinforcement/build/pytorch/third_party/build/nccl/obj/collectives/device/devlink.o] Error 255
Makefile:45: recipe for target 'devicelib' failed
make[4]: *** [devicelib] Error 2
Makefile:24: recipe for target 'src.build' failed
make[3]: *** [src.build] Error 2
CMakeFiles/nccl.dir/build.make:60: recipe for target 'lib/libnccl.so' failed
make[2]: *** [lib/libnccl.so] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nccl.dir/all' failed
make[1]: *** [CMakeFiles/nccl.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2


With all of this in mind, I downloaded PyTorch 1.0.0. The installation steps follow.

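I. Check which versions the python and pip commands point to

A freshly flashed Jetson TX2 comes with both Python 2.7 and Python 3.5, so pay attention to which version the default python and pip commands are bound to. For example, on my platform: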

$ pip --version
Output:
pip 19.3.1 from /home/nvidia/.local/lib/python2.7/site-packages/pip (python 2.7)

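This means the default pip is bound to Python 2.7. To install PyTorch for Python 3.5, use the python3 and pip3 commands instead; alternatively, you can change which version the default python and pip point to (look up how to do this yourself).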
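The "(python X.Y)" suffix of pip --version is all you need to tell which interpreter a pip belongs to. The sketch below parses that suffix; both helper names are hypothetical, and the second function assumes pip is installed for the interpreter running the script:

```python
import re
import subprocess
import sys


def pip_python_version(pip_version_output):
    """Return "X.Y" from the "(python X.Y)" suffix of `pip --version`, or None.

    Hypothetical helper. Example input:
    "pip 19.3.1 from /home/nvidia/.local/lib/python2.7/site-packages/pip (python 2.7)"
    """
    match = re.search(r"\(python (\d+\.\d+)\)\s*$", pip_version_output.strip())
    return match.group(1) if match else None


def current_pip_python_version():
    """Ask the pip of the interpreter running this script (assumes pip is installed)."""
    result = subprocess.run([sys.executable, "-m", "pip", "--version"],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return pip_python_version(result.stdout)
```

Calling pip as python3 -m pip sidesteps the question entirely, because the interpreter is named explicitly on the command line.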

II. Install dependencies and required components

sudo apt install libopenblas-dev libatlas-dev liblapack-dev
sudo apt install liblapacke-dev checkinstall # For OpenCV
sudo apt-get install python3-pip
 
pip3 install --upgrade pip==9.0.1   # the package is named pip, even when invoked as pip3
sudo apt-get install python3-dev
 
sudo pip3 install numpy scipy
sudo pip3 install pyyaml
sudo pip3 install scikit-build
sudo apt-get -y install cmake
sudo apt install libffi-dev
sudo pip3 install cffi
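Before building, it can help to confirm that python3 (not python2.7) actually sees these packages. A minimal sketch; note that the import names differ from the pip package names for pyyaml (yaml) and scikit-build (skbuild). The helper name is hypothetical:

```python
import importlib.util

# Import names of the pip packages installed above.
# Note: pyyaml is imported as "yaml", scikit-build as "skbuild".
REQUIRED_MODULES = ["numpy", "scipy", "yaml", "skbuild", "cffi"]


def missing_modules(names):
    """Hypothetical helper: return the top-level modules this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Run it with python3 and print(missing_modules(REQUIRED_MODULES)); an empty list means all five are importable.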

Once these are installed, add the cuDNN library and include paths:

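sudo gedit ~/.bashrc    # append the two export lines below to the end of the file, then save
export CUDNN_LIB_DIR=/usr/lib/aarch64-linux-gnu
export CUDNN_INCLUDE_DIR=/usr/include
source ~/.bashrc        # reload the file in the current terminal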
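A quick sanity check that the two directories really contain cuDNN, written as a hypothetical helper that reads the same environment variables; it assumes the header is named cudnn.h and the library matches libcudnn.so*, which is what the exports above point at:

```python
import os
from pathlib import Path


def check_cudnn_dirs(lib_dir=None, include_dir=None):
    """Hypothetical helper: list problems with the cuDNN paths (empty list = OK).

    Falls back to the CUDNN_LIB_DIR / CUDNN_INCLUDE_DIR variables exported
    in ~/.bashrc. Assumes the header is cudnn.h and the library libcudnn.so*.
    """
    lib_dir = Path(lib_dir or os.environ.get("CUDNN_LIB_DIR", ""))
    include_dir = Path(include_dir or os.environ.get("CUDNN_INCLUDE_DIR", ""))
    problems = []
    if not (include_dir / "cudnn.h").exists():
        problems.append("cudnn.h not found in {}".format(include_dir))
    # glob also matches versioned names such as libcudnn.so.7
    if not list(lib_dir.glob("libcudnn.so*")):
        problems.append("libcudnn.so* not found in {}".format(lib_dir))
    return problems
```

After source ~/.bashrc, print(check_cudnn_dirs()) should print an empty list.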

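III. Download the PyTorch 1.0.0 source and modify it

Following the notes at the start of this post and the link How to download the PyTorch version you want, download the PyTorch 1.0.0 source, then disable NCCL.

Note: NCCL must be disabled in the source before you compile.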

# sudo gedit pytorch/CMakeLists.txt
#   > CMakeLists.txt: change NCCL to 'Off' on line 98

# sudo gedit pytorch/setup.py
#   > setup.py: add USE_NCCL = False below line 200

# sudo gedit pytorch/tools/setup_helpers/nccl.py
#   > nccl.py: change USE_SYSTEM_NCCL to 'False' on line 13
#              change NCCL to 'False' on line 78

# sudo gedit pytorch/torch/csrc/cuda/nccl.h
#   > nccl.h: comment out the self-include on line 8
#             comment out all code from line 21 to 28

# sudo gedit pytorch/torch/csrc/distributed/c10d/ddp.cpp
#   > ddp.cpp: comment out the nccl.h include on line 6
#              comment out torch::cuda::nccl::reduce on line 163
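The line numbers above come from my v1.0.0 checkout and may be slightly off in yours. To locate the spots to edit, a small hypothetical helper can list every line that mentions NCCL (case-insensitive) in those five files; it is only a search aid, and the edits themselves are still done by hand:

```python
import re
from pathlib import Path

# The five files edited above, relative to the PyTorch source root.
FILES_TO_CHECK = [
    "CMakeLists.txt",
    "setup.py",
    "tools/setup_helpers/nccl.py",
    "torch/csrc/cuda/nccl.h",
    "torch/csrc/distributed/c10d/ddp.cpp",
]


def nccl_mentions(repo_root, files=FILES_TO_CHECK):
    """Hypothetical helper: yield (relative_path, line_number, line) mentioning NCCL."""
    pattern = re.compile(r"nccl", re.IGNORECASE)
    for rel in files:
        path = Path(repo_root) / rel
        if not path.is_file():
            continue  # silently skip files missing from this checkout
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if pattern.search(line):
                    yield rel, lineno, line.rstrip("\n")
```

For example: for hit in nccl_mentions("pytorch"): print(*hit)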

With these changes made, prepare the build by running the following commands:

cd pytorch

git submodule update --init --recursive   # if this command fails, run  git init  first

sudo pip3 install -U setuptools
sudo pip3 install -r requirements.txt

IV. Build

First, switch the TX2 to maximum-performance mode; this makes compilation somewhat faster:

sudo nvpmodel -m 0         # switch to the maximum-performance power mode
sudo ~/jetson_clocks.sh    # pin the clocks at maximum and force the fan to full speed
sudo pip3 install scikit-build --user
sudo ldconfig

export USE_NCCL=0
export USE_DISTRIBUTED=1
export USE_OPENCV=ON
export USE_CUDNN=1
export USE_CUDA=1
export ONNX_ML=1 
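If you drive the build from a script instead of an interactive shell, the same switches can be passed explicitly. This is a sketch that assumes the flags above are the ones you want; build_env and build_wheel are hypothetical names. It runs setup.py with the current interpreter and does not call sudo, so run the script itself with whatever privileges you need:

```python
import os
import subprocess
import sys

# The build switches exported above.
BUILD_FLAGS = {
    "USE_NCCL": "0",
    "USE_DISTRIBUTED": "1",
    "USE_OPENCV": "ON",
    "USE_CUDNN": "1",
    "USE_CUDA": "1",
    "ONNX_ML": "1",
}


def build_env(base=None):
    """Hypothetical helper: copy of the environment with the build flags set."""
    env = dict(os.environ if base is None else base)
    env.update(BUILD_FLAGS)
    return env


def build_wheel(source_dir):
    """Hypothetical helper: run `setup.py bdist_wheel` in source_dir with the flags.

    Passing env= explicitly means the flags do not depend on exported shell
    variables, which a plain `sudo` would drop.
    """
    subprocess.check_call([sys.executable, "setup.py", "bdist_wheel"],
                          cwd=source_dir, env=build_env())
```

For example, build_wheel("pytorch") from the directory that contains the source tree.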

Then start the build:

sudo -E python3 setup.py bdist_wheel   # compiles PyTorch and writes a wheel file to pytorch/dist; -E keeps the USE_* variables exported above, which sudo otherwise resets

When the long build finishes, run:

sudo -E DEBUG=1 python3 setup.py build develop
# If the TX2 runs out of memory on this step, first copy the wheel file out of pytorch/dist,
# then run  sudo python3 setup.py clean  to remove the build output, cd to the directory holding the copied wheel, and install it with:
# sudo pip3 install torch-1.0.0a0-cp35-cp35m-linux_aarch64.whl
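Because setup.py clean wipes the build output, copy the wheel somewhere safe before running it. A minimal sketch that picks the most recently written torch-*.whl from the dist directory; find_torch_wheel is a hypothetical helper, and the exact wheel file name depends on your build:

```python
from pathlib import Path


def find_torch_wheel(dist_dir):
    """Hypothetical helper: newest torch-*.whl in dist_dir, or None if there is none."""
    wheels = sorted(Path(dist_dir).glob("torch-*.whl"),
                    key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None
```

For example, shutil.copy(str(find_torch_wheel("pytorch/dist")), os.path.expanduser("~")) copies it to your home directory (str() because Python 3.5's shutil does not accept Path objects).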

When that equally long build finishes, run the remaining installation commands:

sudo apt clean
sudo apt-get install libjpeg-dev zlib1g-dev

cd ~
git clone https://github.com/python-pillow/Pillow.git
cd Pillow/
sudo python3 setup.py install
sudo apt-get install python3-sklearn
sudo pip3 install pandas Cython scikit-image 

sudo pip3 --no-cache-dir install torchvision
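To confirm the install, a small check of what python3 can import. Run it from outside the pytorch source directory; otherwise Python may pick up the source tree's torch/ package instead of the installed one. The helper name is hypothetical:

```python
import importlib


def torch_status():
    """Hypothetical helper: torch/torchvision versions and CUDA availability.

    None means the package could not be imported.
    """
    status = {"torch": None, "torchvision": None, "cuda": False}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return status
    status["torch"] = torch.__version__
    # True only if this is a CUDA build and the GPU is usable
    status["cuda"] = bool(torch.cuda.is_available())
    try:
        torchvision = importlib.import_module("torchvision")
        status["torchvision"] = getattr(torchvision, "__version__", "unknown")
    except ImportError:
        pass
    return status
```

On a working TX2 build, print(torch_status()) should show a torch version and "cuda": True.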

Errors encountered during installation, and their fixes

error 1:

./caffe2/operators/quantized/int8_utils.h:4:38: fatal error: gemmlowp/public/gemmlowp.h: No such file or directory compilation terminated.
Full details:

In file included from ../caffe2/operators/quantized/int8_concat_op.h:7:0,
                 from ../caffe2/operators/quantized/int8_concat_op.cc:1:
../caffe2/operators/quantized/int8_utils.h:4:38: fatal error: gemmlowp/public/gemmlowp.h: No such file or directory
compilation terminated.
[1669/2643] Building CXX object caffe2/CMakeFile...ators/rnn/recurrent_network_blob_fetcher_op.cc.o
ninja: build stopped: subcommand failed.
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use-qnnpack caffe2'

Solution
The error message shows that caffe2/operators/quantized/int8_utils.h cannot find gemmlowp/public/gemmlowp.h, even though the file does exist. The fix is to change the include in int8_utils.h from

#include <gemmlowp/public/gemmlowp.h>
to
#include "third_party/gemmlowp/public/gemmlowp.h"

Also change the includes in caffe2/operators/quantized/int8_simd.h:

#include "gemmlowp/fixedpoint/fixedpoint.h"
#include "gemmlowp/public/gemmlowp.h"
to
#include "third_party/gemmlowp/fixedpoint/fixedpoint.h"
#include "third_party/gemmlowp/public/gemmlowp.h"
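Editing two headers by hand is easy, but if you need to repeat it (for example after a fresh clone), the same substitution can be scripted. A sketch assuming the includes appear exactly as #include <gemmlowp/...> or #include "gemmlowp/..."; includes that already start with third_party/ are left alone. Both helpers are hypothetical:

```python
import re
from pathlib import Path

# Matches #include <gemmlowp/...> or #include "gemmlowp/..."
_GEMMLOWP_INCLUDE = re.compile(r'#include\s*[<"](gemmlowp/[^>"]+)[>"]')


def fix_gemmlowp_includes(text):
    """Hypothetical helper: point gemmlowp includes at third_party/gemmlowp/..."""
    return _GEMMLOWP_INCLUDE.sub(r'#include "third_party/\1"', text)


def patch_file(path):
    """Hypothetical helper: rewrite one header in place; True if it changed."""
    path = Path(path)
    original = path.read_text()
    patched = fix_gemmlowp_includes(original)
    if patched != original:
        path.write_text(patched)
    return patched != original
```

For example, from the pytorch source root: patch_file("caffe2/operators/quantized/int8_utils.h") and patch_file("caffe2/operators/quantized/int8_simd.h").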

error 2:

libcudnn.so.7: error adding symbols: File in wrong format
Full details:

[ 58%] Linking CXX shared library ../lib/libcaffe2_gpu.so
/usr/local/cuda/lib64/
collect2: error: ld returned 1 exit status
caffe2/CMakeFiles/caffe2_gpu.dir/build.make:185448: recipe for target 'lib/libcaffe2_gpu.so' failed
make[2]: *** [lib/libcaffe2_gpu.so] Error 1
CMakeFiles/Makefile2:4400: recipe for target 'caffe2/CMakeFiles/caffe2_gpu.dir/all' failed
make[1]: *** [caffe2/CMakeFiles/caffe2_gpu.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 58%] Built target python_copy_files
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use-qnnpack caffe2'

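Solution
This happens when the cuDNN installed on the platform is not the ARM build. The Jetson TX2 is ARM-based, unlike a PC, where cuDNN is built for x86_64. The fix is therefore to install the ARM build of cuDNN; for instructions, see:

Installing CUDA 9.0 and cuDNN 7 on the Jetson TX2, in full detail (tested for real)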
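To check which architecture a libcudnn.so actually targets before starting a multi-hour build, you can read the e_machine field of its ELF header. A minimal sketch with hypothetical helper names; the constants 62 and 183 are the ELF machine codes EM_X86_64 and EM_AARCH64. From the shell, file -L on the library gives the same answer:

```python
import struct

# e_machine values defined by the ELF specification
EM_X86_64 = 62
EM_AARCH64 = 183


def elf_machine(path):
    """Hypothetical helper: return the e_machine field of an ELF file."""
    with open(path, "rb") as fh:  # open() follows symlinks such as libcudnn.so.7
        header = fh.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("{} is not an ELF file".format(path))
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is the 2-byte field at offset 18, after e_ident (16) and e_type (2)
    return struct.unpack(endian + "H", header[18:20])[0]


def is_aarch64(path):
    """Hypothetical helper: True if the ELF file targets aarch64."""
    return elf_machine(path) == EM_AARCH64
```

For example, is_aarch64("/usr/lib/aarch64-linux-gnu/libcudnn.so.7") should be True with the JetPack cuDNN; False (and elf_machine() returning 62) means an x86_64 build slipped in.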


error 3:

/third_party/onnx/onnx/onnx_pb.h:52:26: fatal error: onnx/onnx.pb.h: No such file or directory
compilation terminated.

Full details:

caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o -MF caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o.d -o caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o -c ../caffe2/python/pybind_state.cc
In file included from ../caffe2/onnx/helper.h:4:0,
                 from ../caffe2/onnx/backend.h:5,
                 from ../caffe2/python/pybind_state.cc:19:


../third_party/onnx/onnx/onnx_pb.h:52:26: fatal error: onnx/onnx.pb.h: No such file or directory
compilation terminated.
[1735/2643] Building CXX object caffe2/CMakeFiles/caffe2.dir/share/contrib/depthwise/depthwise3x3_conv_op.cc.o
ninja: build stopped: subcommand failed.
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-nnpack --use-qnnpack caffe2'

Solution
Reference: https://github.com/onnx/onnx/issues/1947

The code in third_party/onnx/onnx/onnx_pb.h is:

#ifdef ONNX_ML
#include "onnx/onnx-ml.pb.h"
#else
#include "onnx/onnx.pb.h"
#endif

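However, neither onnx-ml.pb.h nor onnx.pb.h is in third_party/onnx; both are generated during the build. Searching the pytorch directory finds only onnx-ml.pb.h, so all we need to do is define ONNX_ML:

In the current terminal, run
export ONNX_ML=1

and then rebuild.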
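You can check whether you are in this situation by looking for the generated headers. A minimal sketch with hypothetical helpers that searches a directory tree for onnx*.pb.h and reports whether only the -ml variant exists; pointing it at the build directory rather than the whole source tree keeps the search fast:

```python
from pathlib import Path


def find_onnx_generated_headers(root):
    """Hypothetical helper: generated onnx*.pb.h headers under root, sorted."""
    return sorted(p for p in Path(root).rglob("onnx*.pb.h") if p.is_file())


def should_define_onnx_ml(root):
    """Hypothetical helper: True if onnx-ml.pb.h exists but onnx.pb.h does not."""
    names = {p.name for p in find_onnx_generated_headers(root)}
    return "onnx-ml.pb.h" in names and "onnx.pb.h" not in names
```

If should_define_onnx_ml("pytorch/build") returns True, export ONNX_ML=1 before rebuilding, as described above.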


error 4:

RuntimeError: cuda runtime error (7) : too many resources requested for launch at /home/nvidia/pytorch/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:66
Full details:

 self.nyu = h5py.File(self.data_path)
THCudaCheck FAIL file=/home/nvidia/pytorch/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu line=66 error=7 : too many resources requested for launch
Traceback (most recent call last):
  File "test.py", line 96, in <module>
    output = model(input_var)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/nvidia/xiudan/zd_structure_paraformImgNet/fcrn.py", line 249, in forward
    ad1 = self._upsample_add(p1, c2)
  File "/media/nvidia/xiudan/zd_structure_paraformImgNet/fcrn.py", line 216, in _upsample_add
    return torch.nn.functional.interpolate(x, size=(H,W), mode='bilinear',align_corners=True) + y
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 2447, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, _output_size(2), align_corners)
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /home/nvidia/pytorch/aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu:66

Solution
Reference: https://github.com/pytorch/pytorch/issues/8103#issuecomment-424343705
In aten/src/THCUNN/generic/SpatialUpSamplingBilinear.cu, make the following changes:

Around line 62:
comment out THCState_getCurrentDeviceProperties(state)->maxThreadsPerBlock;
and set
const int num_threads = 512;

Around line 97:
comment out THCState_getCurrentDeviceProperties(state)->maxThreadsPerBlock;
and set
const int num_threads = 512;

Then rebuild and reinstall PyTorch.
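Why does a smaller block help? One way a kernel launch fails with "too many resources requested" is when the registers needed by all threads in a block (threads per block × registers per thread) exceed what one block may use. The arithmetic is sketched below with a hypothetical helper; the numbers in the example are illustrative, and real values can be read from the CUDA deviceQuery sample ("Total number of registers available per block") and from nvcc --resource-usage:

```python
def max_threads_per_block(regs_per_block, regs_per_thread,
                          warp_size=32, hw_max=1024):
    """Hypothetical helper: largest block size whose registers fit one block.

    threads_per_block * regs_per_thread must not exceed regs_per_block.
    The result is rounded down to a whole number of warps and capped at
    the hardware limit on threads per block (1024 for compute capability
    2.0 and later). Register-allocation granularity is ignored, so treat
    the result as an upper bound.
    """
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    threads = regs_per_block // regs_per_thread
    threads = (threads // warp_size) * warp_size
    return min(threads, hw_max)
```

For example, with 32768 registers per block and a kernel using 64 registers per thread, max_threads_per_block(32768, 64) gives 512, whereas a block of 1024 threads would need 65536 registers and could not launch.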

I followed this guide for installation: https://gist.github.com/dusty-nv/ef2b372301c00c0a9d3203e42fd83426 using the install mode command “sudo python setup.py install”
