Problems encountered while installing caffe2 and Detectron
1. The cmake version is too old; cmake 3 or later is required.
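To see which cmake version is actually on PATH before starting the build, a small check along these lines can help. This is only a sketch: the helper names (`installed_cmake_output`, `parse_cmake_version`, `cmake_is_new_enough`) are made up for illustration, and it assumes the output contains a line of the form `cmake version X.Y.Z`.

```python
import re
import subprocess


def installed_cmake_output(executable="cmake"):
    """Return the text printed by `<executable> --version`, or None on failure.

    A missing binary (OSError) or a non-zero exit status (CalledProcessError)
    both yield None.
    """
    try:
        return subprocess.check_output([executable, "--version"]).decode()
    except (OSError, subprocess.CalledProcessError):
        return None


def parse_cmake_version(text):
    """Extract (major, minor, patch) from version text; None if not found."""
    match = re.search(r"cmake\d* version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def cmake_is_new_enough(text, minimum=(3, 0, 0)):
    """Return True if the reported version is at least `minimum`."""
    version = parse_cmake_version(text)
    return version is not None and version >= minimum
```

Calling `cmake_is_new_enough(installed_cmake_output() or "")` then answers whether the cmake found first on PATH meets the requirement.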
2. Running cmake --version produces the following error:
CMake Error: Could not find CMAKE_ROOT !!!
CMake has most likely not been installed correctly.
Modules directory not found in
/home/kelvin/local/bin
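Check that the environment variables are set correctly. The message shows CMake searching for its Modules directory under /home/kelvin/local/bin, which suggests the cmake found on PATH is not a properly installed one.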
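3. git clone https://github.com/RLovelett/eigen.git prompts for an account and password.
Eigen is no longer available at this link because access has been closed to all users. Use the mirror instead:
https://github.com/eigenteam/eigen-git-mirror.git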
4. nvcc -V
nvcc command not found
# Configure the environment variables (both must point at the same CUDA version)
echo 'export PATH=/usr/local/cuda-8.0/bin:$PATH'>>~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH'>>~/.bashrc
source ~/.bashrc
# After this, running nvcc -V again prints the detailed version information
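PATH and LD_LIBRARY_PATH both need to refer to the same CUDA installation, and a mixed-version setup is easy to miss. The following is a rough sketch of a consistency check; it assumes the CUDA directories follow the `cuda-X.Y` naming used here, and the function names are hypothetical.

```python
import re


def cuda_versions_in(path_value):
    """Return the CUDA versions named by 'cuda-X.Y' entries in a PATH-style string."""
    versions = []
    for entry in path_value.split(":"):
        match = re.search(r"cuda-(\d+\.\d+)", entry)
        if match and match.group(1) not in versions:
            versions.append(match.group(1))
    return versions


def cuda_paths_consistent(path_value, ld_library_path_value):
    """True if both variables reference the same single CUDA version."""
    bin_versions = cuda_versions_in(path_value)
    lib_versions = cuda_versions_in(ld_library_path_value)
    return len(bin_versions) == 1 and bin_versions == lib_versions
```

In practice the two arguments would come from `os.environ.get("PATH", "")` and `os.environ.get("LD_LIBRARY_PATH", "")` in a freshly opened shell.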
5. ImportError: No module named past.builtins
pip install future (do not use sudo)
6. No module named google.protobuf.internal
pip install protobuf
7. ImportError: No module named hypothesis
pip install hypothesis
8. No module named caffe2.python
# Add the Python environment variables
# Note that caffe2 is not installed into the Python environment, so code has to be run against the caffe2 build directory
export PATH=/usr/local/pytorch/build:$PATH
export PYTHONPATH=/usr/local/pytorch/build:$PYTHONPATH
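Items 5 to 8 are all import failures, so they can be checked in one pass instead of being discovered one at a time. The sketch below tries each module and reports the ones that fail; `missing_modules` and `REQUIRED` are names invented for this example, and the module list simply repeats the ones mentioned above.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Modules named in the errors above; the fix for each is listed in items 5-8.
REQUIRED = [
    "past.builtins",             # pip install future
    "google.protobuf.internal",  # pip install protobuf
    "hypothesis",                # pip install hypothesis
    "caffe2.python",             # add the caffe2 build directory to PYTHONPATH
]
```

Running `print(missing_modules(REQUIRED))` with the same interpreter that will run Detectron prints an empty list once everything is in place.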
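9. CPU-only build: the pybind11 module is missing
Install pybind11 separately and add /path/to/pybind11/include to the environment variables
(Setting the environment variable sometimes has no effect; in that case pass it when running cmake for caffe2: cmake -D pybind11_INCLUDE_DIR=/root/pybind11/build)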
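10. RuntimeError: [enforce fail at common_cudnn.h:118] version_match. cuDNN compiled (5105) and runtime (6021) versions mismatch
Rebuild and reinstall caffe2, explicitly specifying the cuDNN and CUDA directories, with cuDNN set to version 6.0.
Add the following to the cmake command: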
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0
-DCUDNN_ROOT_DIR=/usr/local/cuda
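The two numbers in the error message are packed cuDNN version codes. For the cuDNN releases involved here, cudnn.h defines the code as major * 1000 + minor * 100 + patch, so the numbers can be decoded to see which versions were mixed. A small sketch (the function name is hypothetical):

```python
def decode_cudnn_version(code):
    """Split a packed cuDNN version code into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major = code // 1000
    minor = (code % 1000) // 100
    patch = code % 100
    return major, minor, patch


compiled = decode_cudnn_version(5105)  # version caffe2 was built against
runtime = decode_cudnn_version(6021)   # version loaded at run time
```

Here `compiled` is (5, 1, 5) and `runtime` is (6, 0, 21): caffe2 was built against cuDNN 5.1 but loads cuDNN 6.0 at run time, which is why the rebuild has to point at the 6.0 installation.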
11. File "/home/gxwtest/object-detection/detectron/lib/utils/vis.py", line 327, in vis_one_image
ValueError: need more than 2 values to unpack
In OpenCV 2.x, cv2.findContours returns only two values (contours and hierarchy), so remove the leading '_' from the unpacking on line 327 of vis.py.
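An alternative to editing the unpacking by hand is to take the last two elements of whatever cv2.findContours returns, which works for both the two-value and the three-value form. The helper below is a sketch with a made-up name, not code from Detectron; it deliberately takes the return value as an argument so it does not depend on a particular OpenCV version.

```python
def unpack_contours(result):
    """Return (contours, hierarchy) from a cv2.findContours return value.

    The tuple has either two elements (contours, hierarchy) or three
    (image, contours, hierarchy); the last two are the same in both cases.
    """
    return result[-2], result[-1]
```

In vis.py this would be used as `contours, hier = unpack_contours(cv2.findContours(...))`, keeping the arguments of the existing call unchanged.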
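12. _png.write_png()
RuntimeError: Could not create write struct
The matplotlib version installed by pip is too new: its _png.so appears to depend on libpng16.so, while this machine has libpng15.so. Uninstall matplotlib and install an older version with yum install python-matplotlib.
13. Error during multi-GPU training: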
RuntimeError: [enforce fail at context_gpu.h:314] error == cudaSuccess. 77 vs 0. Error at: /home/gxwtest/object-detection/caffe2/caffe2/core/context_gpu.h:314: an illegal memory access was encountered
# Run in Python
from caffe2.python import workspace
print(workspace.GetCudaPeerAccessPattern())
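The output [True,False,False,True] indicates that only single-GPU or dual-GPU training is possible, and that dual-GPU training only works with the (1,4) and (2,3) combinations; any other configuration fails.
# For multi-GPU training, run: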
python2.7 tools/train_net.py \
--multi-gpu-testing \
--cfg configs/getting_started/tutorial_2gpu_e2e_faster_rcnn_R-50-FPN.yaml \
OUTPUT_DIR ./tmp/detectron-output USE_NCCL True
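To read off the usable GPU pairs from the peer access pattern printed earlier, the matrix can be scanned for entries that are True in both directions. This is a sketch under assumptions: it treats the result of workspace.GetCudaPeerAccessPattern() as a square boolean matrix indexed by GPU id, and the 4-GPU matrix below is a hypothetical example constructed to match the pairs described above (zero-based, so (0, 3) and (1, 2) correspond to (1,4) and (2,3)).

```python
def peer_pairs(pattern):
    """Return zero-based GPU pairs (i, j), i < j, with mutual peer access."""
    n = len(pattern)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if pattern[i][j] and pattern[j][i]
    ]


# Hypothetical 4-GPU pattern, not captured output.
EXAMPLE_PATTERN = [
    [True, False, False, True],
    [False, True, True, False],
    [False, True, True, False],
    [True, False, False, True],
]

pairs = peer_pairs(EXAMPLE_PATTERN)
```

For this example `pairs` is `[(0, 3), (1, 2)]`. On a real machine the argument would be the value returned by `workspace.GetCudaPeerAccessPattern()`.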
14. Error when using NCCL for multi-GPU training:
cudaSuccess. 2 vs 0.
Reinstall NCCL version 2.1.15:
wget https://developer.download.nvidia.com/compute/redist/nccl/v2.1/nccl_2.1.15-1%2Bcuda8.0_x86_64.txz
See https://github.com/facebookresearch/video-nonlocal-net/issues/14
Then rebuild caffe2, pointing cmake at the NCCL directory:
cmake -DNCCL_ROOT_DIR=/usr/local/nccl
Official installation link
Reference links for other errors:
https://blog.csdn.net/wfei101/article/details/79451754
https://blog.csdn.net/zziahgf/article/details/79141879
https://blog.csdn.net/zziahgf/article/details/72461175