Problems compiling DCNv2 when deploying yolact in Docker

Deploying yolact in Docker requires compiling DCNv2 separately:

cd external/DCNv2
python setup.py build develop

However, compiling DCNv2 depends on a GPU, and the build kept failing.

 

Failure 1: using the python:3.6 image

FROM python:3.6
...
WORKDIR ***/external/DCNv2
RUN python setup.py build develop
...

The build failed with a compile error, and compiling by hand inside the container (entered with docker run) failed the same way:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "setup.py", line 64, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 41, in get_extensions
    raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

Cause: the python:3.6 image has no CUDA installed (neither the toolkit nor the runtime libraries).

 

Failure 2: switching to the pytorch/pytorch:1.2-cuda10.0-cudnn7-runtime image

FROM pytorch/pytorch:1.2-cuda10.0-cudnn7-runtime
...
WORKDIR ***/external/DCNv2
RUN python setup.py build develop
...

Whether compiled from the Dockerfile or by hand inside the container after docker run, the build still failed:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "setup.py", line 64, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 41, in get_extensions
    raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

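Cause: torch.cuda.is_available() returns True, but CUDA_HOME (from torch.utils.cpp_extension import CUDA_HOME) is None, and /usr/local indeed contains no cuda directory. The -runtime image ships only the CUDA runtime libraries, not the toolkit needed to compile extensions.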
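The checks described above can be gathered into one small script to run inside the container. This is a minimal sketch, not part of yolact or DCNv2: the function name cuda_build_diagnostics and the dictionary keys are made up for illustration, and the nvcc lookup is an extra check that the original post did not perform.

```python
import os
import shutil


def cuda_build_diagnostics():
    """Collect the facts that decide whether a CUDA extension can be compiled here."""
    info = {
        "nvcc_on_path": shutil.which("nvcc"),                # compiler shipped with the CUDA toolkit
        "usr_local_cuda": os.path.isdir("/usr/local/cuda"),  # default toolkit location
        "torch_cuda_available": None,
        "torch_cuda_home": None,
    }
    try:
        import torch
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        return info  # PyTorch is not installed; only the filesystem checks apply
    info["torch_cuda_available"] = torch.cuda.is_available()  # needs a GPU visible to the container
    info["torch_cuda_home"] = CUDA_HOME                       # None when no toolkit was found
    return info


if __name__ == "__main__":
    for key, value in cuda_build_diagnostics().items():
        print(f"{key}: {value}")
```

Running it once in a RUN step and once inside a container started with docker run shows which of the two conditions (toolkit present, GPU visible) is missing in each environment.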

 

Failure 3: switching to the pytorch/pytorch:1.2-cuda10.0-cudnn7-devel image

FROM pytorch/pytorch:1.2-cuda10.0-cudnn7-devel
...
WORKDIR ***/external/DCNv2
RUN python setup.py build develop
...

Along the way, apt-get update failed with: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/Packages.gz  Hash Sum mismatch

Fix:

...
# Update source
RUN sed -i s:/archive.ubuntu.com:/mirrors.tuna.tsinghua.edu.cn/ubuntu:g /etc/apt/sources.list
RUN cat /etc/apt/sources.list
RUN apt-get clean
RUN apt-get -y update --fix-missing --allow-unauthenticated
...

docker build then got as far as the DCNv2 step, but the compilation still failed (infuriating):

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "setup.py", line 64, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 41, in get_extensions
    raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

But after entering the container with docker run --gpus all -it ... /bin/bash, the compilation unexpectedly succeeded:

running build
running build_ext
running develop
running egg_info
writing DCNv2.egg-info/PKG-INFO
writing dependency_links to DCNv2.egg-info/dependency_links.txt
writing top-level names to DCNv2.egg-info/top_level.txt
reading manifest file 'DCNv2.egg-info/SOURCES.txt'
writing manifest file 'DCNv2.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.6/_ext.cpython-36m-x86_64-linux-gnu.so -> 
Creating /opt/conda/lib/python3.6/site-packages/DCNv2.egg-link (link to .)
Adding DCNv2 0.1 to easy-install.pth file

Installed ***/external/DCNv2
Processing dependencies for DCNv2==0.1
Finished processing dependencies for DCNv2==0.1

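Cause: when compiling inside a container entered through docker run, the --gpus option had already assigned a GPU to the container, so the GPU was usable and the build succeeded. When docker build executes the Dockerfile, no GPU is assigned to the build container, so the GPU remains unavailable.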

 

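Final solution: do not compile in the Dockerfile during docker build. Compile in the ENTRYPOINT instead, so that the build runs when the container starts and the GPU passed with --gpus is available: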

FROM pytorch/pytorch:1.2-cuda10.0-cudnn7-devel
...
ENTRYPOINT ["sh", "run.sh"]

Compile DCNv2 in run.sh:

cd external/DCNv2
python setup.py build develop
cd ../..
python ***.py
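Because run.sh runs on every container start, the script above recompiles DCNv2 each time. One possible refinement is to skip the build when the extension is already importable. The sketch below is only an illustration under assumptions: the helper name build_commands is hypothetical, and the module name _ext is inferred from the _ext.cpython-36m-x86_64-linux-gnu.so file in the build log, so it should be verified against the actual installation.

```python
import importlib.util
import subprocess
import sys


def build_commands(ext_module="_ext", build_dir="external/DCNv2"):
    """Return (argv, cwd) pairs to run at container start; empty if already built."""
    # "_ext" is an assumed module name taken from the .so file in the build log.
    if importlib.util.find_spec(ext_module) is not None:
        return []  # the extension is importable, so skip the rebuild
    return [([sys.executable, "setup.py", "build", "develop"], build_dir)]


def build_if_needed():
    """Run the pending build commands, aborting startup if one fails."""
    for argv, cwd in build_commands():
        subprocess.run(argv, cwd=cwd, check=True)
    # The application script is elided in the source ("***.py"),
    # so launching it is left to the caller.
```

Calling build_if_needed() from the entrypoint before starting the application keeps the first start identical to run.sh, while later starts of the same container skip the compile step.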

 
