Setting Up a Deep Learning Environment on a Server

There are two ways to do this.

Before using either method, run:
sudo apt-get install libc6-dev build-essential
In my own testing, skipping this step makes a manual NVIDIA driver installation fail with an error.
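
The manual driver installer has to compile a kernel module, so it needs a working toolchain. As a minimal sketch (the helper name `missing_commands` is hypothetical, and checking for `gcc` and `make` is an assumption about which tools matter here), you could confirm the build tools are on the PATH before starting:

```shell
# Hypothetical helper (not from the original post): print every command
# from the argument list that cannot be found on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Assumption: gcc and make are the tools the driver installer needs most.
missing="$(missing_commands gcc make)"
if [ -n "$missing" ]; then
    echo "missing build tools:" $missing >&2
else
    echo "build tools found"
fi
```

If anything is reported missing, rerunning the apt-get command above should install it.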

Method 1: using apt (more convenient and simpler)

This follows the installation instructions on the official TensorFlow website for
Ubuntu 18.04 (CUDA 10.1):

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-418
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.4.38-1+cuda10.1  \
    libcudnn7-dev=7.6.4.38-1+cuda10.1


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1

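Following these steps, however, I ran into problems:
1. When installing the NVIDIA driver, I requested version 418, but version 430 was installed instead; 430 corresponds to CUDA 10.2. (This may not directly affect the later steps.)
2. Installing CUDA failed with either an "unmet dependencies" error or a "package is damaged" error, in which case the NVIDIA driver has to be removed.
3. nvcc did not work.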
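
To see which driver series actually ended up on the machine (the first problem above), a small check like the following may help. It is only a sketch: the helper name `driver_major_matches` is hypothetical, and 418 is simply the version requested in this post.

```shell
# Hypothetical helper: succeed if a driver version string such as "430.50"
# belongs to the wanted major series (for example 418).
driver_major_matches() {
    wanted="$1"
    actual="$2"
    [ "${actual%%.*}" = "$wanted" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for its version; take the first GPU's answer.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_major_matches 418 "$installed"; then
        echo "driver $installed is from the requested 418 series"
    else
        echo "driver $installed is not from the 418 series"
    fi
else
    echo "nvidia-smi not found; the driver is probably not installed" >&2
fi
```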

Solution: see

https://www.pugetsystems.com/labs/hpc/How-To-Install-CUDA-10-together-with-9-2-on-Ubuntu-18-04-with-support-for-NVIDIA-20XX-Turing-GPUs-1236/

Complete installation procedure:

sudo apt-get install libc6-dev build-essential
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub

sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt-get update
sudo apt-get install --no-install-recommends \
cuda-10-0 \
libcudnn7=7.6.4.38-1+cuda10.0  \
libcudnn7-dev=7.6.4.38-1+cuda10.0

vi ~/.bashrc
# add the following lines to ~/.bashrc:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
# then reload the shell configuration with: source ~/.bashrc

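(Change the deb package to match the CUDA version you need; this example uses CUDA 10.0.)
The procedure above installs the graphics driver for you automatically.
If you prefer to install the graphics driver separately, remember to reboot after completing step 3.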
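
Because ~/.bashrc is read by every new shell, the export lines in the procedure above can add the same directory more than once. A possible alternative, sketched here with a hypothetical helper `prepend_path` and assuming the default /usr/local/cuda prefix, only adds a directory when it is not already listed:

```shell
# Hypothetical helper (not from the original post): prepend a directory to a
# colon-separated list unless the list already contains it.
prepend_path() {
    dir="$1"
    list="$2"
    case ":${list}:" in
        *":${dir}:"*) printf '%s\n' "$list" ;;            # already present
        *) printf '%s\n' "${dir}${list:+:${list}}" ;;     # prepend, no stray colon
    esac
}

CUDA_HOME=/usr/local/cuda   # assumed default install prefix
PATH="$(prepend_path "${CUDA_HOME}/bin" "${PATH}")"
LD_LIBRARY_PATH="$(prepend_path "${CUDA_HOME}/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

After opening a new shell, running nvcc --version is a quick way to confirm that the compiler is now found.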

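Method 2

Download and install the NVIDIA driver, CUDA, and cuDNN manually.
Important: it is best to install in the order CUDA -> cuDNN -> driver, because sometimes CUDA fails to install no matter how new the driver is.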

Downloading the driver:
NVIDIA driver download page:

https://www.nvidia.com/Download/index.aspx

or install it from the graphics-drivers PPA:

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update
$ sudo apt-get install nvidia-driver-418

Check the CUDA version:
cat /usr/local/cuda/version.txt

Check the cuDNN version:
grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn.h
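
The cuDNN check above prints three separate #define lines. As a rough sketch, the hypothetical helper `cudnn_version` below joins them into a single MAJOR.MINOR.PATCH string. It assumes the cuDNN 7 header layout used in this post, where the three macros live in cudnn.h; newer cuDNN releases keep them in a separate cudnn_version.h.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCH from a cudnn.h-style header.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

header=/usr/local/cuda/include/cudnn.h   # path used in this post
if [ -r "$header" ]; then
    cudnn_version "$header"
else
    echo "cudnn.h not found at $header" >&2
fi
```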

Removing the NVIDIA driver:

dpkg -l | grep -i nvidia
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get purge 'nvidia*'
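
After purging, it can be worth confirming that nothing was left behind. The sketch below filters dpkg -l output for packages that are still fully installed (status "ii") and have "nvidia" in their name; the helper name `installed_nvidia_packages` is hypothetical.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages that are still installed (status "ii") and mention nvidia.
installed_nvidia_packages() {
    awk '$1 == "ii" && tolower($2) ~ /nvidia/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    leftovers="$(dpkg -l | installed_nvidia_packages)"
    if [ -n "$leftovers" ]; then
        echo "still installed:"
        echo "$leftovers"
    else
        echo "no NVIDIA packages remain installed"
    fi
fi
```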