Setting Up a Deep Learning Environment on a Server

There are two ways to do this.

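Whichever method you choose, first run:
sudo apt-get install libc6-dev build-essential
In my own testing, skipping this step makes a manual NVIDIA driver installation fail with an error.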
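
Before launching the driver installer, a quick check along these lines can confirm that the build tools are actually present. This is only a minimal sketch: the helper name `have_cmd` and the choice of `gcc` and `make` as the tools to look for are assumptions made for illustration, not part of the original steps.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to an executable, function or builtin.
# (hypothetical helper, not from the original post)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# gcc and make are both pulled in by build-essential.
for tool in gcc make; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool (try: sudo apt-get install libc6-dev build-essential)"
    fi
done
```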

Method 1: using apt (the more convenient and simpler option)

This follows the installation steps from the official TensorFlow website for
Ubuntu 18.04 (CUDA 10.1):

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-418
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.4.38-1+cuda10.1  \
    libcudnn7-dev=7.6.4.38-1+cuda10.1


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1

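Following these steps, however, I ran into several problems:
1. When installing the NVIDIA driver, version 430 ended up being installed even though I had asked for 418; 430 corresponds to CUDA 10.2. (This may not directly affect the later steps.)
2. Installing CUDA failed with either an "unmet dependencies" error or a complaint that a package was damaged, at which point the NVIDIA driver had to be removed.
3. nvcc did not work.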
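
The first and third problems are easy to detect right after the reboot. The following is a minimal sketch, not a definitive check: `driver_major` and `EXPECTED_MAJOR` are hypothetical names introduced here, and the `nvidia-smi` query only runs when the tool is on the PATH.

```shell
#!/bin/sh
# driver_major VERSION: print the major part of a driver version string,
# e.g. "430.50" -> "430". (hypothetical helper for illustration)
driver_major() {
    printf '%s\n' "${1%%.*}"
}

EXPECTED_MAJOR=418   # the driver branch requested with apt (assumption: adjust as needed)

if command -v nvidia-smi >/dev/null 2>&1; then
    # Use the version reported for the first GPU.
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ "$(driver_major "$installed")" != "$EXPECTED_MAJOR" ]; then
        echo "driver '$installed' is installed, expected the $EXPECTED_MAJOR branch"
    fi
else
    echo "nvidia-smi not found: the driver is not installed or not on PATH"
fi

# nvcc is only found once the CUDA bin directory is on PATH.
command -v nvcc >/dev/null 2>&1 || echo "nvcc not found on PATH"
```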

To fix this, I followed this guide:

https://www.pugetsystems.com/labs/hpc/How-To-Install-CUDA-10-together-with-9-2-on-Ubuntu-18-04-with-support-for-NVIDIA-20XX-Turing-GPUs-1236/

The complete installation procedure:

sudo apt-get install libc6-dev build-essential
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub

sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt-get update
sudo apt-get install --no-install-recommends \
cuda-10-0 \
libcudnn7=7.6.4.38-1+cuda10.0  \
libcudnn7-dev=7.6.4.38-1+cuda10.0

# Open ~/.bashrc and append the three export lines below
vi ~/.bashrc
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
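
The `${VAR:+:${VAR}}` expansion in these exports adds `:$VAR` only when the variable is already non-empty. A literal colon written in front of it would leave an empty entry in the list, and an empty PATH entry is treated as the current directory. The sketch below reproduces the behaviour in isolation; `join_path` is a hypothetical helper used only for this demonstration.

```shell
#!/bin/sh
# join_path DIR CURRENT: print DIR followed by ":CURRENT" only when CURRENT is
# non-empty, mirroring the ${VAR:+:${VAR}} expansion. (hypothetical helper)
join_path() {
    dir=$1
    current=$2
    printf '%s\n' "${dir}${current:+:${current}}"
}

join_path /usr/local/cuda/bin ""          # prints /usr/local/cuda/bin
join_path /usr/local/cuda/bin /usr/bin    # prints /usr/local/cuda/bin:/usr/bin
```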

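(You can swap the deb package for whichever CUDA version you need; this example uses CUDA 10.0.)
The procedure above also installs the GPU driver for you automatically.
If you prefer to install the GPU driver separately yourself, remember to reboot after completing step 3.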
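
Switching to another CUDA version means changing the version suffix in several package names at once. A rough sketch that derives them from two values follows; `cuda_packages` is a hypothetical helper that only builds strings, so whether a given cuDNN build exists for a given CUDA version still has to be checked against the repository (for example with `apt-cache policy libcudnn7`).

```shell
#!/bin/sh
# cuda_packages CUDA_VERSION CUDNN_VERSION: print the apt package specs used in
# the steps above for the given versions. (hypothetical helper; it does not
# verify that these versions exist in the repository)
cuda_packages() {
    cuda=$1                                       # e.g. 10.0
    cudnn=$2                                      # e.g. 7.6.4.38
    cuda_dashed=$(printf '%s' "$cuda" | tr . -)   # 10.0 -> 10-0
    printf '%s\n' \
        "cuda-${cuda_dashed}" \
        "libcudnn7=${cudnn}-1+cuda${cuda}" \
        "libcudnn7-dev=${cudnn}-1+cuda${cuda}"
}

cuda_packages 10.0 7.6.4.38
```

The output could then be passed to the install command, for instance `sudo apt-get install --no-install-recommends $(cuda_packages 10.0 7.6.4.38)`.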

Method 2

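Download and install the NVIDIA driver, CUDA, and cuDNN entirely by hand.
Important: it is best to install in the order CUDA -> cuDNN -> driver, because sometimes CUDA fails to install no matter how recent the driver is.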

Downloading the driver:
NVIDIA driver download page:

https://www.nvidia.com/Download/index.aspx

or install it from the graphics-drivers PPA:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-driver-418

Check the CUDA version:
cat /usr/local/cuda/version.txt
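
For scripts it can be handy to extract just the version number. A minimal sketch, assuming the file holds a line of the form "CUDA Version 10.0.130"; `cuda_version_from_line` is a hypothetical helper name.

```shell
#!/bin/sh
# cuda_version_from_line LINE: extract the version number from a line such as
# "CUDA Version 10.0.130". Prints nothing if the line has another format.
# (hypothetical helper)
cuda_version_from_line() {
    printf '%s\n' "$1" | sed -n 's/^CUDA Version \([0-9.]*\).*$/\1/p'
}

version_file=/usr/local/cuda/version.txt
if [ -r "$version_file" ]; then
    cuda_version_from_line "$(head -n 1 "$version_file")"
else
    echo "$version_file not found; is the toolkit installed under /usr/local/cuda?"
fi
```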

Check the cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
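
The three macros can also be combined into a single dotted version string. The sketch below assumes the header contains lines of the form `#define CUDNN_MAJOR 7`; `cudnn_version` is a hypothetical helper, and the header path is the one used above (with the apt packages the header may live elsewhere, such as under /usr/include, so adjust the path if needed).

```shell
#!/bin/sh
# cudnn_version: read a cuDNN header on stdin and print MAJOR.MINOR.PATCHLEVEL.
# (hypothetical helper; relies on "#define CUDNN_MAJOR 7" style macros)
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    '
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version < "$header"
else
    echo "$header not found"
fi
```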

Remove the NVIDIA driver:

dpkg -l | grep -i nvidia
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get purge 'nvidia*'
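
Since purging by pattern can match more than intended, it may be worth listing exactly which installed packages are affected first. A minimal sketch: `installed_nvidia_packages` is a hypothetical helper that filters `dpkg -l` output by package name and installed state ("ii"), which is narrower than the `grep -i nvidia` above because it ignores matches in the description column.

```shell
#!/bin/sh
# installed_nvidia_packages: read `dpkg -l` output on stdin and print the names
# of installed packages whose name contains "nvidia". (hypothetical helper)
installed_nvidia_packages() {
    awk '$1 == "ii" && tolower($2) ~ /nvidia/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | installed_nvidia_packages
fi
```

Adding `-s` to the `apt-get` commands above makes them simulate the removal without changing anything, which is another way to preview the result.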