[Deep Learning 003] Environment Setup: Configuring an NVIDIA 2080Ti GPU

1. Installing the GPU driver

1.1 Check whether an NVIDIA driver is already installed

nvidia-smi

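nvidia-settings opens the GPU settings tool.
If a driver is already installed, remove it first: sudo apt-get remove --purge 'nvidia*'
If none is installed, there is nothing to remove.
1.2 Download the driver installer (.run format)
https://www.nvidia.com/Download/index.aspx?lang=cn# (official download page)
1.3 Disable Secure Boot
Set Secure Boot to Disabled in the BIOS/UEFI settings. If the Secure Boot option is greyed out and cannot be disabled, see the blog post below and retry a few times. This step really is a pitfall and can take a while.

https://blog.csdn.net/qq_20492405/article/details/79034430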
1.4 Disable nouveau
(a) Open the configuration file for editing: sudo gedit /etc/modprobe.d/blacklist.conf
(b) Append this as the last line: blacklist nouveau
(c) Apply the change: sudo update-initramfs -u
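Steps (a) and (b) can also be done without opening an editor. The sketch below is one minimal way to add a line to a file only when it is not already present, so running it twice does no harm. `ensure_line` is a hypothetical helper name chosen for illustration, not a standard command, and writing under /etc/modprobe.d requires root.

```shell
# ensure_line FILE LINE
# Append LINE to FILE unless an identical line already exists.
# (ensure_line is an illustrative helper, not part of any tool.)
ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (in a root shell), followed by the initramfs rebuild:
#   ensure_line /etc/modprobe.d/blacklist.conf "blacklist nouveau"
#   update-initramfs -u
```

Matching the whole line with `-x` avoids treating a commented-out or longer line as a hit.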
1.5 Reboot
reboot
1.6 Check whether nouveau is still loaded

lsmod | grep nouveau        # no output means nouveau was disabled successfully
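A plain `grep nouveau` also matches any module whose name merely contains that string. As a slightly stricter sketch, the helper below reads `lsmod`-style output on stdin and succeeds only on an exact module name; `module_listed` is a made-up name for this example.

```shell
# module_listed NAME
# Read lsmod-style output on stdin and succeed only if a module
# named exactly NAME is listed (the header line is skipped).
module_listed() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Example:
#   if lsmod | module_listed nouveau; then
#       echo "nouveau is still loaded"
#   else
#       echo "nouveau is disabled"
#   fi
```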

1.7 Stop the graphical desktop

sudo telinit 3              # switch to runlevel 3

Once the screen goes black, press Ctrl+Alt+F1 to reach a text console.
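1.8 Install the driver
cd into the directory that contains the driver file.
(a) Make the driver file executable

sudo chmod a+x NVIDIA-Linux-x86_64-418.56.run

(b) Run the installer

sudo sh ./NVIDIA-Linux-x86_64-418.56.run --no-opengl-files

--no-opengl-files: installs only the driver and skips the OpenGL files, which prevents the login loop.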
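After the installer finishes, it can be useful to confirm that the running driver matches the file that was installed. The sketch below extracts the version from the installer's file name; `run_file_version` is a hypothetical helper, and it assumes the name follows the `NVIDIA-Linux-x86_64-<version>.run` pattern used in this post.

```shell
# run_file_version FILE
# Print the version embedded in an NVIDIA .run installer name,
# e.g. NVIDIA-Linux-x86_64-418.56.run -> 418.56
run_file_version() {
    name=${1##*/}      # drop any leading directory
    name=${name##*-}   # keep the text after the last '-'
    printf '%s\n' "${name%.run}"
}

# Example, once the driver is installed:
#   expected=$(run_file_version NVIDIA-Linux-x86_64-418.56.run)
#   actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   [ "$expected" = "$actual" ] && echo "driver $actual is active"
```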


P.S. If gcc, make, or make-guile is missing during installation, install it as the prompt suggests.
For example:
gcc --version
sudo apt install gcc
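Rather than waiting for the installer to complain, the build tools can be checked up front. This is a small sketch using the shell built-in `command -v`; `missing_tools` is a name invented for this example.

```shell
# missing_tools NAME...
# Print each NAME that is not available as a command, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example:
#   missing_tools gcc make
# Anything printed still needs to be installed, e.g. with sudo apt install.
```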


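2. Installing CUDA

Download the installer from the official NVIDIA site.
Once the download finishes, copy it to a directory of your choice.
Update the system first:

sudo apt update
sudo apt upgrade     # takes a while
reboot               # restart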

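2.1 Stop the display manager

sudo service lightdm stop   # then press Ctrl+Alt+F1; if this command does not work, use the one below
sudo telinit 3              # switch to runlevel 3
startx                      # to return to X, the graphical interface, afterwards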

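2.2 Run the installer
After cd-ing into that directory:

sudo chmod a+x cuda_10.1.105_418.39_linux.run
sudo sh cuda_10.1.105_418.39_linux.run

When the license agreement appears, type accept.
Next, a component list appears (screenshot not preserved). Press Enter on the first entry, the driver, to deselect it, because the driver was already installed earlier. Then choose Install.
When the installation finishes, a summary is printed (screenshot not preserved).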

2.3 Environment variables

sudo gedit /etc/profile

Append these lines at the end:

export PATH=/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH

Then reload the file in the current shell:

source /etc/profile
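Two small problems can creep in with plain `dir:$VAR` exports: sourcing the file repeatedly keeps stacking the same directory, and an unset variable leaves a trailing colon, which the dynamic linker reads as an empty entry. The following is a sketch of one way around both; `prepend_path` is a hypothetical helper, and `CUDA_ROOT` is just a local variable name for the install prefix used in this post.

```shell
# prepend_path DIR CURRENT
# Print CURRENT with DIR added in front, unless DIR is already in the list.
prepend_path() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;               # already present
        *)          printf '%s\n' "$dir${current:+:$current}" ;;
    esac
}

CUDA_ROOT=/usr/local/cuda-10.1   # install prefix used in this post
PATH=$(prepend_path "$CUDA_ROOT/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_path "$CUDA_ROOT/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Because the helper only prints a value, it has no side effects of its own and is easy to check by hand before putting it in a profile file.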

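The original blog post also has a step that creates symlink files. I could not get it to work, but the version check below still succeeded.
2.4 Check the version
nvcc --version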

(base) li@li-System-Product-Name:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
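For scripts that need just the release number, the last line of that output can be parsed. This sketch assumes the line keeps the `release X.Y,` wording shown above; `cuda_release` is a name made up for the example.

```shell
# cuda_release
# Read `nvcc --version` output on stdin and print the release number.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Example:
#   nvcc --version | cuda_release
```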

2.5 Test
Install g++

sudo apt install g++                              # install g++

Build the test samples

cd ~/NVIDIA_CUDA-10.1_Samples
make                               # takes a long time; wait until "Finished" appears

Run the test

cd ~/NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release
./deviceQuery

If PASS appears, the installation succeeded.
Reboot the computer.
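The PASS check can also be scripted instead of read off the screen. This is a minimal sketch that assumes deviceQuery reports success with a line containing `Result = PASS`; the post itself only says that PASS appears, so adjust the pattern if your output differs. `device_query_passed` is a hypothetical helper name.

```shell
# device_query_passed
# Read deviceQuery output on stdin; succeed if it reports a pass.
# Assumption: the success line contains "Result = PASS".
device_query_passed() {
    grep -q 'Result = PASS'
}

# Example:
#   cd ~/NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release
#   ./deviceQuery | device_query_passed && echo "CUDA is working"
```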
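P.S. I later found the official installation guide. It is an excellent reference, and I recommend reading it directly (registration required):
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

3. Installing cuDNN (optional)

(Note: just follow the steps on the official site one by one. I have tried them and hit no pitfalls.)
cuDNN is a numerical library optimized for convolutional neural network models. It can speed up convolutional network computation by 2 to 3 times, so I strongly recommend installing it.
For this installation, avoid the assorted Chinese-language tutorials, which are full of pitfalls (I tried them myself). Follow the official guide instead (registration required).
cuDNN download page:
Installation guide: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-linux
The official guide is copied below: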

2.2. Downloading cuDNN
In order to download cuDNN, ensure you are registered for the NVIDIA Developer Program.
// You must register before downloading.
Go to: NVIDIA cuDNN home page. Click Download. Complete the short
survey and click Submit. Accept the Terms and Conditions. A list of
available download versions of cuDNN displays.
Select the cuDNN version you want to install. A list of available resources displays.
// Pick a version to download. I chose "cuDNN Library for Linux" because it ships as a .tgz archive, the generic Linux build.
2.3. Installing cuDNN on Linux
The following steps describe how to build a cuDNN dependent program.
Choose the installation method that meets your environment needs. // Pick the method that fits your system.
For example, the tar file installation applies to all Linux platforms, and the debian installation package
applies to Ubuntu 14.04 and 16.04. // The tar archive works on every Linux system; the .deb packages are for Ubuntu 14.04 and 16.04.

In the following sections:
your CUDA directory path is referred to as /usr/local/cuda/
your cuDNN download path is referred to as <cudnnpath>
// The CUDA path is /usr…; the cuDNN download path depends on where you saved the file, and <cudnnpath> stands in for it.
2.3.1. Installing from a Tar File
Navigate to your directory containing the cuDNN Tar file.
Unzip the cuDNN package. // cd to your download location and extract the archive
$ tar -xzvf cudnn-9.0-linux-x64-v7.tgz
Copy the following files into the CUDA Toolkit directory, and change the file permissions.
// Copy the files below into the CUDA Toolkit directory, then change their permissions.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include # copy cudnn.h into the CUDA Toolkit include directory
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64 # copy all libcudnn files into CUDA's lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn* # change permissions: make the files readable by all users
2.3.2. Installing from a Debian File
Navigate to your directory containing cuDNN Debian file.
// Download the other three .deb packages, then install them.
Install the runtime library, for example:
sudo dpkg -i libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb
Install the developer library, for example:
sudo dpkg -i libcudnn7-devel_7.0.3.11-1+cuda9.0_amd64.deb
Install the code samples and the cuDNN Library User Guide, for example:
sudo dpkg -i libcudnn7-doc_7.0.3.11-1+cuda9.0_amd64.deb
2.4. Verifying
To verify that cuDNN is installed and is running properly,
compile the mnistCUDNN sample located in the
/usr/src/cudnn_samples_v7 directory in the debian file.
Copy the cuDNN sample to a writable path.
$ cp -r /usr/src/cudnn_samples_v7/ $HOME
Go to the writable path.
$ cd $HOME/cudnn_samples_v7/mnistCUDNN
Compile the mnistCUDNN sample.
$ make clean && make
Run the mnistCUDNN sample.
$ ./mnistCUDNN
If cuDNN is properly installed and running on your Linux system, you will see a
message similar to the following:
Test passed! // installation succeeded
2.5. Upgrading from v6 to v7
cuDNN v7 can coexist with previous versions of cuDNN, such as v5 or v6.
2.6. Troubleshooting
Join the NVIDIA Developer Forum to post questions and follow
discussions.



P.S.
An error you may hit when running $ ./mnistCUDNN

error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory

Fix:

$ vim ~/.bashrc

Add the following lines:

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda-10.0

$ source ~/.bashrc
$ ./mnistCUDNN
Result of classification: 1 3 5
Test passed!
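When the loader reports a missing shared library, it helps to check whether any directory on LD_LIBRARY_PATH actually contains that file. The helper below is a sketch of such a lookup; `find_in_path_list` is a hypothetical name, and it only inspects the list you pass in, not the system linker cache.

```shell
# find_in_path_list NAME LIST
# Print the first DIR/NAME that exists, where LIST is a colon-separated
# list of directories. Runs in a subshell so IFS and globbing settings
# do not leak into the caller.
find_in_path_list() (
    name=$1
    IFS=:
    set -f                       # no filename globbing while splitting
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0
        fi
    done
    exit 1
)

# Example:
#   find_in_path_list libcudart.so.10.0 "${LD_LIBRARY_PATH:-}" \
#       || echo "libcudart.so.10.0 is not on LD_LIBRARY_PATH"
```

If nothing is found, either the path entries point at a different CUDA version than the one the binary was built against, or the exports have not been loaded into the current shell yet.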

What I actually did:

3.1 Find where CUDA is installed

(base) li@li-System-Product-Name:~$ which nvcc
/usr/local/cuda-10.1/bin/nvcc
(base) li@li-System-Product-Name:~$ cd /usr/local
(base) li@li-System-Product-Name:/usr/local$ ls
bin  cuda  cuda-10.1  etc  games  include  lib  man  sbin  share  src
(base) li@li-System-Product-Name:/usr/local/cuda-10.1$ ls
bin       include    libnvvp               nvml     src
doc       jre        NsightCompute-2019.1  nvvm     targets
EULA.txt  lib64      nsightee_plugins      samples  tools
extras    libnsight  NsightSystems-2018.3  share    version.txt

3.2 Download cuDNN

(base) li@li-System-Product-Name:/$ cd ~/Downloads
Extract: tar -xzvf cudnn-10.1-linux-x64-v7.tgz
(base) li@li-System-Product-Name:~/Downloads$ ls
Anaconda3-2019.03-Linux-x86_64.sh  cuda_10.1.105_418.39_linux.run      flash_player_ppapi_linux.x86_64.tar.gz
cuda  cudnn-10.1-linux-x64-v7.5.0.56.deb  NVIDIA-Linux-x86_64-418.56.run
# the extracted directory is named cuda

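3.3 Copy the files

(base) li@li-System-Product-Name:~/Downloads$ sudo cp cuda/include/cudnn.h /usr/local/cuda-10.1/include
(base) li@li-System-Product-Name:~/Downloads$ sudo chmod a+r /usr/local/cuda-10.1/include/cudnn.h /usr/local/cuda-10.1/lib64/libcudnn*
(base) li@li-System-Product-Name:~/Downloads$ cd /usr/local/cuda-10.1/lib64/
Set permissions: `$ sudo chmod a+r /usr/local/cuda-10.1/include/cudnn.h /usr/local/cuda-10.1/lib64/libcudnn*` (check whether you copied into cuda or cuda-10.1, and use the matching path)
Note: this transcript does not show the libcudnn* copy from step 2.3.1 of the official guide (sudo cp cuda/lib64/libcudnn* into the lib64 directory). That copy is still part of the procedure and has to happen before the chmod.
(base) li@li-System-Product-Name:/usr/local/cuda-10.1/lib64$ ls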

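One way to confirm that the header landed in the right place is to read the version macros out of it. The sketch below assumes the cuDNN 7 layout, where cudnn.h itself defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL; `cudnn_header_version` is a hypothetical helper name.

```shell
# cudnn_header_version FILE
# Print MAJOR.MINOR.PATCH from the #define lines in a cuDNN header.
# Assumption: FILE defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL,
# as the cuDNN 7 cudnn.h does.
cudnn_header_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }
    ' "$1"
}

# Example:
#   cudnn_header_version /usr/local/cuda-10.1/include/cudnn.h
```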
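3.4 Create symlinks (not required; I suggest skipping this step)

(Note: I was not sure whether this step is required. The Chinese tutorial I first followed includes it, so I did it, but the official guide does not.)
(P.S. Today I reinstalled everything to verify: following the official steps above is enough to complete the installation, and this step is not needed at all.)

cd /usr/local/cuda-10.1/lib64/
sudo ln -sf libcudnn.so.7.5.0 libcudnn.so.7
sudo ln -sf libcudnn.so.7 libcudnn.so
Make the links take effect:
sudo ldconfig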

3.5 Verify (same method as the official guide)
A pitfall (it did not occur during my second installation)

(base) li@li-System-Product-Name:/usr/local/cuda-10.1/lib64$ cp -r /usr/src/cudnn_samples_v7/ $HOME
cp: cannot stat '/usr/src/cudnn_samples_v7/': No such file or directory
(base) li@li-System-Product-Name:/usr/local/cuda-10.1/lib64$ cp -r /usr/src/cudnn_samples_v7 $HOME
cp: cannot stat '/usr/src/cudnn_samples_v7': No such file or directory
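The official text quoted earlier places the samples "in the debian file", and lists libcudnn7-doc as the package with the code samples, so after a tar-only install this directory may simply not exist. The sketch below checks for the directory before copying; `copy_samples` is a hypothetical helper written for this example.

```shell
# copy_samples SRC DEST
# Copy the sample directory SRC into DEST if SRC exists;
# otherwise explain the likely cause and return a non-zero status.
copy_samples() {
    src=$1
    dest=$2
    if [ -d "$src" ]; then
        cp -r "$src" "$dest"
    else
        echo "$src not found: the samples come with the libcudnn7-doc .deb package" >&2
        return 1
    fi
}

# Example:
#   copy_samples /usr/src/cudnn_samples_v7 "$HOME"
```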



References


https://zhuanlan.zhihu.com/p/35509593
