Installing NVIDIA CUDA on CentOS 7 and Using CUDA in Docker

Preparation

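Prepare the installation packages

The steps below use three files: the driver installer NVIDIA-Linux-x86_64-381.22.run, the CUDA installer cuda_8.0.61_375.26_linux.run, and the cuDNN archive cudnn-8.0-linux-x64-v6.0.tgz.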

Install the base environment

# Check the GPU
$ lspci | grep -i vga
04:00.0 VGA compatible controller: NVIDIA Corporation Device 1b00 (rev a1)
# Check the OS version and confirm it is supported (a 64-bit Linux system is required)
$ uname -m && cat /etc/*release
x86_64
CentOS Linux release 7.2.1511 (Core)
# Install GCC
$ yum install gcc gcc-c++
# Install the kernel headers and development packages
$ yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
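
The GPU check above can also be scripted. The following is a minimal sketch, assuming `lspci` (from pciutils) is on the PATH. The helper names `find_nvidia_devices` and `run_lspci` are hypothetical. As an addition to the `grep -i vga` check, the filter also accepts lines labelled "3D controller", which is how some NVIDIA compute cards are listed.

```python
import subprocess


def find_nvidia_devices(lspci_output):
    """Return the lspci lines that describe an NVIDIA VGA or 3D controller."""
    matches = []
    for line in lspci_output.splitlines():
        lowered = line.lower()
        is_gpu_class = "vga" in lowered or "3d controller" in lowered
        if "nvidia" in lowered and is_gpu_class:
            matches.append(line.strip())
    return matches


def run_lspci():
    """Run lspci and return its text output (raises if lspci is missing)."""
    result = subprocess.run(
        ["lspci"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

On the target machine, `find_nvidia_devices(run_lspci())` should return a non-empty list before the driver installation is attempted.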

Install the GPU driver

Run the driver installer

$ sh NVIDIA-Linux-x86_64-381.22.run
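  • Start the installation
  • Choose Accept on the license screen
  • Wait for the "Building kernel modules" progress bar to finish
  • When asked about the 32-bit compatibility libraries, be sure to choose NO; otherwise errors occur later.
  • On the X configuration page, choose YES
  • Accept the defaults for all remaining prompts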

Install CUDA

Run the CUDA installer

$ sh cuda_8.0.61_375.26_linux.run
# Accept the EULA
-------------------------------------------------------------
  Do you accept the previously read EULA?
accept/decline/quit: accept
# Answer no: the GPU driver (381.22) was already installed in the previous step
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
(y)es/(n)o/(q)uit: n
-------------------------------------------------------------
# For the remaining prompts, answer yes or accept the default
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: 
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: y
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-8.0 ]: 

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /root ]: 

Installing the NVIDIA display driver...
The driver installation has failed due to an unknown error. Please consult the driver installation log located at /var/log/nvidia-installer.log.

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-8.0
Samples:  Installed in /root, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-8.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_192.log
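
The installer warning above states that CUDA 8.0 needs a driver of version at least 361.00. A minimal sketch of that comparison follows; the function names are hypothetical, and it assumes driver versions are plain dotted numbers such as 381.22. The `nvidia-smi --query-gpu=driver_version --format=csv,noheader` call is one way to read the installed version and is only run when `query_driver_version` is called.

```python
import subprocess


def parse_version(text):
    """Turn a dotted version such as '381.22' into a tuple of ints: (381, 22)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed, minimum="361.00"):
    """Return True if the installed driver version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def query_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU it reports."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

With the 381.22 driver installed earlier, `driver_meets_minimum("381.22")` is true, so skipping the bundled 375.26 driver is consistent with the installer's requirement.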

Verify the installation

# Add the environment variables
# Append the following two lines to the end of ~/.bashrc
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# Apply the changes to the current shell
$ source ~/.bashrc
# Verify the installation
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22                 Driver Version: 381.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 0000:02:00.0     Off |                  N/A |
| 21%   50C    P8    33W / 265W |      8MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
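
The `nvcc -V` check above can be automated by parsing the release line. This is a minimal sketch, assuming the output contains a line of the form shown above ("release 8.0, V8.0.61"); the function name `parse_nvcc_release` is hypothetical.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (release, full_version) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", nvcc_output)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

A return value of `None` usually means `nvcc` is not the CUDA compiler found on the PATH, which points back to the `~/.bashrc` settings.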

Install cuDNN

$ tar -xvzf cudnn-8.0-linux-x64-v6.0.tgz
$ cp -P cuda/include/cudnn.h /usr/local/cuda-8.0/include
$ cp -P cuda/lib64/libcudnn* /usr/local/cuda-8.0/lib64
$ chmod a+r /usr/local/cuda-8.0/include/cudnn.h /usr/local/cuda-8.0/lib64/libcudnn*
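
To confirm which cuDNN version was copied, the version macros in the header can be read back. This is a minimal sketch, assuming `cudnn.h` defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` (as cuDNN 6 does); the function name `read_cudnn_version` is hypothetical.

```python
import re


def read_cudnn_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None if absent."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as: #define CUDNN_MAJOR 6
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)
```

Usage on the machine would look like `read_cudnn_version(open("/usr/local/cuda-8.0/include/cudnn.h").read())`.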

TensorFlow GPU test

Install the GPU build of TensorFlow and run the code below. If it runs without errors, CUDA was installed successfully.

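import tensorflow as tf

# Build a tiny graph and run it (TensorFlow 1.x API). With
# log_device_placement=True, TensorFlow logs the device used for each op.
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(hello))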
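
A session that starts without errors does not by itself show that a GPU was found. As a complementary check, the sketch below lists the GPU devices TensorFlow can see. It assumes a TensorFlow 1.x install, where `tensorflow.python.client.device_lib.list_local_devices()` is available; the helper names are hypothetical, and TensorFlow is imported only when `list_local_gpus` is called.

```python
def gpu_device_names(devices):
    """Return the names of the devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_local_gpus():
    """Ask TensorFlow which GPUs it can see (imports TensorFlow lazily)."""
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())
```

An empty list from `list_local_gpus()` suggests that the CPU-only build is installed or that the CUDA and cuDNN libraries are not on `LD_LIBRARY_PATH`.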

Using CUDA in Docker

This section describes how to use CUDA inside Docker, relying mainly on nvidia-docker.

Install Docker

$ yum install docker
# Start the Docker service and enable it at boot
$ systemctl start docker.service 
$ systemctl enable docker.service 

Install nvidia-docker

# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker-1.0.1-1.x86_64.rpm
sudo rpm -i /tmp/nvidia-docker*.rpm && rm /tmp/nvidia-docker*.rpm
sudo systemctl start nvidia-docker

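Write a Dockerfile that adds the labels

# Create a directory
$ mkdir dockerBuild
$ cd dockerBuild/
# Create the Dockerfile
$ vim Dockerfile

# Enter the following content

# FROM: the source Docker image
FROM {SOURCE_DOCKER_IMAGE_NAME}
# MAINTAINER: the author
MAINTAINER {your name}
# Tell nvidia-docker that the nvidia_driver volume is needed
LABEL com.nvidia.volumes.needed="nvidia_driver"
# CUDA version required by the image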
LABEL com.nvidia.cuda.version="8.0"

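# docker build: note the trailing dot, which sets the build context to the current directory so the Dockerfile there is used
$ docker build -t DOCKER_IMAGE_NAME .
# Build output (I used bamos/openface as the source image)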
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM {SOURCE_DOCKER_IMAGE_NAME}
 ---> 62d1673065e8
Step 2 : MAINTAINER {your name}
 ---> Running in b4dc9a88db63
 ---> 7a77f65d0908
Removing intermediate container b4dc9a88db63
Step 3 : LABEL com.nvidia.volumes.needed "nvidia_driver"
 ---> Running in 126095cdc342
 ---> c9035ebe54f8
Removing intermediate container 126095cdc342
Step 4 : LABEL com.nvidia.cuda.version "8.0"
 ---> Running in 12a0e5298d1e
 ---> 682db15bd6ca
Removing intermediate container 12a0e5298d1e
Successfully built 682db15bd6ca

# Check that the labels were added
$ docker inspect DOCKER_IMAGE_NAME 
......
"Labels": {
    "com.nvidia.cuda.version": "8.0",
    "com.nvidia.volumes.needed": "nvidia_driver",
}
......
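
The label check above can be scripted instead of read by eye. The following is a minimal sketch, assuming the JSON printed by `docker inspect` is a list whose first element stores the image labels under `Config.Labels`; the names `REQUIRED_LABELS` and `missing_labels` are hypothetical.

```python
import json

# The two labels added by the Dockerfile in this article.
REQUIRED_LABELS = {
    "com.nvidia.volumes.needed": "nvidia_driver",
    "com.nvidia.cuda.version": "8.0",
}


def missing_labels(inspect_json, required=REQUIRED_LABELS):
    """Return the required labels that are absent or carry a different value."""
    data = json.loads(inspect_json)
    labels = (data[0].get("Config") or {}).get("Labels") or {}
    return {key: value for key, value in required.items() if labels.get(key) != value}
```

An empty dictionary means both labels are present with the expected values; anything else lists what still has to be added to the Dockerfile.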

# Start the built image with nvidia-docker run
$ nvidia-docker run -it -d DOCKER_IMAGE_NAME /bin/bash 

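Install CUDA and cuDNN inside the container

nvidia-docker run makes the GPU devices and the GPU driver available inside the container, but that only makes the GPU usable; CUDA and cuDNN are still not installed there.

# Enter the container (pass either the container name or its ID)
$ docker attach CONTAINER_NAME/CONTAINER_ID
# Then follow the CUDA and cuDNN installation steps from earlier in this article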