Installing TensorFlow-GPU on Ubuntu 16.04

System: Ubuntu 16.04.5 desktop (on the server edition, parts 1 and 2 should not be needed)
GPU: NVIDIA GeForce GTX 1080 Ti
Official documentation: https://www.tensorflow.org/install/install_linux

1 Change Ubuntu's default runlevel to 3

1.1 Check the current runlevel

user@ubuntu:~$ runlevel 
N 5

1.2 Change the runlevel to 3

Edit /etc/default/grub:

user@ubuntu:~$ sudo vi /etc/default/grub
    Comment out the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash":
    # GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

    Change the line GRUB_CMDLINE_LINUX="" to:
    GRUB_CMDLINE_LINUX="text"

    Uncomment the line #GRUB_TERMINAL=console so that it reads:
    GRUB_TERMINAL=console

user@ubuntu:~$ sudo update-grub

user@ubuntu:~$ sudo systemctl set-default multi-user.target

Reboot the system:

user@ubuntu:~$ reboot
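
The three manual edits in step 1.2 can also be applied non-interactively. The following is a minimal sketch, assuming GNU sed and a stock /etc/default/grub that contains exactly the lines shown above. The function name set_grub_text_mode and the scratch path are hypothetical, and it is safest to try it on a copy first.

```shell
# Hypothetical helper: apply the three edits from step 1.2 to the file given as $1.
set_grub_text_mode() {
    grub_file="$1"
    sed -i \
        -e 's/^GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"/# &/' \
        -e 's/^GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="text"/' \
        -e 's/^#[[:space:]]*GRUB_TERMINAL=console/GRUB_TERMINAL=console/' \
        "$grub_file"
}

# Dry run on a copy (assumed scratch path), then compare with the original:
#   cp /etc/default/grub /tmp/grub.copy
#   set_grub_text_mode /tmp/grub.copy
#   diff /etc/default/grub /tmp/grub.copy
```

If the diff shows only the three expected changes, the same function can be run with sudo against /etc/default/grub, followed by sudo update-grub as in step 1.2.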

1.3 Verify

user@ubuntu:~$ runlevel 
N 3
user@ubuntu:~$

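1.4 Switching between command-line mode and the graphical interface

Command line --> graphical interface:

To enter the graphical interface just once (the system still boots into command-line mode after a restart), run: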

user@ubuntu:~$ sudo systemctl start lightdm

To make the system boot into the graphical interface by default, run:

user@ubuntu:~$ sudo systemctl set-default graphical.target

Then reboot the system:
user@ubuntu:~$ sudo reboot

Graphical interface --> command line:

To make the system boot into the command line by default, run:
user@ubuntu:~$ sudo systemctl set-default multi-user.target

Then reboot the system:
user@ubuntu:~$ sudo reboot
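
The runlevel numbers printed by runlevel and the systemd targets used with set-default describe the same states. As a small sketch of that mapping (the function name runlevel_to_target is hypothetical; the mapping itself follows systemd's runlevel aliases):

```shell
# Map a SysV runlevel number to the systemd target that replaces it.
runlevel_to_target() {
    case "$1" in
        2|3|4) echo multi-user.target ;;   # text mode
        5)     echo graphical.target ;;    # graphical interface
        *)     echo "unhandled runlevel: $1" >&2; return 1 ;;
    esac
}

# Example: `runlevel` prints "N 3"; the second field is the current runlevel.
#   runlevel_to_target "$(runlevel | awk '{print $2}')"
#   systemctl get-default    # should print the same target after set-default
```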

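2 Disable Ubuntu's bundled graphics driver (important)

Method 1:

The GPU build of TensorFlow requires an NVIDIA GPU. If Ubuntu's bundled graphics driver (Nouveau) is not disabled before the NVIDIA driver is updated, various problems can occur: an "X server is running" error, repeated prompts to reboot, or a driver that installs successfully but cannot be communicated with.

On the desktop edition of Ubuntu there is a simpler way. In Software & Updates, the Additional Drivers tab automatically detects the card and lists matching NVIDIA drivers; select one and apply the change.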

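Method 2:

Remove the Nouveau kernel driver (fixes the NVIDIA installation error)
Reference: https://tutorials.technology/tutorials/85-How-to-remove-Nouveau-kernel-driver-Nvidia-install-error.html
Introduction
Warning: this procedure may break your system. Back up the system before carrying out these steps.

If the Nouveau kernel driver is currently in use, installing the official NVIDIA driver fails with the error below. The following steps explain how to fix the error and install the official driver.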

ERROR: The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.  Please consult the NVIDIA driver README and
your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.

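2.1 Remove all nvidia packages

In this step we remove all nvidia-related packages. The pattern is quoted so that the shell passes it to apt-get unexpanded.

user@ubuntu:~$ sudo apt-get remove 'nvidia*' && sudo apt autoremove

If the pattern is left unquoted, a shell such as zsh may stop with the error below before apt-get runs. If no nvidia packages were ever installed there is nothing to remove, so this is harmless; otherwise rerun the command with the pattern quoted.

no matches found: nvidia*

Now install some required dependencies: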

user@ubuntu:~$ sudo apt-get install dkms build-essential linux-headers-generic
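
Before removing anything in step 2.1, it can help to see which nvidia packages are actually installed. A minimal sketch, assuming the usual dpkg -l column layout (status, name, version, ...); the function name list_nvidia_packages is hypothetical:

```shell
# Print the names of installed packages (status "ii") whose name starts with "nvidia".
# Reads `dpkg -l` output on stdin.
list_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia/ { print $2 }'
}

# Usage:
#   dpkg -l | list_nvidia_packages
```

An empty result suggests there is nothing for apt-get remove to do.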

2.2 Blacklist the nouveau driver

Now block and disable the nouveau kernel driver:

user@ubuntu:~$ sudo vim /etc/modprobe.d/blacklist.conf
# Add the following lines:

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
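
Instead of editing blacklist.conf by hand, the same five lines can be written to a separate file, since modprobe reads every .conf file under /etc/modprobe.d. This is only a sketch: the function name write_nouveau_blacklist and the file name blacklist-nouveau.conf are assumptions, not part of the original steps.

```shell
# Write the nouveau blacklist entries from step 2.2 to the file given as $1.
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
}

# Usage (write to a scratch file, inspect it, then install it as root):
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo install -m 644 /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
```

Keeping the entries in their own file makes them easy to remove later without touching the distribution's blacklist.conf.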

2.3 Update the initramfs

Run the following command to disable kernel mode setting for nouveau:

user@ubuntu:~$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf

Finally, rebuild the initramfs and reboot:

user@ubuntu:~$ sudo update-initramfs -u
user@ubuntu:~$ reboot
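
After the reboot it is worth confirming that the nouveau module is no longer loaded. A minimal sketch that inspects lsmod output (the function name nouveau_loaded is hypothetical):

```shell
# Succeeds (exit status 0) if a module named "nouveau" appears in
# `lsmod` output read from stdin, and fails otherwise.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded"
#   else
#       echo "nouveau is disabled"
#   fi
```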

3 Install NVIDIA CUDA 9.2 (driver and toolkit)

3.1 Install the dependencies (libGLU.so, libX11.so, libXi.so, libXmu.so)

user@ubuntu:~$ sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

3.2 Download the NVIDIA CUDA 9.2 runfile installer

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal

3.3 Run the NVIDIA CUDA 9.2 installer

user@ubuntu:/data/tools$ sudo sh cuda_9.2.148_396.37_linux.run.37_linux
......
......
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 396.37?
(y)es/(n)o/(q)uit: y

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: 

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: 

Install the CUDA 9.2 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-9.2 ]: 

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.2 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /home/user ]: 

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.2 ...
Installing the CUDA Samples in /home/user ...
Copying samples to /home/user/NVIDIA_CUDA-9.2_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-9.2
Samples:  Installed in /home/user

Please make sure that
 -   PATH includes /usr/local/cuda-9.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.2/lib64, or, add /usr/local/cuda-9.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.2/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_18869.log
user@ubuntu:/data/tools$

3.4 Add environment variables

user@ubuntu:~$ vim ~/.bashrc 
# add cuda
export PATH=${PATH}:/usr/local/cuda-9.2/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda-9.2/lib64:/usr/local/cuda/extras/CUPTI/lib64
user@ubuntu:~$ source ~/.bashrc
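
Because ~/.bashrc is sourced by every new shell, plain appends can make PATH grow with duplicates. One possible way to avoid that is a small helper that appends a directory only when it is missing. This is a sketch, not part of the CUDA installer's instructions; the function name append_path is hypothetical, and the two directories are the ones named in the installer summary in step 3.3.

```shell
# Append directory $2 to the colon-separated list $1 unless it is already present.
append_path() {
    list="$1"
    dir="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;               # already present
        *)          printf '%s\n' "${list:+$list:}$dir" ;; # append, no leading colon
    esac
}

PATH=$(append_path "$PATH" /usr/local/cuda-9.2/bin)
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda-9.2/lib64)
export PATH LD_LIBRARY_PATH
```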

3.5 Display GPU information

user@ubuntu:/data/tools$ nvidia-smi
Fri Sep 14 15:09:33 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   41C    P5    37W / 300W |      0MiB / 11176MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
user@ubuntu:/data/tools$
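
To check the driver version from a script rather than by eye, the banner line of the nvidia-smi output can be parsed. A minimal sketch, assuming the banner format shown above (the function name driver_version is hypothetical):

```shell
# Print the driver version found in `nvidia-smi` banner output read from stdin.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage:
#   nvidia-smi | driver_version
```

For the output above this should give the version the installer reported in step 3.3.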

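4 Install NVIDIA cuDNN

cuDNN provides GPU acceleration for deep learning.
Before installing cuDNN, make sure CUDA and the NVIDIA driver are installed correctly.

4.1 Download (requires logging in with an NVIDIA account)

https://developer.nvidia.com/cudnn
Choose the cuDNN build that matches your system and CUDA version.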
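
The cuDNN .deb file names carry the CUDA version they were built for, so a quick check can catch a mismatched download. A minimal sketch, assuming the naming pattern of the files used later in this post (name_version+cudaX.Y_arch.deb); the function name deb_cuda_version is hypothetical:

```shell
# Extract the CUDA version tag from a cuDNN .deb file name,
# e.g. libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb -> 9.2
deb_cuda_version() {
    printf '%s\n' "$1" | sed -n 's/.*+cuda\([0-9][0-9.]*\)_.*/\1/p'
}

# Usage: compare against the toolkit installed in step 3.3.
#   [ "$(deb_cuda_version libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb)" = "9.2" ] && echo "versions match"
```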

4.2 Install cuDNN from the deb packages

user@ubuntu:/data/tools$ ll
total 1952872
drwxr-xr-x 3 user user        269 Sep 14 13:25 ./
drwxr-xr-x 3 user user         19 Sep 14 10:21 ../
-rw-rw-r-- 1 user user 1757268179 Sep 14 10:25 cuda_9.2.148_396.37_linux.run.37_linux
-rw-rw-r-- 1 user user  123377766 Sep 14 13:25 libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb
-rw-rw-r-- 1 user user  114154210 Sep 14 10:22 libcudnn7-dev_7.2.1.38-1+cuda9.2_amd64.deb
-rw-rw-r-- 1 user user    4914818 Sep 14 13:25 libcudnn7-doc_7.2.1.38-1+cuda9.2_amd64.deb

user@ubuntu:/data/tools$ sudo dpkg -i libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb
Selecting previously unselected package libcudnn7.
(Reading database ... 249019 files and directories currently installed.)
Preparing to unpack libcudnn7_7.2.1.38-1+cuda9.2_amd64.deb ...
Unpacking libcudnn7 (7.2.1.38-1+cuda9.2) ...
Setting up libcudnn7 (7.2.1.38-1+cuda9.2) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...

user@ubuntu:/data/tools$ sudo dpkg -i libcudnn7-dev_7.2.1.38-1+cuda9.2_amd64.deb
(Reading database ... 249025 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.2.1.38-1+cuda9.2_amd64.deb ...
Unpacking libcudnn7-dev (7.2.1.38-1+cuda9.2) over (7.2.1.38-1+cuda9.2) ...
Setting up libcudnn7-dev (7.2.1.38-1+cuda9.2) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode

user@ubuntu:/data/tools$ sudo dpkg -i libcudnn7-doc_7.2.1.38-1+cuda9.2_amd64.deb
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 249025 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.2.1.38-1+cuda9.2_amd64.deb ...
Unpacking libcudnn7-doc (7.2.1.38-1+cuda9.2) ...
Setting up libcudnn7-doc (7.2.1.38-1+cuda9.2) ...
user@ubuntu:/data/tools$

4.3 Verify that cuDNN works

user@ubuntu:/data/tools$ cp -r /usr/src/cudnn_samples_v7 $HOME
user@ubuntu:/data/tools$ cd $HOME/cudnn_samples_v7/mnistCUDNN
user@ubuntu:~/cudnn_samples_v7/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o  -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm

user@ubuntu:~/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7201 , CUDNN_VERSION from cudnn.h : 7201 (7.2.1)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 28  Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 11176, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.110592 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.110592 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.147328 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.327680 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.494592 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9×××88 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.104448 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.113632 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.151552 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.323584 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.495616 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

If the installation succeeded, the message "Test passed!" is printed.
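
The sample output is long, so the check can be automated. The following sketch counts the precision tests that were started and the ones that passed, and succeeds only if the two numbers agree. It assumes the output format shown above (a "Testing ... precision" header per test and a "Test passed!" line on success); the function name cudnn_sample_ok is hypothetical.

```shell
# Succeeds if every precision test in mnistCUDNN output (read from stdin) passed.
cudnn_sample_ok() {
    awk '
        /^Testing (single|half) precision/ { started++ }
        /^Test passed!/                    { passed++ }
        END { exit !(started > 0 && started == passed) }
    '
}

# Usage:
#   ./mnistCUDNN | cudnn_sample_ok && echo "cuDNN sample OK"
```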

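5 [Optional] Install NVIDIA TensorRT 3.0

To optimize inference performance, you can also install NVIDIA TensorRT 3.0. Note that the repository package downloaded below is the 4.0.1 build for CUDA 9.2, not 3.0. The minimal set of TensorRT runtime components needed with the prebuilt tensorflow-gpu package can be installed as follows:

5.1 Download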

user@ubuntu:~/cudnn_samples_v7/mnistCUDNN$ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb

5.2 Install

user@ubuntu:~/cudnn_samples_v7/mnistCUDNN$ sudo dpkg -i nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb
Selecting previously unselected package nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2.
(Reading database ... 249144 files and directories currently installed.)
Preparing to unpack nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb ...
Unpacking nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2 (1-1) ...
Setting up nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2 (1-1) ...

The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/nvinfer-runtime-trt-repo-4.0.1-ga-cuda9.2/7fa2af80.pub

user@ubuntu:/data/tools$ sudo dpkg -i nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb
(Reading database ... 249154 files and directories currently installed.)
Preparing to unpack nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2_1-1_amd64.deb ...
Unpacking nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2 (1-1) over (1-1) ...
Setting up nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.2 (1-1) ...

user@ubuntu:/data/tools$ sudo apt-get update
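
The first dpkg run above asks for the CUDA GPG key to be added, but the transcript does not show that command being executed. A sketch of the two follow-up commands, wrapped so that they are only printed unless DRY_RUN=0 is set. The key path is copied from the dpkg message; the helper names run and register_trt_repo are hypothetical.

```shell
# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Key path as printed by dpkg in step 5.2.
TRT_KEY=/var/nvinfer-runtime-trt-repo-4.0.1-ga-cuda9.2/7fa2af80.pub

# Add the repository key, then refresh the package index.
register_trt_repo() {
    run sudo apt-key add "$TRT_KEY"
    run sudo apt-get update
}

register_trt_repo
```

Run the script with DRY_RUN=0 in the environment to execute the commands instead of printing them.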

6 Choose how to install TensorFlow

tensorflow-gpu
# To activate this environment, use:
# > source activate tensorflow
or
# > source activate tensorflow-gpu
#
# To deactivate an active environment, use:
# > source deactivate
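
The messages above come from creating a conda environment. The exact commands are not shown in this post, so the following is only a sketch of what a conda-based install could look like. It assumes conda is already installed, the Python version is just an example, and the function name print_tf_gpu_steps is hypothetical. The function prints the commands for review rather than running them. Prebuilt tensorflow-gpu packages are built against specific CUDA and cuDNN versions, so check the official install page for the combination matching your TensorFlow release.

```shell
# Print (do not run) the commands for a conda-based install
# into the environment named by $1 (default: tensorflow-gpu).
print_tf_gpu_steps() {
    env_name="${1:-tensorflow-gpu}"
    printf '%s\n' \
        "conda create -n $env_name python=3.6" \
        "source activate $env_name" \
        "pip install tensorflow-gpu" \
        "python -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'"
}

print_tf_gpu_steps tensorflow-gpu
```

The last printed command is a quick check from Python that TensorFlow 1.x can see the GPU.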

References

https://blog.csdn.net/Jonms/article/details/79318566