[Ubuntu] Deep learning environment setup: NVIDIA 1080 + CUDA 9.0 + cuDNN + tensorflow-gpu 1.6.0 + conda

1. Install Miniconda

wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda-1.6.0-Linux-x86_64.sh

bash Miniconda-1.6.0-Linux-x86_64.sh

Remember to switch to a mirror so that packages download and install faster:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

 

2. Install the NVIDIA driver

sudo apt-get remove --purge 'nvidia*'
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-390

nvidia-smi

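[If the nvidia-390 .deb download keeps breaking off during installation, you can copy the link, download the package manually, and then install its dependencies.]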

Get:1 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu xenial/main amd64 nvidia-390 amd64 390.77-0ubuntu0~gpu16.04.1 [74.1 MB]
Err:1 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu xenial/main amd64 nvidia-390 amd64 390.77-0ubuntu0~gpu16.04.1
  Hash Sum mismatch
Fetched 18.3 MB in 22s (819 kB/s)                                              
E: Failed to fetch http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu/pool/main/n/nvidia-graphics-drivers-390/nvidia-390_390.77-0ubuntu0~gpu16.04.1_amd64.deb  Hash Sum mismatch

# Download
wget http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu/pool/main/n/nvidia-graphics-drivers-390/nvidia-390_390.77-0ubuntu0~gpu16.04.1_amd64.deb
# Install (wget saves the file in the current directory)
sudo dpkg -i nvidia-390_390.77-0ubuntu0~gpu16.04.1_amd64.deb
# Install the missing dependencies
sudo apt install -f -y
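
A "Hash Sum mismatch" means the downloaded bytes did not match the checksum apt expected, so it may be worth hashing a manually downloaded file before installing it. Below is a minimal Python sketch. The function name `sha256_of` is made up for illustration, and the assumption is that you compare the result by eye against the SHA256 field printed by `apt-cache show nvidia-390`:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so a large .deb is never loaded into memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `print(sha256_of("nvidia-390_390.77-0ubuntu0~gpu16.04.1_amd64.deb"))` prints the digest of the file fetched by the wget command above. If it differs from the expected value, download the file again instead of installing it.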

 

--------

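Reboot the computer and check whether the driver was installed correctly.

1> If Ubuntu gets stuck in a login loop, the driver was definitely not installed correctly. Press Ctrl+Alt+F1 to switch to a console, uninstall the driver, and install it again.

sudo apt-get remove --purge 'nvidia*'

2> If nvidia-smi prints: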

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

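Either of the situations above means the graphics driver was not installed properly. The most likely cause is that the Ubuntu kernel is too old, so try upgrading the kernel and then retry. The author ran into this several times, and upgrading the kernel fixed it each time.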

3> Upgrade the kernel

Pick the kernel you want to upgrade to from this page; anything newer than 4.10 will do: http://kernel.ubuntu.com/~kernel-ppa/mainline/

# Check the kernel version
uname -sr
# The author's version before the upgrade: Linux 4.4.0-131-generic

wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12.9/linux-headers-4.12.9-041209_4.12.9-041209.201708242344_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12.9/linux-headers-4.12.9-041209-generic_4.12.9-041209.201708242344_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12.9/linux-image-4.12.9-041209-generic_4.12.9-041209.201708242344_amd64.deb
sudo dpkg -i *.deb

sudo reboot
uname -sr

# The author's version after the upgrade: Linux 4.12.9-041209-generic
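
The "newer than 4.10" rule can also be checked programmatically. Below is a small Python sketch with made-up helper names. It assumes the release string starts with `major.minor`, as in the `uname` output above, and it reads "newer than 4.10" strictly, so a 4.10.x kernel counts as too old:

```python
import re


def kernel_version(release):
    """Extract (major, minor) from a release string such as '4.4.0-131-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def newer_than_4_10(release):
    """True when the kernel is newer than 4.10 (strict comparison on major.minor)."""
    return kernel_version(release) > (4, 10)
```

On a running system the release string can be obtained with `platform.release()`, which returns the same value that `uname -r` prints. Passing it to `newer_than_4_10` tells you whether a kernel upgrade is worth trying first.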

4> Install the graphics driver again

sudo apt-get install nvidia-390

nvidia-smi
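
If you want to script the check rather than read the nvidia-smi output by eye, something like the following Python sketch may help. The function name `driver_status` and its three return values are hypothetical. It also assumes that nvidia-smi exits with a non-zero status when it cannot communicate with the driver:

```python
import shutil
import subprocess


def driver_status(command="nvidia-smi"):
    """Return 'missing', 'failed', or 'ok' for the given driver query command."""
    if shutil.which(command) is None:
        return "missing"  # the binary is not on PATH, so the driver tools are absent
    result = subprocess.run([command], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # A zero exit status is taken to mean the tool reached the kernel driver
    return "ok" if result.returncode == 0 else "failed"
```

A result of `'failed'` corresponds to case 2> above and would be the cue to try the kernel upgrade. A result of `'missing'` suggests the driver package itself never got installed.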

If it still does not work, leave a comment and the author can help take a look.

3. Install CUDA 9.0

Open the archive page and choose the options that match your system: https://developer.nvidia.com/cuda-90-download-archive

Download URL for the Ubuntu version: https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run

wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run

sudo bash cuda_9.0.176_384.81_linux-run

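The installer asks for several confirmations. Note that the bundled 384.81 driver is declined, because nvidia-390 is already installed. The session looks like this: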

Description

The NVIDIA CUDA Toolkit provides command-line and graphical
tools for building, debugging and optimizing the performance
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-9.0 ]: 

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /home/cqc ]: 

Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...

 

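If no errors are reported, the installation succeeded. Next, set the environment variables by appending the following lines to ~/.bashrc:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

Then reload the file so the current shell picks them up:

source ~/.bashrc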
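
The `${PATH:+:${PATH}}` form in the export lines can look cryptic. It appends `:` plus the old value only when the variable is already set and non-empty, which avoids a stray trailing colon (the dynamic loader treats an empty entry in LD_LIBRARY_PATH as the current directory). The Python sketch below mirrors that behaviour; `prepend_path` and `has_entry` are illustrative names, not part of any tool:

```python
def prepend_path(new_dir, existing):
    """Mimic NEW${VAR:+:${VAR}}: add ':existing' only when existing is non-empty."""
    return new_dir + (":" + existing if existing else "")


def has_entry(path_value, wanted):
    """True when `wanted` is one of the colon-separated entries of `path_value`."""
    return wanted in path_value.split(":")
```

After reloading ~/.bashrc, calling `has_entry(os.environ.get("PATH", ""), "/usr/local/cuda/bin")` from a Python session (with `import os`) is one way to confirm that the export took effect.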

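4. Install cuDNN 7.1

Download page: https://developer.nvidia.com/rdp/cudnn-download (registration is required to open it). I copied the direct download link below, so if you are also using CUDA 9.0 you can download it directly.

linux_64 download link: https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.1.4/prod/9.0_20180516/cudnn-9.0-linux-x64-v7.1

# Save the archive under a .tgz name so that the tar command below finds it
wget -O cudnn-9.0-linux-x64-v7.1.tgz https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.1.4/prod/9.0_20180516/cudnn-9.0-linux-x64-v7.1
tar -zxvf cudnn-9.0-linux-x64-v7.1.tgz -C /tmp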
cd /tmp
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
rm -rf cuda/
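
To confirm which cuDNN version ended up in /usr/local/cuda, you can read the version macros from the copied header. The sketch below assumes the cuDNN 7.x layout, where cudnn.h defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`. The function name `cudnn_version` is made up for this example:

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as '#define CUDNN_MAJOR 7' at the start of a line
        match = re.search(r"^#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

Typical usage would be `cudnn_version(open("/usr/local/cuda/include/cudnn.h").read())`. For the archive used in this post, the expected result is the 7.1.4 version named in the download URL.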

5. Install tensorflow-gpu

source activate base
conda install tensorflow-gpu==1.6.0
# If conda install keeps dropping the connection or otherwise fails, try switching mirrors or installing with pip

conda install pip
python3 -m pip install tensorflow-gpu==1.6.0

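Package downloads may break off frequently during installation, so remember to configure a mirror inside China. The Tsinghua mirror is used here: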

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
conda install tensorflow-gpu==1.6.0

Verify that TensorFlow is using the GPU build:

from tensorflow.python.client import device_lib as _device_lib
_device_lib.list_local_devices()

If the printed output includes GPU information, as in the session below, the installation succeeded and you can start computing on the GPU.

python3

Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> from tensorflow.python.client import device_lib as _device_lib
>>> local_device_protos = _device_lib.list_local_devices()

2018-07-30 16:38:08.863925: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-30 16:38:09.086530: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-30 16:38:09.086927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.455
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 750.94MiB
2018-07-30 16:38:09.086962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-07-30 16:38:15.853383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/device:GPU:0 with 471 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
>>> 
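
Instead of scanning the log lines for "GPU", the device list can be filtered directly. Below is a minimal sketch with illustrative function names. It relies only on the `name` and `device_type` fields of the entries that `list_local_devices()` returns, and it imports TensorFlow lazily so the helper can be tried out without a GPU:

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpu_names():
    """List GPUs visible to TensorFlow; requires tensorflow-gpu to be installed."""
    # Imported inside the function so that defining it does not need TensorFlow
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())
```

An empty list from `local_gpu_names()` would indicate that TensorFlow fell back to the CPU, in which case the driver, CUDA and cuDNN steps above are the places to look.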

 
