Ubuntu 18 + RTX 2080 Ti + CUDA 10 + cuDNN 7 + TensorFlow 2.0

Operating system

Ubuntu 18

Preparing the system environment

#Disable the Nouveau Drivers
sudo vim /etc/modprobe.d/blacklist-nouveau.conf
#Add the following two lines to the file
blacklist nouveau
options nouveau modeset=0
#Regenerate the kernel initramfs
sudo update-initramfs -u
reboot
#sudo apt-get install gcc
sudo apt-get install build-essential
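
After the reboot it can be worth confirming that the blacklist took effect before running the NVIDIA installer. The sketch below is not part of the original steps; `nouveau_loaded` is a hypothetical helper that scans `lsmod`-style output on standard input for a module named `nouveau`.

```shell
# Hypothetical helper (not from the original steps): succeeds when the
# lsmod-style text on stdin lists a module named exactly "nouveau".
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Run the check only where lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded; recheck the blacklist file and the initramfs"
    else
        echo "nouveau is not loaded"
    fi
fi
```

If the module is still listed, the blacklist file name or the `update-initramfs` step is the first thing to recheck.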

https://developer.nvidia.com/rdp/cudnn-download

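Installing the driver, CUDA, and cuDNN

Reference tried: https://www.jianshu.com/p/d90f5f876de0
The CUDA installer can install the GPU driver at the same time; the steps here install the driver separately first.

#Install the NVIDIA driver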
#Link: https://www.nvidia.com/Download/index.aspx

chmod +x NVIDIA-Linux-x86_64-410.78.run
sudo init 3
sudo ./NVIDIA-Linux-x86_64-410.78.run
reboot
nvidia-smi
nvidia-settings
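
To check the installed driver against a minimum version rather than reading the `nvidia-smi` table by eye, something like the following sketch may help. `version_at_least` is a hypothetical helper that relies on GNU `sort -V`; the threshold 410.48 is taken from the driver version in the CUDA runfile name used in this post and is an assumption about what you want to compare against.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2,
# using GNU sort's version ordering (-V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Query the installed driver version only where nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_at_least "$driver" 410.48; then
        echo "driver $driver is new enough"
    else
        echo "driver $driver is older than 410.48"
    fi
fi
```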


#If problems occur, purge the packaged NVIDIA drivers and rerun the installer:
sudo apt-get remove --purge 'nvidia*'
sudo ./NVIDIA-Linux-x86_64-410.78.run --no-opengl-files




#Install CUDA 10
#Cuda link: https://developer.nvidia.com/cuda-downloads
#Note: answer no at the driver prompt, because the driver is already installed

sudo chmod +x cuda_10.0.130_410.48_linux.run cuda_10.0.130.1_linux.run
sudo sh cuda_10.0.130_410.48_linux.run

-----------------
Do you accept the previously read EULA?
accept/decline/quit:    accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: no

Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: yes

Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: yes

Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: yes

Enter CUDA Samples Location
 [ default is /home/tt01 ]:

Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

Installing the CUDA Samples in /home/tt01 ...
Copying samples to /home/tt01/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Installed in /home/tt01, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_1987.log
tt01@tt01-virtual-machine:~/nfs/tools/cuda10.0$

#Install the CUDA 10.0 patch
sudo sh cuda_10.0.130.1_linux.run

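#Edit ~/.bashrc (no sudo needed for your own file)
vim ~/.bashrc
#Add the following three lines to the file
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
#Reload the shell configuration
source ~/.bashrc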
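
The `${PATH:+:${PATH}}` form in the exports above appends a colon and the old value only when the variable is already set and non-empty. This avoids a dangling `:`, which would leave an empty entry that is treated as the current directory. A small sketch of the expansion; `prepend_path` is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: print directory $1 prepended to the colon-separated
# list $2, adding the separator only when $2 is non-empty.
prepend_path() {
    dir=$1
    old=$2
    printf '%s\n' "${dir}${old:+:${old}}"
}

prepend_path /usr/local/cuda/lib64 ""           # prints /usr/local/cuda/lib64
prepend_path /usr/local/cuda/bin /usr/bin:/bin  # prints /usr/local/cuda/bin:/usr/bin:/bin
```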

nvcc -V  #Check the CUDA installation; output like the following means it is working
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


#Install cuDNN
#cuDNN link: https://developer.nvidia.com/rdp/cudnn-download
sudo dpkg -i libcudnn7*
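
One way to confirm which cuDNN version ended up installed is to read the version macros from its header. This is a sketch under assumptions: it presumes the `-dev` package was among the installed `.deb` files and that the header is reachable at `/usr/include/cudnn.h`; `cudnn_version` is a hypothetical helper, not an NVIDIA tool.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from the cuDNN header
# passed as $1, based on its CUDNN_* #define lines.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# Assumed header location for the .deb install; adjust if it differs.
header=/usr/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version "$header"
fi
```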

Reference: https://github.com/Tsai-Hyun-Joong/Ubuntu-18.04-RTX-2080-Ti-Driver-cuda-10.0-cudnn-7.0-tensorflow-gpu-Anaconda-Tutorial

Problem: the GPU driver is occasionally lost after a system reboot

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

#Fix: reinstall the GPU driver
sudo init 3
sudo apt-get remove --purge 'nvidia*'
sudo ./NVIDIA-Linux-x86_64-410.78.run --no-opengl-files

Anaconda

#Install Anaconda (Miniconda is recommended)
bash ./Anaconda3-5.3.1-Linux-x86_64.sh
#yes
#Press Enter
#Press Enter
source ~/.bashrc
conda config --set auto_activate_base false

Installing TensorFlow 2.0

https://github.com/tensorflow/tensorflow

conda update conda
conda create -n tf python=3.6
conda activate tf

#Install numpy, pillow, and tensorflow_gpu
pip install --upgrade pip
pip install -U numpy==1.17.2
conda install pillow
pip install tensorflow_gpu==2.0.0
#Alternatively, install from a downloaded wheel:
#pip install tensorflow_gpu-2.0.0-cp36-cp36m-manylinux2010_x86_64.whl

#If pip reports the following error:
ERROR: tensorboard 2.0.0 has requirement setuptools>=41.0.0, but you'll have setuptools 36.4.0 which is incompatible.
#Reinstall setuptools
pip install --ignore-installed setuptools==41.0.0
#Reinstall tensorflow_gpu
pip install --ignore-installed tensorflow_gpu-2.0.0-cp36-cp36m-manylinux2010_x86_64.whl


#Fix the libcupti problem
find /usr/local/cuda/ -name libcupti.so.10.0
#Output:
/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.10.0
#Then edit ~/.bashrc so that LD_LIBRARY_PATH also includes the CUPTI directory
vim ~/.bashrc
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
source ~/.bashrc

Reference: TensorFlow common problems (updated irregularly)

https://github.com/tensorflow/tensorflow

python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'
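
The session above shows that TensorFlow imports and runs, but not that it sees the GPU. A minimal sketch for checking that from the shell follows. `tf_gpu_count` is a hypothetical helper, and it assumes the `tf` conda environment is active so that `python` resolves to the interpreter with `tensorflow_gpu` 2.0.0 installed.

```shell
# Hypothetical helper: ask TensorFlow how many GPUs it can see.
# Assumes `python` is the interpreter of the activated conda env.
tf_gpu_count() {
    python - <<'EOF'
import tensorflow as tf
# TF 2.0 exposes device listing under tf.config.experimental.
gpus = tf.config.experimental.list_physical_devices('GPU')
print(len(gpus))
EOF
}
```

Calling `tf_gpu_count` should print the number of visible GPUs; a result of 0 usually points back at the driver, CUDA, or `LD_LIBRARY_PATH` steps.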