Installing TensorFlow and PyTorch on Ubuntu and CentOS: A Tutorial

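Installing TensorFlow on Ubuntu

Install Ubuntu 16.04 LTS

Download: http://releases.ubuntu.com/16.04/

After the Ubuntu ISO has downloaded, write it to a USB stick with UltraISO to create a bootable installer, set the machine to boot from USB, and the installation starts after a reboot (on DELL desktops, F12 opens the boot menu). Two points need special attention when making the bootable USB stick:

  1. Prefer a USB 2.0 port; some motherboards do not natively support booting from USB 3.0 ports

  2. For a newly installed disk (for example, in a new computer), and especially for an SSD, format the whole disk before installing; otherwise partitioning may fail during installation and the reported disk capacity may be wrong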

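Install CUDA Toolkit 9.0 and cuDNN 7.1

References:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

https://docs.nvidia.com/cuda/archive/9.0/cuda-installation-guide-linux/index.html (recommended)

Install CUDA Toolkit 9.0

  • Confirm that the machine has an NVIDIA GPU:

$ lspci | grep -i nvidia

  • Check the gcc version with the command below and make sure it is 4.xx

$ gcc --version

  • If the current gcc is 5.xx, it is best to downgrade to 4.xx as follows:

    1. Install gcc/g++ 4.9.x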

$ sudo apt-get install -y gcc-4.9

$ sudo apt-get install -y g++-4.9

    2. Relink gcc/g++ to complete the downgrade

$ cd /usr/bin

$ sudo rm gcc

$ sudo ln -s gcc-4.9 gcc

$ sudo rm g++

$ sudo ln -s g++-4.9 g++

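  • CUDA Toolkit 9.0 download:

https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604

After downloading the deb package, install it using the steps listed on the download page.

Then add /usr/local/cuda-9.0/bin to the PATH environment variable. On Ubuntu, this is done in /etc/profile.

You also need to install cuda-command-line-tools; once it is installed, add its path to /etc/profile as well.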
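The PATH change described above might look like the following sketch. The helper name `path_prepend` is hypothetical and not part of CUDA or Ubuntu; the two directories are the ones this guide uses in its /etc/profile listing. It only prepends the directory when it is not already present, so sourcing the file repeatedly does not grow PATH.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                     # already present, leave PATH unchanged
        *) PATH="$1${PATH:+:$PATH}" ;;   # otherwise put it at the front
    esac
}

# CUDA 9.0 binaries (nvcc and friends).
path_prepend /usr/local/cuda-9.0/bin
export PATH

# CUPTI libraries that ship with cuda-command-line-tools.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/local/cuda-9.0/extras/CUPTI/lib64"
```

After logging in again, `echo "$PATH"` should list /usr/local/cuda-9.0/bin.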

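Start the persistence daemon and generate a local copy of the CUDA samples:

Notes:

  1. After installing the CUDA Toolkit, reconnect the SSH session (log out and back in); otherwise cuda-install-samples-9.0.sh will not be found

  2. The original screenshot for this step shows 9.1, but the version that applies here is 9.0

At this point, running nvcc should confirm that the toolkit works.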
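As a rough way to double-check the state this section aims for (gcc in the 4.x series, nvcc reachable on PATH), a small script along these lines could be used. The helper name `gcc_is_4x` is hypothetical; `gcc -dumpversion` and `nvcc --version` are the standard version queries of those tools.

```shell
# Hypothetical helper: succeeds when a gcc version string is in the 4.x series.
gcc_is_4x() {
    case "$1" in
        4.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Query each tool only if it is installed, so the script is safe to run anywhere.
if command -v gcc >/dev/null 2>&1; then
    gcc_ver="$(gcc -dumpversion)"
    if gcc_is_4x "$gcc_ver"; then
        echo "gcc $gcc_ver: in the 4.x series"
    else
        echo "gcc $gcc_ver: not 4.x, see the downgrade steps above"
    fi
fi

if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found on PATH"
fi
```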

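Install cuDNN 7.1

Building TensorFlow from source requires cuDNN. Version 7.1 is used here; download it from:

https://developer.nvidia.com/rdp/cudnn-download

The download requires a registered account on the NVIDIA site (registration does not always go smoothly). After logging in, choose the version that matches your setup:

Extract the archive:

tar -xvf cudnn-9.0-linux-x64-v7.1.tgz

Then copy the header and library files to the corresponding paths: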

sudo cp cuda/include/cudnn.h /usr/local/cuda-9.0/include/

sudo cp cuda/lib64/libcudnn* /usr/local/cuda-9.0/lib64/

sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h

sudo chmod a+r /usr/local/cuda-9.0/lib64/libcudnn*
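To confirm which cuDNN version was copied, the version macros in cudnn.h can be read back. The sketch below assumes the cuDNN 7.x header layout, where `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` are defined directly in cudnn.h; the helper name `cudnn_version` is hypothetical.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL read from a cudnn.h file.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$1"
}

# Header location used by the copy commands above.
header=/usr/local/cuda-9.0/include/cudnn.h
if [ -r "$header" ]; then
    echo "cuDNN version: $(cudnn_version "$header")"
fi
```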

Install Anaconda 3 x64

Why install Anaconda? Because it lets you manage multiple Python versions side by side.

Download: https://www.anaconda.com/download/#linux

Anaconda is typically installed in a shared directory on Ubuntu, /usr/local/data/anaconda3/. Then add /usr/local/data/anaconda3/bin/ to /etc/profile:

# cuda
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/extras/CUPTI/lib64
# anaconda
PATH=/usr/local/data/anaconda3/bin:$PATH
# bazel
PATH=/usr/local/data/bazel-0.13.0/lib/bazel/bin:$PATH
export PATH

  • To speed up package downloads, add the Anaconda mirrors in China:

sudo conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

sudo conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/

sudo conda config --set show_channel_urls yes

Install Bazel, Google's build tool

Reference: https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu

Notes:

Step 1: the Python referred to there is the 3.6 installed with Anaconda3

Steps 2-3: in practice, just run sudo bash bazel-\

Build TensorFlow 1.7.1

Reference: https://www.tensorflow.org/install/install_sources#ConfigureInstallation

Configure the installation

$ cd tensorflow # cd to the top-level directory created

$ ./configure

Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python2.7/dist-packages]
Using python library path: /usr/local/lib/python2.7/dist-packages
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] Y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.0
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0
Do you wish to build TensorFlow with MPI support? [y/N]
MPI support will not be enabled for TensorFlow
Configuration finished

Build and install the TensorFlow pip package (whl)

http://cwiki.apachecn.org/pages/viewpage.action?pageId=10029599

[Note]

bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package --verbose_failures

The build takes a long time; if no errors are reported along the way, it succeeded.
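Once the bazel build has finished, the usual next step in the TensorFlow source-install procedure is to run the generated build_pip_package script to produce a wheel and then install it with pip. The following is a minimal sketch of that step: the output directory /tmp/tensorflow_pkg is an assumption, and `latest_wheel` is a hypothetical helper that simply picks the newest wheel in that directory.

```shell
# Hypothetical helper: print the newest TensorFlow wheel in a directory, if any.
latest_wheel() {
    ls -t "$1"/tensorflow-*.whl 2>/dev/null | head -n 1
}

pkg_dir=/tmp/tensorflow_pkg   # assumed output directory for the wheel
builder=bazel-bin/tensorflow/tools/pip_package/build_pip_package

# Only runs from the top of a TensorFlow source tree where the build has finished.
if [ -x "$builder" ]; then
    "$builder" "$pkg_dir"
    wheel="$(latest_wheel "$pkg_dir")"
    if [ -n "$wheel" ]; then
        pip install "$wheel"
    fi
fi
```

Check with `which pip` beforehand that pip is the one from the anaconda3 directory, so the wheel lands in the intended Python environment.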

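[Note]

  1. It is best to run the installation as root

  2. You may also need to upgrade pip to the latest version. After upgrading, check the pip version and confirm that the pip being used is the one under the anaconda3 Python directory

Test the TensorFlow installation

Install Keras

  1. sudo su root (switch to the root account)

  2. pip install keras

Appendix: Installing TensorFlow from Sources [official site]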

Installing TensorFlow on CentOS 6.9

Install gcc 4.9.2 with yum (devtoolset)

https://blog.csdn.net/ysx_cpp/article/details/77187453

[Note] Per the last sentence in the original screenshot: add it to the environment variables

Install CUDA Toolkit 9.0

https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=6&target_type=rpmlocal

sudo rpm -i cuda-repo-rhel6-9-0-local-9.0.176-1.x86_64.rpm
sudo yum clean all
sudo yum install cuda

Install the kernel-headers rpm package

  1. Install directly

$ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

  2. Download the rpm and install it

kernel-headers download: https://rpmfind.net/linux/rpm2html/search.php?query=kernel-headers

$ sudo rpm -ivh kernel-headers-2.6.32-696.16.1.el6.x86_64.rpm

Test the NVIDIA driver installation

  1. /usr/local/cuda-9.0/extras/demo_suite/deviceQuery

  2. nvidia-smi

  3. Build the CUDA samples and run them to inspect the CUDA parameters

A reboot may be needed after installation

Install Anaconda3

By default the installer adds Anaconda to /root/.bashrc automatically

To make it available to other users as well, it is usually configured in /etc/profile:

# gcc 4.9 devtoolset
source /opt/rh/devtoolset-3/enable
# cuda
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# anaconda
export PATH=/usr/local/data/anaconda3/bin:$PATH
# bazel
export PATH=/usr/local/data/bazel-0.13.0/lib/bazel/bin:$PATH
# JAVA
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-8.b10.el6_9.x86_64/
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Install bazel from source

https://docs.bazel.build/versions/master/install-compile-source.html

Install Java

Remove the existing Java

[root@192 ~]# yum list installed | grep java

[root@192 ~]# yum -y remove java-1.8.0-openjdk*      # the * removes all openjdk-related packages

[root@192 ~]# yum -y remove tzdata-java.noarch       # remove tzdata-java

Install the new Java

[root@192 ~]# yum -y list java*

[root@192 ~]# yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel      # install the JDK; without java-1.8.0-openjdk-devel there is no javac command

[root@192 ~]# java -version                           # show the Java version

Add the Java environment variables

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-8.b10.el6_9.x86_64/

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Add these lines to /etc/profile
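Because building bazel needs javac, and javac only exists when java-1.8.0-openjdk-devel is installed, it may be worth checking that JAVA_HOME really points at a full JDK. A minimal sketch, with the hypothetical helper `has_javac`:

```shell
# Hypothetical helper: succeeds if the given JDK directory contains an executable javac.
has_javac() {
    [ -x "$1/bin/javac" ]
}

# Only check when JAVA_HOME has been set, e.g. after re-reading /etc/profile.
if [ -n "${JAVA_HOME:-}" ]; then
    if has_javac "$JAVA_HOME"; then
        echo "javac found under $JAVA_HOME"
    else
        echo "no javac under $JAVA_HOME; is java-1.8.0-openjdk-devel installed?"
    fi
fi
```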

Check the glibc version

strings /lib64/libc.so.6 | grep GLIBC

ldd --version

On the GPU1 machine (glibc 2.17), no errors were found
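For scripted checks, the version number can be pulled out of the first line of the `ldd --version` output. This is a sketch that assumes the glibc output format, where that line ends with the version (for example "ldd (GNU libc) 2.17"); `glibc_version` is a hypothetical helper name.

```shell
# Hypothetical helper: extract the trailing MAJOR.MINOR from the first line
# of `ldd --version` output passed in as the first argument.
glibc_version() {
    printf '%s\n' "$1" | sed -n '1s/.* \([0-9][0-9]*\.[0-9][0-9]*\)$/\1/p'
}

# Query the local system only when ldd is available.
if command -v ldd >/dev/null 2>&1; then
    echo "glibc version: $(glibc_version "$(ldd --version 2>/dev/null)")"
fi
```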

TensorFlow build errors

Error 1: patch

  • If building the TensorFlow source fails with "patch command not found", install patch with sudo yum install patch

Error 2: ld link error

  • crosstool_wrapper_driver_is_not_gcc failed

/usr/bin/ld: unrecognized option '-plugin'

/usr/bin/ld: use the --help option for usage information

collect2: error: ld returned 1 exit status

cd /home/s915_chenhao/.cache/bazel/_bazel_s915_chenhao/95f58aebcb4f1dd76c69c9695ad4bca2/external/local_config_cuda/crosstool

cd /home/s915_chenhao/.cache/bazel/_bazel_s915_chenhao/95f58aebcb4f1dd76c69c9695ad4bca2/execroot/org_tensorflow/external/local_config_cuda/crosstool/clang/bin

Error 3: library libcudnn.so not found

Appendix

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver ...

  • My notes

    1. Install all kernel packages: sudo yum install kernel*

or sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

    2. Uninstall and reinstall the CUDA toolkit: sudo yum remove cuda*

Then install the repository meta-data

$ sudo rpm --install cuda-repo-\

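Installing PyTorch on CentOS

This assumes the NVIDIA driver and cuDNN are already installed

Install PyTorch

Running conda install directly may fail because of network problems, so download the PyTorch package first and then install it offline.

Download the package

Downloading with IDM in multi-threaded mode is reasonably fast. Download URL:

https://conda.anaconda.org/pytorch/linux-64/pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2

Offline installation with conda

conda install --offline pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2

Test the installation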

import torch

x = torch.Tensor(5, 3)

x = torch.rand(5, 3)

x

If no error is reported, the installation succeeded.

Install torchvision

https://zllrunning.github.io/2018/04/17/20180417/

pip install visdom

python -m visdom.server

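Studying the TensorFlow source

pywrap_tensorflow_internal.cc is C++ code generated by SWIG, and pywrap_tensorflow_internal.py is the generated Python code used to call into the pywrap_tensorflow_internal.so module.

References

  1. TensorFlow: an analysis of bazel's behavior during a source install

  2. A Deep Dive into AI (6): bazel explained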
