Ubuntu and CentOS TensorFlow & PyTorch Installation Tutorial


Installing TensorFlow on Ubuntu

Install Ubuntu 16.04 LTS

Download: http://releases.ubuntu.com/16.04/

After the Ubuntu ISO has downloaded, write it to a USB drive with UltraISO to create a bootable installer, set the machine to boot from USB, and reboot to start the installation (on a DELL desktop, press F12 to select USB boot). Two points deserve special attention when preparing the bootable USB drive:

  1. The USB drive should preferably be plugged into a USB 2.0 port; some motherboards do not natively support booting from USB 3.0 ports.

  2. For a freshly installed disk (e.g. in a new machine), especially an SSD, format the whole disk before installing; otherwise partitioning may fail during installation and the reported disk space may be wrong.

Install CUDA Toolkit 9.0 and cuDNN 7.1

References:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

https://docs.nvidia.com/cuda/archive/9.0/cuda-installation-guide-linux/index.html (recommended)

Install CUDA Toolkit 9.0

  • Verify that the machine has a CUDA-capable NVIDIA GPU:

$ lspci | grep -i nvidia

  • Check the gcc version with the command below and make sure it is 4.x:

$ gcc --version

  • If the current gcc is 5.x, it is best to downgrade to 4.x; the commands are as follows:

    1. Install gcc/g++ 4.9.x

$ sudo apt-get install -y gcc-4.9

$ sudo apt-get install -y g++-4.9

    2. Re-link gcc/g++ to complete the downgrade

$ cd /usr/bin

$ sudo rm gcc

$ sudo ln -s gcc-4.9 gcc

$ sudo rm g++

$ sudo ln -s g++-4.9 g++

  • CUDA Toolkit 9.0 download address:

https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604

After downloading the deb package, install it with the following steps:
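The original steps were given as a screenshot; below is a sketch of the standard deb (local) sequence from NVIDIA's download page. The exact .deb filename and repository key path depend on the package you actually downloaded:

# the filename below is the CUDA 9.0 local installer for Ubuntu 16.04
$ sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda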

Then add /usr/local/cuda-9.0/bin to the PATH environment variable. On Ubuntu, this path is added in /etc/profile:
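For reference, the relevant line is the one from the /etc/profile snippet shown later in this post:

# cuda
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}

Run source /etc/profile (or log in again) for the change to take effect.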

You also need to install cuda-command-line-tools, and add its library path to /etc/profile after installation.
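A sketch of this step; the versioned package name is an assumption based on NVIDIA's CUDA 9.0 apt repository naming, and the CUPTI path is the one used in the /etc/profile snippet below:

$ sudo apt-get install cuda-command-line-tools-9-0
# then append the CUPTI library path in /etc/profile:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/extras/CUPTI/lib64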

Start the persistence daemon, and generate a local copy of the CUDA sample programs:
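Roughly, following the post-installation actions in the CUDA installation guide linked above (the exact paths are assumptions):

# start the NVIDIA persistence daemon
$ sudo /usr/bin/nvidia-persistenced --verbose
# copy a writable set of the CUDA samples into the home directory
$ cuda-install-samples-9.0.sh ~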

Notes:

  1. After installing the CUDA Toolkit, you need to reconnect the SSH session before cuda-install-samples-9.1.sh is recognized.

  2. The example above used 9.1, but the correct version here is 9.0 (i.e. cuda-install-samples-9.0.sh).

At this point, running nvcc should confirm that the toolkit works.
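For example:

$ nvcc --version    # should report the CUDA 9.0 release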

Install cuDNN 7.1

If you want to build TensorFlow from source, you need cuDNN; version 7.1 is chosen here. Download:

https://developer.nvidia.com/rdp/cudnn-download

Downloading requires registering an account on the NVIDIA website (registration does not always go smoothly); then pick the correct version to install:

Extract the archive:

tar -xvf cudnn-9.0-linux-x64-v7.1.tgz

Then copy the header and library files to the corresponding CUDA paths:

sudo cp cuda/include/cudnn.h /usr/local/cuda-9.0/include/

sudo cp cuda/lib64/libcudnn* /usr/local/cuda-9.0/lib64/

sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h

sudo chmod a+r /usr/local/cuda-9.0/lib64/libcudnn*

Install Anaconda3 x64

Why install Anaconda? Because it makes it easy to manage multiple versions of Python.

Download: https://www.anaconda.com/download/#linux

On Ubuntu, Anaconda is usually installed in a shared directory, /usr/local/data/anaconda3/; then add /usr/local/data/anaconda3/bin/ to /etc/profile:

# cuda
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/extras/CUPTI/lib64
# anaconda
PATH=/usr/local/data/anaconda3/bin:$PATH
# bazel
PATH=/usr/local/data/bazel-0.13.0/lib/bazel/bin:$PATH
export PATH

  • To speed up downloads of conda packages (e.g. the cuda packages), add Anaconda's mirrors hosted in China:

sudo conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

sudo conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/

sudo conda config --set show_channel_urls yes

Install Google's build tool Bazel

Reference: https://docs.bazel.build/versions/master/install-ubuntu.html#install-with-installer-ubuntu

Notes:

Step 1: the python here is the Python 3.6 installed inside Anaconda3.

Steps 2-3: in practice, just run sudo bash bazel-\ (the downloaded installer script); a sketch follows below.
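A sketch of the installer invocation, assuming the bazel 0.13.0 installer script (the filename and the --prefix location are assumptions; the prefix matches the bazel path used in the /etc/profile snippet above):

$ chmod +x bazel-0.13.0-installer-linux-x86_64.sh
$ sudo bash bazel-0.13.0-installer-linux-x86_64.sh --prefix=/usr/local/data/bazel-0.13.0
$ bazel version    # verify the installation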

Build TensorFlow 1.7.1

Reference: https://www.tensorflow.org/install/install_sources#ConfigureInstallation

Configure the build

$ cd tensorflow # cd to the top-level directory created

$ ./configure

Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/lib/python2.7/dist-packages]
Using python library path: /usr/local/lib/python2.7/dist-packages
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N]
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N]
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] Y
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.0
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: "3.5,5.2"]: 3.0
Do you wish to build TensorFlow with MPI support? [y/N]
MPI support will not be enabled for TensorFlow
Configuration finished

Build and install the TensorFlow pip package (whl)

http://cwiki.apachecn.org/pages/viewpage.action?pageId=10029599

[Note]

bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package --verbose_failures

Compilation is a long process; if no errors are reported along the way, the build has succeeded.
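Once the build finishes, the wheel is generated and installed roughly as in the official build-from-source guide (/tmp/tensorflow_pkg is just a conventional output directory; the exact .whl filename depends on the build):

# generate the .whl package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# install it with pip
$ pip install /tmp/tensorflow_pkg/tensorflow-*.whl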

[Note]

  1. It is best to do the installation as the root account.

  2. You may also need to upgrade pip to its latest version; after upgrading, check the pip version and make sure it is the pip under the anaconda3 Python directory.
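For example:

$ pip install --upgrade pip
$ which pip        # should point into /usr/local/data/anaconda3/bin
$ pip --version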

Test the installed TensorFlow
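A minimal smoke test of the freshly built package, assuming the TF 1.x API:

$ python -c "import tensorflow as tf; print(tf.Session().run(tf.constant('Hello, TensorFlow')))"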

Install Keras

  1. sudo su root to switch to the root account

  2. pip install keras

Appendix: Installing TensorFlow from Sources (official documentation)

Installing TensorFlow on CentOS 6.9

Install gcc 4.9.2 with yum (devtoolset)

https://blog.csdn.net/ysx_cpp/article/details/77187453

[Note] As the last line of the linked instructions says: add the devtoolset setup to your environment variables (a sketch follows below; it also appears in the /etc/profile snippet later in this section).
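A sketch of the devtoolset route; the package names are assumptions for the CentOS 6 SCL/devtools repository, while the enable line matches the /etc/profile snippet later in this post:

# the devtoolset repository must already be enabled (e.g. via the SCL release package)
$ sudo yum install devtoolset-3-gcc devtoolset-3-gcc-c++
# enable gcc 4.9 for the current shell; add this line to /etc/profile to make it permanent
$ source /opt/rh/devtoolset-3/enable
$ gcc --version    # should now report 4.9.x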

Install CUDA Toolkit 9.0

https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=6&target_type=rpmlocal

sudo rpm -i cuda-repo-rhel6-9-0-local-9.0.176-1.x86_64.rpm
sudo yum clean all
sudo yum install cuda

Install the kernel-headers rpm package

  1. Install directly:

$ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

  2. Or download the rpm and install it manually:

kernel-headers download: https://rpmfind.net/linux/rpm2html/search.php?query=kernel-headers

$ sudo rpm -ivh kernel-headers-2.6.32-696.16.1.el6.x86_64.rpm

Test the NVIDIA driver installation

  1. /usr/local/cuda-9.1/extras/demo_suite/deviceQuery

  2. nvidia-smi

  3. Build the CUDA samples and run them to inspect the CUDA device parameters (a sketch follows below)

You may need to reboot the machine after installation.
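A sketch of item 3, assuming the samples were copied to the home directory with cuda-install-samples as in the Ubuntu section:

$ cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery    # prints the GPU's CUDA parameters and should end with Result = PASS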

Install Anaconda3

By default the installer adds itself to /root/.bashrc.

To make it available to other users, it is usually configured in /etc/profile instead:

# gcc 4.9 devtoolset
source /opt/rh/devtoolset-3/enable
# cuda
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# anaconda
export PATH=/usr/local/data/anaconda3/bin:$PATH
# bazel
export PATH=/usr/local/data/bazel-0.13.0/lib/bazel/bin:$PATH
# JAVA
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-8.b10.el6_9.x86_64/
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Build Bazel from source

https://docs.bazel.build/versions/master/install-compile-source.html

Install Java

Uninstall the existing Java

[root@192 ~]# yum list installed | grep java

[root@192 ~]# yum -y remove java-1.8.0-openjdk*       # the * removes all openjdk-related packages

[root@192 ~]# yum -y remove tzdata-java.noarch        # uninstall tzdata-java

Install the new Java

[root@192 ~]# yum -y list java*

[root@192 ~]# yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel      # install the JDK; without java-1.8.0-openjdk-devel there is no javac command

[root@192 ~]# java -version                           # check the Java version

Add the Java environment variables

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.171-8.b10.el6_9.x86_64/

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Add these lines to /etc/profile.

Check the glibc version

strings /lib64/libc.so.6 | grep GLIBC

ldd --version

On the GPU1 machine (glibc 2.17) no errors were found.

TensorFlow build errors

Error 1: patch

  • If building the TensorFlow source fails with "patch: command not found", install it: sudo yum install patch

Error 2: ld link error

  • crosstool_wrapper_driver_is_not_gcc failed

/usr/bin/ld: unrecognized option '-plugin'

/usr/bin/ld: use the --help option for usage information

collect2: error: ld returned 1 exit status

This usually indicates that the old system binutils/ld is being picked up instead of the devtoolset one; the crosstool wrapper involved lives under the bazel cache, e.g.:

cd /home/s915_chenhao/.cache/bazel/_bazel_s915_chenhao/95f58aebcb4f1dd76c69c9695ad4bca2/external/local_config_cuda/crosstool

cd /home/s915_chenhao/.cache/bazel/_bazel_s915_chenhao/95f58aebcb4f1dd76c69c9695ad4bca2/execroot/org_tensorflow/external/local_config_cuda/crosstool/clang/bin

Error 3: the libcudnn.so library cannot be found
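No fix was recorded here; a common remedy, assuming cuDNN was copied into /usr/local/cuda-9.0/lib64 as in the cuDNN section above, is to make that directory visible to the dynamic linker:

# either extend LD_LIBRARY_PATH (as in the /etc/profile snippet above) ...
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# ... or register the path system-wide
echo "/usr/local/cuda-9.0/lib64" | sudo tee /etc/ld.so.conf.d/cuda-9-0.conf
sudo ldconfig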

Appendix

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver ...

  • My notes

  1. Install all kernel packages: sudo yum install kernel*

or sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)

  2. Uninstall and reinstall the CUDA toolkit: sudo yum remove cuda*

then reinstall the repository meta-data:

$ sudo rpm --install cuda-repo-\
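The remaining steps are the same as in the CUDA Toolkit section above:

$ sudo yum clean all
$ sudo yum install cuda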

Installing PyTorch on CentOS

This assumes the NVIDIA driver and cuDNN are already installed.

Install PyTorch

Running conda install directly may run into network problems, so you can first download the PyTorch package and then install it offline.

Download the package

Downloading with IDM (multithreaded) is reasonably fast. Download address:

https://conda.anaconda.org/pytorch/linux-64/pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2

Install offline with conda

conda install --offline pytorch-0.4.1-py36_cuda9.0.176_cudnn7.1.2_1.tar.bz2

Test the installation

import torch

x = torch.Tensor(5, 3)

x = torch.rand(5, 3)

x

If no error is raised, the installation succeeded.

Install torchvision

https://zllrunning.github.io/2018/04/17/20180417/
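The linked post has the details; if the network allows it, a simpler route is a plain pip install (the pinned version below is an assumption chosen to match PyTorch 0.4.1, not something taken from the post):

pip install torchvision==0.2.1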

pip install visdom

python -m visdom.server

TensorFlow source code notes

pywrap_tensorflow_internal.cc is the C++ code generated by SWIG, and pywrap_tensorflow_internal.py is the generated Python code used to call into the pywrap_tensorflow_internal.so module.
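A quick way to locate these generated wrapper files inside an installed TF 1.x package (the directory layout is assumed):

find "$(python -c 'import os, tensorflow; print(os.path.dirname(tensorflow.__file__))')/python" -maxdepth 1 -name '*pywrap_tensorflow*'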

References

  1. Tensorflow [an analysis of bazel's behaviour when installing from source]

  2. Taking you deeper into AI (6): a detailed look at bazel
