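Preface:
I originally wanted to set up a local environment, but the installation requires CUDA 10.0 and I have CUDA 10.1 installed, so the versions do not match. I decided to install Docker and do a containerized installation instead.
1. Install Docker
See the tutorial on the official website.
2. Install nvidia-docker
To be added.
3. Install TensorFlow
Check whether the current user's groups include docker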
li@li-System-Product-Name:~$ groups
li adm cdrom sudo dip plugdev lpadmin sambashare docker
// The last entry is docker, so from here on docker can be run directly without sudo
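The same group check can be scripted. The following is a minimal sketch using only the Python standard library on Linux; the helper name `user_in_group` is made up for illustration and is not part of Docker or any library.

```python
import grp
import os
import pwd


def user_in_group(group_name, user=None):
    """Return True if `user` (default: the current user) belongs to `group_name`."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        # The group does not exist on this machine at all.
        return False
    try:
        entry = pwd.getpwnam(user) if user else pwd.getpwuid(os.getuid())
    except KeyError:
        # Unknown user name, or the current uid has no passwd entry.
        return False
    # A user belongs to a group either as a listed member or via the primary gid.
    return entry.pw_name in group.gr_mem or entry.pw_gid == group.gr_gid


if __name__ == "__main__":
    print("member of docker group:", user_in_group("docker"))
```

This reads the group database rather than the running session, so a freshly added membership may show up here before `groups` reports it; logging out and back in usually brings the two in line.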
Verify Docker
li@li-System-Product-Name:~$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
// Installation succeeded
li@li-System-Product-Name:~$ docker --version
Docker version 18.09.5, build e8ff056
Verify nvidia-docker
li@li-System-Product-Name:~$ docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
Unable to find image 'nvidia/cuda:10.1-base' locally
10.1-base: Pulling from nvidia/cuda
898c46f3b1a1: Already exists
63366dfa0a50: Already exists
041d4cd74a92: Already exists
6e1bee0f8701: Already exists
131dbe7c254d: Pull complete
5bca6b05dcd6: Pull complete
0d286a7b6e12: Pull complete
Digest: sha256:6ddf907e77f4b53ac8b0b8ce9fa9cd43ffb6882f1ad0f2d41ca996f154f17c7b
Status: Downloaded newer image for nvidia/cuda:10.1-base
Mon Apr 22 13:21:37 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:65:00.0 On | N/A |
| 31% 30C P8 22W / 260W | 84MiB / 10986MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
li@li-System-Product-Name:~$ docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi
Unable to find image 'nvidia/cuda:10.0-base' locally
10.0-base: Pulling from nvidia/cuda
898c46f3b1a1: Already exists
63366dfa0a50: Already exists
041d4cd74a92: Already exists
6e1bee0f8701: Already exists
112097260ef3: Pull complete
30a67c795176: Pull complete
0d286a7b6e12: Pull complete
Digest: sha256:faac85a7d28e086173915df6456784778c4dacb429ff067def0c4a12671240e8
Status: Downloaded newer image for nvidia/cuda:10.0-base
Mon Apr 22 13:22:09 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:65:00.0 On | N/A |
| 31% 30C P8 22W / 260W | 84MiB / 10986MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
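Both runs above print "CUDA Version: 10.1" even though the second image is 10.0-base. That header field reports the highest CUDA version the host driver supports, not the toolkit version shipped inside the image, so the two outputs are expected to match. Below is a small sketch that pulls the versions out of the header line; the function name `parse_smi_header` and the regular expression are illustrative assumptions based only on the output format shown here.

```python
import re

# Matches the header row printed by nvidia-smi, e.g.
# "| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |"
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return a dict with 'smi', 'driver' and 'cuda' versions, or None if absent."""
    match = HEADER_RE.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = "| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |"
    print(parse_smi_header(sample))
```

If the two containers ever reported different driver versions, that would point to a problem with the host setup rather than with the images.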
Install TensorFlow
docker pull tensorflow/tensorflow:2.0.0a0-gpu-py3 // pull command
li@li-System-Product-Name:~$ docker pull tensorflow/tensorflow:2.0.0a0-gpu-py3
2.0.0a0-gpu-py3: Pulling from tensorflow/tensorflow
7b722c1070cd: Pull complete
5fbf74db61f1: Pull complete
ed41cb72e5c9: Pull complete
7ea47a67709e: Pull complete
53d00018d593: Pull complete
d452561571e2: Pull complete
741421562e36: Pull complete
cf5a5f77591f: Pull complete
8e44471d34e9: Pull complete
95409a313744: Pull complete
3ca5dc868f92: Pull complete
a1c783d09ef0: Pull complete
eed91d5a4f29: Pull complete
b36de521e979: Pull complete
Digest: sha256:f43f2ea436eebc7b9fe3c80205e6649f4d1a66cfda8626ba010f8d8dfd7985ab
Status: Downloaded newer image for tensorflow/tensorflow:2.0.0a0-gpu-py3
Run TensorFlow
docker run -it -p 8888:8888 tensorflow/tensorflow:2.0.0a0-gpu-py3 // run command
li@li-System-Product-Name:~$ docker run -it -p 8888:8888 tensorflow/tensorflow:2.0.0a0-gpu-py3
________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/
WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.
To avoid this, run the container by specifying your user's userid:
$ docker run -u $(id -u):$(id -g) args...
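The run command above omits the `--runtime=nvidia` flag that the nvidia-smi checks used, and it runs as root, which is what the banner warns about. As a sketch, the command line can be assembled from exactly the flags that already appear in this post (`-it`, `-p`, `--runtime=nvidia`, `-u`); the helper `build_run_command` is hypothetical and only builds the argument list, it does not start a container.

```python
import os
import shlex


def build_run_command(image, use_gpu=True, port=8888, user=None):
    """Assemble a `docker run` argument list from flags used in this post."""
    cmd = ["docker", "run", "-it"]
    if use_gpu:
        # Same flag as in the nvidia-smi test; assumes nvidia-docker is installed.
        cmd.append("--runtime=nvidia")
    if port is not None:
        # Publish the same port number on the host and in the container.
        cmd += ["-p", "{0}:{0}".format(port)]
    if user is not None:
        # "uid:gid", as suggested by the container's startup warning.
        cmd += ["-u", user]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    me = "{}:{}".format(os.getuid(), os.getgid())
    argv = build_run_command("tensorflow/tensorflow:2.0.0a0-gpu-py3", user=me)
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` on a machine where Docker is available.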
Test
root@cd51a60a7f4f:/# python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import absolute_import, division, print_function, unicode_literals
>>> !pip install -q tensorflow==2.0.0-alpha0
File "<stdin>", line 1
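!pip install -q tensorflow==2.0.0-alpha0 // fails here: the leading "!" is IPython/Jupyter shell-escape syntax and is not valid in the plain Python interpreter; the image already ships TensorFlow, so this step can be skipped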
^
SyntaxError: invalid syntax
>>> import tensorflow as tf
>>>
>>> mnist = tf.keras.datasets.mnist
>>> (x_train, y_train), (x_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 6s 1us/step
>>> x_train, x_test = x_train / 255.0, x_test / 255.0
>>> model = tf.keras.models.Sequential([
... tf.keras.layers.Flatten(input_shape=(28, 28)),
... tf.keras.layers.Dense(128, activation='relu'),
... tf.keras.layers.Dropout(0.2),
... tf.keras.layers.Dense(10, activation='softmax')
... ])
2019-04-22 14:03:51.302251: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-04-22 14:03:51.316205: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:
2019-04-22 14:03:51.316261: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2019-04-22 14:03:51.316319: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2019-04-22 14:03:51.337157: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-04-22 14:03:51.338981: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x4147790 executing computations on platform Host. Devices:
2019-04-22 14:03:51.339038: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): <undefined>, <undefined>
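// The log lines above show that TensorFlow fell back to the CPU: libcuda.so.1 could not be loaded and /dev/nvidia0 does not exist inside the container. The most likely cause is that the container was started without --runtime=nvidia, unlike the nvidia-smi checks earlier.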
>>> model.compile(optimizer='adam',
... loss='sparse_categorical_crossentropy',
... metrics=['accuracy'])
>>> model.fit(x_train, y_train, epochs=5)
Epoch 1/5
60000/60000 [==============================] - 7s 109us/sample - loss: 0.2981 - accuracy: 0.9136
Epoch 2/5
60000/60000 [==============================] - 6s 107us/sample - loss: 0.1438 - accuracy: 0.9565
Epoch 3/5
60000/60000 [==============================] - 6s 107us/sample - loss: 0.1094 - accuracy: 0.9674
Epoch 4/5
60000/60000 [==============================] - 6s 107us/sample - loss: 0.0904 - accuracy: 0.9715
Epoch 5/5
60000/60000 [==============================] - 6s 107us/sample - loss: 0.0752 - accuracy: 0.9764
<tensorflow.python.keras.callbacks.History object at 0x7f9ae6231a20>
>>> model.evaluate(x_test, y_test)
10000/10000 [==============================] - 1s 55us/sample - loss: 0.0759 - accuracy: 0.9760
[0.07590396217172965, 0.976]
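The session above trained on the CPU because the log reported that `/dev/nvidia0` was missing and `libcuda.so.1` could not be opened. A rough way to check for the same two conditions before starting TensorFlow is sketched below. It uses only the standard library; the function names and the `/dev/nvidia[0-9]*` pattern are assumptions drawn from that log output, not an official detection method.

```python
import ctypes.util
import glob


def nvidia_device_nodes(pattern="/dev/nvidia[0-9]*"):
    """List GPU device nodes such as /dev/nvidia0 that are visible here."""
    return sorted(glob.glob(pattern))


def gpu_visibility_report():
    """Collect the two signals the TensorFlow log complained about."""
    nodes = nvidia_device_nodes()
    # Returns a library name such as "libcuda.so.1", or None if not found.
    libcuda = ctypes.util.find_library("cuda")
    return {
        "device_nodes": nodes,
        "libcuda": libcuda,
        "gpu_likely_visible": bool(nodes) and libcuda is not None,
    }


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print("{}: {}".format(key, value))
```

Run inside the container, this should report no device nodes for the session shown here. Inside TensorFlow itself, `tf.test.is_gpu_available()` should give a more direct answer in this release.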