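1. TensorFlow is installed correctly, but importing it prints errors
import os
# Must be set before TensorFlow is imported; '2' hides INFO and WARNING messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
h = tf.constant('hello')
print(h)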
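The log-level variable is read when TensorFlow's native library loads, so the ordering matters more than the value. Below is a minimal sketch of wrapping that ordering in a helper. The name import_tf_quietly is hypothetical (not part of TensorFlow), and the helper returns None when TensorFlow is not installed.

```python
import importlib
import os


def import_tf_quietly(level="2"):
    """Set TF_CPP_MIN_LOG_LEVEL, then import TensorFlow (hypothetical helper).

    Levels: "0" shows everything, "1" hides INFO, "2" also hides WARNING,
    "3" also hides ERROR. Returns the module, or None if it is not installed.
    """
    # The variable has to be in the environment before the first import.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)
    try:
        return importlib.import_module("tensorflow")
    except ImportError:
        return None


if __name__ == "__main__":
    tf = import_tf_quietly("2")
    if tf is None:
        print("TensorFlow is not installed in this environment")
    else:
        print(tf.constant("hello"))
```

Setting the variable after the import has no effect on messages that were already printed during loading.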
2. Check the GPU on your own machine
cd C:\Program Files\NVIDIA Corporation\NVSMI
nvidia-smi
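The same check can be scripted. This is a rough sketch that assumes nvidia-smi can be found on PATH (on Windows you may need to add the NVSMI folder above to PATH first). The function name query_gpus is my own, and it avoids newer subprocess options so it also runs on Python 3.5.

```python
import shutil
import subprocess


def query_gpus():
    """Return one 'name, driver_version, memory.total' string per GPU.

    Returns None when nvidia-smi is not on PATH or exits with an error,
    for example when the NVIDIA driver is not loaded.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = query_gpus()
    if gpus is None:
        print("nvidia-smi is unavailable or failed")
    else:
        for gpu in gpus:
            print(gpu)
```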
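3. Installing tensorflow gpu on Linux
Problem description: following the official guide (https://tensorflow.google.cn/install/gpu#ubuntu_1804_cuda_101), I copied the commands listed under "Ubuntu 18.04 (CUDA 10.1)" into install.sh and ran it (sh +x install.sh). According to the guide, the NVIDIA driver should be installed once the commands finish. But running nvidia-smi did not show the expected GPU status table and reported an error instead:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I am a complete beginner with computer hardware and software (I know nothing about CPUs, GPUs, graphics cards and so on). How do I fix this?
Troubleshooting process:
1. Check the versions of the components that are already installed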
- Check the CUDA version
cat /usr/local/cuda/version.txt
10.1.243
- Check the cuDNN version (failed)
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
no such file
- Check the GPU driver version (failed)
cat /proc/driver/nvidia/version, or run nvidia-smi
- Check the Ubuntu kernel version
uname -a (or uname -r)
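Since two of these checks failed with missing files, it can help to collect everything in one pass and report a missing file as None instead of stopping. A small sketch follows; the function names and the default cuda_home are my own choices, taken from the paths used above.

```python
import os
import platform


def read_first_line(path):
    """Return the first line of a text file, or None if it cannot be read."""
    try:
        with open(path, "r") as handle:
            return handle.readline().strip()
    except OSError:
        return None


def collect_versions(cuda_home="/usr/local/cuda"):
    """Gather the same facts as the shell commands above into one dict."""
    return {
        "cuda": read_first_line(os.path.join(cuda_home, "version.txt")),
        "cudnn_header_present": os.path.isfile(
            os.path.join(cuda_home, "include", "cudnn.h")),
        "driver": read_first_line("/proc/driver/nvidia/version"),
        "kernel": platform.release(),  # same value as `uname -r`
    }


if __name__ == "__main__":
    for key, value in sorted(collect_versions().items()):
        print(key, "=", value)
```

A driver value of None means /proc/driver/nvidia/version does not exist, which is consistent with the kernel module not being loaded.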
2. Some people say the Linux kernel version may be too new or too old:
2.1 Fixing a kernel version that is too new, following two blog posts:
https://blog.csdn.net/sinat_23619409/article/details/85220561
https://blog.csdn.net/qq_41870658/article/details/93330041
Linux ubuntu 5.3.0-28-generic #30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Another approach, see: https://blog.csdn.net/Felaim/article/details/100516282 (still not solved, I could cry)
2.2 How to disable Secure Boot
Reference blog: https://blog.csdn.net/smcaa/article/details/86482872
2.3 Check which driver version the system recommends installing
ubuntu-drivers devices
Normal output:
== /sys/devices/pci0000:00/0000:00:1c.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001D10sv000017AAsd0000225Ebc03sc02i00
vendor : NVIDIA Corporation
model : GP108M [GeForce MX150]
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-396 - third-party free
driver : nvidia-driver-390 - third-party free
driver : nvidia-driver-415 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
Mine (probably not normal):
@ubuntu:/usr/src$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:0f.0 ==
modalias : pci:v000015ADd00000405sv000015ADsd00000405bc03sc00i00
vendor : VMware
model : SVGA II Adapter
driver : open-vm-tools-desktop - distro free
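I looked up what open-vm-tools-desktop is: VMware's own bundled vmware-tools no longer works well, and the official recommendation is to install open-vm-tools-desktop instead for interaction with the physical host. The only display device listed is VMware's virtual SVGA II Adapter, with no NVIDIA device at all. This suggests that this Ubuntu is a VMware guest that cannot see a physical NVIDIA GPU, which would explain why nvidia-smi cannot talk to a driver.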
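To compare the two listings without reading them by eye, the output of ubuntu-drivers devices can be parsed into one record per device. This is a rough sketch based only on the output format shown above; parse_devices and has_nvidia_gpu are hypothetical helper names.

```python
def parse_devices(text):
    """Split `ubuntu-drivers devices` output into one dict per device."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("==") and line.endswith("=="):
            # A header such as "== /sys/devices/... ==" starts a new device.
            current = {"path": line.strip("= "), "drivers": []}
            devices.append(current)
        elif current is not None and ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "driver":
                current["drivers"].append(value)
            else:
                current[key] = value
    return devices


def has_nvidia_gpu(devices):
    """True if any listed device reports NVIDIA as its vendor."""
    return any("nvidia" in d.get("vendor", "").lower() for d in devices)


if __name__ == "__main__":
    sample = (
        "== /sys/devices/pci0000:00/0000:00:0f.0 ==\n"
        "vendor : VMware\n"
        "model : SVGA II Adapter\n"
        "driver : open-vm-tools-desktop - distro free\n"
    )
    print(has_nvidia_gpu(parse_devices(sample)))
```

If no NVIDIA device is listed, installing a driver inside this system is unlikely to help, because there is no NVIDIA hardware for it to bind to.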
That is everything I could try today (20200304)...
4. Installing tensorflow gpu on a server
4.1 Understand the machine configuration
- CUDA version
cat /usr/local/cuda-10.1/version.txt
CUDA Version 10.1.243
- cuDNN version
cat /usr/local/cuda-10.1/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
- GPU driver version
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.87.00 Thu Aug 8 15:35:46 CDT 2019
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)
- Kernel version
uname -r
4.4.0-21-generic
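The three #define lines in the cudnn.h output give the cuDNN version, here 7.6.2. The CUDNN_VERSION macro combines them as 7 * 1000 + 6 * 100 + 2 = 7602. A small sketch of pulling these numbers out of the header text is below. parse_cudnn_version is a hypothetical helper, and it assumes the defines live in cudnn.h as they do on this server (newer cuDNN releases keep them in cudnn_version.h).

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if absent."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)\s*$"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def cudnn_version_number(version):
    """Apply the CUDNN_VERSION formula from the header."""
    major, minor, patch = version
    return major * 1000 + minor * 100 + patch


if __name__ == "__main__":
    path = "/usr/local/cuda-10.1/include/cudnn.h"
    try:
        with open(path, "r") as handle:
            print(parse_cudnn_version(handle.read()))
    except OSError:
        print("cannot read", path)
```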
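4.2 Virtual environment
Installed following the method at:
https://baijiahao.baidu.com/s?id=1642214623524909281&wfr=spider&for=pc
4.2.1 Create a virtual environment directly under my own account
List the existing virtual environments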
conda info --envs
# conda environments:
#
root * /home/heling/anaconda3
Create a virtual environment
conda create -n python_0413
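Then another problem came up:
I have no sudo permission on the server, so I have to switch to the prod user (su prod) to install software.
If I create and activate a virtual environment and then run su prod before installing, does the software end up in the virtual environment?
Answer: it is probably better to switch to prod first and then create the virtual environment.
Also: sudo apt-get install tensorflow-gpu reports an error (tensorflow-gpu is distributed through pip, so apt-get is not expected to find it)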
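One way to answer the question above is to check, after switching user, which interpreter is actually running. pip installs into the environment of the Python that runs it, so sys.prefix shows where packages will land. A minimal sketch follows; describe_environment is a hypothetical helper, and CONDA_DEFAULT_ENV is the variable conda sets when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Report which user and which Python environment are active right now."""
    return {
        "user": os.environ.get("USER") or os.environ.get("USERNAME"),
        "python": sys.executable,
        "prefix": sys.prefix,  # packages installed by this Python go under here
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print(key, "=", value)
```

Run it once before and once after su prod. If prefix no longer points inside the environment you activated, the install would go somewhere else.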
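4.2.2 Create the virtual environment under prod
4.2.3 Find a package source and update it (not solved)
Check the current sources, located at /etc/apt/sources.list
cat /etc/apt/sources.list (all entries point to ifengidc addresses)
Not solved...
4.3 Install tensorflow (continuing with the method in section "2. Install tensorflow" of the article below)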
https://baijiahao.baidu.com/s?id=1642214623524909281&wfr=spider&for=pc
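4.3.1 Download from the official site
Go directly to the PyPI site to download the TensorFlow 2.0 Alpha package. URL: https://pypi.org/project/tensorflow/2.0.0a0/#files
The file I actually downloaded was tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl (a 2.1.0 build for CPython 3.5, not the 2.0 Alpha)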
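The wheel file name itself says which Python it is built for. A wheel name has the form name-version-pythontag-abitag-platformtag.whl, with an optional build tag after the version. Below is a small sketch of splitting the name and comparing the Python tag with the running interpreter; both function names are my own.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel file name into its components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: " + filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_running_python(python_tag):
    """True if a CPython tag such as 'cp35' matches this interpreter."""
    return python_tag == "cp{}{}".format(*sys.version_info[:2])


if __name__ == "__main__":
    info = parse_wheel_filename(
        "tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl")
    print(info)
    print("matches this Python:", matches_running_python(info["python_tag"]))
```

For this file the Python tag is cp35, so it only installs on CPython 3.5, and the manylinux2010 platform tag is one that very old pip versions do not recognise.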
4.3.2 Install
Problem: the pip version is too old
pip install tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl
fails with an error.
Fix: upgrade pip
Method: https://www.cnblogs.com/ElegantSmile/p/10766391.html
- Get get-pip.py: wget https://bootstrap.pypa.io/get-pip.py
- Run python get-pip.py to upgrade pip
(python_0413_p) prod@knowledge_graph_162v112_syq:~$ python get-pip.py
Looking in indexes: http://pip.ifengidc.com/simple, http://pip.ifengidc.com:8080/simple
Collecting pip
Downloading http://pip.ifengidc.com/packages/54/0c/d01aa759fdc501a58f431eb594a17495f15b88da142ce14b5845662c13f3/pip-20.0.2-py2.py3-none-any.whl (1.4 MB)
|████████████████████████████████| 1.4 MB 84.9 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 8.1.2
Uninstalling pip-8.1.2:
Successfully uninstalled pip-8.1.2
Successfully installed pip-20.0.2
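The upgrade from pip 8.1.2 to 20.0.2 matters because manylinux2010 wheels, like the one downloaded here, are only recognised by pip 19.0 and later. A minimal sketch of that version check follows. The helper names are hypothetical, and the parsing only handles plain dotted numeric versions.

```python
def version_tuple(version):
    """Turn '20.0.2' into (20, 0, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that a bare '19' compares as (19, 0).
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def pip_supports_manylinux2010(pip_version):
    """True if this pip version is 19.0 or newer."""
    return version_tuple(pip_version) >= (19, 0)


if __name__ == "__main__":
    for candidate in ("8.1.2", "20.0.2"):
        print(candidate, pip_supports_manylinux2010(candidate))
```

The version of the pip currently in use can be read with pip --version.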
pip install tensorflow
pip install tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl
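Error:
Dependencies needed updating. I could not fix this myself and asked the company's ops team:
- After upgrading Python, the yum script has to point to python2, otherwise yum stops working (not sure why)
vim /usr/bin/yum, change #! /usr/bin/python to #! /usr/bin/python2
- pip install --ignore-installed setuptools
Ran pip install tensorflow-2.1.0-cp35-cp35m-manylinux2010_x86_64.whl again: success
Importing tensorflow prints warnings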
>>> import tensorflow as tf
2020-04-15 10:23:48.482399: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-04-15 10:23:48.482498: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-10.1/lib64
2020-04-15 10:23:48.482508: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
>>> tf.__version__
'2.1.0'
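The messages above are warnings (W) about the optional TensorRT libraries, and tf.__version__ still returns, so the import itself worked. What remains is to check whether TensorFlow can see the GPU. A minimal sketch, assuming TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; visible_gpus is a hypothetical helper that returns None when TensorFlow is not installed.

```python
import importlib


def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None without TensorFlow."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return None
    # An empty list means TensorFlow loaded but found no usable GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow imported, but no GPU is visible")
    else:
        print("Visible GPUs:", gpus)
```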