Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED

要瘋了。這幾把玩意要搞死我。issue在此：https://github.com/tensorflow/tensorflow/issues/34214

不要以爲很簡單的事，那是因爲有人替你負重前行，自己搞一下子成功純屬運氣。

錯誤提示如下：

Epoch 1/100
2019-11-12 22:56:58.736778: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-12 22:56:59.069012: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-12 22:56:59.069677: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-12 22:56:59.069729: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
2019-11-12 22:56:59.069743: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-12 22:56:59.069761: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
Traceback (most recent call last):
  File "main_ResNet.py", line 217, in <module>
    shuffle=True)
  File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node conv2d_1/convolution}}]]
	 [[Mean/_417]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node conv2d_1/convolution}}]]
0 successful operations.
0 derived errors ignored.

目前沒人幫我搞這些，運維也不會。這個問題我百度了。也是來源於CSDN博文，誰知道能不能行呢？試試吧

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda

rm -rf .nv/

結果仍舊是同樣的錯誤出現，可見沒用。我也關閉重開了。懷疑人生不？？？

參考官方鏈接重頭搞：https://www.tensorflow.org/install/pip 這幾個版本要符合要求

python3 --version
pip3 --version
virtualenv --version

Requires Python > 3.4 and pip >= 19.0，下面是安裝更新方法

sudo apt update
sudo apt install python3-dev python3-pip
sudo pip3 install -U virtualenv  # system-wide install

如果沒有sudo權限，寡人就沒有，沒辦法。pip可通過升級

python3 -m pip install --upgrade pip

後來發現cudnn沒有在pip list中，經過搜索，發現cuda10.0至少要cudnn7.5.1。安裝後發現沒用。

【甩手掌櫃要不得，搬磚工不能有這種心態】

挨個試試csdn的方法，說代碼有問題。我是不相信的，代碼有個屁的問題。什麼設置GPU按需求分配？？

import tensorflow as tf
import keras.backend as K
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
K.set_session(sess)

事實證明這種同樣沒用。

Train on 15285 samples, validate on 3822 samples
Epoch 1/100
2019-11-13 11:47:34.709111: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-11-13 11:47:35.062741: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 11:47:35.064299: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-13 11:47:35.064397: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
2019-11-13 11:47:35.064428: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-13 11:47:35.064473: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
Traceback (most recent call last):
  File "main_ResNet.py", line 223, in <module>
    shuffle=True)
  File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node conv2d_1/convolution}}]]
	 [[Mean/_417]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node conv2d_1/convolution}}]]
0 successful operations.
0 derived errors ignored.

【說明：直接測試tf.test.is_gpu_available()測試返回True是沒用的，要實際用代碼測試】

其他版本的“按需分配”也試了，沒用。

搞得我一點脾氣都沒有了，提issue吧

同事說版本試試，我試了1.15的tf也是不行

Train on 15285 samples, validate on 3822 samples
Epoch 1/100
2019-11-13 15:07:10.609111: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-13 15:07:10.846521: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-13 15:07:10.847414: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-13 15:07:10.847488: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
2019-11-13 15:07:10.847506: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-11-13 15:07:10.847533: E tensorflow/stream_executor/cuda/cuda_dnn.cc:337] Possibly insufficient driver version: 410.48.0
Traceback (most recent call last):
  File "main_ResNet.py", line 229, in <module>
    shuffle=True)
  File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1239, in fit
    validation_freq=validation_freq)
  File "/./anaconda3/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 196, in fit_loop
    outs = fit_function(ins_batch)
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node conv2d_1/convolution}}]]
	 [[Mean/_417]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node conv2d_1/convolution}}]]
0 successful operations.
0 derived errors ignored.

試了下1.12的，結果似乎需要cuda9.0，而我的是10.0的

Traceback (most recent call last):
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/./anaconda3/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/./anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_ResNet.py", line 10, in <module>
    import tensorflow as tf
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/./anaconda3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/./anaconda3/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/./anaconda3/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

而1.13.1的結果與1.14的相同。

我再開一篇，因爲不只是這個問題。還有

Possibly insufficient driver version: 410.48.0

另外有相關問題可以加入QQ羣討論，不設微信羣

QQ羣：868373192

語音深度學習羣

Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED

認知提升的方法

C#開源的兩款功能強大的錄屏神器

螞蟻面試：Springcloud核心組件的底層原理，你知道多少？

前端 Vue yarn.lock文件：詳解和使用指南

多進程之Pool與多線程pool 及tqdm和for 並對比pandas處理結果

Youtube2016推薦召回算法細節及最終實現（離線服務）——完整版

python讀取redis數據及hive入門9——3個表關聯

關於global定義的作用時效問題以及java json/list數據及redis數據解析問題

faiss快速查詢召回數據都一樣咋辦？

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結