Fixing "Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize"

Today, while porting a piece of TensorFlow v1 code to v2, I ran into the following error:

2020-04-05 12:06:10.566479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-05 12:06:10.984797: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-05 12:06:10.992356: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D (defined at server_html.py:50) ]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D (defined at server_html.py:50) ]]
	 [[InceptionV3/Predictions/Reshape_1/_17]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D':
  File "server_html.py", line 193, in <module>
    init_graph(model_name=FLAGS.model_name)
  File "server_html.py", line 50, in init_graph
    _ = tf.import_graph_def(graph_def, name='')
  File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 513, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 243, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3459, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3459, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3347, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1756, in __init__
    self._traceback = tf_stack.extract_stack()

After some searching, I found that the fix is to pass a config when creating the Session instance:

import tensorflow.compat.v1 as tf
# Either of the following two configs works
# config = tf.ConfigProto()
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

In my tests, both config = tf.ConfigProto() and config = tf.compat.v1.ConfigProto() ran without problems.
I have not yet looked into the exact reason.
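
For code that has been fully migrated to the TF 2.x API and no longer creates a Session, a roughly equivalent setting can be applied through `tf.config`. The following is only a minimal sketch, assuming TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available). The helper name `enable_gpu_memory_growth` is made up for illustration and does not come from the original fix. Memory growth asks TensorFlow to start with a small GPU allocation and grow it as needed, instead of pre-allocating the whole memory region.

```python
import tensorflow as tf


def enable_gpu_memory_growth():
    """Turn on memory growth for every visible GPU (hypothetical helper).

    Returns the list of physical GPU devices that were found.
    """
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as err:
            # Memory growth has to be set before the GPUs are initialized.
            print(f"Could not set memory growth for {gpu.name}: {err}")
    return gpus


# Call this before building any model or running any op.
gpus = enable_gpu_memory_growth()
print(f"Memory growth requested for {len(gpus)} GPU(s)")
```

On a machine without a GPU the returned list is empty and the call does nothing, so the same script should still run on CPU-only hosts.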

Reference: https://github.com/tensorflow/tensorflow/issues/24828#issuecomment-464960819
