Today, while porting some TensorFlow v1 code to v2, I ran into the following error:
2020-04-05 12:06:10.566479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-05 12:06:10.984797: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-05 12:06:10.992356: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D (defined at server_html.py:50) ]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D (defined at server_html.py:50) ]]
[[InceptionV3/Predictions/Reshape_1/_17]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'InceptionV3/InceptionV3/Conv2d_1a_3x3/Conv2D':
File "server_html.py", line 193, in <module>
init_graph(model_name=FLAGS.model_name)
File "server_html.py", line 50, in init_graph
_ = tf.import_graph_def(graph_def, name='')
File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 513, in _import_graph_def_internal
_ProcessNewOps(graph)
File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 243, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3459, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3459, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3347, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/home/microfat/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1756, in __init__
self._traceback = tf_stack.extract_stack()
After some searching, I found that the fix is to pass a config when creating the Session instance:
import tensorflow.compat.v1 as tf

# Either of the following two configs works
# config = tf.ConfigProto()
config = tf.compat.v1.ConfigProto()
# Allocate GPU memory on demand instead of reserving it all up front
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
I tried both, and config = tf.ConfigProto() and config = tf.compat.v1.ConfigProto() both run without problems.
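For code that has been moved fully to the TF2 API (no compat.v1 Session), the same memory-growth setting can be requested through tf.config. This is only a minimal sketch: it assumes a TensorFlow 2.x install where tf.config.experimental.set_memory_growth is available, and I have not checked that it also takes effect for the Session-based script above.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see (an empty list on a CPU-only machine).
gpus = tf.config.experimental.list_physical_devices('GPU')

for gpu in gpus:
    try:
        # Grow GPU memory on demand; the TF2 counterpart of
        # config.gpu_options.allow_growth = True.
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as err:
        # Raised if the GPU was already initialized before this call.
        print("Could not enable memory growth:", err)
```

This has to run at the very start of the program, before any op is placed on the GPU.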
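The exact cause still needs investigating. Since allow_growth makes TensorFlow allocate GPU memory as needed rather than claiming almost all of it at startup, a shortage of GPU memory when cuDNN creates its handle is a plausible suspect, but I have not confirmed it.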
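Because the fix only changes how GPU memory is allocated, the same behavior can also be switched on without touching the Session code, through the TF_FORCE_GPU_ALLOW_GROWTH environment variable described in TensorFlow's GPU guide. A small sketch, assuming the variable is set before TensorFlow is imported:

```python
import os

# Must be in place before TensorFlow initializes the GPU,
# so set it before the tensorflow import.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Only import tensorflow after this point, then create the Session as usual.
```

Exporting the variable in the shell before launching the script should have the same effect and is handy for checking whether memory growth is what makes the error go away.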
Reference: https://github.com/tensorflow/tensorflow/issues/24828#issuecomment-464960819