While running distributed training, the job crashed with the following output:
INFO:tensorflow:Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
I0408 04:01:41.507015 140706188736256 cross_device_ops.py:427] Reduce to /replica:0/task:0/device:CPU:0 then broadcast to ('/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Create CheckpointSaverHook.
I0408 04:01:44.424420 140706188736256 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::append
Fatal Python error: Aborted
After a long, roundabout round of troubleshooting, I found that reducing the number of GPUs let training run normally.
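The note does not say how the GPU count was reduced. One common way, assumed here, is to mask GPUs with the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow starts, and optionally pass an explicit device list to the distribution strategy. The helper name `limit_visible_gpus` is hypothetical. This is only a sketch of how to run with fewer GPUs. It is not a diagnosis of the `std::length_error` crash.

```python
import os


def limit_visible_gpus(num_gpus):
    """Hypothetical helper: expose only the first `num_gpus` GPUs to CUDA.

    Call this before TensorFlow initializes its GPUs (safest: before
    `import tensorflow`), because the mask is read at CUDA initialization.
    Passing 0 hides every GPU.
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in range(num_gpus))
    # Visible GPUs are renumbered from 0, so the device names start at /gpu:0.
    return ["/gpu:%d" % i for i in range(num_gpus)]


if __name__ == "__main__":
    devices = limit_visible_gpus(2)
    print(os.environ["CUDA_VISIBLE_DEVICES"])
    print(devices)
    # Assumed Estimator-style wiring (requires TensorFlow):
    # import tensorflow as tf
    # strategy = tf.distribute.MirroredStrategy(devices=devices)
    # config = tf.estimator.RunConfig(train_distribute=strategy)
```

Setting the environment variable from the shell, for example `CUDA_VISIBLE_DEVICES=0,1 python train.py`, has the same effect without changing the training script.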