解決兩個GPU同時訓練問題。
使用場景
我有兩個GPU卡。我希望我兩個GPU能並行運行兩個網絡模型。
代碼
錯誤代碼1:
#對於0號GPU
os.environ['CUDA_VISIBLE_DEVICES']='0,1'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#對於1號GPU
os.environ['CUDA_VISIBLE_DEVICES']='0,1'
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
0號GPU不報錯,1號GPU報錯。錯誤如下
RuntimeError: Expected tensor for argument #1 ‘input’ to have the same device as tensor for argument #2 ‘weight’; but device 0 does not equal 1 (while checking arguments for cudnn_convolution)
錯誤代碼2:
#對於0號GPU
os.environ['CUDA_VISIBLE_DEVICES']='0'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#對於1號GPU
os.environ['CUDA_VISIBLE_DEVICES']='1'
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
0號GPU不報錯,1號GPU報錯。錯誤如下
CUDA: invalid device ordinal
正確代碼如下:
#對於0號GPU
os.environ['CUDA_VISIBLE_DEVICES']='0'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#對於1號GPU
os.environ['CUDA_VISIBLE_DEVICES']='1'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")