AttributeError: 'DistributedDataParallel' object has no attribute 'blahblah'

PyTorch DDP fails when you reach into the wrapped model and call one of its submodules directly to calculate the loss.
These are my scripts:

# train.py:
class Model(nn.Module):
    def __init__(self, params):
        ...
        self.xnli_proj = nn.Linear(dim, 3)
...
model = Model(params)
# by this point the model has been wrapped in DistributedDataParallel (see the traceback below)
output = model.xnli_proj(encoder_output)

And the launch command:

python -m torch.distributed.launch --nproc_per_node=8 train.py 

The error I got:

2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002: AttributeError: 'DistributedDataParallel' object has no attribute 'xnli_proj'
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002: Traceback (most recent call last):
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002:   File "./UVL/train_nlp.py", line 440, in <module>
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002:     main(params)
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002:   File "./UVL/train_nlp.py", line 401, in main
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002:     trainer.xnli_step(lang1, lang2, params.lambda_xnli)
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002:   File "/disk/qiaolin/UVL/src/xnli.py", line 902, in xnli_step
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002:     output = self.model.xnli_proj(encoder_output)
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002:   File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002:     type(self).__name__, name))
2020-05-11T15:30:28.000Z /container_e2240_1583898264103_81566_01_000002: AttributeError: 'DistributedDataParallel' object has no attribute 'xnli_proj'
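
The traceback ends in nn.Module.__getattr__, which only looks up parameters, buffers and submodules registered on the wrapper itself. DistributedDataParallel keeps the original model under its `module` attribute, so `xnli_proj` is not found on the wrapper. Here is a minimal sketch that reproduces the same lookup failure without a process group. `Wrapper` is a hypothetical stand-in for DDP, and the layer sizes are made up for illustration:

```python
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.xnli_proj = nn.Linear(dim, 3)


class Wrapper(nn.Module):
    """Hypothetical stand-in for DDP: it stores the wrapped model as `self.module`."""

    def __init__(self, module):
        super().__init__()
        self.module = module


inner = Model()
wrapped = Wrapper(inner)

try:
    wrapped.xnli_proj  # looked up on the wrapper, not on the inner model
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# The layer is still reachable through the wrapper's `module` attribute.
reachable = wrapped.module.xnli_proj is inner.xnli_proj
```

With real DDP, `model.module.xnli_proj(...)` would also silence the AttributeError, but that call does not go through DDP's own forward().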

And this is the solution:
https://www.gitmemory.com/issue/NVIDIA/apex/436/529107546

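In short, to use DDP, do every computation that involves the model's parameters inside the model's forward() and call the DDP wrapper itself, so that DDP knows which parameters took part and should have their gradients collected.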
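
The advice above can be sketched roughly as follows. This is not the original train.py: the `encoder` layer, the sizes and the `maybe_wrap` helper are assumptions made up so the snippet runs on its own. The point is only that `xnli_proj` is applied inside forward() and the caller invokes the (possibly wrapped) model instead of `model.xnli_proj`.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class Model(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)  # hypothetical stand-in for the real encoder
        self.xnli_proj = nn.Linear(dim, 3)

    def forward(self, x):
        # The projection lives inside forward(), so a DDP wrapper sees it being used.
        return self.xnli_proj(self.encoder(x))


def maybe_wrap(model):
    """Hypothetical helper: wrap in DDP only if a process group is initialized."""
    if dist.is_available() and dist.is_initialized():
        # On GPU you would typically also pass device_ids=[local_rank].
        return DDP(model)
    return model


model = maybe_wrap(Model())
logits = model(torch.randn(4, 16))  # call the wrapper, never model.xnli_proj
loss = logits.sum()                 # placeholder loss, for illustration only
loss.backward()
```

Run as a plain script, the model is left unwrapped. Under a launcher that initializes the process group, the same call site goes through DDP's forward().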
