PyTorch DDP fails when a submodule's parameters are used directly (outside forward()) to compute the loss.
These are my scripts:
# train.py:
class Model(nn.Module):
    def __init__(self, params):
        ...
        self.xnli_proj = nn.Linear(dim, 3)
        ...

model = Model(params)
output = model.xnli_proj(encoder_output)
I launch it with:
python -m torch.distributed.launch --nproc_per_node=8 train.py
The error I got:
Traceback (most recent call last):
  File "./UVL/train_nlp.py", line 440, in <module>
    main(params)
  File "./UVL/train_nlp.py", line 401, in main
    trainer.xnli_step(lang1, lang2, params.lambda_xnli)
  File "/disk/qiaolin/UVL/src/xnli.py", line 902, in xnli_step
    output = self.model.xnli_proj(encoder_output)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'DistributedDataParallel' object has no attribute 'xnli_proj'
This is the solution I found:
https://www.gitmemory.com/issue/NVIDIA/apex/436/529107546
In short: DDP wraps the model, so submodules like xnli_proj are no longer attributes of the wrapper. Route every computation through the wrapped model's forward() so DDP knows which parameters participate and can synchronize their gradients. (You can also reach the underlying model via model.module, but direct calls on .module bypass DDP's gradient-sync hooks.)
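Here is a minimal runnable sketch of that fix. The Model class, dimensions, and the "mode" argument are hypothetical stand-ins for the real train.py code; the process group uses a single-process gloo backend so the example runs on CPU:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist

class Model(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.xnli_proj = nn.Linear(dim, 3)

    # Route the projection through forward() so the DDP wrapper
    # sees it and can register gradient hooks on its parameters.
    def forward(self, x, mode="xnli"):
        h = self.encoder(x)
        if mode == "xnli":
            return self.xnli_proj(h)
        return h

# Single-process gloo group so the sketch runs without GPUs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.parallel.DistributedDataParallel(Model())

x = torch.randn(4, 16)

# Correct: goes through forward(), so DDP syncs gradients.
out = model(x, mode="xnli")
print(out.shape)  # torch.Size([4, 3])

# Also possible, but skips DDP's gradient hooks for this call:
out2 = model.module.xnli_proj(model.module.encoder(x))
print(out2.shape)

dist.destroy_process_group()
```

With 8 processes as in the launch command above, the same pattern applies; only the forward() path is safe for training.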