How to fix RuntimeError: Address already in use

Traceback (most recent call last):
  File "train.py", line 159, in <module>
    train(args=args)
  File "train.py", line 50, in train
    rank = args.local_rank
  File "/home/wby/anaconda3/envs/wby/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 400, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File "/home/wby/anaconda3/envs/wby/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 95, in _tcp_rendezvous_handler
    store = TCPStore(result.hostname, result.port, world_size, start_daemon)
RuntimeError: Address already in use

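The problem is that the TCP port used for the distributed rendezvous is already occupied. One fix is to specify a different port when launching the program; any port number that is not currently in use will do: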

--master_port 29501
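Rather than guessing a port number, you can ask the operating system for one that is currently free. The following is a minimal sketch using only the standard library; `find_free_port` is a hypothetical helper name, not part of PyTorch.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port that is unused at the time of the call.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 asks the OS to pick any free port.
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    # Print a port number that can be passed as --master_port.
    print(find_free_port())
```

The printed number can then be passed as the `--master_port` value. Another process could still grab the port between this check and the actual launch, so treat the result as a good candidate rather than a guarantee.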

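Another approach is to find out which port is occupied (insert a print statement in the program to output it), look up the PID of the process holding that port with netstat -nltp, and then free the port with kill -9 PID. Note that this forcibly terminates the process that holds the port.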
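Before looking up the PID, it can help to confirm from Python that the port really is occupied. This is a rough sketch, assuming the rendezvous runs on the local machine; `port_in_use` is a hypothetical helper, and 29500 is only a placeholder for whatever port your program actually prints.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is accepting TCP connections on host:port.

    Hypothetical helper for illustration; not a PyTorch API.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 29500  # placeholder: substitute the port your program reports
    print(f"port {port} in use: {port_in_use(port)}")
```

If this reports the port as in use, netstat -nltp will show which PID owns it.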
