PyTorch add_graph error when adding the network graph structure: RuntimeError: Cannot insert a Tensor that requires grad as a constant.

Original

2020-06-15 02:04

Error description

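When using the add_graph method of PyTorch's built-in TensorBoard support to add the network graph structure to the monitoring information, I ran into the following error: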

RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient.

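This message is actually a bit tricky:

It says that the add_graph step is the problem, but gives no hint as to which variable carries the offending attribute.
The call follows the API documentation to the letter, so how could it fail to meet the requirements?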

Debugging process

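I searched all over the web without finding an answer. Following hints in some related material (see References), I first suspected the requires_grad setting of the Tensor passed into the model. Debugging showed that its value was False (as it should be, since the data is read from the raw files).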
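The check in this step can be sketched roughly as follows. This is a minimal illustration rather than the original debugging session: `batch` is made-up data standing in for a batch loaded from file, the model is a toy `nn.Linear`, and `summarize_requires_grad` is a hypothetical helper name.

```python
import torch
from torch import nn


def summarize_requires_grad(model: nn.Module, batch: torch.Tensor) -> dict:
    """Report whether the input and each named parameter track gradients."""
    flags = {"input": batch.requires_grad}
    for name, param in model.named_parameters():
        flags[name] = param.requires_grad
    return flags


# Made-up stand-in for a batch read from disk; tensors built from raw data
# do not require gradients unless that is requested explicitly.
batch = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
model = nn.Linear(2, 1)  # toy model, for illustration only

flags = summarize_requires_grad(model, batch)
print(flags)
```

The input reports False while the parameters report True, which matches what I saw: the input Tensor was not the culprit.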
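Next, I suspected the requires_grad of the model's output Tensor, but I could not think of a simple way to change the attributes of the model output inside add_graph, because that step happens in the add_graph source code. One could of course edit the model(input_to_model) call in the source so that all outputs have the required attributes, but that is not very elegant.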
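Then reference link 2 gave me an idea: I am training on multiple GPUs, so could DataParallel be the problem? I also remembered some tips on saving and loading DataParallel models that I had seen on Zhihu (link 3), and had a flash of inspiration. I first tried moving add_graph to before the model is parallelized. Sure enough, no error was raised, which basically confirmed that DataParallel was the cause.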
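The experiment of calling add_graph before parallelizing can be sketched like this. It is a sketch under assumptions, not my actual training script: `log_graph_then_parallelize` is a hypothetical helper, `writer` is any object with an `add_graph(model, input_to_model)` method (such as `torch.utils.tensorboard.SummaryWriter`), and the `_RecordingWriter` stub exists only so that the snippet runs without TensorBoard installed.

```python
import torch
from torch import nn


def log_graph_then_parallelize(model, sample, writer):
    """Log the graph of the plain model, then wrap it for multi-GPU use."""
    writer.add_graph(model, sample)  # model is not yet inside DataParallel
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model


class _RecordingWriter:
    """Stand-in for SummaryWriter (illustration only): records add_graph calls."""

    def __init__(self):
        self.logged = []

    def add_graph(self, model, input_to_model):
        self.logged.append(type(model).__name__)


writer = _RecordingWriter()
net = nn.Linear(4, 2)  # toy model, for illustration only
sample = torch.zeros(1, 4)  # made-up input batch
net = log_graph_then_parallelize(net, sample, writer)
print(writer.logged)
```

In a real script the model and the sample would need to be on the same device before add_graph is called.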
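Then, based on reference links 2 and 3, I thought of taking the model out of the parallel container before calling add_graph, and that solved the problem completely. The relevant code is as follows: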

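# Set up the summary writer
from torch.utils.tensorboard import SummaryWriter

# next(iter(...)) replaces the older iter(...).next() call
train_data_sample, label_sample = next(iter(dataloader_train))
writer = SummaryWriter(args.summary_path, flush_secs=120)

with writer:
    # Pass the unwrapped model (model.module) together with a sample input
    writer.add_graph(model.module, train_data_sample.to(device))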

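The key is model.module: this expression takes the model out of the parallel container and returns the unparallelized (original) model, so that add_graph then runs without problems.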
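If the same script may run with or without DataParallel, a small guard avoids an AttributeError on a model that has no `.module`. The following is a minimal sketch under assumptions: `unwrap_model` is a hypothetical helper name, and the toy `nn.Linear` model is for illustration only.

```python
from torch import nn


def unwrap_model(model: nn.Module) -> nn.Module:
    """Return the underlying model if it is wrapped in nn.DataParallel."""
    if isinstance(model, nn.DataParallel):
        return model.module
    return model


net = nn.Linear(4, 2)  # toy model, for illustration only
wrapped = nn.DataParallel(net)

print(unwrap_model(wrapped) is net)
print(unwrap_model(net) is net)

# Intended use with a real writer (names as in the snippet above):
# writer.add_graph(unwrap_model(model), train_data_sample.to(device))
```

Both checks should hold, since the helper returns the original object in either case.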

References

https://github.com/pytorch/pytorch/issues/20101
https://discuss.pytorch.org/t/how-to-reach-model-attributes-wrapped-by-nn-dataparallel/1373/4
https://www.zhihu.com/question/67726969
