Common causes of NaNs during training

Question:

I've noticed that NaNs are frequently introduced during training.

Often it seems to be caused by the weights in inner-product/fully-connected or convolution layers blowing up.

Is this occurring because the gradient computation is blowing up? Or is it because of weight initialization (if so, why does weight initialization have this effect)? Or is it likely caused by the nature of the input data?
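Whatever the root cause, the mechanism that turns "blowing up" into NaNs is ordinary floating-point arithmetic: once an activation, weight, or gradient overflows to infinity, operations such as `inf - inf` (common inside softmax or batch-norm style computations) produce NaN, which then propagates through every subsequent layer. A minimal stdlib-only illustration (the values are arbitrary, chosen only to force overflow):

```python
import math

# A value near the top of double-precision range, standing in for a
# weight or activation that has grown too large.
big = 1e308

# One more multiplication overflows to +inf.
prod = big * 10.0

# Subtracting inf from inf (as in a naive softmax: x - max(x) gone wrong,
# or variance computed from overflowed sums) yields NaN.
diff = prod - prod

print(math.isinf(prod))  # True: the value has overflowed
print(math.isnan(diff))  # True: NaN is born and will now propagate
```

This is why the symptom is often "weights suddenly become NaN": the overflow happens one step earlier, and the NaN appears as soon as two infinite quantities are combined.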

The overarching question here is simply: what is the most common reason for NaNs to occur during training? And secondly, what are some methods for combatting this (and why do they work)?
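One widely used remedy (offered here as a hedged sketch, not taken from the linked answer) is gradient clipping by global norm: if the combined L2 norm of all gradients exceeds a threshold, rescale them so a single bad batch cannot blow the weights up. The function name and threshold below are illustrative:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their global L2 norm
    does not exceed max_norm. (Illustrative; real frameworks provide
    equivalents that operate on tensors.)"""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        return [g * scale for g in grads]
    return list(grads)

grads = [3.0, 4.0]                      # global norm is 5.0
clipped = clip_by_global_norm(grads, 1.0)
print(clipped)                          # rescaled so the norm is 1.0
```

This works because the update direction is preserved while its magnitude is bounded, so the parameter step can never overflow no matter how steep the loss surface is at that point. Lowering the learning rate and checking inputs/labels for NaNs are complementary fixes for the other causes listed above.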


Solution:

Reference: https://stackoom.com/en/question/2IV7q