Question:
I've noticed that NaNs are frequently introduced during training.
Often it seems to be introduced by weights in inner-product/fully-connected or convolution layers blowing up.
Is this occurring because the gradient computation is blowing up? Or is it because of weight initialization (if so, why does weight initialization have this effect)? Or is it likely caused by the nature of the input data?
The overarching question here is simply: what is the most common reason for NaNs to occur during training? And secondly, what are some methods for combatting this (and why do they work)?