Dropout vs. L2 vs. Ensemble Learning
- Ensemble learning that uses a different random subset of hidden units in every iteration (this is dropout) performs better than ensemble learning that uses the same set of hidden units throughout training.
- Note that even when dropout training used more hidden units than the ensemble baseline, overfitting did not occur. L2 and dropout give comparable regularization effects, but the SGD+L2 configuration requires repeatedly tuning the learning rate α, whereas dropout has no corresponding parameter to fine-tune.
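As a minimal sketch of the mechanism described above (a different random subset of hidden units participates in each training iteration), inverted dropout on a hidden activation can be written as follows. The function name and shapes are illustrative, not from any specific library:

```python
import numpy as np

def dropout_forward(h, p_drop, rng):
    """Inverted dropout: a fresh random mask is drawn on every call,
    so each training iteration effectively trains a different
    sub-network of hidden units (illustrative sketch)."""
    mask = (rng.random(h.shape) >= p_drop).astype(h.dtype)
    # Scale surviving units by 1/(1 - p_drop) so the expected
    # activation matches the no-dropout forward pass.
    return h * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((4, 8))          # toy hidden activations
out = dropout_forward(h, p_drop=0.5, rng=rng)
```

Calling `dropout_forward` again with the same inputs produces a different mask, which is exactly the "different set of hidden units in every iteration" behavior the note contrasts with a fixed ensemble.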
Selective Dropout
Reference: Barrow E, Eastwood M, Jayne C. Selective Dropout for Deep Neural Networks. In: Neural Information Processing. Springer International Publishing, 2016.
Method: the dropout rate determines how many units to drop in each layer; each of the three quantities below is used to produce a neuron-selection probability:
- Weight change magnitude: $\mathrm{avg}_k = \frac{1}{n}\sum_{j=1}^{n}\left|W^{(i)}_{jk} - W^{(i-1)}_{jk}\right|$. The larger the change, the more actively the unit is still learning, so its dropout probability should be lower.
- Weight average: $\mathrm{avg}_k = \frac{1}{n}\sum_{j=1}^{n} W^{(i)}_{jk}$. The larger this value, the more the corresponding neuron has already learned, so its dropout probability should be higher.
- Output variance: $\mathrm{N\_Variance}_k = \mathrm{variance}\left(X^{(i-1)}_k\right)$. The larger this value, the more stable the unit, so its dropout probability should be higher.
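The three per-unit scores above can be sketched in NumPy as follows. This is a hedged illustration of the formulas, not the authors' implementation; the function and variable names (`selective_dropout_scores`, `W_prev`, `W_curr`, `X_prev`) are mine:

```python
import numpy as np

def selective_dropout_scores(W_prev, W_curr, X_prev):
    """Compute the three per-unit scores used to rank units for
    selective dropout (illustrative sketch of the formulas).

    W_prev, W_curr: (n, k) weight matrices into a layer of k units
                    at epochs i-1 and i.
    X_prev:         (batch, k) outputs of those units at epoch i-1.
    """
    n = W_curr.shape[0]
    # 1. Weight change magnitude: large -> still learning -> drop less.
    delta = np.abs(W_curr - W_prev).sum(axis=0) / n
    # 2. Average weight: large -> mostly learned -> drop more.
    avg_w = W_curr.sum(axis=0) / n
    # 3. Output variance per unit: large -> stable -> drop more.
    out_var = X_prev.var(axis=0)
    return delta, avg_w, out_var
```

Each returned vector has one entry per unit; the layer's dropout rate then fixes how many of the top-ranked units to drop.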