1. To be done
1. Why, on page 21 of the slides, … (I think the values can only be all positive, because
2. The reason for using Xavier initialization
3. The necessity of cross validation when the model is very large
2. Notes
1. Neural network training steps:
- Preprocess the data: normalize the data to zero mean. There are two common methods: subtract the mean image (the pixel-wise mean computed over every dimension of the training data), or subtract the per-channel mean (one mean each for the R, G, B channels).
- Weight initialization: for tanh, use Xavier initialization, `np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)`; for ReLU, use the modified version of Xavier initialization, `np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)`.
- Batch normalization: usually inserted after fully connected or convolutional layers, and before the nonlinearity.
- Hyperparameter optimization: search from coarse to fine.
First stage: only a few epochs to get a rough idea of which params work
Second stage: longer running time, finer search (repeat as necessary)
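The two preprocessing options from the list above can be sketched in NumPy. The array shapes and the random stand-in data are assumptions for illustration, not from the notes:

```python
import numpy as np

# Hypothetical training set: 5000 images of shape 32x32x3 (shapes assumed).
X_train = np.random.rand(5000, 32, 32, 3).astype(np.float32)

# Method 1: subtract the mean image (one 32x32x3 mean over the whole set).
mean_image = X_train.mean(axis=0)
X_centered_img = X_train - mean_image

# Method 2: subtract the per-channel mean (one scalar per R/G/B channel).
per_channel_mean = X_train.mean(axis=(0, 1, 2))  # shape (3,)
X_centered_ch = X_train - per_channel_mean

# Either way, the data ends up approximately zero-mean.
```

Method 2 is cheaper to store (3 numbers instead of a full mean image) and is what many ImageNet pipelines use.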
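The two initialization formulas from the weight-initialization bullet can be checked directly; the layer sizes here are made up for illustration:

```python
import numpy as np

fan_in, fan_out = 512, 256  # example layer sizes (assumed)

# Xavier initialization for tanh: weight variance scales as 1/fan_in.
W_tanh = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

# Modified version for ReLU: variance scales as 2/fan_in, compensating
# for ReLU zeroing out roughly half of the activations.
W_relu = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)
```

The ReLU variant's standard deviation is sqrt(2) times larger, which keeps the activation variance roughly constant across layers.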
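A minimal sketch of the FC → BN → nonlinearity ordering described in the batch-normalization bullet, with a hand-written training-mode batch-norm forward pass. All sizes, and the gamma/beta values, are assumptions for illustration:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch, then scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 100))         # batch of 64 inputs (assumed sizes)
W = rng.normal(size=(100, 50)) * 0.01  # fully connected weights
a = x @ W                              # fully connected layer
a_bn = batchnorm_forward(a, gamma=np.ones(50), beta=np.zeros(50))
out = np.maximum(0, a_bn)              # nonlinearity comes after BN
```

With gamma = 1 and beta = 0, each feature fed into the ReLU is zero-mean and unit-variance over the batch, which is exactly what the placement rule is meant to achieve.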
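The two-stage coarse-to-fine search above can be sketched as random search over log-uniform ranges. `run_training` is a hypothetical stand-in for an actual training run, and all ranges and counts are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_training(lr, reg, num_epochs):
    # Hypothetical stand-in: in practice this would train the network for
    # num_epochs and return the validation accuracy.
    return 1.0 / (1.0 + abs(np.log10(lr) + 3) + abs(np.log10(reg) + 4))

# First stage: coarse random search over wide log-uniform ranges, few epochs.
coarse = []
for _ in range(20):
    lr = 10 ** rng.uniform(-6, 0)
    reg = 10 ** rng.uniform(-7, -1)
    coarse.append((run_training(lr, reg, num_epochs=2), lr, reg))
best_acc, best_lr, best_reg = max(coarse)

# Second stage: finer search centered on the best coarse result, more epochs.
fine = []
for _ in range(20):
    lr = 10 ** rng.uniform(np.log10(best_lr) - 1, np.log10(best_lr) + 1)
    reg = 10 ** rng.uniform(np.log10(best_reg) - 1, np.log10(best_reg) + 1)
    fine.append((run_training(lr, reg, num_epochs=10), lr, reg))
best_acc2, best_lr2, best_reg2 = max(fine)
```

Sampling in log space matters here: learning rate and regularization strength act multiplicatively, so a uniform sample of the exponent covers the useful range far better than a uniform sample of the raw value.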