A Summary of Some Key Points in Deep Learning

Original

随风秀舞

2019-06-11 03:46

The objective function should prevent the network from always producing a single constant output, such as 0.

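When the training set and the test set come from different distributions, make sure the validation (dev) set and the test set come from the same distribution.
High bias? Make the network larger and deeper.
High variance? Get more training data, or add regularization.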

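Making the network larger and deeper while also enlarging the training set is almost always worthwhile, provided the network is properly regularized.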
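The bias/variance rules above can be written as a small decision helper. This is a hypothetical sketch: `diagnose`, the reference error `bayes_err` (e.g. human-level error) and the gap threshold `tol` are illustrative assumptions, not values from these notes.

```python
def diagnose(train_err: float, dev_err: float,
             bayes_err: float = 0.0, tol: float = 0.02) -> list[str]:
    """Hypothetical helper: suggest remedies from train/dev error rates.

    bayes_err approximates the best achievable error; tol is an arbitrary
    gap threshold chosen only for illustration.
    """
    advice = []
    if train_err - bayes_err > tol:   # large avoidable bias
        advice.append("high bias: use a bigger/deeper network")
    if dev_err - train_err > tol:     # large train-to-dev gap
        advice.append("high variance: get more data or add regularization")
    return advice or ["ok"]

print(diagnose(0.15, 0.16))  # ['high bias: use a bigger/deeper network']
print(diagnose(0.01, 0.11))  # ['high variance: get more data or add regularization']
```

Both conditions can fire at once, in which case both remedies apply.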

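Regularization:
- L2: the sum of squared weights; for a weight matrix this is the (squared) Frobenius norm. In neural networks, L2 regularization is also called weight decay.
- L1: the sum of the absolute values of the weights.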
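- Dropout
  - During training, each unit in a layer is kept with probability $p$ (`keep_prob`, $0<p<1$) and dropped otherwise; the layer's output $a$ is then divided by $p$, i.e. $a = a/p$, to preserve its expected magnitude. By doing this you ensure that the cost keeps the same expected value as without dropout. (This technique is also called inverted dropout.)
  - Use a higher dropout rate (smaller $p$) for layers with many parameters, and a lower dropout rate for layers with few parameters.
  - Do not apply dropout to the input layer.
  - Turn dropout off at test time; use it only during training.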
  - Apply dropout both during forward and backward propagation.
  - The dropped neurons do not contribute to training in either the forward or the backward propagation of that iteration.
  - At each iteration, you train a different model that uses only a subset of your neurons. With dropout, your neurons thus become less sensitive to the activation of any single other neuron, because that neuron might be shut down at any time.
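- Data augmentation, to enlarge the dataset: horizontal flips, rotations, distortions.
- Early stopping: stop training when the validation error starts to increase. At that point the weights are still relatively small, which helps avoid overfitting.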
- Note that regularization hurts training set performance! This is because it limits the ability of the network to overfit to the training set. But since it ultimately gives better test accuracy, it is helping your system.
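Early stopping from the list above, as a sketch: a hypothetical helper that scans per-epoch validation losses and returns the epoch whose weights should be kept. The `patience` parameter (how many non-improving epochs to tolerate) is an assumption added for illustration; the notes simply say to stop once validation error starts rising. In a real training loop each loss would be computed after an epoch, with the weights checkpointed at the best epoch.

```python
def early_stopping(dev_losses, patience=3):
    """Return the index of the epoch whose weights should be kept.

    Training "stops" once the validation loss has failed to improve for
    `patience` consecutive epochs (patience is a hypothetical knob).
    """
    best_epoch, best_loss, bad_epochs = 0, float("inf"), 0
    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss:              # new best: remember it, reset counter
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:                             # no improvement this epoch
            bad_epochs += 1
            if bad_epochs >= patience:
                break                     # later epochs are never seen
    return best_epoch

losses = [0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.5]
print(early_stopping(losses, patience=3))  # 2
```

With `patience=3`, the late improvement at epoch 6 is never reached; a larger patience trades extra training time for a chance to see such dips.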

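Why does regularization reduce overfitting?
Answer: 1. A large $\lambda$ drives many weights $w$ close to 0, which effectively switches off many units and so makes the network smaller and shallower.
2. For activations such as sigmoid and tanh, the function is approximately linear when $|z|$ is small and nonlinear when $|z|$ is large. Regularization shrinks $w$, which also shrinks $z$ and pulls each layer's output into the linear region. Each layer is then roughly linear, so the whole multi-layer network is close to linear as well, which reduces overfitting.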
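Point 1 can be made concrete. A minimal sketch, assuming the common convention of adding $\frac{\lambda}{2m}\sum_l \|W^{[l]}\|_F^2$ to the cost ($m$ = number of training examples); the helper names are hypothetical. The penalty's gradient is $\frac{\lambda}{m}W$, so every step scales $W$ by $(1 - \alpha\lambda/m)$ before applying the data gradient, which is where the name weight decay comes from and why a large $\lambda$ pushes $w$ toward 0.

```python
import numpy as np

def l2_penalty(weights, lam, m):
    """Regularization term (lam / 2m) * sum of squared Frobenius norms."""
    return lam / (2 * m) * sum(np.sum(W ** 2) for W in weights)

def step_with_weight_decay(W, dW, lam, m, lr):
    """One gradient-descent step on the L2-regularized cost.

    dW is the gradient of the unregularized cost; the penalty adds
    (lam / m) * W, equivalent to first scaling W by (1 - lr * lam / m).
    """
    return W - lr * (dW + (lam / m) * W)

# With a zero data gradient, a large lambda steadily shrinks W toward 0.
W = np.ones((2, 2))
for _ in range(100):
    W = step_with_weight_decay(W, np.zeros_like(W), lam=10.0, m=100, lr=0.1)
print(W[0, 0])  # 0.99 ** 100, roughly 0.366
```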

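Why does dropout act as regularization?
Answer: As in the figure below, any of a unit's inputs may be (randomly) switched off, so the unit cannot rely on any single input and has to spread its weights across all of them. This effectively shrinks the weights $w$.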
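The inverted-dropout bullets in the regularization list can be sketched in NumPy. This assumes `keep_prob` plays the role of $p$ above and that a `numpy.random.Generator` supplies the randomness; the function names are hypothetical. The same mask is reused in backpropagation, so units dropped in the forward pass receive no gradient.

```python
import numpy as np

def dropout_forward(a, keep_prob, rng, training=True):
    """Inverted dropout applied to a layer's activations `a`.

    Each unit is kept with probability keep_prob; survivors are divided by
    keep_prob so E[output] == a. Returns (output, mask) so the backward
    pass can reuse the same mask.
    """
    if not training:                        # test time: dropout is off
        return a, None
    mask = rng.random(a.shape) < keep_prob  # True = unit kept
    return a * mask / keep_prob, mask

def dropout_backward(da, mask, keep_prob):
    """Dropped units sent nothing forward, so they get no gradient back."""
    if mask is None:
        return da
    return da * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((500, 500))
out, mask = dropout_forward(a, keep_prob=0.8, rng=rng)
print(out.mean())  # close to 1.0 thanks to the 1/keep_prob scaling
```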

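Normalize the training set: subtract the mean and divide by the standard deviation (the square root of the variance).
- After normalization, the contours of the cost function are closer to circles, so gradient descent converges faster and training speeds up.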
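A minimal sketch of the normalization step, assuming rows are examples and columns are features (transpose if your data stores examples as columns); the helper names and `eps` are hypothetical. The mean and standard deviation are computed on the training set only and then reused for the dev and test sets, so every split goes through the same transformation.

```python
import numpy as np

def fit_normalizer(X_train, eps=1e-8):
    """Per-feature mean and standard deviation, from the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + eps   # eps guards against constant features
    return mu, sigma

def normalize(X, mu, sigma):
    """Apply the training statistics to any split (train, dev, or test)."""
    return (X - mu) / sigma

rng = np.random.default_rng(0)
# Two features on very different scales, which stretches the cost contours.
X_train = rng.normal(loc=[5.0, -3.0], scale=[10.0, 0.1], size=(1000, 2))
X_test = rng.normal(loc=[5.0, -3.0], scale=[10.0, 0.1], size=(200, 2))
mu, sigma = fit_normalizer(X_train)
Xn_train, Xn_test = normalize(X_train, mu, sigma), normalize(X_test, mu, sigma)
print(Xn_train.mean(axis=0), Xn_train.std(axis=0))  # approximately [0 0] and [1 1]
```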
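Parameter initialization
- `w = np.random.randn(*shape) * np.sqrt(1 / n)`, where $n = n^{[l-1]}$ is the number of inputs to the layer, gives the weights a variance of $\dfrac{1}{n^{[l-1]}}$ (standard deviation $\sqrt{\dfrac{1}{n^{[l-1]}}}$).
- With ReLU activations, initialize the weights with variance $\dfrac{2}{n^{[l-1]}}$ (standard deviation $\sqrt{\dfrac{2}{n^{[l-1]}}}$), where $n^{[l-1]}$ is the number of units in the previous layer. This is known as He initialization.
- With tanh activations, use variance $\dfrac{1}{n^{[l-1]}}$.
- Xavier (Glorot) initialization uses variance $\dfrac{2}{n^{[l-1]} + n^{[l]}}$.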
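The rules above differ only in the target variance, so one helper can cover them. A sketch, assuming the weight matrix of layer $l$ has shape $(n^{[l]}, n^{[l-1]})$ and randomness comes from a `numpy.random.Generator`; `init_weights` and the scheme labels are hypothetical names. Note the last line scales by the standard deviation $\sqrt{\text{var}}$, since standard-normal draws have variance 1.

```python
import numpy as np

def init_weights(n_in, n_out, scheme, rng):
    """Draw W of shape (n_out, n_in) = (n^[l], n^[l-1]) from N(0, var)."""
    if scheme == "he":          # ReLU: var = 2 / n^[l-1]
        var = 2.0 / n_in
    elif scheme == "tanh":      # tanh: var = 1 / n^[l-1]
        var = 1.0 / n_in
    elif scheme == "xavier":    # Glorot: var = 2 / (n^[l-1] + n^[l])
        var = 2.0 / (n_in + n_out)
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    # standard-normal draws have variance 1, so scale by sqrt(var)
    return rng.standard_normal((n_out, n_in)) * np.sqrt(var)

rng = np.random.default_rng(0)
W = init_weights(1000, 500, "he", rng)
print(W.shape, W.var())  # (500, 1000) and a variance near 2/1000 = 0.002
```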
