A Summary of Some Deep Learning Key Points

Original

随风秀舞

2019-06-11 03:46

The objective function must prevent the network from collapsing to a single constant output, for example always 0.

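When the training set and the test set come from different distributions, make sure the validation set and the test set come from the same distribution.
High bias? Make the network larger and deeper.
High variance? Get more training data, or add regularization.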

Making the network larger and deeper while also enlarging the training set is always a sound move.
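
The two rules of thumb above can be read as a small decision procedure. A minimal sketch, assuming errors are given as fractions in [0, 1]; the function name `diagnose`, the `target_error` baseline, and the `tolerance` threshold are illustrative assumptions, not values from these notes:

```python
def diagnose(train_error, dev_error, target_error=0.0, tolerance=0.02):
    """Suggest remedies from training and validation (dev) error.

    target_error and tolerance are hypothetical knobs chosen for illustration.
    """
    suggestions = []
    # High bias: training error is well above the error we consider achievable.
    if train_error - target_error > tolerance:
        suggestions.append("high bias: use a larger/deeper network")
    # High variance: validation error is well above training error.
    if dev_error - train_error > tolerance:
        suggestions.append("high variance: get more data or add regularization")
    return suggestions


print(diagnose(train_error=0.15, dev_error=0.30))
```

Both conditions can hold at once, in which case both remedies apply.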

Regularization:
- L2: the sum of squared weights. For a weight matrix this is the squared Frobenius norm; in neural networks it is also called weight decay.
- L1: the sum of the absolute values of the weights.
- Dropout
  - During training, each layer randomly drops units, keeping each unit with probability $p$ ($0<p<1$). The layer output $a$ is then divided by $p$, i.e. $a = a/p$, to preserve its scale. By doing this you are assuring that the result of the cost will still have the same expected value as without dropout. (This technique is also called inverted dropout.)
  - Use a higher dropout rate for layers with many parameters and a lower dropout rate for layers with few parameters.
  - Do not apply dropout to the input layer.
  - Turn dropout off at test time; use it only during training.
  - Apply dropout during both forward and backward propagation.
  - The dropped neurons do not contribute to training in either the forward or the backward propagation of that iteration.
  - At each iteration, you train a different model that uses only a subset of your neurons. With dropout, your neurons thus become less sensitive to the activation of one other specific neuron, because that other neuron might be shut down at any time.
- Data augmentation to enlarge the dataset: horizontal flips, rotations, distortions.
- Early stopping: stop training when the validation error starts to rise. At that point the weights are still relatively small, which helps avoid overfitting.
- Note that regularization hurts training set performance! This is because it limits the ability of the network to overfit to the training set. But since it ultimately gives better test accuracy, it is helping your system.
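
The early stopping item in the list above can be sketched as a scan over per-epoch validation errors. This is a minimal illustration, not a full training loop: the function name `early_stopping_epoch` and the `patience` parameter are assumptions (with `patience=1` it stops at the first epoch where the validation error fails to improve, as the notes describe).

```python
def early_stopping_epoch(val_errors, patience=1):
    """Return the index of the epoch whose weights should be kept.

    val_errors: validation error measured after each epoch.
    patience:   number of consecutive non-improving epochs tolerated
                before stopping (hypothetical knob, not from the notes).
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            # New best validation error: remember this epoch, reset the counter.
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation error stopped improving
    return best_epoch


history = [0.50, 0.40, 0.30, 0.35, 0.20]
print(early_stopping_epoch(history, patience=1))
```

In a real training loop one would save a checkpoint of the weights whenever the best epoch is updated, then restore that checkpoint after stopping.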

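Why does regularization reduce overfitting?
Answer: 1. A large $\lambda$ pushes the weights $w$ close to 0, so many units in the network have almost no effect, which effectively makes the network smaller and shallower.
2. For activation functions such as sigmoid and tanh, a small $|z|$ falls in the roughly linear region and a large $|z|$ in the nonlinear region. Adding regularization makes the weights $w$ smaller, so $z$ becomes smaller too, pulling each layer's output into the linear region. Each layer is then approximately linear, so the stacked network is also approximately linear, which reduces overfitting.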
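
Writing out the L2 penalty makes the first point concrete. This is a sketch under the usual notation, which is an assumption and does not appear in these notes: $m$ training examples, learning rate $\alpha$, $L$ layers with weight matrices $W^{[l]}$, and $J_0$ for the unregularized data term of the cost.

$$
J = J_0 + \frac{\lambda}{2m}\sum_{l=1}^{L}\big\|W^{[l]}\big\|_F^2
$$

$$
W^{[l]} \leftarrow W^{[l]} - \alpha\Big(\frac{\partial J_0}{\partial W^{[l]}} + \frac{\lambda}{m}W^{[l]}\Big)
= \Big(1-\frac{\alpha\lambda}{m}\Big)W^{[l]} - \alpha\,\frac{\partial J_0}{\partial W^{[l]}}
$$

Each step first multiplies $W^{[l]}$ by the factor $1-\frac{\alpha\lambda}{m} < 1$ and then applies the usual gradient step, which is where the name weight decay comes from. A larger $\lambda$ shrinks the weights more strongly.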

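Why does dropout act as regularization?
Answer: For any given unit in the network, each of its input units may be switched off at random, so the unit cannot rely on any single input and has to spread its weights across all of them. This effectively shrinks the weights $w$.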
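
The inverted dropout procedure described in the regularization list can be sketched in NumPy as follows. The names `dropout_forward`, `dropout_backward`, and `keep_prob` are illustrative assumptions; `keep_prob` plays the role of $p$, the probability of keeping a unit.

```python
import numpy as np


def dropout_forward(a, keep_prob, rng):
    """Apply inverted dropout to activations `a`; return (a_dropped, mask)."""
    mask = rng.random(a.shape) < keep_prob  # True means the unit is kept
    a_dropped = a * mask / keep_prob        # rescale so the expected value is unchanged
    return a_dropped, mask


def dropout_backward(da, mask, keep_prob):
    """Backward pass: reuse the same mask and the same scaling as the forward pass."""
    return da * mask / keep_prob


rng = np.random.default_rng(0)
a = np.ones((1000, 100))
a_dropped, mask = dropout_forward(a, keep_prob=0.8, rng=rng)
print(a_dropped.mean())  # close to 1.0, the mean of `a`
```

At test time neither function is called: the activations pass through unchanged, and no extra scaling is needed because the division by `keep_prob` was already done during training.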

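Normalize the training set: standardize each feature using its mean and variance.
- After normalization the contours of the cost function are close to circles, so gradient descent converges faster and training speeds up.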
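
A minimal NumPy sketch of this normalization, assuming rows are examples and columns are features. The helper names `fit_normalizer` and `normalize` and the small constant `eps` are assumptions for illustration. The statistics are computed on the training set only and reused for the test set, so both go through the same transformation.

```python
import numpy as np


def fit_normalizer(x_train, eps=1e-8):
    """Compute per-feature mean and standard deviation on the training set."""
    mu = x_train.mean(axis=0)
    sigma = x_train.std(axis=0) + eps  # eps guards against division by zero
    return mu, sigma


def normalize(x, mu, sigma):
    """Shift to zero mean and scale to unit variance using the given statistics."""
    return (x - mu) / sigma


rng = np.random.default_rng(0)
# Two features on very different scales (synthetic data for illustration).
x_train = rng.normal(loc=[5.0, -3.0], scale=[10.0, 0.1], size=(500, 2))
x_test = rng.normal(loc=[5.0, -3.0], scale=[10.0, 0.1], size=(100, 2))

mu, sigma = fit_normalizer(x_train)
x_train_n = normalize(x_train, mu, sigma)
x_test_n = normalize(x_test, mu, sigma)  # reuse the training statistics
```
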
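
Parameter initialization
- `w = np.random.randn(*shape) * np.sqrt(1/n)` gives the weights a variance of $\dfrac{1}{n}$, i.e. a standard deviation of $\sqrt{\dfrac{1}{n}}$.
- With the ReLU activation, initialize the weights with variance $\dfrac{2}{n^{[l-1]}}$ (standard deviation $\sqrt{\dfrac{2}{n^{[l-1]}}}$), where $n^{[l-1]}$ is the number of units in the previous layer. This is also called He initialization.
- With the tanh activation, use variance $\dfrac{1}{n^{[l-1]}}$ (standard deviation $\sqrt{\dfrac{1}{n^{[l-1]}}}$).
- Xavier initialization uses variance $\dfrac{2}{n^{[l-1]} + n^{[l]}}$ (standard deviation $\sqrt{\dfrac{2}{n^{[l-1]} + n^{[l]}}}$).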
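
The initialization schemes can be compared side by side in a short NumPy sketch. The function name `init_weights` and the scheme labels are assumptions for illustration, and the `"xavier"` branch assumes the Glorot form with variance $2/(n^{[l-1]} + n^{[l]})$. In every case the standard deviation passed to the sampler is the square root of the target variance.

```python
import numpy as np


def init_weights(n_in, n_out, scheme, rng):
    """Return an (n_out, n_in) weight matrix drawn from a zero-mean Gaussian.

    n_in is the number of units in the previous layer, n_out in the current one.
    """
    if scheme == "he":        # suited to ReLU: variance 2 / n_in
        var = 2.0 / n_in
    elif scheme == "tanh":    # suited to tanh: variance 1 / n_in
        var = 1.0 / n_in
    elif scheme == "xavier":  # assumed Glorot form: variance 2 / (n_in + n_out)
        var = 2.0 / (n_in + n_out)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Scale standard normal samples by the standard deviation sqrt(var).
    return rng.standard_normal((n_out, n_in)) * np.sqrt(var)


rng = np.random.default_rng(0)
W = init_weights(n_in=512, n_out=256, scheme="he", rng=rng)
print(W.shape, W.var())  # empirical variance should be near 2 / 512
```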
