Understanding global_step in TensorFlow

Original

2020-06-16 02:27

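While studying TensorFlow recently I kept running into the global_step parameter, but nothing I read explained what it actually does. It shows up in optimizers and in the exponential decay function, and I did not really understand it. The exponential decay function implements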

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

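Most of the parameters are easy to understand: learning_rate is the initial learning rate and decay_rate is the decay rate. decay_steps controls how quickly the rate decays (which I also did not understand at first). Several blog posts either explain global_step poorly or skip it altogether. After reading some references and thinking it over, it finally clicked, so below I explain this parameter, which looks minor but is surprisingly hard to figure out.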

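The learning rate matters a lot: if it is too large, training may oscillate; if it is too small, learning is slow and training takes a long time. Exponential decay is one way to get a suitable learning rate. The rate starts out relatively large so that training approaches the minimum quickly, which addresses slow learning. As the number of training steps grows, the rate decays exponentially, which keeps a too-large rate from overshooting the minimum. Eventually it levels off and training settles into a fairly stable state.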

TensorFlow 1.x provides an exponential decay function

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

It effectively implements the following computation

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
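To make the formula concrete, here is a minimal plain-Python sketch of it. This is only my restatement of the formula above, not TensorFlow's actual source; the function name exponential_decay_value is made up for illustration.

```python
def exponential_decay_value(learning_rate, global_step, decay_steps, decay_rate):
    """Restates the formula above in plain Python (hypothetical helper,
    not a TensorFlow function)."""
    return learning_rate * decay_rate ** (global_step / decay_steps)


# At step 0 the exponent is 0, so the initial rate comes back unchanged.
print(exponential_decay_value(0.1, 0, 100, 0.96))
# After decay_steps steps the exponent is 1, so the rate has been
# multiplied by decay_rate exactly once.
print(exponential_decay_value(0.1, 100, 100, 0.96))
```

The only quantity that changes during training is global_step; the other three arguments stay fixed.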

I used to think global_step was the total number of iterations needed to finish all training epochs, but I kept seeing code that initializes global_step to 0

global_step = tf.Variable(0, trainable=False)

Looking into it, I found this explanation of global_step on Stack Overflow

global_step refer to the number of batches seen by the graph. Everytime a batch is provided, the weights are updated in the direction that minimizes the loss. global_step just keeps track of the number of batches seen so far. When it is passed in the minimize() argument list, the variable is increased by one. Have a look at optimizer.minimize().

You can get the global_step value using tf.train.global_step().

The 0 is the initial value of the global step in this context.

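So once global_step is passed to minimize(), it is incremented by 1 every time a batch is trained; it is a variable that changes during training. In that sense global_step works like a timer, except that what it counts is the current number of iterations, and when it reaches some value A a certain action is triggered.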
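The counting behavior can be imitated without TensorFlow. The sketch below is only a toy: ToyOptimizer is a hypothetical stand-in I made up, and its minimize() does no real gradient work. It just shows a counter that starts at 0 and goes up by one per batch.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer; not a TensorFlow class."""

    def __init__(self):
        self.global_step = 0  # plays the role of tf.Variable(0)

    def minimize(self, batch):
        # A real optimizer would compute gradients and update weights here.
        # The toy only reproduces the side effect of counting the batch.
        self.global_step += 1


batches = [[1, 2], [3, 4], [5, 6]]  # three made-up batches
opt = ToyOptimizer()
for batch in batches:
    opt.minimize(batch)

print(opt.global_step)  # 3, one increment per batch
```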

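Now look again at (global_step / decay_steps) from the beginning. decay_steps is typically set to the number of iterations needed for one full pass over the training data, which is the total number of training examples divided by the number of examples per batch. In that setup the learning rate drops once per full pass over the data. So decay_steps is fixed at a specific value while global_step keeps changing. When staircase=True, (global_step / decay_steps) is truncated to an integer, so the learning rate becomes a step function.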
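As a small worked example of that division, with made-up numbers: the dataset size and batch size below are hypothetical, and rounding up is my own assumption for the case where the last batch is smaller than the rest.

```python
import math

num_train_examples = 50000  # hypothetical dataset size
batch_size = 128            # hypothetical batch size

# Steps needed for one full pass over the data. Rounding up assumes the
# final, smaller batch is still used for an update.
decay_steps = math.ceil(num_train_examples / batch_size)


def completed_passes(global_step, decay_steps):
    """Number of full passes over the data finished after global_step batches."""
    return global_step // decay_steps


print(decay_steps)                          # 391
print(completed_passes(1000, decay_steps))  # 2
```

With staircase=True, completed_passes is exactly the integer that ends up in the exponent.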

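Once you see global_step as the current iteration count that grows by one per batch, it is easy to see why the result is a step function. Suppose decay_steps=100 and decay_rate=0.96, so the learning rate is multiplied by 0.96 after every 100 training steps. (This also clarifies the decay speed: the larger decay_steps is, the more slowly the rate decays.) (global_step / decay_steps) is an integer only when global_step reaches a multiple of 100. Between multiples, the fractional value is truncated to the same integer, so the exponent does not change and neither does the learning rate: the rate stays flat as training proceeds. Only when global_step reaches the next multiple of 100 does the rate change, with a sudden vertical drop, which produces the staircase shape.

With staircase=False, every training step updates the learning rate, so the rate follows a smooth curve (the red curve in the figure).

To summarize, both modes compute decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps), where learning_rate is the initial rate. With staircase=True the rate is updated once every decay_steps steps, when the truncated (global_step / decay_steps) becomes 1, 2, 3, ..., so the exponent advances as 1, 2, 3, .... With staircase=False the rate is updated once per batch (one batch is one step), and the exponent advances as 1/100, 2/100, 3/100, ....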
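The two modes can be compared side by side in plain Python. This is a sketch under the assumption that staircase=True simply floors the exponent, as described above; decayed_lr is a hypothetical helper, not a TensorFlow function.

```python
import math


def decayed_lr(initial_lr, global_step, decay_steps, decay_rate, staircase):
    """Hypothetical helper restating the decay formula with an optional floor."""
    exponent = global_step / decay_steps
    if staircase:
        # Truncate to an integer so the rate only changes at multiples
        # of decay_steps.
        exponent = math.floor(exponent)
    return initial_lr * decay_rate ** exponent


for step in (0, 50, 99, 100, 150, 200):
    stair = decayed_lr(0.1, step, 100, 0.96, staircase=True)
    smooth = decayed_lr(0.1, step, 100, 0.96, staircase=False)
    print(step, round(stair, 6), round(smooth, 6))
```

In the staircase column the value stays the same for steps 0 through 99 and drops at step 100, while the smooth column is a little lower at every row after step 0.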

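[ Some articles say that with staircase=True the update is learning_rate = learning_rate * decay_rate ^ decay_steps, and with staircase=False it is learning_rate = learning_rate * decay_rate. On reflection that does not seem right: first, it ignores global_step entirely; second, it would apply an update on every step, and since the number of steps is usually large, the learning rate would fall far too fast. ]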
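A quick calculation shows how much faster that alternative reading would decay. The numbers are illustrative only (initial rate 0.1, decay_rate 0.96, decay_steps 100, 1000 training steps), and the variable names are my own.

```python
initial_lr = 0.1
decay_rate = 0.96
decay_steps = 100
total_steps = 1000

# Alternative reading: multiply the current rate by decay_rate on every step.
per_step_lr = initial_lr
for _ in range(total_steps):
    per_step_lr *= decay_rate

# The formula discussed in this post: exponent is global_step / decay_steps.
formula_lr = initial_lr * decay_rate ** (total_steps / decay_steps)

print(per_step_lr)  # vanishingly small, effectively zero
print(formula_lr)   # roughly 0.066
```

Under the per-step reading the rate is effectively gone after 1000 steps, whereas the formula with global_step / decay_steps has only applied ten decays by then.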

In TensorFlow, the learning rate is generated with the exponential decay function

learning_rate = tf.train.exponential_decay(0.1, global_step, 100, 0.96, staircase=True)

Because staircase=True is specified, the learning rate is multiplied by 0.96 after every 100 training steps.

Passing global_step to minimize() makes the optimizer increment global_step automatically, which in turn updates the learning rate

learning_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
