Understanding global_step in TensorFlow

Original

2020-06-16 02:27

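While studying recently, I kept running into the global_step parameter, but nothing explained what it is actually for. It shows up in optimizers and in the exponential decay function, and I didn't really understand it. In exponential decay, the function implements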

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

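The other parameters are easy to understand: learning_rate is the initial learning rate and decay_rate is the decay rate. decay_steps controls how fast the rate decays (I didn't get this at first either). Some blog posts explain global_step poorly or skip it entirely. After reading around and thinking it over, it finally clicked, so here is a detailed look at this parameter, which seems minor but is surprisingly hard to figure out.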

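The size of the learning rate matters a lot: if it is too large, training may oscillate; if it is too small, learning is slow and training takes a long time. Exponential decay is one way to get a suitable learning rate. The initial rate is fairly large, so training moves quickly toward the minimum, which addresses the slow-learning problem. As the number of training steps grows, the learning rate decays exponentially, which keeps an overly large rate from preventing convergence to the minimum. Eventually the rate levels off and training settles into a relatively stable state.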

TensorFlow provides an exponential decay function

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

It effectively implements the following

decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
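To make the formula concrete, here is a minimal plain-Python sketch of it. It is not TensorFlow's implementation: the function name simply mirrors the TensorFlow one, the sample values (initial rate 0.1, decay_rate 0.96, decay_steps 100) are chosen for illustration, and the staircase branch assumes the quotient is truncated to an integer, as discussed later in this article.

```python
import math


def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Plain-Python sketch of the decay formula (not TensorFlow's code)."""
    exponent = global_step / decay_steps
    if staircase:
        # Truncate the quotient so the rate only changes every decay_steps steps.
        exponent = math.floor(exponent)
    return learning_rate * decay_rate ** exponent


# Compare the smooth and staircase variants at a few illustrative steps.
for step in (0, 50, 100, 150, 200):
    smooth = exponential_decay(0.1, step, 100, 0.96)
    stepped = exponential_decay(0.1, step, 100, 0.96, staircase=True)
    print(step, round(smooth, 5), round(stepped, 5))
```

With staircase=True the printed value should stay the same between multiples of decay_steps, while the smooth variant shrinks a little at every step.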

I used to think global_step was the total number of iterations needed to finish all training epochs, but I often saw it initialized to 0

global_step = tf.Variable(0, trainable=False)

Looking around, I found this explanation of global_step on Stack Overflow

global_step refer to the number of batches seen by the graph. Everytime a batch is provided, the weights are updated in the direction that minimizes the loss. global_step just keeps track of the number of batches seen so far. When it is passed in the minimize() argument list, the variable is increased by one. Have a look at optimizer.minimize().

You can get the global_step value using tf.train.global_step().

The 0 is the initial value of the global step in this context.

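It turns out that once global_step is passed to minimize(), it is incremented by 1 after every batch that is trained, so it changes over time. global_step therefore works like a counter that records the current iteration number, and some operation can be triggered when it reaches a certain value A.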
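The counter behaviour can be sketched in plain Python. This is only an illustration of the idea, not how TensorFlow implements it: run_training and trigger_every are hypothetical names, and the weight update is left as a comment.

```python
def run_training(num_batches, trigger_every):
    """Count batches the way global_step does and record when an action fires."""
    global_step = 0  # initial value, like tf.Variable(0)
    triggered_at = []
    for _ in range(num_batches):
        # A real optimizer would update the weights for this batch here.
        global_step += 1  # one increment per batch, as minimize() does
        if global_step % trigger_every == 0:
            # Some operation runs whenever the counter reaches a chosen value.
            triggered_at.append(global_step)
    return global_step, triggered_at


final_step, triggered_at = run_training(num_batches=10, trigger_every=4)
print(final_step, triggered_at)  # 10 [4, 8]
```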

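Now look again at (global_step / decay_steps) from the beginning. decay_steps usually represents the number of iterations needed for one full pass over the training data, that is, the total number of training samples divided by the number of samples per batch. In that setup, the learning rate decreases once after every full pass over the training data. So decay_steps is set to a fixed value, while global_step changes. When staircase=True, (global_step / decay_steps) is truncated to an integer, so the learning rate becomes a step function.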
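As a small sketch of how decay_steps could be derived in that setup: the helper name steps_per_epoch and the dataset sizes are hypothetical, and rounding up for a final partial batch is an assumption that the text above does not address.

```python
import math


def steps_per_epoch(num_train_samples, batch_size):
    """Iterations needed for one full pass over the training data."""
    # Round up so that a final, smaller batch still counts as one iteration.
    return math.ceil(num_train_samples / batch_size)


# Hypothetical dataset: 55,000 samples in batches of 100.
decay_steps = steps_per_epoch(55000, 100)
print(decay_steps)  # 550
```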

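Once you see global_step as the current iteration count that grows with every batch, it is easy to see why the result is a step function. Suppose decay_steps=100 and decay_rate=0.96, so the learning rate is multiplied by 0.96 after every 100 training steps. (This also makes the decay speed clearer: the larger decay_steps is, the slower the decay.) (global_step / decay_steps) is an integer only when global_step is a multiple of 100. Between multiples of 100, the fractional quotient is truncated to the same integer, so the exponent does not change and neither does the learning rate: it stays flat as training proceeds. Only when global_step reaches the next multiple of 100 does the learning rate change, and it drops abruptly (a vertical drop), which produces the staircase shape. With staircase=False, every training step updates the learning rate, so it follows the red curve in the figure.

To summarize, in both cases the rate is decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps), where learning_rate is the initial rate. With staircase=True, the rate changes once every decay_steps steps, when the truncated quotient becomes 1, 2, 3, ... (so the exponent goes 1, 2, 3, ...). With staircase=False, the rate changes after every batch (one batch is one step), and the exponent goes 1/100, 2/100, 3/100, ...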
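As a worked example of the two modes, take decay_steps=100 and decay_rate=0.96 as above. The initial rate of 0.1 and the step value 250 are assumptions chosen only for illustration.

```latex
\begin{aligned}
\text{staircase=True:}\quad  & 0.1 \times 0.96^{\lfloor 250/100 \rfloor} = 0.1 \times 0.96^{2} = 0.09216 \\
\text{staircase=False:}\quad & 0.1 \times 0.96^{250/100} = 0.1 \times 0.96^{2.5} \approx 0.09030
\end{aligned}
```

In staircase mode the rate stays at 0.09216 for every step from 200 through 299, because the truncated exponent is 2 throughout that range.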

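[Some articles say that with staircase=True the update is learning_rate = learning_rate * decay_rate ^ decay_steps, and with staircase=False it is learning_rate = learning_rate * decay_rate. On reflection, that doesn't seem right. First, it ignores global_step. Second, it updates on every step, and since the number of steps is usually large, the learning rate would fall far too quickly.]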
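A rough numerical check of the second objection, in plain Python. The values (initial rate 0.1, decay_rate 0.96, decay_steps 100, 1000 training steps) are assumptions for illustration only.

```python
initial_rate = 0.1
decay_rate = 0.96
decay_steps = 100
total_steps = 1000

# Rule quoted from the other articles: multiply by decay_rate after every step.
per_step_rate = initial_rate
for _ in range(total_steps):
    per_step_rate *= decay_rate

# Formula discussed in this article: the exponent is global_step / decay_steps.
formula_rate = initial_rate * decay_rate ** (total_steps / decay_steps)

print(f"multiply every step: {per_step_rate:.3e}")
print(f"formula:             {formula_rate:.3e}")
```

After 1000 steps the per-step rule leaves a rate far below 1e-15, which is effectively zero, while the formula still gives roughly 0.066.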

In TensorFlow, the learning rate is generated with the exponential decay function

learning_rate = tf.train.exponential_decay(0.1, global_step, 100, 0.96, staircase=True)

Because staircase=True is set, the learning rate is multiplied by 0.96 after every 100 training steps.

Passing global_step to minimize() makes the optimizer increment global_step automatically, so the learning rate is updated accordingly

learning_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
