tensorflow notes 02: how the learning rate updates the weight w

What the learning rate is

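The learning rate is a hyperparameter that controls how far we move the network's weights along the gradient of the loss function. The lower the learning rate, the more slowly the loss changes. A low learning rate reduces the risk of stepping over a minimum, but it also means convergence takes longer, especially when training gets stuck on a plateau.
Conclusion: if the learning rate is too large, the parameters being optimized oscillate around the minimum and do not converge; if it is too small, they converge slowly. During training, each update moves the parameters in the direction that decreases the loss, i.e. along the negative gradient.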

The formula below expresses this relationship:

new_weight = existing_weight - learning_rate * gradient
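A minimal pure-Python sketch of this rule for a single weight. It assumes the example loss used later in this post, loss = (w+1)², whose gradient is 2(w+1); the helper names `grad_loss` and `sgd_step` are hypothetical and not part of TensorFlow.

```python
def grad_loss(w):
    """Gradient of loss = (w + 1)**2 with respect to w."""
    return 2.0 * (w + 1.0)


def sgd_step(w, learning_rate):
    """One update: new_weight = existing_weight - learning_rate * gradient."""
    return w - learning_rate * grad_loss(w)


w = 5.0
w_next = sgd_step(w, 0.1)  # 5 - 0.1 * 12, i.e. about 3.8
print(w_next)
```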

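How to adjust the learning rate

Our course covered two traditional methods:

1. A fixed learning rate. Formula 1: w_{n+1} = w_n - learning_rate * ∇loss(w_n), where ∇loss(w_n) is the gradient of the loss at w_n.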
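Applying Formula 1 to this post's loss (w+1)² gives a closed form that explains the three experiments below. This derivation is my own addition; η stands for learning_rate and w_0 = 5 is the starting weight from the code:

```latex
\begin{aligned}
\nabla \mathrm{loss}(w_n) &= 2\,(w_n + 1) \\
w_{n+1} &= w_n - 2\eta\,(w_n + 1) \\
w_{n+1} + 1 &= (1 - 2\eta)\,(w_n + 1) \\
w_n + 1 &= (1 - 2\eta)^n\,(w_0 + 1)
\end{aligned}
```

Starting from w_0 + 1 = 6, the factor 1 - 2η is 0.998 for η = 0.001 (very slow shrinkage), 0.8 for η = 0.1 (fast convergence to w = -1), and -1 for η = 1, so w + 1 flips between 6 and -6: w alternates between 5 and -7 and the loss stays at 36.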

# coding=utf-8
# Written for the TensorFlow 1.x graph API; tf.compat.v1 also lets it run under TensorFlow 2.x
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Parameter w for the forward pass, initialized to 5
w = tf.Variable(tf.constant(5, tf.float32))

# Loss function and backpropagation method
loss = tf.square(w + 1)
# Small learning rate: 0.001
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

# Create a session and initialize the variables
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # Print the initial weight
    print("the first w is:", sess.run(w))
    # Number of training steps
    steps = 40
    for i in range(steps):
        sess.run(train_step)
        w_val, loss_val = sess.run([w, loss])
        print("After %d steps' training, w is %f, loss is %f" % (i + 1, w_val, loss_val))

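With learning_rate = 0.001 (a small learning rate), after 40 steps the weight has moved only slightly from 5 toward -1, and the loss has fallen just as slowly.

Raising the learning rate to 0.1 and training for 40 steps, the results match the shape of the loss curve: w approaches -1, the loss approaches 0, and training converges.

Raising the learning rate further to 1, the weight oscillates between -7 and 5 and the loss stays at 36 without converging, which agrees with the conclusion above.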
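The three runs can be reproduced without TensorFlow. This is a plain-Python sketch, assuming the same setup as the code above (w starts at 5, 40 steps, loss = (w+1)²); `train` is a hypothetical helper name:

```python
def train(learning_rate, steps=40, w0=5.0):
    """Plain gradient descent on loss = (w + 1)**2; returns the history of w."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * 2.0 * (w + 1.0)  # gradient of (w + 1)**2 is 2(w + 1)
        history.append(w)
    return history


for lr in (0.001, 0.1, 1.0):
    w_final = train(lr)[-1]
    print(f"lr={lr}: w after 40 steps = {w_final:.6f}, loss = {(w_final + 1) ** 2:.6f}")
```

Because lr = 1 gives an update factor of exactly -1, w lands on 5 again after any even number of steps, which is why the loss never moves off 36.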

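2. An exponentially decaying learning rate: the learning rate is updated dynamically as the number of training steps grows.
Formula 2:

Learning_rate = Learning_rate_Base * Learning_rate_Decay ^ (global_step / Learning_rate_Step)

Here Learning_rate_Base is the initial learning rate, Learning_rate_Decay is the decay rate, global_step is the current training step, and Learning_rate_Step controls how often learning_rate is updated; it is usually the total number of samples in the dataset divided by the number of samples fed in per batch.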
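A quick numeric check of Formula 2 with the values used in the demo below (initial rate 0.1, decay 0.99, updated every step). This sketch evaluates the smooth (non-integer exponent) form; the function name `decayed_learning_rate` is hypothetical:

```python
def decayed_learning_rate(base, decay, global_step, decay_steps):
    """Formula 2: base * decay ** (global_step / decay_steps)."""
    return base * decay ** (global_step / decay_steps)


for step in (0, 40, 500):
    print(step, decayed_learning_rate(0.1, 0.99, step, 1))
```

So by step 40 the rate has fallen to about 0.067, and by step 500 to about 0.00066.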

In TensorFlow this is written as:

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)  # or staircase=False

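Note: when staircase is True, global_step / LEARNING_RATE_STEP is truncated to an integer, so the learning rate decays in discrete steps; when staircase is False, the learning rate follows a smooth, continuously decreasing curve.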
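To see the difference, here is a pure-Python mirror of the formula as TensorFlow documents it for `tf.train.exponential_decay`; it is not TensorFlow's own code. decay_steps = 10 and decay_rate = 0.5 are arbitrary values chosen to make the steps obvious:

```python
import math


def exponential_decay(base, global_step, decay_steps, decay_rate, staircase=False):
    """Mirror of the exponential-decay formula; argument order follows tf.train.exponential_decay."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # integer exponent gives a stepwise decay
    return base * decay_rate ** exponent


for step in range(0, 25, 5):
    smooth = exponential_decay(0.1, step, 10, 0.5)
    stair = exponential_decay(0.1, step, 10, 0.5, staircase=True)
    print(f"step {step:2d}: smooth={smooth:.6f}  staircase={stair:.6f}")
```

With staircase=True the rate stays at 0.1 for steps 0 to 9, drops to 0.05 for steps 10 to 19, and so on; with staircase=False it already falls between those boundaries (about 0.0707 at step 5).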

The code below applies the decay to the same loss = (w+1)² example:

# coding=utf-8
# TensorFlow 1.x graph API via tf.compat.v1 (also runs under TensorFlow 2.x)
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Exponentially decaying learning rate
Learning_rate_Base = 0.1    # initial learning rate
Learning_rate_Step = 1      # update the learning rate every this many steps; usually total samples / BATCH_SIZE
Learning_rate_Decay = 0.99  # decay rate of the learning rate

# Counter of how many batches have been run; starts at 0 and is excluded from training
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(Learning_rate_Base, global_step, Learning_rate_Step, Learning_rate_Decay, staircase=True)

# Parameter to optimize, initialized to 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))

# Loss and backpropagation; passing global_step makes minimize() increment it after each update
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

# Create a session and initialize the variables
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # Print the initial weight
    print("the first w is:", sess.run(w))
    # Number of training steps
    steps = 500
    for i in range(steps):
        sess.run(train_step)
        w_val, loss_val, learning_rate_val, global_step_val = sess.run([w, loss, learning_rate, global_step])
        print("After %d steps' training, global_step is %d, w is %f, loss is %f, learning_rate is %f"
              % (i + 1, global_step_val, w_val, loss_val, learning_rate_val))

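With 40 training steps, the learning rate adjusts itself, decreasing gradually, and within those 40 steps the weight essentially reaches -1 while the loss approaches 0.

With more steps (steps = 500), the learning rate becomes very small, so the updates turn into fine adjustments, and w and the loss come extremely close to their optimal values of -1 and 0.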
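The same behavior can be reproduced in plain Python. This sketch assumes the staircase schedule from the code above (base 0.1, decay 0.99, Learning_rate_Step = 1) and assumes that each update uses the rate computed from the step counter before it is incremented; `train_with_decay` is a hypothetical name:

```python
def train_with_decay(steps, w0=5.0, base=0.1, decay=0.99, decay_steps=1):
    """Gradient descent on loss = (w + 1)**2 with a staircase-decayed learning rate."""
    w = w0
    lr = base
    for global_step in range(steps):
        # Assumption: the rate is read before the counter increments
        lr = base * decay ** (global_step // decay_steps)
        w = w - lr * 2.0 * (w + 1.0)
    return w, (w + 1.0) ** 2, lr


for steps in (40, 500):
    w, loss, lr = train_with_decay(steps)
    print(f"{steps} steps: w={w:.6f}, loss={loss:.2e}, last learning_rate={lr:.6f}")
```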

A better approach

Here I'm citing an article to come back to once I know more:
Choosing a better learning rate
