Concept

The moving average discussed here is the exponential moving average (EMA), also known as an exponentially weighted average.

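Default case: the value of a variable v at time t is simply the latest observation:
v_t = Theta_t
With EMA:
v_t = Beta * v_{t-1} + (1 - Beta) * Theta_t
If Beta = 0, the EMA formula reduces exactly to the default case.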

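The intuition behind the formula is simple: the old average is multiplied by a large coefficient to keep it stable, and the new value is multiplied by a small coefficient to limit its influence. Overall it is a stabilizing mechanism similar to Momentum, except that it "suppresses" new values even more than Momentum does.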

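With a typical value such as Beta = 0.9, the concrete meaning is:
v_t ~= the average of the last 1/(1 - Beta) data points
That is, roughly the average of the previous ten data points. (It is not an exact average: each new value enters with a coefficient of 0.1, and every older value is scaled down by a further factor of 0.9 at each step.)
1/(1 - Beta) = 10 for Beta = 0.9
1/(1 - Beta) = 50 for Beta = 0.98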
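
To make the "roughly the last 1/(1 - Beta) points" reading concrete, here is a minimal sketch in plain Python. The helper name ema_weights is made up for this illustration. It expands the recursion into the weight each past sample receives and sums the weights that fall inside the nominal window.

```python
def ema_weights(beta, n_terms):
    """Weight given to the k-th most recent sample (k = 0 is the newest)."""
    return [(1 - beta) * beta ** k for k in range(n_terms)]

for beta in (0.9, 0.98):
    window = round(1 / (1 - beta))            # nominal window: 10 and 50
    mass_in_window = sum(ema_weights(beta, window))
    print(beta, window, round(mass_in_window, 3))
```

The sum inside the window equals 1 - Beta^window, which is a bit under two thirds in both cases. The remaining weight sits on older samples, which is why the "average of the last ten points" description is only approximate.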

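But what do "new" and "old" mean here, and where does the new Theta come from? There are really two sets of values. Take a maintained variable w as an example: w is updated in the normal way, while its EMA is maintained separately. The overall flow is that w_{t-1} is updated to w_t, and the new w_t is then used to update the EMA. More precisely, the value of w is read once each time you run the op returned by ema.apply. In principle w may not have changed between two such reads, but normally the two are kept in step (see the examples).

On whether the EMA tracks variables or a data set: in practice it is mainly used to track W and b. The examples below show the typical usage.

Intuitively it resembles momentum, but momentum is used in gradient descent, whereas the moving average is mainly applied to variables.

It can also be used to smooth data and reduce noise and outliers. Like Momentum, it has inertia: if Beta is too large, the whole curve lags and drifts away from the real data. It also shares one trait with Adagrad: it needs very little extra memory, because only a single value is maintained and the full history never has to be kept. (Which stage of deep learning does this figure correspond to, though? During training the EMA is not used for this purpose. It is still indirectly related to training: if the data shifts during training, the parameters and the model naturally shift with it.)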
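
The lag caused by a large Beta can be seen with a small sketch in plain Python. The function name ema_series and the step-shaped test signal are assumptions chosen for this illustration, not something from the original post.

```python
def ema_series(values, beta, start=0.0):
    """Return the running EMA of `values`, starting from `start`."""
    out, v = [], start
    for theta in values:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return out

# A step signal: 0 for 5 samples, then 1 for 20 samples.
signal = [0.0] * 5 + [1.0] * 20

fast = ema_series(signal, beta=0.5)
slow = ema_series(signal, beta=0.98)
print("beta=0.5  ->", round(fast[-1], 3))
print("beta=0.98 ->", round(slow[-1], 3))
```

After 20 samples at the new level, the Beta = 0.5 average has essentially reached 1, while the Beta = 0.98 average is still at 1 - 0.98^20, roughly a third of the way there. Only one running value per series is stored in either case.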

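Compensating for a weakness: too little data accumulated at the start

This is why the global_step parameter exists (global_step does not drive the EMA update; it is only a compensation mechanism).
Because of the inertia there is lag, and the lag needs to be corrected.
In the figure, the purple line lags even while it is rising, because no data has been accumulated yet. The inertia is still large (high Beta), so the curve tends to stay near 0 and cannot climb.
The decay that is actually used is the smaller of two candidates:
Beta = min(decay, (1 + num_updates) / (10 + num_updates))
Early on the second term is smaller; later the first term is smaller. (At the extremes, the second term rises from 1/10 toward 1.)
So with decay = 0.98 and num_updates = 5, the second term is 6/15 = 0.4, and 0.4 is used instead of 0.98.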
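
The rule above can be sketched in plain Python as follows. The function name effective_decay is made up for this illustration; it is not a TensorFlow API, only a restatement of the formula quoted above.

```python
def effective_decay(decay, num_updates=None):
    """Decay actually applied, following the min(...) rule quoted above."""
    if num_updates is None:
        return decay                      # no step counter given: use decay as-is
    return min(decay, (1 + num_updates) / (10 + num_updates))

for n in (0, 5, 100, 440, 1000):
    print(n, round(effective_decay(0.98, n), 4))
```

For decay = 0.98 the second term only catches up at num_updates = 440 (441/450 = 0.98), so the step counter influences the average for the first few hundred updates and has no effect after that.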

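Role in deep learning training

Put plainly, applying EMA to W and b in TF is meant to keep abnormal data encountered during training from hurting the result, so that W and b stay relatively stable.

Shadow variables: the word "shadow" conveys more than inertia and trailing behind. At test time it is the shadow variables that are used, in place of the original variables. Understanding this is important, and the same idea shows up in BatchNormalization and elsewhere; without it you can easily use them incorrectly.

So this seems especially useful when the data set is small, the data is unstable, or batch_size is small.
Gradient descent that uses all of the data in every iteration has little need for EMA: unless learning_rate is large, the update direction cannot be off (so depending on learning_rate it may still help a little). In practice, though, mini-batch training is far more common, so EMA is worth using.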

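Implementation

Typical steps

In TensorFlow, ExponentialMovingAverage() can be given two arguments: the decay rate (decay) and the number of iterations so far (step). These correspond to our Beta and num_updates, so implementing the moving average model goes as follows:
1. Define the training step counter, step.
2. Create the moving average object.
3. Tell it which variables should use the moving average (w and b).
4. Run the op that updates the exponentially weighted average (the shadow value) of each of those variables.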

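# Define the training step counter; trainable=False keeps it out of the moving average
global_step = tf.Variable(0, trainable=False)

# Create the moving average object from the decay rate and the step counter.
# Passing global_step lets the average move faster early in training, offsetting the lag caused by inertia.
variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,
                                                      global_step)
# tf.trainable_variables() returns every trainable variable; apply the moving average to all of them.
# global_step is not a model parameter, but it would be picked up here too unless trainable=False is set.
variables_averages_op = variable_averages.apply(tf.trainable_variables())

# Bundle the ops: the parameter update from backpropagation, plus the update of each parameter's moving average.
# With the code below, a single sess.run(train_op) runs both.
# Note: listing both ops here does not by itself order them relative to each other.
with tf.control_dependencies([train_step, variables_averages_op]):
    train_op = tf.no_op(name="train")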

An example of an EMA shadow variable

import tensorflow as tf
w1 = tf.Variable(0, dtype=tf.float32)
global_step = tf.Variable(0,dtype=tf.float32,trainable=False)
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
ema_op = ema.apply([w1])  # list of variables to average; here w1 is given explicitly
# Change w1 by hand (to 1, then to 10) and let the EMA chase its value
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    print('init w:',sess.run([w1,ema.average(w1)]))  # ema.average(w1) returns the moving average of w1, i.e. its shadow
    sess.run(tf.assign(w1,1))  # set w1 by hand
    sess.run(tf.assign(global_step, 1))
    sess.run(ema_op)  # one EMA update
    print('after an ema op')
    print('w:',sess.run([w1,ema.average(w1)]))
    sess.run(ema_op)  # one EMA update
    print('after an ema op')
    print('w:',sess.run([w1,ema.average(w1)]))  # global_step has not changed, which does not stop the EMA from updating
    sess.run(ema_op)  # one EMA update
    print('after an ema op')
    print('w:',sess.run([w1,ema.average(w1)]))
    print('assign global_step:')
    # Pretend 100 iterations have passed and w1 has become 10 (the EMA was not actually updated during those 100 steps)
    sess.run(tf.assign(global_step, 100))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print('after 100 ema ops')
    print('w:',sess.run([w1,ema.average(w1)]))
    sess.run(ema_op)
    
    # Keep updating the shadow with the same w1 = 10 so that it approaches w1
    for i in range(100):
        sess.run(ema_op)
        if i % 10 == 0:
            print('w:',sess.run([w1,ema.average(w1)]))

init w: [0.0, 0.0]
after an ema op
w: [1.0, 0.8181818]
after an ema op
w: [1.0, 0.96694213]
after an ema op
w: [1.0, 0.99398947]
assign global_step:
after 100 ema ops
w: [10.0, 1.7308447]
w: [10.0, 3.0286236]
w: [10.0, 7.031032]
w: [10.0, 8.735577]
w: [10.0, 9.461507]
w: [10.0, 9.770666]
w: [10.0, 9.90233]
w: [10.0, 9.958405]
w: [10.0, 9.982285]
w: [10.0, 9.9924555]
w: [10.0, 9.996786]
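
The first few printed shadow values can be checked by hand with the decay rule min(decay, (1 + num_updates) / (10 + num_updates)). The following is a plain-Python sketch, not TensorFlow code: the helper names are made up for this illustration, and the shadow is assumed to start at the variable's initial value, 0.

```python
def effective_decay(decay, num_updates):
    return min(decay, (1 + num_updates) / (10 + num_updates))

def ema_update(shadow, value, decay, num_updates):
    d = effective_decay(decay, num_updates)
    return d * shadow + (1 - d) * value

shadow = 0.0
trace = []
for _ in range(3):                       # three updates with w1 = 1, global_step = 1
    shadow = ema_update(shadow, 1.0, 0.99, 1)
    trace.append(shadow)
shadow = ema_update(shadow, 10.0, 0.99, 100)   # w1 = 10, global_step = 100
trace.append(shadow)
print(trace)
```

With global_step = 1 the decay in effect is 2/11 rather than 0.99, which is why the shadow jumps to about 0.818 on the very first update. At global_step = 100 the decay is 101/110, still below 0.99. The four values agree with 0.8181818, 0.96694213, 0.99398947 and 1.7308447 in the output above, up to float32 rounding.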

Closer to a real scenario: let w1 change

# The same example, modified so that w1 changes dynamically while the EMA chases it.
import tensorflow as tf
w1 = tf.Variable(0, dtype=tf.float32)
global_step = tf.Variable(0,dtype=tf.float32,trainable=False)  # will not be averaged by the EMA
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)

ema_op = ema.apply(tf.trainable_variables())
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    print('init w:',sess.run([w1,ema.average(w1)]))  # ema.average(w1) returns the moving average of w1, i.e. its shadow
    sess.run(tf.assign(w1,1))  # set w1 by hand
    sess.run(tf.assign(global_step, 1))
    sess.run(ema_op)  # one EMA update
    
    print('after an ema op')
    print('w:',sess.run([w1,ema.average(w1)]))
    # Pretend 100 iterations have passed and w1 has become 10 (the EMA was not actually updated during those 100 steps)
    sess.run(tf.assign(global_step, 100))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print('after 100 ema ops')
    print('w:',sess.run([w1,ema.average(w1)]))
    sess.run(ema_op)
    
    # Keep updating the shadow so that it chases w1, while w1 itself keeps changing
    for i in range(100):
        sess.run(tf.assign_add(w1,1))
        sess.run(ema_op)
        if i % 10 == 0:
            print('w:',sess.run([w1,ema.average(w1)]))
            print('global_step:',sess.run(global_step))
#             print('global_step ema:',sess.run([global_step,ema.average(global_step)]))  # EMA of global_step

init w: [0.0, 0.0]
after an ema op
w: [1.0, 0.8181818]
after 100 ema ops
w: [10.0, 1.5694213]
w: [11.0, 2.9743524]
global_step: 100.0
w: [21.0, 11.1391325]
global_step: 100.0
w: [31.0, 20.35755]
global_step: 100.0
w: [41.0, 30.024689]
global_step: 100.0
w: [51.0, 39.882935]
global_step: 100.0
w: [61.0, 49.822563]
global_step: 100.0
w: [71.0, 59.79685]
global_step: 100.0
w: [81.0, 69.7859]
global_step: 100.0
w: [91.0, 79.78124]
global_step: 100.0
w: [101.0, 89.77926]
global_step: 100.0
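
In the output above the gap between w1 and its shadow settles at a little over 11. A short derivation suggests why: if w grows by 1 per update and the decay is Beta, the lag L obeys L = Beta * (L + 1), so it converges to Beta / (1 - Beta). The plain-Python sketch below checks this under the assumption that the decay in effect is 101/110, which is what the min rule gives for decay = 0.99 at global_step = 100.

```python
beta = 101 / 110          # min(0.99, (1 + 100) / (10 + 100))
w, shadow = 0.0, 0.0
for _ in range(500):
    w += 1.0                                  # w increases by 1 per step
    shadow = beta * shadow + (1 - beta) * w   # EMA update
lag = w - shadow
print(lag, beta / (1 - beta))
```

Both numbers come out at about 11.22 (101/9), which is in line with the last printed pair, 101.0 versus 89.77926. A shadow that trails a steadily moving variable therefore never catches up; it follows at a fixed distance set by Beta.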

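Example 2: the trainable setting of global_step

If you do not mark global_step as non-trainable, and the EMA is applied to all trainable variables, then an EMA of global_step is maintained as well and changes over time (the global_step variable itself is not affected).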

import tensorflow as tf

w1 = tf.Variable(0, dtype=tf.float32)
global_step = tf.Variable(0,dtype=tf.float32,trainable=True)  # will be averaged by the EMA
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
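# ema_op = ema.apply([w1])  # list of variables to average; here w1 could be given explicitly
ema_op = ema.apply(tf.trainable_variables())  # note: in this case global_step gets averaged too, so it should be created with trainable=False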
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    print('init w:',sess.run([w1,ema.average(w1)]))  # ema.average(w1) returns the moving average of w1, i.e. its shadow
    sess.run(tf.assign(w1,1))  # set w1 by hand
    sess.run(tf.assign(global_step, 1))
    sess.run(ema_op)  # one EMA update
    
    print('after an ema op')
    print('w:',sess.run([w1,ema.average(w1)]))
    # Pretend 100 iterations have passed and w1 has become 10 (the EMA was not actually updated during those 100 steps)
    sess.run(tf.assign(global_step, 100))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print('after 100 ema ops')
    print('w:',sess.run([w1,ema.average(w1)]))
    sess.run(ema_op)
    
    # Keep updating the shadow so that it chases w1, while w1 itself keeps changing
    for i in range(100):
        sess.run(tf.assign_add(w1,1))
        sess.run(ema_op)
        if i % 10 == 0:
            print('w:',sess.run([w1,ema.average(w1)]))
            print('global_step and ema:',sess.run([global_step,ema.average(global_step)]))

init w: [0.0, 0.0]
after an ema op
w: [1.0, 0.8181818]
after 100 ema ops
w: [10.0, 1.5694213]
w: [11.0, 2.9743524]
global_step and ema: [100.0, 23.225294]
w: [21.0, 11.1391325]
global_step and ema: [100.0, 67.30321]
w: [31.0, 20.35755]
global_step and ema: [100.0, 86.075096]
w: [41.0, 30.024689]
global_step and ema: [100.0, 94.069664]
w: [51.0, 39.882935]
global_step and ema: [100.0, 97.47439]
w: [61.0, 49.822563]
global_step and ema: [100.0, 98.924385]
w: [71.0, 59.79685]
global_step and ema: [100.0, 99.541916]
w: [81.0, 69.7859]
global_step and ema: [100.0, 99.80492]
w: [91.0, 79.78124]
global_step and ema: [100.0, 99.916916]
w: [101.0, 89.77926]
global_step and ema: [100.0, 99.96461]

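Finally, how do you use the shadow variables at test time?

While looking into another problem I ran a test: by default a variable is restored from its own saved value, so how do the shadow values come into play?
To inspect an EMA value, ema.average(w1) is enough. But how do you restore the shadow values, and how do you load them into the model?
Does restore do it automatically? No: you replace them explicitly through a name mapping.

When var_list is not specified, the saved 'weights' still maps to 'weights'. When a var_list mapping is given (var_list accepts either a list or a dict), the saved 'weights/ema' entry (in the examples below its full name is 'my_scope/weights/ExponentialMovingAverage') takes the place of 'weights', because what is passed in is a dict, that is, a hand-written mapping. This also means that a model whose variables were renamed is not necessarily unusable: the names can still be matched up by hand. It is just tedious, so I will not do it here.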
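
The idea of restoring through a name mapping can be pictured without TensorFlow. In the sketch below a checkpoint is modelled as a plain dict from saved names to values. The restore function is a hypothetical helper written for this illustration, not the Saver API, and the stored numbers are made up.

```python
# A checkpoint modelled as a plain dict: saved name -> saved value.
checkpoint = {
    "my_scope/weights": [10.5, 20.5, 30.5],
    "my_scope/weights/ExponentialMovingAverage": [10.35, 20.35, 30.35],
}

def restore(checkpoint, name_map):
    """Hypothetical helper: fill each model variable from the saved name mapped to it."""
    return {var: checkpoint[saved_name] for saved_name, var in name_map.items()}

# Default behaviour: each variable is looked up under its own name.
plain = restore(checkpoint, {"my_scope/weights": "W"})

# Explicit mapping: the shadow value is loaded into W instead.
shadow = restore(checkpoint, {"my_scope/weights/ExponentialMovingAverage": "W"})

print(plain["W"])
print(shadow["W"])
```

The model variable W is the same in both cases; only the saved entry it is filled from differs. That is the whole effect of passing a dict as var_list.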

Simulating training and saving the model

import tensorflow as tf
# Set low here so the effect is easy to observe: the larger the value, the greater the inertia and the more slowly the average grows compared with the actual W
# In practice use 0.9, 0.99, 0.999, etc.
MOVING_AVERAGE_DECAY = 0.6
# With a decay of 0.1:
# [array([ 10.49953651,  20.49953651,  30.49953651], dtype=float32), 
#  array([ 10.38866425,  20.38866425,  30.38866425], dtype=float32)]
# With a decay of 0.6:
#  [array([ 10.49953651,  20.49953651,  30.49953651], dtype=float32), 
#   array([ 10.34986782,  20.34986877,  30.34986877], dtype=float32)]


global_step = tf.Variable(0, trainable=False,name='global_step')
with tf.variable_scope("my_scope"):
    W = tf.Variable([10,20,30], dtype = tf.float32, name='weights')

y = tf.constant([20,30,40],tf.float32)
loss = tf.reduce_mean(tf.square(W-y))
train_step = tf.train.AdamOptimizer(0.1).minimize(loss,global_step=global_step)
    
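ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)  # create the EMA object up front
# ema_op = ema.apply(tf.trainable_variables())
ema_op = ema.apply([W])

# Group both ops under one train_op; this does not order train_step and ema_op relative to each other
with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')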

saver = tf.train.Saver()
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print('init,before assign:',sess.run(W))
    for i in range(5):
        sess.run(train_op)
        print('global_step is ',sess.run(global_step))
#         sess.run(train_step)
#         sess.run(ema_op)
        print('W and ema.average:',sess.run([W,ema.average(W)]))

    save_path = saver.save(sess, "my_net/save_net_3.ckpt")
    print('Save to path:',save_path)

init,before assign: [10. 20. 30.]
global_step is  1
W and ema.average: [array([10.1, 20.1, 30.1], dtype=float32), array([10., 20., 30.], dtype=float32)]
global_step is  2
W and ema.average: [array([10.199972, 20.199972, 30.199972], dtype=float32), array([10.081819, 20.081818, 30.081818], dtype=float32)]
global_step is  3
W and ema.average: [array([10.299898, 20.299898, 30.299898], dtype=float32), array([10.170434, 20.170433, 30.170433], dtype=float32)]
global_step is  4
W and ema.average: [array([10.399759, 20.39976 , 30.39976 ], dtype=float32), array([10.260063, 20.260063, 30.260063], dtype=float32)]
global_step is  5
W and ema.average: [array([10.4995365, 20.499537 , 30.499537 ], dtype=float32), array([10.349868, 20.349869, 30.349869], dtype=float32)]
Save to path: my_net/save_net_3.ckpt
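
The printed averages can be reproduced in plain Python under one assumption, which is an inference from the numbers rather than something stated in this post: in each sess.run(train_op), the EMA update appears to have read W and global_step as they were before the optimizer step of that same call. The W values below are copied from the output above.

```python
DECAY = 0.6
# W as seen by the EMA update in calls 1..5 (the value before that call's optimizer step)
W_seen = [10.0, 10.1, 10.199972, 10.299898, 10.399759]

shadow = 10.0            # the shadow starts at W's initial value
history = []
for step, w in enumerate(W_seen):            # global_step seen: 0, 1, 2, 3, 4
    d = min(DECAY, (1 + step) / (10 + step))
    shadow = d * shadow + (1 - d) * w
    history.append(shadow)
print(history)
```

The results agree with the first component of each printed average (10.0, 10.081819, 10.170434, 10.260063, 10.349868) up to float32 rounding. Within these five steps the second term of the min rule never reaches 0.6, so the configured decay is not yet the one being applied.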

The wrong way to load the model

# Without var_list, the ordinary W is still restored from the ordinary saved W.
# saved: [array([ 10.49953651,  20.49953651,  30.49953651], dtype=float32), array([ 10.34986782,  20.34986877,  30.34986877]
# loaded: [ 10.49953651  20.49953651  30.49953651]
import tensorflow as tf
import numpy as np

MOVING_AVERAGE_DECAY = 0.6#0.99
global_step = tf.Variable(0, trainable=False,name='global_step')

with tf.variable_scope("my_scope"):
    W2 = tf.Variable([0,0,0], dtype=tf.float32,name='weights')

ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
ema_restore = ema.variables_to_restore()  # maps the saved EMA name to W2 and also includes global_step; it only defines the name binding and has no effect unless it is passed to a Saver

# loader = tf.train.Saver(ema_restore)
loader = tf.train.Saver()
load_path = "my_net/save_net_3.ckpt"
with tf.Session() as sess:
    tf.global_variables_initializer().run()

    loader.restore(sess, load_path)

    print("W2:",sess.run(W2))
#     print("W2:",sess.run([W2,ema.average(W2)]))
#     print('ema W is :',sess.run(ema_val))
    print(sess.run(ema_restore['my_scope/weights/ExponentialMovingAverage']))
    print('after run restore')
    print(tf.global_variables())

This loads the original, non-EMA variable, so maintaining the EMA was wasted effort.

INFO:tensorflow:Restoring parameters from my_net/save_net_3.ckpt
W2: [10.4995365 20.499537  30.499537 ]
[10.4995365 20.499537  30.499537 ]
after run restore
[<tf.Variable 'global_step:0' shape=() dtype=int32_ref>, <tf.Variable 'my_scope/weights:0' shape=(3,) dtype=float32_ref>]

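The correct way to load the model

# With var_list specified, the ordinary W is filled from the saved EMA value, which replaces it.
# saved: [array([ 10.49953651,  20.49953651,  30.49953651], dtype=float32), array([ 10.34986782,  20.34986877,  30.34986877]
# loaded: [ 10.34986782  20.34986877  30.34986877]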
import tensorflow as tf
import numpy as np

MOVING_AVERAGE_DECAY = 0.6#0.99
global_step = tf.Variable(0, trainable=False,name='global_step')

with tf.variable_scope("my_scope"):
    W2 = tf.Variable([0,0,0], dtype=tf.float32,name='weights')

ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
ema_restore = ema.variables_to_restore()  # maps the saved EMA name to W2 and also includes global_step; it only defines the name binding

print('ema_restore:',ema_restore)  # a mapping from saved EMA names to the ordinary variables

loader = tf.train.Saver(ema_restore)
load_path = "my_net/save_net_3.ckpt"
with tf.Session() as sess:
    tf.global_variables_initializer().run()

    loader.restore(sess, load_path)

    print("W2:",sess.run(W2))
#     print("W2:",sess.run([W2,ema.average(W2)]))
#     print('ema W is :',sess.run(ema_val))
    print(sess.run(ema_restore['my_scope/weights/ExponentialMovingAverage']))
    print('after run restore')
    print(tf.global_variables())

This time the EMA-maintained values are used correctly.

ema_restore: {'my_scope/weights/ExponentialMovingAverage': <tf.Variable 'my_scope/weights:0' shape=(3,) dtype=float32_ref>, 'global_step': <tf.Variable 'global_step:0' shape=() dtype=int32_ref>}
INFO:tensorflow:Restoring parameters from my_net/save_net_3.ckpt
W2: [10.349868 20.349869 30.349869]
[10.349868 20.349869 30.349869]
after run restore
[<tf.Variable 'global_step:0' shape=() dtype=int32_ref>, <tf.Variable 'my_scope/weights:0' shape=(3,) dtype=float32_ref>]

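A general way to write the mapping by hand

{'var_name': tensor}
By specifying the mapping by hand, the example below swaps which saved value each of two variables is restored from.
The var_list argument of tf.train.Saver supports two forms, a list and a dict. In the dict form, each key is the variable name stored in the checkpoint file.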

import tensorflow as tf
import numpy as np

with tf.variable_scope("my_scope"):
    W2 = tf.Variable([1,2,3], dtype=tf.float32,name='weights')
    b2 = tf.Variable([2,3,4], dtype=tf.float32,name='biases')

saver = tf.train.Saver()
load_path = "my_net/save_net_4.ckpt"
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    saver.save(sess, load_path)
    print("W2:",sess.run(W2))
    print("b2:",sess.run(b2))

import tensorflow as tf
import numpy as np

with tf.variable_scope("my_scope"):  # swap them around
    b2 = tf.Variable([0,0,0], dtype=tf.float32,name='biases')  # going by name, b2 would normally be matched with 'biases'
    W2 = tf.Variable([0,0,0], dtype=tf.float32,name='weightss')  # a different name does not matter either
    W3 = tf.Variable([0,0,0], dtype=tf.float32,name='weightsss')

# Also, one saved variable cannot be restored into two tensors this way: a repeated dict key overwrites the earlier entry
restore_map = {'my_scope/weights':b2,'my_scope/biases':W2#,'my_scope/biases':W3
              }
loader = tf.train.Saver(restore_map)
load_path = "my_net/save_net_4.ckpt"
with tf.Session() as sess:
    tf.global_variables_initializer().run()

    loader.restore(sess, load_path)
    print("W2:",sess.run(W2))
    print("b2:",sess.run(b2))
    print("W3:",sess.run(W3))
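
The remark about key collisions in restore_map is ordinary Python dict behaviour and can be checked without TensorFlow. In this small sketch, plain strings stand in for the tensors b2, W2 and W3.

```python
# Plain strings stand in for the tensors b2, W2 and W3.
restore_map = {
    "my_scope/weights": "b2",
    "my_scope/biases": "W2",
    "my_scope/biases": "W3",   # same key again: this entry replaces the previous one
}
print(restore_map)
print(len(restore_map))
```

Only two entries survive, and 'my_scope/biases' ends up pointing at the last value given. So a single dict cannot express "restore this saved variable into both W2 and W3"; uncommenting the third entry in the code above would silently drop W2 from the mapping.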
