TensorFlow Neural Network Optimization: Loss Function (loss) and learning_rate

Learning TensorFlow in Practice

Loss Function (loss)

Table of Contents

Loss function (loss)

1. Activation functions (activation function)

2. NN complexity: usually measured by the number of layers and the number of parameters

NN optimization goal: minimize loss

Cross entropy (CE)

The softmax function

Learning rate (learning_rate): the step size of each parameter update

Exponentially decaying learning rate


1. Activation functions (activation function)

Introducing an activation function keeps the model from being a pure linear combination of XW, which improves its expressive power and makes it more discriminative.

1. ReLU activation, written tf.nn.relu()

2. Sigmoid activation, written tf.nn.sigmoid()

3. Tanh activation, written tf.nn.tanh()
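
A minimal sketch (the shapes and names here are assumptions for illustration, not from the original) of wrapping a layer's linear output in an activation:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 2))
w = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
b = tf.Variable(tf.zeros([3]))
# the activation is applied to the linear term xW + b;
# tf.nn.sigmoid or tf.nn.tanh could be substituted for tf.nn.relu
a = tf.nn.relu(tf.matmul(x, w) + b)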

2. NN complexity: usually measured by the number of layers and the number of parameters

When counting a network's layers, only layers that perform computation count, so the input layer is not included.

Number of layers = number of hidden layers + 1 output layer

Total parameters = total W + total b

For the network in the figure above (3 inputs, one hidden layer of 4 neurons, 2 outputs): 3*4+4 + 4*2+2 = 26 parameters
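
As a rough check, a small sketch (the variable shapes below are assumptions matching the 3-4-2 example) that counts parameters from the trainable-variable shapes:

import numpy as np
import tensorflow as tf

w1 = tf.Variable(tf.random_normal([3, 4]))  # 3*4 = 12 weights
b1 = tf.Variable(tf.zeros([4]))             # 4 biases
w2 = tf.Variable(tf.random_normal([4, 2]))  # 4*2 = 8 weights
b2 = tf.Variable(tf.zeros([2]))             # 2 biases

# total parameters = product of each trainable variable's shape, summed
total = sum(int(np.prod(v.get_shape().as_list())) for v in tf.trainable_variables())
print(total)  # 26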

NN optimization goal: minimize loss

Common ways to compute the loss:

  • MSE (Mean Squared Error)
  • CE (Cross Entropy)
  • Custom (self-defined) loss (see the sketch after this list)
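
For instance, a custom loss can weight over-prediction and under-prediction differently. A hedged sketch (COST and PROFIT are illustrative constants, not from the original):

import tensorflow as tf

y_ = tf.placeholder(tf.float32, shape=(None, 1))  # labels
y = tf.placeholder(tf.float32, shape=(None, 1))   # predictions (stand-in for the network output)
COST, PROFIT = 9, 1  # assumed per-unit penalties for over- and under-prediction
# charge COST per unit when y > y_, otherwise PROFIT per unit
loss_custom = tf.reduce_sum(tf.where(tf.greater(y, y_),
                                     COST * (y - y_),
                                     PROFIT * (y_ - y)))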

——————————————————————————————————————————

Construct a synthetic dataset X, Y_: each row of X contains x1 and x2, and y_ = x1 + x2 plus noise in [-0.05, 0.05). Fit a function that predicts y.

import tensorflow as tf
import numpy as np
BATCH_SIZE = 8  # number of examples fed to the network per training step
seed = 23455

rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in X]  # noise in [-0.05, 0.05)

Define the network inputs, outputs, and the forward-propagation graph:

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))

w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)

loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

Create a session and train for STEPS rounds:

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        # cycle through the 32 samples in batches of BATCH_SIZE
        start = (i * BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 431 == 0:
            print(start, end)
            print("after", i)
            print(sess.run(w1))
    print("final", sess.run(w1))

——————————————————————————————————————

Cross entropy (CE)

Cross entropy measures the distance between two probability distributions. With y_ as the label distribution and y as the predicted distribution:

H(y_, y) = -\sum y_ * \log y

e.g., given the label y_ = (1, 0) and two predictions y1 = (0.6, 0.4) and y2 = (0.8, 0.2), which prediction is closer to the standard answer?
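
Working this out with the cross-entropy formula above:

H(y_, y1) = -(1 * log 0.6 + 0 * log 0.4) ≈ 0.511
H(y_, y2) = -(1 * log 0.8 + 0 * log 0.2) ≈ 0.223

H(y_, y2) is smaller, so y2 is closer to the standard answer.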

ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))
# clip y: values below 1e-12 become 1e-12, values above 1.0 become 1.0

The softmax function

When the n outputs of an n-class classifier are passed through softmax, they satisfy the requirements of a probability distribution: every element lies in (0, 1) and all elements sum to 1.

\forall x \; P(X=x) \in [0,1] \quad \text{and} \quad \sum_{x} P(X=x) = 1

A figure in the original post illustrates the effect of softmax; the function is defined as:

softmax(y_{i})=\frac{e^{y_{i}}}{\sum_{j=1}^{n}e^{y_{j}}}
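
A quick numeric check of the formula with NumPy (the logits are assumed example values):

import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # assumed raw outputs of a 3-class network
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0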

After the outputs pass through softmax and form a probability distribution, compute the cross entropy against the standard answer; the mean, cem, is the loss:

ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.argmax(y_,1))
cem = tf.reduce_mean(ce)

Learning rate (learning_rate): the step size of each parameter update

w_{n+1} = w_{n} - learning\_rate * \frac{\partial loss}{\partial w}

Updated parameter = current parameter - learning rate * gradient (derivative) of the loss with respect to the parameter

Set loss = square(w+1), with w initialized to 5. Use backpropagation to find the optimal w, i.e. the w that minimizes the loss; the minimum is at w = -1.

# coding:utf-8
import tensorflow as tf
w = tf.Variable(tf.constant(5, dtype=tf.float32))

loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(50):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        if i % 5 == 0:
            print("after ", i, " w is ", w_val, " loss is", loss_val)

#learning_rate=0.2
# after  0  w is  2.6  loss is 12.959999
# after  5  w is  -0.720064  loss is 0.07836417
# after  10  w is  -0.9782322  loss is 0.0004738369
# after  15  w is  -0.99830735  loss is 2.8650732e-06
# after  20  w is  -0.9998684  loss is 1.7320417e-08
# after  25  w is  -0.99998975  loss is 1.0510348e-10
# after  30  w is  -0.9999992  loss is 6.004086e-13
# after  35  w is  -0.99999994  loss is 3.5527137e-15
# after  40  w is  -0.99999994  loss is 3.5527137e-15
# after  45  w is  -0.99999994  loss is 3.5527137e-15

#learning_rate=1
# after  0  w is  -7.0  loss is 36.0
# after  5  w is  5.0  loss is 36.0
# after  10  w is  -7.0  loss is 36.0
# after  15  w is  5.0  loss is 36.0
# after  20  w is  -7.0  loss is 36.0
# after  25  w is  5.0  loss is 36.0
# after  30  w is  -7.0  loss is 36.0
# after  35  w is  5.0  loss is 36.0
# after  40  w is  -7.0  loss is 36.0
# after  45  w is  5.0  loss is 36.0

If the learning rate is too large, training oscillates and does not converge; if it is too small, convergence is slow.

Exponentially decaying learning rate

learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step/LEARNING_RATE_STEP)

  • LEARNING_RATE_BASE: the base (initial) learning rate
  • LEARNING_RATE_DECAY: the decay rate, in (0, 1)
  • LEARNING_RATE_STEP: how many batches between learning-rate updates, typically total samples / BATCH_SIZE
  • global_step: how many batches have been run so far

import tensorflow as tf
LEARNING_RATE_BASE = 0.1    # base (initial) learning rate
LEARNING_RATE_DECAY = 0.99  # decay rate
LEARNING_RATE_STEP = 1      # update the learning rate after this many batches are fed

Setting trainable=False excludes the variable from training, so the optimizer does not update it.

global_step = tf.Variable(0, trainable=False)
# define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY, staircase=True)
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

Create a session and train for 40 rounds:
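
The original post stops before the session code; a minimal sketch of the loop it describes (40 training steps, printing the decayed learning rate, w, and the loss), assuming the variables defined above:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(40):
        sess.run(train_step)
        # fetch the current decayed learning rate, step counter, w and loss
        lr_val, step_val, w_val, loss_val = sess.run([learning_rate, global_step, w, loss])
        print("after", i, "global_step:", step_val, "learning rate:", lr_val,
              "w:", w_val, "loss:", loss_val)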

 
