TensorFlow Neural Network Optimization: Loss Function (loss) and learning_rate

Learning TensorFlow in Practice

Loss Function (loss)

Table of Contents

Loss function (loss)

1. Activation functions (activation function)

2. NN complexity: usually measured by the number of layers and the number of parameters

NN optimization goal: minimize loss

Cross entropy (CE)

The softmax function

Learning rate (learning_rate): the step size of each parameter update

Exponentially decaying learning rate


1. Activation functions (activation function)

Introducing an activation function keeps the model from being a pure linear combination of XW, which improves its expressive power and makes it more discriminative.

1. ReLU activation, written tf.nn.relu()

2. Sigmoid activation, written tf.nn.sigmoid()

3. Tanh activation, written tf.nn.tanh()
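
A minimal sketch (the shapes and names here are assumptions for illustration, not from the original) of wrapping a layer's linear output in an activation:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 2))
w = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
b = tf.Variable(tf.zeros([3]))
# the activation is applied to the linear term xW + b;
# tf.nn.sigmoid or tf.nn.tanh could be substituted for tf.nn.relu
a = tf.nn.relu(tf.matmul(x, w) + b)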

2. NN complexity: usually measured by the number of layers and the number of parameters

When counting a network's layers, only layers that perform computation count, so the input layer is not included.

Number of layers = number of hidden layers + 1 output layer

Total parameters = total W + total b

For the network in the figure above (3 inputs, one hidden layer of 4 neurons, 2 outputs): 3*4+4 + 4*2+2 = 26 parameters
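
As a rough check, a small sketch (the variable shapes below are assumptions matching the 3-4-2 example) that counts parameters from the trainable-variable shapes:

import numpy as np
import tensorflow as tf

w1 = tf.Variable(tf.random_normal([3, 4]))  # 3*4 = 12 weights
b1 = tf.Variable(tf.zeros([4]))             # 4 biases
w2 = tf.Variable(tf.random_normal([4, 2]))  # 4*2 = 8 weights
b2 = tf.Variable(tf.zeros([2]))             # 2 biases

# total parameters = product of each trainable variable's shape, summed
total = sum(int(np.prod(v.get_shape().as_list())) for v in tf.trainable_variables())
print(total)  # 26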

NN optimization goal: minimize loss

Common ways to compute the loss:

  • MSE (Mean Squared Error)
  • CE (Cross Entropy)
  • Custom (self-defined) loss (see the sketch after this list)
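
For instance, a custom loss can weight over-prediction and under-prediction differently. A hedged sketch (COST and PROFIT are illustrative constants, not from the original):

import tensorflow as tf

y_ = tf.placeholder(tf.float32, shape=(None, 1))  # labels
y = tf.placeholder(tf.float32, shape=(None, 1))   # predictions (stand-in for the network output)
COST, PROFIT = 9, 1  # assumed per-unit penalties for over- and under-prediction
# charge COST per unit when y > y_, otherwise PROFIT per unit
loss_custom = tf.reduce_sum(tf.where(tf.greater(y, y_),
                                     COST * (y - y_),
                                     PROFIT * (y_ - y)))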

——————————————————————————————————————————

Construct a synthetic dataset X, Y_: each row of X contains x1 and x2, and y_ = x1 + x2 plus noise in [-0.05, 0.05). Fit a function that predicts y.

import tensorflow as tf
import numpy as np
BATCH_SIZE = 8  # number of examples fed to the network per training step
seed = 23455

rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in X]  # noise in [-0.05, 0.05)

Define the network inputs, outputs, and the forward-propagation graph:

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))

w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)

loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

Create a session and train for STEPS rounds:

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        # cycle through the 32 samples in batches of BATCH_SIZE
        start = (i * BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 431 == 0:
            print(start, end)
            print("after", i)
            print(sess.run(w1))
    print("final", sess.run(w1))

——————————————————————————————————————

Cross entropy (CE)

Cross entropy measures the distance between two probability distributions. With y_ as the label distribution and y as the predicted distribution:

H(y_, y) = -\sum y_ * \log y

e.g., given the label y_ = (1, 0) and two predictions y1 = (0.6, 0.4) and y2 = (0.8, 0.2), which prediction is closer to the standard answer?
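
Working this out with the cross-entropy formula above:

H(y_, y1) = -(1 * log 0.6 + 0 * log 0.4) ≈ 0.511
H(y_, y2) = -(1 * log 0.8 + 0 * log 0.2) ≈ 0.223

H(y_, y2) is smaller, so y2 is closer to the standard answer.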

ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))
# clip y: values below 1e-12 become 1e-12, values above 1.0 become 1.0

The softmax function

When the n outputs of an n-class classifier are passed through softmax, they satisfy the requirements of a probability distribution: every element lies in (0, 1) and all elements sum to 1.

\forall x \; P(X=x) \in [0,1] \quad \text{and} \quad \sum_{x} P(X=x) = 1

A figure in the original post illustrates the effect of softmax; the function is defined as:

softmax(y_{i})=\frac{e^{y_{i}}}{\sum_{j=1}^{n}e^{y_{j}}}
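
A quick numeric check of the formula with NumPy (the logits are assumed example values):

import numpy as np

logits = np.array([2.0, 1.0, 0.1])  # assumed raw outputs of a 3-class network
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0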

After the outputs pass through softmax and form a probability distribution, compute the cross entropy against the standard answer; the mean, cem, is the loss:

ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.argmax(y_,1))
cem = tf.reduce_mean(ce)

Learning rate (learning_rate): the step size of each parameter update

w_{n+1} = w_{n} - learning\_rate * \frac{\partial loss}{\partial w}

Updated parameter = current parameter - learning rate * gradient (derivative) of the loss with respect to the parameter

Set loss = square(w+1), with w initialized to 5. Use backpropagation to find the optimal w, i.e. the w that minimizes the loss; the minimum is at w = -1.

# coding:utf-8
import tensorflow as tf
w = tf.Variable(tf.constant(5, dtype=tf.float32))

loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(50):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        if i % 5 == 0:
            print("after ", i, " w is ", w_val, " loss is", loss_val)

#learning_rate=0.2
# after  0  w is  2.6  loss is 12.959999
# after  5  w is  -0.720064  loss is 0.07836417
# after  10  w is  -0.9782322  loss is 0.0004738369
# after  15  w is  -0.99830735  loss is 2.8650732e-06
# after  20  w is  -0.9998684  loss is 1.7320417e-08
# after  25  w is  -0.99998975  loss is 1.0510348e-10
# after  30  w is  -0.9999992  loss is 6.004086e-13
# after  35  w is  -0.99999994  loss is 3.5527137e-15
# after  40  w is  -0.99999994  loss is 3.5527137e-15
# after  45  w is  -0.99999994  loss is 3.5527137e-15

#learning_rate=1
# after  0  w is  -7.0  loss is 36.0
# after  5  w is  5.0  loss is 36.0
# after  10  w is  -7.0  loss is 36.0
# after  15  w is  5.0  loss is 36.0
# after  20  w is  -7.0  loss is 36.0
# after  25  w is  5.0  loss is 36.0
# after  30  w is  -7.0  loss is 36.0
# after  35  w is  5.0  loss is 36.0
# after  40  w is  -7.0  loss is 36.0
# after  45  w is  5.0  loss is 36.0

If the learning rate is too large, training oscillates and does not converge; if it is too small, convergence is slow.

Exponentially decaying learning rate

learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step/LEARNING_RATE_STEP)

  • LEARNING_RATE_BASE: the base (initial) learning rate
  • LEARNING_RATE_DECAY: the decay rate, in (0, 1)
  • LEARNING_RATE_STEP: how many batches between learning-rate updates, typically total samples / BATCH_SIZE
  • global_step: how many batches have been run so far

import tensorflow as tf
LEARNING_RATE_BASE = 0.1    # base (initial) learning rate
LEARNING_RATE_DECAY = 0.99  # decay rate
LEARNING_RATE_STEP = 1      # update the learning rate after this many batches are fed

Setting trainable=False excludes the variable from training, so the optimizer does not update it.

global_step = tf.Variable(0, trainable=False)
# define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY, staircase=True)
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

Create a session and train for 40 rounds:
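
The original post stops before the session code; a minimal sketch of the loop it describes (40 training steps, printing the decayed learning rate, w, and the loss), assuming the variables defined above:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(40):
        sess.run(train_step)
        # fetch the current decayed learning rate, step counter, w and loss
        lr_val, step_val, w_val, loss_val = sess.run([learning_rate, global_step, w, loss])
        print("after", i, "global_step:", step_val, "learning rate:", lr_val,
              "w:", w_val, "loss:", loss_val)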

 
