Learning TensorFlow in Practice
Loss functions (loss)
1. Activation functions (activation function)
Introducing an activation function keeps the model from collapsing into a pure linear combination XW, which improves its expressive power and makes it more discriminative.
1) relu activation: tf.nn.relu()
2) sigmoid activation: tf.nn.sigmoid()
3) tanh activation: tf.nn.tanh()
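A minimal sketch of the three activations in use; the input values below are arbitrary, chosen only to show each function's output range:
import tensorflow as tf
z = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])  # arbitrary pre-activation values
with tf.Session() as sess:
    print(sess.run(tf.nn.relu(z)))     # [0. 0. 0. 0.5 2.], negatives clipped to 0
    print(sess.run(tf.nn.sigmoid(z)))  # values squashed into (0, 1)
    print(sess.run(tf.nn.tanh(z)))     # values squashed into (-1, 1)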
2. NN complexity: usually measured by the number of NN layers and the number of NN parameters
When counting a network's layers, only layers with computational capability count, so the input layer is excluded.
Number of layers = number of hidden layers + 1 output layer
Total parameters = total W + total b
E.g., for a network with 3 input nodes, 4 hidden nodes, and 2 output nodes: 3*4+4 + 4*2+2 = 26
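A sketch that builds the weights and biases of that 3-4-2 network and counts its parameters (the variable names are illustrative):
import tensorflow as tf
w1 = tf.Variable(tf.random_normal([3, 4]))  # 3 inputs -> 4 hidden: 12 weights
b1 = tf.Variable(tf.zeros([4]))             # 4 biases
w2 = tf.Variable(tf.random_normal([4, 2]))  # 4 hidden -> 2 outputs: 8 weights
b2 = tf.Variable(tf.zeros([2]))             # 2 biases
total = sum(v.get_shape().num_elements() for v in tf.trainable_variables())
print(total)  # 26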
NN optimization goal: minimize loss
Mainstream loss computations:
- MSE (Mean Squared Error)
- CE (Cross Entropy)
- custom (a sketch follows the MSE example below)
——————————————————————————————————————————
Fabricate a dataset X, Y_: each sample in X has features x1, x2, and y_ = x1 + x2 plus noise in [-0.05, 0.05). The goal is to fit a function that predicts y.
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8  # number of samples fed to the network per training step
seed = 23455
rdm = np.random.RandomState(seed)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + (rdm.rand()/10.0 - 0.05)] for (x1, x2) in X]
Define the network's input, output, and forward-propagation graph:
x = tf.placeholder(tf.float32, shape=(None, 2))   # input features
y_ = tf.placeholder(tf.float32, shape=(None, 1))  # ground-truth labels
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)  # forward propagation
loss_mse = tf.reduce_mean(tf.square(y_ - y))  # MSE loss
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)
Create a session and train for STEPS rounds:
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 431 == 0:
            print(start, end)
            print("after", i)
            print(sess.run(w1))
    print("final", sess.run(w1))
——————————————————————————————————————
Cross entropy (CE)
Cross entropy characterizes the distance between two probability distributions: H(y_, y) = -Σ y_ * log y
E.g., given the label y_ = (1, 0), which of the predictions y1 = (0.6, 0.4) and y2 = (0.8, 0.2) is closer to it? H(y_, y1) = -ln 0.6 ≈ 0.511 and H(y_, y2) = -ln 0.8 ≈ 0.223, so y2 is closer.
ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))
# clip: values of y below 1e-12 become 1e-12, values above 1.0 become 1.0 (avoids log(0))
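A quick numeric check of that example, sketched in plain NumPy (the clip mirrors the TensorFlow line above):
import numpy as np
y_true = np.array([1.0, 0.0])
for y_pred in ([0.6, 0.4], [0.8, 0.2]):
    ce_val = -np.sum(y_true * np.log(np.clip(y_pred, 1e-12, 1.0)))
    print(ce_val)  # 0.5108... for y1, then 0.2231... for y2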
The softmax function
When an n-class network produces n outputs, passing them through softmax makes them satisfy the requirements of a probability distribution: every element lies in (0, 1) and all elements sum to 1.
softmax(y_i) = e^(y_i) / Σ_j e^(y_j)
After softmax turns the outputs into a probability distribution, the cross entropy against the label is computed; its mean, cem, is the loss:
# fused op: applies softmax to the logits, then computes cross entropy
# against integer class labels (hence the argmax over the one-hot y_)
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)
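A tiny sketch showing softmax turning arbitrary logits into a probability distribution (the values are arbitrary, for illustration only):
import numpy as np
logits = np.array([2.0, 1.0, 0.1])  # arbitrary raw network outputs
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs)        # approx [0.659 0.242 0.099], each in (0, 1)
print(probs.sum())  # 1.0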
Learning rate learning_rate: the step size of each parameter update
updated parameter = current parameter - learning_rate * gradient of the loss w.r.t. the parameter
Example: let loss = (w+1)^2 with w initialized to 5, and use backpropagation to find the optimal w, i.e., the w that minimizes loss; the answer should be w = -1. One update by hand with learning rate 0.2: gradient = 2(w+1) = 12, so w becomes 5 - 0.2*12 = 2.6, matching the first printed line below.
#coding:utf-8
import tensorflow as tf
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(50):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        if i % 5 == 0:
            print("after ", i, " w is ", w_val, " loss is ", loss_val)
#learning_rate=0.2
# after 0 w is 2.6 loss is 12.959999
# after 5 w is -0.720064 loss is 0.07836417
# after 10 w is -0.9782322 loss is 0.0004738369
# after 15 w is -0.99830735 loss is 2.8650732e-06
# after 20 w is -0.9998684 loss is 1.7320417e-08
# after 25 w is -0.99998975 loss is 1.0510348e-10
# after 30 w is -0.9999992 loss is 6.004086e-13
# after 35 w is -0.99999994 loss is 3.5527137e-15
# after 40 w is -0.99999994 loss is 3.5527137e-15
# after 45 w is -0.99999994 loss is 3.5527137e-15
#learning_rate=1
# after 0 w is -7.0 loss is 36.0
# after 5 w is 5.0 loss is 36.0
# after 10 w is -7.0 loss is 36.0
# after 15 w is 5.0 loss is 36.0
# after 20 w is -7.0 loss is 36.0
# after 25 w is 5.0 loss is 36.0
# after 30 w is -7.0 loss is 36.0
# after 35 w is 5.0 loss is 36.0
# after 40 w is -7.0 loss is 36.0
# after 45 w is 5.0 loss is 36.0
If the learning rate is too large, w oscillates and never converges (see the learning_rate=1 trace above); if it is too small, convergence is slow.
Exponentially decaying learning rate
learning_rate = LEARNING_RATE_BASE * LEARNING_RATE_DECAY ^ (global_step/LEARNING_RATE_STEP)
- LEARNING_RATE_BASE: base learning rate, the initial value
- LEARNING_RATE_DECAY: learning-rate decay rate, in (0, 1)
- LEARNING_RATE_STEP: how many training steps between learning-rate updates, usually total samples / BATCH_SIZE
- global_step: how many batches (rounds of BATCH_SIZE) have been run
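A quick arithmetic check of the schedule, using the constants from the code below (staircase=True rounds global_step/LEARNING_RATE_STEP down to an integer):
base, decay, step_size = 0.1, 0.99, 1
for g in range(4):
    print(g, base * decay ** (g // step_size))
# learning rate decays approx 0.1, 0.099, 0.09801, 0.0970299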
import tensorflow as tf
LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
LEARNING_RATE_STEP = 1  # update the learning rate after this many batches
# trainable=False: the step counter itself is not trained
global_step = tf.Variable(0, trainable=False)
# define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)
w = tf.Variable(tf.constant(5, dtype=tf.float32))
loss = tf.square(w + 1)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
Create a session and train for 40 rounds:
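A minimal sketch of that session loop, mirroring the earlier examples (passing global_step to minimize() above is what increments the counter each step):
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(40):
        sess.run(train_step)
        g_val, lr_val, w_val, loss_val = sess.run([global_step, learning_rate, w, loss])
        print("step", g_val, "lr", lr_val, "w", w_val, "loss", loss_val)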