Reposted from http://blog.csdn.net/u012436149/article/details/53905797
gradient
TensorFlow provides a function for computing gradients, tf.gradients(ys, xs). Note that every x in xs must be related to ys; if one is unrelated, an error is raised.
The code below defines two variables, w1 and w2, but res depends only on w1:
# wrong
import tensorflow as tf

w1 = tf.Variable([[1, 2]])
w2 = tf.Variable([[3, 4]])
res = tf.matmul(w1, [[2], [1]])
grads = tf.gradients(res, [w1, w2])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    re = sess.run(grads)
    print(re)
Error message:
TypeError: Fetch argument None has invalid type
# right
import tensorflow as tf

w1 = tf.Variable([[1, 2]])
w2 = tf.Variable([[3, 4]])
res = tf.matmul(w1, [[2], [1]])
grads = tf.gradients(res, [w1])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    re = sess.run(grads)
    print(re)
# [array([[2, 1]], dtype=int32)]
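The returned gradient matches what you get by hand: res = w1 · [[2], [1]] is scalar-valued, so ∂res/∂w1 is just the transposed constant vector. A quick NumPy sketch (my own check, not part of the original post) confirming the arithmetic:

```python
import numpy as np

w1 = np.array([[1, 2]])
v = np.array([[2], [1]])  # the constant that w1 is multiplied with

# res = w1 @ v, so d(res)/d(w1) is simply v transposed
grad_w1 = v.T

print(grad_w1)  # [[2 1]]
```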
A test of grad_ys. The grad_ys argument supplies the initial gradients flowing into each y (by default, tensors of ones):
import tensorflow as tf

w1 = tf.get_variable('w1', shape=[3])
w2 = tf.get_variable('w2', shape=[3])
w3 = tf.get_variable('w3', shape=[3])
w4 = tf.get_variable('w4', shape=[3])

z1 = w1 + w2 + w3
z2 = w3 + w4

grads = tf.gradients([z1, z2], [w1, w2, w3, w4],
                     grad_ys=[tf.convert_to_tensor([2., 2., 3.]),
                              tf.convert_to_tensor([3., 2., 4.])])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(grads))
# output:
# [array([ 2.,  2.,  3.], dtype=float32),
#  array([ 2.,  2.,  3.], dtype=float32),
#  array([ 5.,  4.,  7.], dtype=float32),
#  array([ 3.,  2.,  4.], dtype=float32)]
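Since z1 = w1 + w2 + w3 and z2 = w3 + w4, each partial derivative is 1, so every variable's gradient is just the sum of the grad_ys vectors for the y's it feeds into. A NumPy sketch (my own verification, not in the original post) reproducing the output above:

```python
import numpy as np

g1 = np.array([2., 2., 3.])  # grad_ys for z1
g2 = np.array([3., 2., 4.])  # grad_ys for z2

# z1 = w1 + w2 + w3 and z2 = w3 + w4: all partials are 1, so each
# variable accumulates the grad_ys of every y it contributes to.
grad_w1 = g1        # w1 appears only in z1
grad_w2 = g1        # w2 appears only in z1
grad_w3 = g1 + g2   # w3 appears in both z1 and z2
grad_w4 = g2        # w4 appears only in z2

print(grad_w3)  # [5. 4. 7.]
```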
tf.stop_gradient()
Blocks gradients from backpropagating through a node.
import tensorflow as tf
w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)
a = tf.multiply(w1, 3.0)
a_stoped = tf.stop_gradient(a)
# b=w1*3.0*w2
b = tf.multiply(a_stoped, w2)
gradients = tf.gradients(b, xs=[w1, w2])
print(gradients)
# output:
#[None, <tf.Tensor 'gradients/Mul_1_grad/Reshape_1:0' shape=() dtype=float32>]
As you can see, once a node is stopped, gradients can no longer backpropagate past it. Since the gradient for variable w1 can only arrive through node a, tf.gradients returns None for it.
a = tf.Variable(1.0)
b = tf.Variable(1.0)

c = tf.add(a, b)
c_stoped = tf.stop_gradient(c)

d = tf.add(a, b)
e = tf.add(c_stoped, d)
gradients = tf.gradients(e, xs=[a, b])

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(gradients))
# output: [1.0, 1.0]
Although node c is stopped, a and b still receive gradients through d, so gradient values can still be produced.
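This behaviour can be mimicked with a finite-difference check (my own sketch, independent of TensorFlow): holding the stopped branch fixed while perturbing a leaves only the d = a + b path, whose slope is 1.

```python
def e(a, b, a0, b0):
    # c_stoped is treated as a constant: evaluate c at the fixed point (a0, b0)
    c_stoped = a0 + b0
    d = a + b
    return c_stoped + d

a0, b0, eps = 1.0, 1.0, 1e-6

# numerical de/da: only the unblocked d = a + b path contributes
grad_a = (e(a0 + eps, b0, a0, b0) - e(a0, b0, a0, b0)) / eps
print(round(grad_a, 3))  # 1.0
```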
import tensorflow as tf

w1 = tf.Variable(2.0)
w2 = tf.Variable(2.0)
a = tf.multiply(w1, 3.0)
a_stoped = tf.stop_gradient(a)

# b = w1 * 3.0 * w2
b = tf.multiply(a_stoped, w2)
opt = tf.train.GradientDescentOptimizer(0.1)
gradients = tf.gradients(b, xs=tf.trainable_variables())
# This line raises an error, because gradients[0] is None.
tf.summary.histogram(gradients[0].name, gradients[0])
# Everything else runs normally, both the gradient computation and the
# variable updates. I feel this TensorFlow design is a bit unfortunate;
# it would be better if the blocked gradient flowed through as 0.
train_op = opt.apply_gradients(zip(gradients, tf.trainable_variables()))
print(gradients)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(train_op))
    print(sess.run([w1, w2]))
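A common workaround (not shown in the original post) is to drop the None entries before handing pairs to apply_gradients. The pairing logic itself is plain Python and can be sketched without TensorFlow; the gradient values and variable names below are hypothetical stand-ins:

```python
# Hypothetical list as tf.gradients might return it: the first variable's
# gradient is None because its only path was stopped.
gradients = [None, 0.5]
variables = ["w1", "w2"]  # stand-ins for the trainable variables

# Keep only the (grad, var) pairs whose gradient actually exists.
grads_and_vars = [(g, v) for g, v in zip(gradients, variables)
                  if g is not None]

print(grads_and_vars)  # [(0.5, 'w2')]
```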
Higher-order derivatives
TensorFlow can compute higher-order derivatives by applying tf.gradients repeatedly:
import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.constant(1.)
    b = tf.pow(a, 2)
    grad = tf.gradients(ys=b, xs=a)  # first derivative
    print(grad[0])
    grad_2 = tf.gradients(ys=grad[0], xs=a)  # second derivative
    grad_3 = tf.gradients(ys=grad_2[0], xs=a)  # third derivative
    print(grad_3)

with tf.Session() as sess:
    print(sess.run(grad_3))
- Note: for some ops (e.g. tf.add …), TF does not implement higher-order gradient computation. If you take a higher-order derivative of an op that lacks one, gradients returns None.
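For b = a², the derivatives worked out by hand are 2a, then 2, then 0, which is what the repeated tf.gradients calls above compute symbolically. A finite-difference sketch (my own, independent of TensorFlow) checking the first two derivatives at a = 1:

```python
def f(a):
    return a ** 2

a0, h = 1.0, 1e-4

# central difference: f'(a) ≈ (f(a+h) - f(a-h)) / (2h)
first = (f(a0 + h) - f(a0 - h)) / (2 * h)
# second difference: f''(a) ≈ (f(a+h) - 2 f(a) + f(a-h)) / h²
second = (f(a0 + h) - 2 * f(a0) + f(a0 - h)) / h ** 2

print(first, second)  # both ≈ 2.0
```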