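Gradient Descent
Introduction
Gradient
The gradient is a vector. At a given point, the directional derivative of a function attains its maximum along the direction of the gradient, and the length of the gradient equals that maximum rate of change.
Optimizing with the gradient
The gradient points in the direction in which the function increases fastest, so searching for a minimum of the function amounts to repeatedly taking small steps in the negative gradient direction.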
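As a minimal sketch of this idea, assume a hypothetical one-dimensional objective f(x) = (x - 3)^2 and an arbitrarily chosen learning rate of 0.1 (neither comes from the course; the names `f`, `grad_f` and `gradient_descent` are made up for illustration). The loop below repeatedly steps against the gradient:

```python
def f(x):
    """Hypothetical objective with its minimum at x = 3."""
    return (x - 3.0) ** 2


def grad_f(x):
    """Analytic derivative of f."""
    return 2.0 * (x - 3.0)


def gradient_descent(x0, lr=0.1, steps=100):
    """Repeatedly move x a small step in the negative gradient direction."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x


x_min = gradient_descent(x0=0.0)
print(x_min, f(x_min))  # x_min ends up very close to 3
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step; a learning rate above 1.0 would make this particular loop diverge.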
AutoGrad with TensorFlow
GradientTape
- with tf.GradientTape() as tape:
- Build computation graph
- [w_grad] = tape.gradient(loss,[w])
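import tensorflow as tf

w = tf.constant(1.)
b = tf.constant(2.)
x = tf.constant(3.)
y = w * x  # computed outside any tape, so it is not recorded

with tf.GradientTape() as tape:
    tape.watch([w])  # constants are not watched automatically
    y2 = w * x

grad1 = tape.gradient(y, [w])
print(grad1)  # [None], because y was not computed inside the tape
# A non-persistent tape can compute gradients only once, so calling it
# again here would raise a RuntimeError:
# grad2 = tape.gradient(y2, [w])

with tf.GradientTape() as tape:
    tape.watch([w])
    y2 = w * x

grad2 = tape.gradient(y2, [w])
print(grad2)  # d(w*x)/dw = x = 3.0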
Persistent GradientTape
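A non-persistent tape can compute gradients only once: its resources are released as soon as gradient() is called. Passing persistent=True lifts this restriction; remember to release the tape manually with del when you are done with it.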
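with tf.GradientTape(persistent=True) as tape:
    tape.watch([w])
    y = w * x

grad1 = tape.gradient(y, [w])
print(grad1)
grad1 = tape.gradient(y, [w])  # a second call is fine on a persistent tape
print(grad1)
del tape  # release the tape's resources
# After del, the name tape no longer exists, so calling
# tape.gradient(y, [w]) here would raise a NameError.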
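# Second-order derivatives: nest two tapes so that the outer tape
# records the gradient computation performed by the inner tape.
w = tf.Variable(1.0)
b = tf.Variable(2.0)
x = tf.Variable(3.0)
with tf.GradientTape() as t1:
    with tf.GradientTape() as t2:
        y = w * x * x + w * b
    # Must run inside t1's context so that t1 can differentiate dx.
    dx, db = t2.gradient(y, [x, b])
print(dx, db)  # dy/dx = 2*w*x = 6.0, dy/db = w = 1.0
dx2 = t1.gradient(dx, [x])
print(dx2)  # d2y/dx2 = 2*w = 2.0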
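The nested-tape result can be checked by hand: for y = w*x^2 + w*b, dy/dx = 2wx, dy/db = w and d2y/dx2 = 2w. The sketch below deliberately uses plain Python and central finite differences instead of TensorFlow, so it only approximates what the tapes compute exactly; the helper names and step sizes are assumptions made for illustration.

```python
def y_fn(w, b, x):
    """Same expression as in the nested-tape example."""
    return w * x * x + w * b


def first_derivative_x(w, b, x, h=1e-4):
    """Central-difference estimate of dy/dx."""
    return (y_fn(w, b, x + h) - y_fn(w, b, x - h)) / (2 * h)


def second_derivative_x(w, b, x, h=1e-3):
    """Central-difference estimate of d2y/dx2."""
    return (y_fn(w, b, x + h) - 2 * y_fn(w, b, x) + y_fn(w, b, x - h)) / (h * h)


w_val, b_val, x_val = 1.0, 2.0, 3.0
dx_num = first_derivative_x(w_val, b_val, x_val)
dx2_num = second_derivative_x(w_val, b_val, x_val)
print(dx_num, 2 * w_val * x_val)  # numerical estimate vs. 2*w*x
print(dx2_num, 2 * w_val)         # numerical estimate vs. 2*w
```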
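Activation Functions and Their Gradients
Sigmoid/Logistic
- Pros: smooth, with outputs in (0, 1)
- Cons: the gradient becomes very small far from the origin (the function saturates), which slows down learning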
a = tf.linspace(-10., 10., 10)
with tf.GradientTape() as tape:
    tape.watch(a)
    y = tf.sigmoid(a)
da = tape.gradient(y, [a])
print(a)
print(y)
print(da)
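The gradient of the sigmoid has the closed form sigmoid(a) * (1 - sigmoid(a)), which peaks at 0.25 for a = 0 and decays toward zero as |a| grows. A small plain-Python sketch (no TensorFlow; the function names are illustrative assumptions) makes the saturation visible:

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def sigmoid_grad(a):
    """Closed-form derivative: sigmoid(a) * (1 - sigmoid(a))."""
    s = sigmoid(a)
    return s * (1.0 - s)


for a in (-10.0, -5.0, 0.0, 5.0, 10.0):
    print(f"a={a:6.1f}  sigmoid={sigmoid(a):.6f}  grad={sigmoid_grad(a):.6f}")
```

At a = 10 or a = -10 the gradient is already below 1e-4, which is the "small gradient far away" drawback listed above.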
Tanh
tf.tanh(a)
ReLU
tf.nn.relu(x)
tf.nn.leaky_relu(x)  # for x < 0 the gradient is a small positive constant (alpha, 0.2 by default)
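To make the difference between the two concrete, here is a plain-Python sketch of both functions and their gradients. It is an illustration rather than TensorFlow's implementation: the function names are made up, alpha = 0.2 is chosen to match the default of `tf.nn.leaky_relu`, and the gradient at exactly x = 0 is set by convention.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Convention used here: the gradient at x == 0 is taken to be 0.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.2):
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.2):
    # Negative inputs keep a small, non-zero slope instead of dying out.
    return 1.0 if x > 0 else alpha


for x in (-2.0, -0.5, 0.5, 2.0):
    print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```

For negative inputs ReLU passes no gradient at all, while leaky ReLU still passes alpha times the upstream gradient.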
Loss Functions and Their Gradients
- MSE(Mean Squared Error)
- Cross Entropy Loss
MSE
Softmax
x = tf.random.normal([2, 4])
w = tf.random.normal([4, 3])
b = tf.zeros([3])
y = tf.constant([2, 0])
with tf.GradientTape() as tape:
    tape.watch([w, b])
    prob = tf.nn.softmax(x @ w + b)
    loss = tf.reduce_mean(tf.keras.losses.MSE(tf.one_hot(y, depth=3), prob))
grads = tape.gradient(loss, [w, b])
print(grads[0])
print(grads[1])
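Cross-entropy gradient
Cross-entropy was covered in the previous chapter, so it is not explained again here.
with tf.GradientTape() as tape:
    tape.watch([w, b])
    logits = x @ w + b
    # from_logits=True applies softmax internally, which is numerically more stable
    loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.one_hot(y, depth=3), logits, from_logits=True))
grads = tape.gradient(loss, [w, b])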
print(grads[0])
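For softmax followed by cross-entropy, the gradient with respect to the logits has the well-known closed form p - y_onehot, where p is the softmax output; the gradients for w and b then follow from the chain rule. The sketch below checks this for a single sample in plain Python. The logits and label are made-up values and the helper names are assumptions for illustration, not part of the course code.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def ce_grad(logits, label):
    """Closed-form gradient w.r.t. the logits: softmax(logits) - one_hot(label)."""
    p = softmax(logits)
    return [p_i - (1.0 if i == label else 0.0) for i, p_i in enumerate(p)]


def numeric_grad(logits, label, h=1e-6):
    """Central finite differences, used only to cross-check ce_grad."""
    grads = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += h
        down[i] -= h
        grads.append((cross_entropy(up, label) - cross_entropy(down, label)) / (2 * h))
    return grads


logits = [1.0, -0.5, 2.0]  # hypothetical logits for one sample
label = 2                  # hypothetical true class
analytic = ce_grad(logits, label)
numeric = numeric_grad(logits, label)
print(analytic)
print(numeric)
```

The components of the gradient sum to zero, because both the softmax output and the one-hot target sum to one.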
Chain Rule
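- Backpropagation is carried out with the chain rule
Since I am only learning how to use TensorFlow here, I will not reproduce the course's derivations of backpropagation for single-layer and multi-layer perceptrons. If you are interested, see Andrew Ng's deep learning online course.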
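As a small stand-in for the omitted derivation, the sketch below applies the chain rule by hand to a single sigmoid neuron with a squared-error loss: dL/dw = dL/da * da/dz * dz/dw. It is plain Python with made-up input values, and the function names are assumptions for illustration; it is not the course's derivation.

```python
import math


def forward(w, b, x, t):
    """z = w*x + b, a = sigmoid(z), loss = (a - t)^2."""
    z = w * x + b
    a = 1.0 / (1.0 + math.exp(-z))
    loss = (a - t) ** 2
    return z, a, loss


def backward(w, b, x, t):
    """Multiply the local derivatives along the path loss -> a -> z -> (w, b)."""
    z, a, loss = forward(w, b, x, t)
    dloss_da = 2.0 * (a - t)
    da_dz = a * (1.0 - a)
    dloss_dz = dloss_da * da_dz
    dloss_dw = dloss_dz * x    # dz/dw = x
    dloss_db = dloss_dz * 1.0  # dz/db = 1
    return dloss_dw, dloss_db


w0, b0, x0, t0 = 0.5, -1.0, 2.0, 1.0  # hypothetical values
dw, db = backward(w0, b0, x0, t0)

# Cross-check dL/dw with a central finite difference.
h = 1e-6
dw_num = (forward(w0 + h, b0, x0, t0)[2] - forward(w0 - h, b0, x0, t0)[2]) / (2 * h)
print(dw, dw_num)
print(db)
```

tf.GradientTape automates exactly this bookkeeping: it records each operation in the forward pass and multiplies the local derivatives in reverse order.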
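Visualization
- TensorBoard (TensorFlow)
- Visdom (PyTorch)
TensorBoard
- Listen on the log directory
- Build a summary instance
- Feed data into the summary instance
# cd into your project directory; 0.0.0.0 allows remote access; pick a custom port to avoid conflicts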
tensorboard --logdir=./logs --host 0.0.0.0 --port=11021
create_file_writer()
import datetime

# One log directory per run, named after the start time
current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
log_dir = 'logs/' + current_time
summary_writer = tf.summary.create_file_writer(log_dir)
image
# Using MNIST as an example. Subplots are not supported directly,
# so an image grid has to be built by hand.
sample_img = tf.reshape(sample_img, [1, 28, 28, 1])
with summary_writer.as_default():
    tf.summary.image("Training sample:", sample_img, step=0)
Building a subplot grid by hand
import io
import matplotlib.pyplot as plt

def plot_to_image(figure):
    """Converts the matplotlib plot specified by 'figure' to a PNG image and
    returns it. The supplied figure is closed and inaccessible after this call."""
    # Save the plot to a PNG in memory.
    buf = io.BytesIO()
    figure.savefig(buf, format='png')
    # Closing the figure prevents it from being displayed directly inside
    # the notebook.
    plt.close(figure)
    buf.seek(0)
    # Convert PNG buffer to TF image
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    # Add the batch dimension
    image = tf.expand_dims(image, 0)
    return image

def image_grid(images):
    """Return a 5x5 grid of the MNIST images as a matplotlib figure."""
    # Create a figure to contain the plot.
    # https://morvanzhou.github.io/tutorials/data-manipulation/plt/4-1-subpot1/
    # The link above is a tutorial on subplot.
    figure = plt.figure(figsize=(10, 10))
    for i in range(25):
        # Start next subplot; 'name' is a placeholder title.
        plt.subplot(5, 5, i + 1, title='name')
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(images[i], cmap=plt.cm.binary)
    return figure
val_images = x[:25]
with summary_writer.as_default():
    # imshow expects 2-D images, so reshape to [N, 28, 28]
    val_images = tf.reshape(val_images, [-1, 28, 28])
    figure = image_grid(val_images)
    # step is the current training step from the surrounding loop
    tf.summary.image('val-images:', plot_to_image(figure), step=step)
scalar
with summary_writer.as_default():
    tf.summary.scalar('train-loss', float(loss), step=step)
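The above is only a brief introduction to TensorBoard; see the TensorBoard documentation for details.
One more resource worth noting for learning TensorFlow:
簡單粗暴 TensorFlow 2 (A Concise Handbook of TensorFlow 2)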