TensorFlow 2.x Study Notes (6): Stochastic Gradient Descent and Data Visualization

Gradient Descent

Overview

Gradient

The gradient is a vector: it points in the direction along which the directional derivative of the function at that point attains its maximum value.
\mathrm{grad}\,f(x,y) = \nabla f(x,y) = \left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right) = \frac{\partial f}{\partial x}\vec{i} + \frac{\partial f}{\partial y}\vec{j}

Optimizing with the Gradient

The gradient points in the direction of fastest increase, so searching for the minimum of a function means repeatedly stepping in the negative gradient direction (a one-step sketch follows the update rule):
\theta_{t+1} = \theta_t - \alpha_t\nabla f(\theta_t)
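A minimal sketch of a single update step in TensorFlow; the function, variable names and learning rate below are made up for illustration:

import tensorflow as tf

# one gradient-descent step for f(theta) = (theta - 5)^2
theta = tf.Variable(2.0)
lr = 0.1                             # the step size alpha_t, chosen arbitrarily

with tf.GradientTape() as tape:
    loss = (theta - 5.) ** 2
grad = tape.gradient(loss, theta)
theta.assign_sub(lr * grad)          # theta <- theta - alpha_t * grad
print(theta.numpy())                 # 2.0 - 0.1 * 2 * (2 - 5) = 2.6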

AutoGrad with TensorFlow

GradientTape

  • with tf.GradientTape() as tape:
    • Build computation graph
    • loss = f_\theta(x)
  • [w_grad] = tape.gradient(loss,[w])
w = tf.constant(1.)
b = tf.constant(2.)
x = tf.constant(3.)
y = w * x                          # computed outside any tape, so it is not recorded

with tf.GradientTape() as tape:
    tape.watch([w])                # w is a constant, so it must be watched explicitly
    y2 = w * x
grad1 = tape.gradient(y, [w])
print(grad1)                       # [None]: y was built outside the tape
grad2 = tape.gradient(y2, [w])     # RuntimeError: a non-persistent tape can only be used once

with tf.GradientTape() as tape:
    tape.watch([w])
    y2 = w * x
grad2 = tape.gradient(y2, [w])
print(grad2)                       # [3.0], since dy2/dw = x = 3

Persistent GradientTape

A non-persistent tape can only be queried once; after that call its resources are released. Enabling the persistent option solves this, but remember to release the tape manually (del tape) when you are done.

with tf.GradientTape(persistent=True) as tape:
    tape.watch([w])
    y = w * x
grad1 = tape.gradient(y, [w])
print(grad1)                       # [3.0]
grad1 = tape.gradient(y, [w])      # a persistent tape can be queried more than once
print(grad1)                       # [3.0]
del tape                           # release the tape manually once you are done
grad1 = tape.gradient(y, [w])      # NameError: the tape has already been deleted
print(grad1)

2nd-order

w = tf.Variable(1.0)
b = tf.Variable(2.0)
x = tf.Variable(3.0)

with tf.GradientTape() as t1:
    with tf.GradientTape() as t2:
        y = w * x * x + w * b
    dx, db = t2.gradient(y, [x, b])
    print(dx, db)                  # dy/dx = 2*w*x = 6.0, dy/db = w = 1.0
dx2 = t1.gradient(dx, [x])
print(dx2)                         # d(2*w*x)/dx = 2*w = [2.0]

Activation Functions and Their Gradients

Sigmoid/Logistic

  • f(x) = \sigma(x) = \frac{1}{1+e^{-x}}
  • f(x) \in (0,1)
  • \frac{d}{dx}\sigma(x) = \frac{d}{dx}\left(\frac{1}{1+e^{-x}}\right) = \sigma(x) - \sigma(x)^2
  • Pros: smooth, and the output is bounded in (0,1)
  • Cons: the gradient is tiny far from the origin, so it saturates easily
a = tf.linspace(-10., 10., 10)

with tf.GradientTape() as tape:
    tape.watch(a)                  # a is a plain tensor, so it has to be watched
    y = tf.sigmoid(a)
da = tape.gradient(y, [a])
print(a)
print(y)                           # saturates towards 0 and 1 at the two ends
print(da)                          # gradient is close to 0 far from the origin

Tanh

  • f(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} = 2\sigma(2x) - 1
  • f(x) \in (-1,1)
  • \frac{d}{dx}\tanh(x) = \frac{d}{dx}\frac{e^x - e^{-x}}{e^x + e^{-x}} = 1 - \tanh^2(x)
tf.tanh(a)
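Analogous to the sigmoid example above, a quick check (reusing the tensor a from that example) that the gradient matches 1 - tanh²(x):

with tf.GradientTape() as tape:
    tape.watch(a)
    y = tf.tanh(a)
[da] = tape.gradient(y, [a])
print(da)                          # equals 1 - tf.tanh(a) ** 2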

ReLU

f(x) = \left\{ \begin{array}{rcl} 0 & \mathrm{for} & x<0 \\ x & \mathrm{for} & x>0 \\ \end{array}\right.
f'(x) = \left\{ \begin{array}{rcl} 0 & \mathrm{for} & x<0 \\ 1 & \mathrm{for} & x\geq0 \\ \end{array}\right.

tf.nn.relu(x)
tf.nn.leaky_relu(x)   # for x < 0 the gradient is a small positive number instead of 0
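A small sketch (again reusing a from the sigmoid example) comparing the two gradients; leaky_relu keeps its default negative slope here:

with tf.GradientTape(persistent=True) as tape:
    tape.watch(a)
    y1 = tf.nn.relu(a)
    y2 = tf.nn.leaky_relu(a)       # default alpha = 0.2
[da1] = tape.gradient(y1, [a])
[da2] = tape.gradient(y2, [a])
print(da1)                         # 0 where a < 0, 1 where a > 0
print(da2)                         # 0.2 where a < 0, 1 where a > 0
del tape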

Losses and Their Gradients

  • MSE(Mean Squared Error)
  • Cross Entropy Loss

MSE

  • loss = \sum\left[y - (xw + b)\right]^2
  • \nabla_{\theta}\,loss = -2\sum\left[y - f_{\theta}(x)\right]\cdot\nabla_{\theta} f_{\theta}(x) (checked numerically in the sketch below)
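A minimal numerical check of this gradient for a 1-D linear model; the toy data below is made up for illustration:

x = tf.constant([1., 2., 3.])
y = tf.constant([2., 4., 6.])
w, b = tf.Variable(1.), tf.Variable(0.)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum((y - (x * w + b)) ** 2)
dw, db = tape.gradient(loss, [w, b])
print(dw)                          # -2 * sum[(y - (xw+b)) * x] = -28.0
print(db)                          # -2 * sum[y - (xw+b)]      = -12.0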

Softmax

S(y_i)=\frac{e^{y_i}}{\sum\limits_{j}{e^{y_j}}}
\frac{\partial{p_i}}{\partial{a_j}} = \left\{ \begin{array}{rcl} p_i(1-p_j) & \mathrm{if} & i = j \\ -p_i\cdot p_j & \mathrm{if} & i\neq j \\ \end{array}\right.
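This Jacobian can be checked against tape.jacobian; a small sketch with arbitrary logits:

a = tf.constant([1., 2., 3.])
with tf.GradientTape() as tape:
    tape.watch(a)
    p = tf.nn.softmax(a)
J = tape.jacobian(p, a)            # J[i, j] = dp_i / da_j
print(J)
print(tf.linalg.diag(p) - tf.tensordot(p, p, axes=0))   # closed form: diag(p) - p p^T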

x = tf.random.normal([2, 4])
w = tf.random.normal([4, 3])
b = tf.zeros([3])
y = tf.constant([2, 0])            # class labels of the two samples

with tf.GradientTape() as tape:
    tape.watch([w, b])             # w and b are plain tensors, so watch them
    prob = tf.nn.softmax(x@w + b)
    loss = tf.reduce_mean(tf.keras.losses.MSE(tf.one_hot(y, depth=3), prob))
grads = tape.gradient(loss, [w, b])
print(grads[0])                    # dloss/dw, shape [4, 3]
print(grads[1])                    # dloss/db, shape [3]

Cross-entropy gradient

This was already derived in the previous chapter, so it is not repeated here.

with tf.GradientTape() as tape:
    tape.watch([w,b])
    logits = x@w + b
    loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(tf.one_hot(y, depth = 3), logits, from_logits=True))
grads = tape.gradient(loss, [w,b])
print(grads[0])

Chain Rule

  • Backpropagation works by applying the chain rule layer by layer.

Since I am only learning how to use TensorFlow, I will not reproduce the course's derivation of backpropagation for single-layer and multi-layer perceptrons here; if you are interested, see Andrew Ng's deep learning course. A small numerical check of the chain rule itself follows.
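A minimal sketch for y2 = y1² with y1 = w·x + b, verifying dy2/dw = (dy2/dy1)·(dy1/dw); the scalar values are toy numbers chosen just for this check:

x, w, b = tf.constant(1.), tf.constant(2.), tf.constant(1.)   # toy scalars

with tf.GradientTape(persistent=True) as tape:
    tape.watch([w])
    y1 = w * x + b
    y2 = y1 ** 2
dy2_dy1 = tape.gradient(y2, y1)    # 2 * y1 = 6.0
dy1_dw = tape.gradient(y1, w)      # x = 1.0
dy2_dw = tape.gradient(y2, w)      # 6.0
print(dy2_dy1 * dy1_dw == dy2_dw)  # True: dy2/dw = dy2/dy1 * dy1/dw
del tape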

Visualization

  • TensorBoard (TensorFlow)
  • Visdom (PyTorch)

TensorBoard

  • listen to a log directory (logdir)
  • build a summary instance
  • feed data into the summary instance
# cd to your project directory first; host 0.0.0.0 allows remote access, and picking a custom port avoids conflicts
tensorboard --logdir=./logs --host 0.0.0.0 --port=11021

create_file_writer()

import datetime

current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
log_dir = 'logs/' + current_time
summary_writer = tf.summary.create_file_writer(log_dir)

image

# Using MNIST as an example. tf.summary.image does not build subplot grids for you, so the grid has to be assembled by hand (next section).
sample_img = tf.reshape(sample_img, [1, 28, 28, 1])
with summary_writer.as_default():
    tf.summary.image("Training sample:", sample_img, step=0)

Writing a subplot grid by hand

import io
import matplotlib.pyplot as plt

def plot_to_image(figure):
  """Converts the matplotlib plot specified by 'figure' to a PNG image and
  returns it. The supplied figure is closed and inaccessible after this call."""
  # Save the plot to a PNG in memory.
  buf = io.BytesIO()
  plt.savefig(buf, format='png')
  # Closing the figure prevents it from being displayed directly inside
  # the notebook.
  plt.close(figure)
  buf.seek(0)
  # Convert PNG buffer to TF image
  image = tf.image.decode_png(buf.getvalue(), channels=4)
  # Add the batch dimension
  image = tf.expand_dims(image, 0)
  return image

def image_grid(images):
  """Return a 5x5 grid of the MNIST images as a matplotlib figure."""
  # Create a figure to contain the plot.
  # https://morvanzhou.github.io/tutorials/data-manipulation/plt/4-1-subpot1/
  # (a plt.subplot tutorial, for reference)
  figure = plt.figure(figsize=(10,10))
  for i in range(25):
    # Start next subplot.
    plt.subplot(5, 5, i + 1, title='name')
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(images[i], cmap=plt.cm.binary)
  
  return figure
val_images = x[:25]
val_images = tf.reshape(val_images, [-1, 28, 28, 1])
with summary_writer.as_default():
    val_images = tf.reshape(val_images, [-1, 28, 28])
    figure = image_grid(val_images)
    tf.summary.image('val-images:', plot_to_image(figure), step=step)

scalar

with summary_writer.as_default(): 
    tf.summary.scalar('train-loss', float(loss), step=step) 
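In practice the scalar is usually logged once per step inside the training loop; a minimal sketch, where the loss values are dummies standing in for a real training loss:

for step in range(100):
    loss = 1.0 / (step + 1)                  # placeholder for the real loss
    with summary_writer.as_default():
        tf.summary.scalar('train-loss', float(loss), step=step)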

The above is only a quick tour of TensorBoard; see the TensorBoard documentation for the full details.

A TensorFlow learning site worth noting:
簡單粗暴 TensorFlow 2 (A Concise Handbook of TensorFlow 2)
