DeepDream
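DeepDream is an experiment that visualizes the patterns a neural network has learned.
It forwards an image through the network and then computes the gradient of a chosen layer's activations with respect to the image. The image is then modified to increase those activations, which amplifies the patterns the network sees and produces a dream-like image.
Code implementation
1. Import the required libraries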
import tensorflow as tf
import numpy as np
import matplotlib as mpl
import IPython.display as display
import PIL.Image
from tensorflow.keras.preprocessing import image
2. Download and load the image
# Download an image and read it into a NumPy array.
def download(url, max_dim=None):
    name = url.split('/')[-1]
    image_path = tf.keras.utils.get_file(name, origin=url)
    img = PIL.Image.open(image_path)
    if max_dim:
        # Shrink in place so that neither side exceeds max_dim.
        img.thumbnail((max_dim, max_dim))
    return np.array(img)

# Display an image
def show(img):
    display.display(PIL.Image.fromarray(np.array(img)))

# `url` must be set to the address of the image to download before this runs.
# Downsizing the image makes it easier to work with.
original_img = download(url, max_dim=500)
show(original_img)
display.display(display.HTML('Image cc-by: <a href="https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg">Von.grzanka</a>'))
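Here img.thumbnail((max_dim, max_dim)) shrinks the image in place while keeping its aspect ratio, so that neither side exceeds max_dim. For example, if the original array shape is (800, 1000, 3) and max_dim=500, the resulting shape is (400, 500, 3).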
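The resizing behaviour can be checked without downloading anything. The sketch below uses a blank placeholder image, whose size is an assumption chosen to match the example above, and applies the same thumbnail call.

```python
import numpy as np
import PIL.Image

# Blank placeholder image (an assumption for illustration).
# PIL sizes are given as (width, height).
placeholder = PIL.Image.new("RGB", (1000, 800))
shape_before = np.array(placeholder).shape  # (height, width, channels)

# thumbnail() resizes in place, keeps the aspect ratio, and never enlarges.
placeholder.thumbnail((500, 500))
shape_after = np.array(placeholder).shape

print(shape_before, shape_after)
```

Note that PIL reports sizes as (width, height), while the NumPy array shape is (height, width, channels).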
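3. Load the InceptionV3 model
base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
For background on the transfer learning used here, see the related article on transfer learning with tf.keras.applications in TensorFlow 2.0.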
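4. Change the model outputs to the layers used for feature extraction
List the layers of the InceptionV3 model
for layer in base_model.layers:
    print(layer.name)
Select the layers to extract features from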
# Maximize the activations of these layers
names = ['mixed3', 'mixed5']
layers = [base_model.get_layer(name).output for name in names]
Change the model outputs
# Create the feature extraction model
dream_model = tf.keras.Model(inputs=base_model.input, outputs=layers)
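The same rewiring pattern can be tried on a tiny network that needs no pretrained weights. This is only a sketch: the toy architecture and the layer names conv_a, pool_a and conv_b are made up for illustration and have nothing to do with InceptionV3.

```python
import numpy as np
import tensorflow as tf

# A small, randomly initialised stand-in for base_model (hypothetical layers).
inputs = tf.keras.Input(shape=(None, None, 3))
x = tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu", name="conv_a")(inputs)
x = tf.keras.layers.MaxPooling2D(2, name="pool_a")(x)
x = tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu", name="conv_b")(x)
toy_model = tf.keras.Model(inputs, x)

# Same pattern as above: look layers up by name and reuse their outputs.
toy_names = ["conv_a", "conv_b"]
toy_outputs = [toy_model.get_layer(name).output for name in toy_names]
toy_extractor = tf.keras.Model(inputs=toy_model.input, outputs=toy_outputs)

# One forward pass now returns one activation tensor per selected layer.
activations = toy_extractor(np.zeros((1, 32, 48, 3), dtype=np.float32))
activation_shapes = [tuple(a.shape) for a in activations]
print(activation_shapes)
```

The deeper layer comes out at half the spatial resolution because of the pooling layer in between, which is why the per-layer losses in the next step are averaged rather than summed.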
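5. Define the loss function
The loss is the sum of the activations in the selected layers. The loss is normalized within each layer (by taking the mean) so that larger layers do not contribute more than smaller ones. Normally a loss is a quantity you minimize with gradient descent. In DeepDream, this loss is instead maximized with gradient ascent.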
def calc_loss(img, model):
    # Pass forward the image through the model to retrieve the activations.
    # Converts the image into a batch of size 1.
    img_batch = tf.expand_dims(img, axis=0)
    layer_activations = model(img_batch)
    # A single-output model returns one tensor (batch size 1) rather than a list.
    if len(layer_activations) == 1:
        layer_activations = [layer_activations]
    losses = []
    for act in layer_activations:
        # Taking the mean (not the sum) normalizes for the size of the layer.
        loss = tf.math.reduce_mean(act)
        losses.append(loss)
    return tf.reduce_sum(losses)
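To see why the mean is used per layer, compare it with a plain sum on two activation maps of different sizes. The shapes and the constant value 0.5 below are assumptions picked only to make the arithmetic easy to follow.

```python
import numpy as np

# Two hypothetical activation maps holding the same value everywhere
# but with very different numbers of elements.
small_layer = np.full((1, 8, 8, 16), 0.5, dtype=np.float32)
large_layer = np.full((1, 32, 32, 64), 0.5, dtype=np.float32)

# With a sum, the contribution scales with the number of elements.
sum_contrib = [float(a.sum()) for a in (small_layer, large_layer)]
# With a mean, both layers contribute equally.
mean_contrib = [float(a.mean()) for a in (small_layer, large_layer)]

print("sum :", sum_contrib)
print("mean:", mean_contrib)
```

With a sum, the larger layer would dominate the loss by a factor of 64 here, while the mean gives both layers the same weight.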
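6. Define one gradient-ascent step
Once the loss for the selected layers has been computed, all that remains is to compute its gradient with respect to the image and add that gradient to the original image.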
def train_step(img, step_size):
    with tf.GradientTape() as tape:
        # `GradientTape` only watches `tf.Variable`s by default, so wrap the
        # image in a variable to get gradients relative to `img`.
        # (Calling `tape.watch(img)` on the tensor is an alternative.)
        img = tf.Variable(img)
        loss = calc_loss(img, dream_model)
    # Calculate the gradient of the loss with respect to the pixels of the input image.
    gradients = tape.gradient(loss, img)
    # Normalize the gradients.
    gradients /= tf.math.reduce_std(gradients) + 1e-8
    # In gradient ascent, the "loss" is maximized so that the input image increasingly "excites" the layers.
    # You can update the image by directly adding the gradients (because they're the same shape!)
    img = img + gradients*step_size
    img = tf.clip_by_value(img, -1, 1)
    return loss, img
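The update rule can be exercised on its own with a toy objective instead of the network. In this sketch, toy_loss and toy_ascent_step are hypothetical stand-ins for calc_loss and train_step. The toy objective has a known maximum, which makes it easy to confirm that repeated steps increase the loss.

```python
import tensorflow as tf

def toy_loss(x):
    # Stand-in for calc_loss: a smooth function with its maximum at x = 0.5.
    return -tf.reduce_sum(tf.square(x - 0.5))

def toy_ascent_step(x, step_size):
    with tf.GradientTape() as tape:
        # x is a plain tensor here, so it has to be watched explicitly.
        tape.watch(x)
        loss = toy_loss(x)
    grad = tape.gradient(loss, x)
    # Same normalization as above: divide by the gradient's standard deviation.
    grad /= tf.math.reduce_std(grad) + 1e-8
    # Ascent: add the gradient, then clip to the valid value range.
    x = tf.clip_by_value(x + grad * step_size, -1.0, 1.0)
    return loss, x

x = tf.constant([-0.8, 0.0, 0.9])
first_loss, x = toy_ascent_step(x, 0.01)
for _ in range(20):
    last_loss, x = toy_ascent_step(x, 0.01)

print(float(first_loss), float(last_loss))
```

Dividing by the standard deviation makes the step size largely independent of the raw gradient magnitude, which is presumably why a fixed step_size such as 0.01 is usable across layers and images.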
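7. Train the model
Rescaling the image for display
The image is drawn repeatedly during training, but the previous step clips its pixel values to [-1, 1]. A helper function is therefore needed to map the pixel values back to [0, 255].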
# Map an image from [-1, 1] back to uint8 values in [0, 255]
def deprocess(img):
    img = 255*(img + 1.0)/2.0
    return tf.cast(img, tf.uint8)
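The mapping is easy to check on a handful of values. The function deprocess_np below is a hypothetical NumPy counterpart written for this check. It applies the same formula and, like the cast above, drops the fractional part.

```python
import numpy as np

def deprocess_np(img):
    # Same formula as deprocess: map [-1, 1] to [0, 255].
    img = 255 * (img + 1.0) / 2.0
    # Casting to uint8 truncates the fractional part.
    return img.astype(np.uint8)

pixels = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
converted = deprocess_np(pixels)
print(converted)  # [  0  63 127 191 255]
```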
Define the training function
def run_deep_dream_simple(img, steps=100, step_size=0.01):
    # Convert from uint8 to the range expected by the model.
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    img = tf.convert_to_tensor(img)
    step_size = tf.convert_to_tensor(step_size)
    for step in range(steps):
        loss, img = train_step(img, step_size)
        # Redraw the intermediate result every second step.
        if step % 2 == 0:
            display.clear_output(wait=True)
            show(deprocess(img))
            print("Step {}, loss {}".format(step, loss))
    result = deprocess(img)
    display.clear_output(wait=True)
    show(result)
    return result
Run the training
dream_img = run_deep_dream_simple(img=original_img,
                                  steps=100, step_size=0.01)
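8. Octaves
The result is reasonable, but it has a few problems:
- The output is noisy.
- The image resolution is low.
- The patterns all appear at the same granularity.
One way to address all of these is to apply gradient ascent at several scales. Patterns generated at a smaller scale can then be incorporated into patterns at a larger scale and filled in with additional detail.
To do this, run the gradient-ascent procedure from before, then increase the size of the image (each such scale is called an octave), and repeat the process.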
OCTAVE_SCALE = 1.30
img = tf.constant(np.array(original_img))
base_shape = tf.shape(img)[:-1]
float_base_shape = tf.cast(base_shape, tf.float32)
for n in range(-2, 3):
    new_shape = tf.cast(float_base_shape*(OCTAVE_SCALE**n), tf.int32)
    img = tf.image.resize(img, new_shape).numpy()
    img = run_deep_dream_simple(img=img, steps=50, step_size=0.01)
display.clear_output(wait=True)
img = tf.image.resize(img, base_shape)
img = tf.image.convert_image_dtype(img/255.0, dtype=tf.uint8)
show(img)
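To get a feel for the sizes involved, the octave shapes can be computed separately from the dreaming loop. The base shape (400, 500) below is an assumed example, and the arithmetic is done in plain Python rather than TensorFlow, so treat it as an approximation of what the loop above produces.

```python
OCTAVE_SCALE = 1.30
base_shape = (400, 500)  # assumed (height, width) of the input image

octave_shapes = []
for n in range(-2, 3):
    scale = OCTAVE_SCALE ** n
    # int() truncates the fractional part, as the integer cast above does.
    octave_shapes.append(tuple(int(side * scale) for side in base_shape))

for n, shape in zip(range(-2, 3), octave_shapes):
    print("octave", n, "->", shape)
```

Negative octaves shrink the image, octave 0 is the original size, and positive octaves enlarge it, so the loop works from coarse to fine.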
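9. Tiled computation
One thing to keep in mind is that as the image grows, so do the time and memory needed to compute the gradients. The octave approach from the previous section will not work on very large images.
To avoid this problem, split the image into rectangular tiles and compute the gradient for each tile separately. This approach is called tiled computation.
Applying a random shift to the image before each tiled computation prevents tile seams from appearing.
Random shift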
def random_roll(img, maxroll):
    # Randomly shift the image to avoid tiled boundaries.
    shift = tf.random.uniform(shape=[2], minval=-maxroll, maxval=maxroll, dtype=tf.int32)
    shift_down, shift_right = shift[0],shift[1]
    img_rolled = tf.roll(tf.roll(img, shift_right, axis=1), shift_down, axis=0)
    return shift_down, shift_right, img_rolled
The effect of the random shift looks like this:
shift_down, shift_right, img_rolled = random_roll(np.array(original_img), 512)
show(img_rolled)
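A roll only moves pixels around with wrap-around, so it can be undone exactly by rolling back with the negated shifts. The NumPy sketch below illustrates this on a small made-up array. np.roll stands in for tf.roll, and the shift range is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
img = np.arange(4 * 5 * 3).reshape(4, 5, 3)  # tiny stand-in image

# Hypothetical shift amounts; random_roll draws them with tf.random.uniform.
shift_down, shift_right = rng.integers(-3, 3, size=2)

# Roll along the width (axis 1), then along the height (axis 0).
rolled = np.roll(np.roll(img, shift_right, axis=1), shift_down, axis=0)
# Rolling by the negated shifts restores the original layout.
restored = np.roll(np.roll(rolled, -shift_right, axis=1), -shift_down, axis=0)

print(np.array_equal(restored, img))
```

This is the property the tiled gradient computation relies on when it rolls the accumulated gradients back into alignment with the unshifted image.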
Define the tiled gradient computation
# Note: this redefines train_step from step 6 with a different signature;
# it returns only the normalized gradients and does not update the image.
def train_step(img, tile_size=512):
    shift_down, shift_right, img_rolled = random_roll(img, tile_size)
    # Initialize the image gradients to zero.
    gradients = tf.zeros_like(img_rolled)
    # Skip the last tile, unless there's only one tile.
    # xs holds tile start offsets along axis 0 (rows), ys along axis 1 (columns).
    xs = tf.range(0, img_rolled.shape[0], tile_size)[:-1]
    if not tf.cast(len(xs), bool):
        xs = tf.constant([0])
    ys = tf.range(0, img_rolled.shape[1], tile_size)[:-1]
    if not tf.cast(len(ys), bool):
        ys = tf.constant([0])
    for x in xs:
        for y in ys:
            # Calculate the gradients for this tile.
            with tf.GradientTape() as tape:
                # This needs gradients relative to `img_rolled`.
                # `GradientTape` only watches `tf.Variable`s by default.
                tape.watch(img_rolled)
                # Extract a tile out of the image.
                img_tile = img_rolled[x:x+tile_size, y:y+tile_size]
                loss = calc_loss(img_tile, dream_model)
            # Update the image gradients for this tile.
            gradients = gradients + tape.gradient(loss, img_rolled)
    # Undo the random shift applied to the image and its gradients.
    gradients = tf.roll(tf.roll(gradients, -shift_right, axis=1), -shift_down, axis=0)
    # Normalize the gradients.
    gradients /= tf.math.reduce_std(gradients) + 1e-8
    return gradients
In this function, the lines
# Skip the last tile, unless there's only one tile.
xs = tf.range(0, img_rolled.shape[0], tile_size)[:-1]
if not tf.cast(len(xs), bool):
    xs = tf.constant([0])
ys = tf.range(0, img_rolled.shape[1], tile_size)[:-1]
if not tf.cast(len(ys), bool):
    ys = tf.constant([0])
split the image's height and width into segments, so that the loops that follow can compute the gradient for each rectangular tile separately.
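The start offsets this produces can be reproduced with plain Python ranges. The helper tile_starts below is a hypothetical name introduced for illustration. It mirrors the slicing and the single-tile fallback, using lists instead of tensors.

```python
def tile_starts(length, tile_size=512):
    # Mirrors tf.range(0, length, tile_size)[:-1]: drop the last start offset.
    starts = list(range(0, length, tile_size))[:-1]
    # Fallback: if nothing is left, use a single tile starting at 0.
    return starts if starts else [0]

print(tile_starts(300))   # [0]       (image smaller than one tile)
print(tile_starts(1200))  # [0, 512]
print(tile_starts(1536))  # [0, 512]
```

As the last two cases suggest, the region beyond the final kept tile receives no gradient in a given step. Because the image is rolled by a new random offset before every step, a different part of the image falls into that region each time.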
Define the training function
def run_deep_dream_with_octaves(img, steps_per_octave=100, step_size=0.01,
                                octaves=range(-2,3), octave_scale=1.3):
    base_shape = tf.shape(img)
    img = tf.keras.preprocessing.image.img_to_array(img)
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    img = tf.convert_to_tensor(img)
    for octave in octaves:
        # Scale the image based on the octave
        new_size = tf.cast(tf.convert_to_tensor(base_shape[:-1]), tf.float32)*(octave_scale**octave)
        img = tf.image.resize(img, tf.cast(new_size, tf.int32))
        for step in range(steps_per_octave):
            # Tiled gradients from the train_step defined in this section.
            gradients = train_step(img)
            img = img + gradients*step_size
            img = tf.clip_by_value(img, -1, 1)
            if step % 10 == 0:
                display.clear_output(wait=True)
                show(deprocess(img))
                print("Octave {}, Step {}".format(octave, step))
    result = deprocess(img)
    return result
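Train the model and show the result
img = run_deep_dream_with_octaves(img=original_img, step_size=0.01)
display.clear_output(wait=True)
# base_shape (height, width) was defined in step 8; it is recomputed here so this part runs on its own.
base_shape = tf.shape(original_img)[:-1]
img = tf.image.resize(img, base_shape)
img = tf.image.convert_image_dtype(img/255.0, dtype=tf.uint8)
show(img)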