Introduction
Image style transfer combines the content of a content image with the style of one or more style images to generate new, interesting images.
Below are the results of transferring the styles of several artworks onto a single content image.
We implement image style transfer in both TensorFlow and Keras, relying mainly on convolutional neural networks (CNNs) from deep learning.
Preparation
Install the packages:
pip install numpy scipy tensorflow keras
Then prepare a few style images and one content image.
How it works
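To combine the style of the style image with the content of the content image, the generated image should be as close as possible to the content image in content, and as close as possible to the style image in style. We therefore define a content loss and a style loss, and take their weighted sum as the total loss.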
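The steps are:
- Generate a random image
- In each iteration, adjust the image's pixel values with a gradient-based optimizer on the total loss (the network weights stay fixed)
- After many iterations, the optimized image is the result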
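The loop above can be sketched in a few lines of NumPy. This is a toy, not the style-transfer code. The loss is a hypothetical squared distance to a fixed target array that stands in for the real content + style loss, and its gradient is written by hand. The point is that the variables being optimized are the pixels of the image, while whatever defines the loss stays fixed.

```python
import numpy as np


def optimize_pixels(loss_and_grad, shape, steps=200, lr=0.1, seed=0):
    """Start from a random image and update its pixels by plain gradient descent.

    loss_and_grad(x) must return (loss, gradient of the loss w.r.t. x).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-20.0, 20.0, size=shape)  # random initial image
    for _ in range(steps):
        _, grad = loss_and_grad(x)
        x = x - lr * grad                      # adjust the pixel values
    return x


# Hypothetical stand-in loss: squared distance to a fixed target "image"
target = np.full((4, 4, 3), 5.0)


def squared_distance(x):
    diff = x - target
    return np.sum(diff ** 2), 2.0 * diff


result = optimize_pixels(squared_distance, target.shape)
print(np.abs(result - target).max())
```

In the real method the gradient comes from backpropagating the total loss through the frozen VGG19 network down to the input pixels.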
Content loss
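Whether two images are similar in content cannot be judged by a simple pixel-by-pixel comparison. CNNs are able to abstract and understand images, so the output of a convolutional layer can serve as a representation of an image's content.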
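Take VGG19 as an example. It consists of several convolutional and pooling layers, followed by fully connected layers at the end. Here we use the output of conv4_2 as the content representation, and define the content loss as the squared difference between the conv4_2 features of the generated image and those of the content image.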
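Written out, the content loss used by the TensorFlow code in this post is $L_{content} = \frac{1}{4NM}\sum_{i,j}(F_{ij} - P_{ij})^2$. Here P and F are the conv4_2 features of the content image and of the generated image, N is the number of feature maps, and M = height × width of each map. The original paper uses a factor of 1/2 instead of 1/(4NM). That only rescales the content term, which shifts its balance against the style term.

A minimal NumPy sketch follows. The small hypothetical arrays stand in for real VGG activations.

```python
import numpy as np


def content_loss(p, x):
    """p, x: activations of shape (1, H, W, N) from the same layer (e.g. conv4_2)
    for the content image and the generated image."""
    n = p.shape[3]               # number of feature maps (channels)
    m = p.shape[1] * p.shape[2]  # size of each feature map (H * W)
    return np.sum((x - p) ** 2) / (4.0 * n * m)


p = np.zeros((1, 2, 2, 3))  # hypothetical content features
x = np.ones((1, 2, 2, 3))   # hypothetical generated-image features
print(content_loss(p, x))   # 12 / (4 * 3 * 4) = 0.25
```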
Style loss
Style is hard to define precisely: it may be brushstrokes, texture, structure, composition, use of color, and so on.
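Here we use the correlations between the feature maps of a convolutional layer as the image's style representation. Take conv1_1 as an example:
- It contains 64 feature maps, i.e. the depth of the image at that layer, or the number of channels
- Each feature map is one interpretation of the previous layer's output; think of them as 64 people each interpreting the same painting in their own way
- These people may prefer different styles, such as Impressionism, Modernism, Surrealism, or Expressionism
- For an image in one style, some of them may appreciate it while others dislike it
- For an image in another style, the first group may dislike it while the others appreciate it
- The differences among these 64 interpretations can be represented by the correlations between feature maps, which we compute with the Gram matrix
- Different styles lead to different correlation patterns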
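For 64 feature maps, the Gram matrix is 64×64. The entry in row i, column j is the correlation between feature map i and feature map j, computed as the inner product of the two flattened feature maps.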
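In matrix form, let row i of F (shape N×M) be feature map i flattened to length M = H×W. Then G = F Fᵀ, i.e. $G_{ij} = \sum_k F_{ik} F_{jk}$. The NumPy sketch below uses random hypothetical activations shaped like conv1_1's 64 feature maps.

```python
import numpy as np


def gram_matrix(features):
    """features: activations of shape (1, H, W, N).

    Returns the N x N Gram matrix G[i, j] = sum_k F[i, k] * F[j, k],
    where row i of F is feature map i flattened to length M = H * W.
    """
    n = features.shape[3]
    f = features.reshape(-1, n).T  # shape (N, M): one row per feature map
    return f @ f.T


rng = np.random.default_rng(0)
feats = rng.standard_normal((1, 8, 8, 64))  # hypothetical activations, 64 feature maps
G = gram_matrix(feats)
print(G.shape)  # (64, 64)
```

The Gram matrix is symmetric and discards where in the image a feature occurs. It keeps only which features tend to fire together, which is why it serves as a style rather than a content descriptor.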
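The style loss is a weighted sum, over several convolutional layers, of the differences between the style representations (Gram matrices) of the generated image and the style image. Here we use five convolutional layers, conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1, to compute the style loss; different layer weights produce different transfer results.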
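For each layer l, the TensorFlow code in this post uses $E_l = \frac{1}{4N_l^2M_l^2}\sum_{i,j}(G^l_{ij} - A^l_{ij})^2$. Here G and A are the Gram matrices of the generated image and the style image. The style loss is $L_{style} = \sum_l w_l E_l$.

The NumPy sketch below mirrors that, using the layer weights from STYLE_LAYERS in the TensorFlow code. The activations are random hypothetical arrays with the right channel counts, not real VGG outputs.

```python
import numpy as np


def gram_matrix(features):
    n = features.shape[3]
    f = features.reshape(-1, n).T  # (N, M)
    return f @ f.T


def layer_style_loss(a, x):
    """a, x: (1, H, W, N) activations of the style image and the generated image."""
    n = a.shape[3]
    m = a.shape[1] * a.shape[2]
    A = gram_matrix(a)
    G = gram_matrix(x)
    return np.sum((G - A) ** 2) / (4.0 * n ** 2 * m ** 2)


def style_loss(style_feats, gen_feats, layer_weights):
    """Weighted sum of per-layer losses; the dicts map layer name -> activations."""
    return sum(w * layer_style_loss(style_feats[name], gen_feats[name])
               for name, w in layer_weights)


STYLE_LAYERS = [('conv1_1', 0.5), ('conv2_1', 1.0), ('conv3_1', 1.5), ('conv4_1', 3.0), ('conv5_1', 4.0)]
shapes = {'conv1_1': (1, 16, 16, 64), 'conv2_1': (1, 8, 8, 128), 'conv3_1': (1, 4, 4, 256),
          'conv4_1': (1, 2, 2, 512), 'conv5_1': (1, 1, 1, 512)}  # hypothetical sizes
rng = np.random.default_rng(0)
style_feats = {k: rng.standard_normal(s) for k, s in shapes.items()}
print(style_loss(style_feats, style_feats, STYLE_LAYERS))  # 0.0
```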
Total loss
The total loss is the weighted sum of the content loss and the style loss; different weights produce different transfer results.
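In the notation of the original paper, α weights the content term and β weights the style term. Note that the TensorFlow code below names its constants the other way round. BETA = 5 multiplies the content loss and ALPHA = 100 multiplies the style loss, so the style term gets 20 times the weight of the content term.

```latex
\mathcal{L}_{\text{total}} = \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{style}}
```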
TensorFlow implementation
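Load the libraries
```python
# -*- coding: utf-8 -*-
# The code uses the TensorFlow 1.x graph API; on TensorFlow 2.x it runs through tf.compat.v1
import tensorflow.compat.v1 as tf
import numpy as np
import scipy.io
import scipy.misc  # imread/imresize/imsave used below require an older SciPy (< 1.2) with Pillow
import os
import time

tf.disable_v2_behavior()  # graph mode, so tf.Session and tf.train.AdamOptimizer work as in TF 1.x


def the_current_time():
    # Print a timestamp to track progress during training
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(time.time()))))
```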
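Define some variables
```python
CONTENT_IMG = 'content.jpg'
STYLE_IMG = 'style5.jpg'
OUTPUT_DIR = 'neural_style_transfer_tensorflow/'
if not os.path.exists(OUTPUT_DIR):
    os.mkdir(OUTPUT_DIR)

IMAGE_W = 800
IMAGE_H = 600
COLOR_C = 3

NOISE_RATIO = 0.7  # share of random noise in the initial image
BETA = 5           # weight of the content loss
ALPHA = 100        # weight of the style loss
# Pretrained VGG19 weights in MatConvNet .mat format
VGG_MODEL = 'imagenet-vgg-verydeep-19.mat'
# Per-channel (RGB) mean pixel values, subtracted before feeding images to VGG
MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))
```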
Load the VGG19 model
```python
def load_vgg_model(path):
    '''
    Details of the VGG19 model:
    - 0 is conv1_1 (3, 3, 3, 64)
    - 1 is relu
    - 2 is conv1_2 (3, 3, 64, 64)
    - 3 is relu
    - 4 is maxpool
    - 5 is conv2_1 (3, 3, 64, 128)
    - 6 is relu
    - 7 is conv2_2 (3, 3, 128, 128)
    - 8 is relu
    - 9 is maxpool
    - 10 is conv3_1 (3, 3, 128, 256)
    - 11 is relu
    - 12 is conv3_2 (3, 3, 256, 256)
    - 13 is relu
    - 14 is conv3_3 (3, 3, 256, 256)
    - 15 is relu
    - 16 is conv3_4 (3, 3, 256, 256)
    - 17 is relu
    - 18 is maxpool
    - 19 is conv4_1 (3, 3, 256, 512)
    - 20 is relu
    - 21 is conv4_2 (3, 3, 512, 512)
    - 22 is relu
    - 23 is conv4_3 (3, 3, 512, 512)
    - 24 is relu
    - 25 is conv4_4 (3, 3, 512, 512)
    - 26 is relu
    - 27 is maxpool
    - 28 is conv5_1 (3, 3, 512, 512)
    - 29 is relu
    - 30 is conv5_2 (3, 3, 512, 512)
    - 31 is relu
    - 32 is conv5_3 (3, 3, 512, 512)
    - 33 is relu
    - 34 is conv5_4 (3, 3, 512, 512)
    - 35 is relu
    - 36 is maxpool
    - 37 is fullyconnected (7, 7, 512, 4096)
    - 38 is relu
    - 39 is fullyconnected (1, 1, 4096, 4096)
    - 40 is relu
    - 41 is fullyconnected (1, 1, 4096, 1000)
    - 42 is softmax
    '''
    vgg = scipy.io.loadmat(path)
    vgg_layers = vgg['layers']

    def _weights(layer, expected_layer_name):
        # Read the pretrained weights W and biases b of one layer from the .mat structure
        W = vgg_layers[0][layer][0][0][2][0][0]
        b = vgg_layers[0][layer][0][0][2][0][1]
        layer_name = vgg_layers[0][layer][0][0][0][0]
        assert layer_name == expected_layer_name
        return W, b

    def _conv2d_relu(prev_layer, layer, layer_name):
        # Convolution with fixed (constant) pretrained weights, followed by ReLU
        W, b = _weights(layer, layer_name)
        W = tf.constant(W)
        b = tf.constant(np.reshape(b, (b.size)))
        return tf.nn.relu(tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b)

    def _avgpool(prev_layer):
        # Average pooling is used in place of VGG's max pooling
        return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    graph = {}
    # The input image is the only tf.Variable, so its pixels are what the optimizer updates
    graph['input'] = tf.Variable(np.zeros((1, IMAGE_H, IMAGE_W, COLOR_C)), dtype='float32')
    graph['conv1_1'] = _conv2d_relu(graph['input'], 0, 'conv1_1')
    graph['conv1_2'] = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
    graph['avgpool1'] = _avgpool(graph['conv1_2'])
    graph['conv2_1'] = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
    graph['conv2_2'] = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
    graph['avgpool2'] = _avgpool(graph['conv2_2'])
    graph['conv3_1'] = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
    graph['conv3_2'] = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
    graph['conv3_3'] = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
    graph['conv3_4'] = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
    graph['avgpool3'] = _avgpool(graph['conv3_4'])
    graph['conv4_1'] = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
    graph['conv4_2'] = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
    graph['conv4_3'] = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
    graph['conv4_4'] = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
    graph['avgpool4'] = _avgpool(graph['conv4_4'])
    graph['conv5_1'] = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
    graph['conv5_2'] = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
    graph['conv5_3'] = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
    graph['conv5_4'] = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
    graph['avgpool5'] = _avgpool(graph['conv5_4'])
    return graph
```
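The model above swaps VGG19's max pooling for average pooling. A Neural Algorithm of Artistic Style reports that this improves gradient flow and gives slightly more appealing results. Below is a minimal NumPy sketch of 2×2, stride-2 average pooling. It assumes even height and width; the TensorFlow code uses 'SAME' padding, which also handles odd sizes.

```python
import numpy as np


def avg_pool_2x2(x):
    """x: (1, H, W, C) with even H and W. Averages each non-overlapping 2x2 window."""
    b, h, w, c = x.shape
    # Split H and W into (H/2, 2) and (W/2, 2), then average over the two size-2 axes
    return x.reshape(b, h // 2, 2, w // 2, 2, c).mean(axis=(2, 4))


x = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
print(avg_pool_2x2(x)[0, :, :, 0])  # [[ 2.5  4.5] [10.5 12.5]]
```

Unlike max pooling, every input pixel in a window receives a share of the gradient, not just the maximum.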
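Content loss function
```python
def content_loss_func(sess, model):
    def _content_loss(p, x):
        # p: conv4_2 features of the content image (a fixed NumPy array)
        # x: conv4_2 features of the image being generated (a tensor)
        N = p.shape[3]               # number of feature maps
        M = p.shape[1] * p.shape[2]  # size of each feature map (height * width)
        # 1.0 keeps the division floating-point under Python 2 as well
        return (1.0 / (4 * N * M)) * tf.reduce_sum(tf.pow(x - p, 2))
    # Called while model['input'] holds the content image, so sess.run returns its features
    return _content_loss(sess.run(model['conv4_2']), model['conv4_2'])
```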
Style loss function
```python
STYLE_LAYERS = [('conv1_1', 0.5), ('conv2_1', 1.0), ('conv3_1', 1.5), ('conv4_1', 3.0), ('conv5_1', 4.0)]


def style_loss_func(sess, model):
    def _gram_matrix(F, N, M):
        # Flatten to (M, N), one column per feature map; the product is N x N
        Ft = tf.reshape(F, (M, N))
        return tf.matmul(tf.transpose(Ft), Ft)

    def _style_loss(a, x):
        # a: features of the style image (fixed), x: features of the generated image
        N = a.shape[3]
        M = a.shape[1] * a.shape[2]
        A = _gram_matrix(a, N, M)
        G = _gram_matrix(x, N, M)
        return (1.0 / (4 * N ** 2 * M ** 2)) * tf.reduce_sum(tf.pow(G - A, 2))

    # Called while model['input'] holds the style image; weighted sum over STYLE_LAYERS
    return sum([_style_loss(sess.run(model[layer_name]), model[layer_name]) * w for layer_name, w in STYLE_LAYERS])
```
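The per-layer normalizer 1/(4N²M²) depends on the number of feature maps N and the size M of each map, and both vary a lot across layers. The sketch below computes (N, M) for the five style layers at the 800×600 input used here. It assumes two things: VGG19's 'SAME'-padded convolutions keep the spatial size, and each stride-2 pooling with 'SAME' padding rounds the size up. The channel counts come from the layer list in load_vgg_model.

```python
import math

IMAGE_H, IMAGE_W = 600, 800

# (layer name, number of feature maps N, number of 2x2 poolings before the layer)
STYLE_LAYER_INFO = [
    ('conv1_1', 64, 0),
    ('conv2_1', 128, 1),
    ('conv3_1', 256, 2),
    ('conv4_1', 512, 3),
    ('conv5_1', 512, 4),
]


def feature_sizes(h, w):
    """Return {layer: (N, M)}, where M = height * width of each feature map."""
    sizes = {}
    for name, n, pools in STYLE_LAYER_INFO:
        fh, fw = h, w
        for _ in range(pools):
            # stride-2 pooling with 'SAME' padding rounds the size up
            fh, fw = math.ceil(fh / 2), math.ceil(fw / 2)
        sizes[name] = (n, fh * fw)
    return sizes


sizes = feature_sizes(IMAGE_H, IMAGE_W)
for name, (n, m) in sizes.items():
    print(name, 'N =', n, 'M =', m, '4*N^2*M^2 =', 4 * n ** 2 * m ** 2)
```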
Generate a random initial image
```python
def generate_noise_image(content_image, noise_ratio=NOISE_RATIO):
    # Blend uniform noise in [-20, 20] with the content image
    noise_image = np.random.uniform(-20, 20, (1, IMAGE_H, IMAGE_W, COLOR_C)).astype('float32')
    input_image = noise_image * noise_ratio + content_image * (1 - noise_ratio)
    return input_image
```
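Load an image
```python
def load_image(path):
    image = scipy.misc.imread(path)                         # (H, W, 3) RGB pixels
    image = scipy.misc.imresize(image, (IMAGE_H, IMAGE_W))  # resize to the working size
    image = np.reshape(image, ((1, ) + image.shape))        # add a batch dimension
    image = image - MEAN_VALUES                             # subtract the per-channel mean
    return image
```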
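Save an image
```python
def save_image(path, image):
    image = image + MEAN_VALUES                     # undo the mean subtraction
    image = image[0]                                # drop the batch dimension
    image = np.clip(image, 0, 255).astype('uint8')  # clip to the valid pixel range
    scipy.misc.imsave(path, image)
```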
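Apart from resizing and clipping, load_image and save_image are inverses. One subtracts MEAN_VALUES and adds a batch dimension; the other undoes both and clips to [0, 255]. The NumPy sketch below checks that round trip in memory, so it needs neither files nor scipy.misc. It rounds before casting to uint8, whereas save_image truncates, so save_image's output can occasionally differ by one intensity level.

```python
import numpy as np

MEAN_VALUES = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))


def preprocess(rgb):
    """(H, W, 3) uint8 RGB -> (1, H, W, 3) float, mean-subtracted (as in load_image)."""
    return rgb.reshape((1,) + rgb.shape).astype(np.float64) - MEAN_VALUES


def postprocess(batch):
    """Inverse of preprocess: add the mean back, drop the batch axis, clip, cast."""
    return np.clip(np.rint(batch + MEAN_VALUES), 0, 255)[0].astype(np.uint8)


img = np.random.default_rng(0).integers(0, 256, size=(6, 8, 3)).astype(np.uint8)
restored = postprocess(preprocess(img))
print(np.array_equal(restored, img))  # True
```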
Call the functions above and train the model
```python
the_current_time()

with tf.Session() as sess:
    content_image = load_image(CONTENT_IMG)
    style_image = load_image(STYLE_IMG)
    model = load_vgg_model(VGG_MODEL)

    input_image = generate_noise_image(content_image)
    sess.run(tf.global_variables_initializer())

    # Feed the content image and build the content loss from its conv4_2 features
    sess.run(model['input'].assign(content_image))
    content_loss = content_loss_func(sess, model)

    # Feed the style image and build the style loss from its Gram matrices
    sess.run(model['input'].assign(style_image))
    style_loss = style_loss_func(sess, model)

    total_loss = BETA * content_loss + ALPHA * style_loss
    # model['input'] is the only variable, so only the image pixels are optimized
    optimizer = tf.train.AdamOptimizer(2.0)
    train = optimizer.minimize(total_loss)

    # Initialize all variables again (including Adam's slots), then start from the noisy image
    sess.run(tf.global_variables_initializer())
    sess.run(model['input'].assign(input_image))

    ITERATIONS = 2000
    for i in range(ITERATIONS):
        sess.run(train)
        if i % 100 == 0:
            output_image = sess.run(model['input'])
            the_current_time()
            print('Iteration %d' % i)
            print('Cost: ', sess.run(total_loss))
            save_image(os.path.join(OUTPUT_DIR, 'output_%d.jpg' % i), output_image)
```
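tf.train.AdamOptimizer(2.0) above updates every pixel with Adam at learning rate 2.0. The following NumPy sketch shows the textbook Adam update applied element-wise to an image. beta1 = 0.9 and beta2 = 0.999 match TensorFlow's defaults. TensorFlow's own implementation places epsilon slightly differently, so the numbers will not match it exactly. The quadratic loss is a hypothetical stand-in for the style-transfer loss.

```python
import numpy as np


def adam_optimize_pixels(loss_and_grad, x0, lr=2.0, steps=1000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Textbook Adam, applied independently to every pixel of x0."""
    x = x0.astype(np.float64).copy()
    m = np.zeros_like(x)  # running mean of the gradient
    v = np.zeros_like(x)  # running mean of the squared gradient
    for t in range(1, steps + 1):
        _, g = loss_and_grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x


target = np.zeros((4, 4, 3))  # hypothetical target image


def squared_distance(x):
    d = x - target
    return np.sum(d ** 2), 2.0 * d


x0 = np.random.default_rng(0).uniform(-20.0, 20.0, size=target.shape)
x_opt = adam_optimize_pixels(squared_distance, x0)
print(squared_distance(x0)[0], squared_distance(x_opt)[0])
```

Because Adam divides by a running RMS of each pixel's gradient, its step size is roughly lr pixel-value units early on. That makes a learning rate as large as 2.0 sensible for pixels in the 0 to 255 range.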
Running on a GPU, 2000 iterations took about 5 minutes. The result looks like this:
Compared with the original image:
Keras implementation
Keras provides an official image style transfer example:
https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
The code introduces a total variation loss (total variation regularization), which reportedly makes the generated image smoother.
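The idea behind total variation regularization is to penalize differences between neighboring pixels, so noisy, high-frequency images cost more than smooth ones. The NumPy sketch below shows one common discrete form, the sum of squared vertical and horizontal differences, for illustration only. The exact formula in the Keras example, including any exponent or normalization, may differ.

```python
import numpy as np


def total_variation_loss(x):
    """x: image batch of shape (1, H, W, C).

    Sum of squared differences between vertically and horizontally adjacent
    pixels; a smooth image gives a small value, a noisy one a large value.
    """
    dh = x[:, 1:, :, :] - x[:, :-1, :, :]  # vertically adjacent pixels
    dw = x[:, :, 1:, :] - x[:, :, :-1, :]  # horizontally adjacent pixels
    return float(np.sum(dh ** 2) + np.sum(dw ** 2))


flat = np.full((1, 4, 4, 3), 7.0)                                 # constant image
checker = np.array([[0.0, 1.0], [1.0, 0.0]]).reshape(1, 2, 2, 1)  # 2x2 checkerboard
print(total_variation_loss(flat), total_variation_loss(checker))   # 0.0 4.0
```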
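- Keras is a higher-level wrapper than TensorFlow. Assembling existing building blocks is more convenient, but building something custom is more cumbersome
- A total variation regularization term is added, computed on the generated image
- conv5_2 is used to compute the content loss
- The content image is used as the starting point instead of a randomly generated image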
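Usage:
python neural_style_transfer.py path_to_your_base_image.jpg path_to_your_reference.jpg prefix_for_results
Options:
- --iter: number of iterations, default 10
- --content_weight: weight of the content loss, default 0.025
- --style_weight: weight of the style loss, default 1.0
- --tv_weight: weight of the total variation regularization, default 1.0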
Create a folder named neural_style_transfer_keras, then run the script (saved here as main_keras.py):
python main_keras.py content.jpg style5.jpg neural_style_transfer_keras/output
Here is the generated image after 10 iterations, which took about 1 minute:
References
- A Neural Algorithm of Artistic Style: https://arxiv.org/abs/1508.06576
- TensorFlow Implementation of "A Neural Algorithm of Artistic Style": http://www.chioka.in/tensorflow-implementation-neural-algorithm-of-artistic-style
- A Brief History of Image Style Transfer (圖像風格遷移簡史): https://zhuanlan.zhihu.com/p/26746283
- 【啄米日常】Image Style Transfer (圖像風格轉移): https://zhuanlan.zhihu.com/p/23479658