Deep Convolutional Generative Adversarial Networks
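The basic idea behind GANs is that they draw samples from a simple, easy-to-sample distribution, such as a uniform or normal distribution, and transform them into samples that appear to match the distribution of some dataset. Matching a 2D Gaussian distribution gets the point across, but it is not especially exciting.
Here we demonstrate how to use GANs to generate photorealistic images. We base our models on the deep convolutional GAN (DCGAN). We borrow the convolutional architecture that has proven so successful for discriminative computer vision problems and show how, via GANs, it can be leveraged to generate photorealistic images.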
from mxnet import gluon, init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()
1. The Pokemon Dataset
The dataset we will use is a collection of Pokemon sprites obtained from pokemondb. First we download, extract, and load this dataset.
#@save
d2l.DATA_HUB['pokemon'] = (d2l.DATA_URL + 'pokemon.zip',
                           'c065c0e2593b8b161a2d7873e42418bf6a21106c')

data_dir = d2l.download_extract('pokemon')
pokemon = gluon.data.vision.datasets.ImageFolderDataset(data_dir)
Downloading …/data/pokemon.zip from
http://d2l-data.s3-accelerate.amazonaws.com/pokemon.zip…
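We resize each image to 64×64. The ToTensor transformation projects pixel values into [0,1], while our generator will use the tanh function to produce outputs in [-1,1]. Therefore we normalize the data with a mean of 0.5 and a standard deviation of 0.5 to match the value range.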
batch_size = 256
transformer = gluon.data.vision.transforms.Compose([
    gluon.data.vision.transforms.Resize(64),
    gluon.data.vision.transforms.ToTensor(),
    gluon.data.vision.transforms.Normalize(0.5, 0.5)])
data_iter = gluon.data.DataLoader(
    pokemon.transform_first(transformer), batch_size=batch_size,
    shuffle=True, num_workers=d2l.get_dataloader_workers())
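The value ranges can be checked by hand. The sketch below is plain Python rather than the Gluon transform itself, and the helper names `normalize` and `denormalize` are made up for illustration. It assumes that normalizing with mean 0.5 and standard deviation 0.5 computes (x - 0.5) / 0.5 per pixel, and it also shows the inverse mapping x / 2 + 0.5, which brings values in [-1,1] back to [0,1] for display.

```python
def normalize(x, mean=0.5, std=0.5):
    # Assumed per-pixel behaviour of normalization: (x - mean) / std
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    # Inverse mapping; with mean = std = 0.5 this is y / 2 + 0.5
    return y * std + mean


pixels = [0.0, 0.25, 0.5, 1.0]               # values in [0, 1], as after ToTensor
scaled = [normalize(p) for p in pixels]      # values in [-1, 1], the range of tanh
restored = [denormalize(s) for s in scaled]  # back in [0, 1]

print(scaled)    # [-1.0, -0.5, 0.0, 1.0]
print(restored)  # [0.0, 0.25, 0.5, 1.0]
```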
Let's visualize the first 20 images.
d2l.set_figsize((4, 4))
for X, y in data_iter:
    # Move channels last and rescale from [-1, 1] back to [0, 1] for display
    imgs = X[0:20,:,:,:].transpose(0, 2, 3, 1)/2+0.5
    d2l.show_images(imgs, num_rows=4, num_cols=5)
    break
2. The Generator
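The generator needs to map the noise variable z∈R^d, a length-d vector, to an RGB image with width and height 64×64. It is a fully convolutional network that uses transposed convolution layers to enlarge the input size. The basic block of the generator contains a transposed convolution layer followed by batch normalization and a ReLU activation.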
class G_block(nn.Block):
    def __init__(self, channels, kernel_size=4,
                 strides=2, padding=1, **kwargs):
        super(G_block, self).__init__(**kwargs)
        self.conv2d_trans = nn.Conv2DTranspose(
            channels, kernel_size, strides, padding, use_bias=False)
        self.batch_norm = nn.BatchNorm()
        self.activation = nn.Activation('relu')

    def forward(self, X):
        return self.activation(self.batch_norm(self.conv2d_trans(X)))

x = np.zeros((2, 3, 16, 16))
g_blk = G_block(20)
g_blk.initialize()
g_blk(x).shape
(2, 20, 32, 32)
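If we change the transposed convolution layer to a 4×4 kernel, 1×1 strides, and zero padding, then for an input of size 1×1 the output width and height each increase by 3.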
x = np.zeros((2, 3, 1, 1))
g_blk = G_block(20, strides=1, padding=0)
g_blk.initialize()
g_blk(x).shape
(2, 20, 4, 4)
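Both output shapes follow from the usual size formula for a transposed convolution. The worked arithmetic below is a sketch that assumes no output padding and a dilation of 1, with n_in the input width or height, k the kernel size, s the stride, and p the padding (symbols introduced here for illustration).

```latex
\begin{align}
n_{\text{out}} &= (n_{\text{in}} - 1)\,s - 2p + k \\
\text{default block } (k=4,\ s=2,\ p=1):\quad & (16 - 1)\cdot 2 - 2\cdot 1 + 4 = 32 \\
\text{modified block } (k=4,\ s=1,\ p=0):\quad & (1 - 1)\cdot 1 - 2\cdot 0 + 4 = 4
\end{align}
```

With the default hyperparameters the formula reduces to n_out = 2 n_in, so each default block doubles the width and height.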
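The generator consists of four basic blocks that increase the input's width and height from 1 to 32. At the same time, it first projects the latent variable into 64×8 channels and then halves the channels each time. Finally, a transposed convolution layer generates the output: it doubles the width and height once more to match the desired 64×64 shape and reduces the channel size to 3. The tanh activation function is applied to project output values into the (-1,1) range.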
n_G = 64
net_G = nn.Sequential()
net_G.add(G_block(n_G*8, strides=1, padding=0),  # output: (64*8, 4, 4)
          G_block(n_G*4),  # output: (64*4, 8, 8)
          G_block(n_G*2),  # output: (64*2, 16, 16)
          G_block(n_G),    # output: (64, 32, 32)
          nn.Conv2DTranspose(
              3, kernel_size=4, strides=2, padding=1, use_bias=False,
              activation='tanh'))  # output: (3, 64, 64)
Generate a 100-dimensional latent variable to verify the generator's output shape.
x = np.zeros((1, 100, 1, 1))
net_G.initialize()
net_G(x).shape
(1, 3, 64, 64)
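The same result can be traced layer by layer without running the network. This is a minimal sketch in plain Python: the helper `conv_transpose_out` and the `layers` list are illustrative names, not part of MXNet or d2l, and the size formula assumes no output padding and a dilation of 1.

```python
def conv_transpose_out(n, kernel_size=4, strides=2, padding=1):
    # Output width/height of a transposed convolution
    # (assumes no output padding and dilation 1)
    return (n - 1) * strides - 2 * padding + kernel_size


n_G = 64
# (output channels, strides, padding) for each generator layer, kernel size 4
layers = [(n_G * 8, 1, 0), (n_G * 4, 2, 1), (n_G * 2, 2, 1),
          (n_G, 2, 1), (3, 2, 1)]

size = 1  # the latent variable has spatial size 1x1
shapes = []
for channels, strides, padding in layers:
    size = conv_transpose_out(size, strides=strides, padding=padding)
    shapes.append((channels, size, size))

print(shapes)
# [(512, 4, 4), (256, 8, 8), (128, 16, 16), (64, 32, 32), (3, 64, 64)]
```

Note that the number of latent dimensions (100 here) does not affect the spatial sizes; it only sets the input channel count of the first layer.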
3. Discriminator
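The discriminator is a normal convolutional network, except that it uses leaky ReLU as its activation function. Given α∈[0,1], leaky ReLU is defined as leaky ReLU(x) = x if x > 0, and αx otherwise. It is the ordinary ReLU if α=0, and an identity function if α=1. For α∈(0,1), leaky ReLU is a nonlinear function that gives a non-zero output for a negative input. It aims to fix the “dying ReLU” problem, in which a neuron might always output a negative value and therefore cannot make any progress because the gradient of ReLU is 0.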
alphas = [0, 0.2, 0.4, .6, .8, 1]
x = np.arange(-2, 1, 0.1)
# One curve per alpha value
Y = [nn.LeakyReLU(alpha)(x).asnumpy() for alpha in alphas]
d2l.plot(x.asnumpy(), Y, 'x', 'y', alphas)
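For reference, the function being plotted can be written out directly. The scalar version below is a plain-Python sketch for illustration, not the Gluon layer: it returns x for positive inputs and αx otherwise, so α=0 behaves like ReLU and α=1 like the identity.

```python
def leaky_relu(x, alpha):
    """Return x for positive inputs and alpha * x otherwise."""
    return x if x > 0 else alpha * x


xs = [-2.0, -0.5, 0.0, 1.5]
relu_like = [leaky_relu(x, alpha=0.0) for x in xs]  # matches max(x, 0)
identity = [leaky_relu(x, alpha=1.0) for x in xs]   # matches x itself
leaky = [leaky_relu(x, alpha=0.2) for x in xs]      # small negative slope

print(leaky)  # [-0.4, -0.1, 0.0, 1.5]
```

For negative inputs the slope is α, so with α=0.2 the gradient is 0.2 rather than 0, which is why such a unit can still be updated.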
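The basic block of the discriminator is a convolution layer followed by a batch normalization layer and a leaky ReLU activation. The hyperparameters of the convolution layer are similar to those of the transposed convolution layer in the generator block.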
class D_block(nn.Block):
    def __init__(self, channels, kernel_size=4, strides=2,
                 padding=1, alpha=0.2, **kwargs):
        super(D_block, self).__init__(**kwargs)
        self.conv2d = nn.Conv2D(
            channels, kernel_size, strides, padding, use_bias=False)
        self.batch_norm = nn.BatchNorm()
        self.activation = nn.LeakyReLU(alpha)

    def forward(self, X):
        return self.activation(self.batch_norm(self.conv2d(X)))

x = np.zeros((2, 3, 16, 16))
d_blk = D_block(20)
d_blk.initialize()
d_blk(x).shape
(2, 20, 8, 8)
The discriminator is a mirror of the generator.
n_D = 64
net_D = nn.Sequential()
net_D.add(D_block(n_D),    # output: (64, 32, 32)
          D_block(n_D*2),  # output: (64*2, 16, 16)
          D_block(n_D*4),  # output: (64*4, 8, 8)
          D_block(n_D*8),  # output: (64*8, 4, 4)
          nn.Conv2D(1, kernel_size=4, use_bias=False))  # output: (1, 1, 1)
It uses a convolution layer with 1 output channel as the last layer to obtain a single prediction value.
x = np.zeros((1, 3, 64, 64))
net_D.initialize()
net_D(x).shape
(1, 1, 1, 1)
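The discriminator's shapes can be traced in the same spirit with the standard convolution size formula. This is a plain-Python sketch: `conv_out` and the `layers` list are illustrative names, and the last entry assumes that the final convolution uses a stride of 1 and no padding, since the code above passes only `kernel_size=4`.

```python
def conv_out(n, kernel_size=4, strides=2, padding=1):
    # Output width/height of a convolution (dilation 1)
    return (n + 2 * padding - kernel_size) // strides + 1


n_D = 64
# (output channels, strides, padding) for each discriminator layer, kernel size 4
layers = [(n_D, 2, 1), (n_D * 2, 2, 1), (n_D * 4, 2, 1),
          (n_D * 8, 2, 1), (1, 1, 0)]

size = 64  # input images are 64x64
shapes = []
for channels, strides, padding in layers:
    size = conv_out(size, strides=strides, padding=padding)
    shapes.append((channels, size, size))

print(shapes)
# [(64, 32, 32), (128, 16, 16), (256, 8, 8), (512, 4, 4), (1, 1, 1)]
```

Each block halves the width and height while doubling the channels, and the final 4×4 kernel collapses the 4×4 feature map to a single value.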
4. Training
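We use the same learning rate for both the generator and the discriminator, and we change β1 in Adam from 0.9 to 0.5. This decreases the smoothness of the momentum, the exponentially weighted moving average of past gradients, so that the optimizer can follow the rapidly changing gradients that arise because the generator and the discriminator compete with each other. In addition, the randomly generated noise Z is a 4-D tensor, and we use a GPU to accelerate the computation.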
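To see why a smaller β1 reacts faster, consider only the moving average of the gradient. The sketch below is a simplification, not Adam itself: it leaves out bias correction and the second-moment scaling, and the function name `steps_until_sign_flip` is made up for illustration. It feeds a gradient of +1 for a few steps, then flips it to -1 and counts how many updates pass before the average changes sign.

```python
def steps_until_sign_flip(beta1, warmup=10):
    """Count updates until the moving average follows a gradient sign change."""
    m = 0.0
    for _ in range(warmup):
        # Gradient is +1 during warm-up
        m = beta1 * m + (1 - beta1) * 1.0
    steps = 0
    while m > 0:
        # Gradient has flipped to -1
        m = beta1 * m + (1 - beta1) * -1.0
        steps += 1
    return steps


print(steps_until_sign_flip(0.9))  # 5
print(steps_until_sign_flip(0.5))  # 1
```

In this toy setting the average with β1 = 0.5 follows the new direction after a single update, while β1 = 0.9 keeps pointing the old way for several more.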
def train(net_D, net_G, data_iter, num_epochs, lr, latent_dim,
          ctx=d2l.try_gpu()):
    loss = gluon.loss.SigmoidBCELoss()
    net_D.initialize(init=init.Normal(0.02), force_reinit=True, ctx=ctx)
    net_G.initialize(init=init.Normal(0.02), force_reinit=True, ctx=ctx)
    trainer_hp = {'learning_rate': lr, 'beta1': 0.5}
    trainer_D = gluon.Trainer(net_D.collect_params(), 'adam', trainer_hp)
    trainer_G = gluon.Trainer(net_G.collect_params(), 'adam', trainer_hp)
    animator = d2l.Animator(xlabel='epoch', ylabel='loss',
                            xlim=[1, num_epochs], nrows=2, figsize=(5, 5),
                            legend=['discriminator', 'generator'])
    animator.fig.subplots_adjust(hspace=0.3)
    for epoch in range(1, num_epochs + 1):
        # Train one epoch
        timer = d2l.Timer()
        metric = d2l.Accumulator(3)  # loss_D, loss_G, num_examples
        for X, _ in data_iter:
            batch_size = X.shape[0]
            Z = np.random.normal(0, 1, size=(batch_size, latent_dim, 1, 1))
            X, Z = X.as_in_ctx(ctx), Z.as_in_ctx(ctx)
            metric.add(d2l.update_D(X, Z, net_D, net_G, loss, trainer_D),
                       d2l.update_G(Z, net_D, net_G, loss, trainer_G),
                       batch_size)
        # Show generated examples
        Z = np.random.normal(0, 1, size=(21, latent_dim, 1, 1), ctx=ctx)
        # Rescale the synthetic data from [-1, 1] to [0, 1] for display
        fake_x = net_G(Z).transpose(0, 2, 3, 1) / 2 + 0.5
        # Arrange the 21 images in a 3 x 7 grid
        imgs = np.concatenate(
            [np.concatenate([fake_x[i * 7 + j] for j in range(7)], axis=1)
             for i in range(len(fake_x)//7)], axis=0)
        animator.axes[1].cla()
        animator.axes[1].imshow(imgs.asnumpy())
        # Show the losses
        loss_D, loss_G = metric[0] / metric[2], metric[1] / metric[2]
        animator.add(epoch, (loss_D, loss_G))
    print('loss_D %.3f, loss_G %.3f, %d examples/sec on %s' % (
        loss_D, loss_G, metric[2]/timer.stop(), ctx))
Now let's train the model.
latent_dim, lr, num_epochs = 100, 0.005, 40
train(net_D, net_G, data_iter, num_epochs, lr, latent_dim)
loss_D 0.011, loss_G 7.465, 2663 examples/sec on gpu(0)
5. Summary
· The DCGAN architecture has four convolutional layers for the discriminator and four “fractionally-strided” convolutional layers for the generator.
· The discriminator consists of four strided convolution layers with batch normalization (except its input layer) and leaky ReLU activations.
· Leaky ReLU is a nonlinear function that gives a non-zero output for a negative input. It aims to fix the “dying ReLU” problem and helps gradients flow more easily through the architecture.