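GANs vs. Autoencoders: Comparison of Deep Generative Models

Original: https://towardsdatascience.com/gans-vs-autoencoders-comparison-of-deep-generative-models-985cf15936ea

Want to turn horses into zebras? Make DIY anime characters or celebrities? Generative adversarial networks (GANs) are your new best friend.

"Generative Adversarial Networks is the most interesting idea in the last 10 years in Machine Learning." - Yann LeCun, Director of AI Research at Facebook AI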

Part 1 of this tutorial can be found here:

Turing Learning and an Introduction to GANs

Part 2 of this tutorial can be found here:

Advanced Topics in GANs

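This is the third part of a three-part tutorial on creating deep generative models with generative adversarial networks. It is a natural extension of the previous topic on variational autoencoders (found here). We will see that GANs typically outperform variational autoencoders as deep generative models. However, they are notoriously difficult to work with and require a lot of data and tuning. We will also examine a hybrid GAN model called a VAE-GAN.

Taxonomy of deep generative models. This article's focus is on GANs.

This part of the tutorial is mostly a coding implementation of variational autoencoders (VAEs) and GANs, and it also shows the reader how to make a VAE-GAN: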

  • VAE for the CelebA dataset
  • DC-GAN for the CelebA dataset
  • DC-GAN for the Anime dataset
  • VAE-GAN for the Anime dataset

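I strongly recommend that the reader review at least part 1 of the GAN tutorial, as well as my variational autoencoder walkthrough, before going further. Otherwise, the implementation may not make much sense.

To get the notebooks I used to run all of this code, feel free to check out my GitHub repository for this set of tutorials.

mrdragonbear / GAN-Tutorial

Let's begin!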


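VAE for the CelebA Dataset

The CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversity, large quantity, and rich annotations, including:

  • 10,177 identities,
  • 202,599 face images,
  • 5 landmark locations and 40 binary attribute annotations per image.

You can download the dataset from Kaggle here: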

 

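CelebFaces Attributes (CelebA) Dataset
Over 200k images of celebrities with 40 binary attribute annotations (www.kaggle.com)

The first step is to import all the necessary functions and extract the data.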

 

Imports

import shutil
import errno
import zipfile
import os
import matplotlib.pyplot as plt

Extract the Data

# Only run once to unzip images
zip_ref = zipfile.ZipFile('img_align_celeba.zip','r')
zip_ref.extractall()
zip_ref.close()

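Custom Image Generator

This step is likely something most readers have not used before. Because the dataset is so large, it may not be possible to load it into the memory of your Jupyter Notebook. This is a very common problem when working with large datasets.

A workaround is to use a stream generator, which streams batches of data (images in this case) into memory sequentially, limiting the amount of memory the function needs. The caveat is that generators are a bit more complicated to understand and to code, since they require a reasonable understanding of computer memory, GPU architecture, and so on.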

# data generator
# source from https://medium.com/@ensembledme/writing-custom-keras-generators-fe815d992c5a
import numpy as np  # needed by the generators below (np.random.choice, np.array)
from skimage.io import imread

def get_input(path):
    """get specific image from path"""
    img = imread(path)
    return img

def get_output(path, label_file = None):
    """get all the labels relative to the image of path"""
    img_id = path.split('/')[-1]
    labels = label_file.loc[img_id].values
    return labels

def preprocess_input(img):
    # scale pixel values from [0, 255] to [-1, 1]
    return img.astype('float32') / 127.5 -1

def image_generator(files, label_file, batch_size = 32):
    while True:

        batch_paths = np.random.choice(a = files, size = batch_size)
        batch_input = []
        batch_output = []

        for input_path in batch_paths:

            input = get_input(input_path)
            input = preprocess_input(input)
            output = get_output(input_path, label_file = label_file)
            batch_input += [input]
            batch_output += [output]
        batch_x = np.array(batch_input)
        batch_y = np.array(batch_output)

        yield batch_x, batch_y

def auto_encoder_generator(files, batch_size = 32):
    while True:
        batch_paths = np.random.choice(a = files, size = batch_size)
        batch_input = []
        batch_output = []

        for input_path in batch_paths:
            input = get_input(input_path)
            input = preprocess_input(input)
            output = input
            batch_input += [input]
            batch_output += [output]
        batch_x = np.array(batch_input)
        batch_y = np.array(batch_output)

        yield batch_x, batch_y

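For more information on writing custom generators in Keras, a good article to check out is the one I referenced in the code above:

Writing Custom Keras Generators
The idea behind using a Keras generator is to get batches of input and corresponding output on the fly during training... (medium.com)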
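
To make the streaming idea concrete, here is a minimal sketch of the same pattern without Keras or image files. The names `batch_generator` and `fake_load` are hypothetical stand-ins (they are not part of the tutorial's notebook), and `fake_load` only pretends to read an image from disk. The point is that each call to `next` loads one batch, never the whole dataset.

```python
import random

def batch_generator(paths, load_fn, batch_size=4, seed=0):
    """Yield (inputs, targets) batches forever, loading items lazily."""
    rng = random.Random(seed)
    while True:
        # Sample paths with replacement, as np.random.choice does by default.
        batch_paths = [rng.choice(paths) for _ in range(batch_size)]
        batch = [load_fn(p) for p in batch_paths]
        # For an autoencoder the target is the input itself.
        yield batch, batch

def fake_load(path):
    """Hypothetical stand-in for reading an image from disk."""
    return [len(path)] * 3

paths = ["img_%06d.jpg" % i for i in range(1, 1001)]
gen = batch_generator(paths, fake_load, batch_size=4)
x, y = next(gen)  # only 4 items are loaded here, not all 1000
```

Because the generator never terminates, the training loop has to say how many batches make up an epoch, which is what `steps_per_epoch` does in the `fit_generator` call later on.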

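Load the Attribute Data

Not only do we have images for this dataset, but each image also has a list of attributes corresponding to aspects of the celebrity. For example, there are attributes describing whether the celebrity is wearing lipstick or a hat, whether they are young, whether they have black hair, and so on.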

# now load attribute

# 1.A.2
import pandas as pd
attr = pd.read_csv('list_attr_celeba.csv')
attr = attr.set_index('image_id')

# check if attribute successful loaded
attr.describe()

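Finish Making the Generator

We now finish making the generator. We set the image name length to 6 because the images in our dataset have 6-digit file names. This section of code should make sense after reading the custom Keras generator article.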

 

import numpy as np
from sklearn.model_selection import train_test_split
IMG_NAME_LENGTH = 6
file_path = "img_align_celeba/"
img_id = np.arange(1,len(attr.index)+1)
img_path = []
for i in range(len(img_id)):
    img_path.append(file_path + (IMG_NAME_LENGTH - len(str(img_id[i])))*'0' + str(img_id[i]) + '.jpg')
# pick 80% as training set and 20% as validation set
train_path = img_path[:int((0.8)*len(img_path))]
val_path = img_path[int((0.8)*len(img_path)):]
train_generator = auto_encoder_generator(train_path,32)
val_generator = auto_encoder_generator(val_path,32)
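
As a side note, the zero-padding of the file names and the 80/20 split can be written more compactly. The sketch below is an alternative formulation, not the notebook's code; `celeba_path` and `split_paths` are hypothetical helper names, and the folder name is taken from the listing above.

```python
def celeba_path(image_number, folder="img_align_celeba/", width=6):
    """Build a CelebA-style file path such as img_align_celeba/000042.jpg."""
    return folder + str(image_number).zfill(width) + ".jpg"

def split_paths(paths, train_fraction=0.8):
    """Split a list of paths, in order, into training and validation parts."""
    cut = int(train_fraction * len(paths))
    return paths[:cut], paths[cut:]

paths = [celeba_path(i) for i in range(1, 11)]
train_paths, val_paths = split_paths(paths)
```

`str.zfill` pads on the left with zeros up to the requested width, which is exactly what the manual `(IMG_NAME_LENGTH - len(...)) * '0'` expression does.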

We can now pick three images and check that the attributes make sense.

fig, ax = plt.subplots(1, 3, figsize=(12, 4))
for i in range(3):    
    ax[i].imshow(get_input(img_path[i]))
    ax[i].axis('off')
    ax[i].set_title(img_path[i][-10:])
plt.show()
    
attr.iloc[:3]

 

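Three sample images along with some of their attributes.

Building and Training a VAE Model

First, we will create and compile a convolutional VAE model (including an encoder and a decoder) for the celebrity faces dataset.

More Imports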

from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense, Conv2D, MaxPooling2D, Input, Reshape, UpSampling2D, InputLayer, Lambda, ZeroPadding2D, Cropping2D, Conv2DTranspose, BatchNormalization
from keras.utils import np_utils, to_categorical
from keras.losses import binary_crossentropy
from keras import backend as K,objectives
from keras.losses import mse, binary_crossentropy

Model Architecture

Now we can create and summarize the model.

b_size = 128
n_size = 512
def sampling(args):
    z_mean, z_log_sigma = args
    epsilon = K.random_normal(shape = (n_size,) , mean = 0, stddev = 1)
    return z_mean + K.exp(z_log_sigma/2) * epsilon
  
def build_conv_vae(input_shape, bottleneck_size, sampling, batch_size = 32):
    
    # ENCODER
    input = Input(shape=(input_shape[0],input_shape[1],input_shape[2]))
    x = Conv2D(32,(3,3),activation = 'relu', padding = 'same')(input)    
    x = BatchNormalization()(x)
    x = MaxPooling2D((2,2), padding ='same')(x)
    x = Conv2D(64,(3,3),activation = 'relu', padding = 'same')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2,2), padding ='same')(x)
    x = Conv2D(128,(3,3), activation = 'relu', padding = 'same')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2,2), padding ='same')(x)
    x = Conv2D(256,(3,3), activation = 'relu', padding = 'same')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2,2), padding ='same')(x)
    
    # Latent Variable Calculation
    shape = K.int_shape(x)
    flatten_1 = Flatten()(x)
    dense_1 = Dense(bottleneck_size, name='z_mean')(flatten_1)
    z_mean = BatchNormalization()(dense_1)
    flatten_2 = Flatten()(x)
    dense_2 = Dense(bottleneck_size, name ='z_log_sigma')(flatten_2)
    z_log_sigma = BatchNormalization()(dense_2)
    z = Lambda(sampling)([z_mean, z_log_sigma])
    encoder = Model(input, [z_mean, z_log_sigma, z], name = 'encoder')
    
    # DECODER
    latent_input = Input(shape=(bottleneck_size,), name = 'decoder_input')
    x = Dense(shape[1]*shape[2]*shape[3])(latent_input)
    x = Reshape((shape[1],shape[2],shape[3]))(x)
    x = UpSampling2D((2,2))(x)
    x = Cropping2D([[0,0],[0,1]])(x)
    x = Conv2DTranspose(256,(3,3), activation = 'relu', padding = 'same')(x)
    x = BatchNormalization()(x)
    x = UpSampling2D((2,2))(x)
    x = Cropping2D([[0,1],[0,1]])(x)
    x = Conv2DTranspose(128,(3,3), activation = 'relu', padding = 'same')(x)
    x = BatchNormalization()(x)
    x = UpSampling2D((2,2))(x)
    x = Cropping2D([[0,1],[0,1]])(x)
    x = Conv2DTranspose(64,(3,3), activation = 'relu', padding = 'same')(x)
    x = BatchNormalization()(x)
    x = UpSampling2D((2,2))(x)
    x = Conv2DTranspose(32,(3,3), activation = 'relu', padding = 'same')(x)
    x = BatchNormalization()(x)
    output = Conv2DTranspose(3,(3,3), activation = 'tanh', padding ='same')(x)
    decoder = Model(latent_input, output, name = 'decoder')

    output_2 = decoder(encoder(input)[2])
    vae = Model(input, output_2, name ='vae')
    return vae, encoder, decoder, z_mean, z_log_sigma

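# img_sample is not defined in this listing; assumed here to be a single dataset image, used only for its shape
img_sample = get_input(img_path[0])
vae_2, encoder, decoder, z_mean, z_log_sigma = build_conv_vae(img_sample.shape, n_size, sampling, batch_size = b_size)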
print("encoder summary:")
encoder.summary()
print("decoder summary:")
decoder.summary()
print("vae summary:")
vae_2.summary()
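
The `sampling` function above is the reparameterisation trick: instead of sampling z directly, the encoder outputs a mean and a log-variance, and the randomness comes from a separate standard normal draw. Note that the listing names the second output `z_log_sigma`, but `exp(z_log_sigma / 2)` treats it as a log-variance. The NumPy sketch below is an illustration only; `sample_latent` is a hypothetical name, and the mean of 3 and variance of 4 are arbitrary values chosen so the result is easy to check.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Reparameterisation trick: z = mean + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(z_log_var / 2.0) * eps

rng = np.random.default_rng(0)
z_mean = np.full((10000, 2), 3.0)            # every entry has mean 3
z_log_var = np.full((10000, 2), np.log(4))   # variance 4, so sigma = 2
z = sample_latent(z_mean, z_log_var, rng)
```

Because the noise enters only through `eps`, gradients can flow through `z_mean` and `z_log_var`, which is what lets the encoder be trained by backpropagation.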

Define the VAE Loss

def vae_loss(input_img, output):
    # Compute error in reconstruction
    reconstruction_loss = mse(K.flatten(input_img) , K.flatten(output))
    
    # Compute the KL Divergence regularization term
    kl_loss = - 0.5 * K.sum(1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma), axis = -1)
    
    # Return the average loss over all images in batch
    total_loss = (reconstruction_loss + 0.0001 * kl_loss)    
    return total_loss
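
The KL term in `vae_loss` is the closed-form divergence between a diagonal Gaussian and the standard normal prior. Below is a small NumPy sketch of the same expression, with two inputs whose result can be worked out by hand. `kl_to_standard_normal` is a hypothetical name used only for this illustration.

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_var):
    """Closed-form KL( N(mean, exp(log_var)) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1.0 + z_log_var - z_mean**2 - np.exp(z_log_var), axis=-1)

# A posterior equal to the prior costs nothing.
kl_zero = kl_to_standard_normal(np.zeros((1, 512)), np.zeros((1, 512)))
# Shifting every mean to 1 costs 0.5 per dimension, so 256 for 512 dimensions.
kl_shifted = kl_to_standard_normal(np.ones((1, 512)), np.zeros((1, 512)))
```

In the tutorial's loss this term is multiplied by 0.0001, so the reconstruction error dominates and the posterior is only weakly pulled toward the prior.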

Compile the Model

vae_2.compile(optimizer='rmsprop', loss= vae_loss)
encoder.compile(optimizer = 'rmsprop', loss = vae_loss)
decoder.compile(optimizer = 'rmsprop', loss = vae_loss)

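Train the Model

vae_2.fit_generator(train_generator, steps_per_epoch = 4000, validation_data = val_generator, epochs=7, validation_steps= 500)

We randomly choose some images from the training set, run them through the encoder to parameterize the latent code, and then reconstruct the images with the decoder.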

import random
x_test = []
for i in range(64):
    x_test.append(get_input(img_path[random.randint(0, len(img_path) - 1)]))  # random.randint includes both endpoints
x_test = np.array(x_test)
figure_Decoded = vae_2.predict(x_test.astype('float32')/127.5 -1, batch_size = b_size)
figure_original = x_test[0]
figure_decoded = (figure_Decoded[0]+1)/2
for i in range(4):
    plt.axis('off')
    plt.subplot(2,4,1+i*2)
    plt.imshow(x_test[i])
    plt.axis('off')
    plt.subplot(2,4,2 + i*2)
    plt.imshow((figure_Decoded[i]+1)/2)
    plt.axis('off')
plt.show()


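Random samples from the training set compared to their VAE reconstructions.

Notice that the reconstructed images share similarities with the original versions. However, the new images are a bit blurry, which is a known phenomenon of VAEs. This has been hypothesized to be because variational inference optimizes a lower bound on the likelihood, not the actual likelihood itself.

Latent Space Representation

We can choose two images with different attributes and plot their latent space representations. Notice that we can see some differences between the latent codes, which we might hypothesize explain the differences between the original images.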

# Choose two images of different attributes, and plot the original and latent space of it

x_test1 = []
for i in range(64):
    x_test1.append(get_input(img_path[np.random.randint(0,len(img_id))]))
x_test1 = np.array(x_test1)
x_test_encoded = np.array(encoder.predict(x_test1/127.5-1, batch_size = b_size))
figure_original_1 = x_test1[0]
figure_original_2 = x_test1[1]
Encoded1 = (x_test_encoded[0,0,:].reshape(32, 16,)+1)/2 
Encoded2 = (x_test_encoded[0,1,:].reshape(32, 16)+1)/2

plt.figure(figsize=(8, 8))
plt.subplot(2,2,1)
plt.imshow(figure_original_1)
plt.subplot(2,2,2)
plt.imshow(Encoded1)
plt.subplot(2,2,3)
plt.imshow(figure_original_2)
plt.subplot(2,2,4)
plt.imshow(Encoded2)
plt.show()
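
The listing above displays a 512-dimensional code as a 32x16 image by applying `(x + 1) / 2`, which assumes the code values lie in [-1, 1]. Since the encoder output is not bounded, a min-max rescale is a safer alternative for plotting. The sketch below shows that variant; `latent_to_image` is a hypothetical helper, and the random vector stands in for a real encoder output.

```python
import numpy as np

def latent_to_image(code, shape=(32, 16)):
    """Reshape a flat latent code to 2-D and min-max scale it to [0, 1] for imshow."""
    grid = np.asarray(code, dtype=np.float32).reshape(shape)
    span = grid.max() - grid.min()
    if span == 0:
        return np.zeros(shape, dtype=np.float32)
    return (grid - grid.min()) / span

rng = np.random.default_rng(1)
code = rng.standard_normal(512)  # stand-in for one 512-d encoder output
img = latent_to_image(code)
```

The 32x16 layout is arbitrary: any shape whose product is 512 would do, since the latent dimensions have no spatial meaning.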


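Sampling from Latent Space

We can randomly sample 15 latent codes and decode them to generate new celebrity faces. We can see from this representation that the images generated by our model have a very similar style to the images in our training set, and they also have good realism and variation.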

# We randomly generated 15 images from 15 series of noise information
n = 3
m = 5
digit_size1 = 218
digit_size2 = 178
figure = np.zeros((digit_size1 * n, digit_size2 * m,3))
 
for i in range(3):
    for j in range(5):
        z_sample = np.random.randn(1,512)  # draw from the standard normal prior used in the KL term
        x_decoded = decoder.predict([z_sample])
        figure[i * digit_size1: (i + 1) * digit_size1,
               j * digit_size2: (j + 1) * digit_size2,:] = (x_decoded[0]+1)/2 
plt.figure(figsize=(10, 10))
plt.imshow(figure)
plt.show()
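
The tiling logic can be checked without a trained model. The sketch below is a generic version of the loop above, assuming latent codes drawn from a standard normal and a decoder whose output lies in [-1, 1] (as with the `tanh` output layer). `sample_grid` and `fake_decoder` are hypothetical names; `fake_decoder` just returns a flat grey level per code.

```python
import numpy as np

def sample_grid(decode_fn, latent_dim, rows, cols, img_shape, rng):
    """Decode rows*cols latent vectors drawn from N(0, I) and tile them into one image."""
    h, w, c = img_shape
    canvas = np.zeros((rows * h, cols * w, c), dtype=np.float32)
    for i in range(rows):
        for j in range(cols):
            z = rng.standard_normal((1, latent_dim))
            img = decode_fn(z)[0]  # decoder output in [-1, 1]
            canvas[i * h:(i + 1) * h, j * w:(j + 1) * w, :] = (img + 1) / 2
    return canvas

def fake_decoder(z, img_shape=(218, 178, 3)):
    """Hypothetical stand-in decoder: one constant grey level per latent code."""
    level = np.tanh(z.mean())
    return np.full((z.shape[0],) + img_shape, level, dtype=np.float32)

canvas = sample_grid(fake_decoder, 512, 3, 5, (218, 178, 3), np.random.default_rng(0))
```

With a real model, `decode_fn` would be `decoder.predict`, and the canvas can be passed straight to `plt.imshow`.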


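So it seems that our VAE model is not particularly good. With more time and a better selection of hyperparameters, we would probably have achieved a better result than this.

Now let us compare this result to a DC-GAN on the same dataset.

DC-GAN on the CelebA Dataset

Since we have already set up the stream generator, there is not much work left to get the DC-GAN model up and running.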

# Create and compile a DC-GAN model, and print the summary

from keras.utils import np_utils
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Dropout, Activation, Flatten, LeakyReLU,\
      BatchNormalization, Conv2DTranspose, Conv2D, Reshape
from keras.layers.advanced_activations import LeakyReLU
from keras.optimizers import Adam, RMSprop
from keras.initializers import RandomNormal
import numpy as np
import matplotlib.pyplot as plt
import random
from tqdm import tqdm_notebook
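# Note: scipy.misc.imresize was removed in SciPy 1.3, so this import requires an older SciPy release
from scipy.misc import imresize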

def generator_model(latent_dim=100, leaky_alpha=0.2, init_stddev=0.02):

    g = Sequential()
    g.add(Dense(4*4*512, input_shape=(latent_dim,),
                kernel_initializer=RandomNormal(stddev=init_stddev)))
    g.add(Reshape(target_shape=(4, 4, 512)))
    g.add(BatchNormalization())
    g.add(Activation(LeakyReLU(alpha=leaky_alpha)))
    g.add(Conv2DTranspose(256, kernel_size=5, strides=2, padding='same',
                kernel_initializer=RandomNormal(stddev=init_stddev)))
    g.add(BatchNormalization())
    g.add(Activation(LeakyReLU(alpha=leaky_alpha)))
    g.add(Conv2DTranspose(128, kernel_size=5, strides=2, padding='same', 
                kernel_initializer=RandomNormal(stddev=init_stddev)))
    g.add(BatchNormalization())
    g.add(Activation(LeakyReLU(alpha=leaky_alpha)))
    g.add(Conv2DTranspose(3, kernel_size=4, strides=2, padding='same', 
                kernel_initializer=RandomNormal(stddev=init_stddev)))
    g.add(Activation('tanh'))
    g.summary()
    #g.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0001, beta_1=0.5), metrics=['accuracy'])
    return g

  
def discriminator_model(leaky_alpha=0.2, init_stddev=0.02):
    
    d = Sequential()
    d.add(Conv2D(64, kernel_size=5, strides=2, padding='same', 
               kernel_initializer=RandomNormal(stddev=init_stddev),
               input_shape=(32, 32, 3)))
    d.add(Activation(LeakyReLU(alpha=leaky_alpha)))
    d.add(Conv2D(128, kernel_size=5, strides=2, padding='same', 
               kernel_initializer=RandomNormal(stddev=init_stddev)))
    d.add(BatchNormalization())
    d.add(Activation(LeakyReLU(alpha=leaky_alpha)))
    d.add(Conv2D(256, kernel_size=5, strides=2, padding='same', 
               kernel_initializer=RandomNormal(stddev=init_stddev)))
    d.add(BatchNormalization())
    d.add(Activation(LeakyReLU(alpha=leaky_alpha)))
    d.add(Flatten())
    d.add(Dense(1, kernel_initializer=RandomNormal(stddev=init_stddev)))
    d.add(Activation('sigmoid'))
    d.summary()
    return d

def DCGAN(sample_size=100):
    # Generator
    g = generator_model(sample_size, 0.2, 0.02)

    # Discriminator
    d = discriminator_model(0.2, 0.02)
    d.compile(optimizer=Adam(lr=0.001, beta_1=0.5), loss='binary_crossentropy')
    d.trainable = False
    # GAN
    gan = Sequential([g, d])
    gan.compile(optimizer=Adam(lr=0.0001, beta_1=0.5), loss='binary_crossentropy')
    
    return gan, g, d

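The above code only covers the architecture of the generator and discriminator networks. It is a good idea to compare this way of coding a GAN with the one I wrote in part 2: you can see that this one is less clean and that we did not define global parameters, so there are many places where we could introduce errors.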
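
One way to catch shape errors early is to trace the spatial sizes by hand. With `padding='same'`, a strided `Conv2DTranspose` multiplies the height and width by the stride, and a strided `Conv2D` divides them (rounding up). The helper names below are hypothetical; the layer counts and strides are read off the listing above.

```python
import math

def transposed_conv_same(size, stride):
    """Spatial size after Conv2DTranspose with padding='same'."""
    return size * stride

def conv_same(size, stride):
    """Spatial size after Conv2D with padding='same'."""
    return math.ceil(size / stride)

# Generator: 4x4 feature map, then three stride-2 transposed convolutions.
gen_sizes = [4]
for _ in range(3):
    gen_sizes.append(transposed_conv_same(gen_sizes[-1], 2))

# Discriminator: 32x32 input, then three stride-2 convolutions.
disc_sizes = [32]
for _ in range(3):
    disc_sizes.append(conv_same(disc_sizes[-1], 2))
```

The generator ends at 32x32, which matches the discriminator's `input_shape=(32, 32, 3)`. This is why the images are resized to 32x32 before training.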

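Now we define a set of helper functions to make our life easier. These are mainly for preprocessing and plotting images to help us analyze the network output.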

def load_image(filename, size=(32, 32)):
    img = plt.imread(filename)
    # crop
    rows, cols = img.shape[:2]
    crop_r, crop_c = 150, 150
    start_row, start_col = (rows - crop_r) // 2, (cols - crop_c) // 2
    end_row, end_col = rows - start_row, cols - start_col
    img = img[start_row:end_row, start_col:end_col, :]
    # resize
    img = imresize(img, size)
    return img

def preprocess(x):
    return (x/255)*2-1

def deprocess(x):
    return np.uint8((x+1)/2*255)

def make_labels(size):
    return np.ones([size, 1]), np.zeros([size, 1])  

def show_losses(losses):
    losses = np.array(losses)
    
    fig, ax = plt.subplots()
    plt.plot(losses.T[0], label='Discriminator')
    plt.plot(losses.T[1], label='Generator')
    plt.title("Validation Losses")
    plt.legend()
    plt.show()

def show_images(generated_images):
    n_images = len(generated_images)
    cols = 5
    rows = n_images//cols
    
    plt.figure(figsize=(8, 6))
    for i in range(n_images):
        img = deprocess(generated_images[i])
        ax = plt.subplot(rows, cols, i+1)
        plt.imshow(img)
        plt.xticks([])
        plt.yticks([])
    plt.tight_layout()
    plt.show()
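
The cropping and scaling helpers are easy to get subtly wrong, so it is worth checking them on a dummy array. The sketch below re-implements the two ideas (a centre crop, and the [0, 255] to [-1, 1] mapping with its inverse) under hypothetical names, and uses a random 218x178 array in place of a CelebA image.

```python
import numpy as np

def center_crop(img, crop_rows, crop_cols):
    """Cut a crop_rows x crop_cols window from the middle of an H x W x C image."""
    rows, cols = img.shape[:2]
    r0 = (rows - crop_rows) // 2
    c0 = (cols - crop_cols) // 2
    return img[r0:r0 + crop_rows, c0:c0 + crop_cols, :]

def to_model_range(x):
    """Map uint8 pixels [0, 255] to floats in [-1, 1] (matches a tanh output)."""
    return x.astype(np.float32) / 127.5 - 1.0

def to_pixels(x):
    """Inverse of to_model_range, rounding back to uint8."""
    return np.round((x + 1.0) * 127.5).astype(np.uint8)

img = np.random.default_rng(0).integers(0, 256, size=(218, 178, 3), dtype=np.uint8)
crop = center_crop(img, 150, 150)
scaled = to_model_range(crop)
restored = to_pixels(scaled)
```

Slicing with `r0:r0 + crop_rows` rather than computing an end index from the image size guarantees the crop has the requested shape even when the margins are odd.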

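Training the Model

We now define the training function. As before, notice that we switch between setting the discriminator to be trainable and untrainable (we did this implicitly in part 2).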

def train(sample_size=100, epochs=3, batch_size=128, eval_size=16, smooth=0.1):
    batchCount=len(train_path)//batch_size
    y_train_real, y_train_fake = make_labels(batch_size)
    y_eval_real,  y_eval_fake  = make_labels(eval_size)
    
    # create a GAN, a generator and a discriminator
    gan, g, d = DCGAN(sample_size)
    
    losses = []
    for e in range(epochs):
        print('-'*15, 'Epoch %d' % (e+1), '-'*15)
        for i in tqdm_notebook(range(batchCount)):
            
            path_batch = train_path[i*batch_size:(i+1)*batch_size]
            image_batch = np.array([preprocess(load_image(filename)) for filename in path_batch])
            
            noise = np.random.normal(0, 1, size=(batch_size, sample_size))
            generated_images = g.predict_on_batch(noise)
            # Train discriminator on generated images
            d.trainable = True
            d.train_on_batch(image_batch, y_train_real*(1-smooth))
            d.train_on_batch(generated_images, y_train_fake)
            # Train generator
            d.trainable = False
            g_loss=gan.train_on_batch(noise, y_train_real)
        
        # evaluate
        test_path = np.array(val_path)[np.random.choice(len(val_path), eval_size, replace=False)]
        x_eval_real = np.array([preprocess(load_image(filename)) for filename in test_path])
        noise = np.random.normal(loc=0, scale=1, size=(eval_size, sample_size))
        x_eval_fake = g.predict_on_batch(noise)
        
        d_loss  = d.test_on_batch(x_eval_real, y_eval_real)
        d_loss += d.test_on_batch(x_eval_fake, y_eval_fake)
        g_loss  = gan.test_on_batch(noise, y_eval_real)
        
        losses.append((d_loss/2, g_loss))
  
        print("Epoch: {:>3}/{} Discriminator Loss: {:>6.4f} Generator Loss: {:>6.4f}".format(
            e+1, epochs, d_loss, g_loss))  
        
        show_images(x_eval_fake[:10])
    
    # show the result
    show_losses(losses)
    show_images(g.predict(np.random.normal(loc=0, scale=1, size=(15, sample_size))))    
    return g
noise_dim=100
train()
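
The `smooth=0.1` argument implements one-sided label smoothing: the discriminator is trained toward 0.9 rather than 1.0 on real images, while fake labels stay at 0. The NumPy sketch below shows the effect on the loss. The function names are hypothetical, and the cross-entropy is written out by hand rather than taken from Keras.

```python
import numpy as np

def make_smoothed_labels(batch_size, smooth=0.1):
    """Targets for the discriminator: real = 1 - smooth, fake = 0 (one-sided smoothing)."""
    real = np.ones((batch_size, 1), dtype=np.float32) * (1.0 - smooth)
    fake = np.zeros((batch_size, 1), dtype=np.float32)
    return real, fake

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy, written out to show the effect of smoothing."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))))

real, fake = make_smoothed_labels(128, smooth=0.1)
confident = np.full((128, 1), 0.999, dtype=np.float32)   # discriminator outputs 0.999 for "real"
calibrated = np.full((128, 1), 0.9, dtype=np.float32)    # discriminator outputs 0.9 for "real"
loss_confident = binary_crossentropy(real, confident)
loss_calibrated = binary_crossentropy(real, calibrated)
```

With smoothed targets, an over-confident discriminator is penalised more than a calibrated one, which discourages it from saturating and starving the generator of gradient.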

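This function produces the following output for each epoch:

It will also plot the validation losses for the discriminator and generator.

The generated images look reasonable. Here we can see that our model performed adequately, though the quality of the images is not as good as those in the training set (we resized the images to be smaller, which made them blurrier than the originals). However, they are vivid enough to produce valid faces, and these faces are close to reality. Also, compared with the images produced by the VAE, the images are more creative and realistic.

So it seems that the GAN performs better in this case. Now let us try a new dataset and see how well a GAN performs compared with a hybrid variant, the VAE-GAN.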


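Anime Dataset

In this section, we will aim to generate faces in the same style as the Anime dataset using a GAN, as well as another special form of GAN, the VAE-GAN. The term VAE-GAN was first used by Larsen et al. in their paper "Autoencoding beyond pixels using a learned similarity metric". A VAE-GAN differs from a GAN in that its generator is a variational autoencoder.

VAE-GAN architecture. Source: https://arxiv.org/abs/1512.09300

First, we will focus on the DC-GAN. The Anime dataset consists of over 20K anime faces in the form of 64x64 images. We will also need to create another Keras custom data generator. A link to the dataset can be found here:

Mckinsey666 / Anime-Face-Dataset
A collection of high-quality anime faces. (github.com)


DC-GAN on the Anime Dataset

The first thing we need to do is create an anime directory and download the data. This can be done either from the link above or directly from Amazon Web Services (if this way of accessing the data is still available).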

# Create anime directory and download from AWS
import zipfile
!mkdir anime-faces && wget https://s3.amazonaws.com/gec-harvard-dl2-hw2-data/datasets/anime-faces.zip
with zipfile.ZipFile("anime-faces.zip","r") as anime_ref:
    anime_ref.extractall("anime-faces/")

It is always good practice to check the data before moving on, so we do this now.

from skimage import io
import matplotlib.pyplot as plt

filePath='anime-faces/data/'
imgSets=[]

for i in range(1,20001):
    imgName=filePath+str(i)+'.png'
    imgSets.append(io.imread(imgName))

plt.imshow(imgSets[1234])
plt.axis('off')
plt.show()

 

We now create and compile our DC-GAN model.

 

# Create and compile a DC-GAN model
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Dropout, Activation, \
    Flatten, LeakyReLU, BatchNormalization, Conv2DTranspose, Conv2D, Reshape
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D
from keras.optimizers import Adam, RMSprop,SGD
from keras.initializers import RandomNormal
import numpy as np
import matplotlib.pyplot as plt
import os, glob
from PIL import Image
from tqdm import tqdm_notebook

image_shape = (64, 64, 3)
#noise_shape = (100,)
Noise_dim = 128
img_rows = 64
img_cols = 64
channels = 3
def generator_model(latent_dim=100, leaky_alpha=0.2):
    model = Sequential()
    
    # layer1 (None,Noise_dim)>>(None,128*16*16)
    model.add(Dense(128 * 16 * 16, activation="relu", input_shape=(Noise_dim,)))
    
    # (None,16*16*128)>>(None,16,16,128)
    model.add(Reshape((16, 16, 128)))
    
    # (None,16,16,128)>>(None,32,32,128)>>(None,32,32,256)
    model.add(UpSampling2D())
    model.add(Conv2D(256, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    # (None,32,32,256)>>(None,64,64,256)
    model.add(UpSampling2D())
    
    # (None,64,64,256)>>(None,64,64,128)
    model.add(Conv2D(128, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    # (None,64,64,128)>>(None,64,64,32)
    model.add(Conv2D(32, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    
    # (None,64,64,32)>>(None,64,64,3)
    model.add(Conv2D(channels, kernel_size=3, padding="same"))
    model.add(Activation("tanh"))
    model.summary()
    model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0001, beta_1=0.5), metrics=['accuracy'])
    return model

def discriminator_model(leaky_alpha=0.2, dropRate=0.3):
    model = Sequential()
    
    # layer1 (None,64,64,3)>>(None,32,32,32)
    model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=image_shape, padding="same"))
    model.add(LeakyReLU(alpha=leaky_alpha))
    model.add(Dropout(dropRate))
    # layer2 (None,32,32,32)>>(None,16,16,64)
    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
    # model.add(ZeroPadding2D(padding=((0, 1), (0, 1))))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=leaky_alpha))
    model.add(Dropout(dropRate))
    # (None,16,16,64)>>(None,8,8,128)
    model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(dropRate))
    # (None,8,8,128)>>(None,8,8,256)
    model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(dropRate))
     # (None,8,8,256)>>(None,8,8,64)
    model.add(Conv2D(64, kernel_size=3, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(dropRate))
    
    # (None,8,8,64)
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    model.summary()
    sgd=SGD(lr=0.0002)
    model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0001, beta_1=0.5), metrics=['accuracy'])
    return model

def DCGAN(sample_size=Noise_dim):
    # generator
    g = generator_model(sample_size, 0.2)
    # discriminator
    d = discriminator_model(0.2)
    d.trainable = False
    # GAN
    gan = Sequential([g, d])
    
    sgd=SGD()
    gan.compile(optimizer=Adam(lr=0.0001, beta_1=0.5), loss='binary_crossentropy')
    return gan, g, d

def get_image(image_path, width, height, mode):
    image = Image.open(image_path)
    #print(image.size)
    return np.array(image.convert(mode))

def get_batch(image_files, width, height, mode):
    data_batch = np.array([get_image(sample_file, width, height, mode) \
                           for sample_file in image_files])
    return data_batch

def show_imgs(generator,epoch):
    row=3
    col = 5
    noise = np.random.normal(0, 1, (row * col, Noise_dim))
    gen_imgs = generator.predict(noise)
    # Rescale images 0 - 1
    gen_imgs = 0.5 * gen_imgs + 0.5
    fig, axs = plt.subplots(row, col)
    #fig.suptitle("DCGAN: Generated digits", fontsize=12)
    cnt = 0
    for i in range(row):
        for j in range(col):
            axs[i, j].imshow(gen_imgs[cnt, :, :, :])
            axs[i, j].axis('off')
            cnt += 1
    #plt.close()
    plt.show()

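We can now train the model on the Anime dataset. We will do this in two different ways. The first trains the discriminator and generator with a 1:1 proportion of training steps.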

# Training the discriminator and generator with the 1:1 proportion of training times
def train(epochs=30, batchSize=128):
    filePath = r'anime-faces/data/'
    X_train = get_batch(glob.glob(os.path.join(filePath, '*.png'))[:20000], 64, 64, 'RGB')
    X_train = (X_train.astype(np.float32) - 127.5) / 127.5
    halfSize = int(batchSize / 2)
    batchCount=int(len(X_train)/batchSize)
    dLossReal = []
    dLossFake = []
    gLossLogs = []
    gan, generator, discriminator = DCGAN(Noise_dim)
    for e in range(epochs):
        for i in tqdm_notebook(range(batchCount)):
            index = np.random.randint(0, X_train.shape[0], halfSize)
            images = X_train[index]
            noise = np.random.normal(0, 1, (halfSize, Noise_dim))
            genImages = generator.predict(noise)
            # one-sided labels
            discriminator.trainable = True
            dLossR = discriminator.train_on_batch(images, np.ones([halfSize, 1]))
            dLossF = discriminator.train_on_batch(genImages, np.zeros([halfSize, 1]))
            dLoss = np.add(dLossF, dLossR) * 0.5
            discriminator.trainable = False
            noise = np.random.normal(0, 1, (batchSize, Noise_dim))
            gLoss = gan.train_on_batch(noise, np.ones([batchSize, 1]))
        dLossReal.append([e, dLoss[0]])
        dLossFake.append([e, dLoss[1]])
        gLossLogs.append([e, gLoss])
        dLossRealArr = np.array(dLossReal)
        dLossFakeArr = np.array(dLossFake)
        gLossLogsArr = np.array(gLossLogs)
            
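        # Show sample images from the generator at the end of each epoch
        show_imgs(generator, e)
    # At the end of training plot the losses vs epochs
    # dLoss holds [loss, accuracy] because the discriminator is compiled with metrics=['accuracy']
    plt.plot(dLossRealArr[:, 0], dLossRealArr[:, 1], label="Discriminator Loss")
    plt.plot(dLossFakeArr[:, 0], dLossFakeArr[:, 1], label="Discriminator Accuracy")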
    plt.plot(gLossLogsArr[:, 0], gLossLogsArr[:, 1], label="Generator Loss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.title('GAN')
    plt.grid(True)
    plt.show()
    
    
    return gan, generator, discriminator

GAN,Generator,Discriminator=train(epochs=20, batchSize=128)  

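The output will now start printing a series of anime characters. They are very grainy at first and gradually become more pronounced over time.

We will also get a plot of our generator and discriminator loss functions.

Now we will do the same, but with different training times for the discriminator and generator, to see what the effect is.

Before moving forward, it is a good idea to save the weights of the model somewhere so that you do not need to run the entire training again and can instead just load the weights into the network.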

To save the weights:

Discriminator.save_weights('/content/gdrive/My Drive/discriminator_DCGAN_lr0.0001_deepgenerator+proportion2.h5')
GAN.save_weights('/content/gdrive/My Drive/gan_DCGAN_lr0.0001_deepgenerator+proportion2.h5')
Generator.save_weights('/content/gdrive/My Drive/generator_DCGAN_lr0.0001_deepgenerator+proportion2.h5')

To load the weights:

Discriminator.load_weights('/content/gdrive/My Drive/discriminator_DCGAN_lr0.0001_deepgenerator+proportion2.h5')
GAN.load_weights('/content/gdrive/My Drive/gan_DCGAN_lr0.0001_deepgenerator+proportion2.h5')
Generator.load_weights('/content/gdrive/My Drive/generator_DCGAN_lr0.0001_deepgenerator+proportion2.h5')

 

Now we move on to the second network implementation without worrying about saving over our previous network.

 

# Train the discriminator and generator separately and with different training times
def train(epochs=300, batchSize=128, plotInternal=50):
    gLoss = 1
    filePath = r'anime-faces/data/'
    
    X_train = get_batch(glob.glob(os.path.join(filePath,'*.png'))[:20000],64,64,'RGB')
    X_train=(X_train.astype(np.float32)-127.5)/127.5
    halfSize= int (batchSize/2)
    dLossReal=[]
    dLossFake=[]
    gLossLogs=[]
    
    for e in range(epochs):
        index=np.random.randint(0,X_train.shape[0],halfSize)
        images=X_train[index]
        noise=np.random.normal(0,1,(halfSize,Noise_dim))
        genImages=generator.predict(noise)
        
        if e < int(epochs*0.5):    
            #one-sided labels
            discriminator.trainable=True
            dLossR=discriminator.train_on_batch(images,np.ones([halfSize,1]))
            dLossF=discriminator.train_on_batch(genImages,np.zeros([halfSize,1]))
            dLoss=np.add(dLossF,dLossR)*0.5
            discriminator.trainable=False
            # update the generator on every 4th step only
            if e % 4 == 0:
                noise=np.random.normal(0,1,(batchSize,Noise_dim))
                gLoss=gan.train_on_batch(noise,np.ones([batchSize,1]))
                
        elif e>= int(epochs*0.5) :
            # update the discriminator on every 4th step only
            if e % 4 == 0:
                #one-sided labels
                discriminator.trainable=True
                dLossR=discriminator.train_on_batch(images,np.ones([halfSize,1]))
                dLossF=discriminator.train_on_batch(genImages,np.zeros([halfSize,1]))
                dLoss=np.add(dLossF,dLossR)*0.5
                discriminator.trainable=False
            
            noise=np.random.normal(0,1,(batchSize,Noise_dim))
            gLoss=gan.train_on_batch(noise,np.ones([batchSize,1]))
        if e % 20 == 0:
           print("epoch: %d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (e, dLoss[0], 100 * dLoss[1], gLoss))
        dLossReal.append([e,dLoss[0]])
        dLossFake.append([e,dLoss[1]])
        gLossLogs.append([e,gLoss])
        if e % plotInternal == 0 and e!=0:
            show_imgs(generator, e)
            
            
        dLossRealArr= np.array(dLossReal)
        dLossFakeArr = np.array(dLossFake)
        gLossLogsArr = np.array(gLossLogs)
        
        # save the weights every 51 steps
        if e % 51 == 0:
            discriminator.save_weights('/content/gdrive/My Drive/discriminator_DCGAN_lr=0.0001,proportion2,deepgenerator_Fake.h5')
            gan.save_weights('/content/gdrive/My Drive/gan_DCGAN_lr=0.0001,proportion2,deepgenerator_Fake.h5')
            generator.save_weights('/content/gdrive/My Drive/generator_DCGAN_lr=0.0001,proportion2,deepgenerator_Fake.h5')
    # At the end of training plot the losses vs epochs
    # dLoss holds [loss, accuracy] because the discriminator is compiled with metrics=['accuracy']
    plt.plot(dLossRealArr[:, 0], dLossRealArr[:, 1], label="Discriminator Loss")
    plt.plot(dLossFakeArr[:, 0], dLossFakeArr[:, 1], label="Discriminator Accuracy")
    plt.plot(gLossLogsArr[:, 0], gLossLogsArr[:, 1], label="Generator Loss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.title('GAN')
    plt.grid(True)
    plt.show()
    
    
    return gan, generator, discriminator
gan, generator, discriminator = DCGAN(Noise_dim)
train(epochs=4000, batchSize=128, plotInternal=200)
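
The update schedule in this second training function is easier to see in isolation. In the first half of the run the discriminator is updated on every step and the generator on every fourth step. In the second half the roles are swapped. The sketch below reproduces only that bookkeeping; `training_schedule` is a hypothetical helper and no networks are involved.

```python
def training_schedule(epochs, ratio=4):
    """Return, per step, which networks are updated: 'D', 'G' or 'DG'."""
    plan = []
    for e in range(epochs):
        every_nth = (e % ratio == 0)
        if e < epochs // 2:
            # First half: discriminator every step, generator every `ratio`-th step.
            plan.append("DG" if every_nth else "D")
        else:
            # Second half: generator every step, discriminator every `ratio`-th step.
            plan.append("DG" if every_nth else "G")
    return plan

plan = training_schedule(4000)
d_updates = sum("D" in step for step in plan)
g_updates = sum("G" in step for step in plan)
```

Over 4000 steps both networks receive the same total number of updates. What changes relative to the 1:1 run is the order: the discriminator gets a head start, and the generator catches up afterwards.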

Let us compare the output of these two networks. By running the line:

 

show_imgs(Generator, 0)  # the second argument (epoch) is required by the signature but unused

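the network will output some images from the generator (this is one of the functions we defined earlier).

Generated images from the 1:1 training of discriminator vs. generator.

Now let us check the second model.

Generated images from the second network, with different training times for the discriminator and generator.

We can see that the details of the generated images are improved and that their textures are slightly more detailed. However, compared with the training images they are still sub-par.

Training images from the Anime dataset.

Perhaps the VAE-GAN will perform better?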


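VAE-GAN on the Anime Dataset

To reiterate what I said previously about the VAE-GAN: the term VAE-GAN was first used by Larsen et al. in their paper "Autoencoding beyond pixels using a learned similarity metric". A VAE-GAN differs from a GAN in that its generator is a variational autoencoder.

VAE-GAN architecture. Source: https://arxiv.org/abs/1512.09300

First we need to create and compile the VAE-GAN and print a summary for each of the networks (this is a good way to simply check the architecture).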

# Create and compile a VAE-GAN, and make a summary for them
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Dropout, Activation, \
    Flatten, LeakyReLU, BatchNormalization, Conv2DTranspose, Conv2D, Reshape,MaxPooling2D,UpSampling2D,InputLayer, Lambda
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D
from keras.optimizers import Adam, RMSprop,SGD
from keras.initializers import RandomNormal
import numpy as np
import matplotlib.pyplot as plt
import os, glob
from PIL import Image
import pandas as pd
from scipy.stats import norm
import keras
from keras.utils import np_utils, to_categorical
from keras import backend as K
import random
from keras import metrics
from tqdm import tqdm

# plotInternal
plotInternal = 50
#######
latent_dim = 256
batch_size = 256
rows = 64
columns = 64
channel = 3
epochs = 4000
# datasize = len(dataset)
# optimizers
SGDop = SGD(lr=0.0003)
ADAMop = Adam(lr=0.0002)
# filters
filter_of_dis = 16
filter_of_decgen = 16
filter_of_encoder = 16

def sampling(args):
    mean, logsigma = args
    epsilon = K.random_normal(shape=(K.shape(mean)[0], latent_dim), mean=0., stddev=1.0)
    return mean + K.exp(logsigma / 2) * epsilon
def vae_loss(X , output , E_mean, E_logsigma):
    # compute the average MSE error, then scale it up, ie. simply sum on all axes
    reconstruction_loss = 2 * metrics.mse(K.flatten(X), K.flatten(output))

    # compute the KL loss
    kl_loss = - 0.5 * K.sum(1 + E_logsigma - K.square(E_mean) - K.exp(E_logsigma), axis=-1)
    total_loss = K.mean(reconstruction_loss + kl_loss)

    return total_loss
  
def encoder(kernel, filter, rows, columns, channel):
    X = Input(shape=(rows, columns, channel))
    model = Conv2D(filters=filter, kernel_size=kernel, strides=2, padding='same')(X)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = LeakyReLU(alpha=0.2)(model)
    model = Conv2D(filters=filter*2, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = LeakyReLU(alpha=0.2)(model)
    model = Conv2D(filters=filter*4, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = LeakyReLU(alpha=0.2)(model)
    model = Conv2D(filters=filter*8, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = LeakyReLU(alpha=0.2)(model)
    model = Flatten()(model)
    mean = Dense(latent_dim)(model)
    logsigma = Dense(latent_dim, activation='tanh')(model)
    latent = Lambda(sampling, output_shape=(latent_dim,))([mean, logsigma])
    meansigma = Model([X], [mean, logsigma, latent])
    meansigma.compile(optimizer=SGDop, loss='mse')
    return meansigma

def decgen(kernel, filter, rows, columns, channel):
    X = Input(shape=(latent_dim,))
    model = Dense(2*2*256)(X)
    model = Reshape((2, 2, 256))(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = Activation('relu')(model)
    model = Conv2DTranspose(filters=filter*8, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = Activation('relu')(model)
    
    model = Conv2DTranspose(filters=filter*4, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = Activation('relu')(model)
    model = Conv2DTranspose(filters=filter*2, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = Activation('relu')(model)
    model = Conv2DTranspose(filters=filter, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = Activation('relu')(model)
    model = Conv2DTranspose(filters=channel, kernel_size=kernel, strides=2, padding='same')(model)
    model = Activation('tanh')(model)
    model = Model(X, model)
    model.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0001, beta_1=0.5), metrics=['accuracy'])
    return model

def discriminator(kernel, filter, rows, columns, channel):
    X = Input(shape=(rows, columns, channel))
    model = Conv2D(filters=filter*2, kernel_size=kernel, strides=2, padding='same')(X)
    model = LeakyReLU(alpha=0.2)(model)
    model = Conv2D(filters=filter*4, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = LeakyReLU(alpha=0.2)(model)
    model = Conv2D(filters=filter*8, kernel_size=kernel, strides=2, padding='same')(model)
    model = BatchNormalization(epsilon=1e-5)(model)
    model = LeakyReLU(alpha=0.2)(model)
    model = Conv2D(filters=filter*8, kernel_size=kernel, strides=2, padding='same')(model)

    dec = BatchNormalization(epsilon=1e-5)(model)
    dec = LeakyReLU(alpha=0.2)(dec)
    dec = Flatten()(dec)
    dec = Dense(1, activation='sigmoid')(dec)
    output = Model(X, dec)
    output.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5), metrics=['accuracy'])
                   
    return output
  
def VAEGAN(decgen,discriminator):
    # generator
    g = decgen
    # discriminator
    d = discriminator
    d.trainable = False
    # GAN
    gan = Sequential([g, d])
    
#     sgd=SGD()
    gan.compile(optimizer=Adam(lr=0.0001, beta_1=0.5), loss='binary_crossentropy')
    return g, d, gan
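
To make the `sampling` and `vae_loss` functions above more concrete, here is a minimal NumPy sketch of the same two ideas: the reparameterization trick and the KL term against a standard normal prior. The function names, the batch size of 4 and the latent size of 8 are assumptions chosen for illustration; they are not part of the original notebook.

```python
import numpy as np

def sample_latent(mean, log_var, rng):
    """Reparameterization trick: z = mean + exp(log_var / 2) * epsilon."""
    epsilon = rng.standard_normal(mean.shape)
    return mean + np.exp(log_var / 2.0) * epsilon

def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, I)), summed over the latent axis."""
    return -0.5 * np.sum(1.0 + log_var - mean ** 2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)      # arbitrary seed for the illustration
mean = np.zeros((4, 8))             # hypothetical batch of 4, latent size 8
log_var = np.zeros((4, 8))          # log-variance 0 means unit variance

z = sample_latent(mean, log_var, rng)
kl_at_prior = kl_to_standard_normal(mean, log_var)
kl_shifted = kl_to_standard_normal(np.ones((4, 8)), log_var)

print(z.shape)       # (4, 8)
print(kl_shifted)    # [4. 4. 4. 4.]
```

When the encoder output already matches the prior (zero mean, unit variance) the KL term vanishes, and shifting every mean to 1 costs 0.5 per latent dimension, which is why the second result is 4 for a latent size of 8.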

Once again, we define a few helper functions so that we can display images produced by the generator.

def get_image(image_path, width, height, mode):
    image = Image.open(image_path)
    #print(image.size)

    return np.array(image.convert(mode))

def show_imgs(generator):
    row=3
    col = 5
    noise = np.random.normal(0, 1, (row*col, latent_dim))
    gen_imgs = generator.predict(noise)

    # Rescale images 0 - 1
    gen_imgs = 0.5 * gen_imgs + 0.5

    fig, axs = plt.subplots(row, col)
    #fig.suptitle("DCGAN: Generated digits", fontsize=12)
    cnt = 0

    for i in range(row):
        for j in range(col):
            axs[i, j].imshow(gen_imgs[cnt, :, :, :])
            axs[i, j].axis('off')
            cnt += 1

    #plt.close()
    plt.show()
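
The rescaling step in `show_imgs` is worth a second look. The generator ends in a `tanh`, so its pixels lie in [-1, 1], while `imshow` expects floating-point RGB values in [0, 1]. The sketch below isolates that mapping; the helper name `to_display_range` and the sample values are made up for illustration, and the clipping is an extra safeguard that the original code does not include.

```python
import numpy as np

def to_display_range(images):
    """Map tanh-range pixels in [-1, 1] to [0, 1], clipping any rounding overshoot."""
    return np.clip(0.5 * images + 0.5, 0.0, 1.0)

# made-up pixel values standing in for generator output
fake_pixels = np.array([-1.0, -0.5, 0.0, 1.0])
scaled = to_display_range(fake_pixels)
print(scaled)   # [0.   0.25 0.5  1.  ]
```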

Note that the generator's parameters are updated by both the GAN training and the VAE training, because the VAE's decoder and the GAN's generator are the same network.

G, D, GAN = VAEGAN(decgen(5, filter_of_decgen, rows, columns, channel),
                   discriminator(5, filter_of_dis, rows, columns, channel))

# encoder
E = encoder(5, filter_of_encoder, rows, columns, channel)
print("This is the summary for encoder:")
E.summary()


# generator/decoder
# G = decgen(5, filter_of_decgen, rows, columns, channel)
print("This is the summary for decoder/generator:")
G.summary()


# discriminator
# D = discriminator(5, filter_of_dis, rows, columns, channel)
print("This is the summary for discriminator:")
D.summary()


D_fixed = discriminator(5, filter_of_dis, rows, columns, channel)
D_fixed.compile(optimizer=SGDop, loss='mse')

# gan
print("This is the summary for GAN:")
GAN.summary()

# VAE: chain the encoder and the shared decoder/generator
X = Input(shape=(rows, columns, channel))

E_mean, E_logsigma, Z = E(X)

output = G(Z)
# G_dec = G(E_mean + E_logsigma)
# D_fake, F_fake = D(output)
# D_fromGen, F_fromGen = D(G_dec)
# D_true, F_true = D(X)

# print("type(E)",type(E))
# print("type(output)",type(output))
# print("type(D_fake)",type(D_fake))

VAE = Model(X, output)
VAE.add_loss(vae_loss(X, output, E_mean, E_logsigma))
VAE.compile(optimizer=SGDop)

print("This is the summary for vae:")
VAE.summary()
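
When reading the summaries, it helps to know which spatial sizes to expect. The sketch below reproduces the arithmetic for the stride-2 layers used in `encoder` and `decgen`, assuming the usual Keras behaviour for `padding='same'` (a convolution yields `ceil(size / stride)` and a transposed convolution yields `size * stride`). The helper names are hypothetical.

```python
import math

def conv_same_size(size, stride=2):
    """Spatial size after Conv2D with padding='same'."""
    return math.ceil(size / stride)

def conv_transpose_same_size(size, stride=2):
    """Spatial size after Conv2DTranspose with padding='same'."""
    return size * stride

# encoder(): four stride-2 Conv2D layers applied to a 64x64 input
encoder_sizes = [64]
for _ in range(4):
    encoder_sizes.append(conv_same_size(encoder_sizes[-1]))

# decgen(): five stride-2 Conv2DTranspose layers applied to a 2x2 feature map
decoder_sizes = [2]
for _ in range(5):
    decoder_sizes.append(conv_transpose_same_size(decoder_sizes[-1]))

print(encoder_sizes)   # [64, 32, 16, 8, 4]
print(decoder_sizes)   # [2, 4, 8, 16, 32, 64]
```

If the last decoder size did not match `rows` and `columns`, the VAE reconstruction loss would fail on a shape mismatch, so this is a quick sanity check before a long training run.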

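In the cell below we train the model. As before, the discriminator, the GAN and the VAE are trained for different lengths of time. We emphasize the discriminator in the first half of training and train the generator more in the second half, because we want to improve the quality of the output images.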
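
The schedule just described can be sketched as a small helper. This is only an illustration of the idea: the function name `networks_to_update` is hypothetical, and the one-in-four ratio is taken from the training cell in this section, which updates the less-favoured network on every fourth iteration.

```python
def networks_to_update(step, total_steps):
    """Return (train_discriminator, train_generator_and_encoder) for one step."""
    every_fourth = step % 4 == 0
    if step < total_steps // 2:
        # first half: discriminator every step, generator/encoder every 4th step
        return True, every_fourth
    # second half: generator/encoder every step, discriminator every 4th step
    return every_fourth, True

total = 4000  # same value as `epochs` in the settings above
d_updates = sum(networks_to_update(s, total)[0] for s in range(total))
g_updates = sum(networks_to_update(s, total)[1] for s in range(total))
print(d_updates, g_updates)   # 2500 2500
```

Both networks end up with the same number of updates overall; what changes is when each one gets the bulk of its training.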

# We train our model in this cell

dLoss=[]
gLoss=[]
GLoss = 1
GlossEnc = 1
GlossGen = 1
Eloss = 1

halfbatch_size = int(batch_size*0.5)

for epoch in tqdm(range(epochs)):
    if epoch < int(epochs*0.5):
        noise = np.random.normal(0, 1, (halfbatch_size, latent_dim))
        index = np.random.randint(0,dataset.shape[0], halfbatch_size)
        images = dataset[index]  

        latent_vect = E.predict(images)[0]
        encImg = G.predict(latent_vect)
        fakeImg = G.predict(noise)

        # unfreeze the discriminator for its own updates
        D.trainable = True
        DlossTrue = D.train_on_batch(images, np.ones((halfbatch_size, 1)))
        DlossEnc = D.train_on_batch(encImg, np.ones((halfbatch_size, 1)))
        DlossFake = D.train_on_batch(fakeImg, np.zeros((halfbatch_size, 1)))

#         DLoss=np.add(DlossTrue,DlossFake)*0.5

        # approximate average of the three discriminator losses
        DLoss=np.add(DlossTrue,DlossEnc)
        DLoss=np.add(DLoss,DlossFake)*0.33
        # freeze it again before the generator is trained through the GAN
        D.trainable = False

        # in the first half, update the generator and encoder only on every fourth iteration
        if epoch % 4 == 0:
            noise = np.random.normal(0, 1, (batch_size, latent_dim))
            index = np.random.randint(0,dataset.shape[0], batch_size)
            images = dataset[index]  
            latent_vect = E.predict(images)[0]     
            
            GlossEnc = GAN.train_on_batch(latent_vect, np.ones((batch_size, 1)))
            GlossGen = GAN.train_on_batch(noise, np.ones((batch_size, 1)))
            Eloss = VAE.train_on_batch(images, None)   
            GLoss=np.add(GlossEnc,GlossGen)
            GLoss=np.add(GLoss,Eloss)*0.33
        dLoss.append([epoch,DLoss[0]]) 
        gLoss.append([epoch,GLoss])
    
    elif epoch >= int(epochs*0.5):
        # in the second half, update the discriminator only on every fourth iteration
        if epoch % 4 == 0:
            noise = np.random.normal(0, 1, (halfbatch_size, latent_dim))
            index = np.random.randint(0,dataset.shape[0], halfbatch_size)
            images = dataset[index]  

            latent_vect = E.predict(images)[0]
            encImg = G.predict(latent_vect)
            fakeImg = G.predict(noise)

            # unfreeze the discriminator for its own updates
            D.trainable = True
            DlossTrue = D.train_on_batch(images, np.ones((halfbatch_size, 1)))
        #     DlossEnc = D.train_on_batch(encImg, np.ones((halfbatch_size, 1)))
            DlossFake = D.train_on_batch(fakeImg, np.zeros((halfbatch_size, 1)))

            DLoss=np.add(DlossTrue,DlossFake)*0.5

#             DLoss=np.add(DlossTrue,DlossEnc)
#             DLoss=np.add(DLoss,DlossFake)*0.33
            # freeze it again before the generator is trained through the GAN
            D.trainable = False

        noise = np.random.normal(0, 1, (batch_size, latent_dim))
        index = np.random.randint(0,dataset.shape[0], batch_size)
        images = dataset[index]  
        latent_vect = E.predict(images)[0]
        
        GlossEnc = GAN.train_on_batch(latent_vect, np.ones((batch_size, 1)))
        GlossGen = GAN.train_on_batch(noise, np.ones((batch_size, 1)))
        Eloss = VAE.train_on_batch(images, None)   
        GLoss=np.add(GlossEnc,GlossGen)
        GLoss=np.add(GLoss,Eloss)*0.33
    
        dLoss.append([epoch,DLoss[0]]) 
        gLoss.append([epoch,GLoss])

    if epoch % plotInternal == 0 and epoch!=0:
        show_imgs(G)


    dLossArr= np.array(dLoss)
    gLossArr = np.array(gLoss)
    
#     print("dLossArr.shape:",dLossArr.shape)
#     print("gLossArr.shape:",gLossArr.shape)
    
    # save a checkpoint of all three networks every 51 iterations
    if epoch % 51 == 0:
        D.save_weights('/content/gdrive/My Drive/VAE discriminator_kernalsize5_proportion_32.h5')
        G.save_weights('/content/gdrive/My Drive/VAE generator_kernalsize5_proportion_32.h5')
        E.save_weights('/content/gdrive/My Drive/VAE encoder_kernalsize5_proportion_32.h5')

        
    if epoch%20 == 0:    
        print("epoch:", epoch + 1,"  ", "DislossTrue loss:",DlossTrue[0],"D accuracy:",100* DlossTrue[1], "DlossFake loss:", DlossFake[0],"GlossEnc loss:",
          GlossEnc, "GlossGen loss:",GlossGen, "Eloss loss:",Eloss)
#     print("loss:")
#     print("D:", DlossTrue, DlossEnc, DlossFake)
#     print("G:", GlossEnc, GlossGen)
#     print("VAE:", Eloss)

print('Training done,saving weights')
D.save_weights('/content/gdrive/My Drive/VAE discriminator_kernalsize5_proportion_32.h5')
G.save_weights('/content/gdrive/My Drive/VAE generator_kernalsize5_proportion_32.h5')
E.save_weights('/content/gdrive/My Drive/VAE encoder_kernalsize5_proportion_32.h5')


print('painting losses')
# At the end of training plot the losses vs epochs
plt.plot(dLossArr[:, 0], dLossArr[:, 1], label="Discriminator Loss")
plt.plot(gLossArr[:, 0], gLossArr[:, 1], label="Generator Loss")
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('GAN')
plt.grid(True)
plt.show()
print('end')

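If you plan to run this network, be aware that training takes a very long time. I would not attempt it unless you have access to some powerful GPUs or are willing to let the model run for an entire day.

Now that our VAE-GAN has finished training, we can look at the output images and compare them with those from our earlier GANs.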

# In this cell, we generate and visualize 15 images. 

show_imgs(G)


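We can see that this implementation of the VAE-GAN gives us a good model: it generates images that are clear and similar in style to the originals. It also creates images more robustly, without the extra noise we saw on the anime faces earlier. However, the model does not generalize very well. It seldom changes the pose or the gender of the character, so this is one point we could try to improve.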


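Final comments

It is not necessarily clear that any one of these models is better than the others, and none of the methods here has been properly optimized, so a fair comparison is difficult.

This is still an active area of research, so if you are interested I recommend challenging yourself and trying GANs in your own work to see what you can come up with.

I hope you enjoyed this three-part series on GANs and now have a better understanding of what they are, what they can do, and how to build your own.

Thank you for reading!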


Further reading

Running BigGAN in Colab:

More code help and examples:

Influential papers:

 

 

 

 

 

 
