[paper] LAPGAN

(NIPS 2015) Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
Paper: http://arxiv.org/abs/1506.05751
Code: https://github.com/facebook/eyescream

In this paper we introduce a generative parametric model capable of producing high quality samples of natural images.

Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion.

Introduction

Building a good generative model of natural images has been a fundamental problem within computer vision.

However, images are complex and high dimensional, making them hard to model well, despite extensive efforts.

We exploit the multiscale structure of natural images, building a series of generative models, each of which captures image structure at a particular scale of a Laplacian pyramid [1].

At each scale we train a convolutional network-based generative model using the Generative Adversarial Networks (GAN) approach of Goodfellow et al. [11]. Samples are drawn in a coarse-to-fine fashion, commencing with a low-frequency residual image.

The second stage samples the band-pass structure at the next level, conditioned on the sampled residual.

Approach

Generative Adversarial Networks

Laplacian Pyramid

The Laplacian pyramid [1] is a linear invertible image representation consisting of a set of band-pass images, spaced an octave apart, plus a low-frequency residual.
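To make the invertibility concrete, here is a minimal NumPy sketch of building and collapsing a Laplacian pyramid. The averaging downsample and nearest-neighbour upsample are simple stand-ins for the paper's blur/upsample operators; with the construction h_k = I_k − u(d(I_k)), reconstruction is exact by design.

```python
import numpy as np

def downsample(img):
    # Blur-and-subsample by averaging 2x2 blocks (stand-in for the paper's blur).
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbour upsample by a factor of two (stand-in operator).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return [h_0, ..., h_{K-1}, I_K]: band-pass images plus the residual."""
    pyramid, current = [], img
    for _ in range(levels):
        down = downsample(current)
        pyramid.append(current - upsample(down))  # band-pass coefficients h_k
        current = down
    pyramid.append(current)  # low-frequency residual I_K
    return pyramid

def reconstruct(pyramid):
    # Invert the pyramid: upsample the residual and add back each band-pass image.
    current = pyramid[-1]
    for h in reversed(pyramid[:-1]):
        current = upsample(current) + h
    return current

img = np.random.rand(64, 64)
pyr = build_laplacian_pyramid(img, 3)
```

With 3 levels on a 64×64 image, the residual is 8×8, matching the coarsest scale used in the paper's experiments.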

Laplacian Generative Adversarial Networks (LAPGAN)

Our proposed approach combines the conditional GAN model with a Laplacian pyramid representation.

The generative models {G0, …, GK} are trained using the CGAN approach at each level of the pyramid.

Specifically, we construct a Laplacian pyramid from each training image I. At each level we make a stochastic choice (with equal probability) to either

(i) construct the coefficients hk using the standard procedure from Eqn. 3,

or

(ii) generate them using Gk.

Figure 1: The sampling procedure for our LAPGAN model.

We start with a noise sample z3 (right side) and use a generative model G3 to generate I~3 .

This is upsampled (green arrow) and then used as the conditioning variable (orange arrow) l2 for the generative model at the next level, G2 .

Together with another noise sample z2 , G2 generates a difference image h~2 which is added to l2 to create I~2 .

This process repeats across two subsequent levels to yield a final full resolution sample I~0 .
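The coarse-to-fine sampling loop above can be sketched as follows. The "generators" here are trivial stand-ins (a trained G_k would be a conditional convnet that actually uses the conditioning image l_k); the point is the control flow: start from noise at the coarsest level, then repeatedly upsample, sample a band-pass image, and add it.

```python
import numpy as np

def upsample(img):
    # Nearest-neighbour upsample by two (stand-in for the paper's upsampling).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def g_coarsest(z):
    # Stand-in for G_3: maps noise to an 8x8 residual image.
    return z

def g_level(z, l):
    # Stand-in for G_k: a real model maps (noise, conditioning image l) to h~_k.
    return 0.1 * z

def sample(num_levels=3, coarse_size=8, seed=0):
    rng = np.random.default_rng(seed)
    # Start with a noise sample at the coarsest level: I~_3 = G_3(z_3).
    current = g_coarsest(rng.standard_normal((coarse_size, coarse_size)))
    for _ in range(num_levels):
        l = upsample(current)                       # conditioning variable l_k
        h = g_level(rng.standard_normal(l.shape), l)  # band-pass sample h~_k
        current = l + h                              # I~_k = l_k + h~_k
    return current

x = sample()
```

Three doublings take the 8×8 coarse sample up to the 64×64 full-resolution output.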

Figure 2: The training procedure for our LAPGAN model.

Starting with a 64x64 input image I from our training set (top left):

(i) we take I0=I and blur and downsample it by a factor of two (red arrow) to produce I1 ;

(ii) we upsample I1 by a factor of two (green arrow), giving a low-pass version l0 of I0 ;

(iii) with equal probability we use l0 to create either a real or a generated example for the discriminative model D0 .

In the real case (blue arrows), we compute the high-pass h0 = I0 − l0, which is input to D0, which computes the probability of it being real vs. generated.

In the generated case (magenta arrows), the generative network G0 receives as input a random noise vector z0 and l0 . It outputs a generated high-pass image h~0=G0(z0,l0) , which is input to D0 .

In both the real/generated cases, D0 also receives l0 (orange arrow).

Optimizing Eqn. 2, G0 thus learns to generate realistic high-frequency structure h~0 consistent with the low-pass image l0 .

The same procedure is repeated at scales 1 and 2, using I1 and I2 .

Note that the models at each level are trained independently.

At level 3, I3 is an 8×8 image, simple enough to be modeled directly with a standard GAN, G3 & D3.
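The per-level training procedure above can be sketched as the construction of a single (input, label) pair for the discriminator at one pyramid level. The stochastic real/generated choice, the low-pass conditioning image, and the real high-pass computation all follow the steps in Figure 2; the generator argument is a hypothetical stand-in for a trained G_k.

```python
import numpy as np

def downsample(img):
    # Blur-and-subsample by averaging 2x2 blocks (stand-in for the paper's blur).
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbour upsample by two (stand-in operator).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def make_training_pair(I_k, generator, rng):
    """Build one ((h, l), label) training example for discriminator D_k.

    `generator` stands in for G_k; labels: 1 = real, 0 = generated.
    """
    l_k = upsample(downsample(I_k))  # low-pass conditioning image l_k
    if rng.random() < 0.5:
        h_k = I_k - l_k              # real band-pass coefficients (Eqn. 3)
        label = 1
    else:
        z = rng.standard_normal(I_k.shape)
        h_k = generator(z, l_k)      # generated h~_k = G_k(z_k, l_k)
        label = 0
    # D_k receives both the (real or generated) high-pass image and l_k.
    return (h_k, l_k), label

rng = np.random.default_rng(0)
I1 = np.random.rand(32, 32)
(inputs, label) = make_training_pair(I1, lambda z, l: 0.1 * z, rng)
```

Because the models at each level are trained independently, this routine would be run separately per scale, each with its own G_k and D_k.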

Model Architecture & Training

CIFAR10 and STL10

LSUN

Experiments

Evaluation of Log-Likelihood

Model Samples

Human Evaluation of Samples

Discussion
