Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

This repository contains pre-trained models and sampling code for the 3D Generative Adversarial Network (3D-GAN) presented at NIPS 2016.

http://3dgan.csail.mit.edu

Paper Introduction

We propose 3D-GAN, which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets. The benefits of our model are three-fold: first, the use of an adversarial criterion, instead of traditional heuristic criteria, enables the generator to capture object structure implicitly and to synthesize high-quality 3D objects; second, the generator establishes a mapping from a low-dimensional probabilistic space to the space of 3D objects, so that we can sample objects without a reference image or CAD models, and explore the 3D object manifold; third, the adversarial discriminator provides a powerful 3D shape descriptor which, learned without supervision, has wide applications in 3D object recognition.

A generative model should be able to go beyond memorizing and recombining parts or pieces from a pre-defined repository to produce novel shapes; and for objects to be realistic, there need to be fine details in the generated examples.

With the introduction of large 3D CAD datasets such as ShapeNet, several interesting ideas have emerged, for example voxel-based deep representations. Unlike retrieval-based approaches, these synthesize 3D objects from learned deep representations: a 2D image is implicitly encoded into a deep feature representation, from which the 3D shape is then generated. The method in this paper combines generative adversarial networks with volumetric convolutional networks, using a discriminator to judge whether a 3D object is synthesized or real.
The authors sample 3D objects from a Gaussian or uniform distribution and train in an unsupervised manner. The discriminator, like a conventional 3D object classifier, can be used to judge whether an input is a real 3D object. In addition, the authors experiment with a VAE that takes a 2D image as input, first produces its latent vector representation, and then generates the corresponding 3D object; this of course comes with a restriction, namely that input images must come from a limited domain rather than being arbitrary.

Related work in recent years includes: learning a joint embedding of 3D shapes and synthesized images; learning discriminative representations for 3D object recognition; 3D reconstruction with recurrent networks; attempts at generating 3D shapes; and methods that use images within a 3D-to-2D projection process. Most of these are trained with supervision and can be used for 3D shape retrieval, classification, and reconstruction.
Network structure  Inspired by Radford et al. [2016], we design an all-convolutional neural network to generate 3D objects. As shown in Figure 1, the generator consists of five volumetric fully convolutional layers of kernel sizes 4 × 4 × 4 and strides 2, with batch normalization and ReLU layers added in between and a Sigmoid layer at the end. The discriminator basically mirrors the generator, except that it uses Leaky ReLU [Maas et al., 2013] instead of ReLU layers. There are no pooling or linear layers in our network.
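
Below is a minimal PyTorch sketch of the architecture described above. The five-layer layout with 4 × 4 × 4 kernels, stride 2, batch normalization, ReLU/Leaky ReLU, and a final Sigmoid follows the text; the channel widths (512 → 256 → 128 → 64 → 1), the 200-dimensional latent code, and the Leaky ReLU slope of 0.2 are assumptions based on the paper's figure and DCGAN conventions. This is an illustrative reimplementation, not the authors' original Torch code.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        """Maps a latent code z to a 64 x 64 x 64 voxel occupancy grid."""
        def __init__(self, z_dim=200):
            super().__init__()
            self.net = nn.Sequential(
                # z is reshaped to (z_dim, 1, 1, 1); the first layer expands it to 4x4x4
                nn.ConvTranspose3d(z_dim, 512, kernel_size=4, stride=1, padding=0),
                nn.BatchNorm3d(512), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(512, 256, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm3d(256), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(256, 128, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm3d(128), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm3d(64), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(64, 1, kernel_size=4, stride=2, padding=1),
                nn.Sigmoid(),  # voxel occupancy probabilities
            )

        def forward(self, z):
            return self.net(z.view(z.size(0), -1, 1, 1, 1))

    class Discriminator(nn.Module):
        """Mirrors the generator; outputs the probability that a voxel grid is real."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
                nn.Conv3d(64, 128, 4, stride=2, padding=1),
                nn.BatchNorm3d(128), nn.LeakyReLU(0.2, inplace=True),
                nn.Conv3d(128, 256, 4, stride=2, padding=1),
                nn.BatchNorm3d(256), nn.LeakyReLU(0.2, inplace=True),
                nn.Conv3d(256, 512, 4, stride=2, padding=1),
                nn.BatchNorm3d(512), nn.LeakyReLU(0.2, inplace=True),
                nn.Conv3d(512, 1, 4, stride=1, padding=0),  # 4x4x4 -> 1x1x1 score
                nn.Sigmoid(),
            )

        def forward(self, x):
            return self.net(x).view(-1)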

Training details  A straightforward training procedure is to update both the generator and the discriminator in every batch. However, the discriminator usually learns much faster than the generator, possibly because generating objects in a 3D voxel space is more difficult than differentiating between real and synthetic objects [Goodfellow et al., 2014, Radford et al., 2016]. It then becomes hard for the generator to extract signals for improvement from a discriminator that is way ahead, as all examples it generated would be correctly identified as synthetic with high confidence. Therefore, to keep the training of both networks in pace, we employ an adaptive training strategy: for each batch, the discriminator only gets updated if its accuracy in the last batch is not higher than 80%. We observe this helps to stabilize the training and to produce better results. We set the learning rate of G to 0.0025, D to 10⁻⁵, and use a batch size of 100. We use ADAM [Kingma and Ba, 2015] for optimization, with β = 0.5.
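
The adaptive strategy can be sketched as follows, assuming the Generator and Discriminator classes above, a standard binary cross-entropy GAN loss, and a hypothetical real_batches iterable of 1 × 64 × 64 × 64 voxel tensors. Only the 80% gate, the learning rates, and ADAM with β (interpreted here as β₁) = 0.5 come from the paper; everything else is an assumption.

    import torch
    import torch.nn as nn

    def train(G, D, real_batches, z_dim=200, device="cpu"):
        bce = nn.BCELoss()
        opt_g = torch.optim.Adam(G.parameters(), lr=2.5e-3, betas=(0.5, 0.999))  # lr of G: 0.0025
        opt_d = torch.optim.Adam(D.parameters(), lr=1e-5, betas=(0.5, 0.999))    # lr of D: 1e-5
        last_d_acc = 0.0  # discriminator accuracy on the previous batch

        for real in real_batches:  # batches of 100 objects in the paper
            real = real.to(device)
            n = real.size(0)
            fake = G(torch.randn(n, z_dim, device=device))

            # Discriminator step: skipped whenever it was more than 80% accurate last batch.
            d_real, d_fake = D(real), D(fake.detach())
            if last_d_acc <= 0.8:
                loss_d = bce(d_real, torch.ones(n, device=device)) + \
                         bce(d_fake, torch.zeros(n, device=device))
                opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            # Accuracy on this batch (pre-update outputs) gates the next batch's D update.
            last_d_acc = ((d_real > 0.5).float().mean()
                          + (d_fake <= 0.5).float().mean()).item() / 2

            # Generator step: always updated.
            loss_g = bce(D(fake), torch.ones(n, device=device))
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
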
The experiments mainly cover:
1. visualizing what the discriminator has learned;
2. checking whether the generator merely memorizes the training data;
3. interpolation: linearly interpolating from one object class to another, e.g. morphing a car into a boat;
4. shape arithmetic, performed on the latent vectors: subtracting one object's code from another's removes the corresponding parts, and adding two codes combines the objects (see the sketch below).
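
The interpolation and arithmetic experiments can be illustrated with a small sketch, assuming the Generator above; z_a, z_b, z_c stand for hypothetical latent codes (e.g. ones that decode to a car and a boat). These functions illustrate the idea, not the paper's actual evaluation code.

    import torch

    def interpolate(G, z_a, z_b, steps=8):
        """Linearly interpolate between two latent codes and decode each step."""
        alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
        zs = (1 - alphas) * z_a + alphas * z_b   # (steps, z_dim)
        with torch.no_grad():
            return G(zs)                         # (steps, 1, 64, 64, 64) voxel grids

    def shape_arithmetic(G, z_a, z_b, z_c):
        """Decode z_a - z_b + z_c, analogous to word-vector arithmetic."""
        with torch.no_grad():
            return G((z_a - z_b + z_c).unsqueeze(0))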

The last part is how the mapping from 2D images to 3D objects (the 3D-VAE-GAN variant) is trained: an image encoder first maps the input image to a latent vector, and the generator then decodes it into the corresponding 3D object. A rough sketch follows below.
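
This sketch is made under heavy assumptions: the 2D encoder layout and input handling below are placeholders for illustration (the paper adapts a VAE image encoder whose exact layers are not reproduced here), and only the overall flow, image → 200-dimensional latent code → Generator → voxels, follows the text.

    import torch
    import torch.nn as nn

    class ImageEncoder(nn.Module):
        """Hypothetical VAE-style encoder: RGB image -> 200-d latent code."""
        def __init__(self, z_dim=200):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(inplace=True),
                nn.Conv2d(128, 256, 5, stride=2, padding=2), nn.ReLU(inplace=True),
                nn.Conv2d(256, 512, 5, stride=2, padding=2), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
            )
            self.mu = nn.Linear(512, z_dim)       # mean of q(z | image)
            self.logvar = nn.Linear(512, z_dim)   # log-variance of q(z | image)

        def forward(self, img):
            h = self.features(img).flatten(1)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
            return z, mu, logvar

    # Usage sketch: z, mu, logvar = ImageEncoder()(image); voxels = Generator()(z)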


The paper emphasizes unsupervised learning in several places, but does not explain how the unsupervised training is actually carried out.