Image-to-Image Translation with Conditional Adversarial Networks
Paper: https://arxiv.org/pdf/1611.07004.pdf
Code: https://github.com/affinelayer/Pix2Pix-tensorflow
Tips: a CVPR 2017 paper.
(Reading notes)
1. Main idea
- Use a conditional GAN to solve image-to-image translation problems: "a general-purpose solution to image-to-image translation problems."
- Instead of hand-designing the objective, "learn a loss function to train this mapping."
2. Intro
- By analogy with language translation, the paper defines the task: "we define automatic image-to-image translation as the task of translating one possible representation of a scene into another."
- Although CNNs have already achieved excellent results, a hand-specified objective is still needed: "In other words, we still have to tell the CNN what we wish it to minimize."
- Thanks to GANs, a high-dimensional loss function can be learned directly instead.
- Most prior related work learns structured losses between images; the paper then reviews the development of conditional GANs.
3. Details
- The objective is close to the original GAN objective, with an added L1 loss (reconstructed from the paper):
  L_cGAN(G, D) = E_{x,y}[log D(x, y)] + E_{x,z}[log(1 - D(x, G(x, z)))]
  L_L1(G) = E_{x,y,z}[||y - G(x, z)||_1]
  G* = arg min_G max_D L_cGAN(G, D) + lambda * L_L1(G)
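A minimal NumPy sketch of the combined generator objective above, assuming the discriminator returns raw logits and using lambda = 100 as in the paper (function names are illustrative, not from the official code):

```python
import numpy as np

def sigmoid_ce_with_logits(logits, label):
    # Numerically stable sigmoid cross-entropy against a constant label
    return np.maximum(logits, 0) - logits * label + np.log1p(np.exp(-np.abs(logits)))

def generator_loss(d_fake_logits, fake, target, lam=100.0):
    # cGAN term: push D's scores on G's output toward "real" (label 1)
    gan_term = sigmoid_ce_with_logits(d_fake_logits, 1.0).mean()
    # L1 term: keep G's output close to the ground-truth target image
    l1_term = np.abs(target - fake).mean()
    return gan_term + lam * l1_term
```

With zero logits (D is maximally uncertain) and a perfect reconstruction, the loss reduces to log 2 from the cGAN term alone; the L1 term then scales any pixel error by lambda.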
Note that without noise the generator would only learn a deterministic mapping (only producing outputs rigidly tied to its input), which is not good enough; in the paper, noise is injected via dropout.
- The generator is U-Net-like: an encoder-decoder with skip connections.
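The skip connections can be sketched with shape bookkeeping alone. A toy NumPy version, assuming average-pooling and nearest-neighbour resizing as stand-ins for the real strided and transposed convolutions, and far fewer layers than the actual U-Net:

```python
import numpy as np

def down(x):
    # Halve H and W (stand-in for a stride-2 conv in the encoder)
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up(x):
    # Double H and W (stand-in for a transposed conv in the decoder)
    return x.repeat(2, axis=1).repeat(2, axis=2)

x = np.random.rand(3, 256, 256)           # input image, (C, H, W)
e1 = down(x)                              # encoder feature map: 128x128
e2 = down(e1)                             # encoder feature map: 64x64
d2 = np.concatenate([up(e2), e1], axis=0) # decoder + skip: e1's channels concatenated
d1 = up(d2)                               # back to 256x256
```

The skip connection is just a channel-wise concatenation of the mirrored encoder layer, so low-level structure (edges, layout) can bypass the bottleneck.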
- The discriminator is Markovian (a PatchGAN): instead of judging the whole image at once, it classifies the image patch by patch and averages the scores. "This discriminator tries to classify if each patch in an image is real or fake."
- As a result it can "produce high quality results; has fewer parameters, runs faster, and can be applied to arbitrarily large images."
- However, the code implementation looks just like any other GAN, with no explicit patch setting to be found; the explanation:
"The difference between a PatchGAN and a regular GAN discriminator is that the regular GAN maps from a 256x256 image to a single scalar output, which signifies 'real' or 'fake', whereas the PatchGAN maps from a 256x256 image to an array of outputs, where each output signifies whether the corresponding patch in the image is real or fake."
Reference: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/39
"Maybe it would have been better if we called it a 'Fully Convolutional GAN'; like in FCNs, it is the same idea."
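The "patches" therefore come entirely from the convolutional receptive field, and can be checked with plain arithmetic. A sketch for the default 70x70 PatchGAN, assuming its standard architecture of five 4x4 conv layers with strides 2, 2, 2, 1, 1 and padding 1:

```python
# Each tuple is (kernel, stride, padding) of one conv layer in the PatchGAN.
LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]

def output_size(size, layers):
    # Standard conv output-size formula, applied layer by layer
    for k, s, p in layers:
        size = (size + 2 * p - k) // s + 1
    return size

def receptive_field(layers):
    # Walk backwards: each output unit sees (rf - 1) * stride + kernel inputs
    rf = 1
    for k, s, _ in reversed(layers):
        rf = (rf - 1) * s + k
    return rf

print(output_size(256, LAYERS))   # 30: a 30x30 grid of patch scores
print(receptive_field(LAYERS))    # 70: each score judges one 70x70 patch
```

So no explicit patch extraction is needed in the code: a 256x256 input simply comes out as a 30x30 grid of scores, each covering one (overlapping) 70x70 region of the input.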