MyDLNote - Network: [18ECCV] Image Inpainting for Irregular Holes Using Partial Convolutions

Image Inpainting for Irregular Holes Using Partial Convolutions

My posts aim to extract the main information an article conveys: they are neither full translations nor rough summaries. The paper's motivation and the network design details are what these posts focus on.

Contents

Image Inpainting for Irregular Holes Using Partial Convolutions

Abstract

Introduction

Approach

Partial Convolutional Layer

Network Architecture and Implementation

Loss Functions


Abstract

Existing deep learning based image inpainting methods use a standard convolutional network over the corrupted image, using convolutional filter responses conditioned on both valid pixels as well as the substitute values in the masked holes (typically the mean value). This often leads to artifacts such as color discrepancy and blurriness. Postprocessing is usually used to reduce such artifacts, but it is expensive and may fail. We propose the use of partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels. We further include a mechanism to automatically generate an updated mask for the next layer as part of the forward pass. Our model outperforms other methods for irregular masks. We show qualitative and quantitative comparisons with other methods to validate our approach.

The standard convolutional approach: during inpainting, the convolution slides over the entire image, so its output is computed from both the valid pixels and the substitute values in the masked holes, although the latter should not be included. This makes post-processing necessary, which is usually expensive and may still fail.

PConv in this paper: 1) the convolution is masked; 2) the renormalization conditions the output on valid pixels only; 3) an updated mask for the next layer is generated automatically as part of the forward pass.

 


Introduction

Previous deep learning approaches have focused on rectangular regions located around the center of the image, and often rely on expensive post-processing. The goal of this work is to propose a model for image inpainting that operates robustly on irregular hole patterns (see Fig. 1), and produces semantically meaningful predictions that incorporate smoothly with the rest of the image without the need for any additional post-processing or blending operation.

Goals of this paper: inpaint irregular holes, and produce semantically meaningful predictions without relying on post-processing or blending.

To properly handle irregular masks, we propose the use of a Partial Convolutional Layer, comprising a masked and re-normalized convolution operation followed by a mask-update step. The concept of a masked and re-normalized convolution is also referred to as segmentation-aware convolutions in [Segmentation-aware convolutional networks using local attention masks] for the image segmentation task; however, they did not make modifications to the input mask.

How irregular holes are handled: to deal with irregular masks properly, the paper proposes the Partial Convolutional Layer, which consists of a masked and re-normalized convolution followed by a mask-update step. For image segmentation, the idea of a masked and re-normalized convolution is also known as segmentation-aware convolution in [Segmentation-aware convolutional networks using local attention masks], but that work does not modify the input mask.

Our use of partial convolutions is such that given a binary mask our convolutional results depend only on the non-hole regions at every layer. Our main extension is the automatic mask update step, which removes any masking where the partial convolution was able to operate on an unmasked value. Given sufficient layers of successive updates, even the largest masked holes will eventually shrink away, leaving only valid responses in the feature map. The partial convolutional layer ultimately makes our model agnostic to placeholder hole values.

The partial convolution strategy: given a binary mask, the convolution results at every layer depend only on the non-hole regions. The main extension is the automatic mask-update step, which removes the mask wherever the partial convolution was able to operate on at least one unmasked value. With enough successive layers, even the largest holes eventually shrink away, leaving only valid responses in the feature map. The partial convolutional layer thus makes the model agnostic to the placeholder values in the holes.

 


Approach

Partial Convolutional Layer

My understanding: partial convolutional layer = partial convolution + mask update

partial convolution:

We refer to our partial convolution operation and mask update function jointly as the Partial Convolutional Layer. Let W be the convolution filter weights and b the corresponding bias. X are the feature values (pixel values) for the current convolution (sliding) window and M is the corresponding binary mask. The partial convolution at every location, similarly defined in [Segmentation-aware convolutional networks using local attention masks], is expressed as:

x' = \begin{cases} W^T (X \odot M) \, \frac{\mathrm{sum}(\mathbf{1})}{\mathrm{sum}(M)} + b, & \text{if } \mathrm{sum}(M) > 0 \\ 0, & \text{otherwise} \end{cases}

where \odot denotes element-wise multiplication, and \mathbf{1} has the same shape as M but with all elements being 1. As can be seen, output values depend only on the unmasked inputs. The scaling factor \mathrm{sum}(\mathbf{1})/\mathrm{sum}(M) applies appropriate scaling to adjust for the varying amount of valid (unmasked) inputs.

First, the definition of the partial convolution.

The scaling factor sum(1)/sum(M) adjusts for the varying number of valid (unmasked) inputs.

The output depends only on the unmasked regions of the input.

mask update:

After each partial convolution operation, we then update our mask as follows: if the convolution was able to condition its output on at least one valid input value, then we mark that location to be valid. This is expressed as:

m' = \begin{cases} 1, & \text{if } \mathrm{sum}(M) > 0 \\ 0, & \text{otherwise} \end{cases}

and can easily be implemented in any deep learning framework as part of the forward pass.

The mask-update rule: if the convolution was able to condition its output on at least one valid input value, that location is marked as valid.

This can be implemented as part of the forward pass in any deep learning framework.
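The two operations above can be sketched together in NumPy. This is a minimal single-channel sketch under my own assumptions (the name `partial_conv2d`, a "valid" sliding-window layout, and a scalar bias), not the paper's implementation:

```python
import numpy as np

def partial_conv2d(x, mask, w, b=0.0, stride=1):
    """One partial-convolution layer (single channel, 'valid' windows).

    x:    (H, W) feature map; hole pixels may hold any placeholder value
    mask: (H, W) binary mask, 1 = valid pixel, 0 = hole
    w:    (k, k) filter weights, b: scalar bias
    Returns the output feature map and the updated mask.
    """
    k = w.shape[0]
    H, W = x.shape
    out_h, out_w = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros((out_h, out_w))
    win_size = float(k * k)                  # sum(1) for a k x k window
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            X = x[r:r + k, c:c + k]
            M = mask[r:r + k, c:c + k]
            valid = M.sum()
            if valid > 0:
                # x' = W^T (X ⊙ M) · sum(1)/sum(M) + b
                out[i, j] = (w * X * M).sum() * (win_size / valid) + b
                new_mask[i, j] = 1.0         # >= 1 valid input: mark valid
            # else: output stays 0 and the mask location stays 0
    return out, new_mask
```

Because hole pixels are multiplied by M = 0, changing their placeholder values leaves the output untouched, which is exactly the "agnostic to placeholder hole values" property.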

 

Network Architecture and Implementation

Network Design. We design a UNet-like architecture similar to the one used in [pix2pix], replacing all convolutional layers with partial convolutional layers and using nearest neighbor up-sampling in the decoding stage. The skip links will concatenate two feature maps and two masks respectively, acting as the feature and mask inputs for the next partial convolution layer. The last partial convolution layer’s input will contain the concatenation of the original input image with hole and original mask, making it possible for the model to copy non-hole pixels. Network details are found in the supplementary file.

Partial Convolution as Padding. We use the partial convolution with appropriate masking at image boundaries in lieu of typical padding. This ensures that the inpainted content at the image border will not be affected by invalid values outside of the image – which can be interpreted as another hole.

The network is a UNet-like architecture as in pix2pix, with every convolution replaced by PConv; the input to the last partial convolution layer concatenates, channel-wise, the original image with holes and the original mask (a residual-style shortcut), which lets the model copy non-hole pixels directly.
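The skip links carry masks alongside features, so the decoder's next partial convolution receives a matched feature/mask pair. A sketch of that concatenation, with `skip_concat` and the channel-first shapes as my own illustrative assumptions:

```python
import numpy as np

def skip_concat(dec_feat, dec_mask, enc_feat, enc_mask):
    """UNet skip link in this design: concatenate the decoder's feature map
    and mask with the encoder's, channel-wise, so the next partial conv
    layer gets one feature tensor and one mask tensor of matching shape."""
    feat = np.concatenate([dec_feat, enc_feat], axis=0)   # (C1+C2, H, W)
    mask = np.concatenate([dec_mask, enc_mask], axis=0)
    return feat, mask
```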

Conventional padding fills the feature-map border directly with zeros. Here, the border is instead treated as just another hole, which PConv learns to fill.
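A minimal sketch of this idea, assuming zero padding of width 1 and a single channel (the helper `pconv_corner` is hypothetical, not the authors' code): the padded ring gets mask 0, so the renormalization factor sum(1)/sum(M) rescales the border response instead of letting the zero padding dilute it.

```python
import numpy as np

def pconv_corner(x, w, b=0.0, pad=1):
    """Partial-conv output at the top-left corner of a 'same'-padded image.

    The padding is treated as a hole: padded pixels get mask = 0, so their
    (arbitrary) values never enter the sum, and sum(1)/sum(M) compensates
    for the pixels that fall outside the image.
    """
    k = w.shape[0]
    xp = np.pad(x, pad)                  # padded values are placeholders
    mp = np.pad(np.ones_like(x), pad)    # 1 inside the image, 0 outside
    X, M = xp[:k, :k], mp[:k, :k]
    return (w * X * M).sum() * (k * k / M.sum()) + b
```

For a constant image of 2s and an all-ones 3x3 filter, the corner window sees only 4 in-image pixels (sum 8), and the factor 9/4 rescales that to 18, the same response a fully valid window would give; plain zero padding would leave the diluted value 8.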

 

Loss Functions

per-pixel losses

perceptual loss

style-loss

total variation (TV) loss

Removing Checkerboard Artifacts and Fish Scale Artifacts

Perceptual loss is known to generate checkerboard artifacts. Johnson et al. suggest ameliorating the problem by using the total variation (TV) loss. We found this not to be the case for our model. Figure 3(b) shows the result of the model trained by removing L_{style_{out}} and L_{style_{comp}} from L_{total}. For our model, the additional style loss term is necessary. However, not all loss weighting schemes for the style loss generate plausible results. Figure 3(f) shows the result of the model trained with a small style loss weight. Compared to the result of the model trained with the full L_{total} in Figure 3(g), it has many fish scale artifacts. However, perceptual loss is also important; grid-shaped artifacts are less prominent in the results with full L_{total} (Figure 3(k)) than in the results without perceptual loss (Figure 3(j)). We hope this discussion will be useful to readers interested in employing VGG-based high-level losses.
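The combination of the four loss families can be sketched as a weighted sum. The weights below are the ones reported in the paper's total loss; the individual terms are assumed to be computed elsewhere (per-pixel, VGG-feature, Gram-matrix, and TV computations), and `total_loss` is a hypothetical helper:

```python
def total_loss(l_valid, l_hole, l_perc, l_style_out, l_style_comp, l_tv):
    """L_total = L_valid + 6 L_hole + 0.05 L_perceptual
                 + 120 (L_style_out + L_style_comp) + 0.1 L_tv

    Per the ablation above: dropping the style terms brings back
    checkerboard artifacts, a too-small style weight produces fish-scale
    artifacts, and dropping the perceptual term leaves grid-shaped
    artifacts.
    """
    return (l_valid + 6.0 * l_hole + 0.05 * l_perc
            + 120.0 * (l_style_out + l_style_comp) + 0.1 * l_tv)
```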
