1. What is the contribution of this paper?
This paper proposes a joint upsampling module named Joint Pyramid Upsampling (JPU), obtained by formulating the task of extracting high-resolution feature maps as a joint upsampling problem. The module can replace the dilated convolutions embedded in the backbone for extracting high-resolution feature maps, resulting in lower computation complexity and memory footprint with no performance loss.
2. What is the method?
Main idea: the features extracted by a dilated convolution can be approximated by a regular convolution whose input is downsampled from the input of the dilated convolution. Formally, given an input feature map $x$ and $x_0$, where $x_0$ is downsampled from $x$ (usually by a regular convolution with stride $\ge 2$), the output of a dilated convolution on $x$ can be approximated from the output $y_0 = C_r(x_0)$ of a regular convolution $C_r$ on $x_0$. Mathematically,
$$\hat{h} = \arg\min_{h \in \mathcal{H}} \lVert y_0 - h(x_0) \rVert, \qquad \hat{y} = \hat{h}(x),$$
where $\mathcal{H}$ is a set of possible transformations. This is exactly the formulation of joint upsampling: learn the mapping $h$ from the low-resolution pair $(x_0, y_0)$, then apply it to the high-resolution input $x$. Thus, the key of the method is how to find the mapping $h$. The authors design the JPU (Joint Pyramid Upsampling) module to simulate this optimization process. The goal is to produce a feature map with output stride (OS) 8 when the last backbone feature map has a downsampling rate of 32. See the JPU module figure in the paper.
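The split-convolve-merge equivalence that underlies this observation (a dilated convolution equals regular convolutions applied to phase-subsampled inputs, then interleaved) can be checked numerically. This is a minimal 1-D sketch under my own notation, not code from the paper:

```python
import numpy as np

def dilated_conv1d(x, k, rate):
    """Valid 1-D dilated convolution (cross-correlation form)."""
    K = len(k)
    out_len = len(x) - (K - 1) * rate
    return np.array([sum(k[j] * x[i + j * rate] for j in range(K))
                     for i in range(out_len)])

def regular_conv1d(x, k):
    # A regular convolution is a dilated convolution with rate 1.
    return dilated_conv1d(x, k, rate=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(12)   # input feature (1-D for simplicity)
k = rng.standard_normal(3)    # shared kernel
r = 2                         # dilation rate

y_dilated = dilated_conv1d(x, k, r)

# Split the input into r phase-subsequences, convolve each with the
# *regular* kernel, then merge (interleave) the per-phase outputs.
phases = [regular_conv1d(x[p::r], k) for p in range(r)]
y_merged = np.empty_like(y_dilated)
for p in range(r):
    y_merged[p::r] = phases[p]

assert np.allclose(y_dilated, y_merged)
```

The phase-subsampled inputs here play the role of the downsampled $x_0$: operating on them with a regular convolution reproduces the dilated result exactly, which is why recovering the high-resolution output reduces to an upsampling problem.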
3. My comment
This paper redefines the relationship between dilated convolution and regular convolution with stride ≥ 2 as a joint upsampling problem, and gives a reasonable proof; the detailed derivation can be found in Sec. 3.3 of the paper. It then proposes the Joint Pyramid Upsampling module to solve for the optimal mapping $h$. However, I think the authors did not explain clearly why JPU can solve for $h$. For example, if the trained JPU module is regarded as the optimized $\hat{h}$, what are the high-resolution input, the low-resolution input, and the low-resolution target, all of which are essential in a joint upsampling problem? Thus, it is unreasonable to conclude that JPU solves for $h$. I think JPU just collects richer contextual information for prediction, and can be considered an extension of ASPP.
4. Something worth thinking about
4.1 Why is the computation complexity and memory footprint lower?
In the paper, the authors state: “Compared to DilatedFCN, our method takes 4 times fewer computation and memory resources in 23 residual blocks (69 layers) and 16 times fewer in 3 blocks (9 layers) when the backbone is ResNet-101.” Why 4 times and 16 times? In DilatedFCN, the conv4 feature map's spatial side length is 2 times larger than in the original backbone, and conv5's is 4 times larger, so why isn't the factor 2 times and 4 times? How is it calculated?
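One possible reading of the 4×/16× numbers, under the assumption (mine, not stated this way in the paper) that a residual block's FLOPs and activation memory scale roughly linearly with the spatial *area* of its feature map: halving the output stride doubles each spatial dimension, so the area, and hence the cost, grows by the square of the per-dimension factor:

```python
def area_ratio(os_dilated, os_regular):
    """Ratio of feature-map areas when output stride changes.

    Each halving of the output stride doubles both H and W,
    so the area (and, roughly, per-block cost) is the *square*
    of the per-dimension ratio.
    """
    return (os_regular / os_dilated) ** 2

# conv4 (23 residual blocks): DilatedFCN runs at OS=8 instead of OS=16
ratio_conv4 = area_ratio(8, 16)   # 2x per dimension -> 4x area

# conv5 (3 residual blocks): DilatedFCN runs at OS=8 instead of OS=32
ratio_conv5 = area_ratio(8, 32)   # 4x per dimension -> 16x area

print(ratio_conv4, ratio_conv5)
```

If this reading is right, the "2 times" and "4 times" intuition counts only one spatial dimension; counting both H and W gives the paper's 4× and 16× figures.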