Semantic Segmentation -- (DeepLabv2) Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs -- Paper Notes

DeepLabv2

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Paper: DeepLabv2

Published in: TPAMI 2017 (IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017)
Code:

DeepLabv2 can be seen as a strengthened version of DeepLabv1; its use of atrous convolution and the fully connected CRF is similar to DeepLabv1.


Abstract

This paper makes three main contributions to semantic segmentation with deep learning:

  • First, it highlights atrous convolution as a powerful tool for dense prediction tasks. Atrous convolution gives explicit control over the resolution at which feature responses are computed inside the DCNN: it effectively enlarges the receptive field and captures more context without increasing the number of parameters or the amount of computation.

  • Second, it proposes atrous spatial pyramid pooling (ASPP) to obtain more robust segmentation by exploiting multi-scale information. ASPP probes the incoming features with atrous convolutions at multiple sampling rates in parallel, capturing objects and image context at multiple scales.

  • Third, it improves the localization of object boundaries by combining the DCNN with a probabilistic graphical model. The combination of max-pooling and downsampling in a DCNN yields translation invariance, but this comes at the cost of localization accuracy. This is overcome by coupling the responses of the final DCNN layer with a fully connected CRF.

The proposed DeepLabv2 performs excellently on PASCAL VOC 2012 and also gives good results on PASCAL-Context, PASCAL-Person-Part, and Cityscapes.

Introduction

DCNNs (Deep Convolutional Neural Networks) have pushed the performance of computer vision (CV) systems to a new level. A key to this success is the built-in invariance of DCNNs to local image transformations, which allows the models to learn high-level abstract representations. However, the same invariance that enables abstraction can also hamper dense prediction tasks such as semantic segmentation, where precise spatial information matters.

When applying DCNNs to semantic segmentation, we focus on the following three problems:

  • Reduced feature resolution
  • Objects existing at multiple scales
  • Reduced localization accuracy due to the built-in invariance of DCNNs

We discuss and address each of these problems below.

The first challenge arises because the repeated combination of max-pooling and downsampling in a DCNN drastically lowers spatial resolution. To address this, DeepLabv2 removes the downsampling in the last few max-pooling layers and instead uses atrous convolution, computing feature maps at a higher sampling density.

The second challenge arises because objects exist at multiple scales. A standard way to deal with this is to feed rescaled versions of the image to the network and aggregate the features or final predictions; experiments show this improves performance, but it multiplies the cost of computing feature responses and requires a large amount of memory. Inspired by spatial pyramid pooling (SPP), we propose a similar structure that samples a given input with atrous convolutions at multiple rates in parallel, which amounts to capturing image context at multiple scales; we call this module ASPP (atrous spatial pyramid pooling).

The third challenge involves the fact that object classification demands invariance to spatial transformations, which limits the spatial localization accuracy of a DCNN. One way to mitigate this is to use skip layers that fuse features from earlier layers when computing the final classification result. DeepLabv2 instead employs a fully connected CRF to strengthen the model's ability to capture fine details.

Here is a DeepLab example:

[Figure]

The overall pipeline is as follows:

  • The input passes through the modified DCNN (with atrous convolution and the ASPP module) to produce a coarse prediction, i.e. the Aeroplane Coarse Score map
  • The score map is enlarged to the original size by bilinear interpolation, i.e. Bi-linear Interpolation (a minimal code sketch follows this list)
  • The prediction is then refined by a fully connected CRF to produce the Final Output
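
The bilinear interpolation step can be written in a couple of lines of TensorFlow 1.x. This is a rough sketch with made-up tensor sizes; the per-pixel argmax here would give the raw DCNN prediction before CRF refinement:

import tensorflow as tf

coarse_scores = tf.placeholder(tf.float32, [1, 64, 64, 21])             # H/8 x W/8 score map
full_scores = tf.image.resize_bilinear(coarse_scores, size=[512, 512])  # back to the input size
raw_prediction = tf.argmax(full_scores, axis=3)                         # per-pixel labels before the CRF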

To summarize, the main advantages of DeepLabv2 are:

  • Speed: the DCNN runs at 8 FPS on a modern GPU, and the fully connected CRF takes 0.5 s on a CPU
  • Accuracy: excellent results on PASCAL VOC 2012, PASCAL-Context, PASCAL-Person-Part and Cityscapes
  • Simplicity: the system is a cascade of two very mature modules, DCNNs and CRFs

DeepLabv2 improves on DeepLabv1: the backbone is upgraded from VGG16 to the more advanced ResNet, and multi-scale inputs and the ASPP module are added, giving better segmentation results.

Related Work

Applying DCNNs to semantic segmentation involves both classification and localization refinement, and the core of this line of work is combining the two tasks.

DCNN-based semantic segmentation systems fall into three broad categories:

  • The first adopts a cascade of bottom-up image segmentation followed by DCNN-based classification. Shape information is incorporated into the classification process, and these methods benefit from the segment boundary information handed to the classifier, so they can segment well. However, they cannot recover from errors made by the segmentation front end (a wrong start stays wrong).

  • The second relies on the DCNN for dense prediction and couples multiple independent results. One variant applies the DCNN at multiple resolutions and uses a segmentation tree to smooth the predictions. More recent work uses skip layers to concatenate intermediate features computed within the network for classification.

  • The third uses the DCNN to directly perform dense pixel-level classification. The network is applied fully convolutionally to the whole image, with the subsequent FC layers converted into convolutional layers; to handle spatial localization, the results are refined by upsampling and by concatenating features from intermediate layers.

Our work builds on these approaches. Since the first version, DeepLabv1, was released, many works have adopted one or both of its key ingredients: refining the DCNN output with a fully connected CRF, and using atrous convolution for dense feature extraction. Many works have also explored end-to-end training of these two parts, learning the DCNN and the CRF jointly.

Atrous convolution enlarges the receptive field while keeping the amount of computation and the number of parameters unchanged; combined with a pyramid pooling scheme it aggregates multi-scale context. By controlling feature resolution with atrous convolution, pairing it with a more advanced DCNN, fusing multiple scales, and integrating a fully connected CRF on top of the DCNN, better segmentation results can be obtained.

Combining DCNNs and CRFs is not a new topic. Earlier works applied local CRFs, which ignore long-range dependencies between pixels. DeepLab instead uses a fully connected CRF whose Gaussian kernels can capture long-range dependencies, leading to better segmentation.


Method

Atrous convolution for dense feature extraction and enlarging the receptive field

The repeated combination of max-pooling and downsampling in a DCNN greatly reduces the spatial resolution of the final feature map. One remedy is to use deconvolutional layers (transposed convolutions that enlarge the feature map), but this requires additional memory and computation. We advocate atrous convolution instead, which allows the feature map of any layer to be computed at any desired resolution.

A closer look at how atrous convolution is used:

Consider a one-dimensional signal first. Let $x[i]$ be the input, $y[i]$ the output of atrous convolution, and $\omega[k]$ a filter of length $K$. Atrous convolution is defined as $y[i]=\sum_{k=1}^{K}x[i+r\cdot k]\,\omega[k]$, where the rate parameter $r$ is the stride with which the input signal is sampled; standard convolution corresponds to $r=1$, as shown in figure (a) below:

[Figure]

Figure (b) shows sampling with rate $r=2$.
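
A minimal NumPy sketch of the 1-D definition above (the function name and test values are made up for illustration; the code uses 0-based k, the formula 1-based):

import numpy as np

def atrous_conv1d(x, w, r=1):
    """Dilated 1-D convolution with rate r (valid padding, stride 1)."""
    K = len(w)
    out_len = len(x) - r * (K - 1)
    y = np.zeros(out_len)
    for i in range(out_len):
        # sample the input every r positions instead of every position
        y[i] = sum(x[i + r * k] * w[k] for k in range(K))
    return y

x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])     # a 3-tap filter
print(atrous_conv1d(x, w, r=1))    # standard convolution, r = 1
print(atrous_conv1d(x, w, r=2))    # atrous convolution with rate r = 2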

Now look at atrous convolution on a two-dimensional signal (an image). Given an image:

[Figure]

  • Top branch: first downsample the image by a factor of 2, convolve, then upsample the result. This essentially computes convolution responses over only 1/4 of the original image content.
  • Bottom branch: apply atrous convolution directly on the full-resolution image (rate 2, same kernel size as above). This computes responses over the entire image and, as the figure shows, gives better results.

Atrous convolution enlarges the filter's field of view: a rate $r$ inserts $r-1$ zeros between filter taps, effectively expanding a $k\times k$ kernel to $k_e = k + (k-1)(r-1)$ without increasing the number of parameters or the amount of computation. In DCNNs a common practice is to mix atrous convolution into the network so that the final responses are computed at a higher resolution (i.e. sampling density). DeepLabv2 uses atrous convolution to increase the density of the features by a factor of 4, and then bilinearly upsamples the output responses by a factor of 8 to recover the original resolution.
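
A quick worked check of the effective kernel size formula for a 3×3 kernel at several rates:

def effective_kernel(k, r):
    # k_e = k + (k - 1)(r - 1): inserting r - 1 zeros between taps enlarges
    # the field of view without adding parameters
    return k + (k - 1) * (r - 1)

for r in (1, 2, 4, 6, 12, 24):
    ke = effective_kernel(3, r)
    print("k=3, rate=%2d -> effective kernel %dx%d" % (r, ke, ke))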

Representing multi-scale images with the ASPP module

Many works have shown that using multi-scale image information improves a DCNN's accuracy in segmenting objects of different sizes. We experimented with two ways of handling scale variation in semantic segmentation.

  • The first method is standard multi-scale processing: rescale the input into several versions, feed each to the DCNN, and fuse the resulting score maps to obtain the prediction. This noticeably improves the results but costs a large amount of computation and memory.

  • The second method is inspired by the SPP module of SPPNet.

[Figure]

The figure above shows the SPP module of SPPNet.
DeepLabv2 does something similar: it extracts features with several atrous convolutions at different sampling rates in parallel and then fuses them, resembling a spatial pyramid, hence the name Atrous Spatial Pyramid Pooling (ASPP). It is illustrated below:

[Figure]

On the same input feature map, four atrous convolutions are applied in parallel with rates $r=\{6,12,18,24\}$ and kernel size $3\times 3$. The outputs of the different branches are finally fused by pixel-wise addition.
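
Below is a minimal sketch of such an ASPP head in plain TensorFlow 1.x (not the kaffe wrapper used in the repo code later on); the variable names and bias handling are illustrative assumptions:

import tensorflow as tf

def aspp(features, num_classes, rates=(6, 12, 18, 24)):
    """Parallel 3x3 atrous convolutions whose score maps are summed pixel-wise."""
    in_ch = features.get_shape().as_list()[-1]
    branches = []
    for r in rates:
        w = tf.get_variable('aspp_w_r%d' % r, [3, 3, in_ch, num_classes])
        b = tf.get_variable('aspp_b_r%d' % r, [num_classes],
                            initializer=tf.zeros_initializer())
        # same 3x3 kernel size in every branch, different sampling rate
        branches.append(tf.nn.atrous_conv2d(features, w, rate=r, padding='SAME') + b)
    return tf.add_n(branches)  # pixel-wise sum of the branch score maps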

Structured prediction with a fully connected CRF to recover boundary accuracy

Because of the repeated max-pooling and downsampling, the high-level features of a DCNN are inherently invariant (a point made several times already). A trade-off between classification performance and localization accuracy seems inherent. As the figure below shows, a DCNN can predict the presence and rough position of an object but cannot precisely delineate its boundary:

[Figure]

We therefore combine the DCNN with a fully connected CRF. This was covered in detail in the earlier note on DeepLabv1-CRF for semantic segmentation, so it is skipped here.


Experiment

DeepLabv2 is evaluated on four datasets: PASCAL VOC 2012, PASCAL-Context, PASCAL-Person-Part, and Cityscapes.

Training details:

| Item | Setting |
| --- | --- |
| DCNN model | weights initialized from pre-trained VGG16 / ResNet101 |
| DCNN loss | pixel-wise cross-entropy between the output and the ground truth downsampled by 8 |
| Optimizer | SGD, batch size 20 |
| Learning rate | 0.001 initially (0.01 for the final classifier layer), multiplied by 0.1 every 2000 iterations |
| Weights | momentum 0.9, weight decay 0.0005 |

The models are fine-tuned from pre-trained VGG16 and ResNet101 weights. During training the DCNN and the CRF are decoupled, i.e. trained separately: when training the CRF, the DCNN output that serves as the unary potential is kept fixed.
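
A rough sketch of the DCNN loss listed in the table above: pixel-wise softmax cross-entropy between the 1/8-resolution score map and the ground truth downsampled by 8. Tensor shapes and the ignore-label handling are assumptions, not the repo's exact code:

import tensorflow as tf

def segmentation_loss(score_map, labels, num_classes, ignore_label=255):
    # score_map: [N, H/8, W/8, num_classes] logits; labels: [N, H, W, 1] int32 label map
    h, w = tf.shape(score_map)[1], tf.shape(score_map)[2]
    labels_small = tf.image.resize_nearest_neighbor(labels, tf.stack([h, w]))  # downsample GT by 8
    labels_flat = tf.reshape(labels_small, [-1])
    logits_flat = tf.reshape(score_map, [-1, num_classes])
    valid = tf.squeeze(tf.where(tf.not_equal(labels_flat, ignore_label)), 1)   # drop void pixels
    logits_valid = tf.gather(logits_flat, valid)
    labels_valid = tf.cast(tf.gather(labels_flat, valid), tf.int32)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels_valid, logits=logits_valid))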

The rough validation procedure is to cross-validate the CRF parameters: with $\omega_2=3$ and $\sigma_\gamma=3$ fixed, the best $\omega_1$, $\sigma_\alpha$ and $\sigma_\beta$ are searched for on a small validation set, using a coarse-to-fine search strategy.
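
For reference, these parameters belong to the pairwise potential of the fully connected CRF used by DeepLab (covered in the DeepLabv1 note), which combines a bilateral (position + color) kernel and a smoothness (position only) kernel:

$$\theta_{ij}(x_i,x_j)=\mu(x_i,x_j)\left[\omega_1\exp\!\left(-\frac{\|p_i-p_j\|^2}{2\sigma_\alpha^2}-\frac{\|I_i-I_j\|^2}{2\sigma_\beta^2}\right)+\omega_2\exp\!\left(-\frac{\|p_i-p_j\|^2}{2\sigma_\gamma^2}\right)\right]$$

where $p$ denotes pixel positions, $I$ denotes pixel colors, and $\mu(x_i,x_j)=1$ if $x_i\neq x_j$ (a Potts model).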

Models with different combinations of kernel size and sampling rate:

[Figure]

DeepLab-LargeFOV (kernel size 3×3, r = 12) strikes a good balance. A small kernel with a high sampling rate keeps the receptive field while markedly reducing the number of parameters and speeding up computation, and it still segments well. CRF post-processing consistently adds roughly 3-5% to performance.

PASCAL VOC 2012

The DeepLab-CRF-LargeFOV model is evaluated on PASCAL VOC 2012 with three main improvements:

  • 1. a different learning-rate policy during training;
  • 2. the ASPP module;
  • 3. deeper networks and multi-scale processing.

Learning-rate policy experiments

Poly learning-rate policy: the learning rate is computed as $lr_{base}\cdot\left(1-\frac{iter}{max\_iter}\right)^{power}$, with $power=0.9$.
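
In code, the poly policy is just the following (base learning rate, max_iter and power as in the settings above; the helper name is made up):

def poly_lr(base_lr, iteration, max_iter, power=0.9):
    return base_lr * (1.0 - float(iteration) / max_iter) ** power

print(poly_lr(0.001, 0, 20000))       # 0.001 at the start
print(poly_lr(0.001, 10000, 20000))   # ~0.00054 half-way through training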

[Figure]

The results above show that the poly policy works better than a fixed step policy. Using a small batch size of 10 (larger batches consume a lot of GPU memory) and training for 20K iterations gives the best result.

ASPP module experiments

The structure of ASPP is shown below,

[Figure]

capturing context of different sizes with parallel atrous convolutions at different sampling rates.

The table below reports results for different ASPP configurations:

[Figure]

  • Baseline LargeFOV: the structure in Fig. 7(a), a single branch with $r=12$ on FC6
  • ASPP-S: four parallel branches with smaller atrous rates $r=\{2,4,8,12\}$
  • ASPP-L: four parallel branches with larger atrous rates $r=\{6,12,18,24\}$

The ASPP module with large sampling rates clearly performs best.

Visualization of results with the ASPP module:

[Figure]

ASPP with higher sampling rates captures more global information, and its segmentations are more coherent by comparison.

Experiments with deeper networks and multi-scale processing

DeepLabv2 experiments mainly on ResNet, comparing several techniques:

  • Multi-scale inputs: the input is fed to the DCNN at scales $\{0.5, 0.75, 1\}$ and the results are fused (see the sketch after this list)
  • Pre-training the model on MS-COCO
  • Data augmentation by randomly scaling the input (by 0.5 to 1.5) during training
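
A minimal sketch of the multi-scale inference mentioned in the first bullet, assuming a run_dcnn function that returns a score map for a given input; the paper fuses the rescaled score maps by taking the per-pixel maximum:

import tensorflow as tf

def multi_scale_scores(image, run_dcnn, scales=(0.5, 0.75, 1.0)):
    # image: [N, H, W, 3]; run_dcnn maps an input batch to a score map
    h, w = tf.shape(image)[1], tf.shape(image)[2]
    fused = None
    for s in scales:
        size = tf.stack([tf.cast(tf.cast(h, tf.float32) * s, tf.int32),
                         tf.cast(tf.cast(w, tf.float32) * s, tf.int32)])
        scores = run_dcnn(tf.image.resize_bilinear(image, size))
        scores = tf.image.resize_bilinear(scores, tf.stack([h, w]))     # back to a common size
        fused = scores if fused is None else tf.maximum(fused, scores)  # per-pixel max fusion
    return fused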

The effect of these techniques:

[Figure]

Multi-scale inputs bring a 2.55% improvement; combining all the techniques reaches 77.69%.

Visualization with CRF post-processing:

[Figure]

The CRF refines the segmentation, recovering some misclassified pixels and sharpening parts of the object boundaries.

DeepLabv2 compared with other state-of-the-art models:

[Figure]

The results are, naturally, very good.

Comparison of ResNet101 and VGG16 as the backbone:

[Figure]

Using ResNet101 as the backbone significantly improves performance; the ResNet101-based DeepLab follows object boundaries better than the VGG16-based one.

PASCAL-Context

Results on PASCAL-Context for the VGG16- and ResNet101-based variants compared with other state-of-the-art models:

[Figure]

Combining the various techniques pushes the final result to 45.7%.

Visualization:

[Figure]

The model can be seen to follow object boundaries better.

PASCAL-Person-Part

On PASCAL-Person-Part the focus is on the ResNet101 model; comparison with other models:

[Figure]

Visualization:
[Figure]

Cityscapes

Because the Cityscapes images have a large resolution, they are first downsampled by a factor of 2; with the various techniques the model obtains good results:

[Figure]

Visualization:
[Figure]

Failure cases

The paper shows some failure cases, as in the figure below:

[Figure]

The model loses many fine details, and the loss becomes even more pronounced after the CRF.

Conclusion

DeepLabv2 applies atrous convolution to dense feature extraction, further proposes the atrous spatial pyramid pooling structure, and combines the DCNN with a CRF to refine the segmentation. Experiments show that DeepLabv2 performs well across several datasets, with solid segmentation quality.


Code Analysis

The code referenced here is the TensorFlow version on GitHub. Note that this code does not implement the CRF part.

The model uses the same code framework as the earlier ICNet note; see that earlier write-up for a detailed explanation of the NetWork setup.

DeepLab_ResNet structure

The decorators and other helpers are defined in NetWork.py and are not repeated here; the focus is the DeepLab_ResNet model definition.

The first part defines the ResNet101 variant structure:

The front part of ResNet

The ResNet backbone uses two common variants of the residual unit:

[Figure]

  • Left: the ordinary residual unit. The side branch is a direct identity mapping; the first two convolutions of the main branch reduce the number of channels and the third restores it. This keeps the segmentation quality while greatly reducing computation.
  • Right: the special residual unit. To increase channels, both the side branch and the main branch widen their outputs; to increase channels and downsample, the side-branch convolution and the first main-branch convolution use stride 2; to use atrous convolution, the ordinary 3×3 convolution is simply replaced by an atrous convolution with the desired rate.

The ResNet code is as follows:

from kaffe.tensorflow import Network
import tensorflow as tf

class DeepLabResNetModel(Network):
    def setup(self, is_training, num_classes):
        '''Network definition.
        
        Args:
          is_training: whether to update the running mean and variance of the batch normalisation layer.
                       If the batch size is small, it is better to keep the running mean and variance of 
                       the pre-trained model frozen.
          num_classes: number of classes to predict (including background).
        '''
        (self.feed('data')
             .conv(7, 7, 64, 2, 2, biased=False, relu=False, name='conv1')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn_conv1')
             .max_pool(3, 3, 2, 2, name='pool1')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch1')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn2a_branch1'))

        (self.feed('pool1')
             .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2a_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2a_branch2a')
             .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2a_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2a_branch2b')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn2a_branch2c'))

        (self.feed('bn2a_branch1', 
                   'bn2a_branch2c')
             .add(name='res2a')
             .relu(name='res2a_relu')
             .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2b_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2b_branch2a')
             .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2b_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2b_branch2b')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2b_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn2b_branch2c'))

        (self.feed('res2a_relu', 
                   'bn2b_branch2c')
             .add(name='res2b')
             .relu(name='res2b_relu')
             .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2c_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2c_branch2a')
             .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2c_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn2c_branch2b')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2c_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn2c_branch2c'))

        (self.feed('res2b_relu', 
                   'bn2c_branch2c')
             .add(name='res2c')
             .relu(name='res2c_relu')
             .conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res3a_branch1')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn3a_branch1'))

        (self.feed('res2c_relu')
             .conv(1, 1, 128, 2, 2, biased=False, relu=False, name='res3a_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn3a_branch2a')
             .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3a_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn3a_branch2b')
             .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3a_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn3a_branch2c'))

        (self.feed('bn3a_branch1', 
                   'bn3a_branch2c')
             .add(name='res3a')
             .relu(name='res3a_relu')
             .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b1_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn3b1_branch2a')
             .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b1_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn3b1_branch2b')
             .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b1_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn3b1_branch2c'))

        (self.feed('res3a_relu', 
                   'bn3b1_branch2c')
             .add(name='res3b1')
             .relu(name='res3b1_relu')
             .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b2_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn3b2_branch2a')
             .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b2_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn3b2_branch2b')
             .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b2_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn3b2_branch2c'))

        (self.feed('res3b1_relu', 
                   'bn3b2_branch2c')
             .add(name='res3b2')
             .relu(name='res3b2_relu')
             .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b3_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn3b3_branch2a')
             .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b3_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn3b3_branch2b')
             .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b3_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn3b3_branch2c'))

        (self.feed('res3b2_relu', 
                   'bn3b3_branch2c')
             .add(name='res3b3')
             .relu(name='res3b3_relu')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4a_branch1')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4a_branch1'))

A diagram of the code above:

[Figure]

To summarize, assume the input is $(1024,1024,3)$:

  • Convolution plus pooling first extracts the feature map pool1 of size $(256,256,64)$
  • A channel-increasing residual unit yields res2a_relu $(256,256,256)$, followed by two ordinary residual units, giving res2c_relu
  • A channel-increasing, downsampling residual unit then yields res3a_relu $(128,128,512)$, followed by three ordinary residual units, giving res3b3_relu

This is the front part of the ResNet variant. By res3b3_relu the output stride of the feature map is already $\frac{1024}{128}=8$; from here on, residual units with atrous convolution take over.

The modified part of ResNet

In the ResNet variant used by DeepLabv2, the front part is essentially the same as the original ResNet (i.e. the code above); the later part contains the main modifications:

        '''Overlaps with the code above.'''
        (self.feed('res3b2_relu', 
                   'bn3b3_branch2c')
             .add(name='res3b3')
             .relu(name='res3b3_relu')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4a_branch1')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4a_branch1'))
             
        (self.feed('res3b3_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4a_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4a_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4a_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4a_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4a_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4a_branch2c'))

        (self.feed('bn4a_branch1', 
                   'bn4a_branch2c')
             .add(name='res4a')
             .relu(name='res4a_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b1_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b1_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b1_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b1_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b1_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b1_branch2c'))

        (self.feed('res4a_relu', 
                   'bn4b1_branch2c')
             .add(name='res4b1')
             .relu(name='res4b1_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b2_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b2_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b2_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b2_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b2_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b2_branch2c'))

        (self.feed('res4b1_relu', 
                   'bn4b2_branch2c')
             .add(name='res4b2')
             .relu(name='res4b2_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b3_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b3_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b3_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b3_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b3_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b3_branch2c'))

        (self.feed('res4b2_relu', 
                   'bn4b3_branch2c')
             .add(name='res4b3')
             .relu(name='res4b3_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b4_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b4_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b4_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b4_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b4_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b4_branch2c'))

        (self.feed('res4b3_relu', 
                   'bn4b4_branch2c')
             .add(name='res4b4')
             .relu(name='res4b4_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b5_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b5_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b5_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b5_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b5_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b5_branch2c'))

        (self.feed('res4b4_relu', 
                   'bn4b5_branch2c')
             .add(name='res4b5')
             .relu(name='res4b5_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b6_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b6_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b6_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b6_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b6_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b6_branch2c'))

        (self.feed('res4b5_relu', 
                   'bn4b6_branch2c')
             .add(name='res4b6')
             .relu(name='res4b6_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b7_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b7_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b7_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b7_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b7_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b7_branch2c'))

        (self.feed('res4b6_relu', 
                   'bn4b7_branch2c')
             .add(name='res4b7')
             .relu(name='res4b7_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b8_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b8_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b8_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b8_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b8_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b8_branch2c'))

        (self.feed('res4b7_relu', 
                   'bn4b8_branch2c')
             .add(name='res4b8')
             .relu(name='res4b8_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b9_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b9_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b9_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b9_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b9_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b9_branch2c'))

        (self.feed('res4b8_relu', 
                   'bn4b9_branch2c')
             .add(name='res4b9')
             .relu(name='res4b9_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b10_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b10_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b10_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b10_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b10_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b10_branch2c'))

        (self.feed('res4b9_relu', 
                   'bn4b10_branch2c')
             .add(name='res4b10')
             .relu(name='res4b10_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b11_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b11_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b11_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b11_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b11_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b11_branch2c'))

        (self.feed('res4b10_relu', 
                   'bn4b11_branch2c')
             .add(name='res4b11')
             .relu(name='res4b11_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b12_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b12_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b12_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b12_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b12_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b12_branch2c'))

        (self.feed('res4b11_relu', 
                   'bn4b12_branch2c')
             .add(name='res4b12')
             .relu(name='res4b12_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b13_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b13_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b13_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b13_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b13_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b13_branch2c'))

        (self.feed('res4b12_relu', 
                   'bn4b13_branch2c')
             .add(name='res4b13')
             .relu(name='res4b13_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b14_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b14_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b14_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b14_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b14_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b14_branch2c'))

        (self.feed('res4b13_relu', 
                   'bn4b14_branch2c')
             .add(name='res4b14')
             .relu(name='res4b14_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b15_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b15_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b15_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b15_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b15_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b15_branch2c'))

        (self.feed('res4b14_relu', 
                   'bn4b15_branch2c')
             .add(name='res4b15')
             .relu(name='res4b15_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b16_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b16_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b16_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b16_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b16_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b16_branch2c'))

        (self.feed('res4b15_relu', 
                   'bn4b16_branch2c')
             .add(name='res4b16')
             .relu(name='res4b16_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b17_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b17_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b17_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b17_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b17_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b17_branch2c'))

        (self.feed('res4b16_relu', 
                   'bn4b17_branch2c')
             .add(name='res4b17')
             .relu(name='res4b17_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b18_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b18_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b18_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b18_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b18_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b18_branch2c'))

        (self.feed('res4b17_relu', 
                   'bn4b18_branch2c')
             .add(name='res4b18')
             .relu(name='res4b18_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b19_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b19_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b19_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b19_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b19_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b19_branch2c'))

        (self.feed('res4b18_relu', 
                   'bn4b19_branch2c')
             .add(name='res4b19')
             .relu(name='res4b19_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b20_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b20_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b20_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b20_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b20_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b20_branch2c'))

        (self.feed('res4b19_relu', 
                   'bn4b20_branch2c')
             .add(name='res4b20')
             .relu(name='res4b20_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b21_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b21_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b21_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b21_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b21_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b21_branch2c'))

        (self.feed('res4b20_relu', 
                   'bn4b21_branch2c')
             .add(name='res4b21')
             .relu(name='res4b21_relu')
             .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b22_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b22_branch2a')
             .atrous_conv(3, 3, 256, 2, padding='SAME', biased=False, relu=False, name='res4b22_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn4b22_branch2b')
             .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b22_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn4b22_branch2c'))

        (self.feed('res4b21_relu', 
                   'bn4b22_branch2c')
             .add(name='res4b22')
             .relu(name='res4b22_relu')
             .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5a_branch1')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn5a_branch1'))

        (self.feed('res4b22_relu')
             .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5a_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn5a_branch2a')
             .atrous_conv(3, 3, 512, 4, padding='SAME', biased=False, relu=False, name='res5a_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn5a_branch2b')
             .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5a_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn5a_branch2c'))

        (self.feed('bn5a_branch1', 
                   'bn5a_branch2c')
             .add(name='res5a')
             .relu(name='res5a_relu')
             .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5b_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn5b_branch2a')
             .atrous_conv(3, 3, 512, 4, padding='SAME', biased=False, relu=False, name='res5b_branch2b')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn5b_branch2b')
             .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5b_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn5b_branch2c'))

        (self.feed('res5a_relu', 
                   'bn5b_branch2c')
             .add(name='res5b')
             .relu(name='res5b_relu')
             .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5c_branch2a')
             .batch_normalization(is_training=is_training, activation_fn=tf.nn.relu, name='bn5c_branch2a')
             .atrous_conv(3, 3, 512, 4, padding='SAME', biased=False, relu=False, name='res5c_branch2b')
             .batch_normalization(activation_fn=tf.nn.relu, name='bn5c_branch2b', is_training=is_training)
             .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5c_branch2c')
             .batch_normalization(is_training=is_training, activation_fn=None, name='bn5c_branch2c'))
             
        (self.feed('res5b_relu', 
                   'bn5c_branch2c')
             .add(name='res5c')
             .relu(name='res5c_relu')
             .atrous_conv(3, 3, num_classes, 6, padding='SAME', relu=False, name='fc1_voc12_c0'))

A diagram of the code above:

[Figure]

To summarize, with input res3b3_relu $(128,128,512)$:

  • A channel-increasing residual unit with atrous convolution (r=2) yields res4a_relu $(128,128,1024)$
  • 22 further residual units with atrous convolution (r=2) yield res4b22_relu $(128,128,1024)$
  • A channel-increasing residual unit with atrous convolution (r=4) then yields res5a_relu $(128,128,2048)$
  • Two further residual units with atrous convolution (r=4) yield res5c_relu $(128,128,2048)$

This is the later part of the ResNet variant. Although the output stride at res5c_relu is still $\frac{1024}{128}=8$, the atrous convolutions have greatly enlarged the receptive field. That wraps up the ResNet part; the ASPP module comes next.

ASPP module

The ASPP of DeepLabv2 is very similar to the SPP module: atrous convolutions with different sampling rates are applied to the same input feature map, and the results are fused together.

        (self.feed('res5b_relu', 
                   'bn5c_branch2c')
             .add(name='res5c')
             .relu(name='res5c_relu')
             .atrous_conv(3, 3, num_classes, 6, padding='SAME', relu=False, name='fc1_voc12_c0'))

        (self.feed('res5c_relu')
             .atrous_conv(3, 3, num_classes, 12, padding='SAME', relu=False, name='fc1_voc12_c1'))

        (self.feed('res5c_relu')
             .atrous_conv(3, 3, num_classes, 18, padding='SAME', relu=False, name='fc1_voc12_c2'))

        (self.feed('res5c_relu')
             .atrous_conv(3, 3, num_classes, 24, padding='SAME', relu=False, name='fc1_voc12_c3'))

        (self.feed('fc1_voc12_c0', 
                   'fc1_voc12_c1', 
                   'fc1_voc12_c2', 
                   'fc1_voc12_c3')
             .add(name='fc1_voc12'))

A diagram of the code above:

[Figure]

To summarize, with input res5c_relu $(128,128,2048)$:

  • Four atrous convolution layers are applied in parallel, each with kernel $(3,3,num\_class)$, where $num\_class=21$
  • The four parallel atrous convolutions use sampling rates $r=\{6,12,18,24\}$
  • Their outputs are added pixel-wise, giving the final output fc1_voc12 $(128,128,21)$

This completes the ASPP module.
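
For completeness, this is roughly how the repo's inference script drives the model and turns fc1_voc12 into a full-resolution prediction (a sketch; exact argument and layer names may differ from the codebase):

net = DeepLabResNetModel({'data': image_batch}, is_training=False, num_classes=21)
raw_output = net.layers['fc1_voc12']                                   # (H/8, W/8, 21) score map
raw_output = tf.image.resize_bilinear(raw_output, tf.shape(image_batch)[1:3])
prediction = tf.argmax(raw_output, axis=3)                             # per-pixel class labels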


All in all, the DCNN part of DeepLabv2 is quite easy to follow (at least it looks much simpler than the earlier ICNet). As for the CRF part, TensorFlow 1.4 now ships a CRF in contrib; interested readers can experiment with it.
