TridentNet Notes

Paper link:

Abstract (used as an outline: each sentence of the abstract corresponds to one part of these notes)

Scale variation is one of the key challenges in object detection. (The opening sentence signals that this paper's contribution targets scale variation.)

In this work, we first present a controlled experiment to investigate the effect of receptive fields for scale variation in object detection. (The receptive field strongly affects how well objects of different scales are detected.)

Based on the findings from the exploration experiments, we propose a novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power. (TridentNet is proposed: three branches, each targeting a specific scale, namely small, medium, and large.)

We construct a parallel multi-branch architecture in which each branch shares the same transformation parameters but with different receptive fields. (Characteristics of TridentNet's three branches: shared parameters, differing only in dilation rate.)

Then, we adopt a scale-aware training scheme to specialize each branch by sampling object instances of proper scales for training. (Scale-aware training.)

As a bonus, a fast approximation version of TridentNet could achieve significant improvements without any additional parameters and computational cost compared with the vanilla detector. (How inference is done.)

On the COCO dataset, our TridentNet with ResNet-101 backbone achieves state-of-the-art single-model results of 48.4 mAP. Codes are available at https://git.io/fj5vR.

 

 

  • Part 1: Existing approaches to scale variation

The paper groups existing approaches to scale variation into two classes: Image Pyramid methods and Feature Pyramid methods.

The strengths and weaknesses of the two (as described in the paper) are as follows:

Honestly, the paper's phrasing here is a bit convoluted and hard to parse. My own understanding is summarized in the figure below; feel free to point out any mistakes in the comments:

Since the authors see problems with both approaches, how do they address scale variation?

The paper starts from the receptive field and proposes its own solution.

 

  • Part 2: The receptive field is key

Large objects should pair with large receptive fields and small objects with small ones; only then do the stacked convolutions work well. That sounds quite reasonable, but is it actually true? The authors design a pilot experiment to verify the idea:

We conduct our pilot experiment using a Faster RCNN [35] detector with the ResNet-C4 backbone on the COCO [27] dataset. The results are reported in the COCO-style mmAP on all objects and objects of small, medium and large sizes, respectively. We use ResNet-50 and ResNet-101 as the backbone networks and vary the dilation rate ds of the 3×3 convolutions from 1 to 3 for the residual blocks in the conv4 stage.

The authors use COCO as the benchmark, Faster R-CNN as the detector, and ResNet as the backbone. In the conv4 stage they swap ordinary convolutions for dilated ones, varying the dilation rate from 1 to 3 to change the receptive field: the larger the dilation rate, the larger the receptive field, and vice versa.
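To make the setup concrete, here is a rough sketch of that tweak using torchvision's ResNet for illustration (my own code, not the authors'; torchvision names the conv4 stage `layer3`):

```python
import torch
import torchvision

# A rough sketch of the pilot tweak (my illustration, not the authors' code):
# dilate every 3x3 convolution in ResNet's conv4 stage, which torchvision
# names `layer3`. Padding is raised to match, so feature-map sizes stay the
# same while the receptive field grows.
def dilate_conv4(model: torch.nn.Module, ds: int) -> torch.nn.Module:
    for m in model.layer3.modules():
        if isinstance(m, torch.nn.Conv2d) and m.kernel_size == (3, 3):
            m.dilation = (ds, ds)
            m.padding = (ds, ds)
    return model

model = dilate_conv4(torchvision.models.resnet50(weights=None), ds=2)
out = model(torch.randn(1, 3, 224, 224))  # runs end to end, shapes unchanged
print(out.shape)
```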

Dilated convolution is a convenient way to change the receptive field; the paper summarizes it briefly:

Dilated convolution with a dilation rate ds inserts ds - 1 zeros between consecutive filter values, i.e., adjacent kernel weights are spaced ds - 1 zeros apart.

So a 3×3 dilated convolution covers the same extent as an ordinary convolution of kernel size 3 + 2(ds - 1). With stride s and n such convolution layers stacked, dilated convolutions enlarge the receptive field by 2(ds - 1)sn compared with their ordinary counterparts.

(I won't work through the derivation here; readers can brush up on dilated convolutions on their own. A quick numeric check follows.)
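As a sanity check, the arithmetic above fits in a few lines of plain Python (my own sketch, nothing paper-specific):

```python
# Receptive-field arithmetic for dilated 3x3 convolutions
# (a minimal check of the formulas above, nothing paper-specific).

def effective_kernel_size(k: int, ds: int) -> int:
    # A k x k convolution with dilation ds covers k + (k - 1) * (ds - 1) pixels.
    return k + (k - 1) * (ds - 1)

def extra_receptive_field(ds: int, s: int, n: int) -> int:
    # Extra receptive field over ordinary 3x3 convs: 2 * (ds - 1) per layer,
    # scaled by the stride s, accumulated over n layers.
    return 2 * (ds - 1) * s * n

for ds in (1, 2, 3):
    print(ds, effective_kernel_size(3, ds), extra_receptive_field(ds, 1, 3))
# ds=1 -> kernel 3, +0; ds=2 -> kernel 5, +6; ds=3 -> kernel 7, +12
```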

The results are as follows:

The effect is unmistakable: the receptive field strongly influences detection performance across object scales.

 

  • Part 3: Proposing TridentNet

A Scale-Aware Trident Network consists of two parts: weight-sharing trident blocks and a scale-aware training scheme.

The overall architecture and pipeline are exactly what Figure 2 depicts:

  • 3.1 Trident block

We construct TridentNets by replacing some convolution blocks with the proposed trident blocks in the backbone network of a detector.

The authors do not build TridentNet from scratch; rather, they replace some blocks in the detector's backbone with trident blocks, e.g., turning a residual block of ResNet into a trident block, as shown below:

Characteristics of the trident block: shared network parameters; only the dilation rates differ.

So why share parameters? (Or: what does sharing buy us?) The paper gives three reasons:

1. It reduces the number of parameters and makes TridentNet need no extra parameters compared with the original detector.
My take: three branches would naively mean three times the parameters, which is both costly and prone to overfitting. Fortunately, the three branches are identical except for their dilation rates (dilation does not change the weight shape), so the parameters can simply be shared.

2. It also echoes with our motivation that objects of different scales should go through a uniform transformation with the same representational power. My take: the authors keep stressing that objects of different scales should be detected by a uniform transformation; the varying receptive field is only there to handle scale, while the underlying detection computation stays the same. Parameter sharing embodies exactly this idea.

3. A final point is that transformation parameters could be trained on more object samples from all branches. My take: this "benefit" feels a bit tongue-in-cheek. It claims the parameters get trained on more samples from all branches (but any network without the trident also trains on all samples!). Still, compared with three branches with independent weights, it is indeed an advantage.

Summary: The same parameters are trained for different scale ranges under different receptive fields. Three branches with different receptive fields train on objects of different scales, and the result is one unified set of parameters. Impressive. (A minimal sketch of this idea follows.)
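Here is a minimal sketch of the weight-sharing idea; this is my simplification for illustration, not the official implementation: a single 3×3 weight tensor applied three times with different dilation rates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TridentConv(nn.Module):
    """One shared 3x3 weight applied with three dilation rates.
    A simplified sketch of a trident block's convolution, not the official code."""
    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        self.weight = nn.Parameter(torch.empty(channels, channels, 3, 3))
        nn.init.kaiming_normal_(self.weight)

    def forward(self, x):
        # Identical parameters on every branch; only the dilation (and the
        # matching padding, to preserve spatial size) differs.
        return [F.conv2d(x, self.weight, padding=d, dilation=d)
                for d in self.dilations]

branches = TridentConv(256)(torch.randn(1, 256, 32, 32))
print([tuple(b.shape) for b in branches])  # three same-shaped maps, one weight
```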

 

  • 3.2 Scale-aware Training Scheme

Thus, it is natural to detect objects of different scales on different branches. Here, we propose a scale-aware training scheme to improve the scale awareness of every branch and avoid training objects of extreme scales on mismatched branches.

Each branch should be trained on objects of the corresponding scale; the scale-aware training scheme is designed to achieve exactly this.

We define a valid range [li, ui] for each branch i.

For a Region-of-Interest (RoI) with width w and height h on the input image (before resizing), it is valid for branch i when li ≤ √(wh) ≤ ui.

The three ranges set in the paper are

APs, APm and APl on objects of small (less than 32×32), medium (from 32×32 to 96×96) and large (greater than 96×96) sizes.
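In code, the validity test is tiny. Below is a sketch; the concrete range values are my placeholders for illustration, not numbers verified against the paper's table:

```python
import math

# Sketch of the scale-aware validity test. The range values below are my
# assumption for illustration, not necessarily the paper's exact numbers.
VALID_RANGES = [(0, 90), (30, 160), (90, float("inf"))]

def valid_branches(w: float, h: float) -> list[int]:
    # An RoI of size w x h (on the original image, before resizing) is
    # valid for branch i when l_i <= sqrt(w * h) <= u_i.
    scale = math.sqrt(w * h)
    return [i for i, (lo, hi) in enumerate(VALID_RANGES) if lo <= scale <= hi]

print(valid_branches(20, 20))    # small RoI  -> [0]
print(valid_branches(100, 100))  # large RoI  -> [1, 2] (ranges overlap)
```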
 

 

  • Part 4: Inference

During inference, we generate detection results for all branches and then filter out the boxes which fall outside the valid range of each branch. We then use NMS or soft-NMS [3] to combine the detection outputs of multiple branches and obtain the final results.

All three branches are used, and NMS then filters for the best results. (A sketch of this combination step follows.)
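Here is a sketch of that combination step, my own illustration using torchvision's NMS; the box format (x1, y1, x2, y2) and the per-branch inputs are assumptions:

```python
import torch
from torchvision.ops import nms

# Sketch of multi-branch inference (my own illustration): drop each branch's
# boxes that fall outside its valid range, pool the rest, then run plain NMS
# over the union. Boxes are (x1, y1, x2, y2) tensors with one score each.
def combine_branches(branch_boxes, branch_scores, valid_ranges, iou_thr=0.5):
    boxes_list, scores_list = [], []
    for boxes, scores, (lo, hi) in zip(branch_boxes, branch_scores, valid_ranges):
        scale = ((boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])).sqrt()
        keep = (scale >= lo) & (scale <= hi)   # valid-range filter per branch
        boxes_list.append(boxes[keep])
        scores_list.append(scores[keep])
    boxes = torch.cat(boxes_list)
    scores = torch.cat(scores_list)
    keep = nms(boxes, scores, iou_thr)         # combine across branches
    return boxes[keep], scores[keep]
```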

But running all three branches is genuinely time-consuming, so the authors propose a fast approximation.

Fast Inference Approximation:

we propose TridentNet Fast, a fast approximation of TridentNet with only one branch during inference. 

In other words, only the middle branch is used at inference time, and the authors find that this approximation loses little accuracy. (Detailed results are under "Fast Inference Approximation" in Part 5.)
 

  • Part 5: Experiments

The authors design several ablation studies to verify whether the proposed ideas actually work. This part uses COCO minival throughout; Part 6 is where test-dev is used.

 

  • 5.1 Components of TridentNet.

First, a validation of the three designs: the multi-branch architecture, the weight-sharing design, and the scale-aware training scheme.

As the table shows, using all three together is indeed best.

The one exception is that scale-aware training causes APl to drop, which the authors attribute to possible overfitting: We conjecture that the scale-aware training design prevents each branch from training objects of extreme scales, but may also bring about the over-fitting problem in each branch caused by reduced effective samples.

Weight sharing, however, effectively alleviates the overfitting, which is why (e) is a bit better than (d): With the help of weight-sharing (Table 2(e)), all branches share the same parameters which are fully trained on objects of all scales, thus alleviating the over-fitting issue in scale-aware training (Table 2(d)).

 

  • Number of branches.

Next, the experiment on the number of branches. As the results below show, three branches work best. (Note that scale-aware training was not used here, to keep things simple.)

  • Stage of Trident blocks. 

Which stage of the backbone is best to replace with trident blocks? As the results show: conv4.

  

  • Number of trident blocks.

When replacing blocks in the backbone, say in conv4, how many trident blocks work best:

  •  Performance of each branch.

Performance of each branch (mismatched scales are not filtered out here; that is, each branch is tested directly on all objects):

The table is clear: branch-1 has the best APs, branch-2 the best APm, branch-3 the best APl, and 3 Branches combines the strengths of all three.

  • Fast Inference Approximation

This part explains how TridentNet Fast works. Table 5 shows that branch-2 alone performs best, so it is the branch used for the fast inference approximation. That said, Table 5 also shows the branch-2 approximation is roughly on par with the baseline, even slightly worse, which is a bit deflating.

The authors also experiment with widening the scale-aware valid ranges; the original ranges give the 40.6 result in Table 2.

The table below shows the results of widening the scale-aware ranges while using branch-2 for fast inference:

Oddly, setting (d) turns out best, even though (d) is essentially the same as not using scale-aware training at all. Note, however, that this is the Fast setting; without the fast approximation, scale-aware training still appears slightly better (Table 2).

The authors' explanation: We hypothesize this may be due to the weight-sharing strategy. Since the weights of the major branch are shared on other branches, training all branches in the scale-agnostic scheme is equivalent to perform within-network multi-scale augmentation.

 

  • Part 6: Comparison with State-of-the-Arts

Results are evaluated on COCO test-dev.

TridentNet*: we apply multi-scale training, soft-NMS, deformable convolutions, large-batch BN, and the 3× training scheme on TridentNet and get TridentNet*.

TridentNet* Fast + Image Pyramid achieves 47.6 AP.

Compared with other ways of handling scale, TridentNet again comes out best.
