Salient Object Detection Notes: Selectivity or Invariance: Boundary-Aware Salient Object Detection


image.png

Original document: https://www.yuque.com/lart/papers/banet

Main Contributions

  1. Further clarifies the two main difficulties of the salient object detection task: it is a problem that demands both selectivity and invariance.
  2. Proposes a divide-and-conquer model targeting this selectivity/invariance trade-off, with three streams that each handle a different sub-task and complement one another.
  3. Proposes a novel alternative to ASPP, in which features from branches with different dilation rates are passed along one by one, extracting rich multi-scale contextual information.

Problems Addressed

This paper again starts from the perspective of salient object boundaries and focuses on solving two problems:

  1. First, the interiors of a large salient object may have large appearance change, making it difficult to detect the salient object as a whole.
  2. Second, the boundaries of salient objects may be very weak so that they cannot be distinguished from the surrounding background regions.

This is essentially the so-called selectivity–invariance dilemma (we need a set of features that respond selectively to the important parts of an image while remaining invariant to changes in the unimportant parts). Different regions of a salient object (interior vs. boundary) place different demands on an SOD model, and this dilemma effectively prevents the perfect segmentation of salient objects of various sizes, appearances, and contexts.

In the interiors, the features extracted by a SOD model should be invariant to various appearance changes such as size, color and texture. Such invariant features ensure that the salient object can pop-out as a whole. However, the features at boundaries should be sufficiently selective at the same time so that the minor difference between salient objects and background regions can be well distinguished.

Main Method

image.png

The feature extraction network in the figure is ResNet-50, but the strides of the fourth and fifth convolution blocks are set to 1 so that they no longer reduce resolution; to compensate and enlarge the receptive field, these two blocks use dilated convolutions with rates 2 and 4, respectively. As a result, the network only downsamples the input to 1/8 of its original resolution.

The remedy proposed in this paper is to adopt different feature extraction strategies at object interiors and boundaries. The analysis article https://blog.csdn.net/c9yv2cf9i06k2a9e/article/details/99687783 points out that this strategy is somewhat similar to BASNet's, except that this work uses branch networks to achieve boundary enhancement, while BASNet handles it through the loss function.

On top of the feature extraction network, three streams are built to extract selective and invariant features and to correct mispredictions in the transitional regions between boundaries and interiors. They are:

  1. boundary localization stream: a simple subnetwork that aims to extract selective features for detecting the boundaries of salient objects
  2. interior perception stream: emphasizes feature invariance in detecting the salient objects
  3. transition compensation stream: adopted to amend the probable failures that may occur in the transitional regions between interiors and boundaries, where the feature requirement gradually changes from invariance to selectivity

The paper also proposes an integrated successive dilation (ISD) module to strengthen the interior perception and transition compensation streams. It captures rich contextual information so that invariant features can be extracted for diverse visual patterns, and it introduces skip connections from low-level features to promote selective representations of boundaries. The module's structure is as follows:

image.png

The ISD module with N parallel branches with skip connections is denoted as ISD-N, and we show the structure of ISD-5 in Fig. 5 as an example.
From an implementation standpoint, the five branches must be built in order, from left to right.

  • The first layer of each branch is a convolutional layer with 1×1 kernels that is used for channel compression.
  • The second layer of each branch adopts dilated convolution, in which the dilation rates start from 1 in the first branch and double in the subsequent branch.
    • Through the skip connections between branches, the feature map from the first branch of the second layer is also encoded in the feature maps of the subsequent branches, so it is effectively processed by successive dilation rates.
  • After that, the third and the fourth layers adopt 1×1 kernels to integrate feature maps formed under various dilation rates.

In practice, we use ISD-5 in the interior perception stream and ISD-3 in the transition compensation streams.
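The ISD description above can be sketched roughly in PyTorch. The channel widths, activation placement, and the exact form of the inter-branch skip connection (here, element-wise addition of the previous branch's dilated output) are assumptions for illustration, not details from the official implementation:

```python
# Minimal sketch of an ISD-N module: N branches of (1x1 compression ->
# dilated 3x3 conv with doubling rates), chained by skip connections so that
# each branch's output is re-processed by the next dilation rate, then fused
# by two 1x1 layers.
import torch
import torch.nn as nn

class ISD(nn.Module):
    def __init__(self, in_ch=256, mid_ch=64, out_ch=256, num_branches=5):
        super().__init__()
        # First layer of each branch: 1x1 conv for channel compression.
        self.compress = nn.ModuleList(
            [nn.Conv2d(in_ch, mid_ch, 1) for _ in range(num_branches)])
        # Second layer: dilated 3x3 convs with rates 1, 2, 4, ...
        self.dilated = nn.ModuleList(
            [nn.Conv2d(mid_ch, mid_ch, 3, padding=2**i, dilation=2**i)
             for i in range(num_branches)])
        # Third and fourth layers: 1x1 convs that integrate all branches.
        self.fuse1 = nn.Conv2d(mid_ch * num_branches, out_ch, 1)
        self.fuse2 = nn.Conv2d(out_ch, out_ch, 1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        outs, prev = [], None
        for comp, dil in zip(self.compress, self.dilated):
            h = self.relu(comp(x))
            if prev is not None:  # skip connection from the previous branch
                h = h + prev      # so its features meet successive rates
            prev = self.relu(dil(h))
            outs.append(prev)
        return self.relu(self.fuse2(self.relu(self.fuse1(torch.cat(outs, 1)))))
```

With `num_branches=5` this would correspond to the ISD-5 used in the interior perception stream, and `num_branches=3` to the ISD-3 used in the transition compensation stream.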

The loss function here consists of several parts:

  1. Boundary cross-entropy loss E, where GB denotes the boundary map of salient objects:
    image.png

  2. Interior cross-entropy loss E:
    image.png

  3. Final cross-entropy loss E:
    image.png

In the third loss, Sig(M) denotes the final prediction, where M is an integration of the features from the three streams. This integration does not use ordinary element-wise addition or concatenation, since these work relatively poorly here; instead, the authors design their own scheme:

image.png
image.png
image.png

The three ϕ terms denote the single-channel feature maps output by the B (boundary localization), I (interior perception), and T (transition compensation) streams.
The final training loss is a combination of the three:

image.png
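Since the loss equations above survive only as images, the following is a hedged sketch of the three-part objective: three binary cross-entropy terms over the boundary, interior, and fused predictions. The equal weighting, the use of the full mask to supervise the interior stream, and the max-pooling trick used to derive the boundary map GB from the ground truth are all illustrative assumptions, not the paper's exact formulation:

```python
# Hedged sketch of a three-part BCE objective in the spirit of the paper.
import torch
import torch.nn.functional as F

def boundary_map(gt, width=3):
    # Dilation minus erosion of the binary mask approximates its boundary
    # (an assumed way to obtain G_B from the ground-truth mask).
    pad = width // 2
    dil = F.max_pool2d(gt, width, stride=1, padding=pad)
    ero = -F.max_pool2d(-gt, width, stride=1, padding=pad)
    return (dil - ero).clamp(0, 1)

def banet_loss(logit_b, logit_i, logit_m, gt):
    gb = boundary_map(gt)
    loss_b = F.binary_cross_entropy_with_logits(logit_b, gb)  # boundary stream
    loss_i = F.binary_cross_entropy_with_logits(logit_i, gt)  # interior stream
    loss_f = F.binary_cross_entropy_with_logits(logit_m, gt)  # fused map Sig(M)
    return loss_b + loss_i + loss_f  # equal weights assumed
```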

Experimental Details

  • No special treatment is applied to the training images except horizontal flipping.
  • The training process takes about 15 hours and converges after 200k iterations with mini-batch of size 1.
  • During testing, the proposed network removes all the losses, and each image is directly fed into the network to obtain its saliency map without any pre-processing.
  • The proposed method runs at about 13 fps with about 400 × 300 resolution on our computer with a 3.60GHz CPU and a GTX 1080ti GPU.

image.png

image.png
