【IoU Loss】《UnitBox: An Advanced Object Detection Network》

Original

2020-06-28 04:40

ACM MM-2016（Proceedings of the 24th ACM international conference on Multimedia）

1 Background and Motivation

CNN-based object detection methods perform well across many applications. Most current methods follow this pipeline:

Extract region proposals, e.g. Selective Search, EdgeBoxes
Use a CNN to recognize and categorize the region proposals
Refine the localization with bounding box regression methods

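Detectors that follow this pipeline often underperform because of the region proposal method, in both effectiveness (proposals are generated from low-level features only, so their quality is often poor and they are sensitive to local appearance changes) and efficiency (the proposals are numerous and dense, hence slow).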

To overcome these difficulties:

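Faster R-CNN uses an RPN to speed up proposal generation, but because the anchor ratios and scales are pre-designed and fixed, it struggles with large shape variations and small objects
DenseBox directly regresses the distances from each pixel to the four sides of the GT box and trains them with an l2 loss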

The drawback is that the four side distances are optimized in isolation. It goes against the intuition that those variables are correlated and should be regressed jointly.

2 Advantages / Contributions

The paper proposes the IoU loss, which brings:

faster training convergence
enabled with variable-scale training
best performance among all published methods on the FDDB benchmark

3 Method

For any pixel $(i,j)$, the GT can be defined as a 4-d vector:

$\widetilde{x}_{i,j} = (\widetilde{x}_{t_{i,j}},\widetilde{x}_{b_{i,j}},\widetilde{x}_{l_{i,j}},\widetilde{x}_{r_{i,j}})$

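As shown in Figure 1, $t, b, l, r$ stand for top, bottom, left, and right, and $\widetilde{x}_{t_{i,j}},\widetilde{x}_{b_{i,j}},\widetilde{x}_{l_{i,j}},\widetilde{x}_{r_{i,j}}$ are the distances from the current pixel to the top, bottom, left, and right sides of the GT box.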
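A minimal sketch of this definition: the helper below computes the 4-d target for a single pixel. The name `gt_distances`, the `(row, col)` pixel convention, and the `(top, bottom, left, right)` box format are assumptions made for illustration and do not come from the paper.

```python
def gt_distances(i, j, box):
    """Return (x_t, x_b, x_l, x_r) for pixel (i, j), or zeros if it lies outside the box.

    i is the row (y) and j is the column (x); box = (top, bottom, left, right)
    in pixel coordinates. This layout is an assumption for illustration.
    """
    top, bottom, left, right = box
    inside = top <= i <= bottom and left <= j <= right
    if not inside:
        return (0, 0, 0, 0)  # pixels outside the GT box carry no regression target
    return (i - top, bottom - i, j - left, right - j)


print(gt_distances(30, 50, (10, 60, 20, 100)))  # (20, 30, 30, 50)
```

Returning zeros for outside pixels is one convenient way to mark them as negatives; other encodings would work equally well.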

3.1 L2 Loss Layer

This is the loss used in DenseBox.

It has two drawbacks:

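The bbox is regressed as four independent variables, with no view of the box as a whole (during optimization, one or two variables may be fit perfectly while the others are not, so the overall result can still be poor). Some bboxes have a small loss against GT yet are badly localized, as in the following case:

If the current pixel sits midway between two rectangles, one being the GT and the other the predicted bbox, the loss can come out as 0 even though the two boxes do not coincide
Unnormalized: at the same IoU, a poorly localized large bbox can incur a larger loss penalty than a small one (absolute size in pixels versus relative size in IoU terms)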
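The normalization issue can be checked numerically. The sketch below scales one GT/prediction pair by a factor of 10 and compares the squared l2 loss with the IoU. The helper names and the toy numbers are illustrative assumptions, not values from the paper.

```python
import numpy as np

def l2_loss(pred, gt):
    """Squared l2 distance between two (t, b, l, r) distance vectors."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    return float(np.sum((pred - gt) ** 2))

def iou(pred, gt):
    """IoU of two boxes that share the same reference pixel, given as (t, b, l, r)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    inter_h = min(pred[0], gt[0]) + min(pred[1], gt[1])
    inter_w = min(pred[2], gt[2]) + min(pred[3], gt[3])
    inter = inter_h * inter_w
    area_pred = (pred[0] + pred[1]) * (pred[2] + pred[3])
    area_gt = (gt[0] + gt[1]) * (gt[2] + gt[3])
    return inter / (area_pred + area_gt - inter)

# Toy example: the same relative error at two different object sizes
gt_small, pred_small = (10, 10, 10, 10), (8, 10, 10, 12)
scale = 10
gt_large = tuple(scale * v for v in gt_small)
pred_large = tuple(scale * v for v in pred_small)

print("small:", iou(pred_small, gt_small), l2_loss(pred_small, gt_small))
print("large:", iou(pred_large, gt_large), l2_loss(pred_large, gt_large))
```

The IoU is identical for both pairs, while the l2 loss grows with the square of the scale (8 versus 800 here), so large objects dominate an l2-trained regressor.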

3.2 IoU Loss Layer: Forward

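Algorithm 1 in the paper spells out the IoU loss from Figure 1 in detail.

The condition $\widetilde{x} \neq 0$ is key: only pixels that fall inside the GT box are counted.

$I$ is the intersection, $U$ is the union, and the final IoU loss is the negative log of the intersection over union, $L = -\ln(I/U)$.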
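The forward pass can be sketched in NumPy as follows. This is an illustrative reading of the description above, not the authors' code: the function name `iou_loss_forward`, the `(..., 4)` array layout in `(t, b, l, r)` order, and the choice to give non-GT pixels a loss of 0 are all assumptions. It also assumes the predicted distances are non-negative with a non-empty intersection, since otherwise the logarithm is undefined.

```python
import numpy as np

def iou_loss_forward(pred, gt):
    """Per-pixel IoU loss for (t, b, l, r) distance vectors of shape (..., 4).

    Pixels whose gt vector is all zeros are treated as lying outside every
    GT box and receive a loss of 0.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    positive = np.any(gt != 0, axis=-1)  # the "x~ != 0" test

    area_pred = (pred[..., 0] + pred[..., 1]) * (pred[..., 2] + pred[..., 3])
    area_gt = (gt[..., 0] + gt[..., 1]) * (gt[..., 2] + gt[..., 3])
    inter_h = np.minimum(pred[..., 0], gt[..., 0]) + np.minimum(pred[..., 1], gt[..., 1])
    inter_w = np.minimum(pred[..., 2], gt[..., 2]) + np.minimum(pred[..., 3], gt[..., 3])
    inter = inter_h * inter_w
    union = area_pred + area_gt - inter

    # Use a dummy IoU of 1 on negative pixels so the log stays finite there
    iou = np.where(positive, inter / np.where(positive, union, 1.0), 1.0)
    return np.where(positive, -np.log(iou), 0.0)


gt = np.array([[10, 10, 10, 10], [0, 0, 0, 0]])     # second pixel is a negative
pred = np.array([[8, 10, 10, 12], [3, 1, 2, 2]])
print(iou_loss_forward(pred, gt))
```

For the first pixel the intersection is 360 and the union 436, so the loss is $\ln(436/360)$; the second pixel is outside the GT box and contributes nothing.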

The negative log curve looks like this:

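import numpy as np
import matplotlib.pyplot as plt

def log2x(x):
    # np.math was removed from recent NumPy releases; np.log2 works element-wise.
    # Base 2 here; the paper's loss uses ln, which only differs by a constant factor.
    return -np.log2(x)

# IoU values in (0, 1); start at 0.01 because the negative log diverges at 0
x = np.arange(0.01, 1, 0.01)
y = log2x(x)

# gca = get current axes
ax = plt.gca()

# spines = the four border lines around the plot
ax.spines['right'].set_color('none')  # hide the right border
ax.spines['top'].set_color('none')    # hide the top border

ax.xaxis.set_ticks_position('bottom')  # put the x ticks on the bottom spine
ax.yaxis.set_ticks_position('left')    # put the y ticks on the left spine

ax.spines['bottom'].set_position(('data', 0))  # move the x axis to y = 0
ax.spines['left'].set_position(('data', 0))    # move the y axis to x = 0

plt.plot(x, y)
plt.show()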

Plotting reference: 【python】matplotlib (Part 1)

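The larger the IoU, the smaller the loss; when the two boxes coincide, the loss is 0.

Advantages: the IoU loss treats the bbox as a whole, and IoU itself lies in [0,1], so it is normalized by construction.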

3.3 IoU Loss Layer: Backward

Using the formulas from Algorithm 1, let's look at backpropagation through the IoU loss:

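$X = (x_t + x_b) \times (x_l + x_r)$, $\widetilde{X} = (\widetilde{x}_t + \widetilde{x}_b) \times (\widetilde{x}_l + \widetilde{x}_r)$
$I = I_h \times I_w = [\min(x_t,\widetilde{x}_t)+\min(x_b,\widetilde{x}_b)] \times [\min(x_l,\widetilde{x}_l)+\min(x_r,\widetilde{x}_r)]$
$U = X+\widetilde{X}-I$, $IoU = \frac{I}{U}$, $L=-\ln(IoU)$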

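Writing the loss as $L = \ln U - \ln I$ and differentiating gives $\frac{\partial L}{\partial x} = \frac{1}{U}\nabla_x X - \frac{U+I}{UI}\nabla_x I$. In this gradient, $\nabla_x X$ is the penalty on the area of the predicted bbox and enters with a positive sign, so gradient descent shrinks the predicted box. $\nabla_x I$ is the penalty on the overlap and enters with a negative sign, so gradient descent enlarges the intersection. Backpropagation therefore favors a predicted bbox that is as small as possible with an overlap that is as large as possible. In the limiting case of a perfect match, $I = U$ and the loss is 0.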
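The gradient can be sanity-checked against finite differences. The sketch below is an assumption-laden illustration rather than the paper's implementation: it handles a single pixel, uses the hypothetical names `iou_loss` and `iou_loss_grad`, and takes the derivative of each $\min$ to be 1 when the predicted side is shorter than the GT side and 0 otherwise.

```python
import numpy as np

def iou_loss(pred, gt):
    """Scalar IoU loss -ln(I/U) for one pixel, with pred and gt as (t, b, l, r)."""
    x_t, x_b, x_l, x_r = pred
    g_t, g_b, g_l, g_r = gt
    area_pred = (x_t + x_b) * (x_l + x_r)
    area_gt = (g_t + g_b) * (g_l + g_r)
    inter = (min(x_t, g_t) + min(x_b, g_b)) * (min(x_l, g_l) + min(x_r, g_r))
    union = area_pred + area_gt - inter
    return -np.log(inter / union)

def iou_loss_grad(pred, gt):
    """Analytic gradient dL/dx = (1/U) dX/dx - (U + I)/(U I) dI/dx."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    h_pred, w_pred = pred[0] + pred[1], pred[2] + pred[3]
    inter_h = min(pred[0], gt[0]) + min(pred[1], gt[1])
    inter_w = min(pred[2], gt[2]) + min(pred[3], gt[3])
    inter = inter_h * inter_w
    union = h_pred * w_pred + (gt[0] + gt[1]) * (gt[2] + gt[3]) - inter

    # dX/dx: growing t or b adds a strip of width w_pred; growing l or r a strip of height h_pred
    d_area = np.array([w_pred, w_pred, h_pred, h_pred])
    # dI/dx: a side enlarges the intersection only while it is shorter than the GT side
    grows = pred < gt
    d_inter = np.where(grows, [inter_w, inter_w, inter_h, inter_h], 0.0)

    return d_area / union - (union + inter) / (union * inter) * d_inter


pred = np.array([8.0, 11.0, 9.0, 12.0])
gt = np.array([10.0, 10.0, 10.0, 10.0])
eps = 1e-6
# Central differences along each of the four coordinates
numeric = np.array([
    (iou_loss(pred + eps * e, gt) - iou_loss(pred - eps * e, gt)) / (2 * eps)
    for e in np.eye(4)
])
analytic = iou_loss_grad(pred, gt)
print(analytic)
print(np.allclose(numeric, analytic, atol=1e-6))
```

In this toy case the sides that are too short (top, left) get a negative gradient, so a descent step lengthens them, and the sides that overshoot (bottom, right) get a positive gradient, so a descent step shortens them.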

3.4 UnitBox Network

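The network has two branches: every pixel gets its 4 predicted coordinates and a matching confidence score.

The network is modeled on VGG and, during training, takes three inputs: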

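Original image
confidence heatmap: marks pixels inside the GT coverage as positive and those outside as negative. It has the same size as the original image and is a binary mask; the positive region is presumably the ellipse from the FDDB labels
bounding box heatmaps: for pixels in the positive region, the distances to the top, bottom, left, and right sides of the GT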
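As a rough sketch of how the two target maps could be built for a single face, the function below produces a binary confidence map and a 4-channel distance map. The name `make_targets`, the `(top, bottom, left, right)` box format, and the use of the ellipse inscribed in the box as the positive region are assumptions for illustration; the paper's exact construction may differ.

```python
import numpy as np

def make_targets(height, width, box):
    """Build training targets for one GT box given as (top, bottom, left, right).

    Returns a (height, width) binary confidence map and a (4, height, width)
    array holding the distances to the top, bottom, left, and right sides.
    The positive region is the ellipse inscribed in the box (an assumption).
    """
    top, bottom, left, right = box
    rows, cols = np.mgrid[0:height, 0:width]

    center_y, center_x = (top + bottom) / 2.0, (left + right) / 2.0
    half_h, half_w = (bottom - top) / 2.0, (right - left) / 2.0
    confidence = (((rows - center_y) / half_h) ** 2
                  + ((cols - center_x) / half_w) ** 2) <= 1.0

    distances = np.stack([rows - top, bottom - rows, cols - left, right - cols])
    distances = np.where(confidence, distances, 0)  # zero outside the positive region
    return confidence.astype(np.uint8), distances


conf, dist = make_targets(80, 120, (10, 60, 20, 100))
print(conf.shape, dist.shape)
print(dist[:, 35, 60])  # distances at the box center
```

Zeroing the distance channels outside the positive region keeps the targets consistent with the rule that only pixels with a nonzero GT vector contribute to the regression loss.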

At prediction time, the confidence heatmap branch is followed by a sigmoid activation function.

4 Experiments

4.1 Datasets

FDDB

http://vis-www.cs.umass.edu/fddb/samples/

This dataset annotates faces with ellipses; a bbox is generated around each ellipse, centered on the ellipse center.
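One plausible way to turn an ellipse annotation into a box is to take the tight axis-aligned rectangle around it. FDDB stores each face as `major_axis_radius minor_axis_radius angle center_x center_y 1`, with the angle in radians. Whether the paper uses the tight box or some other box centered on the ellipse is not stated in this post, so the sketch below is an assumption, and both the function name `ellipse_to_bbox` and the sample line are made up for illustration.

```python
import math

def ellipse_to_bbox(major_radius, minor_radius, angle, center_x, center_y):
    """Tight axis-aligned box (left, top, right, bottom) around a rotated ellipse.

    angle is the rotation of the major axis from the x axis, in radians.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w = math.hypot(major_radius * cos_a, minor_radius * sin_a)
    half_h = math.hypot(major_radius * sin_a, minor_radius * cos_a)
    return (center_x - half_w, center_y - half_h, center_x + half_w, center_y + half_h)


# Hypothetical annotation line in the FDDB ellipse format (not a real entry)
line = "60.0 40.0 1.5707963 150.0 100.0 1"
major, minor, angle, cx, cy = (float(v) for v in line.split()[:5])
print(ellipse_to_bbox(major, minor, angle, cx, cy))
```

For an upright face the major axis is close to vertical, so the box is roughly `2 * minor_radius` wide and `2 * major_radius` tall.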

4.2 Effectiveness of IoU Loss

The network is initialized from VGG weights and fine-tuned on the WiderFace (WIDER FACE) dataset.

Convergence

The IoU loss converges faster and more stably than the L2 loss, and reaches a lower miss rate
FP-recall Curves
Scale Variation

Test images are resized to sizes between 60 and 960 pixels to compare how the IoU loss and the L2 loss cope with scale variation; the IoU loss is clearly more robust.

4.3 Performance of UnitBox

The ROC curves show that UnitBox leads the other methods by a clear margin.

5 Conclusion（own）

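Explaining what the loss does from the backpropagation angle works really well
Remember that the FDDB face dataset is labeled with ellipses, and that its samples include Zhang Ziyi and Gong Li
The drawback of learning the bbox as four independent variables: there is no view of the box as a whole (during optimization, one or two variables may be fit perfectly while the others are not, so the overall result can still be poor)
A good moment to review P-R curves and ROC curves as well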
