Deep-Learning-Based Object Detection: R-CNN


迷上微笑

2020-02-20 22:53

R-CNN

A fairly comprehensive collection of object detection resources: https://handong1587.github.io/deep_learning/2015/10/09/object-detection.html

Paper: Rich feature hierarchies for accurate object detection and semantic segmentation

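The method consists of four main stages: region proposal selection, CNN feature extraction, SVM classification, and bounding-box refinement.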

--------------------------------------------------------------------------------------------------------------------------------------------------------

Region proposal selection

Many methods exist for generating region proposals, such as objectness, selective search, category-independent object proposals, constrained parametric min-cuts (CPMC), and multi-scale combinatorial grouping. This paper uses selective search.

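Selective search produces about 2,000 region proposals per image, and the proposals vary in size. A CNN, however, requires a fixed-size input image, so before the CNN can extract features from a region proposal, the proposal must be transformed to 227×227 pixels. The transformation options are: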

(A) the original object proposal at its actual scale relative to the transformed CNN inputs; (B) tightest square with context; (C) tightest square without context; (D) warp. Within each column and example proposal, the top row corresponds to p = 0 pixels of context padding, while the bottom row has p = 16 pixels of context padding.
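To make option (D), warp with p = 16 pixels of context, more concrete, here is a minimal Python sketch. It assumes the image is a nested list of pixel values, boxes are (x1, y1, x2, y2) in pixel coordinates, and nearest-neighbour sampling is good enough. The function name `warp_with_context` and the constant `fill` used for areas outside the image are illustrative assumptions, not the paper's code. The box is enlarged by p·w/(227 - 2p) on each side horizontally (and by the analogous amount vertically), so that after the anisotropic resize the original box fills the central 227 - 2p output pixels and exactly p pixels of context surround it.

```python
import math


def warp_with_context(image, box, out_size=227, p=16, fill=0):
    """Crop `box` plus context from `image` and warp it to out_size x out_size.

    image: list of rows (H x W) of pixel values (assumed layout).
    box:   (x1, y1, x2, y2) in continuous pixel coordinates, x2 > x1, y2 > y1.
    p:     pixels of context that should surround the box *after* warping.
    """
    x1, y1, x2, y2 = box
    inner = out_size - 2 * p  # output pixels the original box will occupy
    # Enlarge the box so that, once scaled, exactly p output pixels of context surround it.
    dx = p * (x2 - x1) / inner
    dy = p * (y2 - y1) / inner
    ex1, ey1 = x1 - dx, y1 - dy
    # Source pixels per output pixel; x and y are scaled independently (warp ignores aspect ratio).
    step_x = (x2 - x1 + 2 * dx) / out_size
    step_y = (y2 - y1 + 2 * dy) / out_size
    height, width = len(image), len(image[0])
    out = []
    for r in range(out_size):
        iy = math.floor(ey1 + (r + 0.5) * step_y)  # nearest source row
        row = []
        for c in range(out_size):
            ix = math.floor(ex1 + (c + 0.5) * step_x)  # nearest source column
            inside = 0 <= iy < height and 0 <= ix < width
            # Outside the image we use a constant fill value (a simplification).
            row.append(image[iy][ix] if inside else fill)
        out.append(row)
    return out


# Usage: a 100x100 image with a 20x20 bright square; the proposal sits exactly on the square.
img = [[1 if 40 <= x < 60 and 40 <= y < 60 else 0 for x in range(100)] for y in range(100)]
patch = warp_with_context(img, (40, 40, 60, 60))
print(len(patch), len(patch[0]))  # 227 227
```

With this proposal, the square lands in output rows and columns 16 through 210, leaving 16 pixels of context on every side; passing `p=0` gives a plain warp with no context.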

--------------------------------------------------------------------------------------------------------------------------------------------------------

CNN

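The paper uses a CNN to extract features from the object proposals. Because labeled detection data is scarce, the CNN is trained by fine-tuning: it is first pre-trained on the ILSVRC2012 dataset and then trained further on object proposals. Training samples are built as follows: a proposal whose IoU with a ground-truth box is at least 0.5 is labeled as a positive sample for that box's class; every other proposal is labeled as a negative (background) sample. The criterion is deliberately loose because the CNN needs a large number of samples; a stricter labeling rule would leave too few samples for training it.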
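The labeling rule for fine-tuning can be sketched in Python as follows. This is an illustrative sketch, not the paper's code: boxes are assumed to be (x1, y1, x2, y2) tuples, class labels are assumed to be integers with 0 reserved for background, and the names `iou` and `finetune_labels` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def finetune_labels(proposals, gt_boxes, gt_classes, pos_thresh=0.5, background=0):
    """Give each proposal the class of its best-overlapping ground-truth box
    if that IoU is >= pos_thresh; otherwise give it the background label."""
    labels = []
    for prop in proposals:
        best_iou, best_cls = 0.0, background
        for box, cls in zip(gt_boxes, gt_classes):
            overlap = iou(prop, box)
            if overlap > best_iou:
                best_iou, best_cls = overlap, cls
        labels.append(best_cls if best_iou >= pos_thresh else background)
    return labels


# Usage: one ground-truth box of (hypothetical) class 3.
gt = [(0, 0, 10, 10)]
props = [(0, 0, 10, 10), (0, 0, 10, 5), (5, 5, 15, 15)]
print(finetune_labels(props, gt, [3]))  # [3, 3, 0]
```

The second proposal covers half of the ground-truth box (IoU exactly 0.5), so it still counts as a positive; the third overlaps only 1/7 and becomes background.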

--------------------------------------------------------------------------------------------------------------------------------------------------------

SVM classifier

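The paper classifies the CNN features with SVMs (one linear SVM per class), producing a score for each class. Because an SVM needs relatively few training samples, its training set can be built as follows: ground-truth boxes are labeled as positives; an object proposal whose IoU with the ground-truth boxes is below 0.3 is labeled as a negative; a proposal whose IoU is above 0.3 but which is not itself a ground-truth box is discarded.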
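A minimal sketch of this sample-building rule for a single class's SVM, assuming boxes are (x1, y1, x2, y2) tuples and that `gt_boxes` holds the ground-truth boxes of that class in one image. The function name `svm_training_samples` is hypothetical, and a proposal whose IoU is exactly 0.3 is treated as ignored here, since the text leaves that boundary unspecified.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def svm_training_samples(proposals, gt_boxes, neg_thresh=0.3):
    """Return (positives, negatives) for one class's SVM."""
    positives = list(gt_boxes)  # ground-truth boxes are the only positives
    negatives = []
    for prop in proposals:
        max_overlap = max((iou(prop, g) for g in gt_boxes), default=0.0)
        if max_overlap < neg_thresh:
            negatives.append(prop)  # clearly background for this class
        # Otherwise the proposal falls in the grey zone and is ignored.
    return positives, negatives


# Usage
gt = [(0, 0, 10, 10)]
props = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30), (0, 0, 10, 6)]
pos, neg = svm_training_samples(props, gt)
print(neg)  # [(5, 5, 15, 15), (20, 20, 30, 30)]
```

The proposal identical to the ground truth and the one with IoU 0.6 are dropped: the ground-truth box itself already serves as the positive.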

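When training the SVMs, the paper uses the standard hard negative mining method. An explanation of this technique found on Zhihu: training an SVM classifier generally requires both positive and negative samples. In person detection, for example, an image contains only a few person (positive) samples, while the randomly generated negative (non-person) samples can vastly outnumber them, and an SVM trained on such data performs poorly. Hard negative mining therefore selects representative negatives from the negative set, namely the ones the current classifier finds hard to reject, so that the trained classifier performs better.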
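The mining loop itself can be sketched as below. To keep it short and deterministic, the "features" are single numbers and the SVM is replaced by the closed-form hard-margin linear SVM for separable 1-D data (boundary halfway between the closest positive and negative, scaled so those two points score +1 and -1); the real detector trains linear SVMs on CNN features. Treating negatives that score above -1 (inside the margin or misclassified) as "hard" is a common convention and an assumption here, and all function names are hypothetical.

```python
def train_1d_svm(positives, negatives):
    """Toy stand-in for SVM training: hard-margin linear SVM on separable 1-D data."""
    lo_pos, hi_neg = min(positives), max(negatives)
    assert lo_pos > hi_neg, "toy trainer assumes separable data"
    w = 2.0 / (lo_pos - hi_neg)      # closest positive scores +1, closest negative -1
    t = (lo_pos + hi_neg) / 2.0      # decision boundary
    return w, t


def svm_score(model, x):
    w, t = model
    return w * (x - t)


def hard_negative_mining(positives, negative_pool, init_size, max_rounds=10, margin=-1.0):
    """Start from a small negative cache, then repeatedly add the negatives that the
    current model scores above `margin`, retraining after each round."""
    cache = list(negative_pool[:init_size])
    model = train_1d_svm(positives, cache)
    for _ in range(max_rounds):
        hard = [x for x in negative_pool if x not in cache and svm_score(model, x) > margin]
        if not hard:
            break  # no remaining negative violates the margin
        cache.extend(hard)
        model = train_1d_svm(positives, cache)
    return model, cache


# Usage: positives near 5-7, a pool of mostly easy negatives plus two hard ones (3 and 4).
model, cache = hard_negative_mining([5, 6, 7], [1, -30, -40, 3, 4, -50], init_size=1)
print(cache)  # [1, 3, 4]
print(model)  # (2.0, 4.5)
```

The easy negatives (-30, -40, -50) never enter the training set, which is the point of the technique: training effort goes to the negatives that actually shape the decision boundary.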

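At test time, non-maximum suppression (NMS) is applied to the results: for each class, a region is removed if its IoU with a higher-scoring region exceeds a threshold. This removes redundant boxes and keeps the best-localized detection for each object.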
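A minimal sketch of greedy per-class NMS, assuming detections are (class_id, box, score) tuples with boxes as (x1, y1, x2, y2). The names `nms` and `nms_per_class` and the default threshold of 0.3 are illustrative assumptions; the paper tunes its own threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh):
    """Greedy NMS for one class: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept (higher-scoring) box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


def nms_per_class(detections, iou_thresh=0.3):
    """Run NMS independently for each class; boxes of different classes never suppress each other."""
    by_class = {}
    for cls, box, score in detections:
        by_class.setdefault(cls, []).append((box, score))
    result = []
    for cls, dets in by_class.items():
        boxes = [b for b, _ in dets]
        scores = [s for _, s in dets]
        for i in nms(boxes, scores, iou_thresh):
            result.append((cls, boxes[i], scores[i]))
    return result


# Usage
dets = [
    ("person", (0, 0, 10, 10), 0.9),
    ("person", (1, 1, 11, 11), 0.8),   # IoU about 0.68 with the first box: suppressed
    ("dog", (1, 1, 11, 11), 0.7),      # same box but another class: kept
    ("person", (20, 20, 30, 30), 0.6), # no overlap: kept
]
print(nms_per_class(dets))
```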

--------------------------------------------------------------------------------------------------------------------------------------------------------

Bounding-box refinement (bounding-box regression)

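Detection errors also come from inaccurately localized boxes, so the paper refines each box with a linear regression model that takes the region proposal's pool5 features as input. The regression is trained on pairs $\{(P^i, G^i)\}_{i=1,\dots,N}$, where $P^i = (P^i_x, P^i_y, P^i_w, P^i_h)$ and $G^i = (G^i_x, G^i_y, G^i_w, G^i_h)$ give each box's center coordinates, width, and height. The goal is to learn a transformation that maps a proposed box $P$ to the ground-truth box $G$. The transformation is: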