Detecting Text in Natural Image with Connectionist Text Proposal Network

Original

DL-ML

2020-06-21 12:44

Weilin Huang, ECCV 2016: Detecting Text in Natural Image with Connectionist Text Proposal Network

Authors and related links

Homepages: Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao
Brief author information:

Paper download: paper link
Code download: code link

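Key ideas and motivation

Text detection differs from generic object detection: a text line is a sequence (of characters, parts of characters, or groups of characters) rather than a single isolated object. This is both an advantage and a difficulty. The advantage is that characters on the same text line can share context, which a sequence model such as an RNN can represent. The difficulty is that a complete text line has to be detected as a whole even though its characters may differ widely in appearance and lie far apart, which is harder than detecting a single object. For this reason the authors argue that predicting the vertical position of text (the top and bottom edges of the bounding box) is easier than predicting its horizontal position (the left and right edges).
Top-down text detection (detect text regions first, then find text lines) works better than the traditional bottom-up approach (detect characters first, then link them into text lines). The drawbacks of bottom-up methods (explained more fully in another paper by the authors) come down to this: they ignore context, are not robust, and need too many sub-modules, which makes the system complex, lets errors accumulate stage by stage, and limits performance.
Seamlessly combining an RNN with a CNN improves detection accuracy. The CNN extracts deep features, the RNN classifies the feature sequence (two classes: text or non-text), and integrating the two gives better detection performance.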

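Method overview

The basic pipeline is shown in Fig. 1. Detection proceeds in six steps:
- Step 1: run the first five conv stages of VGG16 (up to conv5) to obtain a W*H*C feature map.
- Step 2: at every location of the conv5 feature map, take the features in a 3*3*C window. These features are used to predict the class and position of the k anchors at that location (anchors are defined much as in Faster R-CNN).
- Step 3: feed the 3*3*C features of all windows in a row (W*3*3*C in total) into an RNN (a BLSTM), which outputs W*256.
- Step 4: feed the W*256 RNN output into a 512-dimensional fc layer.
- Step 5: feed the fc features into three classification or regression layers. The second output, 2k scores, gives the class of each of the k anchors (text or non-text). The first output, 2k vertical coordinates, and the third, k side-refinement offsets, regress the anchor positions. The 2k vertical coordinates encode the height of the bounding box and the y coordinate of its center (which together fix the top and bottom edges); the k side-refinement values encode a horizontal shift of the box. Note that only three parameters describe each regressed box, because every anchor has a fixed width of 16 (the conv5 stride of VGG16 is 16). The regressed boxes are the thin red rectangles in Fig. 1, all of the same width.
- Step 6: merge the proposals classified as text (the thin rectangles in Fig. 1(b)) into text lines with a simple text-line construction algorithm.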
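The six steps can be followed as a sequence of tensor shapes. The sketch below is only a bookkeeping aid, not the authors' code: `ctpn_shapes` is a hypothetical helper, the 600×1000 input size is an arbitrary example, and floor division at the image border is an assumption. C = 512 is the channel count of VGG16's conv5.

```python
def ctpn_shapes(img_h, img_w, channels=512, k=10, stride=16,
                rnn_dim=256, fc_dim=512):
    """Return the tensor shapes produced along the pipeline (hypothetical helper)."""
    # Step 1: conv5 of VGG16 has a total stride of 16. Floor division is an
    # assumption about how the image border is handled.
    h, w = img_h // stride, img_w // stride
    return {
        "conv5": (h, w, channels),                   # step 1
        "rnn_input_per_row": (w, 3 * 3 * channels),  # steps 2-3: one 3*3*C window per column
        "rnn_output_per_row": (w, rnn_dim),          # step 3: BLSTM output
        "fc_per_row": (w, fc_dim),                   # step 4
        "vertical_coordinates": (h, w, 2 * k),       # step 5: center y and height per anchor
        "scores": (h, w, 2 * k),                     # step 5: text / non-text per anchor
        "side_refinement": (h, w, k),                # step 5: horizontal offset per anchor
    }


shapes = ctpn_shapes(600, 1000)
for name, shape in shapes.items():
    print(f"{name:22s} {shape}")
```

Each row of the feature map is treated as an independent sequence of length W, so the per-row shapes repeat H times.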

Fig. 1: (a) Architecture of the Connectionist Text Proposal Network (CTPN). We densely slide a 3×3 spatial window through the last convolutional maps (conv5) of the VGG16 model [27]. The sequential windows in each row are recurrently connected by a Bi-directional LSTM (BLSTM) [7], where the convolutional feature (3×3×C) of each window is used as input of the 256D BLSTM (including two 128D LSTMs). The RNN layer is connected to a 512D fully-connected layer, followed by the output layer, which jointly predicts text/non-text scores, y-axis coordinates and side-refinement offsets of k anchors. (b) The CTPN outputs sequential fixed-width fine-scale text proposals. Color of each box indicates the text/non-text score. Only the boxes with positive scores are presented.

Method details

Detecting Text in Fine-scale proposals
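- Scales and aspect ratios of the k anchors: the width is always 16, k = 10, and the heights range from 11 to 273 pixels (dividing by 0.7 each time).
- The regressed height and center y coordinate of the bounding box are defined as follows, where a superscript * denotes the ground truth and a superscript a denotes the anchor; $c_y$ and $h$ are the center y coordinate and height of the predicted box:

$$v_c = (c_y - c_y^a)/h^a, \qquad v_h = \log(h/h^a)$$
$$v_c^* = (c_y^* - c_y^a)/h^a, \qquad v_h^* = \log(h^*/h^a)$$

- Score threshold: 0.7 (followed by NMS).
- Comparison of detections from a generic RPN and from the method in this paper.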
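A minimal sketch of the anchor heights and of the vertical regression targets, assuming the paper's relative parameterisation (center offset divided by the anchor height, and the log of the height ratio). The function names are hypothetical, and whether the heights are rounded at each step is not stated in the notes, so they are left unrounded here.

```python
import math


def anchor_heights(k=10, h_min=11.0, ratio=0.7):
    """k anchor heights, starting at h_min and dividing by `ratio` each time."""
    return [h_min / ratio ** i for i in range(k)]


def encode_vertical(cy, h, cy_a, h_a):
    """Regression targets (v_c, v_h) of a box relative to an anchor."""
    return (cy - cy_a) / h_a, math.log(h / h_a)


def decode_vertical(v_c, v_h, cy_a, h_a):
    """Invert encode_vertical and return the (top, bottom) edges of the box."""
    cy = v_c * h_a + cy_a
    h = math.exp(v_h) * h_a
    return cy - h / 2, cy + h / 2


heights = anchor_heights()
print([round(x) for x in heights])

# A ground-truth box centered at y = 100 with height 32, matched to an
# anchor centered at y = 96 with height 32.
v_c, v_h = encode_vertical(cy=100.0, h=32.0, cy_a=96.0, h_a=32.0)
top, bottom = decode_vertical(v_c, v_h, cy_a=96.0, h_a=32.0)
print(v_c, v_h, top, bottom)
```

Because the anchor width is fixed at 16, these two values together with the side-refinement offset are the only three numbers regressed per anchor.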

Recurrent Connectionist Text Proposals
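- RNN type: BLSTM (bidirectional LSTM); each LSTM has a 128-dimensional hidden layer.
- RNN input: the 3*3*C feature of each sliding window (flattened into a vector); the windows in one row form a sequence.
- RNN output: a 256-dimensional feature for each window.
- Comparison of results with and without the RNN; CTPN (Connectionist Text Proposal Network) is the method in this paper.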
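To make the RNN input concrete, the sketch below builds the sequence for one row of a feature map: one flattened 3*3*C vector per column. It uses plain nested lists rather than a tensor library. The helper name `row_window_sequence` and the zero padding at the border are assumptions, and the BLSTM itself is not modelled.

```python
def row_window_sequence(fmap, row):
    """Build the RNN input sequence for one feature-map row.

    fmap is an H x W x C nested list. The result has W entries, each a
    flattened 3*3*C vector taken from the window centered on that column.
    """
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    zeros = [0.0] * channels  # assumed zero padding outside the map
    sequence = []
    for x in range(width):
        vec = []
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                yy, xx = row + dy, x + dx
                inside = 0 <= yy < height and 0 <= xx < width
                vec.extend(fmap[yy][xx] if inside else zeros)
        sequence.append(vec)
    return sequence


# Toy feature map with H = 2, W = 3, C = 2.
fmap = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]
seq = row_window_sequence(fmap, row=0)
print(len(seq), len(seq[0]))
```

In CTPN each such sequence would be passed through the two 128D LSTMs (one per direction), giving a 256D feature per window.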

Side-refinement
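- Text-line construction algorithm (merging many thin proposals into one text line)
  - Main idea: every two neighbouring proposals form a pair, and pairs that share a proposal are merged until no further merge is possible (no pairs share an element).
  - Conditions for two proposals Bi and Bj to form a pair:
    1. Bj->Bi and Bi->Bj (Bj->Bi means that Bj is the best neighbour of Bi).
    2. Condition 1 for Bj->Bi: among the neighbours of Bi, Bj is the one closest to Bi, and that distance is less than 50 pixels.
    3. Condition 2 for Bj->Bi: the vertical overlap of Bj and Bi is greater than 0.7.
- Fixing the width and horizontal position of the regressed boxes makes the predicted horizontal position inaccurate, so the authors introduce side-refinement, which regresses a horizontal offset:

$$o = (x_{side} - c_x^a)/w^a, \qquad o^* = (x_{side}^* - c_x^a)/w^a$$

  where $x_{side}$ is the predicted x-coordinate of the nearest horizontal side (e.g., left or right side) to the current anchor, $x_{side}^*$ is the ground truth (GT) side coordinate on the x-axis, pre-computed from the GT bounding box and anchor location, $c_x^a$ is the center of the anchor on the x-axis, and $w^a$ is the width of the anchor, which is fixed at $w^a = 16$.

- Comparison of results with and without side-refinement.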
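The pairing rule and the side-refinement offset can be sketched as follows. This is one possible reading, not the authors' implementation. Vertical overlap is taken to be the IoU of the two y-intervals, the best neighbour is searched separately to the right and to the left so that chains of proposals can form, and horizontal distance is measured between left edges (equivalent to center distance for fixed-width boxes). All function names are hypothetical.

```python
def vertical_overlap(a, b):
    """IoU of the y-intervals of two boxes (x_left, y_top, x_right, y_bottom)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    return max(inter, 0.0) / union if union > 0 else 0.0


def best_neighbour(i, boxes, direction, max_gap=50.0, min_overlap=0.7):
    """Nearest valid neighbour of box i to the right (+1) or left (-1), or None."""
    best, best_dist = None, max_gap
    for j, box in enumerate(boxes):
        if j == i:
            continue
        dist = (box[0] - boxes[i][0]) * direction
        if 0 < dist < best_dist and vertical_overlap(boxes[i], box) > min_overlap:
            best, best_dist = j, dist
    return best


def build_text_lines(boxes):
    """Merge mutually-best neighbours into text lines with a union-find."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        j = best_neighbour(i, boxes, +1)
        # Pair only if the relation holds in both directions.
        if j is not None and best_neighbour(j, boxes, -1) == i:
            parent[find(j)] = find(i)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    # Each text line is the bounding box of its group of proposals.
    return sorted((min(b[0] for b in g), min(b[1] for b in g),
                   max(b[2] for b in g), max(b[3] for b in g))
                  for g in groups.values())


def decode_side(o, cx_a, w_a=16.0):
    """Recover x_side from a side-refinement offset o = (x_side - cx_a) / w_a."""
    return o * w_a + cx_a


proposals = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 30), (200, 10, 216, 30)]
lines = build_text_lines(proposals)
print(lines)
print(decode_side(0.25, cx_a=8.0))
```

In this sketch an isolated proposal becomes a one-box text line; a real pipeline might filter such lines by score or length.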

Experimental results

Runtime: 0.14 s per image with a GPU
Detection results on the ICDAR2011, ICDAR2013, and ICDAR2015 datasets
