Scene Text Detection: a summary of paper ideas

Original post

2020-06-16 07:27

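Arbitrary-oriented scene text detection
Summary of paper ideas
Common thread: contributions that add a new branch stand out the most
Scene text detection
Segmentation-based detection methods
spcnet(mask_rcnn+tcm+rescore)
psenet(progressive scale expansion)
mask text spotter(newly added segmentation branch)
craft
incepText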

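Regression-based detection methods:
r2cnn(classification branch, horizontal-box branch, inclined-box branch)
rrpn(rotated rpn)
textboxes(ssd)
textboxes++
sstd(the precursor that tcm improves on)
rtn
ctpn(splits a text line into fine-scale slices)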

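Hybrid methods combining segmentation and regression:
spcnet
Uses mask_rcnn for instance segmentation and improves accuracy with a new module, tcm (which produces a global semantic segmentation map), plus rescore: each instance mask is projected onto the global semantic segmentation map and scored there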
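The rescore idea can be sketched roughly as follows. This is a minimal sketch, not SPCNet's actual formula: the function name `rescore`, the plain-average fusion and the toy 3x3 map are all assumptions made for illustration. It projects an instance mask onto the global text segmentation map, takes the mean response inside the mask as an instance score, and combines that with the classification score.

```python
def rescore(cls_score, instance_mask, semantic_map):
    """Fuse a classification score with the mean semantic response inside the mask.

    instance_mask: 2D list of 0/1, semantic_map: 2D list of text probabilities.
    The plain average used for fusion is an assumption, not the paper's formula.
    """
    values = [
        semantic_map[r][c]
        for r, row in enumerate(instance_mask)
        for c, inside in enumerate(row)
        if inside
    ]
    if not values:
        return 0.0  # an empty mask carries no evidence of text
    instance_score = sum(values) / len(values)
    return 0.5 * (cls_score + instance_score)


# Toy global segmentation map: text occupies the top-left 2x2 region.
semantic_map = [
    [0.9, 0.8, 0.1],
    [0.9, 0.8, 0.1],
    [0.1, 0.1, 0.1],
]
text_mask = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
background_mask = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]

text_score = rescore(0.6, text_mask, semantic_map)
false_positive_score = rescore(0.6, background_mask, semantic_map)
```

With the same classification score, the mask that lies on the text region ends up with the higher fused score, which is the effect the rescore step is after.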
pixel-anchor(deeplabv3+ssd):
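The segmentation part detects medium and large targets, while ssd detects small targets
east(deeplabv3)
af-rpn
Each sliding point inside the text core region directly predicts the offsets from itself to the vertices of the text bounding box
(uses ohem)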

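In the official FPN training setup, parameters are shared across the pyramid levels; sharing them or not has little effect on the results, which is explained by the feature pyramid making the different levels learn semantic features of a similar level
After FPN obtains proposals from each pyramid level, they are pooled together and run through nms
The scale is the same at every FPN pyramid level, because mapped onto feature maps of different resolutions it ends up detecting objects of different sizes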
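The "pool everything, then nms" step can be sketched as below. This is a minimal pure-Python sketch under assumptions: the function names, the (box, score) tuple layout and the 0.7 IoU threshold are chosen for illustration and are not taken from any FPN implementation.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_levels_and_nms(proposals_per_level, iou_threshold=0.7):
    """Pool (box, score) proposals from all levels, then run one greedy NMS."""
    pooled = [p for level in proposals_per_level for p in level]
    pooled.sort(key=lambda p: p[1], reverse=True)
    kept = []
    for box, score in pooled:
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


levels = [
    [((10, 10, 50, 30), 0.90)],                                # a finer level
    [((12, 11, 52, 31), 0.80), ((100, 100, 300, 200), 0.95)],  # a coarser level
]
kept = merge_levels_and_nms(levels)
```

Here the two near-duplicate boxes come from different levels, and only the higher-scoring one survives because nms runs once over the pooled set.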

***********************The text in parentheses after each paper name is its highlight********************

hybrid:---------------------------------------------------------------
1.af-rpn(af)
anchor-free
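Directly predicts the offsets from the center point to the four vertices of the box,
which avoids this situation (to achieve high recall, anchors of various scales and shapes should be designed to cover the scale and shape variabilities of objects)
scale-friendly
FPN detects large, medium and small targets separately (the implementation details differ from fpn)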
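The anchor-free prediction described above can be illustrated with a tiny decoding step. This is a minimal sketch, assuming raw pixel offsets with no normalization (the paper's exact parameterization may differ); `decode_quad` and the sample numbers are hypothetical.

```python
def decode_quad(point, offsets):
    """Turn a sliding point and four predicted (dx, dy) offsets into quad vertices."""
    px, py = point
    return [(px + dx, py + dy) for dx, dy in offsets]


# A point inside the text core region and its four predicted vertex offsets.
point = (100.0, 50.0)
offsets = [(-40.0, -10.0), (40.0, -12.0), (42.0, 10.0), (-38.0, 12.0)]
quad = decode_quad(point, offsets)
```

No anchor shapes appear anywhere in the decoding, which is the point of the anchor-free design: the quadrilateral comes directly from the point and its offsets.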

2.inceptext(inceptext)
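Overall it is fpn+inception_module+deformable_conv+deformable PSROI pooling
inception-text
Designs three convolution kernels (1*1, 3*3, 5*5), similar to inception, to detect large, medium and small targets,
and also adds deformable convolution to adjust the receptive field, focusing detection on the text and making it less constrained by orientation; two fused feature maps add multi-scale information.
deformable psroi pooling
(focuses detection on the text, less constrained by orientation)
Adds offsets to concentrate on information from the text region; tend to learn the context surrounding the text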
Each image is randomly cropped and scaled to have short edge of{640,800,960,1120}.
The anchor scales are {2,4,8,16}, and ratios are {0.2,0.5,2,5}.
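To make the anchor settings concrete, the 4 scales and 4 ratios can be expanded into 16 anchor shapes. This is a minimal sketch: the base size of 16 pixels and the convention ratio = height / width are assumptions, not details confirmed by the notes above.

```python
import math


def anchor_shapes(scales, ratios, base_size=16):
    """Return (width, height) per scale/ratio pair, with ratio = height / width.

    base_size and the ratio convention are assumptions for illustration.
    """
    shapes = []
    for scale in scales:
        area = float(base_size * scale) ** 2
        for ratio in ratios:
            width = math.sqrt(area / ratio)
            height = width * ratio
            shapes.append((width, height))
    return shapes


shapes = anchor_shapes([2, 4, 8, 16], [0.2, 0.5, 2, 5])
```

Ratios such as 0.2 and 5 give very elongated anchors, which suits long horizontal or vertical text lines.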

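3.rtn(no highlight)
A multi-scale feature, plus ctpn vertical boxes, plus regression-only prediction
hierarchical convolutional
Obtains stronger semantic features by fusing resnet block 4 and block 5
vertical proposal mechanism
Uses ctpn to obtain vertical boxes, with the aim of dropping proposal classification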

regression:---------------------------------------------------------------
1.ctpn
detecting text in fine-scale proposals
generate vertical proposals
recurrent connectionist text proposals
Connects the vertical proposals
side-refinement
For anchors at the left and right ends, predicts an adjustment to the boundary of the text line
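Connecting the vertical proposals into text lines can be sketched as below. This is a simplified greedy chaining, not the paper's exact pairing rule: the function names, the 50-pixel gap and the 0.7 vertical-overlap threshold are assumptions for illustration, and side-refinement is left out.

```python
def vertical_overlap(a, b):
    """Overlap of two boxes' vertical extents divided by the taller box's height."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    tallest = max(a[3] - a[1], b[3] - b[1])
    return max(0.0, inter) / tallest if tallest > 0 else 0.0


def connect_proposals(proposals, max_gap=50, min_overlap=0.7):
    """Greedily chain (x1, y1, x2, y2) slices into text-line boxes."""
    lines = []  # each entry is the list of slices belonging to one text line
    for box in sorted(proposals, key=lambda b: b[0]):
        for line in lines:
            last = line[-1]
            close = box[0] - last[2] <= max_gap
            if close and vertical_overlap(last, box) >= min_overlap:
                line.append(box)
                break
        else:
            lines.append([box])  # no existing line accepted this slice
    return [
        (min(b[0] for b in line), min(b[1] for b in line),
         max(b[2] for b in line), max(b[3] for b in line))
        for line in lines
    ]


slices = [(0, 10, 16, 40), (16, 11, 32, 41), (32, 9, 48, 39), (200, 100, 216, 130)]
text_lines = connect_proposals(slices)
```

The first three 16-pixel-wide slices merge into one text line, while the distant fourth slice starts a line of its own.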
2.textboxes
Uses ssd for scene text detection (multi-scale)
3.textboxes++
Its data augmentation approach, random crop, is worth borrowing
4.r2cnn(inclined box)
three ROIPoolings use different pooled sizes
anchor scales(4,8,16,32)
axis-aligned and inclined boxes are predicted together, with the axis-aligned box enclosing the inclined one
inclined NMS
compute convolutional feature maps on an image pyramid(not a main point)
augment ICDAR 2015
We rotate our image at the following angles (-90, -75, -60, -45, -30, -15, 0, 15, 30, 45, 60, 75, 90).
r2cnn's ablation experiment is worth borrowing
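The rotation augmentation and the enclosing relation between the two box types can be sketched on the label side. This is a minimal sketch under assumptions: only the box vertices are rotated (a real pipeline would rotate the image as well), the rotation follows the standard math convention, and the function names and sample quad are hypothetical.

```python
import math


def rotate_quad(quad, angle_deg, center):
    """Rotate quad vertices by angle_deg around center (standard math convention)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in quad:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return rotated


def enclosing_axis_aligned_box(quad):
    """Smallest axis-aligned (x1, y1, x2, y2) box that contains the inclined quad."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return (min(xs), min(ys), max(xs), max(ys))


ANGLES = (-90, -75, -60, -45, -30, -15, 0, 15, 30, 45, 60, 75, 90)
quad = [(40.0, 45.0), (60.0, 45.0), (60.0, 55.0), (40.0, 55.0)]
augmented = [rotate_quad(quad, a, center=(50.0, 50.0)) for a in ANGLES]
boxes = [enclosing_axis_aligned_box(q) for q in augmented]
```

Each rotated quad plays the role of the inclined box and the corresponding entry in `boxes` plays the role of the axis-aligned box that contains it.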
5.rrpn
rrpn
r-anchors(54,3*3*6),generate inclined proposals(representation,x,y,h,w,θ)
RROI pooling
skew NMS
image rotation strategy during data augmentation
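The count 54 = 3*3*6 for the r-anchors can be reproduced by enumerating scale, ratio and angle combinations. This is a minimal sketch: the specific scale, ratio and angle values, the base size of 16 and the function name `r_anchors` are assumptions chosen for illustration; only the 3*3*6 structure and the (x, y, h, w, θ) representation come from the notes above.

```python
import math
from itertools import product

# Assumed values for illustration: 3 scales, 3 aspect ratios, 6 orientations.
SCALES = (8, 16, 32)
RATIOS = (2, 5, 8)  # treated here as width / height
ANGLES = (-math.pi / 6, 0.0, math.pi / 6, math.pi / 3, math.pi / 2, 2 * math.pi / 3)


def r_anchors(x, y, base_size=16):
    """Enumerate rotated anchors (x, y, h, w, theta) centered at one location."""
    anchors = []
    for scale, ratio, theta in product(SCALES, RATIOS, ANGLES):
        area = float(base_size * scale) ** 2
        h = math.sqrt(area / ratio)
        w = h * ratio
        anchors.append((x, y, h, w, theta))
    return anchors


anchors = r_anchors(0.0, 0.0)
```

Adding the angle dimension multiplies the usual scale-by-ratio anchor set by six, which is where the 54 anchors per location come from.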
