Centerline Segmentation: The main purpose of centerline segmentation is to locate the centerline of the text. The core idea is to predict pixels near the text centerline as 1 and all remaining pixels as 0 (i.e., non-text regions). To address the imbalance between the number of centerline pixels and non-text pixels, the paper follows [40] and adopts **online hard example mining (OHEM)**.
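The OHEM idea above can be sketched for a pixel-wise binary loss: keep every positive (centerline) pixel, but keep only the hardest negatives, capped at a fixed ratio of the positive count. This is a minimal illustrative sketch, not the paper's exact implementation; the function name and the 3:1 negative ratio are assumptions (3:1 is the common choice in OHEM-style segmentation losses):

```python
import numpy as np

def ohem_bce_loss(probs, labels, neg_ratio=3):
    """Pixel-wise binary cross-entropy with online hard example mining.

    probs:  predicted centerline probabilities, shape (H, W)
    labels: ground truth, 1 = centerline pixel, 0 = background
    Keeps all positive pixels but only the `neg_ratio` * n_pos
    hardest (highest-loss) negative pixels, in the spirit of [40].
    """
    eps = 1e-7
    probs = np.clip(probs, eps, 1 - eps)
    # per-pixel binary cross-entropy
    loss = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

    pos_mask = labels == 1
    n_pos = int(pos_mask.sum())
    neg_losses = loss[~pos_mask]
    n_neg = min(len(neg_losses), max(neg_ratio * n_pos, 1))
    # hardest negatives = the largest per-pixel losses
    hard_neg = np.sort(neg_losses)[::-1][:n_neg]

    return (loss[pos_mask].sum() + hard_neg.sum()) / (n_pos + n_neg)
```

Averaging only over the selected pixels keeps easy background pixels from drowning out the rare centerline pixels in the gradient.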
End-to-end vs. non-end-to-end: Figure 6 shows that end-to-end training improves the detection rate of inconspicuous text. RoISlide vs. RoIRotate: Tables 2 and 3 and Figure 6(c, d) show that RoIRotate [29] is not suited to curved text detection, while RoISlide and RoIRotate perform similarly on regular text. Spotting with vs. without LSTM: the CNN-based text recognizer is about 4× faster than the LSTM-based one.
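Both recognizer variants compared above are trained with a CTC loss [9], whose output is read off by collapsing repeated labels and removing blanks. A minimal sketch of greedy CTC decoding, with an illustrative function name (not code from the paper):

```python
def ctc_greedy_decode(logits, blank=0):
    """Greedy CTC decoding [9]: take the argmax class at each time
    step, collapse consecutive repeats, then drop blank symbols.

    logits: list of per-frame score lists, shape (T, num_classes)
    Returns the decoded label sequence (class indices).
    """
    # best class per time step
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded, prev = [], None
    for p in path:
        # emit only on a change of label, and never emit the blank
        if p != prev and p != blank:
            decoded.append(p)
        prev = p
    return decoded
```

For example, a frame-wise best path `[1, 1, blank, 2]` collapses to the sequence `[1, 2]`.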
References
Selected references from the original paper that are cited in this post:
[32] Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, and Cong Yao. Textsnake: A flexible representation for detecting text of arbitrary shapes. In European Conference on Computer Vision (ECCV), pages 19–35. Springer, 2018.
[31] Yuliang Liu, Lianwen Jin, Shuaitao Zhang, and Sheng Zhang. Detecting curve text in the wild: New dataset and new solution. arXiv preprint arXiv:1712.02170, 2017.
[46] Xiaobing Wang, Yingying Jiang, Zhenbo Luo, Cheng-Lin Liu, Hyunsoo Choi, and Sungjin Kim. Arbitrary shape scene text detection with adaptive text region representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[39] Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. Robust scene text recognition with automatic rectification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4168–4176, 2016.
[28] Wei Liu, Chaofeng Chen, Kwan-Yee K Wong, Zhizhong Su, and Junyu Han. Star-net: A spatial attention residue network for scene text recognition. In BMVC, volume 2, page 7, 2016.
[5] Zhanzhan Cheng, Xuyang Liu, Fan Bai, Yi Niu, Shiliang Pu, and Shuigeng Zhou. Arbitrarily-oriented text recognition. arXiv preprint arXiv:1711.04226, 2017.
[25] Hui Li, Peng Wang, and Chunhua Shen. Towards end-to-end text spotting with convolutional recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 5238–5246, 2017.
[29] Xuebo Liu, Ding Liang, Shi Yan, Dagui Chen, Yu Qiao, and Junjie Yan. Fots: Fast oriented text spotting with a unified network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5676–5685, 2018.
[35] Yash Patel, Michal Bušta, and Jiri Matas. E2e-mlt - an unconstrained end-to-end method for multi-language scene text. arXiv preprint arXiv:1801.09919, 2018.
[40] Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 761–769, 2016.
[36] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS), pages 91–99, 2015.
[45] Tao Wang, David J Wu, Adam Coates, and Andrew Y Ng. End-to-end text recognition with convolutional neural networks. In Proceedings of the International Conference on Pattern Recognition (ICPR), pages 3304–3308. IEEE, 2012.
[9] Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the International Conference on Machine Learning (ICML), pages 369–376. ACM, 2006.
[11] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2980–2988. IEEE, 2017.
[33] Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, and Xiang Bai. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In European Conference on Computer Vision (ECCV), September 2018.