Character Filter

Reposted from: http://blog.csdn.net/PeaceInMind/article/details/50003319

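Introduction

The previous chapter showed how to extract character proposals from an image. In practice two unwanted things almost always happen. First, some characters are not extracted at all (false negatives); this can be reduced by extracting MSERs from several channels, such as the gradient magnitude or other color channels. Second, many of the extracted regions are not real characters (false positives), and these have to be removed by a filtering algorithm. This section focuses on the second problem: how to filter out false positives. It covers the MSER tree filtering of RST [2], the feature classification filtering of EST [3], and CNN filtering. CNNs are not used much for this in the papers, but I ran a few experiments of my own and include them for comparison. Corrections of any errors or omissions are welcome.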

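1 MSER Tree Filtering

The core idea of MSER tree filtering is that a character cannot contain another character. Each MSER is a candidate character, and "contains" corresponds to the parent-child relation in the MSER tree. So if an MSER node is a character, none of its children, grandchildren, or great-grandchildren can be a character, and neither can its parent or grandparent. This lets us discard many MSERs that are not characters. How do we decide whether an MSER is a character? The paper does not use a feature classifier here but a shortcut: we do not need to decide whether each MSER is a character, only which nodes in a tree look most like one. Character-likeness is scored by the MSER variation after a correction for aspect ratio, using the formula in [2]: the smaller the corrected variation, the more likely the region is a character. The paper does not say how the parameters of the formula were obtained; presumably they came from repeated experiments or a search.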

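Next we have to work out which nodes look most like characters, that is, extract the "disconnected MSERs". This is not as simple as it sounds. Comparing a parent with its children is not enough; grandchildren, great-grandchildren, grandparents, and great-grandparents have to be compared too. A tree also need not contain just one character: there can be several disconnected MSERs. For example, when a parent has several children and the parent's variation is larger, we select the children instead. A typical case is a large box containing several characters. Two algorithms handle this, Linear Reduction and Tree Accumulation, run in that order.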

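Linear Reduction handles MSERs that have exactly one child. If the current node looks more like a character than the best candidate below it, we drop the child and link the grandchildren directly to the current node; otherwise we keep the candidate from below and drop the current node. If a node has several children, we apply Linear Reduction to each of them. After this step no MSER node has exactly one child. The pseudocode is given in [2].

My own OpenCV code is below (for a clearer demonstration I renamed some members of the contour structure).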


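    #include <vector>

    // CvContour is assumed to carry two extra members (renamed for this demo):
    //   childNum  : number of direct children
    //   variation : aspect-ratio-corrected variation (smaller = more character-like)
    CvContour* LinearReduction(CvContour* root)
    {
        switch (root->childNum)
        {
        case 0:
            // Leaf: nothing to reduce.
            return root;

        case 1:
            {
                // Reduce the chain below root first.
                CvContour* c = LinearReduction((CvContour*)(root->v_next));
                if (c->variation < root->variation)
                {
                    // The descendant is more character-like; root is dropped
                    // and the caller re-links c into the tree.
                    return c;
                }
                // root wins: drop c and link c's children directly to root.
                // root->childNum is stale afterwards; TreeAccumulation recomputes it.
                CvSeq* cc = c->v_next;
                root->v_next = cc;
                while (cc != NULL)
                {
                    cc->v_prev = (CvSeq*)root;
                    cc = cc->h_next;
                }
                return root;
            }

        default:
            {
                // Two or more children: reduce each subtree independently.
                CvSeq* c = root->v_next;
                std::vector<CvContour*> children;
                while (c != NULL)
                {
                    CvContour* tmp = LinearReduction((CvContour*)c);
                    children.push_back(tmp);
                    tmp->v_prev = (CvSeq*)root; // reset parent
                    c = c->h_next;
                }

                // Rebuild the sibling list from the reduced children.
                root->v_next = (CvSeq*)(children[0]);
                for (size_t i = 0; i + 1 < children.size(); ++i)
                {
                    children[i]->h_next = (CvSeq*)(children[i + 1]);
                    children[i + 1]->h_prev = (CvSeq*)(children[i]);
                }
                return root;
            }
        }
    }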
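To make the pointer handling above easier to follow, here is a minimal, self-contained sketch of the same linear reduction idea on a plain tree. `Node`, `makeNode`, and `linearReduction` are hypothetical names for illustration only; they are not part of OpenCV or of the code accompanying [2], and the `std::unique_ptr` ownership model is my own simplification.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an MSER tree node (not the CvContour layout).
struct Node {
    double variation = 0.0;                       // smaller = more character-like
    std::vector<std::unique_ptr<Node>> children;  // regions nested inside this one
};

std::unique_ptr<Node> makeNode(double variation) {
    auto n = std::make_unique<Node>();
    n->variation = variation;
    return n;
}

// Collapses every single-child chain to its most character-like node.
// In the returned tree no node has exactly one child.
std::unique_ptr<Node> linearReduction(std::unique_ptr<Node> root) {
    assert(root != nullptr);
    if (root->children.empty()) {
        return root;  // leaf: nothing to reduce
    }
    if (root->children.size() == 1) {
        std::unique_ptr<Node> c = linearReduction(std::move(root->children[0]));
        if (c->variation < root->variation) {
            return c;  // descendant wins, root is discarded
        }
        // root wins: adopt c's children, c is discarded when it goes out of scope
        root->children = std::move(c->children);
        return root;
    }
    // Two or more children: reduce each subtree in place.
    for (auto& child : root->children) {
        child = linearReduction(std::move(child));
    }
    return root;
}
```

For a chain with variations 0.9 -> 0.2 -> 0.5, the call returns the middle node (0.2) with no children: it beats both its parent and its only child.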

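Tree Accumulation handles nodes with more than one child. After Linear Reduction, every node has either zero children or at least two. For a node with two or more children, we collect the character-like nodes from its subtrees and check whether any of them is more character-like than the current node. If so, we keep the nodes collected from the subtrees; otherwise we keep the current node. When this finishes we have the disconnected MSERs. The pseudocode is given in [2]; my C++ code is below, again with some contour member names changed for readability.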


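    #include <cassert>
    #include <cmath>
    #include <vector>

    // Returns the disconnected MSERs in the subtree rooted at root.
    // Must run after LinearReduction, so that no node has exactly one child.
    std::vector<CvContour*> RobustSceneText::TreeAccumulation(CvContour* root)
    {
        // childNum is stale after linear reduction, so recompute it
        // (CalcChildNum is a helper that is not shown here).
        std::vector<CvContour*> vTmp;
        vTmp.push_back(root);
        CalcChildNum(vTmp);
        assert(root->childNum != 1);

        std::vector<CvContour*> result;
        if (root->childNum >= 2)
        {
            // Collect the best candidates from every child subtree.
            CvContour* c = (CvContour*)(root->v_next);
            while (c != NULL)
            {
                std::vector<CvContour*> tmp = TreeAccumulation(c);
                result.insert(result.end(), tmp.begin(), tmp.end());
                c = (CvContour*)c->h_next;
            }
            // If any candidate is more character-like than root, keep the candidates.
            for (size_t i = 0; i < result.size(); ++i)
            {
                if (std::abs(result[i]->variation) < std::abs(root->variation))
                {
                    return result;
                }
            }
            // Otherwise root is the most character-like node of this subtree.
            result.clear();
            result.push_back(root);
            return result;
        }
        // Leaf: the node itself is the only candidate.
        result.push_back(root);
        return result;
    }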
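The same accumulation rule can be tried on a toy tree without OpenCV. This is a sketch under assumptions: `Node` and `treeAccumulation` are hypothetical names, the tree is stored by value, and the input is assumed to have already gone through linear reduction (the assertion checks this).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an MSER tree node (not the CvContour layout).
struct Node {
    double variation = 0.0;      // smaller = more character-like
    std::vector<Node> children;  // regions nested inside this one
};

// Returns pointers to the disconnected nodes chosen from the subtree at root.
// The pointers refer into the tree, so the tree must outlive the result.
std::vector<const Node*> treeAccumulation(const Node& root) {
    assert(root.children.size() != 1);  // linear reduction must have run first
    if (root.children.empty()) {
        return {&root};  // leaf: the node is its own best candidate
    }
    // Gather the candidates chosen inside every child subtree.
    std::vector<const Node*> found;
    for (const Node& child : root.children) {
        std::vector<const Node*> sub = treeAccumulation(child);
        found.insert(found.end(), sub.begin(), sub.end());
    }
    // Keep the candidates if at least one of them beats root.
    for (const Node* n : found) {
        if (n->variation < root.variation) {
            return found;
        }
    }
    return {&root};  // root is more character-like than everything below it
}
```

With a root of variation 0.5 and two leaf children of 0.2 and 0.7, both children are returned, because one of them beats the root. If the root's variation were 0.1 instead, only the root would be returned. This matches the "large box containing several characters" case described earlier.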

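For the comparison figure, the MSERs were extracted from the H and V channels of HSV, following [3].

2 Feature Classification Filtering

Feature classification filtering uses hand-designed features such as stroke width, stroke variance, aspect ratio, and hull ratio, and feeds them to a classifier that tells us which regions are characters and which are not. Keep in mind that the choice of features and classifier affects which regions get misclassified. Here I mainly cover stroke-related features: the SWT (stroke width transform) of [1] and the stroke support pixels (SSPs) of [3].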

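Stroke width first appeared in [1], and the authors patented it. Algorithms are usually hard to patent, which says something about its originality. When you write with a gel pen, the width of each stroke stays within a narrow range that depends on the ball tip of the pen. For example, the 'h' in the example figure has a stroke width of about 2.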

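The algorithm in [1] is fairly involved. I have implemented it in C++, but here I present a simpler approximation that is easier to explain and quick to implement with OpenCV. The steps are:

(1) First, build a binary image from the MSER or connected component, for example with background 0 and foreground 255.

(2) Next, for every foreground pixel, compute the distance to its nearest 0-valued pixel, using the same parameters as [3]. This gives the distance map.

(3) Then extract the skeleton of the character. I used the Guo-Hall thinning algorithm.

(4) Finally, take the mean of the distance values at the skeleton pixels; this is the stroke width. The stroke width variance can also be used for filtering. For the example image, the stroke mean and variance come out as 2.27 and 0.52.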
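Steps (2) and (4) can be sketched without OpenCV on a small grid. This is only an illustration under assumptions: it uses a two-pass city-block (L1) distance, which is not necessarily the metric used in [3]; it takes the skeleton as an input instead of running a thinning algorithm; and `Grid`, `distanceMap`, `StrokeStats`, and `strokeStats` are hypothetical names. With OpenCV one would more likely call `cv::distanceTransform` for step (2) and, if opencv_contrib is available, `cv::ximgproc::thinning` for step (3).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // rectangular image, row-major

// Step (2): L1 distance from each foreground pixel (non-zero) to the nearest
// background pixel (zero). Pixels outside the image are not treated as background.
Grid distanceMap(const Grid& binary) {
    const int rows = static_cast<int>(binary.size());
    const int cols = rows > 0 ? static_cast<int>(binary[0].size()) : 0;
    const int kFar = rows + cols;  // exceeds any L1 distance inside the image
    Grid dist(rows, std::vector<int>(cols, 0));
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (binary[r][c] != 0) dist[r][c] = kFar;
    // Forward pass: propagate distances from the top and left neighbors.
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (r > 0) dist[r][c] = std::min(dist[r][c], dist[r - 1][c] + 1);
            if (c > 0) dist[r][c] = std::min(dist[r][c], dist[r][c - 1] + 1);
        }
    // Backward pass: propagate distances from the bottom and right neighbors.
    for (int r = rows - 1; r >= 0; --r)
        for (int c = cols - 1; c >= 0; --c) {
            if (r + 1 < rows) dist[r][c] = std::min(dist[r][c], dist[r + 1][c] + 1);
            if (c + 1 < cols) dist[r][c] = std::min(dist[r][c], dist[r][c + 1] + 1);
        }
    return dist;
}

struct StrokeStats {
    double mean;      // mean distance over the skeleton pixels
    double variance;  // population variance of those distances
};

// Step (4): statistics of the distance map sampled at the skeleton pixels.
// The skeleton mask (non-zero = skeleton) is assumed to come from step (3).
StrokeStats strokeStats(const Grid& dist, const Grid& skeleton) {
    std::vector<int> samples;
    for (std::size_t r = 0; r < dist.size(); ++r)
        for (std::size_t c = 0; c < dist[r].size(); ++c)
            if (skeleton[r][c] != 0) samples.push_back(dist[r][c]);
    assert(!samples.empty());  // an empty skeleton has no stroke width
    double sum = 0.0;
    for (int v : samples) sum += v;
    const double mean = sum / samples.size();
    double squared = 0.0;
    for (int v : samples) squared += (v - mean) * (v - mean);
    return {mean, squared / samples.size()};
}
```

For a horizontal bar that is 3 pixels high, the center row gets distance 2 away from the bar ends, so a skeleton along that row yields a mean of 2 and a variance of 0. A non-character blob with irregular thickness would show a noticeably larger variance, which is what makes the feature useful for filtering.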

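I find stroke features very useful; they separate many characters from non-characters. [3] modifies the stroke mean and variance features and proposes stroke support pixels (SSPs) and the stroke area ratio. SSPs differ from the steps above in that the skeleton is no longer extracted. Instead, the local maxima of the distance map from step (2) are used, shown as red points in the figure (the paper uses a 3*3 neighborhood). These points are the SSPs.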
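A minimal sketch of SSP detection as described above, taking a distance map as input. It rests on an assumption of mine: a pixel counts as a local maximum when no pixel in its 3*3 neighborhood has a larger distance, so ties are allowed and the plateau along the center of a stroke is kept. The paper may define the maxima differently. `Grid` and `findSSPs` are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // distance map, 0 = background

// Returns the (row, col) positions, in row-major order, of every foreground
// pixel whose distance value is not exceeded by any neighbor in its 3x3 window.
std::vector<std::pair<int, int>> findSSPs(const Grid& dist) {
    const int rows = static_cast<int>(dist.size());
    const int cols = rows > 0 ? static_cast<int>(dist[0].size()) : 0;
    for (const auto& row : dist) {
        assert(static_cast<int>(row.size()) == cols);  // rectangular input expected
    }
    std::vector<std::pair<int, int>> ssps;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (dist[r][c] <= 0) continue;  // skip background pixels
            bool isMax = true;
            for (int dr = -1; dr <= 1 && isMax; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int nr = r + dr;
                    const int nc = c + dc;
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    if (dist[nr][nc] > dist[r][c]) {
                        isMax = false;  // a neighbor is farther from the background
                        break;
                    }
                }
            }
            if (isMax) ssps.emplace_back(r, c);
        }
    }
    return ssps;
}
```

On the distance map of a 3-pixel-high horizontal bar, the only pixels returned are those on the center row away from the bar ends, which matches the observation that SSPs sit in the middle of a stroke.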

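SSPs generally lie along the middle of a stroke. They are then used to estimate the stroke area ratio of the whole character with the formula in [3], where Ni is the number of SSPs in the 3*3 neighborhood. When the strokes are neat, this value is usually close to 1 (a special case is a stroke width of 1, where the value is far greater than 1). Following the paper's formula, strokeAreaRatio comes out at about 0.88.

The classifier I trained following [3] is not especially accurate, at around 85%-90%. This may be due to how I trained it or to a bug in my program.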

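3 CNN Character Filtering

Few papers currently use a CNN for filtering. A CNN is simple to use because no features have to be designed by hand, but it is demanding on hardware; on weak hardware it may be too slow. One advantage of a CNN is that it learns from your dataset to separate characters from non-characters at a higher level. Taking English as an example, I can design many shapes that are not characters but are hard to tell apart with hand-crafted features, such as the one in the figure below.

A CNN, however, judges to some extent whether the image looks like one of the 26 letters, so it can filter out more non-characters. I ran some experiments, shown in the figure below; note that they use the gradient magnitude channel and apply MSER tree filtering first. The downside is that a CNN trained this way does not apply to every script. In my experience, though, that situation is not common in practice.

That concludes this section. Corrections of any errors or omissions are welcome.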

Previous post: Text Detection and Recognition 1 - MSER

Next post: Text Detection and Recognition 3 - Character Merging

[1] Epshtein B, Ofek E, Wexler Y. Detecting text in natural scenes with stroke width transform[C]//Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010: 2963-2970.

[2] Yin X C, Yin X, Huang K, et al. Robust text detection in natural scene images[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2014, 36(5): 970-983.

[3] Neumann L, Matas J. Efficient Scene Text Localization and Recognition with Local Character Refinement[J]. arXiv preprint arXiv:1504.03522, 2015.

[4] Zhu Y, Yao C, Bai X. Scene text detection and recognition: Recent advances and future trends[J]. Frontiers of Computer Science, 2015.

[5] Zhang Z, Shen W, Yao C, et al. Symmetry-Based Text Line Detection in Natural Scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2558-2567.

[6] Huang W, Qiao Y, Tang X. Robust scene text detection with convolution neural network induced MSER trees[M]//Computer Vision–ECCV 2014. Springer International Publishing, 2014: 497-511.

[7] Sun L, Huo Q, Jia W, et al. Robust Text Detection in Natural Scene Images by Generalized Color-Enhanced Contrasting Extremal Region and Neural Networks[C]//Pattern Recognition (ICPR), 2014 22nd International Conference on. IEEE, 2014: 2715-2720.

[8] Jaderberg M, Simonyan K, Vedaldi A, et al. Reading text in the wild with convolutional neural networks[J]. International Journal of Computer Vision, 2014: 1-20.

[9] Jaderberg M, Vedaldi A, Zisserman A. Deep features for text spotting[M]//Computer Vision–ECCV 2014. Springer International Publishing, 2014: 512-528.

[10] Gomez L, Karatzas D. A fast hierarchical method for multi-script and arbitrary oriented scene text extraction[J]. arXiv preprint arXiv:1407.7504, 2014.

[11] Coates A, Carpenter B, Case C, et al. Text detection and character recognition in scene images with unsupervised feature learning[C]//Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011: 440-445.

[12] Neumann L, Matas J. Real-time scene text localization and recognition[C]//Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012: 3538-3545.

[13] Shi B, Yao C, Zhang C, et al. Automatic Script Identification in the Wild[J]. arXiv preprint arXiv:1505.02982, 2015.

[14] Wang T, Wu D J, Coates A, et al. End-to-end text recognition with convolutional neural networks[C]//Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012: 3304-3308.
