C++ Implementation of DPM/LatentSVM, Full Code Download --- Part 4

    The purpose of this post is to explain the workflow of FastDPM.


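    Some readers who are interested in the FastDPM code I published (see my other posts) and want to read it have emailed me to ask how it works. I answered those emails individually, and I am posting one such exchange here.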

    The sender appears to be from abroad, and the email was written in English.

1. The inquiry email from Maxwell:

<span style="font-family:Arial;font-size:18px;">Hi yuxianguo,

My name is Maxwell i appreciate a lot the tool that you implemented
"Fast DPM" Now i'm working in person identification and the latentsvm
algorithm is one of the stuff that i should deal, i don't want your
source code but if you can give some orientation in terms of workflow
or specifications to implement my own latentSVM in C/C++

I'm student in Computer science: Computer Vision

Cordially
Maxwell
skype: ***********</span>

2. My reply:

<span style="font-family:Arial;font-size:18px;">Hi Maxwell,

Well, it has been a long time since I finished my Fast-DPM. Now I can only tell you what I still remember, which may be right or wrong.
The DPM works well, especially for pedestrian detection. It uses a root template and several part templates to detect the whole object and the object parts simultaneously. Cues from the root response and the part responses are aggregated to give the final proposals. The workflow of the detection algorithm is like this:
(1) Features: It calculates a feature pyramid of the image.
(2) Filtering: It slides all templates (root & parts) over the feature pyramid to get the response of every template at every position. This is the most time-consuming part and takes more than 90% of the total time.
(3) Integration: It integrates the root response and the part responses using two rules: the deformation rule and the structural rule. The grammar model makes this look difficult, but it is actually very simple. The structural rule says that if there is an object at (x,y), then there should be a part at (x+delta_x,y+delta_y), so we add the root response at (x,y) and the corresponding part responses. In DPM training, it is assumed that object parts are better detected at twice the resolution of the object. The deformation rule then says that if a part should be at (x+delta_x,y+delta_y), we had better also try the neighboring positions of (x+delta_x,y+delta_y), because the object may deform.
(4) NMS: After integration, we get a final score map in which each score represents the likelihood of an object being located at that position. In practice, we get a pyramid of score maps in order to search over different scales. Non-maximum suppression is used to select proposals from the score maps.
(5) Note that, to capture different poses of objects, the model training stage splits the samples into several sub-categories, each representing one pose. A DPM model is then trained for each sub-category. Thus we have several DPMs in a model file, each called a component model. In the detection procedure, all component models are used to find object proposals separately, and for every position the object proposal (score) is selected as the maximum across all components.

--
YU</span>
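
    Step (3) of the reply can be sketched in C++ roughly as follows. This is a minimal illustration for one root and one part, not code taken from FastDPM. The `ScoreMap` type, the function names, the search `radius`, and the purely quadratic `penalty` are all assumptions made for the example; the brute-force window search is chosen for readability, not speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Dense 2-D response map stored row-major (hypothetical helper type).
struct ScoreMap {
    int rows = 0, cols = 0;
    std::vector<float> data;
    ScoreMap(int r, int c, float fill = 0.f)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, fill) {}
    float& at(int y, int x) { return data[static_cast<std::size_t>(y) * cols + x]; }
    float at(int y, int x) const { return data[static_cast<std::size_t>(y) * cols + x]; }
};

// Deformation rule: for each ideal part position, take the best response in a
// (2*radius+1)^2 neighborhood, minus a quadratic displacement penalty
// (the penalty form is an assumption of this sketch).
ScoreMap applyDeformation(const ScoreMap& part, int radius, float penalty) {
    assert(radius >= 0 && penalty >= 0.f);
    ScoreMap out(part.rows, part.cols, std::numeric_limits<float>::lowest());
    for (int y = 0; y < part.rows; ++y)
        for (int x = 0; x < part.cols; ++x)
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= part.rows || xx < 0 || xx >= part.cols) continue;
                    float s = part.at(yy, xx) - penalty * static_cast<float>(dx * dx + dy * dy);
                    out.at(y, x) = std::max(out.at(y, x), s);
                }
    return out;
}

// Structural rule: a root at (x, y) expects its part at (2x + ax, 2y + ay)
// in the part map, which lives at twice the root resolution.
ScoreMap addPartToRoot(const ScoreMap& root, const ScoreMap& deformedPart, int ax, int ay) {
    ScoreMap out = root;
    for (int y = 0; y < root.rows; ++y)
        for (int x = 0; x < root.cols; ++x) {
            int py = 2 * y + ay, px = 2 * x + ax;
            bool inside = py >= 0 && py < deformedPart.rows &&
                          px >= 0 && px < deformedPart.cols;
            // Simplification: a root whose part anchor falls outside the map is rejected.
            out.at(y, x) = inside ? root.at(y, x) + deformedPart.at(py, px)
                                  : std::numeric_limits<float>::lowest();
        }
    return out;
}
```

    With several parts, `addPartToRoot` would simply be applied once per part, each with its own anchor `(ax, ay)`, so that the final map holds the root response plus all deformed part responses.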

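    The workflow of DPM, or of my implementation, is roughly as described in the email; read it a few times and it will seem simpler and simpler. The essence of DPM lies in using LatentSVM during training to mine the best part representation; the detection part has nothing fancy.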
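
    To show how plain the detection side is, steps (4) and (5) of the email can be sketched in C++ as below. Again this is only an illustration under stated assumptions, not the FastDPM code: the `Detection` type and function names are made up for the example, score maps are flattened into `std::vector<float>`, and intersection-over-union is assumed as the overlap measure (other implementations may measure overlap differently).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One candidate detection: axis-aligned box plus its score (hypothetical type).
struct Detection {
    float x1, y1, x2, y2;
    float score;
};

// Step (5): per-position maximum across component score maps of identical size.
std::vector<float> maxAcrossComponents(const std::vector<std::vector<float>>& maps) {
    assert(!maps.empty());
    std::vector<float> best = maps.front();
    for (const auto& m : maps) {
        assert(m.size() == best.size());
        for (std::size_t i = 0; i < m.size(); ++i) best[i] = std::max(best[i], m[i]);
    }
    return best;
}

// Intersection-over-union of two boxes (the overlap measure assumed here).
float iou(const Detection& a, const Detection& b) {
    float w = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    float h = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (w <= 0.f || h <= 0.f) return 0.f;
    float inter = w * h;
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

// Step (4): greedy NMS. Visit detections from highest to lowest score and keep
// one only if it overlaps no already kept detection by more than maxOverlap.
std::vector<Detection> nonMaxSuppression(std::vector<Detection> dets, float maxOverlap) {
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool suppressed = false;
        for (const Detection& k : kept)
            if (iou(d, k) > maxOverlap) { suppressed = true; break; }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

    In a full detector, the boxes passed to `nonMaxSuppression` would come from thresholding the per-scale score maps and mapping each surviving position back to image coordinates.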

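     By the way, some readers send me emails that look like QQ chat messages, with no salutation, no signature, and no paragraph breaks, which comes across as unfriendly. Anyone unsure how to write an email can learn from Maxwell above; at the very least there should be a salutation.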

     That's all...
