

Drawing on the papers I read this week, here is my understanding of the topic I studied: Histogram of Oriented Gradients (HOG).

HOG descriptors are feature descriptors used in computer vision and image processing for object detection. The technique computes statistics of the orientation of local image gradients. It has much in common with edge orientation histograms, scale-invariant feature transform (SIFT) descriptors, and shape contexts, but differs in that HOG descriptors are computed on a dense grid of uniformly spaced cells and use overlapping local contrast normalization to improve accuracy.

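The authors of the paper, Navneet Dalal and Bill Triggs, were researchers at the French National Institute for Research in Computer Science and Control (INRIA). They first described the HOG method in this paper, which was published at CVPR 2005. They applied it mainly to pedestrian detection in static images, and later extended it to detecting people in film and video, as well as vehicles and common animals in static images.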

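The central idea of the HOG descriptor is that the appearance and shape of a local object in an image can be described well by the distribution of gradient (or edge) orientations. In practice, the image is first divided into small connected regions called cells. For each cell, a histogram of the gradient or edge orientations of its pixels is accumulated. Concatenating these histograms gives the descriptor. To improve accuracy, the local histograms can be contrast-normalized over a larger region of the image, called a block: a measure of the intensity across the block is computed, and that value is used to normalize every cell in the block. After this normalization the descriptor is more robust to changes in illumination and to shadowing.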

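Compared with other feature descriptors, HOG has several advantages. First, because it operates on local cells, it is largely invariant to geometric and photometric transformations of the image; such transformations only show up over larger spatial regions. Second, the authors found experimentally that with coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, pedestrians can make small limb movements without affecting detection, as long as they stay roughly upright. For these reasons HOG is particularly well suited to pedestrian detection in images.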


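The figure from the paper shows the authors' pedestrian detection experiment: (a) the average gradient across their training images; (b) and (c) the maximum positive and negative SVM weights in each block, respectively; (d) a test image; (e) the R-HOG descriptor computed for the test image; (f) and (g) the R-HOG descriptor weighted by the positive and negative SVM weights, respectively.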


Color and gamma normalization


Gradient computation

The most common method is simply to apply a 1-D centered point discrete derivative mask in the horizontal direction, the vertical direction, or both. More precisely, the color or intensity data of the image is filtered with the kernels [-1, 0, 1] and [-1, 0, 1]^T.


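The authors also tried more complex masks, such as the 3×3 Sobel mask and diagonal masks, but these performed worse in the pedestrian detection experiments, so their conclusion was that the simplest mask works best. They also tried Gaussian smoothing before applying the derivative mask, which likewise made detection worse: much of the useful image information comes from sharp edges, and smoothing before the gradient computation blurs those edges away.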
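As a rough illustration of the gradient step, here is a minimal C++ sketch that applies the [-1, 0, 1] mask along x and y at one pixel of a grayscale image and returns the magnitude and the unsigned orientation. The `Gradient` struct, the `gradient_at` function, and the clamping at the image border are assumptions of this sketch, not part of the paper or of OpenCV.

```cpp
#include <cmath>
#include <vector>

// Gradient at one pixel: magnitude and unsigned orientation in degrees, [0, 180).
struct Gradient {
    double magnitude;
    double orientation_deg;
};

// img is a row-major grayscale image of size width x height.
// Coordinates outside the image are clamped to the border (an assumption of this sketch).
Gradient gradient_at(const std::vector<double>& img, int width, int height, int x, int y) {
    auto at = [&](int xx, int yy) -> double {
        xx = xx < 0 ? 0 : (xx >= width ? width - 1 : xx);
        yy = yy < 0 ? 0 : (yy >= height ? height - 1 : yy);
        return img[yy * width + xx];
    };
    // [-1, 0, 1] mask along x and along y, with no smoothing beforehand.
    const double dx = at(x + 1, y) - at(x - 1, y);
    const double dy = at(x, y + 1) - at(x, y - 1);

    const double kPi = 3.14159265358979323846;
    double angle = std::atan2(dy, dx) * 180.0 / kPi;  // in [-180, 180]
    if (angle < 0.0) angle += 180.0;                  // fold to an unsigned orientation
    if (angle >= 180.0) angle -= 180.0;

    Gradient g = {std::hypot(dx, dy), angle};
    return g;
}
```

For a colour image, the paper computes the gradient per channel and keeps the one with the largest magnitude; this sketch handles a single channel only.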

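Creating the orientation histograms

The third step builds a gradient orientation histogram for every cell. Each pixel in the cell casts a weighted vote for an orientation-based histogram channel, with the weight computed from the gradient magnitude at that pixel. The weight can be the magnitude itself or a function of it, such as the square root, the square, or a clipped version of the magnitude; in the authors' tests the magnitude itself gave the best results. Cells can be rectangular or radial. The histogram channels are spread evenly over 0°–180° (unsigned gradient) or 0°–360° (signed gradient). The authors found that unsigned gradients with 9 histogram channels performed best in the pedestrian detection experiments.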
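The voting step can be sketched as follows. This is a simplified C++ illustration, assuming unsigned orientations in degrees and hard assignment of each pixel to a single bin; the paper's detector interpolates votes between neighbouring bins, which is left out here. The function name `cell_histogram` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Accumulate an unsigned orientation histogram for one cell.
// magnitudes[i] and orientations_deg[i] (expected in [0, 180)) describe pixel i of the cell.
// Each pixel votes into exactly one bin (no interpolation, a simplification).
std::vector<double> cell_histogram(const std::vector<double>& magnitudes,
                                   const std::vector<double>& orientations_deg,
                                   int nbins = 9) {
    std::vector<double> hist(nbins, 0.0);
    const double bin_width = 180.0 / nbins;
    for (std::size_t i = 0; i < magnitudes.size() && i < orientations_deg.size(); ++i) {
        int bin = static_cast<int>(orientations_deg[i] / bin_width);
        if (bin < 0) bin = 0;
        if (bin >= nbins) bin = nbins - 1;
        hist[bin] += magnitudes[i];  // the vote is weighted by the gradient magnitude
    }
    return hist;
}
```

With 9 bins each bin spans 20°, so a pixel with orientation 170° and magnitude 0.5 adds 0.5 to the last bin.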

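Grouping the cells together into larger blocks

Because of local variations in illumination and in foreground-background contrast, gradient strengths vary over a very wide range, so they must be normalized. The authors do this by grouping cells into larger, spatially connected blocks. The HOG descriptor is then the vector of all the components of the cell histograms from all the blocks. The blocks overlap, so each cell contributes to the final descriptor more than once. There are two main block geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG). R-HOG blocks are generally square grids described by three parameters: the number of cells per block, the number of pixels per cell, and the number of channels per cell histogram. In the authors' experiments the best settings for pedestrian detection were 3×3 cells per block, 6×6 pixels per cell, and 9 histogram channels. They also found that it helps to apply a Gaussian spatial window to each block before accumulating the histogram votes, because this downweights the pixels near the edge of the block.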

R-HOG blocks look quite similar to SIFT descriptors, but they differ: R-HOG blocks are computed in dense grids at a single scale without orientation alignment, whereas SIFT descriptors are computed at sparse, scale-invariant key image points and are rotated to align orientation. In addition, R-HOG blocks are used in conjunction to encode spatial form information, while SIFT descriptors are used singly.


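The authors found that the two C-HOG variants (one with a single central cell, one with an angularly divided central cell) performed equally well. C-HOG blocks are described by four parameters: the number of angular bins, the number of radial bins, the radius of the center bin, and the expansion factor for the radius. For pedestrian detection the best settings were 4 angular bins, 2 radial bins, a center-bin radius of 4 pixels, and an expansion factor of 2. As noted earlier, a Gaussian spatial window helps for R-HOG; for C-HOG it proved unnecessary. C-HOG blocks resemble shape contexts, but differ in that C-HOG cells contain several orientation channels, whereas shape contexts use only a single edge presence count.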

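Block normalization

The authors compared four methods of block normalization. Let v be the unnormalized vector containing all the histograms of a given block, let ||v||_k be its k-norm for k = 1, 2, and let e be a small constant. Three of the schemes, L2-norm, L1-norm, and L1-sqrt, are defined directly in terms of these quantities.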
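The formulas themselves are missing from these notes. In the form usually quoted for the HOG paper, written with the symbols v, ||v||_k and e defined above, the three schemes are as follows (f denotes the normalized vector, a name introduced here for illustration; the square root in L1-sqrt is taken component-wise):

```latex
\begin{aligned}
\text{L2-norm:}\quad & f = \frac{v}{\sqrt{\|v\|_2^2 + e^2}} \\
\text{L1-norm:}\quad & f = \frac{v}{\|v\|_1 + e} \\
\text{L1-sqrt:}\quad & f = \sqrt{\frac{v}{\|v\|_1 + e}}
\end{aligned}
```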




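The fourth scheme, L2-Hys, applies the L2-norm, clips the result, and then renormalizes. The authors found that L2-Hys, L2-norm, and L1-sqrt performed equally well, while L1-norm was slightly less reliable; all four, however, were a significant improvement over unnormalized data.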
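A minimal C++ sketch of L2-Hys on a single block vector, under stated assumptions: the clipping threshold defaults to 0.2 (the L2HysThreshold value listed in the OpenCV notes further down), the small constant `eps` is an arbitrary choice, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// L2-norm: divide by sqrt(||v||_2^2 + eps^2); eps guards against division by zero.
void l2_normalize(std::vector<double>& v, double eps) {
    double sum_sq = 0.0;
    for (double x : v) sum_sq += x * x;
    const double scale = 1.0 / std::sqrt(sum_sq + eps * eps);
    for (double& x : v) x *= scale;
}

// L2-Hys: L2-normalize, clip every component at `clip`, then L2-normalize again.
std::vector<double> l2_hys(std::vector<double> v, double clip = 0.2, double eps = 1e-3) {
    l2_normalize(v, eps);
    for (double& x : v) x = std::min(x, clip);
    l2_normalize(v, eps);
    return v;
}
```

Clipping limits the influence of a few very strong gradients, so that no single component can dominate the block's descriptor.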

SVM classifier


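OpenCV 2.0 ships a pedestrian detection sample that uses the method Navneet Dalal first presented at CVPR 2005.
1. Install OpenCV 2.0 under VC 2008 Express. OpenCV 2.1 can be used directly instead, which avoids building with CMake and the build errors that can come with it.
At the DOS prompt, change to C:\OpenCV2.0\samples\c and run: peopledetect.exe filename.jpg
Create a console project and add peopledetect.cpp from C:\OpenCV2.0\samples\c to it, configuring the project as in step 1. It builds successfully, but, oddly, the EXE produced in DEBUG mode fails at run time.
1) getDefaultPeopleDetector() returns the 3780-dimensional detector (105 blocks with 4 histograms each and 9 bins per histogram gives 3,780 values). Why 105 blocks? A 64×128 window with 16×16 blocks and a block stride of 8 holds ((64-16)/8+1) × ((128-16)/8+1) = 7 × 15 = 105 blocks.
2) cv::HOGDescriptor hog; creates the object, initializing its members as follows: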
winSize(64,128), blockSize(16,16), blockStride(8,8),
cellSize(8,8), nbins(9), derivAperture(1), winSigma(-1),
histogramNormType(L2Hys), L2HysThreshold(0.2), gammaCorrection(true)
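3) Call detectMultiScale(img, found, 0, cv::Size(8,8), cv::Size(24,16), 1.05, 2);
  The arguments are, in order: the image to search, the returned list of detections, the threshold hitThreshold, the window stride winStride, the image padding margin, the scale ratio, and the threshold groupThreshold. Experimenting on one test image: changing hitThreshold from 0 to 0.01 loses the detection, while 0.001 still works; a scale ratio of 1.1 fails, while 1.06 works; changing groupThreshold from 2 to 1 works, but 0.8 or below does not; changing the padding from (24,16) to (0,0) or (32,32) also works.
(1) Compute the number of pyramid levels, levels.
For an image of size (530,402): log(402/128)/log(1.05) = 23.4, so there are 24 levels.
 (2) Loop levels times, doing the following on each iteration: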
HOGThreadData& tdata = threadData[getThreadNum()];
Mat smallerImg(sz, img.type(), tdata.smallerImgBuf.data);
detect(smallerImg, tdata.locations, hitThreshold, winStride, padding);
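(b) Create the object HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride); Construction first runs HOGCache::init, which computes the gradients (descriptor->computeGradient) and determines the number of blocks (105) and the number of values per block (36).
    (c) Get the number of windows, nwindows. For the first level it is ((530+32*2-64)/8+1) × ((402+32*2-128)/8+1) = 67 × 43 = 2881, where (32,32) is the padding ((24,16) can be used instead) and 8 is the winStride.
Loop over the 105 blocks: for each block, getBlock computes and normalizes the HOG features, and the 36 values are multiplied with the corresponding detector coefficients and accumulated. If the sum s over all 105 blocks satisfies s >= hitThreshold, the window is reported as a detection.
Figure 5.5 on page 78 of NavneetDalalThesis.pdf describes the complete object detection algorithm: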
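The arithmetic in these notes (105 blocks and 3780 values, 24 pyramid levels, 2881 windows on the first level) can be checked with a few small functions. This is a sketch of the calculations as written in the notes, not OpenCV's actual code; all function names are hypothetical, and the level count uses a closed-form logarithm where OpenCV itself loops over scales.

```cpp
#include <algorithm>
#include <cmath>

// Number of positions of a span of length `size` sliding with step `stride`
// inside a range of length `extent` (integer division, as in the notes).
int positions(int extent, int size, int stride) {
    return (extent - size) / stride + 1;
}

// Length of the HOG descriptor for one detection window with square blocks and cells.
int descriptor_length(int win_w, int win_h, int block, int block_stride, int cell, int nbins) {
    const int blocks = positions(win_w, block, block_stride) *
                       positions(win_h, block, block_stride);
    const int cells_per_block = (block / cell) * (block / cell);
    return blocks * cells_per_block * nbins;
}

// Number of pyramid levels before the shrunken image is smaller than one window.
int pyramid_levels(int img_w, int img_h, int win_w, int win_h, double scale_ratio) {
    const double max_scale = std::min(static_cast<double>(img_w) / win_w,
                                      static_cast<double>(img_h) / win_h);
    const int levels = static_cast<int>(std::ceil(std::log(max_scale) / std::log(scale_ratio)));
    return std::max(1, levels);
}

// Number of windows scanned on one level, with padding added on both sides.
int windows_per_level(int img_w, int img_h, int win_w, int win_h,
                      int pad_x, int pad_y, int win_stride) {
    return positions(img_w + 2 * pad_x, win_w, win_stride) *
           positions(img_h + 2 * pad_y, win_h, win_stride);
}
```

For example, `descriptor_length(64, 128, 16, 8, 8, 9)` gives 3780, `pyramid_levels(530, 402, 64, 128, 1.05)` gives 24, and `windows_per_level(530, 402, 64, 128, 32, 32, 8)` gives 2881, matching the figures above.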
For each scale Si = [Ss, SsSr, … , Sn]
(a) Rescale the input image using bilinear interpolation
(b) Extract features (Fig. 4.12) and densely scan the scaled image with stride Ns for object/non-object detections
(c) Push all detections with t(wi) > c to a list
Non-maximum suppression
(a) Represent each detection in 3-D position and scale space yi
(b) Using (5.9), compute the uncertainty matrices Hi for each point
(c) Compute the mean shift vector (5.7) iteratively for each point in the list until it converges to a mode
(d) The list of all of the modes gives the final fused detections
(e) For each mode compute the bounding box from the final centre point and scale


4. Histogram of Oriented Gradients Based Encoding of Images
Default Detector.
As a yardstick for the purpose of comparison, throughout this section we compare results to our default detector which has the following properties: input image in RGB colour space (without any gamma correction); image gradient computed by applying [-1, 0, 1] filter along x- and y-axis with no smoothing; linear gradient voting into 9 orientation bins in 0°–180°; 16×16 pixel blocks containing 2×2 cells of 8×8 pixel; Gaussian block windowing with σ = 8 pixel; L2-Hys (Lowe-style clipped L2 norm) block normalisation; blocks spaced with a stride of 8 pixels (hence 4-fold coverage of each cell); 64×128 detection window; and linear SVM classifier. We often quote the performance at 10^-4 false positives per window (FPPW), the maximum false positive rate that we consider to be useful for a real detector given that 10^3–10^4 windows are tested for each image.
4.3.2 Gradient Computation
The simple [-1, 0, 1] masks give the best performance.
4.3.3 Spatial / Orientation Binning
Each pixel contributes a weighted vote for orientation based on the orientation of the gradient element centred on it.
The votes are accumulated into orientation bins over local spatial regions that we call cells.
To reduce aliasing, votes are interpolated trilinearly between the neighbouring bin centres in both orientation and position.
Details of the trilinear interpolation voting procedure are presented in Appendix D.
The vote is a function of the gradient magnitude at the pixel, either the magnitude itself, its square, its square root, or a clipped form of the magnitude representing soft presence/absence of an edge at the pixel. In practice, using the magnitude itself gives the best results.
4.3.4 Block Normalisation Schemes and Descriptor Overlap
Good normalisation is critical and including overlap significantly improves the performance.
Figure 4.4(d) shows that L2-Hys, L2-norm and L1-sqrt all perform equally well for the person detector.
For other object classes, such as cars and motorbikes, L1-sqrt gives the best results.
4.3.5 Descriptor Blocks
For human detection, 3×3 cell blocks of 6×6 pixel cells perform best with 10.4% miss-rate at 10^-4 FPPW. Our standard 2×2 cell blocks of 8×8 pixel cells are a close second.
We find 2×2 and 3×3 cell blocks work best.
4.3.6 Detector Window and Context
Our 64×128 detection window includes about 16 pixels of margin around the person on all four sides.
4.3.7 Classifier
By default we use a soft (C=0.01) linear SVM trained with SVMLight [Joachims 1999]. We modified SVMLight to reduce memory usage for problems with large dense descriptor vectors.
5. Multi-Scale Object Localisation
The detector scans the image with a detection window at all positions and scales, running the classifier in each window and fusing multiple overlapping detections to yield the final object detections.
We represent detections using kernel density estimation (KDE) in 3-D position and scale space. KDE is a data-driven process where continuous densities are evaluated by applying a smoothing kernel to observed data points. The bandwidth of the smoothing kernel defines the local neighbourhood. The detection scores are incorporated by weighting the observed detection points by their score values while computing the density estimate. Thus KDE naturally incorporates the first two criteria. The overlap criterion follows from the fact that detections at very different scales or positions are far off in 3-D position and scale space, and are thus not smoothed together. The modes (maxima) of the density estimate correspond to the positions and scales of final detections.
Let x_i = [x_i, y_i] and s'_i denote the detection position and scale, respectively, for the i-th detection.
The detections are represented in 3-D space as y = [x, y, s], where s = log(s').
The variable bandwidth mean shift vector is defined as (5.7)

For each of the n points the mean shift based iterative procedure is guaranteed to converge to a mode.
Detection Uncertainty Matrix Hi.
One key input to the above mode detection algorithm is the amount of uncertainty Hi to be associated with each point. We assume isosymmetric covariances, i.e. the Hi’s are diagonal matrices.
Let diag[H] represent the 3 diagonal elements of H. We use scale dependent covariance matrices such that
$\operatorname{diag}[H_i] = \left[(\exp(s_i)\,\sigma_x)^2,\ (\exp(s_i)\,\sigma_y)^2,\ (\sigma_s)^2\right]$ (5.9)
where σ_x, σ_y and σ_s are user supplied smoothing values.

The term t(wi) provides the weight for each detection. For linear SVMs we usually use threshold = 0.
The smoothing parameters σ_x, σ_y, and σ_s are used in the non-maximum suppression stage. These parameters can have a significant impact on performance so proper evaluation is necessary. For all of the results here, unless otherwise noted, a scale ratio of 1.05, a stride of 8 pixels, and σ_x = 8, σ_y = 16, σ_s = log(1.3) are used as default values.
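To make equation (5.9) concrete, here is a small C++ sketch that evaluates the three diagonal elements of H_i for one detection. The function name `uncertainty_diag` is hypothetical; the default smoothing values are the ones quoted above (8, 16 and log(1.3)), and s_i is assumed to be the log-scale of the detection, as in y = [x, y, s].

```cpp
#include <array>
#include <cmath>

// diag[H_i] = [(exp(s_i) * sigma_x)^2, (exp(s_i) * sigma_y)^2, sigma_s^2]   (5.9)
// s_i is the log-scale of detection i; defaults follow the values quoted in the text.
std::array<double, 3> uncertainty_diag(double s_i,
                                       double sigma_x = 8.0,
                                       double sigma_y = 16.0,
                                       double sigma_s = std::log(1.3)) {
    const double k = std::exp(s_i);  // back from log-scale to the scale itself
    std::array<double, 3> d = {{(k * sigma_x) * (k * sigma_x),
                                (k * sigma_y) * (k * sigma_y),
                                sigma_s * sigma_s}};
    return d;
}
```

At the base scale (s_i = 0) this gives [64, 256, log(1.3)²]; detections found at larger scales get proportionally larger positional uncertainty.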
A scale ratio of 1.01 gives the best performance, but significantly slows the overall process.
Scale smoothing of log(1.3)–log(1.6) gives good performance for most object classes.
We group these mode candidates using a proximity measure. The final location is the mode corresponding to the highest density.
Appendix A. INRIA Static Person Data Set
The (centred and normalised) positive windows are supplied by the user, and the initial set of negatives is created once and for all by randomly sampling negative images. A preliminary classifier is thus trained using these. Second, the preliminary detector is used to exhaustively scan the negative training images for hard examples (false positives). The classifier is then re-trained using this augmented training set (user supplied positives, initial negatives and hard examples) to produce the final detector.
INRIA Static Person Data Set
As images of people are highly variable, to learn an effective classifier, the positive training examples need to be properly normalized and centered to minimize the variance among them. For this we manually annotated all upright people in the original images.
The image regions belonging to the annotations were cropped and rescaled to 64×128 pixel image windows. On average the subject's height is 96 pixels in these normalised windows to allow for an approximately 16 pixel margin on each side. In practice we leave a further 16 pixel margin around each side of the image window to ensure that flow and gradients can be computed without boundary effects. The margins were added by appropriately expanding the annotations on each side before cropping the image regions.


For more about the INRIA Person Dataset, see the following link
Original Images
            Folders ‘Train’ and ‘Test’ correspond, respectively, to original training and test images. Both folders have three sub folders: (a) ‘pos’ (positive training or test images), (b) ‘neg’ (negative training or test images), and (c) ‘annotations’ (annotation files for positive images in Pascal Challenge format).
Normalized Images
        Folders ‘train_64x128_H96’ and ‘test_64x128_H96’ correspond to normalized dataset as used in above referenced paper. Both folders have two sub folders: (a) ‘pos’ (normalized positive training or test images centered on the person with their left-right reflections), (b) ‘neg’ (containing original negative training or test images). Note images in folder ‘train/pos’ are of 96x160 pixels (a margin of 16 pixels around each side), and images in folder ‘test/pos’ are of 70x134 pixels (a margin of 3 pixels around each side). This has been done to avoid boundary conditions (thus to avoid any particular bias in the classifier). In both folders, use the centered 64x128 pixels window for original detection task.
Negative windows
        To generate negative training windows from normalized images, a fixed set of 12180 windows (10 windows per negative image) are sampled randomly from 1218 negative training photos providing the initial negative training set. For each detector and parameter combination, a preliminary detector is trained and all negative training images are searched exhaustively (over a scale-space pyramid) for false positives ('hard examples'). All examples with score greater than zero are considered hard examples. The method is then re-trained using this augmented set (initial 12180 + hard examples) to produce the final detector. The set of hard examples is subsampled if necessary, so that the descriptors of the final training set fit into 1.7 GB of RAM for SVM training.


The original author has updated the OpenCV 2.0 peopledetect sample twice.

#include "cvaux.h"
#include "highgui.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>

using namespace cv;
using namespace std;

int main(int argc, char** argv)
{
    Mat img;
    FILE* f = 0;
    char _filename[1024];

    if( argc == 1 )
    {
        printf("Usage: peopledetect (<image_filename> | <image_list>.txt)\n");
        return 0;
    }

    // First try the argument as an image; if that fails, treat it as a list file.
    img = imread(argv[1]);
    if( img.data )
    {
        strcpy(_filename, argv[1]);
    }
    else
    {
        f = fopen(argv[1], "rt");
        if( !f )
        {
            fprintf( stderr, "ERROR: the specified file could not be loaded\n");
            return -1;
        }
    }

    HOGDescriptor hog;
    hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
    namedWindow("people detector", 1);

    for(;;)
    {
        char* filename = _filename;
        if( f )
        {
            if(!fgets(filename, (int)sizeof(_filename)-2, f))
                break;
            //while(*filename && isspace(*filename))
            // ++filename;
            if(filename[0] == '#')
                continue;   // lines starting with '#' are comments
            // strip trailing whitespace, including the newline
            int l = (int)strlen(filename);
            while(l > 0 && isspace((unsigned char)filename[l-1]))
                --l;
            filename[l] = '\0';
            img = imread(filename);
        }
        printf("%s:\n", filename);
        if( !img.data )
            continue;

        vector<Rect> found, found_filtered;
        double t = (double)getTickCount();
        // run the detector with default parameters. to get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2);
        t = (double)getTickCount() - t;
        printf("detection time = %gms\n", t*1000./cv::getTickFrequency());

        // drop every rectangle that lies completely inside another one
        size_t i, j;
        for( i = 0; i < found.size(); i++ )
        {
            Rect r = found[i];
            for( j = 0; j < found.size(); j++ )
                if( j != i && (r & found[j]) == r)
                    break;
            if( j == found.size() )
                found_filtered.push_back(r);
        }
        for( i = 0; i < found_filtered.size(); i++ )
        {
            Rect r = found_filtered[i];
            // the HOG detector returns slightly larger rectangles than the real objects.
            // so we slightly shrink the rectangles to get a nicer output.
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.07);
            r.height = cvRound(r.height*0.8);
            rectangle(img, r.tl(), r.br(), cv::Scalar(0,255,0), 3);
        }
        imshow("people detector", img);
        int c = waitKey(0) & 255;
        if( c == 'q' || c == 'Q' || !f)
            break;
    }
    if( f )
        fclose(f);
    return 0;
}
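The sample's filtering loop keeps a detection only if it is not completely contained in another one (the test `(r & found[j]) == r`). The same idea can be sketched without OpenCV; `Box`, `inside` and `drop_nested` are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Axis-aligned rectangle, standing in for cv::Rect in this sketch.
struct Box { int x, y, width, height; };

// True if a lies entirely inside b (edges may touch).
bool inside(const Box& a, const Box& b) {
    return a.x >= b.x && a.y >= b.y &&
           a.x + a.width <= b.x + b.width &&
           a.y + a.height <= b.y + b.height;
}

// Keep only the boxes that are not contained in any other box.
std::vector<Box> drop_nested(const std::vector<Box>& found) {
    std::vector<Box> kept;
    for (std::size_t i = 0; i < found.size(); ++i) {
        std::size_t j = 0;
        for (; j < found.size(); ++j)
            if (j != i && inside(found[i], found[j])) break;
        if (j == found.size()) kept.push_back(found[i]);  // no box encloses found[i]
    }
    return kept;
}
```

One consequence of this rule is that two identical boxes each count as contained in the other, so both are dropped.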


To run detection on a batch of images, create a text file named filename.txt with the following contents

Then run peopledetect filename.txt at the DOS prompt to detect people in every listed image automatically.

Navneet Dalal's OLT workflow

Navneet Dalal provides the INRIA Object Detection and Localization Toolkit (OLT) on the following website
Wilson Suryajaya Leoputra provides a Windows version of it.
It requires copying all the DLLs (boost_1.34.1*.dll, blitz_0.9.dll, opencv*.dll) into "<ROOT_PROJECT_DIR>/debug/".
Navneet Dalal provides Linux executables, so I borrowed someone's Linux machine and ran them once to get an overview of the whole workflow.
1. Download the INRIA person detection database and extract it into OLTbinaries\; rename 'train_64x128_H96' to 'train' and 'test_64x128_H96' to 'test'.
2. On Linux, run the 'runall.sh' script.
When the results are ready, open MATLAB and run plotdet.m to plot the DET curve.
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 1 train/pos.lst HOG/train_pos.RHOG
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 10 train/neg.lst HOG/train_neg.RHOG
./bin//dump4svmlearn -p HOG/train_pos.RHOG -n HOG/train_neg.RHOG HOG/train_BiSVMLight.blt -v
4. Create the model file: HOG/model_4BiSVMLight.alt
./bin//svm_learn -j 3 -B 1 -z c -v 1 -t 0 HOG/train_BiSVMLight.blt HOG/model_4BiSVMLight.alt
mkdir -p HOG/hard
./bin//classify_rhog train/neg.lst HOG/hard/list.txt HOG/model_4BiSVMLight.alt -d HOG/hard/hard_neg.txt -c HOG/hard/hist.txt -m 0 -t 0 --no_nonmax 1 --avsize 0 --margin 0 --scaleratio 1.2 -l N -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys
The false +/- classification results are written to HOG/hard/hard_neg.txt
7. Add the hard examples to the negatives and compute the RHOG features again
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 0 HOG/hard/hard_neg.txt HOG/train_hard_neg.RHOG --poscases 2416 --negcases 12180 --dumphard 1 --hardscore 0 --memorylimit 1700
./bin//dump4svmlearn -p HOG/train_pos.RHOG -n HOG/train_neg.RHOG -n HOG/train_hard_neg.RHOG HOG/train_BiSVMLight.blt -v 4
./bin//svm_learn -j 3 -B 1 -z c -v 1 -t 0 HOG/train_BiSVMLight.blt HOG/model_4BiSVMLight.alt
The 3780 values used in OpenCV should be inside this model, model_4BiSVMLight.alt. Its format is unknown, so it cannot be read directly, but one could study how svm_learn generates it. The model is also loaded by classify_rhog, so studying how that program reads it is probably one route to parsing the format.
mkdir -p HOG/WindowTest_Negative
./bin//classify_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -p 1 --no_nonmax 1 --nopyramid 0 --scaleratio 1.2 -t 0 -m 0 --avsize 0 --margin 0 test/neg.lst HOG/WindowTest_Negative/list.txt HOG/model_4BiSVMLight.alt -c HOG/WindowTest_Negative/histogram.txt
mkdir -p HOG/WindowTest_Positive
./bin//classify_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -p 1 --no_nonmax 1 --nopyramid 1 -t 0 -m 0 --avsize 0 --margin 0 test/pos.lst HOG/WindowTest_Positive/list.txt HOG/model_4BiSVMLight.alt -c HOG/WindowTest_Positive/histogram.txt



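Comparing INRIAPerson\INRIAPerson\Train\pos (the original images) with INRIAPerson\train_64x128_H96\pos (the generated samples) shows that the authors cropped standing, unoccluded people out of the original images and then added a left-right reflection of each crop. Take the first image, crop001001, as an example: 2 unoccluded people were cropped from it, which with the original photo makes 3 images, and with the left-right mirrors 6 in total.
 ACDSee can be used for this: Tools / Open in Editor, then the Resize option; Tools / Rotate can also produce the left-right reflection.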


4. Create the pos.lst list
  At the DOS prompt, change to the folder containing the images to be listed and run dir /b > pos.lst to generate the file list.


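#include "cv.h"
#include "highgui.h"
#include "cvaux.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>

// Resize every image named in a list file to 96x160 and save it
// next to the original with the suffix "_96x160.jpg".
int main(int argc, char* argv[])
{
    IplImage* src = 0;
    IplImage* dst = 0;
    CvSize dst_size;
    FILE* f = 0;
    char _filename[1024];
    char outname[1100];
    int l;

    if( argc < 2 )
    {
        printf("Usage: <program> <image_list>.txt\n");
        return 0;
    }
    f = fopen(argv[1], "rt");
    if( !f )
    {
        fprintf( stderr, "ERROR: the specified file could not be loaded\n");
        return -1;
    }

    dst_size.width = 96;
    dst_size.height = 160;

    for(;;)
    {
        char* filename = _filename;
        if(!fgets(filename, (int)sizeof(_filename)-2, f))
            break;
        if(filename[0] == '#')
            continue;   // lines starting with '#' are comments
        // strip trailing whitespace, including the newline
        l = (int)strlen(filename);
        while(l > 0 && isspace((unsigned char)filename[l-1]))
            --l;
        filename[l] = '\0';
        if( l <= 4 )
            continue;   // too short to carry a 4-character extension

        src = cvLoadImage(filename, 1);
        if( !src )
            continue;
        dst = cvCreateImage(dst_size, src->depth, src->nChannels);
        cvResize(src, dst, CV_INTER_LINEAR);

        // build the output name in its own buffer: drop the 4-character
        // extension (e.g. ".png") and append the size suffix
        outname[0] = '\0';
        strncat(outname, filename, l-4);
        strcat(outname, "_96x160.jpg");
        cvSaveImage(outname, dst);

        cvReleaseImage( &src );
        cvReleaseImage( &dst );
    }
    fclose(f);
    return 0;
}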


