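Blog: http://blog.csdn.net/qianxin_dh
"Struck: Structured Output Tracking with Kernels" is a paper by Sam Hare, Amir Saffari and Philip H. S. Torr, published at the International Conference on Computer Vision (ICCV) in 2011. Struck differs from traditional tracking algorithms as follows. Traditional trackers (right-hand side of the figure below) turn tracking into a classification problem and update the target model with online learning. To perform that update they usually have to treat estimated target positions as labelled training samples, and these samples do not necessarily coincide with the true target, so it is hard to reach the best classification performance. Struck (left-hand side of the figure below) instead proposes an adaptive visual tracking framework based on structured output prediction: by explicitly choosing an output space that matches what tracking needs, it avoids the intermediate classification step and outputs the tracking result directly. To stay real-time, the algorithm also introduces a budget (threshold) mechanism that keeps the number of support vectors from growing without bound during tracking.
Finally, the authors also provide the C++ source code, so interested readers can download it and try the algorithm for themselves. Code: https://github.com/gnebehay/STRUCK. Building it requires OpenCV 2.1 or later and Eigen v2.0.15.
While studying the paper and debugging the code I relied mainly on the following material, which I found very helpful:
1) Installing and learning Eigen
2) A Plain Introduction to Support Vector Machines: Three Levels of Understanding SVM (支持向量機通俗導論(理解SVM的三重境界))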
http://blog.csdn.net/v_july_v/article/details/7624837
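Part 2
In object detection, SVMs are used more and more widely because they generalise well and are robust as classifiers. Struck uses an online structured output SVM to solve the tracking problem. Instead of training a classifier as conventional trackers do, Struck uses a prediction function $f: \mathcal{X} \to \mathcal{Y}$ to predict directly how the target position changes between frames. Here $\mathcal{Y}$ is the search space, for example $\mathcal{Y} = \{(u, v) \mid u^2 + v^2 < r^2\}$. If the target position in the previous frame is $p_{t-1}$, the target position in the current frame is $p_t = p_{t-1} \circ y_t$ (so $\mathcal{Y}$ is simply the set of possible inter-frame position changes). A labelled sample in Struck is therefore written $(x, y)$ rather than $(x, +1)$ or $(x, -1)$.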
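To make the search space concrete, here is a minimal sketch that enumerates the integer translations inside a circle of radius r and applies one of them to the previous box. The types `Offset` and `Box` and the functions `searchSpace` and `compose` are hypothetical names of mine, not part of the Struck source, and I assume the transformations are pure 2D translations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal types for illustration (not the FloatRect of the listing).
struct Offset { int u; int v; };      // one output y = (u, v), a 2D translation
struct Box { float x, y, w, h; };     // top-left corner plus size

// Enumerate Y = {(u, v) | u^2 + v^2 < r^2} on the integer pixel grid.
std::vector<Offset> searchSpace(int r) {
    assert(r > 0);
    std::vector<Offset> ys;
    for (int v = -r; v <= r; ++v)
        for (int u = -r; u <= r; ++u)
            if (u * u + v * v < r * r) ys.push_back({u, v});
    return ys;
}

// p_t = p_{t-1} composed with y_t: for pure translations this is a shift
// of the top-left corner; the box size is unchanged.
Box compose(const Box& prev, const Offset& y) {
    return {prev.x + y.u, prev.y + y.v, prev.w, prev.h};
}
```

With r = 2 the strict inequality leaves 9 offsets: the centre, its 4 direct neighbours and the 4 diagonal neighbours.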
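How is the prediction function $f$ obtained? This is where the structured output SVM comes in (for the basics of SVMs, see the "three levels of understanding SVM" link given in the first part). It introduces a discriminant function $F: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ and predicts the target position by taking the output with the highest score: $y_t = f(x_t^{p_{t-1}}) = \arg\max_{y \in \mathcal{Y}} F(x_t^{p_{t-1}}, y)$. In other words, since we do not yet know the target position in the current frame, we start from the idea that the target of the previous frame could reach many places in the current frame through different position changes. There is only one real target, however, so exactly one of these transformations is the best one. We therefore look for this best $y_t$, and $p_t = p_{t-1} \circ y_t$ then gives us the target. How the search space is chosen will become clear in the code walkthrough.
How do we find $F$? My own understanding is this. Write the discriminant function as $F(x, y) = \langle w, \Phi(x, y) \rangle$, where $\Phi$ is a mapping from the input space into some feature space in which the samples become linearly separable. The larger the margin between the separating hyperplane and the data points, the more confident the classification, so the hyperplane should be chosen to maximise this margin. Here this is done by minimising the convex objective $\min_w \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$ subject to $\xi_i \ge 0$ for all $i$, and $\langle w, \delta\Phi_i(y) \rangle \ge \Delta(y_i, y) - \xi_i$ for all $i$ and all $y \ne y_i$, where $\delta\Phi_i(y) = \Phi(x_i, y_i) - \Phi(x_i, y)$ and $\Delta(y_i, y) = 1 - s^o_{p_t}(y_i, y)$ ($s^o_{p_t}$ denotes the overlap between the two boxes). The goal of the optimisation is to ensure F(target) >> F(non-target).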
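The loss based on box overlap can be sketched in a few lines. This is an illustration under my own assumptions: `Box`, `overlap` and `loss` are hypothetical names, and the overlap is taken as intersection area over union area, which is how I read the "overlap rate" above. The listing further down computes the same quantity as 1.0-y1.Overlap(y2).

```cpp
#include <algorithm>
#include <cassert>

struct Box { float x, y, w, h; };   // hypothetical type: top-left corner plus size

// Overlap = area(a intersect b) / area(a union b), a value in [0, 1].
float overlap(const Box& a, const Box& b) {
    assert(a.w > 0 && a.h > 0 && b.w > 0 && b.h > 0);
    float iw = std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x);
    float ih = std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y);
    if (iw <= 0.f || ih <= 0.f) return 0.f;          // boxes do not intersect
    float inter = iw * ih;
    return inter / (a.w * a.h + b.w * b.h - inter);
}

// Structured loss: 0 for the true output, 1 when the boxes do not overlap.
float loss(const Box& y, const Box& yTrue) {
    return 1.f - overlap(y, yTrue);
}
```

A candidate that coincides with the target costs nothing, while a candidate far away costs the full loss of 1, so the margin constraint pushes badly overlapping boxes further below the target score than nearly correct ones.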
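The next question is how to obtain the minimising w. The paper uses Lagrangian duality: the optimum of the primal problem is obtained by solving the equivalent dual problem. A Lagrange multiplier alpha is attached to each constraint, which defines the Lagrangian L(w,b,alpha). A dual problem is generally solved as follows: 1) fix alpha and minimise L with respect to w and b; 2) maximise L with respect to alpha; 3) use the SMO algorithm to obtain the Lagrange multipliers alpha. To simplify the dual problem, the paper introduces the parameter beta; see Eq. (8) in the paper.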
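For reference, this is the dual in terms of beta as I understand it, written with the loss $\Delta$ and the feature map $\Phi$ of the structured SVM, and with $\delta(y, y_i) = 1$ if $y = y_i$ and 0 otherwise. It is reconstructed from my reading of the paper and from ComputeDual() and SMOStep() in the listing below, so please check it against Eq. (8) before relying on it.

```latex
\max_{\beta} \; -\sum_{i,y} \Delta(y, y_i)\, \beta_i^{y}
  \;-\; \frac{1}{2} \sum_{i,y,j,\bar{y}} \beta_i^{y} \beta_j^{\bar{y}}
  \langle \Phi(x_i, y), \Phi(x_j, \bar{y}) \rangle
\quad \text{s.t.} \quad
\forall i, \forall y:\; \beta_i^{y} \le \delta(y, y_i)\, C,
\qquad
\forall i:\; \sum_{y} \beta_i^{y} = 0
```

The first constraint is why only the true output $y_i$ of a pattern can carry a positive beta, and the second is why every SMO step moves the same amount of weight from one support vector of a pattern to another.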
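Main flow of the algorithm:
1. Read config.txt and initialise the program parameters; this is handled mainly by the Config class.
2. Decide whether to track from a camera. If a camera is used, initBB=(120,80,80,80);
if a video sequence is used, initBB is given by the corresponding txt file.
3. Resize every input frame to 320*240.
4. Initialise the tracker from the first frame and the box initBB.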
4.1 Initialise(frame,bb)
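The coordinates of the initBB obtained earlier are floats, so they are first converted to int here.
The program uses Haar features and a Gaussian kernel, so the flags end up as m_needsIntegralImage=true and m_needsIntegralHist=false. ImageRep image() therefore mainly computes the integral image (with histogram features it would compute the integral histogram instead). The members of the ImageRep class include the frame, the integral image and the integral histogram.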
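Since Haar features are sums over rectangles, the integral image is what makes them cheap. The sketch below is a standalone illustration and not the ImageRep code: the struct `Integral` and its members are hypothetical names, and the image is assumed to be an 8-bit, single-channel, row-major buffer.

```cpp
#include <cassert>
#include <vector>

// Integral image with one extra row and column of zeros:
// at(r, c) holds the sum of all pixels strictly above and to the left of (r, c).
struct Integral {
    int rows, cols;                  // size of the source image
    std::vector<long long> ii;       // (rows + 1) x (cols + 1), row-major

    Integral(const std::vector<unsigned char>& img, int r, int c)
        : rows(r), cols(c), ii((r + 1) * (c + 1), 0) {
        assert(static_cast<int>(img.size()) == r * c);
        for (int y = 0; y < r; ++y)
            for (int x = 0; x < c; ++x)
                at(y + 1, x + 1) = img[y * c + x] + at(y, x + 1) + at(y + 1, x) - at(y, x);
    }

    long long& at(int y, int x) { return ii[y * (cols + 1) + x]; }
    long long at(int y, int x) const { return ii[y * (cols + 1) + x]; }

    // Sum of the pixels in the box with top-left (x, y), width w and height h,
    // computed from four lookups regardless of the box size.
    long long boxSum(int x, int y, int w, int h) const {
        return at(y + h, x + w) - at(y, x + w) - at(y + h, x) + at(y, x);
    }
};
```

A Haar-like feature is then a signed combination of a few boxSum() calls, which is why the integral image is computed once per frame and shared by all sample boxes.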
4.2 UpdateLearner(image)
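This function updates the prediction function. It first obtains 5*16=80 samples through RadialSamples(); together with the original target box this gives 81 samples. Each of the 81 samples is then checked against the image boundary and discarded if it falls outside. The remaining samples are stored in keptRects, with the original target sample in keptRects[0]. A multi-sample object of class MultiSample is then created; its members are mainly the sample boxes and the ImageRep image. Finally, Update(sample,0) updates the prediction function.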
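The sampling and filtering step can be sketched as follows. This is my own simplified version, not the code of Sampler::RadialSamples: `Box`, `radialSamples` and `keepInside` are hypothetical names, and I assume evenly spaced radii and angles with no phase offset between rings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Box { float x, y, w, h; };   // hypothetical type: top-left corner plus size

// Centre box first, then nr rings of nt boxes each (nr * nt + 1 in total).
std::vector<Box> radialSamples(const Box& centre, float radius, int nr, int nt) {
    assert(nr > 0 && nt > 0);
    const float kPi = 3.14159265358979f;
    std::vector<Box> samples;
    samples.push_back(centre);
    for (int ir = 1; ir <= nr; ++ir) {
        float r = radius * ir / nr;                    // radius of this ring
        for (int it = 0; it < nt; ++it) {
            float theta = 2.f * kPi * it / nt;         // angle on the ring
            samples.push_back({centre.x + r * std::cos(theta),
                               centre.y + r * std::sin(theta),
                               centre.w, centre.h});
        }
    }
    return samples;
}

// Keep sample 0 (the target) unconditionally and drop the others
// if they leave the image of size imgW x imgH.
std::vector<Box> keepInside(const std::vector<Box>& s, float imgW, float imgH) {
    std::vector<Box> kept;
    for (std::size_t i = 0; i < s.size(); ++i) {
        bool inside = s[i].x >= 0.f && s[i].y >= 0.f &&
                      s[i].x + s[i].w <= imgW && s[i].y + s[i].h <= imgH;
        if (i == 0 || inside) kept.push_back(s[i]);
    }
    return kept;
}
```

Keeping the target at index 0 matters because Update(sample,0) passes that index as the true output y.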
4.3 Update(sample,0)
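This function is defined in the LaRank class; the algorithm comes from the reference "Solving multiclass support vector machines with LaRank" cited in the paper. Looking at the LaRank header, we can see that all the important steps of Struck are gathered in this class. Its members are the support pattern struct SupportPattern, the support vector struct SupportVector, the Config object m_config, the Features object m_features, the Kernel object m_kernel, m_sps (stores the SupportPatterns), m_svs (stores the SupportVectors), m_debugImage (for display), m_C (the coefficient in the objective function) and the matrix m_K.
From the definition of SupportPattern, the struct holds x (the feature vectors), yv (the outputs y, i.e. the position changes of the target), images (the image patches), y (an index that tells where the true target sample is stored) and refCount (the number of support vectors that currently reference this pattern: AddSupportVector increments it, RemoveSupportVector decrements it and deletes the pattern once it reaches 0). Likewise, SupportVector holds a pointer to a SupportPattern, y (an index that tells where this sample is stored within the pattern), b (beta), g (gradient) and image (the image patch).
Update(sample,0) creates a SupportPattern* sp. For every sample box, the x and y coordinates of the original target box are subtracted from its own x and y coordinates and the result is stored in sp->yv. In debug mode, the image inside every sample box is resized to 30*30 and stored in sp->images. The Haar features of every sample box are computed and stored in sp->x. Then sp->y=y=0 and sp->refCount=0 are set, and finally sp is appended to m_sps.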
4.3.1 ProcessNew(int ind)
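ProcessNew(int ind) is executed next, with ind=m_sps.size()-1. Every processed frame appends one pattern to m_sps, so defining ind this way guarantees that ProcessNew always works on the newest sample. Before looking at ProcessNew, first consider the definition of AddSupportVector(SupportPattern* x,int y,double g):
SupportVector* sv=new SupportVector; creates a support vector.
The support vector is initialised with sv->b=0.0, sv->x=x, sv->y=y, sv->g=g and appended to m_svs. The kernel matrix m_K is then updated by calling Eval() of the Kernel class; it is used later in the computations of Algorithm 1.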
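The incremental kernel matrix update can be sketched like this. It is an illustration under assumptions of mine: plain std::vector instead of Eigen, a Gaussian kernel of the form exp(-sigma * ||a - b||^2), and the hypothetical names `Vec`, `gaussianKernel` and `KernelCache`; please check the kernel form against Kernels.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Assumed Gaussian kernel: k(a, b) = exp(-sigma * ||a - b||^2), so k(a, a) = 1.
double gaussianKernel(const Vec& a, const Vec& b, double sigma) {
    assert(a.size() == b.size());
    double d2 = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d2 += (a[i] - b[i]) * (a[i] - b[i]);
    return std::exp(-sigma * d2);
}

struct KernelCache {
    double sigma;
    std::vector<Vec> svs;                    // feature vectors of the current SVs
    std::vector<std::vector<double> > K;     // K[i][j] = k(svs[i], svs[j])

    explicit KernelCache(double s) : sigma(s) {}

    // Append one support vector; only the new row and column are computed,
    // the existing entries of K stay untouched. Returns the new index.
    int add(const Vec& x) {
        int ind = static_cast<int>(svs.size());
        for (int i = 0; i < ind; ++i)
            K[i].push_back(gaussianKernel(svs[i], x, sigma));   // new column
        svs.push_back(x);
        K.push_back(std::vector<double>(ind + 1));              // new row
        for (int i = 0; i <= ind; ++i)
            K[ind][i] = (i == ind) ? 1.0 : K[i][ind];           // symmetric fill
        return ind;
    }
};
```

Caching K this way means each SMO step and each budget check only reads stored values instead of re-evaluating the kernel.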
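Now back to ProcessNew:
The first AddSupportVector() takes the target box as its argument and appends a support vector to m_svs. For the first frame this gives m_svs.size()=1 and m_K(0,0)=1.0, and the function returns ip=0.
MinGradient(int ind) is executed next and finds the minimum of g in Eq. (10). It returns the value of the minimum gradient and the position at which the corresponding sample box is stored.
The second AddSupportVector() takes the sample box with the minimum gradient as its argument and appends another support vector to m_svs. Now m_svs.size()=2, and m_K(0,1), m_K(1,0) and m_K(1,1) are computed. The function returns in=1.
The SMO step is then run; if the beta of a vector is 0 afterwards, that support vector is discarded.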
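The SMO step itself is short. The sketch below follows SMOStep() from the listing further down, but on simplified data structures: the struct `SV`, the flag `isTrueOutput` and the function `smoStep` are hypothetical names of mine, K is a plain nested vector, and the removal of support vectors with beta equal to 0 is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct SV {
    double beta;         // coefficient of this support vector
    double grad;         // cached gradient g
    bool isTrueOutput;   // true if this SV is the true output y_i of its pattern
};

// One SMO step on the pair (ipos, ineg), both from the same support pattern.
// K is the cached kernel matrix over all SVs, C the SVM constant.
// Returns the step length that was applied (0 if the step was skipped).
double smoStep(std::vector<SV>& svs, const std::vector<std::vector<double> >& K,
               int ipos, int ineg, double C) {
    assert(C > 0.0);
    if (ipos == ineg) return 0.0;
    SV& p = svs[ipos];
    SV& n = svs[ineg];
    if (p.grad - n.grad < 1e-5) return 0.0;            // nothing to gain, skip
    double kii = K[ipos][ipos] + K[ineg][ineg] - 2.0 * K[ipos][ineg];
    // Unconstrained optimum, clamped so that beta+ stays below C * delta(y, y_i).
    double lambda = std::min((p.grad - n.grad) / kii,
                             C * (p.isTrueOutput ? 1.0 : 0.0) - p.beta);
    p.beta += lambda;                                   // move weight to y+
    n.beta -= lambda;                                   // and away from y-
    for (std::size_t i = 0; i < svs.size(); ++i)        // refresh all gradients
        svs[i].grad -= lambda * (K[i][ipos] - K[i][ineg]);
    return lambda;
}
```

Because the same amount is added to one beta and subtracted from the other, the betas of a pattern keep summing to zero after every step.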
4.3.2 BudgetMaintenance()
After that, BudgetMaintenance() is executed to make sure the number of support vectors does not exceed 100.
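How the vector to drop is chosen can be sketched as follows, based on BudgetMaintenanceRemove() in the listing further down: among the negative support vectors, pick the one whose removal changes the weight vector least, measured by beta^2 * (K(i,i) + K(j,j) - 2*K(i,j)) with j a positive support vector of the same pattern. The struct `SV`, the integer `pattern` id and the function `pickNegativeToRemove` are hypothetical names, and skipping negatives without a positive partner is my own assumption.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

struct SV {
    double beta;    // coefficient of this support vector
    int pattern;    // id of the support pattern it belongs to
};

// Returns the index of the negative SV that is cheapest to remove, or -1.
int pickNegativeToRemove(const std::vector<SV>& svs,
                         const std::vector<std::vector<double> >& K) {
    assert(K.size() >= svs.size());
    int victim = -1;
    double minVal = DBL_MAX;
    for (std::size_t i = 0; i < svs.size(); ++i) {
        if (svs[i].beta >= 0.0) continue;               // only negative SVs
        int j = -1;                                     // positive SV, same pattern
        for (std::size_t k = 0; k < svs.size(); ++k) {
            if (svs[k].beta > 0.0 && svs[k].pattern == svs[i].pattern) {
                j = static_cast<int>(k);
                break;
            }
        }
        if (j < 0) continue;                            // assumption: no partner, skip
        double val = svs[i].beta * svs[i].beta * (K[i][i] + K[j][j] - 2.0 * K[i][j]);
        if (val < minVal) {
            minVal = val;
            victim = static_cast<int>(i);
        }
    }
    return victim;
}
```

After the removal, the listing adds the beta of the removed vector to its positive partner, so the betas of that pattern still sum to zero.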
4.3.3 Reprocess()
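The Reprocess() step follows. In the listing below, Update() calls Reprocess() 10 times, each call followed by BudgetMaintenance(). One Reprocess() consists of 1 ProcessOld() and 10 Optimize() calls.
ProcessOld() picks an existing SupportPattern at random and processes it. Unlike ProcessNew, the positive support vector here is the existing support vector of that pattern that has the largest gradient and also satisfies $\beta_i^y < \delta(y, y_i)\,C$. The negative support vector is again chosen by minimum gradient. SMO is then run again, which decides whether these support vectors are kept.
Optimize() also picks an existing SupportPattern at random and processes it, but it only adjusts the beta values of existing support vectors and never adds a new one. The positive support vector is chosen as in ProcessOld(); the negative one is again the one with the smallest gradient, but only among the existing support vectors of that pattern.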
4.3.4 BudgetMaintenance()
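BudgetMaintenance() is executed again to make sure the number of support vectors does not exceed 100.
5. Tracking module (Algorithm 2)
First ImageRep image() computes the integral image, then samples are drawn (the sampling here differs from the one used for initialisation and yields roughly a few thousand samples). Boxes that fall outside the image are discarded and the rest are kept in keptRects. For every box in keptRects the function F is evaluated, i.e. $F(x, y) = \sum_{i, \bar{y}} \beta_i^{\bar{y}} \langle \Phi(x_i, \bar{y}), \Phi(x, y) \rangle$. The results are stored in scores, and the box with the largest value is taken as the tracking result.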
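The scoring and selection step can be sketched in isolation. Assumptions of mine: feature vectors are plain std::vector, the kernel is a Gaussian of the form exp(-sigma * ||a - b||^2), and `Vec`, `SV`, `gaussianKernel`, `evaluate` and `bestCandidate` are hypothetical names. The listing's Evaluate() and Track() do the same thing with Eigen vectors and the configured kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

struct SV { Vec x; double beta; };   // feature vector and coefficient of one SV

// Assumed Gaussian kernel: k(a, b) = exp(-sigma * ||a - b||^2).
double gaussianKernel(const Vec& a, const Vec& b, double sigma) {
    assert(a.size() == b.size());
    double d2 = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d2 += (a[i] - b[i]) * (a[i] - b[i]);
    return std::exp(-sigma * d2);
}

// F(x) = sum over all support vectors of beta_i * k(x, x_i).
double evaluate(const std::vector<SV>& svs, const Vec& x, double sigma) {
    double f = 0.0;
    for (std::size_t i = 0; i < svs.size(); ++i)
        f += svs[i].beta * gaussianKernel(x, svs[i].x, sigma);
    return f;
}

// Index of the candidate with the highest score, or -1 if there are none.
int bestCandidate(const std::vector<SV>& svs, const std::vector<Vec>& candidates,
                  double sigma) {
    int best = -1;
    double bestScore = 0.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        double s = evaluate(svs, candidates[i], sigma);
        if (best < 0 || s > bestScore) {
            bestScore = s;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

The cost of one frame is the number of candidate boxes times the number of support vectors, which is why the budget on the support vectors matters for real-time tracking.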
UpdateDebugImage() mainly produces the window displayed while the program runs. UpdateLearner(image) is the same as in step 4.
main.cpp
- /*
- * Struck: Structured Output Tracking with Kernels
- *
- * Code to accompany the paper:
- * Struck: Structured Output Tracking with Kernels
- * Sam Hare, Amir Saffari, Philip H. S. Torr
- * International Conference on Computer Vision (ICCV), 2011
- *
- * Copyright (C) 2011 Sam Hare, Oxford Brookes University, Oxford, UK
- *
- * This file is part of Struck.
- *
- * Struck is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 3 of the License, or
- * (at your option) any later version.
- *
- * Struck is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with Struck. If not, see <http://www.gnu.org/licenses/>.
- *
- */
- #include "Tracker.h"
- #include "Config.h"
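- #include <iostream>
- #include <fstream>
- #include <climits> // INT_MAX
- #include <cstdio> // sscanf, sprintf
- #include <cstdlib> // EXIT_SUCCESS, EXIT_FAILURE, srand
- #include <cstring> // strcmp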
- #include <opencv/cv.h>
- #include <opencv/highgui.h>
- #include "vot.hpp"
- using namespace std;
- using namespace cv;
- static const int kLiveBoxWidth = 80;
- static const int kLiveBoxHeight = 80;
- void rectangle(Mat& rMat, const FloatRect& rRect, const Scalar& rColour)
- {
- IntRect r(rRect);
- rectangle(rMat, Point(r.XMin(), r.YMin()), Point(r.XMax(), r.YMax()), rColour);
- }
- int main(int argc, char* argv[])
- {
- // read the config file to initialise the program parameters
- string configPath = "config.txt";
- if (argc > 1)
- {
- configPath = argv[1];
- }
- Config conf(configPath); // the Config class reads the parameters from config.txt
- if (conf.features.size() == 0)
- {
- cout << "error: no features specified in config" << endl;
- return EXIT_FAILURE;
- }
- Tracker tracker(conf);
- //Check if --challenge was passed as an argument
- bool challengeMode = false;
- for (int i = 1; i < argc; i++) {
- if (strcmp("--challenge", argv[i]) == 0) { // check for challenge mode (VOT challenge)
- challengeMode = true;
- }
- }
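- if (challengeMode) { // VOT (Visual Object Tracking) challenge: a common platform for comparing the short-term tracking performance of different trackers and for discussing progress in visual tracking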
- //load region, images and prepare for output
- Mat frameOrig;
- Mat frame;
- VOT vot_io("region.txt", "images.txt", "output.txt");
- vot_io.getNextImage(frameOrig);
- resize(frameOrig, frame, Size(conf.frameWidth, conf.frameHeight));
- cv::Rect initPos = vot_io.getInitRectangle();
- vot_io.outputBoundingBox(initPos);
- float scaleW = (float)conf.frameWidth/frameOrig.cols;
- float scaleH = (float)conf.frameHeight/frameOrig.rows;
- FloatRect initBB_vot = FloatRect(initPos.x*scaleW, initPos.y*scaleH, initPos.width*scaleW, initPos.height*scaleH);
- tracker.Initialise(frame, initBB_vot);
- while (vot_io.getNextImage(frameOrig) == 1){
- resize(frameOrig, frame, Size(conf.frameWidth, conf.frameHeight));
- tracker.Track(frame);
- const FloatRect& bb = tracker.GetBB();
- float x = bb.XMin()/scaleW;
- float y = bb.YMin()/scaleH;
- float w = bb.Width()/scaleW;
- float h = bb.Height()/scaleH;
- cv::Rect output = cv::Rect(x,y,w,h);
- vot_io.outputBoundingBox(output);
- }
- return 0;
- }
- ofstream outFile;
- if (conf.resultsPath != "")
- {
- outFile.open(conf.resultsPath.c_str(), ios::out); // open the results file at resultsPath for writing
- if (!outFile)
- {
- cout << "error: could not open results file: " << conf.resultsPath << endl;
- return EXIT_FAILURE;
- }
- }
- // if no sequence specified then use the camera
- bool useCamera = (conf.sequenceName == "");
- VideoCapture cap;
- int startFrame = -1;
- int endFrame = -1;
- FloatRect initBB;
- string imgFormat;
- float scaleW = 1.f;
- float scaleH = 1.f;
- if (useCamera)
- {
- if (!cap.open(0))
- {
- cout << "error: could not start camera capture" << endl;
- return EXIT_FAILURE;
- }
- startFrame = 0;
- endFrame = INT_MAX; /* maximum (signed) int value */
- Mat tmp;
- cap >> tmp;
- scaleW = (float)conf.frameWidth/tmp.cols;
- scaleH = (float)conf.frameHeight/tmp.rows;
- initBB = IntRect(conf.frameWidth/2-kLiveBoxWidth/2, conf.frameHeight/2-kLiveBoxHeight/2, kLiveBoxWidth, kLiveBoxHeight);
- cout << "press 'i' to initialise tracker" << endl;
- }
- else
- {
- // parse frames file
- string framesFilePath = conf.sequenceBasePath+"/"+conf.sequenceName+"/"+conf.sequenceName+"_frames.txt"; // path of girl_frames.txt; the file lives in the girl folder and contains 0,501
- ifstream framesFile(framesFilePath.c_str(), ios::in);
- if (!framesFile)
- {
- cout << "error: could not open sequence frames file: " << framesFilePath << endl;
- return EXIT_FAILURE;
- }
- string framesLine;
- getline(framesFile, framesLine);
- sscanf(framesLine.c_str(), "%d,%d", &startFrame, &endFrame); //startFrame=0;endFrame=501;
- if (framesFile.fail() || startFrame == -1 || endFrame == -1)
- {
- cout << "error: could not parse sequence frames file" << endl;
- return EXIT_FAILURE;
- }
- imgFormat = conf.sequenceBasePath+"/"+conf.sequenceName+"/imgs/img%05d.png";
- // read first frame to get size
- char imgPath[256];
- sprintf(imgPath, imgFormat.c_str(), startFrame); // sprintf writes the formatted data into the string buffer imgPath
- Mat tmp = cv::imread(imgPath, 0);
- scaleW = (float)conf.frameWidth/tmp.cols; //=1;
- scaleH = (float)conf.frameHeight/tmp.rows; //=1;
- // read init box from ground truth file
- string gtFilePath = conf.sequenceBasePath+"/"+conf.sequenceName+"/"+conf.sequenceName+"_gt.txt"; // read girl_gt.txt
- ifstream gtFile(gtFilePath.c_str(), ios::in);
- if (!gtFile)
- {
- cout << "error: could not open sequence gt file: " << gtFilePath << endl;
- return EXIT_FAILURE;
- }
- string gtLine;
- getline(gtFile, gtLine);
- float xmin = -1.f;
- float ymin = -1.f;
- float width = -1.f;
- float height = -1.f;
- sscanf(gtLine.c_str(), "%f,%f,%f,%f", &xmin, &ymin, &width, &height); //128,46,104,127
- if (gtFile.fail() || xmin < 0.f || ymin < 0.f || width < 0.f || height < 0.f)
- {
- cout << "error: could not parse sequence gt file" << endl;
- return EXIT_FAILURE;
- }
- initBB = FloatRect(xmin*scaleW, ymin*scaleH, width*scaleW, height*scaleH);
- }
- if (!conf.quietMode)
- {
- namedWindow("result");
- }
- Mat result(conf.frameHeight, conf.frameWidth, CV_8UC3);
- bool paused = false;
- bool doInitialise = false;
- srand(conf.seed);
- for (int frameInd = startFrame; frameInd <= endFrame; ++frameInd) // process frame by frame
- {
- Mat frame;
- if (useCamera) // camera input
- {
- Mat frameOrig;
- cap >> frameOrig;
- resize(frameOrig, frame, Size(conf.frameWidth, conf.frameHeight));
- flip(frame, frame, 1);
- frame.copyTo(result);
- if (doInitialise)
- {
- if (tracker.IsInitialised())
- {
- tracker.Reset();
- }
- else
- {
- tracker.Initialise(frame, initBB);
- }
- doInitialise = false;
- }
- else if (!tracker.IsInitialised())
- {
- rectangle(result, initBB, CV_RGB(255, 255, 255));
- }
- }
- else // image sequence input
- {
- char imgPath[256];
- sprintf(imgPath, imgFormat.c_str(), frameInd);
- Mat frameOrig = cv::imread(imgPath, 0);
- if (frameOrig.empty())
- {
- cout << "error: could not read frame: " << imgPath << endl;
- return EXIT_FAILURE;
- }
- resize(frameOrig, frame, Size(conf.frameWidth, conf.frameHeight)); // resize every frame to 320*240
- cvtColor(frame, result, CV_GRAY2RGB);
- if (frameInd == startFrame)
- {
- tracker.Initialise(frame, initBB); // initialise on the first frame
- }
- }
- if (tracker.IsInitialised())
- {
- tracker.Track(frame); // run the tracker
- if (!conf.quietMode && conf.debugMode)
- {
- tracker.Debug(); // show the sample images
- }
- rectangle(result, tracker.GetBB(), CV_RGB(0, 255, 0));
- if (outFile)
- {
- const FloatRect& bb = tracker.GetBB();
- outFile << bb.XMin()/scaleW << "," << bb.YMin()/scaleH << "," << bb.Width()/scaleW << "," << bb.Height()/scaleH << endl;
- } // write the coordinates of the tracked box
- }
- if (!conf.quietMode)
- {
- imshow("result", result); // show the tracking result
- int key = waitKey(paused ? 0 : 1);
- if (key != -1)
- {
- if (key == 27 || key == 113) // esc q
- {
- break;
- }
- else if (key == 112) // p
- {
- paused = !paused;
- }
- else if (key == 105 && useCamera)
- {
- doInitialise = true;
- }
- }
- if (conf.debugMode && frameInd == endFrame)
- {
- cout << "\n\nend of sequence, press any key to exit" << endl;
- waitKey();
- }
- }
- }
- if (outFile.is_open())
- {
- outFile.close();
- }
- return EXIT_SUCCESS;
- }
Tracker.cpp
- #include "Tracker.h"
- #include "Config.h"
- #include "ImageRep.h"
- #include "Sampler.h"
- #include "Sample.h"
- #include "GraphUtils/GraphUtils.h"
- #include "HaarFeatures.h"
- #include "RawFeatures.h"
- #include "HistogramFeatures.h"
- #include "MultiFeatures.h"
- #include "Kernels.h"
- #include "LaRank.h"
- #include <opencv/cv.h>
- #include <opencv/highgui.h>
- #include <Eigen/Core>
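- #include <vector>
- #include <algorithm>
- #include <cassert> // assert
- #include <cfloat> // DBL_MAX
- #include <iostream> // cout (VERBOSE output)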
- using namespace cv;
- using namespace std;
- using namespace Eigen;
- Tracker::Tracker(const Config& conf) : // constructor, initialises the members
- m_config(conf),
- m_initialised(false),
- m_pLearner(0),
- m_debugImage(2*conf.searchRadius+1, 2*conf.searchRadius+1, CV_32FC1),
- m_needsIntegralImage(false)
- {
- Reset();
- }
- Tracker::~Tracker()
- {
- delete m_pLearner;
- for (int i = 0; i < (int)m_features.size(); ++i)
- {
- delete m_features[i];
- delete m_kernels[i];
- }
- }
- void Tracker::Reset() // with Haar features and a Gaussian kernel configured, this ends with m_needsIntegralImage = true and m_needsIntegralHist = false
- {
- m_initialised = false;
- m_debugImage.setTo(0);
- if (m_pLearner) delete m_pLearner;
- for (int i = 0; i < (int)m_features.size(); ++i)
- {
- delete m_features[i];
- delete m_kernels[i];
- }
- m_features.clear();
- m_kernels.clear();
- m_needsIntegralImage = false;
- m_needsIntegralHist = false;
- int numFeatures = m_config.features.size();
- vector<int> featureCounts;
- for (int i = 0; i < numFeatures; ++i)
- {
- switch (m_config.features[i].feature)
- {
- case Config::kFeatureTypeHaar:
- m_features.push_back(new HaarFeatures(m_config));
- m_needsIntegralImage = true;
- break;
- case Config::kFeatureTypeRaw:
- m_features.push_back(new RawFeatures(m_config));
- break;
- case Config::kFeatureTypeHistogram:
- m_features.push_back(new HistogramFeatures(m_config));
- m_needsIntegralHist = true;
- break;
- }
- featureCounts.push_back(m_features.back()->GetCount());
- switch (m_config.features[i].kernel)
- {
- case Config::kKernelTypeLinear:
- m_kernels.push_back(new LinearKernel());
- break;
- case Config::kKernelTypeGaussian:
- m_kernels.push_back(new GaussianKernel(m_config.features[i].params[0]));
- break;
- case Config::kKernelTypeIntersection:
- m_kernels.push_back(new IntersectionKernel());
- break;
- case Config::kKernelTypeChi2:
- m_kernels.push_back(new Chi2Kernel());
- break;
- }
- }
- if (numFeatures > 1)
- {
- MultiFeatures* f = new MultiFeatures(m_features);
- m_features.push_back(f);
- MultiKernel* k = new MultiKernel(m_kernels, featureCounts);
- m_kernels.push_back(k);
- }
- m_pLearner = new LaRank(m_config, *m_features.back(), *m_kernels.back());
- }
- void Tracker::Initialise(const cv::Mat& frame, FloatRect bb)
- {
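- m_bb = IntRect(bb); // convert the target box coordinates to int
- // ImageRep mainly computes the integral image
- ImageRep image(frame, m_needsIntegralImage, m_needsIntegralHist); // the last two arguments are true and false
- for (int i = 0; i < 1; ++i)
- {
- UpdateLearner(image); // update the prediction function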
- }
- m_initialised = true;
- }
- void Tracker::Track(const cv::Mat& frame)
- {
- assert(m_initialised);
- ImageRep image(frame, m_needsIntegralImage, m_needsIntegralHist); // integral image of the current frame
- vector<FloatRect> rects = Sampler::PixelSamples(m_bb, m_config.searchRadius); // draw the samples
- vector<FloatRect> keptRects;
- keptRects.reserve(rects.size());
- for (int i = 0; i < (int)rects.size(); ++i)
- {
- if (!rects[i].IsInside(image.GetRect())) continue;
- keptRects.push_back(rects[i]); // boxes outside the image are discarded, the rest are kept in keptRects
- }
- MultiSample sample(image, keptRects); // multi-sample object, mainly the sample boxes plus the ImageRep image
- vector<double> scores;
- m_pLearner->Eval(sample, scores); // scores holds the second part of Eq. (10) in the paper, i.e. F
- double bestScore = -DBL_MAX;
- int bestInd = -1;
- for (int i = 0; i < (int)keptRects.size(); ++i)
- {
- if (scores[i] > bestScore)
- {
- bestScore = scores[i];
- bestInd = i; // remember the index of the best score
- }
- }
- UpdateDebugImage(keptRects, m_bb, scores); // update the debug image used for display
- if (bestInd != -1)
- {
- m_bb = keptRects[bestInd];
- UpdateLearner(image);
- #if VERBOSE
- cout << "track score: " << bestScore << endl;
- #endif
- }
- }
- void Tracker::UpdateDebugImage(const vector<FloatRect>& samples, const FloatRect& centre, const vector<double>& scores)
- {
- double mn = VectorXd::Map(&scores[0], scores.size()).minCoeff(); // Map: view an existing buffer as an Eigen data structure for computation
- double mx = VectorXd::Map(&scores[0], scores.size()).maxCoeff(); //R.minCoeff()=min(R(:)), R.maxCoeff()=max(R(:));
- m_debugImage.setTo(0); // set to all black
- for (int i = 0; i < (int)samples.size(); ++i)
- {
- int x = (int)(samples[i].XMin() - centre.XMin());
- int y = (int)(samples[i].YMin() - centre.YMin());
- m_debugImage.at<float>(m_config.searchRadius+y,m_config.searchRadius+x)=(float)((scores[i]-mn)/(mx-mn)); // a box with a higher score gets a larger value in m_debugImage, i.e. a brighter pixel (similar to a confidence map)
- }
- }
- void Tracker::Debug()
- {
- imshow("tracker", m_debugImage); // show m_debugImage
- m_pLearner->Debug();
- }
- void Tracker::UpdateLearner(const ImageRep& image) // update the prediction function
- {
- // note these return the centre sample at index 0
- vector<FloatRect> rects = Sampler::RadialSamples(m_bb, 2*m_config.searchRadius, 5, 16); // 5*16=80 samples plus the original rect, 81 rects in total
- //vector<FloatRect> rects = Sampler::PixelSamples(m_bb, 2*m_config.searchRadius, true);
- vector<FloatRect> keptRects;
- keptRects.push_back(rects[0]); // the original target box
- for (int i = 1; i < (int)rects.size(); ++i)
- {
- if (!rects[i].IsInside(image.GetRect())) continue; // discard sample boxes that fall outside the image
- keptRects.push_back(rects[i]);
- }
- #if VERBOSE
- cout << keptRects.size() << " samples" << endl;
- #endif
- MultiSample sample(image, keptRects); // multi-sample object holding the ImageRep& image and the kept sample boxes
- m_pLearner->Update(sample, 0); // the update itself is implemented in the LaRank class
- }
LaRank.h
- #ifndef LARANK_H
- #define LARANK_H
- #include "Rect.h"
- #include "Sample.h"
- #include <vector>
- #include <Eigen/Core>
- #include <opencv/cv.h>
- class Config;
- class Features;
- class Kernel;
- class LaRank // see the paper "Solving multiclass support vector machines with LaRank"; this class implements the main steps of Struck
- {
- public:
- LaRank(const Config& conf, const Features& features, const Kernel& kernel); // takes the config, the features and the kernel
- ~LaRank();
- virtual void Eval(const MultiSample& x, std::vector<double>& results);
- virtual void Update(const MultiSample& x, int y);
- virtual void Debug();
- private:
- struct SupportPattern
- {
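- std::vector<Eigen::VectorXd> x; // feature vectors, one per sample
- std::vector<FloatRect> yv; // outputs y (position changes), one per sample
- std::vector<cv::Mat> images; // image patches (debug mode only)
- int y; // index of the true target sample
- int refCount; // number of support vectors that reference this pattern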
- };
- struct SupportVector
- {
- SupportPattern* x;
- int y;
- double b; //beta
- double g; //gradient
- cv::Mat image;
- };
- const Config& m_config;
- const Features& m_features;
- const Kernel& m_kernel;
- std::vector<SupportPattern*> m_sps;
- std::vector<SupportVector*> m_svs;
- cv::Mat m_debugImage;
- double m_C;
- Eigen::MatrixXd m_K;
- inline double Loss(const FloatRect& y1, const FloatRect& y2) const // loss function
- {
- // overlap loss
- return 1.0-y1.Overlap(y2);
- // squared distance loss
- //double dx = y1.XMin()-y2.XMin();
- //double dy = y1.YMin()-y2.YMin();
- //return dx*dx+dy*dy;
- }
- double ComputeDual() const;
- void SMOStep(int ipos, int ineg);
- std::pair<int, double> MinGradient(int ind);
- void ProcessNew(int ind);
- void Reprocess();
- void ProcessOld();
- void Optimize();
- int AddSupportVector(SupportPattern* x, int y, double g);
- void RemoveSupportVector(int ind);
- void RemoveSupportVectors(int ind1, int ind2);
- void SwapSupportVectors(int ind1, int ind2);
- void BudgetMaintenance();
- void BudgetMaintenanceRemove();
- double Evaluate(const Eigen::VectorXd& x, const FloatRect& y) const;
- void UpdateDebugImage();
- };
- #endif
LaRank.cpp
- #include "LaRank.h"
- #include "Config.h"
- #include "Features.h"
- #include "Kernels.h"
- #include "Sample.h"
- #include "Rect.h"
- #include "GraphUtils/GraphUtils.h"
- #include <Eigen/Array>
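- #include <opencv/highgui.h>
- #include <cassert> // assert
- #include <cfloat> // DBL_MAX
- #include <cmath> // fabs, sqrtf
- #include <cstdlib> // rand
- #include <cstring> // memset
- #include <iostream> // cout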
- static const int kTileSize = 30;
- using namespace cv;
- using namespace std;
- using namespace Eigen;
- static const int kMaxSVs = 2000; // TODO (only used when no budget)
- LaRank::LaRank(const Config& conf, const Features& features, const Kernel& kernel) :
- m_config(conf),
- m_features(features),
- m_kernel(kernel),
- m_C(conf.svmC)
- {
- int N = conf.svmBudgetSize > 0 ? conf.svmBudgetSize+2 : kMaxSVs; // N=100+2; the number of support vectors may not exceed this threshold
- m_K = MatrixXd::Zero(N, N); // m_K is the kernel matrix, 102*102
- m_debugImage = Mat(800, 600, CV_8UC3);
- }
- LaRank::~LaRank()
- {
- }
- double LaRank::Evaluate(const Eigen::VectorXd& x, const FloatRect& y) const // computes the second part of Eq. (10) in the paper, i.e. F
- {
- double f = 0.0;
- for (int i = 0; i < (int)m_svs.size(); ++i)
- {
- const SupportVector& sv = *m_svs[i];
- f += sv.b*m_kernel.Eval(x, sv.x->x[sv.y]); // beta * Gaussian kernel
- }
- return f;
- }
- void LaRank::Eval(const MultiSample& sample, std::vector<double>& results)
- {
- const FloatRect& centre(sample.GetRects()[0]); // the original target box
- vector<VectorXd> fvs;
- const_cast<Features&>(m_features).Eval(sample, fvs); // fvs holds the Haar feature vectors
- results.resize(fvs.size());
- for (int i = 0; i < (int)fvs.size(); ++i)
- {
- // express y in coord frame of centre sample
- FloatRect y(sample.GetRects()[i]);
- y.Translate(-centre.XMin(), -centre.YMin()); // subtract the x and y coordinates of the original target box from each box
- results[i] = Evaluate(fvs[i], y); // evaluate F for each box and store the result in results
- }
- }
- void LaRank::Update(const MultiSample& sample, int y)
- {
- // add new support pattern
- SupportPattern* sp = new SupportPattern; // create a support pattern
- const vector<FloatRect>& rects = sample.GetRects(); // all sample boxes
- FloatRect centre = rects[y]; // the original target box
- for (int i = 0; i < (int)rects.size(); ++i)
- {
- // express r in coord frame of centre sample
- FloatRect r = rects[i];
- r.Translate(-centre.XMin(), -centre.YMin()); // this is the inter-frame position change of the target
- sp->yv.push_back(r);
- if (!m_config.quietMode && m_config.debugMode)
- {
- // store a thumbnail for each sample
- Mat im(kTileSize, kTileSize, CV_8UC1);
- IntRect rect = rects[i];
- cv::Rect roi(rect.XMin(), rect.YMin(), rect.Width(), rect.Height()); // the regions of interest are the sampled boxes
- cv::resize(sample.GetImage().GetImage(0)(roi), im, im.size()); // 0 is the channel index; resize the region of interest to 30*30 and store it in sp->images
- sp->images.push_back(im);
- }
- }
- // evaluate features for each sample
- sp->x.resize(rects.size()); // one feature vector per sample box
- const_cast<Features&>(m_features).Eval(sample, sp->x); // store the Haar features of every sample box in sp->x; the Haar feature code is not listed here, I extracted it into a separate post: http://blog.csdn.net/qianxin_dh/article/details/39268113
- sp->y = y;
- sp->refCount = 0;
- m_sps.push_back(sp); // store sp
- ProcessNew((int)m_sps.size()-1); // add support vectors for the new pattern and adjust the beta values
- BudgetMaintenance(); // make sure the number of support vectors stays within the budget
- for (int i = 0; i < 10; ++i)
- {
- Reprocess(); // ProcessOld may add a new sv; Optimize only adjusts beta of the existing svs
- BudgetMaintenance();
- }
- }
- void LaRank::BudgetMaintenance()
- {
- if (m_config.svmBudgetSize > 0)
- {
- while ((int)m_svs.size() > m_config.svmBudgetSize)
- {
- BudgetMaintenanceRemove(); // once the budget is exceeded, find the negative sv with the smallest effect on F and remove it
- }
- }
- }
- void LaRank::Reprocess()
- {
- ProcessOld(); // every ProcessOld step is followed by 10 Optimize steps
- for (int i = 0; i < 10; ++i)
- {
- Optimize();
- }
- }
- double LaRank::ComputeDual() const
- {
- double d = 0.0;
- for (int i = 0; i < (int)m_svs.size(); ++i)
- {
- const SupportVector* sv = m_svs[i];
- d -= sv->b*Loss(sv->x->yv[sv->y], sv->x->yv[sv->x->y]);
- for (int j = 0; j < (int)m_svs.size(); ++j)
- {
- d -= 0.5*sv->b*m_svs[j]->b*m_K(i,j);
- }
- }
- return d;
- }
- void LaRank::SMOStep(int ipos, int ineg)
- {
- if (ipos == ineg) return;
- SupportVector* svp = m_svs[ipos]; // the positive support vector
- SupportVector* svn = m_svs[ineg]; // the negative support vector
- assert(svp->x == svn->x);
- SupportPattern* sp = svp->x; // the support pattern shared by both support vectors
- #if VERBOSE
- cout << "SMO: gpos:" << svp->g << " gneg:" << svn->g << endl;
- #endif
- if ((svp->g - svn->g) < 1e-5)
- {
- #if VERBOSE
- cout << "SMO: skipping" << endl;
- #endif
- }
- else
- { // the steps of Algorithm 1 in the paper
- double kii = m_K(ipos, ipos) + m_K(ineg, ineg) - 2*m_K(ipos, ineg);
- double lu = (svp->g-svn->g)/kii;
- // no need to clamp against 0 since we'd have skipped in that case
- double l = min(lu, m_C*(int)(svp->y == sp->y) - svp->b);
- svp->b += l;
- svn->b -= l;
- // update gradients
- for (int i = 0; i < (int)m_svs.size(); ++i)
- {
- SupportVector* svi = m_svs[i];
- svi->g -= l*(m_K(i, ipos) - m_K(i, ineg));
- }
- #if VERBOSE
- cout << "SMO: " << ipos << "," << ineg << " -- " << svp->b << "," << svn->b << " (" << l << ")" << endl;
- #endif
- }
- // check if we should remove either sv now
- if (fabs(svp->b) < 1e-8) // beta is 0, remove this vector
- {
- RemoveSupportVector(ipos);
- if (ineg == (int)m_svs.size())
- {
- // ineg and ipos will have been swapped during sv removal
- ineg = ipos;
- }
- }
- if (fabs(svn->b) < 1e-8) // beta is 0, remove this vector
- {
- RemoveSupportVector(ineg);
- }
- }
- pair<int, double> LaRank::MinGradient(int ind)
- {
- const SupportPattern* sp = m_sps[ind];
- pair<int, double> minGrad(-1, DBL_MAX);
- for (int i = 0; i < (int)sp->yv.size(); ++i)
- {
- double grad = -Loss(sp->yv[i], sp->yv[sp->y]) - Evaluate(sp->x[i], sp->yv[i]); // Eq. (10): used to find the sample box with the minimum gradient
- if (grad < minGrad.second)
- {
- minGrad.first = i;
- minGrad.second = grad;
- }
- }
- return minGrad;
- }
- void LaRank::ProcessNew(int ind) // may add new support vectors; the positive and negative sv added here share the same support pattern (sp)
- {
- // gradient is -f(x,y) since loss=0
- int ip = AddSupportVector(m_sps[ind], m_sps[ind]->y, -Evaluate(m_sps[ind]->x[m_sps[ind]->y],m_sps[ind]->yv[m_sps[ind]->y])); // process the new pattern: add the target box (true output) as the positive sv
- pair<int, double> minGrad = MinGradient(ind); // first: index of the sample box with the minimum gradient; second: the value of that gradient
- int in = AddSupportVector(m_sps[ind], minGrad.first, minGrad.second); // add the sample with the minimum gradient as the negative sv
- SMOStep(ip, in); // Algorithm 1: update beta and the gradients
- }
- void LaRank::ProcessOld() // may add a new support vector
- {
- if (m_sps.size() == 0) return;
- // choose pattern to process
- int ind = rand() % m_sps.size(); // pick a random sp
- // find existing sv with largest grad and nonzero beta
- int ip = -1;
- double maxGrad = -DBL_MAX;
- for (int i = 0; i < (int)m_svs.size(); ++i)
- {
- if (m_svs[i]->x != m_sps[ind]) continue;
- const SupportVector* svi = m_svs[i];
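- if (svi->g > maxGrad && svi->b < m_C*(int)(svi->y == m_sps[ind]->y)) // candidate for y+: largest gradient among the existing svs of this pattern whose beta may still grow; only existing svs are scanned, so no new vector is added here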
- {
- ip = i;
- maxGrad = svi->g;
- }
- }
- assert(ip != -1);
- if (ip == -1) return;
- // find potentially new sv with smallest grad
- pair<int, double> minGrad = MinGradient(ind);
- int in = -1;
- for (int i = 0; i < (int)m_svs.size(); ++i)
- {
- if (m_svs[i]->x != m_sps[ind]) continue; // look for an existing sv of this pattern that matches the minimum-gradient sample; it becomes y-
- if (m_svs[i]->y == minGrad.first)
- {
- in = i;
- break;
- }
- }
- if (in == -1)
- {
- // add new sv
- in = AddSupportVector(m_sps[ind], minGrad.first, minGrad.second); // add this sample as a negative sv
- }
- SMOStep(ip, in); // update beta and the gradients
- }
- void LaRank::Optimize() // only adjusts beta of existing svs, never adds a new one
- {
- if (m_sps.size() == 0) return;
- // choose pattern to optimize
- int ind = rand() % m_sps.size(); // pick a random existing sp
- int ip = -1;
- int in = -1;
- double maxGrad = -DBL_MAX;
- double minGrad = DBL_MAX;
- for (int i = 0; i < (int)m_svs.size(); ++i)
- {
- if (m_svs[i]->x != m_sps[ind]) continue;
- const SupportVector* svi = m_svs[i];
- if (svi->g > maxGrad && svi->b < m_C*(int)(svi->y == m_sps[ind]->y)) // candidate for y+
- {
- ip = i;
- maxGrad = svi->g;
- }
- if (svi->g < minGrad) // candidate for y-: smallest gradient
- {
- in = i;
- minGrad = svi->g;
- }
- }
- assert(ip != -1 && in != -1);
- if (ip == -1 || in == -1)
- {
- // this shouldn't happen
- cout << "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" << endl;
- return;
- }
- SMOStep(ip, in); // update beta and the gradients
- }
- int LaRank::AddSupportVector(SupportPattern* x, int y, double g)
- {
- SupportVector* sv = new SupportVector;
- sv->b = 0.0; // beta starts at 0
- sv->x = x;
- sv->y = y;
- sv->g = g;
- int ind = (int)m_svs.size();
- m_svs.push_back(sv);
- x->refCount++;
- #if VERBOSE
- cout << "Adding SV: " << ind << endl;
- #endif
- // update kernel matrix
- for (int i = 0; i < ind; ++i) // fill the new row and column of the kernel matrix
- {
- m_K(i,ind) = m_kernel.Eval(m_svs[i]->x->x[m_svs[i]->y], x->x[y]);
- m_K(ind,i) = m_K(i,ind);
- }
- m_K(ind,ind) = m_kernel.Eval(x->x[y]);
- return ind;
- }
- void LaRank::SwapSupportVectors(int ind1, int ind2)
- {
- SupportVector* tmp = m_svs[ind1];
- m_svs[ind1] = m_svs[ind2];
- m_svs[ind2] = tmp;
- VectorXd row1 = m_K.row(ind1);
- m_K.row(ind1) = m_K.row(ind2);
- m_K.row(ind2) = row1;
- VectorXd col1 = m_K.col(ind1);
- m_K.col(ind1) = m_K.col(ind2);
- m_K.col(ind2) = col1;
- }
- void LaRank::RemoveSupportVector(int ind)
- {
- #if VERBOSE
- cout << "Removing SV: " << ind << endl;
- #endif
- m_svs[ind]->x->refCount--;
- if (m_svs[ind]->x->refCount == 0)
- {
- // also remove the support pattern
- for (int i = 0; i < (int)m_sps.size(); ++i)
- {
- if (m_sps[i] == m_svs[ind]->x)
- {
- delete m_sps[i];
- m_sps.erase(m_sps.begin()+i);
- break;
- }
- }
- }
- // make sure the support vector is at the back, this
- // lets us keep the kernel matrix cached and valid
- if (ind < (int)m_svs.size()-1)
- {
- SwapSupportVectors(ind, (int)m_svs.size()-1);
- ind = (int)m_svs.size()-1;
- }
- delete m_svs[ind];
- m_svs.pop_back();
- }
- void LaRank::BudgetMaintenanceRemove()
- {
- // find negative sv with smallest effect on discriminant function if removed
- double minVal = DBL_MAX;
- int in = -1;
- int ip = -1;
- for (int i = 0; i < (int)m_svs.size(); ++i)
- {
- if (m_svs[i]->b < 0.0) // a negative sv
- {
- // find corresponding positive sv
- int j = -1;
- for (int k = 0; k < (int)m_svs.size(); ++k)
- {
- if (m_svs[k]->b > 0.0 && m_svs[k]->x == m_svs[i]->x) // the positive sv of the same support pattern
- {
- j = k;
- break;
- }
- }
- double val = m_svs[i]->b*m_svs[i]->b*(m_K(i,i) + m_K(j,j) - 2.0*m_K(i,j));
- if (val < minVal) // the sv with the smallest effect on F
- {
- minVal = val;
- in = i;
- ip = j;
- }
- }
- }
- // adjust weight of positive sv to compensate for removal of negative
- m_svs[ip]->b += m_svs[in]->b; // the negative sv is removed, so its beta is added to the positive sv to compensate
- // remove negative sv
- RemoveSupportVector(in);
- if (ip == (int)m_svs.size())
- {
- // ip and in will have been swapped during support vector removal
- ip = in;
- }
- if (m_svs[ip]->b < 1e-8) // beta is 0, remove this vector as well
- {
- // also remove positive sv
- RemoveSupportVector(ip);
- }
- // update gradients
- // TODO: this could be made cheaper by just adjusting incrementally rather than recomputing
- for (int i = 0; i < (int)m_svs.size(); ++i)
- {
- SupportVector& svi = *m_svs[i];
- svi.g = -Loss(svi.x->yv[svi.y],svi.x->yv[svi.x->y]) - Evaluate(svi.x->x[svi.y], svi.x->yv[svi.y]);
- }
- }
- void LaRank::Debug()
- {
- cout << m_sps.size() << "/" << m_svs.size() << " support patterns/vectors" << endl;
- UpdateDebugImage();
- imshow("learner", m_debugImage);
- }
- void LaRank::UpdateDebugImage() // only displays the samples and has little to do with the algorithm, so it is not analysed here
- {
- m_debugImage.setTo(0);
- int n = (int)m_svs.size();
- if (n == 0) return;
- const int kCanvasSize = 600;
- int gridSize = (int)sqrtf((float)(n-1)) + 1;
- int tileSize = (int)((float)kCanvasSize/gridSize);
- if (tileSize < 5)
- {
- cout << "too many support vectors to display" << endl;
- return;
- }
- Mat temp(tileSize, tileSize, CV_8UC1);
- int x = 0;
- int y = 0;
- int ind = 0;
- float vals[kMaxSVs];
- memset(vals, 0, sizeof(float)*n);
- int drawOrder[kMaxSVs];
- for (int set = 0; set < 2; ++set)
- {
- for (int i = 0; i < n; ++i)
- {
- if (((set == 0) ? 1 : -1)*m_svs[i]->b < 0.0) continue;
- drawOrder[ind] = i;
- vals[ind] = (float)m_svs[i]->b;
- ++ind;
- Mat I = m_debugImage(cv::Rect(x, y, tileSize, tileSize));
- resize(m_svs[i]->x->images[m_svs[i]->y], temp, temp.size());
- cvtColor(temp, I, CV_GRAY2RGB);
- double w = 1.0;
- rectangle(I, Point(0, 0), Point(tileSize-1, tileSize-1), (m_svs[i]->b > 0.0) ? CV_RGB(0, (uchar)(255*w), 0) : CV_RGB((uchar)(255*w), 0, 0), 3);
- x += tileSize;
- if ((x+tileSize) > kCanvasSize)
- {
- y += tileSize;
- x = 0;
- }
- }
- }
- const int kKernelPixelSize = 2;
- int kernelSize = kKernelPixelSize*n;
- double kmin = m_K.minCoeff();
- double kmax = m_K.maxCoeff();
- if (kernelSize < m_debugImage.cols && kernelSize < m_debugImage.rows)
- {
- Mat K = m_debugImage(cv::Rect(m_debugImage.cols-kernelSize, m_debugImage.rows-kernelSize, kernelSize, kernelSize));
- for (int i = 0; i < n; ++i)
- {
- for (int j = 0; j < n; ++j)
- {
- Mat Kij = K(cv::Rect(j*kKernelPixelSize, i*kKernelPixelSize, kKernelPixelSize, kKernelPixelSize));
- uchar v = (uchar)(255*(m_K(drawOrder[i], drawOrder[j])-kmin)/(kmax-kmin));
- Kij.setTo(Scalar(v, v, v));
- }
- }
- }
- else
- {
- kernelSize = 0;
- }
- Mat I = m_debugImage(cv::Rect(0, m_debugImage.rows - 200, m_debugImage.cols-kernelSize, 200));
- I.setTo(Scalar(255,255,255));
- IplImage II = I;
- setGraphColor(0);
- drawFloatGraph(vals, n, &II, 0.f, 0.f, I.cols, I.rows);
- }