The RANSAC Algorithm: Read This and You Will Understand It

Original

2020-04-15 06:58

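RANSAC, short for Random Sample Consensus, is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers (erroneous data). Unlike least squares, it builds in the idea of rejecting unsuitable data. For data samples that contain some erroneous points, it therefore gives more reliable identification results. The algorithm was first proposed by Fischler and Bolles in 1981, who used it to solve the Location Determination Problem (LDP). Today it is widely used in image processing and parameter identification.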

1 Shortcomings of Least Squares

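Least squares finds the function that best matches the data by minimizing the sum of squared errors. It is widely used for data fitting and identification, but it has a weakness for some data sets. Least squares produces the global optimum over all of the data, yet not every data point is suitable for fitting: some points may carry large errors or be outright wrong, and identifying them comes at a cost. The schematic below (for illustration only) is a typical example: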

Clearly, the linear fit above is not what we want. What we actually need is a fit like the following:

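This is exactly the kind of fit RANSAC produces! It discards the red points that should not take part in the fit and keeps only the blue points that lie within a certain tolerance. The data used for fitting is therefore clean, and the fitted line is closer to the true one.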
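The effect is easy to reproduce numerically. The following minimal Python sketch fits y = ax + b by ordinary least squares. It runs the fit twice: once on eight points lying exactly on y = 2x + 1, and once with two gross outliers added. The data set and the helper name `fit_line_least_squares` are made up for illustration.

```python
# Minimal sketch: ordinary least squares (OLS) for y = a*x + b in pure Python.
# Illustrative data: 8 points exactly on y = 2x + 1, plus 2 gross outliers.

def fit_line_least_squares(points):
    """Return (a, b) minimizing sum((y - (a*x + b))**2) over all points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

inliers = [(float(x), 2.0 * x + 1.0) for x in range(8)]
outliers = [(1.0, 30.0), (6.0, -20.0)]

a_clean, b_clean = fit_line_least_squares(inliers)
a_all, b_all = fit_line_least_squares(inliers + outliers)
print(f"inliers only : a={a_clean:.3f}, b={b_clean:.3f}")
print(f"with outliers: a={a_all:.3f}, b={b_all:.3f}")
```

On the clean points the fit recovers a = 2 and b = 1 exactly. With the two outliers included, the slope drops to about -0.75: two bad points out of ten are enough to reverse the trend of the fitted line.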

2 The RANSAC Algorithm

2.1 Principle

The general RANSAC workflow is as follows [2]:

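Given:
data – a set of observed data points.
model – the model to fit (e.g. a line, a quadratic curve, and so on).
n – the minimum number of data points required to fit the model.
k – the maximum number of iterations the algorithm is allowed.
t – a threshold on how well a data point matches the model: points within t are inliers, points beyond t are outliers.
d – the minimum number of data points required to assert that the model fits well.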

Returns:
bestFit – the best-matching model parameters, i.e. the parameters of model.

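Of these inputs, data, model and the model's parameters are easy to understand. The meanings of n, k, t and d may be a little fuzzy, but let's set them aside for now; they will become clearer as we go.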

The pseudocode of the function body is as follows:

Initialization:
iterations := 0                       // iteration counter
bestFit := nil                        // best model parameters found so far
bestErr := something really large     // error of bestFit; start with a very large value

// Main loop
while iterations < k do
    maybeInliers := n randomly selected values from data   // randomly pick n data points to fit
    maybeModel := model parameters fitted to maybeInliers   // estimate the model parameters from these n points
    alsoInliers := empty set                                // start with an empty set of consensus points

    for every point in data not in maybeInliers do          // compare every point outside maybeInliers with the model
        if point fits maybeModel with an error smaller than t   // if the error is within t, add the point to the inlier set
             add point to alsoInliers
    end for

    if the number of elements in alsoInliers is > d then    // the current inlier set has more than d points
        // This implies that we may have found a good model (i.e. good parameters);
        // now test how good it is.
        betterModel := model parameters fitted to all points in maybeInliers and alsoInliers
        thisErr := a measure of how well betterModel fits these points
        // If this model's error on its inliers is smaller than the best error so far,
        // update the best error and take this model as the best fit
        if thisErr < bestErr then
            bestFit := betterModel
            bestErr := thisErr
        end if
    end if

    increment iterations                                    // move on to the next iteration
end while

return bestFit
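For readers who prefer something executable, the pseudocode can be transcribed into Python roughly as follows. This is a minimal sketch for line fitting only, and the pseudocode leaves three choices open that it has to fix:

- the model is y = ax + b fitted by least squares;
- the per-point error is the vertical residual |y - (ax + b)|;
- thisErr is the mean squared residual over the consensus set.

The names `fit_line`, `residual` and `ransac` and the test data are hypothetical.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b; returns (a, b), or None if all x values are equal."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # degenerate: a vertical line cannot be written as y = a*x + b
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def residual(model, point):
    """Per-point error assumed in this sketch: vertical distance |y - (a*x + b)|."""
    a, b = model
    x, y = point
    return abs(y - (a * x + b))

def ransac(data, n, k, t, d, rng=random):
    """Return the best (a, b) found in k iterations, or None if no model had more than d extra inliers."""
    best_fit, best_err = None, float("inf")
    for _ in range(k):
        idx = rng.sample(range(len(data)), n)      # n distinct random indices (maybeInliers)
        maybe_inliers = [data[i] for i in idx]
        maybe_model = fit_line(maybe_inliers)
        if maybe_model is None:
            continue
        chosen = set(idx)
        also_inliers = [p for i, p in enumerate(data)
                        if i not in chosen and residual(maybe_model, p) < t]
        if len(also_inliers) > d:                  # possibly a good model: refit on all consensus points
            consensus = maybe_inliers + also_inliers
            better_model = fit_line(consensus)
            if better_model is None:
                continue
            this_err = sum(residual(better_model, p) ** 2 for p in consensus) / len(consensus)
            if this_err < best_err:                # keep the model with the smallest error so far
                best_fit, best_err = better_model, this_err
    return best_fit

# Usage: 20 points exactly on y = 2x + 1 plus 10 random outliers (made-up test data).
rng = random.Random(0)
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
data += [(rng.uniform(0, 20), rng.uniform(-50, 50)) for _ in range(10)]
a, b = ransac(data, n=2, k=100, t=0.5, d=10, rng=rng)
print(f"a={a:.3f}, b={b:.3f}")
```

The loop follows the pseudocode step by step: maybe_inliers, maybe_model, also_inliers, the > d test, the refit on the whole consensus set, and the bestErr update. Degenerate samples (all x equal) are simply skipped.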

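Read the pseudocode above a couple more times and the meaning of the four parameters should become clear. If it still isn't, don't worry; a more practical example follows: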

2.2 Example

Below is a MATLAB function, with a usage example, that implements RANSAC with a linear model.

function [bestParameter1,bestParameter2] = ransac_demo(data,num,iter,threshDist,inlierRatio)
 % data: a 2xn dataset with #n data points
 % num: the minimum number of points. For line fitting problem, num=2
 % iter: the number of iterations
 % threshDist: the threshold of the distances between points and the fitting line
 % inlierRatio: the minimum fraction of all points that must be inliers for a model to be accepted
 
 %% Plot the data points
 figure;plot(data(1,:),data(2,:),'o');hold on;
 number = size(data,2); % Total number of points
 bestInNum = 0; % Largest number of inliers found so far
 bestParameter1=0;bestParameter2=0; % parameters for best fitting line
 for i=1:iter
 %% Randomly select num (= 2) distinct points
     idx = randperm(number,num); sample = data(:,idx);
     % Skip degenerate samples with equal x (vertical line or identical points):
     % they cannot be written as y = a*x + b
     if sample(1,2) == sample(1,1)
         continue;
     end
 %% Compute the signed perpendicular distances from all points to the line through the sample
     kLine = sample(:,2)-sample(:,1); % direction vector from the first to the second sampled point
     kLineNorm = kLine/norm(kLine); % unit direction vector
     normVector = [-kLineNorm(2),kLineNorm(1)]; % unit normal: Ax+By+C=0 with A=-kLineNorm(2), B=kLineNorm(1)
     distance = normVector*(data - repmat(sample(:,1),1,number));
 %% Find the inliers, i.e. points whose distance is within the threshold
     inlierIdx = find(abs(distance)<=threshDist);
     inlierNum = length(inlierIdx);
 %% Update the best model if this one has enough inliers and more than any previous model
     if inlierNum>=round(inlierRatio*number) && inlierNum>bestInNum
         bestInNum = inlierNum;
         parameter1 = (sample(2,2)-sample(2,1))/(sample(1,2)-sample(1,1)); % slope a
         parameter2 = sample(2,1)-parameter1*sample(1,1); % intercept b
         bestParameter1=parameter1; bestParameter2=parameter2;
     end
 end
 
 %% Plot the best fitting line across the x-range of the data
 xAxis = linspace(min(data(1,:)),max(data(1,:)),100);
 yAxis = bestParameter1*xAxis + bestParameter2;
 plot(xAxis,yAxis,'r-','LineWidth',2);
end


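%% Generate random test data and run the demo
% (save the function above as ransac_demo.m, then run these lines from a script or the command window)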
data = 150*(2*rand(2,100)-1); data = data.*rand(2,100);
ransac_demo(data,2,100,10,0.1);

The result is shown below:

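Here the data set consists of 100 randomly generated (x, y) points. The model is y = ax + b, so the parameters to identify are a and b, and n = 2. The number of iterations k is 100. The data set is small, so this is cheap, although it does not try every pair of points (there are 100 × 99 / 2 = 4950 possible pairs). t is 10, meaning that only points whose distance to the line y = ax + b is at most 10 count as inliers. d is the minimum size of the inlier set for a model and its parameters to be accepted as good. Here it is given as a ratio: 0.1 means one tenth of the 100 points, so 10 inliers are enough. Among the accepted models, the code keeps the one with the most inliers.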
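The distance computation is the least obvious step in the MATLAB code. normVector is the unit normal of the line through the two sampled points, so its dot product with (point - sample(:,1)) gives the signed perpendicular distance. The Python sketch below mirrors that step and the slope/intercept conversion (parameter1, parameter2). The function names are hypothetical, and the example points are chosen so that the distance can be checked by hand.

```python
import math

def unit_normal(p1, p2):
    """Unit normal of the line through p1 and p2 (mirrors kLine, kLineNorm, normVector); assumes p1 != p2."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)          # norm(kLine)
    return (-dy / length, dx / length)

def signed_distance(normal, p1, q):
    """Signed perpendicular distance from point q to the line through p1 with the given unit normal."""
    return normal[0] * (q[0] - p1[0]) + normal[1] * (q[1] - p1[1])

def slope_intercept(p1, p2):
    """(a, b) of y = a*x + b through p1 and p2 (parameter1, parameter2); assumes p1.x != p2.x."""
    a = (p2[1] - p1[1]) / (p2[0] - p1[0])
    return a, p1[1] - a * p1[0]

p1, p2 = (0.0, 0.0), (3.0, 4.0)          # the line y = (4/3) x through the origin
normal = unit_normal(p1, p2)             # approximately (-0.8, 0.6)
d = signed_distance(normal, p1, (4.0, -3.0))
threshDist = 10.0
print(d, abs(d) <= threshDist)           # inlier test, as abs(distance) <= threshDist in MATLAB
```

The vector (4, -3) is perpendicular to the direction (3, 4) and has length 5, so the distance is -5 (up to rounding). With threshDist = 10 this point would count as an inlier.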

2.3 Parameters

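As we can see, besides suitable data and a suitable model, RANSAC needs four well-chosen parameters: n, k, t and d. Of these, n, t and d can be chosen from experience, while k can be computed as:

$$k = \frac{\log(1-p)}{\log(1-w^n)}$$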

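Here $p$ is the probability that RANSAC produces a useful result, i.e. that at least one of the k random samples consists only of inliers. $w$ is the probability that a single data point is an inlier. The probability that all n points drawn for one fit are inliers is then $w^n$ (treating the draws as sampling with replacement), and the probability that at least one of them is an outlier is $1-w^n$. All k iterations fail only if every sample contains an outlier, so k iterations satisfy:

$$1-p = (1-w^n)^k$$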

Taking the logarithm of both sides and solving for k gives the formula for k.
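The formula translates directly into a small helper; a minimal Python sketch follows (the name `ransac_iterations` is hypothetical). It rounds k up because the iteration count must be an integer, and rounding down would fall short of the target probability p.

```python
import math

def ransac_iterations(p, w, n):
    """Smallest integer k with 1 - (1 - w**n)**k >= p, i.e. k = log(1-p) / log(1-w**n) rounded up."""
    if not (0 < p < 1 and 0 < w < 1):
        raise ValueError("p and w must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - p) / math.log(1 - w ** n))

print(ransac_iterations(0.99, 0.5, 2))   # 17
```

Take p = 0.99, an inlier ratio w = 0.5 and n = 2 points per line: log(0.01) / log(0.75) ≈ 16.01, so 17 iterations are enough. With w = 0.1 the same target needs 459 iterations, which shows how quickly k grows as the share of outliers rises.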

MATLAB actually also ships with a ransac function, which is designed to be more general; interested readers can explore it on their own.

Finally, the example and the principles in this article are drawn mainly from the Wikipedia article "Random sample consensus".

References

[1] Wikipedia: "Random sample consensus".
[2] Martin A. Fischler and Robert C. Bolles (June 1981). "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography". Comm. of the ACM 24: 381–395. doi:10.1145/358669.358692.
[3] Introduction to RANSAC on the MathWorks (MATLAB) website: https://www.mathworks.com/discovery/ransac.html

Thanks for reading
