RANSAC Explained in Plain Terms

Reposted from https://www.cnblogs.com/xingshansi/p/6763668.html

Author: 桂

Date: 2017-04-25 21:05:07

Link: http://www.cnblogs.com/xingshansi/p/6763668.html

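Preface

Still on yesterday's question: someone asked me how least squares, the Hough transform, and RANSAC differ for line fitting. Yesterday I went through the Hough transform; today I want to find time to go through the RANSAC algorithm, covering:
    1) RANSAC theory;
    2) a brief look at RANSAC applications.
These are my own study notes. Many parts borrow from others, and the links are given together at the end.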

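I. RANSAC Theory

Ordinary least squares is the conservative: given the data at hand, how do we do the best we can? It minimizes the overall error across all points and tries not to offend anyone.

RANSAC is the reformer: it first assumes the data have a certain property (the goal) and, to reach that goal, is willing to give up some of the existing data.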
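To make the contrast concrete, here is a minimal sketch of how a single bad point drags an ordinary least-squares line, and how the fit recovers once that point is set aside, which is the decision RANSAC tries to automate. The toy data and the variable names (`pClean`, `pDirty`, `pKept`) are assumptions made for illustration and do not come from the original post.

```matlab
% Toy data (assumed for illustration): ten points exactly on y = 2x + 1
x = 0:9;
y = 2*x + 1;
pClean = polyfit(x, y, 1);            % least squares on clean data

% Corrupt one observation so that it acts as an outlier
yDirty = y;
yDirty(end) = 100;
pDirty = polyfit(x, yDirty, 1);       % least squares has to accommodate the outlier

% Dropping the bad point restores the original line
keep = 1:numel(x)-1;
pKept = polyfit(x(keep), yDirty(keep), 1);

fprintf('clean slope:             %.3f\n', pClean(1));
fprintf('slope with outlier:      %.3f\n', pDirty(1));
fprintf('slope after dropping it: %.3f\n', pKept(1));
```

Here the outlier is removed by hand because we know where it is; RANSAC has to discover which points to drop by trial sampling.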

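Compare the ordinary least-squares fit (red line) with the RANSAC fit (green line) on a first-order line and a second-order curve.

As the comparison shows, RANSAC fits the data well. RANSAC can be understood as a sampling strategy, so in principle it applies equally to polynomial fitting, Gaussian mixture models (GMM), and other models.

The RANSAC algorithm can be roughly stated as follows (from Wikipedia):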

Given:
    data – a set of observed data points
    model – a model that can be fitted to data points
    n – the minimum number of data values required to fit the model
    k – the maximum number of iterations allowed in the algorithm
    t – a threshold value for determining when a data point fits a model
    d – the number of close data values required to assert that a model fits well to data

Return:
    bestfit – model parameters which best fit the data (or null if no good model is found)

iterations = 0
bestfit = null
besterr = something really large
while iterations < k {
    maybeinliers = n randomly selected values from data
    maybemodel = model parameters fitted to maybeinliers
    alsoinliers = empty set
    for every point in data not in maybeinliers {
        if point fits maybemodel with an error smaller than t
             add point to alsoinliers
    }
    if the number of elements in alsoinliers is > d {
        % this implies that we may have found a good model
        % now test how good it is
        bettermodel = model parameters fitted to all points in maybeinliers and alsoinliers
        thiserr = a measure of how well model fits these points
        if thiserr < besterr {
            bestfit = bettermodel
            besterr = thiserr
        }
    }
    increment iterations
}
return bestfit
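
The pseudocode above can be sketched in MATLAB for the special case of a straight line. This is only one possible reading of it, not a reference implementation: the toy data, the seed, the parameter values (`k = 200`, `t = 2`, `d = 20`), and the choice of mean squared residual as the error measure are all assumptions made for illustration.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
% Toy data (assumed): 41 noisy points on y = 3x + 2 plus 30 uniform outliers
x = -20:20;
y = 3*x + 2 + 0.5*randn(1, numel(x));
data = [x, randi(20, 1, 30); y, randi(20, 1, 30)];

n = 2;      % minimum number of points needed to fit a line
k = 200;    % maximum number of iterations
t = 2;      % residual threshold for counting a point as fitting the model
d = 20;     % consensus size required before a model is considered good

bestfit = [];       % plays the role of "null" in the pseudocode
besterr = inf;
N = size(data, 2);
for iter = 1:k
    maybeIdx = randperm(N, n);
    if numel(unique(data(1, maybeIdx))) < n
        continue;                         % equal x values cannot define y = a*x + b
    end
    maybemodel = polyfit(data(1, maybeIdx), data(2, maybeIdx), 1);

    % Points outside the sample whose residual is below t
    rest = setdiff(1:N, maybeIdx);
    resid = abs(polyval(maybemodel, data(1, rest)) - data(2, rest));
    alsoIdx = rest(resid < t);

    if numel(alsoIdx) > d
        % Refit on the whole consensus set and score that fit
        allIdx = [maybeIdx, alsoIdx];
        bettermodel = polyfit(data(1, allIdx), data(2, allIdx), 1);
        thiserr = mean((polyval(bettermodel, data(1, allIdx)) - data(2, allIdx)).^2);
        if thiserr < besterr
            bestfit = bettermodel;
            besterr = thiserr;
        end
    end
end
disp(bestfit);   % stays empty if no model ever reached the consensus size d
```

Unlike a version that only counts inliers, this one refits on the full consensus set and ranks candidates by residual error, so `d` acts as a gate rather than as the score.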

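The simplified version of RANSAC goes like this:

Step 1: Assume a model (for example, a line equation), randomly draw Nums sample points (2 in this example), and fit the model to them.

Step 2: The data are not strictly linear and every point fluctuates somewhat, so set a tolerance sigma, find the points that lie within that tolerance of the fitted curve, and count them.

Step 3: Draw a new random set of Nums points and repeat steps 1 and 2 until the iterations are used up.

Step 4: Each fit has an associated count of points within tolerance; the fit with the largest count is the final result.

That completes the simplified version of RANSAC.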
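
The four steps can be traced by hand on a tiny data set. In the sketch below, two hand-picked candidate samples stand in for the random draws so that the outcome is easy to check; the points, the tolerance, and the names `pts`, `candidates`, and `counts` are assumptions made for illustration.

```matlab
% Toy data (assumed): five points on y = 2x + 1 and one outlier at x = 4
pts = [0 1 2 3 4  5;
       1 3 5 7 20 11];
sigma = 1;                       % tolerance on the vertical residual

% Hand-picked candidate samples stand in for the random draws of steps 1 and 3
candidates = [1 2;               % two inliers
              4 5];              % one inlier and the outlier
counts = zeros(size(candidates, 1), 1);
models = zeros(size(candidates, 1), 2);
for c = 1:size(candidates, 1)
    s = pts(:, candidates(c, :));
    models(c, :) = polyfit(s(1, :), s(2, :), 1);                 % step 1: fit
    resid = abs(polyval(models(c, :), pts(1, :)) - pts(2, :));
    counts(c) = sum(resid < sigma);                              % step 2: count
end
[~, best] = max(counts);                                         % step 4: pick
disp(counts');                   % points within tolerance, per candidate
disp(models(best, :));           % parameters of the winning line
```

The first candidate collects 5 points within tolerance and the second only 2, so the line through the two inliers wins even though the outlier was never explicitly identified.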

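This simplified version just runs a given number of iterations and picks the best result at the end. But if the number of samples is very large, are we supposed to keep iterating forever? In fact, the simplified version glosses over several questions:

Choosing the sample size Nums for each draw: a quadratic curve, for example, needs at least 3 points to be determined. In general, a smaller Nums makes a good result more likely, because a smaller sample is less likely to contain an outlier.
Choosing the number of sampling iterations Iter: how many draws are enough to consider the requirement met and stop? Too many is computationally expensive; too few may give unsatisfactory results.
Choosing the tolerance sigma: whether sigma is set large or small strongly affects the final result.
For details on these parameters, see Wikipedia.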
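
For the iteration count in particular, the Wikipedia article gives a commonly used estimate: if w is the fraction of inliers, n the sample size, and p the desired probability that at least one sample contains only inliers, then k = log(1 - p) / log(1 - w^n). The sketch below evaluates it for n = 2 and n = 3. The value p = 0.99 is an assumed target, and w = 41/71 optimistically treats all 41 line points in this post's example code (out of 71 points in total) as inliers.

```matlab
p  = 0.99;          % assumed target probability of drawing one all-inlier sample
w  = 41/71;         % assumed inlier fraction: 41 line points out of 71
ns = [2 3];         % sample sizes: 2 for a line, 3 for a quadratic

% k = log(1 - p) / log(1 - w^n), rounded up to a whole number of draws
ks = ceil(log(1 - p) ./ log(1 - w.^ns));

for j = 1:numel(ns)
    fprintf('n = %d -> k = %d iterations\n', ns(j), ks(j));
end
```

The required k grows with n, which is one way to see why a smaller Nums tends to help. The estimate treats the draws as independent, so it is a rough guide rather than a guarantee.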

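RANSAC works a bit like splitting the data in two: one part is your own people, the other is the enemy. Your own people stay to talk things over, and the enemy is shown the door. RANSAC holds a family meeting, unlike least squares, which always calls a meeting of everyone.

Below is the code for the first-order line and second-order curve fits shown at the start (it only illustrates the basic idea and uses the simplified version of RANSAC):

First-order line fit: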

clc;clear all;close all;
set(0,'defaultfigurecolor','w');
%Generate data: 41 noisy points on y = 3x + 2 plus 30 random outliers
param = [3 2];
npa = length(param);
x = -20:20;
y = param*[x; ones(1,length(x))]+3*randn(1,length(x));
data = [x randi(20,1,30);...
    y randi(20,1,30)];
%figure
figure
subplot 221
plot(data(1,:),data(2,:),'k*');hold on;
%Ordinary least squares
p = polyfit(data(1,:),data(2,:),npa-1);
flms = polyval(p,x);
plot(x,flms,'r','linewidth',2);hold on;
title('Least squares fit');
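%RANSAC (simplified: fixed iteration count, keep the model with the most inliers)
Iter = 100;%number of random draws
sigma = 1;%tolerance on the vertical residual
Nums = 2;%number select
res = zeros(Iter,npa+1);%each row: [fitted parameters, inlier count]
for i = 1:Iter
    idx = randperm(size(data,2),Nums);
    sample = data(:,idx);
    %randperm never repeats an index, so test the x values instead:
    %points sharing the same x cannot define a line y = a*x + b
    if numel(unique(sample(1,:))) < Nums
        continue;
    end
    pest = polyfit(sample(1,:),sample(2,:),npa-1);%parameter estimate
    res(i,1:npa) = pest;
    res(i,npa+1) = sum(abs(polyval(pest,data(1,:))-data(2,:))<sigma);
end
[~,pos] = max(res(:,npa+1));
pest = res(pos,1:npa);
fransac = polyval(pest,x);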
%figure
subplot 222
plot(data(1,:),data(2,:),'k*');hold on;
plot(x,flms,'r','linewidth',2);hold on;
plot(x,fransac,'g','linewidth',2);hold on;
title('RANSAC');