Introduction to RANSAC
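RANSAC (RANdom SAmple Consensus) is an iterative algorithm for estimating the parameters of a mathematical model from data that contain outliers. Outliers are the noise in the data, for example false matches in feature matching or stray points when fitting a curve, so RANSAC can also serve as an outlier-detection algorithm. RANSAC is non-deterministic: it produces a correct result only with a certain probability, and that probability increases as more iterations are run (the reason is explained later). RANSAC was first proposed by Fischler and Bolles at SRI to solve the Location Determination Problem (LDP).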
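The link between iterations and success probability can be made concrete with the usual estimate. If a fraction w of the points are inliers and one model needs s points, a random sample contains only inliers with probability w^s, so after N samples the chance of having drawn at least one all-inlier sample is 1 − (1 − w^s)^N. Solving for N gives the number of iterations needed to reach a target probability p. A minimal sketch, where `ransac_iterations` is a hypothetical helper name:

```python
import math


def ransac_iterations(p, w, s):
    """Samples N needed so that, with probability p, at least one sample of
    size s is all inliers, given inlier ratio w: N = log(1 - p) / log(1 - w**s)."""
    if not (0 < p < 1 and 0 < w < 1 and s >= 1):
        raise ValueError("need 0 < p < 1, 0 < w < 1 and s >= 1")
    # Round up: a partial iteration is not possible
    return math.ceil(math.log(1 - p) / math.log(1 - w ** s))


# Line fitting (s = 2) when half of the points are outliers (w = 0.5)
print(ransac_iterations(0.99, 0.5, 2))  # 17
# A homography (s = 4) with the same inlier ratio needs many more samples
print(ransac_iterations(0.99, 0.5, 4))  # 72
```

The larger the minimal sample s, the faster w^s shrinks, which is why models that need more points require many more iterations for the same confidence.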
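A basic assumption of RANSAC is that the data consist of inliers and outliers: inliers are the data that can be explained by a set of model parameters, and outliers are the data that do not fit the model. RANSAC further assumes that, given a set of data containing a small number of inliers, there exists a procedure that can estimate a model that fits those inliers.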
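To see why separating inliers from outliers matters, the sketch below fits an ordinary least-squares line with `numpy.polyfit` to points lying exactly on y = 2x + 1, plus three outliers whose values are made up for illustration. Fitting only the inliers recovers the true line; fitting all points drags the slope far away, which is the failure RANSAC is meant to avoid.

```python
import numpy as np

# Inliers: exactly on y = 2x + 1
x_in = np.arange(10, dtype=float)
y_in = 2 * x_in + 1
# Three illustrative outliers (arbitrary values)
x_out = np.array([1.0, 2.0, 8.0])
y_out = np.array([40.0, 35.0, -20.0])

x_all = np.concatenate([x_in, x_out])
y_all = np.concatenate([y_in, y_out])

# np.polyfit(x, y, 1) returns [slope, intercept] of the least-squares line
m_in, b_in = np.polyfit(x_in, y_in, 1)
m_all, b_all = np.polyfit(x_all, y_all, 1)

print(f"inliers only: y = {m_in:.2f}x + {b_in:.2f}")
print(f"all points:   y = {m_all:.2f}x + {b_all:.2f}")  # the slope turns negative
```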
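Basic idea and procedure
RANSAC repeatedly draws random subsets of the data, estimates a model from each, and keeps iterating until it finds a model that is considered good enough.
The concrete steps are: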
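- Randomly select the minimal subset of data needed to estimate the model (two points for fitting a line, four point correspondences for computing a homography matrix);
- Compute the model from this subset;
- Evaluate all data against this model and count the inliers, i.e. the points whose error with respect to the current model is within a given threshold;
- Compare the inlier count of the current model with that of the best model so far, and record the parameters and inlier count of the model with the most inliers;
- Repeat the first four steps until the iteration limit is reached or the current model is already good enough (its inlier count exceeds a preset number).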
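The five steps above map onto a short loop. The sketch below is a hedged, vectorized variant written for illustration, not the author's code: it adds the early exit of the last step through a hypothetical `good_enough` inlier count, and skips two-point samples that define a vertical line. `fit_line` and `ransac_line` are illustrative names.

```python
import random

import numpy as np


def fit_line(p, q):
    """Hypothetical helper: (slope, intercept) of the line through p and q,
    or None when both points share an x value (vertical line)."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def ransac_line(points, threshold=0.1, max_iters=100, good_enough=None, seed=None):
    """Sketch of the RANSAC steps for y = m*x + b.
    Returns (best (m, b) or None, boolean inlier mask or None)."""
    rng = random.Random(seed)
    pts = np.asarray(points, dtype=float)
    best_model, best_mask, best_count = None, None, 0
    for _ in range(max_iters):
        # Step 1: draw a minimal sample (two distinct points)
        i, j = rng.sample(range(len(pts)), 2)
        # Step 2: estimate the model from the sample
        model = fit_line(pts[i], pts[j])
        if model is None:
            continue
        m, b = model
        # Step 3: inliers are points whose vertical residual is within the threshold
        mask = np.abs(pts[:, 1] - (m * pts[:, 0] + b)) <= threshold
        count = int(mask.sum())
        # Step 4: keep the model with the most inliers so far
        if count > best_count:
            best_model, best_mask, best_count = model, mask, count
            # Step 5 (early exit): stop once the model is "good enough"
            if good_enough is not None and best_count >= good_enough:
                break
    return best_model, best_mask


if __name__ == "__main__":
    # 20 points on y = 2x + 1 plus three made-up outliers
    data = [[x, 2 * x + 1] for x in range(20)] + [[3, 30], [10, -5], [15, 100]]
    model, mask = ransac_line(data, threshold=0.1, max_iters=200, good_enough=20, seed=0)
    print(model, int(mask.sum()))
```

Passing a `seed` makes runs reproducible; leaving it as `None` gives a different random sequence each time, as in the original algorithm.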
```python
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 30 20:07:19 2018
@author: Yuki
"""
import random

import numpy as np
from matplotlib import pyplot as plt

# Magic number: maximum vertical residual |y - (m*x + b)| for a point to count as an inlier
THRESHOLD = 0.1


def RANSAC(data):
    """Try random two-point lines; return (inliers, outliers) of the line with most inliers.
    `data` is a list of [x, y] pairs."""
    n = len(data)
    # Another magic number: number of random trials
    NUM_TRIALS = n // 2
    best_in_count = 0
    best_m, best_b = None, None
    for _ in range(NUM_TRIALS):
        # Pick two random points; the regression line through two points passes through both
        r = np.array(random.sample(data, 2), dtype=float)
        if r[0][0] == r[1][0]:
            continue  # same x: the line is vertical and the slope is undefined
        m, b = lin_reg(r)
        # Count points whose vertical residual is at most THRESHOLD
        in_count = 0
        for point in data:
            if abs(point[1] - (m * point[0] + b)) <= THRESHOLD:
                in_count += 1
        # Track the line with the most inliers so far
        if in_count > best_in_count:
            best_in_count = in_count
            best_m, best_b = m, b
    if best_m is None:
        raise ValueError("no valid line found; need points with distinct x values")
    # Split the data into inliers and outliers of the best line (used for plotting)
    in_line, out_line = [], []
    for point in data:
        if abs(point[1] - (best_m * point[0] + best_b)) <= THRESHOLD:
            in_line.append(point)
        else:
            out_line.append(point)
    return in_line, out_line
```
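The `RANSAC` function above treats a point as an inlier when its vertical residual |y − (m·x + b)| is at most `THRESHOLD`. This is not the geometric (perpendicular) distance |m·x − y + b| / √(m² + 1), which is the vertical residual divided by √(m² + 1). For steep lines the two differ a lot, so the vertical test rejects points that are geometrically very close. The helper names below are hypothetical, and switching to perpendicular distance would be a design change, not something the original code does.

```python
import numpy as np


def vertical_residuals(points, m, b):
    """|y - (m*x + b)| for each point: the inlier test used by RANSAC()."""
    pts = np.asarray(points, dtype=float)
    return np.abs(pts[:, 1] - (m * pts[:, 0] + b))


def perpendicular_distances(points, m, b):
    """Geometric distance from each point to the line m*x - y + b = 0."""
    pts = np.asarray(points, dtype=float)
    return np.abs(m * pts[:, 0] - pts[:, 1] + b) / np.sqrt(m * m + 1.0)


# A point 1 unit above the origin, tested against the steep line y = 10x
p = [[0.0, 1.0]]
print(vertical_residuals(p, 10.0, 0.0))       # [1.]
print(perpendicular_distances(p, 10.0, 0.0))  # about 0.0995
```

With `THRESHOLD = 0.1`, this point is rejected by the vertical test but would be accepted by the perpendicular one.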
```python
def lin_reg(data):
    """Least-squares line y = m*x + b (as described on the assignment sheet).
    Returns the slope m and intercept b."""
    n = float(len(data))
    # Means of the x and y values
    x_average = sum(p[0] for p in data) / n
    y_average = sum(p[1] for p in data) / n
    # Slope = sum((x - x_avg) * (y - y_avg)) / sum((x - x_avg) ** 2)
    m_numerator = 0.0
    m_denominator = 0.0
    for p in data:
        m_numerator += (p[0] - x_average) * (p[1] - y_average)
        m_denominator += (p[0] - x_average) ** 2
    # The denominator is zero when all x values are equal (vertical line)
    if m_denominator == 0:
        raise ValueError("all x values are equal; slope is undefined")
    m = m_numerator / m_denominator
    # The fitted line passes through the mean point (x_avg, y_avg)
    b = y_average - m * x_average
    return m, b
```
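`lin_reg` implements the closed-form least-squares line m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², b = ȳ − m·x̄. As a sanity check, the sketch below re-implements the same formula with NumPy arrays (`lin_reg_vectorized` is a hypothetical name) and compares it against `numpy.polyfit(x, y, 1)`, which returns the slope and intercept of a degree-1 least-squares fit. For exactly two points the fit is the line through both, which is why `RANSAC` can reuse `lin_reg` on its two-point samples.

```python
import numpy as np


def lin_reg_vectorized(points):
    """Same closed-form least-squares formula as lin_reg, using NumPy arrays."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    dx = x - x.mean()
    m = np.sum(dx * (y - y.mean())) / np.sum(dx * dx)
    b = y.mean() - m * x.mean()
    return m, b


pts = [[0.0, 1.2], [1.0, 2.9], [2.0, 5.1], [3.0, 7.0]]
m, b = lin_reg_vectorized(pts)
m_ref, b_ref = np.polyfit([p[0] for p in pts], [p[1] for p in pts], 1)
print(f"{m:.2f} {b:.2f}")          # 1.96 1.11
print(f"{m_ref:.2f} {b_ref:.2f}")  # 1.96 1.11
```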
```python
def plot_best_fit(data):
    """Run RANSAC, refit the inliers by least squares, plot the result and return (m, b)."""
    # Get the inlier and outlier points of the best RANSAC line
    in_line, out_line = RANSAC(data)
    # Refit the line on all inliers with ordinary least squares
    m, b = lin_reg(in_line)
    # Draw the line segment over the x-range of the data
    xs = [p[0] for p in data]
    domain = [min(xs), max(xs)]
    line_points = [m * x + b for x in domain]
    # Two extra lines with scaled intercepts (0.5*b and 1.2*b) for the road-line
    # visualization; they are not part of the RANSAC fit
    line_points_top = [m * x + 0.5 * b for x in domain]
    line_points_bottom = [m * x + 1.2 * b for x in domain]
    # Plot the inliers as blue dots
    x, y = np.array(in_line, dtype=float).T
    plt.scatter(x, y)
    # Plot the outliers as red x's (the list is empty for easy data)
    if out_line:
        x, y = np.array(out_line, dtype=float).T
        plt.scatter(x, y, s=30, c='r', marker='x')
    # Plot the best-fit line and the two offset lines
    plt.plot(domain, line_points, '-')
    plt.plot(domain, line_points_bottom, '-')
    plt.plot(domain, line_points_top, '-')
    plt.gca().invert_yaxis()
    plt.title("Road-Line-Estimation")
    plt.xlabel('1/X')
    plt.ylabel('Laser')
    plt.show()
    # Return the slope and intercept
    return m, b
```
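```python
# ----------------------------------------------------------------------------------------------------
# Test example: each line of the data file looks like "x, y"
def load_points(path):
    """Read 'x, y' lines into a list of [x, y] floats."""
    data = []
    with open(path) as file:
        for line in file:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # Strip the comma that follows the first value
            data.append([float(parts[0].rstrip(',')), float(parts[1])])
    return data


if __name__ == "__main__":
    data = load_points('noisy_data_medium.txt')
    # plot_best_fit also returns the slope and intercept, should you want them
    m, b = plot_best_fit(data)
    print(m, b)
```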
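The test example reads `noisy_data_medium.txt`, which is not included here. To try the script without it, the sketch below writes a synthetic file in the format the loading code appears to expect (one `x, y` pair per line, with a comma after the first value). The line parameters, noise level and outlier counts are arbitrary assumptions, and `write_noisy_line_file` is a hypothetical helper; the inlier noise is kept below the script's `THRESHOLD` of 0.1.

```python
import random


def write_noisy_line_file(path, m=0.5, b=2.0, n_inliers=80, n_outliers=20,
                          noise=0.05, seed=0):
    """Write synthetic 'x, y' lines: points near y = m*x + b plus outliers.
    Returns the number of lines written."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_inliers):
        x = rng.uniform(0.0, 10.0)
        # Small uniform noise keeps these points close to the line (inliers)
        rows.append((x, m * x + b + rng.uniform(-noise, noise)))
    for _ in range(n_outliers):
        x = rng.uniform(0.0, 10.0)
        # Outliers sit between 1 and 5 units above or below the line
        offset = rng.uniform(1.0, 5.0) * rng.choice([-1.0, 1.0])
        rows.append((x, m * x + b + offset))
    rng.shuffle(rows)
    with open(path, "w") as f:
        for x, y in rows:
            f.write(f"{x:.4f}, {y:.4f}\n")
    return len(rows)
```

For example, `write_noisy_line_file('synthetic_line_data.txt')` creates a 100-point file that can be passed to the loading code in place of `noisy_data_medium.txt`.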