[Translation: OpenCV-Python Tutorials] Introduction to SIFT (Scale-Invariant Feature Transform)

⚠️ Thanks to my procrastination, OpenCV released the 4.0.1 stable version while I was halfway through translating 3.4.3, so from here on the translation follows 4.0.1.

⚠️ Apart from the version, everything else is as usual. This chapter is Introduction to SIFT (Scale-Invariant Feature Transform); original article.

Goal

In this chapter,

  • We will learn about the concepts of the SIFT algorithm
  • We will learn to find SIFT keypoints and descriptors

Theory

In the last couple of chapters, we saw some corner detection algorithms, like Harris etc. They are rotation-invariant, which means that even if the image is rotated, we can find the same corners. That is obvious, because corners remain corners in the rotated image. But what about scaling? A corner may no longer be a corner once the image is scaled. For example, in the image below, a corner in a small image looks flat when it is zoomed in and viewed through a window of the same size as before. So the Harris corner is not scale-invariant.

sift_scale_invariant.jpg

So, in 2004, D. Lowe of the University of British Columbia came up with a new algorithm in his paper Distinctive Image Features from Scale-Invariant Keypoints: the Scale Invariant Feature Transform (SIFT), which extracts keypoints and computes their descriptors. *(The paper is easy to understand and is considered the best material available for learning SIFT, so this explanation is just a short summary of that paper.)*

There are mainly four steps involved in the SIFT algorithm. We will see them one by one.

1. Scale-space Extrema Detection

From the image above, it is obvious that we can't use the same window to detect keypoints at different scales. A small corner is fine, but to detect larger corners we need larger windows. For this, scale-space filtering is used. In it, the Laplacian of Gaussian (LoG) is computed for the image with various σ values. LoG acts as a blob detector which detects blobs of various sizes due to the change in σ; in short, σ acts as a scaling parameter. For example, in the image above, a Gaussian kernel with low σ gives a high value for the small corner, while a Gaussian kernel with high σ fits the larger corner well. So we can find local maxima across scale and space, which gives us a list of (x, y, σ) values, each meaning there is a potential keypoint at (x, y) at scale σ.

But this LoG is a little costly, so the SIFT algorithm uses the Difference of Gaussians (DoG), which is an approximation of LoG. The Difference of Gaussians is obtained as the difference of the Gaussian blurring of an image with two different σ, let them be σ and kσ. This process is done for different octaves of the image in a Gaussian pyramid, as represented in the image below:
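The DoG construction can be sketched in plain NumPy. This is a minimal illustration, not OpenCV's implementation: the truncated separable Gaussian blur and the 64x64 random image are just stand-ins.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1-D Gaussian kernel truncated at 3*sigma, normalized to sum to 1
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # separable Gaussian blur: convolve rows, then columns
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(np.convolve, 1, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 0, tmp, k, mode='same')

sigma, k_factor = 1.6, np.sqrt(2)                    # the paper's sigma and k
img = np.random.default_rng(0).random((64, 64))      # stand-in image
dog = blur(img, k_factor * sigma) - blur(img, sigma) # DoG approximates LoG
```

The same subtraction is repeated for each adjacent pair of blurred images in an octave, and again on the downsampled image for the next octave.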

sift_dog.jpg


Once these DoG images are found, they are searched for local extrema over scale and space. For example, one pixel in an image is compared with its 8 neighbours as well as the 9 pixels in the next scale and the 9 pixels in the previous scale. If it is a local extremum, it is a potential keypoint. It basically means that the keypoint is best represented at that scale. It is shown in the image below:
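The 26-neighbour comparison can be written as a small helper. This is a sketch over a hypothetical DoG stack indexed as (scale, y, x); the toy values below are made up to exercise both outcomes.

```python
import numpy as np

def is_local_extremum(dog, s, y, x):
    """Compare dog[s, y, x] with its 26 neighbours: 8 in the same scale,
    9 in the scale above and 9 in the scale below."""
    patch = dog[s-1:s+2, y-1:y+2, x-1:x+2]
    v = dog[s, y, x]
    return v == patch.max() or v == patch.min()

# toy 3-scale stack with one clear maximum at (scale=1, y=2, x=2)
dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 5.0
dog[0, 2, 3] = 2.0
dog[2, 2, 3] = -2.0
dog[1, 2, 3] = 1.0
```

Here (1, 2, 2) is an extremum (it dominates all 26 neighbours), while (1, 2, 3) is neither the maximum nor the minimum of its neighbourhood.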

sift_local_extrema.jpg


Regarding the different parameters, the paper gives some empirical data which can be summarized as: number of octaves = 4, number of scale levels = 5, initial σ = 1.6, k = √2, etc. as optimal values.
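Those numbers pin down the whole pyramid: each octave halves the image and sweeps the same geometric sequence of σ values. A quick sketch of the resulting blur schedule, using only the paper's values:

```python
import numpy as np

n_octaves, n_scales = 4, 5
sigma0, k = 1.6, np.sqrt(2)

for o in range(n_octaves):
    # each octave works on the image downsampled by 2**o
    sigmas = sigma0 * k ** np.arange(n_scales)
    print(f"octave {o}: image scale 1/{2**o}, sigmas = {np.round(sigmas, 2)}")
```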

2. Keypoint Localization

Once potential keypoint locations are found, they have to be refined to get more accurate results. They used a Taylor series expansion of the scale space to get a more accurate location of the extrema, and if the intensity at an extremum is less than a threshold value (0.03 as per the paper), it is rejected. This threshold is called contrastThreshold in OpenCV.

DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to the Harris corner detector is used. They used a 2x2 Hessian matrix (H) to compute the principal curvature. We know from the Harris corner detector that for edges, one eigenvalue is larger than the other. So here they used a simple function: the ratio Tr(H)²/Det(H), which depends only on the ratio of the two eigenvalues.

If this ratio is greater than a threshold, called edgeThreshold in OpenCV, that keypoint is discarded. It is given as 10 in the paper.
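The edge check can be sketched directly from the Hessian entries. This is a toy helper in the form the paper uses (threshold (r+1)²/r on Tr²/Det); the dxx, dyy, dxy inputs would come from finite differences of the DoG image and are made up here.

```python
def passes_edge_test(dxx, dyy, dxy, r=10):
    """Keep a keypoint only if its principal curvatures are balanced:
    Tr(H)^2 / Det(H) < (r+1)^2 / r, with H = [[dxx, dxy], [dxy, dyy]]."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:               # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r

print(passes_edge_test(10, 10, 0))    # blob-like: balanced curvatures
print(passes_edge_test(100, 1, 0))    # edge-like: one curvature dominates
```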

So it eliminates any low-contrast keypoints and edge keypoints, and what remain are strong interest points.

3. Orientation Assignment

Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created (it is weighted by gradient magnitude and by a Gaussian-weighted circular window with σ equal to 1.5 times the scale of the keypoint). The highest peak in the histogram is taken, and any peak above 80% of it is also considered when calculating the orientation. This creates keypoints with the same location and scale but different directions, which contributes to the stability of matching.
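The 36-bin histogram can be sketched in NumPy. Here mag and ang_deg stand for precomputed gradient magnitudes and directions in the neighbourhood (made up below), and the Gaussian weighting is omitted for brevity:

```python
import numpy as np

def orientation_histogram(mag, ang_deg, nbins=36):
    # magnitude-weighted histogram over nbins bins of 360/nbins degrees each
    bins = (ang_deg // (360.0 / nbins)).astype(int) % nbins
    hist = np.zeros(nbins)
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist

mag = np.ones((4, 4))                          # toy gradient magnitudes
ang = np.full((4, 4), 95.0)                    # all gradients near 95 degrees
hist = orientation_histogram(mag, ang)
peaks = np.where(hist >= 0.8 * hist.max())[0]  # peaks that spawn keypoints
```

With every gradient near 95 degrees, all the weight lands in bin 9 (90-100 degrees), and that single peak determines the keypoint's orientation.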

4. Keypoint Descriptor

Now a keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is divided into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created, so a total of 128 bin values are available. They are represented as a vector to form the keypoint descriptor. In addition to this, several measures are taken to achieve robustness against illumination changes, rotation etc.
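The bookkeeping behind those 128 values is just a 4x4 grid of sub-blocks times 8 bins each. A toy sketch with made-up gradients (real SIFT would use the rotated, weighted gradients of the patch):

```python
import numpy as np

rng = np.random.default_rng(0)
sub_hists = np.zeros((4, 4, 8))           # one 8-bin histogram per sub-block
for by in range(4):
    for bx in range(4):
        # stand-in magnitudes/orientations for the 16 pixels of a 4x4 sub-block
        mag = rng.random(16)
        ang = rng.random(16) * 360.0
        bins = (ang // 45).astype(int)    # 8 bins of 45 degrees
        np.add.at(sub_hists[by, bx], bins, mag)

descriptor = sub_hists.reshape(-1)        # 16 sub-blocks x 8 bins = 128 values
```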

5. Keypoint Matching

Keypoints between two images are matched by identifying their nearest neighbours. But in some cases, the second-closest match may be very near to the first. That may happen due to noise or some other reasons. In that case, the ratio of the closest distance to the second-closest distance is taken: if it is greater than 0.8, the match is rejected. This eliminates around 90% of false matches while discarding only 5% of correct matches, as per the paper.
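Lowe's ratio test can be sketched in NumPy. This is a brute-force version over tiny 2-D stand-in descriptors (later chapters use OpenCV's matchers instead):

```python
import numpy as np

def ratio_test_match(des1, des2, ratio=0.8):
    """For each descriptor in des1, take its two nearest neighbours in des2
    and keep the match only if closest < ratio * second-closest."""
    matches = []
    for i, d in enumerate(des1):
        dists = np.linalg.norm(des2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

des2 = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 0.0]])
des1 = np.array([[0.1, 0.0],      # unambiguous: much closer to des2[0]
                 [2.5, 0.0]])     # ambiguous: equidistant from des2[0] and des2[2]
matches = ratio_test_match(des1, des2)
```

The unambiguous descriptor survives the test; the ambiguous one is rejected because its two nearest neighbours are equally close.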

So this is a summary of the SIFT algorithm. For more details and a deeper understanding, reading the original paper is highly recommended. Remember one thing: this algorithm is patented, so it is included in the opencv_contrib repo.

SIFT in OpenCV

So now let's see the SIFT functionalities available in OpenCV. Let's start with keypoint detection and drawing the keypoints. First we have to construct a SIFT object. We can pass different optional parameters to it, which are well explained in the docs.

import numpy as np
import cv2 as cv

img = cv.imread('home.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

sift = cv.xfeatures2d.SIFT_create()
kp = sift.detect(gray, None)

img = cv.drawKeypoints(gray, kp, img)

cv.imwrite('sift_keypoints.jpg', img)

The sift.detect() function finds keypoints in the image. You can pass a mask if you want to search only a part of the image. Each keypoint is a special structure with many attributes, like its (x, y) coordinates, the size of the meaningful neighbourhood, the angle which specifies its orientation, the response that specifies the strength of the keypoint, etc.

OpenCV also provides the cv.drawKeypoints() function, which draws small circles at the locations of keypoints. If you pass the flag cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS to it, it will draw a circle with the size of the keypoint and even show its orientation. See the example below.

img = cv.drawKeypoints(gray, kp, img, flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv.imwrite('sift_keypoints.jpg', img)

See the two results below:

sift_keypoints.jpg


Now to calculate the descriptor, OpenCV provides two methods.

  1. Since you already found keypoints, you can call sift.compute() which computes the descriptors from the keypoints we have found. Eg: kp,des = sift.compute(gray,kp)
  2. If you didn't find keypoints, directly find keypoints and descriptors in a single step with the function, sift.detectAndCompute().

We will see the second method:

sift = cv.xfeatures2d.SIFT_create()
kp, des = sift.detectAndCompute(gray, None)

Here kp is a list of keypoints and des is a numpy array of shape number_of_keypoints × 128.

So we got keypoints, descriptors etc. Now we want to see how to match keypoints in different images. That we will learn in coming chapters.

Additional Resources

Exercises


