SIFT Paper Notes

Distinctive Image Features from Scale-Invariant Keypoints

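This post is mainly a distillation of Lowe's SIFT paper. It marks the points I needed to understand carefully while reading, so that I do not have to reread the whole paper when reviewing later. (Offered only as a reference for others.)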

1. Introduction

  • Scale-space extrema detection
  • Keypoint localization
  • Orientation assignment
  • Keypoint descriptor

…..

3. Detection of scale-space extrema

Detecting locations that are invariant to scale change of the image can be accomplished by searching for stable features across all possible scales, using a continuous function of scale known as scale space (Witkin, 1983).

  • Build the scale space
    Figure 1
  • Find keypoints with the DoG, which approximates the LoG (detect the extrema of the DoG scale space); Figure 2
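
For quick review, the two bullets above correspond to the following definitions, written here as I recall them from Lowe's paper. I is the input image, G the Gaussian kernel, * denotes convolution in x and y, and k is the constant multiplicative factor between neighbouring scales. Treat this as a summary sketch and check it against the paper before relying on it.

```latex
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-(x^{2} + y^{2}) / (2\sigma^{2})}

D(x, y, \sigma) = \bigl( G(x, y, k\sigma) - G(x, y, \sigma) \bigr) * I(x, y)
                = L(x, y, k\sigma) - L(x, y, \sigma)

G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1) \, \sigma^{2} \nabla^{2} G
```

The last line is why the DoG works as a cheap stand-in for the scale-normalized LoG: the factor (k - 1) is the same at every scale, so it does not change where the extrema are.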

3.1 Local extrema detection

In order to detect the local maxima and minima of D(x, y, σ), each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below (see Figure 2). It is selected only if it is larger than all of these neighbors or smaller than all of them. The cost of this check is reasonably low due to the fact that most sample points will be eliminated following the first few checks.
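
The 26-neighbour comparison described above can be sketched in plain Python. This is only an illustration: the function name `is_local_extremum` and the layout `dog[scale][row][col]` (a list of DoG images from one octave) are my own assumptions, not something the paper specifies.

```python
def is_local_extremum(dog, s, y, x):
    """Check whether dog[s][y][x] is strictly larger or strictly smaller
    than all 26 neighbours: 8 in its own scale, 9 above and 9 below.

    `dog` is assumed to be a list of equally sized 2-D lists (one per scale),
    and (s, y, x) must not lie on the border of the stack.
    """
    value = dog[s][y][x]
    is_max = True
    is_min = True
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == 0 and dy == 0 and dx == 0:
                    continue  # skip the sample itself
                neighbour = dog[s + ds][y + dy][x + dx]
                if neighbour >= value:
                    is_max = False
                if neighbour <= value:
                    is_min = False
                if not is_max and not is_min:
                    # Early exit: most samples are rejected after a few checks.
                    return False
    return True
```

The early return is what keeps the check cheap in practice, since a sample that is neither a maximum nor a minimum candidate is dropped as soon as both flags are cleared.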

Figure 3

3.2 Frequency of sampling in scale

To summarize, these experiments show that the scale-space difference-of-Gaussian function has a large number of extrema and that it would be very expensive to detect them all. Fortunately, we can detect the most stable and useful subset even with a coarse sampling of scales.

3.3 Frequency of sampling in the spatial domain

Figure 4

Just as we determined the frequency of sampling per octave of scale space, so we must determine the frequency of sampling in the image domain relative to the scale of smoothing. Given that extrema can be arbitrarily close together, there will be a similar trade-off between sampling frequency and rate of detection. Figure 4 shows an experimental determination of the amount of prior smoothing, σ, that is applied to each image level before building the scale space representation for an octave.

Of course, if we pre-smooth the image before extrema detection, we are effectively discarding the highest spatial frequencies. Therefore, to make full use of the input, the image can be expanded to create more sample points than were present in the original. We double the size of the input image using linear interpolation prior to building the first level of the pyramid.
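
To make the image-doubling step concrete, here is a minimal sketch of 2x upsampling with (bi)linear interpolation in plain Python. The function name `upsample_2x`, the nested-list image format, and the choice to map output sample (y, x) back to input position (y / 2, x / 2) with clamping at the border are all my assumptions; the paper only says that linear interpolation is used.

```python
def upsample_2x(image):
    """Double the width and height of a grayscale image (list of rows)
    using bilinear interpolation. Border samples are clamped."""
    h, w = len(image), len(image[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(2 * h):
        for x in range(2 * w):
            # Position of this output sample in input coordinates (assumed mapping).
            sy = min(y / 2.0, h - 1)
            sx = min(x / 2.0, w - 1)
            y0, x0 = int(sy), int(sx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            fy, fx = sy - y0, sx - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            out[y][x] = top * (1 - fy) + bottom * fy
    return out
```

For example, the 2x2 image `[[0, 2], [4, 6]]` becomes a 4x4 image whose first row is `[0, 1, 2, 2]`: the new sample between 0 and 2 is their average, and the last column repeats the border value.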

4. Accurate keypoint localization

Once a keypoint candidate has been found by comparing a pixel to its neighbors, the next step is to perform a detailed fit to the nearby data for location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.
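
The "detailed fit" is a second-order Taylor expansion of D around the sample point. The formulas below are written as I recall them from the paper, with x = (x, y, σ)^T the offset from the sample point and D and its derivatives evaluated at that sample; the 0.03 threshold assumes pixel values scaled to [0, 1]. Please verify against the paper before using them.

```latex
D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x}
              + \frac{1}{2} \, \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \, \mathbf{x}

\hat{\mathbf{x}} = - \left( \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \right)^{-1}
                     \frac{\partial D}{\partial \mathbf{x}}

D(\hat{\mathbf{x}}) = D + \frac{1}{2} \, \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}

\text{discard the keypoint if } \; |D(\hat{\mathbf{x}})| < 0.03
```

If any component of the offset x̂ is larger than 0.5, the extremum lies closer to a different sample point, so the fit is repeated around that sample instead.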

4.1 Eliminating edge responses

For stability, it is not sufficient to reject keypoints with low contrast. The difference-of-Gaussian function will have a strong response along edges, even if the location along the edge is poorly determined and therefore unstable to small amounts of noise.
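
A sketch of the edge rejection test follows. As I recall the paper, it compares the ratio Tr(H)^2 / Det(H) of the 2x2 Hessian of D against (r + 1)^2 / r with r = 10. The helper names `hessian_2x2` and `passes_edge_test`, the 3x3 patch input, and the finite-difference stencil are my own assumptions for illustration.

```python
def hessian_2x2(patch):
    """Estimate (Dxx, Dyy, Dxy) at the centre of a 3x3 patch of DoG values
    using central finite differences (an assumed, commonly used stencil)."""
    centre = patch[1][1]
    dxx = patch[1][2] + patch[1][0] - 2.0 * centre
    dyy = patch[2][1] + patch[0][1] - 2.0 * centre
    dxy = (patch[2][2] - patch[2][0] - patch[0][2] + patch[0][0]) / 4.0
    return dxx, dyy, dxy


def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Return True if the ratio of principal curvatures is below r,
    i.e. Tr(H)^2 / Det(H) < (r + 1)^2 / r. r = 10 follows the paper."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures have different signs (or one is zero): reject the point.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r
```

Working with the trace and determinant avoids computing the eigenvalues explicitly, because only their ratio matters. A blob-like point with equal curvatures gives a ratio of 4 and passes, while a point on a straight edge has one curvature much larger than the other and fails.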

5. Orientation assignment

By assigning a consistent orientation to each keypoint based on local image properties, the keypoint descriptor can be represented relative to this orientation and therefore achieve invariance to image rotation. This approach contrasts with the orientation invariant descriptors of Schmid and Mohr (1997), in which each image property is based on a rotationally invariant measure. The disadvantage of that approach is that it limits the descriptors that can be used and discards image information by not requiring all measures to be based on a consistent rotation.

Peaks in the orientation histogram correspond to dominant directions of local gradients. The highest peak in the histogram is detected, and then any other local peak that is within 80% of the highest peak is used to also create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale but different orientations. Only about 15% of points are assigned multiple orientations, but these contribute significantly to the stability of matching. Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy.
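
The peak selection and parabola fit can be sketched as follows. The function name `dominant_orientations`, the circular handling of the histogram, and the convention that bin i is centred at angle i * (360 / n) degrees are my assumptions; the 80% ratio comes from the text above.

```python
def dominant_orientations(hist, peak_ratio=0.8):
    """Return interpolated peak angles (degrees) of an orientation histogram.

    A bin is kept if it is a strict local peak (the histogram is treated as
    circular) and its value is at least `peak_ratio` times the highest bin.
    """
    n = len(hist)
    bin_width = 360.0 / n
    top = max(hist)
    angles = []
    for i in range(n):
        left = hist[(i - 1) % n]
        centre = hist[i]
        right = hist[(i + 1) % n]
        if centre <= left or centre <= right:
            continue  # not a local peak
        if centre < peak_ratio * top:
            continue  # too weak compared with the highest peak
        # Vertex of the parabola through (-1, left), (0, centre), (1, right);
        # the denominator is negative for a strict peak, so it is never zero.
        offset = 0.5 * (left - right) / (left - 2.0 * centre + right)
        angles.append(((i + offset) * bin_width) % 360.0)
    return angles
```

With a 36-bin histogram, a symmetric peak in bin 9 yields 90 degrees exactly, and a peak whose right neighbour is larger than its left neighbour is shifted slightly towards the right bin. Each returned angle would produce its own keypoint at the same location and scale.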

Figure 10

Figure 12

6. The local image descriptor - assign each keypoint a 128-dimensional orientation-based descriptor

Figure 13

Figure 15

6.1 Descriptor representation

Figure 7

7. Application to object recognition

7.1 Keypoint matching

Figure 11

7.2 Efficient nearest neighbor indexing

No algorithms are known that can identify the exact nearest neighbors of points in high dimensional spaces that are any more efficient than exhaustive search. Our keypoint descriptor has a 128-dimensional feature vector, and the best algorithms, such as the k-d tree (Friedman et al., 1977) provide no speedup over exhaustive search for more than about 10 dimensional spaces. Therefore, we have used an approximate algorithm, called the Best-Bin-First (BBF) algorithm (Beis and Lowe, 1997).
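
For comparison, the exhaustive search that BBF approximates is easy to write down. The sketch below is only this brute-force baseline, not BBF itself; the function name `nearest_neighbour` and the list-of-lists descriptor format are my assumptions.

```python
def nearest_neighbour(query, database):
    """Exhaustive nearest-neighbour search by Euclidean distance.

    Returns (index, distance) of the closest descriptor in `database`.
    The cost is one full distance computation per stored descriptor.
    """
    best_index = -1
    best_dist_sq = float("inf")
    for index, descriptor in enumerate(database):
        dist_sq = sum((q - d) ** 2 for q, d in zip(query, descriptor))
        if dist_sq < best_dist_sq:
            best_index, best_dist_sq = index, dist_sq
    return best_index, best_dist_sq ** 0.5
```

Every query touches every stored 128-dimensional descriptor, which is the cost that motivates an approximate method such as BBF when the database holds many keypoints.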

7.3 Clustering with the Hough transform

To maximize the performance of object recognition for small or highly occluded objects, we wish to identify objects with the fewest possible number of feature matches. We have found that reliable recognition is possible with as few as 3 features.

References
