Translation | ORB: An efficient alternative to SIFT or SURF



ORB: an efficient alternative to SIFT or SURF

Translator: Michael Beechan (陳兵), Chongqing University of Technology

Ethan Rublee Vincent Rabaud Kurt Konolige Gary Bradski

Willow Garage, Menlo Park, California

Citation: Rublee E, Rabaud V, Konolige K, et al. ORB: An efficient alternative to SIFT or SURF[C]// International Conference on Computer Vision. IEEE Computer Society, 2011:2564-2571.

ORB-SLAM1/2 : https://github.com/MichaelBeechan/ORB_SLAM2





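The SIFT keypoint detector and descriptor [17], although more than a decade old, has proven remarkably successful in many applications that use visual features, including object recognition [17], image stitching [28], and visual mapping [25]. However, it imposes a large computational burden, especially for real-time systems such as visual odometry and for low-power devices such as cellphones. This has driven an intensive search for replacements with lower computational cost; arguably the best of these is SURF [2]. There has also been research aimed at speeding up the computation of SIFT, most notably on GPU devices [26].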



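BRIEF grew out of research that uses binary tests to train a set of classification trees [4]. Once trained on a set of about 500 typical keypoints, the trees can be used to return a signature for any arbitrary keypoint [5]. In a similar manner, we look for the tests that are least sensitive to orientation. The classic method for finding uncorrelated tests is principal component analysis (PCA); for example, it has been shown that PCA for SIFT can help remove a large amount of redundant information [12]. However, the space of possible binary tests is too large to perform PCA, so an exhaustive search is used instead.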





3.1 FAST Detector




3.2 Orientation by Intensity Centroid

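Our approach uses a simple but effective measure of corner orientation, the intensity centroid [22]. The intensity centroid assumes that a corner's intensity is offset from its center, and that this offset vector can be used to estimate an orientation. Rosin defines the moments of a patch as: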


We can construct a vector from the corner's center, O, to the centroid, C. The orientation of the patch then simply is:

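where atan2 is the quadrant-aware version of arctan. Rosin mentions taking into account whether the corner is dark or light; for our purposes, however, we can ignore this, because the angle measure is consistent regardless of the corner type.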
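The displayed equations of this passage did not survive in this copy. As a reading aid, the intensity-centroid definitions are commonly written as follows, assuming that I(x, y) denotes the patch intensity at offset (x, y) from the corner center O (the symbol names are this note's choice):

```latex
m_{pq} = \sum_{x,y} x^{p} y^{q} I(x,y), \qquad
C = \left( \frac{m_{10}}{m_{00}},\; \frac{m_{01}}{m_{00}} \right), \qquad
\theta = \operatorname{atan2}(m_{01},\, m_{10})
```

The common factor m_00 cancels inside atan2, so the orientation depends only on the two first-order moments.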

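To improve the rotation invariance of this measure, we make sure that the moments are computed with x and y remaining within a circular region of radius r. We empirically choose r to be the patch size, so that x and y run over [−r, r]. As |C| approaches 0, the measure becomes unstable; with FAST corners, we have found that this is rarely the case.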
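The circular-region computation can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, the patch is assumed to lie fully inside the image, and the first-order moments are taken as the intensity-weighted sums of the x and y offsets from the corner center.

```python
import numpy as np

def intensity_centroid_orientation(image, cx, cy, r):
    """Return the orientation (radians) of the patch centered at (cx, cy).

    Assumes the (2r+1) x (2r+1) patch lies fully inside `image`.
    Only pixels with x^2 + y^2 <= r^2 contribute to the moments.
    """
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]          # offsets from the center
    mask = (xs ** 2 + ys ** 2) <= r ** 2           # circular support region
    patch = image[cy - r:cy + r + 1, cx - r:cx + r + 1].astype(np.float64)
    m10 = np.sum(xs * patch * mask)                # first-order moment in x
    m01 = np.sum(ys * patch * mask)                # first-order moment in y
    return np.arctan2(m01, m10)                    # quadrant-aware angle
```

A patch that is brighter on its right half yields an angle of 0, and one that is brighter toward larger y yields π/2, in image coordinates where y grows downward.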

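We compared the centroid method with two gradient-based measures, BIN and MAX. In both cases, X and Y gradients are computed on a smoothed image. MAX chooses the largest gradient in the keypoint patch; BIN forms a histogram of gradient directions at 10-degree intervals and picks the maximum bin. BIN is similar to the SIFT algorithm, although it picks only a single orientation. The variance of the orientation on a simulated dataset (in-plane rotation plus added noise) is shown in Figure 2. Neither gradient measure performs very well, while the centroid gives a uniformly good orientation, even under large image noise.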
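For comparison, the two gradient-based measures can be sketched as below. The text gives only their outline, so several details here are assumptions: the input patch is taken to be already smoothed, gradients come from simple finite differences, and the BIN histogram is weighted by gradient magnitude. The function name is hypothetical.

```python
import numpy as np

def gradient_orientations(patch, bin_width_deg=10):
    """Return (theta_max, theta_bin) in radians for a smoothed 2-D patch."""
    gy, gx = np.gradient(patch.astype(np.float64))   # d/dy first, then d/dx
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)
    # MAX: direction of the single largest gradient in the patch.
    theta_max = ang.flat[np.argmax(mag)]
    # BIN: histogram of directions (magnitude-weighted, an assumption);
    # report the center of the fullest bin.
    n_bins = 360 // bin_width_deg
    hist, edges = np.histogram(ang, bins=n_bins, range=(-np.pi, np.pi),
                               weights=mag)
    k = int(np.argmax(hist))
    theta_bin = 0.5 * (edges[k] + edges[k + 1])
    return theta_max, theta_bin
```

Both measures depend on a few strong gradients, which is one plausible reason they react more strongly to noise than a moment summed over the whole patch.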


Figure 2. Rotation measure. The intensity centroid (IC) performs best on recovering the orientation of artificially rotated noisy patches, compared to a histogram (BIN) and MAX method.

4. rBRIEF: Rotation-Aware Brief 

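In this section, we first introduce a steered BRIEF descriptor, show how to compute it efficiently, and demonstrate why it actually performs poorly under rotation. We then introduce a learning step that finds less correlated binary tests, leading to the better descriptor rBRIEF, for which we offer comparisons to SIFT and SURF.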

4.1 Efficient Rotation of the BRIEF Operator




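Many different types of test distributions were considered in [6]. Here we use one of the best performers, a Gaussian distribution around the center of the patch. We also choose a vector length of n = 256.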



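We would like BRIEF to be invariant to in-plane rotation. The matching performance of BRIEF falls off sharply for in-plane rotations of more than a few degrees (see Figure 7). Calonder [6] suggests computing a BRIEF descriptor for a set of rotations and perspective warps of each patch, but this solution is obviously expensive. A more efficient method is to steer BRIEF according to the orientation of the keypoints. For any feature set of n binary tests at given locations, define the 2 × n matrix: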



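We discretize the angle into increments of 2π/30 (12 degrees) and construct a lookup table of precomputed BRIEF patterns. As long as the keypoint orientation θ is consistent across views, the correct set of points will be used to compute its descriptor.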
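A minimal sketch of such a lookup table follows. The matrix definitions are missing from this copy, so the sketch assumes that steering means multiplying the test locations by the 2-D rotation matrix for the keypoint orientation, and that a test outputs 1 when the first point is darker than the second. The function names, the Gaussian spread of patch/5, and the 31-pixel patch width are illustrative choices. Pre-smoothing of the patch is omitted.

```python
import numpy as np

N_TESTS = 256     # descriptor length n from the text
N_ANGLES = 30     # 2*pi/30 = 12 degree steps
PATCH = 31        # patch width (assumption, borrowed from Section 4.3)

def make_pattern(rng, n=N_TESTS, patch=PATCH):
    """Draw n point pairs from a Gaussian around the patch center.

    Returns shape (2, 2, n): [endpoint a/b, coordinate x/y, test index].
    Points are pulled inside a circle so rotated copies stay in the patch.
    """
    half = patch // 2
    pts = rng.normal(0.0, patch / 5.0, size=(2, 2, n))
    radius = np.hypot(pts[:, 0], pts[:, 1])                  # shape (2, n)
    scale = np.minimum(1.0, half / np.maximum(radius, 1e-12))
    return pts * scale[:, None, :]

def build_lookup(pattern, n_angles=N_ANGLES):
    """Precompute the pattern rotated by every discretized angle."""
    table = []
    for k in range(n_angles):
        t = 2.0 * np.pi * k / n_angles
        rot = np.array([[np.cos(t), -np.sin(t)],
                        [np.sin(t),  np.cos(t)]])
        # Rotate the (x, y) axis of both endpoints at once.
        rotated = np.einsum('ij,pjn->pin', rot, pattern)
        table.append(np.rint(rotated).astype(int))
    return np.stack(table)                                   # (n_angles, 2, 2, n)

def steered_brief(image, cx, cy, theta, table):
    """Binary descriptor at (cx, cy) using the pattern nearest to theta."""
    n_angles = table.shape[0]
    k = int(np.rint(theta / (2.0 * np.pi / n_angles))) % n_angles
    (ax, ay), (bx, by) = table[k]
    return (image[cy + ay, cx + ax] < image[cy + by, cx + bx]).astype(np.uint8)
```

The table is built once, so at run time each descriptor costs one table lookup plus n pixel comparisons. The keypoint must be at least 15 pixels from the image border for the indexing to stay in range.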

4.2 Variance and Correlation


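High variance makes a feature more discriminative, since it responds differently to different inputs. Another desirable property is for the tests to be uncorrelated, since then each test contributes to the result. To analyze the correlation and variance of the tests in the BRIEF vector, we looked at the responses of BRIEF and steered BRIEF to 100k keypoints. The results are shown in Figure 4. Using PCA on the data, we plot the 40 highest eigenvalues (after which the two descriptors converge). Both BRIEF and steered BRIEF exhibit high initial eigenvalues, indicating correlation among the binary tests: essentially all the information is contained in the first 10 or 15 components. Steered BRIEF, however, has significantly lower variance, since its eigenvalues are lower, and it is therefore less discriminative. Apparently BRIEF depends on the random orientation of keypoints for good performance. Another view of the effect of steered BRIEF is shown in the distance distributions between inliers and outliers (Figure 5). Notice that for steered BRIEF, the mean for outliers is pushed to the left, and there is more overlap with the inliers.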
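The kind of analysis described here can be sketched as follows, assuming the test responses are collected in a 0/1 matrix with one row per keypoint and one column per binary test. The function name is hypothetical, and this is only an outline of the statistics involved, not the authors' evaluation code.

```python
import numpy as np

def bit_statistics(bits):
    """Per-test mean and variance, plus PCA eigenvalues of the test responses.

    bits: (num_keypoints, num_tests) array of 0/1 values.
    """
    bits = np.asarray(bits, dtype=np.float64)
    means = bits.mean(axis=0)
    variances = means * (1.0 - means)          # Bernoulli variance, max at 0.5
    cov = np.cov(bits, rowvar=False)           # sample covariance between tests
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # largest first
    return means, variances, eigvals
```

For two independent tests the eigenvalues come out equal. For two identical tests the spectrum collapses into a single nonzero eigenvalue, which is the "high initial eigenvalue" signature of correlated tests.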


Figure 3. Distribution of means for feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3). The X axis is the distance to a mean of 0.5.


Figure 4. Distribution of eigenvalues in the PCA decomposition over 100k keypoints of three feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3).


Figure 5. The dotted lines show the distances of a keypoint to outliers, while the solid lines denote the distances only between inlier matches for three feature vectors: BRIEF, steered BRIEF (Section 4.1), and rBRIEF (Section 4.3).

4.3 Learning Good Binary Features


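The method is as follows. We first set up a training set of about 300k keypoints, drawn from images in the PASCAL 2006 set [8]. We also enumerate all possible binary tests drawn from a 31 × 31 pixel patch. Each test is a pair of 5 × 5 sub-windows of the patch. If we denote the width of our patch as wp = 31 and the width of the test sub-window as wt = 5, then we have N = (wp − wt)^2 = 676 possible sub-windows. We want to select pairs of two from these, so we have (N choose 2) = 228150 binary tests. We eliminate tests that overlap, so we end up with M = 205590 possible tests. The algorithm is: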





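(b) Take the next test from T and compare it against all tests in R. If its absolute correlation is greater than a threshold, discard it; otherwise add it to R.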


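This algorithm is a greedy search for a set of uncorrelated tests with means near 0.5. The result is called rBRIEF. rBRIEF shows a significant improvement in variance and correlation over steered BRIEF (see Figure 4). The eigenvalues of PCA are higher, and they fall off much less quickly. It is interesting to see the high-variance binary tests produced by the algorithm (Figure 6). There is a very pronounced vertical trend in the unlearned tests (left image), which are highly correlated; the learned tests show better diversity and lower correlation.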
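Only step (b) of the algorithm survives in this copy. The following is a minimal sketch of a greedy selection in that spirit. It assumes, and these are assumptions rather than recovered text, that candidates are first ordered by how close their mean is to 0.5 (forming T), that the first candidate seeds R, and that the threshold is relaxed when too few tests survive. The function name, the starting threshold of 0.2, and the step of 0.05 are hypothetical.

```python
import numpy as np

def select_tests(responses, n_select=256, threshold=0.2, step=0.05):
    """Greedily pick weakly correlated tests whose means are near 0.5.

    responses: (num_keypoints, num_tests) array of 0/1 test outcomes.
    Returns the column indices of the selected tests.
    """
    responses = np.asarray(responses, dtype=np.float64)
    n_rows, n_tests = responses.shape
    n_select = min(n_select, n_tests)
    # T: candidates ordered by |mean - 0.5|, closest to 0.5 first.
    order = np.argsort(np.abs(responses.mean(axis=0) - 0.5), kind="stable")
    # Standardize columns so a scaled dot product is the Pearson correlation.
    centered = responses - responses.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0                  # constant tests correlate with nothing
    z = centered / std
    while True:
        chosen = [order[0]]              # seed R with the best candidate
        for idx in order[1:]:
            if len(chosen) == n_select:
                break
            corr = np.abs(z[:, chosen].T @ z[:, idx]) / n_rows
            if corr.max() <= threshold:  # step (b): keep only if uncorrelated
                chosen.append(idx)
        if len(chosen) == n_select or threshold >= 1.0:
            return np.array(chosen)
        threshold += step                # too few tests: relax and retry
```

Because candidates are visited in order of mean quality, a test is rejected only when something at least as well balanced and strongly correlated with it is already in R. This is how a greedy pass trades variance against correlation without searching all subsets.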

4.4 Evaluation



Figure 6. A subset of the binary tests generated by considering high-variance under orientation (left) and by running the learning algorithm to reduce correlation (right). Note the distribution of the tests around the axis of the keypoint orientation, which is pointing up. The color coding shows the maximum pairwise correlation of each test, with black and purple being the lowest. The learned tests clearly have a better distribution and lower correlation.


Figure 7. Matching performance of SIFT, SURF, BRIEF with FAST, and ORB (oFAST + rBRIEF) under synthetic rotations with Gaussian noise of 10.




Figure 8. Matching behavior under noise for SIFT and rBRIEF. The noise levels are 0, 5, 10, 15, 20, and 25. SIFT performance degrades rapidly, while rBRIEF is relatively unaffected.


Figure 9. Real world data of a table full of magazines and an out-door scene. The images in the first column are matched to those in the second. The last column is the resulting warp of the first onto the second.








[1] M. Aly, P. Welinder, M. Munich, and P. Perona. Scaling object recognition: Benchmark of current state of the art techniques. In First IEEE Workshop on Emergent Issues in Large Amounts of Visual Data (WS-LAVD), IEEE International Conference on Computer Vision (ICCV), September 2009. 6

[2] H. Bay, T. Tuytelaars, and L. Van Gool. Surf: Speeded up robust features. In European Conference on Computer Vision, May 2006. 1, 2

[3] M. Brown, S. Winder, and R. Szeliski. Multi-image matching using multi-scale oriented patches. In Computer Vision and Pattern Recognition, pages 510–517, 2005. 2

[4] M. Calonder, V. Lepetit, and P. Fua. Keypoint signatures for fast learning and recognition. In European Conference on Computer Vision, 2008. 2

[5] M. Calonder, V. Lepetit, K. Konolige, P. Mihelich, and P. Fua. High-speed keypoint description and matching using dense signatures. In Under review, 2009. 2

[6] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. Brief: Binary robust independent elementary features. In European Conference on Computer Vision, 2010. 1, 2, 3, 5

[7] O. Chum and J. Matas. Matching with PROSAC - progressive sample consensus. In C. Schmid, S. Soatto, and C. Tomasi, editors, Proc. of Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 220–226, Los Alamitos, USA, June 2005. IEEE Computer Society. 7

[8] M. Everingham. The PASCAL Visual Object Classes Challenge 2006 (VOC2006) Results. http://pascallin.ecs.soton.ac.uk/challenges/VOC/databases.html. 4

[9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2009 (VOC2009) Results. http://www.pascalnetwork.org/challenges/VOC/voc2009/workshop/index.html. 6, 7

[10] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, editors, VLDB’99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 518–529. Morgan Kaufmann, 1999. 6

[11] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, pages 147–151, 1988. 2

[12] Y. Ke and R. Sukthankar. Pca-sift: A more distinctive representation for local image descriptors. In Computer Vision and Pattern Recognition, pages 506–513, 2004. 2

[13] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In Proc. Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR’07), Nara, Japan, November 2007. 1

[14] G. Klein and D. Murray. Improving the agility of keyframe-based SLAM. In European Conference on Computer Vision, 2008. 2

[15] G. Klein and D. Murray. Parallel tracking and mapping on a camera phone. In Proc. Eighth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR’09), Orlando, October 2009. 7

[16] V. Lepetit, F. Moreno-Noguer, and P. Fua. EPnP: An accurate O(n) solution to the PnP problem. Int. J. Comput. Vision, 81:155–166, February 2009. 7

[17] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004. 1, 2

[18] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In Proceedings of the 33rd international conference on Very large data bases, VLDB ’07, pages 950–961. VLDB Endowment, 2007. 6

[19] M. Martinez, A. Collet, and S. S. Srinivasa. MOPED: A Scalable and low Latency Object Recognition and Pose Estimation System. In IEEE International Conference on Robotics and Automation. IEEE, 2010. 7

[20] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP, 2009. 6

[21] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In CVPR, 2006. 2, 6

[22] P. L. Rosin. Measuring corner properties. Computer Vision and Image Understanding, 73(2):291 – 307, 1999. 2

[23] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In European Conference on Computer Vision, volume 1, 2006. 1

[24] E. Rosten, R. Porter, and T. Drummond. Faster and better: A machine learning approach to corner detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 32:105–119, 2010. 1

[25] S. Se, D. Lowe, and J. Little. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. International Journal of Robotic Research, 21:735–758, August 2002. 1

[26] S. N. Sinha, J.-M. Frahm, M. Pollefeys, and Y. Genc. GPU-based video feature tracking and matching. Technical report, In Workshop on Edge Computing Using New Commodity Architectures, 2006. 1

[27] J. Sivic and A. Zisserman. Video google: A text retrieval approach to object matching in videos. International Conference on Computer Vision, page 1470, 2003. 2, 6

[28] N. Snavely, S. M. Seitz, and R. Szeliski. Skeletal sets for efficient structure from motion. In Proc. Computer Vision and Pattern Recognition, 2008. 1

[29] G. Wang, Y. Zhang, and L. Fei-Fei. Using dependent regions for object categorization in a generative framework, 2006. 6

[30] A. Weimert, X. Tan, and X. Yang. Natural feature detection on mobile phones with 3D FAST. Int. J. of Virtual Reality, 9:29–34, 2010. 7
