Paper notes (a survey paper on image retrieval): Content-Based Image Retrieval and Feature Extraction: A Comprehensive Review (Part 1)

Citation: Latif, Afshan; Rasheed, Aqsa; Sajid, Umer; Jameel, Ahmed; Ali, Nouman; Ratyal, Naeem Iqbal; Zafar, Bushra; Dar, Saadat; Sajid, Muhammad; Khalil, Tehmina: "Content-Based Image Retrieval and Feature Extraction: A Comprehensive Review", Mathematical Problems in Engineering.

This is a research paper from a team in Pakistan that I stumbled upon and found quite comprehensive and detailed. No paper is ever completely correct or fully up to date, so treat this as a chance to re-organize, together, the topics of content-based image retrieval and feature extraction on the basis of this paper, and also to follow its reference trail and take from it whatever is useful. Everything below represents my personal views only; discussion is welcome if you spot problems.

For what exactly "content-based" means, see the following paper:

Gudivada, Venkat N., and Vijay V. Raghavan. "Content-based image retrieval systems." Computer 28.9 (1995): 18-22.

 

Starting with the abstract, we can see the authors' goal in writing this paper:

We analyzed the main aspects of various image retrieval and image representation models from low-level feature extraction to recent semantic deep-learning approaches. The important concepts and major research studies based on CBIR and image representation are discussed in detail, and future research directions are concluded to inspire further research in this area.

Compared with earlier retrieval based on metadata and textual image descriptions, CBIR techniques have advanced considerably in recent years. This paper sets out to summarize image representation and retrieval based on analyzing the image content itself, from low-level feature extraction up to recent deep-learning-based description and retrieval techniques, and to sketch where the field may be heading.

On to the introduction:

The authors note that much of today's retrieval relies on matching keywords from the user's query against image descriptions, citing for example the following papers:

[4] S. Yang, L. Li, S. Wang, W. Zhang, Q. Huang, and Q. Tian, “SkeletonNet: a hybrid network with a skeleton-embedding process for multi-view image representation learning,” IEEE Transactions on Multimedia, vol. 1, no. 1, 2019.

[5] W. Zhao, L. Yan, and Y. Zhang, “Geometric-constrained multi-view image matching method based on semi-global optimization,” Geo-Spatial Information Science, vol. 21, no. 2, pp. 115–126, 2018.

[6] W. Zhou, H. Li, and Q. Tian, “Recent advance in content-based image retrieval: a literature survey,” 2017, https://arxiv.org/abs/1706.06064.

My personal take: for [4] (https://ieeexplore.ieee.org/document/8695120), I think the authors mainly proposed an unsupervised multi-view subspace learning method. [5] is essentially an image matching method that incorporates geometric features of multi-view remote sensing imagery. [6] is a survey of content-based image retrieval similar in nature to this paper, but covering techniques from 2003 to 2016; I can personally recommend it (https://arxiv.org/pdf/1706.06064.pdf).

The authors then introduce the basic concepts of CBIR and the features it uses, and explain why feature selection matters:

According to the literature, the selection of visual features for any system is dependent on the requirements of the end user.

The concrete choice of features depends on the needs of the end user, and improving retrieval performance can come at a high computational cost:

[19] N. Ali, Image Retrieval Using Visual Image Features and Automatic Image Annotation, University of Engineering and Technology, Taxila, Pakistan, 2016.

[20] B. Zafar, R. Ashraf, N. Ali et al., “Intelligent image classification-based on spatial weighted histograms of concentric circles,” Computer Science and Information Systems, vol. 15, no. 3, pp. 615–633, 2018.

Choosing the wrong features can instead hurt system performance, for example:

[12] L. Piras and G. Giacinto, “Information fusion in content based image retrieval: a comprehensive overview,” Information Fusion, vol. 37, pp. 50–60, 2017.

The authors also note that all kinds of features can now be widely fed into machine learning and deep learning pipelines to good effect:

ML:

[1] D. Zhang, M. M. Islam, and G. Lu, “A review on automatic image annotation techniques,” Pattern Recognition, vol. 45, no. 1, pp. 346–362, 2012.

[2] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval with high-level semantics,” Pattern Recognition, vol. 40, no. 1, pp. 262–282, 2007.

DL (the authors also remark that the computational cost is rather high):

[21] G. Qi, H. Wang, M. Haner, C. Weng, S. Chen, and Z. Zhu, “Convolutional neural network based detection and judgement of environmental obstacle in vehicle operation,” CAAI Transactions on Intelligence Technology, vol. 4, no. 2, pp. 80–91, 2019.

[22] U. Markowska-Kaczmar and H. Kwaśnicka, “Deep learning––a new era in bridging the semantic gap,” in Bridging the Semantic Gap in Image and Video Analysis, pp. 123–159, Springer, Basel, Switzerland, 2018.

[23] F. Riaz, S. Jabbar, M. Sajid, M. Ahmad, K. Naseer, and N. Ali, “A collision avoidance scheme for autonomous vehicles inspired by human social norms,” Computers & Electrical Engineering, vol. 69, pp. 690–704, 2018.

The authors therefore state that a major goal of the paper is to comprehensively survey and analyze the various kinds of features: How do low-level features (shape, texture, color, and so on) affect retrieval performance? How can the gap between low-level image representations and high-level semantics be narrowed? How important is the spatial layout of an image for retrieval and representation? And how does introducing ML and DL improve CBIR performance?

The authors then outline the structure of the paper:

=================================================================================

Section 2 Color features

Section 3 Texture features

Section 4 Shape features

Section 5 Spatial features

Section 6 Fusion of low-level features

Section 7 Local features

Section 8 Deep-learning-based retrieval

Section 9 Feature extraction for face recognition

Section 10 Distance computation

Section 11 Evaluation criteria for feature extraction and CBIR

Section 12 Future directions of these techniques

=================================================================================

To keep reading fatigue manageable, these notes are split into three parts (Part 1, Part 2, Part 3); of the outline above, Sections 2 through 5 are covered here in Part 1.

(For papers cited below that were published after 2015, I will include links.)

Section 2, color features:

[24] H. Shao, Y. Wu, W. Cui, and J. Zhang, “Image retrieval based on MPEG-7 dominant color descriptor,” in Proceedings of the 9th International Conference for Young Computer Scientists (ICYCS 2008), pp. 753–757, IEEE, Hunan, China, November 2008.

Based on the MPEG-7 dominant color descriptor: eight dominant colors are selected per image, and image similarity is then computed from the resulting histograms. A minimal sketch of this general idea follows.
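
This is not the exact MPEG-7 dominant color descriptor of [24], just a rough sketch of the idea under my own assumptions: a shared palette is learned with k-means, each image keeps only its eight strongest palette bins, and two images are compared with histogram intersection. All function names and parameter values here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_palette(sample_pixels, n_colors=64, seed=0):
    """Learn a shared color palette with k-means over pixels sampled from the database."""
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=seed).fit(sample_pixels)
    return km.cluster_centers_

def dominant_color_histogram(image_rgb, palette, n_dominant=8):
    """Quantize an image onto the shared palette and keep only its n_dominant
    most frequent bins (normalized). Downsample large images first."""
    pixels = image_rgb.reshape(-1, 3).astype(np.float64)
    # assign every pixel to its nearest palette color
    dists = np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(palette)).astype(np.float64)
    keep = np.argsort(hist)[-n_dominant:]      # indices of the dominant colors
    mask = np.zeros_like(hist)
    mask[keep] = 1.0
    hist *= mask
    return hist / (hist.sum() + 1e-9)

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return float(np.minimum(h1, h2).sum())
```

In practice the palette would be learned once over the whole database, a histogram computed per image offline, and query results ranked by histogram intersection.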

[25] X. Duanmu, “Image retrieval using color moment invariant,” in Proceedings of the 2010 Seventh International Conference on Information Technology: New Generations (ITNG), pp. 200–203, IEEE, Las Vegas, NV, USA, April 2010.

Uses hierarchical agglomerative clustering (HAC) of color features.
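
Since the title of [25] is about color moment invariants, here is a minimal sketch of the classic color moments (mean, standard deviation, skewness per channel), a common global color feature; this is my own simplified illustration rather than the exact invariant construction of the paper.

```python
import numpy as np

def color_moments(image_rgb):
    """First three color moments (mean, std, skewness) per channel -> 9-D feature vector."""
    feats = []
    for c in range(3):
        ch = image_rgb[..., c].astype(np.float64).ravel()
        mean = ch.mean()
        std = ch.std()
        skew = np.cbrt(((ch - mean) ** 3).mean())  # cube root of the third central moment
        feats.extend([mean, std, skew])
    return np.array(feats)
```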

[26] X.-Y. Wang, B.-B. Zhang, and H.-Y. Yang, “Content-based image retrieval by integrating color and texture features,” Multimedia Tools and Applications, vol. 68, no. 3, pp. 545–569, 2014.

Uses both texture and color; computing distances over, and merging, the two kinds of features is what makes it tricky (and computationally costly). A simple fusion sketch follows.
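
A common, simple workaround for combining heterogeneous features is to normalize each distance before taking a weighted sum. The sketch below assumes min-max normalization over the database and hand-picked weights; it is my own illustration, not the scheme used in [26].

```python
import numpy as np

def fused_distance(d_color, d_texture, w_color=0.5, w_texture=0.5):
    """Min-max normalize each per-database distance array, then mix with fixed weights."""
    def minmax(d):
        d = np.asarray(d, dtype=np.float64)
        return (d - d.min()) / (d.max() - d.min() + 1e-9)
    return w_color * minmax(d_color) + w_texture * minmax(d_texture)
```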

[27] H. Zhang, Z. Dong, and H. Shu, “Object recognition by a complete set of pseudo-Zernike moment invariants,” in Proceedings of the 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 930–933, IEEE, Dallas, TX, USA, March 2010.

Tackles scaling and rotation through an optimized construction of Zernike and pseudo-Zernike polynomial moment invariants.
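
Pseudo-Zernike moments take a fair amount of machinery to implement; as a much simpler illustration of rotation-invariant moments (explicitly not the method of [27]), OpenCV's Hu moments can be computed on a grayscale or binary shape image:

```python
import cv2
import numpy as np

def hu_moment_signature(gray_image):
    """Log-scaled Hu moment invariants (invariant to translation, scale and rotation)."""
    m = cv2.moments(gray_image.astype(np.float32))
    hu = cv2.HuMoments(m).flatten()
    # log transform compresses the huge dynamic range while keeping the sign
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```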

 

The authors point out that color features are barely affected by basic image transformations (rotation, scaling, translation, and so on), for example:

[28] J. M. Guo, H. Prasetyo, and J. H. Chen, “Content-based image retrieval using error diffusion block truncation coding features,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 466–481, 2015.

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6898854

Extracts features with error diffusion block truncation coding (EDBTC): color features and a bitmap feature are extracted, and retrieval is performed on those.

[29] Y. Liu, D. Zhang, and G. Lu, “Region-based image retrieval with high-level semantics using decision tree learning,” Pattern Recognition, vol. 41, no. 8, pp. 2554–2570, 2008.

(Although it is old, I would like to recommend this one.) The paper performs region-based retrieval with high-level semantics using decision tree learning.

[30] M. M. Islam, D. Zhang, and G. Lu, “Automatic categorization of image regions using dominant color based vector quantization,” in Proceedings of the Digital Image Computing: Techniques and Applications, pp. 191–198, IEEE, Canberra, Australia, December 2008.

This one proposes a dominant-color-based vector quantization method for categorizing image regions.

[31] Z. Jiexian, L. Xiupeng, and F. Yu, “Multiscale distance coherence vector algorithm for content-based image retrieval,” The Scientific World Journal, vol. 2014, Article ID 615973, 13 pages, 2014. (Though personally I think this one is mainly based on contour features, with a chain of computations to gain robustness to rotation and similar disturbances.)

The authors then conclude that although color features do not capture local detail well, they do cost noticeably less to compute than many region-based features; the paper reports the retrieval performance of the above methods:

 

Judging by the results on the same dataset, the color quantization method proposed in [30] is worth paying extra attention to.

Next up, Section 3, texture features:

[32] G. Papakostas, D. Koulouriotis, and V. Tourassis, “Feature extraction based on wavelet moments and moment invariants in machine vision systems,” in Human-Centric Machine Vision, InTech, London, UK, 2012.

Feature extraction based on wavelet moments and moment invariants.

[33] G.-H. Liu, Z.-Y. Li, L. Zhang, and Y. Xu, “Image retrieval based on micro-structure descriptor,” Pattern Recognition, vol. 44, no. 9, pp. 2123–2133, 2011.

The authors propose micro-structures: HSV color features and edge-orientation features (computed with the Sobel operator) are combined to define a new feature map. A rough sketch of those two ingredients follows.
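
This sketch only covers the two ingredients the micro-structure descriptor of [33] builds on, namely a quantized HSV color map and a quantized Sobel edge-orientation map. The bin counts are arbitrary and the paper's actual micro-structure extraction step is omitted.

```python
import cv2
import numpy as np

def hsv_and_edge_orientation_maps(image_bgr, h_bins=8, s_bins=3, v_bins=3, ori_bins=6):
    """Return a quantized HSV color-label map and a quantized edge-orientation map.
    image_bgr: uint8 BGR image as loaded by cv2.imread."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # quantize HSV into h_bins * s_bins * v_bins color labels (OpenCV: H in [0,180), S/V in [0,256))
    hq = h.astype(np.int32) * h_bins // 180
    sq = s.astype(np.int32) * s_bins // 256
    vq = v.astype(np.int32) * v_bins // 256
    color_map = (hq * s_bins + sq) * v_bins + vq

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    theta = np.arctan2(gy, gx)  # edge orientation in [-pi, pi]
    ori_map = ((theta + np.pi) / (2 * np.pi) * ori_bins).astype(np.int32) % ori_bins
    return color_map, ori_map
```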

[34] X.-Y. Wang, Z.-F. Chen, and J.-J. Yun, “An effective method for color image retrieval based on texture,” Computer Standards & Interfaces, vol. 34, no. 1, pp. 31–35, 2012.

Extracts texture features with a color co-occurrence matrix; a gray-level sketch is given below.
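
[34] works with a color co-occurrence matrix; as a widely available gray-level analogue, here is a sketch using the gray-level co-occurrence matrix (GLCM) from scikit-image. The distances, angles, and properties chosen are illustrative.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # named greycomatrix/greycoprops in older scikit-image

def glcm_texture_features(gray_uint8):
    """Contrast / correlation / energy / homogeneity from a GLCM
    at pixel distance 1 over four orientations."""
    glcm = graycomatrix(gray_uint8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "homogeneity"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])
```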

[40] N.-E. Lasmar and Y. Berthoumieu, “Gaussian copula multivariate modeling for texture image retrieval using wavelet transforms,” IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2246–2261, 2014.

As the title says, this one works on wavelet transforms, modeling the wavelet coefficients with a Gaussian copula.
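
A much simpler sketch in the same spirit just takes the energy of each detail subband of a 2-D wavelet decomposition (PyWavelets); the wavelet name and level are arbitrary choices, and the Gaussian copula modeling of [40] is not reproduced.

```python
import numpy as np
import pywt

def wavelet_energy_signature(gray_image, wavelet="db2", level=3):
    """Energy (mean squared coefficient) of every detail subband of a 2-D DWT."""
    coeffs = pywt.wavedec2(np.asarray(gray_image, dtype=np.float64), wavelet, level=level)
    feats = []
    for detail in coeffs[1:]:            # skip the approximation band
        for band in detail:              # (horizontal, vertical, diagonal) details
            feats.append(np.mean(band ** 2))
    return np.array(feats)
```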

The authors then conclude that because texture features describe groups of pixels rather than single pixels, they carry more semantic meaning than color features, but they are also quite sensitive to noise. The paper reports the retrieval performance of the methods above.

Next, Section 4, shape features:

[15] D. Zhang and G. Lu, “Review of shape representation and description techniques,” Pattern Recognition, vol. 37, no. 1,pp. 1–19, 2004.

This is a survey of shape representation and description techniques, but it only goes up to 2004.

Then, based on the following two papers:

[14] D. Ping Tian, “A review on image feature extraction and representation techniques,” International Journal of Multimedia and Ubiquitous Engineering, vol. 8, no. 4, pp. 385–396, 2013.

[15] D. Zhang and G. Lu, “Review of shape representation and description techniques,” Pattern Recognition, vol. 37, no. 1,pp. 1–19, 2004.

the authors compile a table of shape representation techniques (an image in the original paper, not reproduced here).

[41] Z. Hong and Q. Jiang, “Hybrid content-based trademark retrieval using region and contour features,” in Proceedingsof the 22nd International Conference on Advanced Information Networking and Applications-Workshops AINAW2008, pp. 1163–1168, IEEE, Okinawa, Japan, March 2008.

This one is still mainly a contour-based feature representation, applied to trademark retrieval.

Then, Section 5, spatial features:

One common approach is the bag of visual words (https://towardsdatascience.com/bag-of-visual-words-in-a-nutshell-9ceea97ce0fb). Bag of words (BoW) is an NLP technique based on word-frequency statistics; transplanted to images, each local feature plays the role of a word.

The general flavor is as in the linked article: these features are then used to represent an image:

[42] N. Ali, K. B. Bajwa, R. Sablatnig et al., “A novel image retrieval based on visual words integration of SIFT and SURF,” PLoS One, vol. 11, no. 6, Article ID e0157428, 2016.

Represents each image as a visual-word histogram built from SIFT (robust to rotation) and SURF (paired with it for robustness to lighting). A minimal sketch of the generic BoVW pipeline follows.
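
Here is a minimal sketch of the generic bag-of-visual-words pipeline underlying [42] (and [44] below): SIFT descriptors, a k-means codebook, and a per-image word histogram. It assumes OpenCV (SIFT is in the main package from 4.4 onwards) and scikit-learn; the codebook size is arbitrary.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_sift_descriptors(gray_images):
    """Collect SIFT descriptors from a list of grayscale uint8 images."""
    sift = cv2.SIFT_create()
    all_desc = []
    for img in gray_images:
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            all_desc.append(desc)
    return all_desc

def build_codebook(descriptor_list, n_words=200, seed=0):
    """Cluster all descriptors into n_words visual words."""
    stacked = np.vstack(descriptor_list).astype(np.float64)
    return KMeans(n_clusters=n_words, n_init=4, random_state=seed).fit(stacked)

def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word assignments for one image."""
    words = codebook.predict(descriptors.astype(np.float64))
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float64)
    return hist / (hist.sum() + 1e-9)
```

Retrieval then amounts to nearest-neighbor search over these histograms; the spatial pyramid below adds spatial layout on top of the same representation.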

Another approach is Spatial Pyramid Matching (on this, see http://slazebni.cs.illinois.edu/slides/ima_poster.pdf).

Related literature:

[43] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: spatial pyramid matching for recognizing natural scene categories,” in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Volume 2 (CVPR’06), pp. 2169–2178, IEEE, New York, NY, USA, June 2006. (This is the paper behind the link above; a sketch of the representation follows.)
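
A rough sketch of the spatial pyramid representation of [43]: at level l the image is split into 2^l × 2^l cells, a visual-word histogram is computed per cell (for example from the BoVW sketch above), and all cells are concatenated with the per-level weights of the pyramid match kernel. Keypoint handling is simplified to integer pixel coordinates.

```python
import numpy as np

def spatial_pyramid_histogram(keypoint_xy, word_ids, image_shape, n_words, levels=2):
    """Concatenate visual-word histograms over a spatial pyramid.

    keypoint_xy : (N, 2) array of (x, y) pixel coordinates of the local descriptors
    word_ids    : (N,) integer visual-word index of each descriptor
    """
    h, w = image_shape[:2]
    xs = np.clip(keypoint_xy[:, 0].astype(int), 0, w - 1)
    ys = np.clip(keypoint_xy[:, 1].astype(int), 0, h - 1)
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level
        # per-level weights of the pyramid match kernel (coarse levels count less)
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        cell_x = xs * cells // w
        cell_y = ys * cells // h
        for cy in range(cells):
            for cx in range(cells):
                in_cell = (cell_x == cx) & (cell_y == cy)
                hist = np.bincount(word_ids[in_cell], minlength=n_words).astype(np.float64)
                pyramid.append(weight * hist)
    feat = np.hstack(pyramid)
    return feat / (feat.sum() + 1e-9)
```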

[44] Z. Mehmood, S. M. Anwar, N. Ali, H. A. Habib, and M. Rashid, “A novel image retrieval based on a combination of local and global histograms of visual words,” Mathematical Problems in Engineering, vol. 2016, Article ID 8217250, 12 pages, 2016.

Uses SIFT features with k-means clustering to build the codebooks.

[46] B. Zafar, R. Ashraf, N. Ali et al., “A novel discriminating and relative global spatial image representation with applications in CBIR,” Applied Sciences, vol. 8, no. 11, p. 2242, 2018.

Computes global geometric relationships between pairs of visual words in the bag-of-visual-words model to cope with transformation invariance.

[47] N. Ali, B. Zafar, F. Riaz et al., “A hybrid geometric spatial image representation for scene classification,” PLoS One, vol. 13, no. 9, Article ID e0203339, 2018.

In my humble opinion, this one simply splits the image into circular, square, and triangular regions, extracts features from each, and builds the codebooks from those.

[48] B. Zafar, R. Ashraf, N. Ali, M. Ahmed, S. Jabbar, and S. A. Chatzichristofis, “Image classification by addition of spatial information based on histograms of orthogonal vectors,” PLoS One, vol. 13, no. 6, Article ID e0198175, 2018.

This one encodes spatial information through histograms of orthogonal vectors.

[51] H. Anwar, S. Zambanini, and M. Kampel, “A rotation-invariant bag of visual words model for symbols based ancient coin classification,” in Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), pp. 5257–5261, IEEE, Paris, France, October 2014.

[52] H. Anwar, S. Zambanini, and M. Kampel, “Efficient scale- and rotation-invariant encoding of visual words for image classification,” IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1762–1765, 2015.

[53] R. Khan, C. Barat, D. Muselet, and C. Ducottet, “Spatial histograms of soft pairwise similar patches to improve the bag-of-visual-words model,” Computer Vision and Image Understanding, vol. 132, pp. 102–112, 2015.

[54] N. Ali, B. Zafar, M. K. Iqbal et al., “Modeling global geometric spatial information for rotation invariant classification of satellite images,”

The above are all different optimizations aimed at problems such as rotation and scaling.

 

The rest will be covered in the next post (Part 2), which focuses on

Section 6 Fusion of low-level features

Section 7 Local features

Section 8 Deep-learning-based retrieval

those three blocks of content. If anything here is wrong, feel free to point it out in the comments~~

 

 

=======================================================

My GitHub: https://github.com/timcanby

