M-Tree for Similarity Search

My current area of research is similarity search. As with ordinary search, we need data structures that make similarity search both effective and efficient, and they should support at least range queries and k-nearest-neighbor (kNN) queries. In this essay, I would like to sum up my recent study of the M-Tree, a kind of metric tree, that is, an index that relies only on the relative distances between objects.

First, let us look at an example taken from the book "Similarity Search: The Metric Space Approach" by Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal and Michal Batko.


The example is a two-dimensional space, and the objects O1 to O11 can be viewed as points in a vector space. Euclidean distance is commonly used to compute the relative distances between them. In this data structure, each entry of an internal node (the root included) stores a radius that describes the region it covers, together with the distance between its object and the parent object (0 for the root). In leaf nodes, the radius is always 0.
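The entry layout described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `Entry`, `Node` and `make_routing_entry` are hypothetical and do not come from the book or from any M-Tree library, and the points are plain 2-D tuples compared with Euclidean distance.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Entry:
    """One entry of an M-Tree node (hypothetical layout, for illustration only)."""
    obj: Point                      # stored object (the routing object in internal nodes)
    dist_to_parent: float           # distance to the parent object, 0 for root entries
    radius: float = 0.0             # covering radius; stays 0 in leaf entries
    child: Optional["Node"] = None  # subtree covered by this entry; None in leaf entries


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        # A leaf holds only entries without a subtree.
        return all(e.child is None for e in self.entries)


def make_routing_entry(pivot: Point, objects: List[Point],
                       parent: Optional[Point] = None) -> Entry:
    """Group `objects` into one leaf and describe it by a routing entry."""
    leaf = Node([Entry(o, math.dist(pivot, o)) for o in objects])
    # The covering radius must reach the farthest object in the subtree.
    radius = max(e.dist_to_parent for e in leaf.entries)
    d_parent = 0.0 if parent is None else math.dist(pivot, parent)
    return Entry(pivot, d_parent, radius, leaf)


entry = make_routing_entry((0.0, 0.0), [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)])
root = Node([entry])
```

Here the routing entry sits in the root, so its distance to the parent is 0, while its radius is the distance from the pivot to the farthest object below it.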

The features of the M-Tree can be summarized as follows:

1. Balanced.

2. All of its objects are listed in the leaf nodes.

3. Dynamic, meaning that insertion is possible without reorganizing the whole tree.

4. Most importantly, it is based on secondary memory, so it can handle large data sets.

To further improve performance, the M-Tree also applies the triangle inequality to reduce the number of distance computations, since computing distances in a high-dimensional space is rather time-consuming. The savings come from making full use of the distances already stored in the entries.
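To make the pruning idea concrete, here is a minimal sketch of a range query that uses the stored parent distances. For a query q with radius r, a parent object p and an entry with object o and covering radius r_o, the triangle inequality gives d(q, o) >= |d(q, p) - d(o, p)|, so whenever |d(q, p) - d(o, p)| > r + r_o the entry can be skipped without computing d(q, o) at all. The names `Entry`, `CountingDistance` and `range_query`, the list-based node representation and the toy data are my own assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Entry:
    obj: Point
    dist_to_parent: float                  # precomputed d(obj, parent object)
    radius: float = 0.0                    # covering radius, 0 for leaf entries
    child: Optional[List["Entry"]] = None  # child node as a list of entries, None in leaves


class CountingDistance:
    """Euclidean distance that counts how often it is evaluated."""
    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: Point, b: Point) -> float:
        self.calls += 1
        return math.dist(a, b)


def range_query(node, q, r, dist, d_q_parent=None, out=None):
    """Collect all objects within distance r of q."""
    if out is None:
        out = []
    for e in node:
        # Cheap test first: uses only stored distances, no new computation.
        if d_q_parent is not None and abs(d_q_parent - e.dist_to_parent) > r + e.radius:
            continue
        d = dist(q, e.obj)
        if e.child is None:
            if d <= r:
                out.append(e.obj)
        elif d <= r + e.radius:  # the query ball intersects the covered region
            range_query(e.child, q, r, dist, d, out)
    return out


leaf_a = [Entry((0.0, 0.0), 0.0), Entry((1.0, 0.0), 1.0), Entry((0.0, 2.0), 2.0)]
leaf_b = [Entry((10.0, 0.0), 0.0), Entry((10.0, 1.0), 1.0)]
root = [Entry((0.0, 0.0), 0.0, 2.0, leaf_a), Entry((10.0, 0.0), 0.0, 1.0, leaf_b)]

dist = CountingDistance()
result = range_query(root, (0.0, 2.5), 1.0, dist)
```

In this toy tree the query evaluates the distance function three times (twice at the root, once in the first leaf), whereas a linear scan over the five stored objects would need five evaluations.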

Note that Euclidean distance is not the only way to measure distance. A distance function can be used in the M-Tree only if it satisfies non-negativity, symmetry and the triangle inequality (together with identity, meaning a distance of 0 only between identical objects, these are the postulates of a metric).
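As a rough illustration, these requirements can be spot-checked on a handful of sample points. The helper `is_metric_on` below is hypothetical, and it also tests identity (d(x, y) = 0 exactly when x = y), which the usual definition of a metric requires as well. A finite sample can only expose a violation; passing the check does not prove that a function is a metric.

```python
import itertools
import math


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_metric_on(sample, dist, tol=1e-9):
    """Return False if `dist` violates a metric postulate on `sample`."""
    for x, y in itertools.product(sample, repeat=2):
        d = dist(x, y)
        if d < 0:                        # non-negativity
            return False
        if abs(d - dist(y, x)) > tol:    # symmetry
            return False
        if (x == y) != (d <= tol):       # identity
            return False
    for x, y, z in itertools.product(sample, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:  # triangle inequality
            return False
    return True


sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
euclidean_ok = is_metric_on(sample, math.dist)
manhattan_ok = is_metric_on(sample, manhattan)
squared_ok = is_metric_on(sample, squared_euclidean)
```

On this sample, Euclidean and Manhattan distance pass, while squared Euclidean distance fails: the points (0, 0), (1, 0) and (2, 0) give 4 > 1 + 1, which breaks the triangle inequality and would make the pruning rule unsafe.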

Several useful materials are listed below:

1. http://www-db.deis.unibo.it/Mtree/ (the most relevant items from it are listed below)

2. P. Ciaccia, M. Patella, F. Rabitti, and P. Zezula. Indexing metric spaces with M-tree. In Atti del Quinto Convegno Nazionale SEBD, Verona, Italy, June 1997.

3. P. Ciaccia and M. Patella. Bulk loading the M-tree. In Proceedings of the 9th Australasian Database Conference (ADC'98), Perth, Australia, February 1998.

4. M. Patella. Similarity Search in Multimedia Databases. PhD thesis, Dipartimento di Elettronica Informatica e Sistemistica, Università degli Studi di Bologna, Bologna, Italy, February 1999.

