Time/Space Tradeoff - Hashing



Introduction
Hashing is a standard way of implementing the abstract data type “dictionary”, which is a set with the operations of searching, insertion, deletion, initialisation and rehashing. The elements of this set can be anything: numbers, characters, character strings, and so on.

Each element is identified by its key. Hashing is based on the idea of distributing keys among a one-dimensional array H[0..m-1] called a hash table. The distribution is done by a hash function, which assigns to each key an integer between 0 and m-1, called the hash address: h: key -> {0..m-1}

Hash Function
If keys are integers and the size of the hash table is m, we can define h(n) = n mod m. Hash functions can also be applied to characters and strings.

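Suppose strings are made up of the 26 uppercase letters and the hash table has size m = 101. We can assign each letter a 5-bit code by numbering the letters in order, as the example below implies: A = 00000 (0), B = 00001 (1), ..., Z = 11001 (25).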
We can then view a string as one long binary string, and that binary string as a decimal number:
L I N : 01011 01000 01101 = 11533
h(LIN) = 11533 mod 101 = 19
Hence 19 is the position of the string LIN in the hash table.
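
The calculation above can be reproduced with a short Python sketch. It assumes the letter codes are A = 0, B = 1, ..., Z = 25 (consistent with the codes shown for L, I and N); the function name string_hash is hypothetical and chosen only for illustration.

```python
def string_hash(word: str, m: int = 101) -> int:
    """Read the 5-bit letter codes of `word` as one binary number, then take it mod m."""
    value = 0
    for ch in word:
        code = ord(ch) - ord("A")        # assumed coding: A=0, B=1, ..., Z=25
        if not 0 <= code < 26:
            raise ValueError(f"unsupported character: {ch!r}")
        value = (value << 5) | code      # append the next 5 bits
    return value % m


print(string_hash("LIN"))  # 11533 mod 101 = 19
```

For long strings the intermediate number grows large; Python integers handle this, but in other languages the modulus would typically be applied after each character.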

Collision
In some cases, different keys may be mapped to the same hash table address. When this occurs we have a collision.

For example, assume m = 23 and h(k) = k mod m. Then h(19) = 19 and h(663) = 663 mod 23 = 19, so both 19 and 663 map to position 19 in the hash table. How do we handle collisions? Separate chaining, linear probing and double hashing can all be used.

Separate Chaining
Each cell of the hash table holds a list (a chain) of all the keys that share the same hash value.
We define the load factor α = n/m, where n is the number of items stored in the hash table and m is the size of the hash table. On average, assuming the keys are spread evenly over the cells:
Number of probes (comparisons) in a successful search ≈ 1 + α/2.
Number of probes in an unsuccessful search ≈ α.
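
A minimal sketch of separate chaining in Python may help here. It assumes non-negative integer keys and h(k) = k mod m with m = 23, as in the collision example; the class name ChainedHashTable and its method names are hypothetical. The keys are the ones used in this post's examples.

```python
class ChainedHashTable:
    """Hash table with separate chaining for non-negative integer keys."""

    def __init__(self, m: int = 23) -> None:
        self.m = m
        self.cells = [[] for _ in range(m)]   # one chain (list) per cell
        self.n = 0                            # number of stored keys

    def _h(self, key: int) -> int:
        return key % self.m

    def insert(self, key: int) -> None:
        chain = self.cells[self._h(key)]
        if key not in chain:                  # keep the table a set
            chain.append(key)
            self.n += 1

    def search(self, key: int) -> bool:
        return key in self.cells[self._h(key)]

    def delete(self, key: int) -> bool:
        chain = self.cells[self._h(key)]
        if key in chain:
            chain.remove(key)
            self.n -= 1
            return True
        return False

    def load_factor(self) -> float:
        return self.n / self.m


table = ChainedHashTable(m=23)
for k in (19, 392, 179, 663, 639, 321):
    table.insert(k)
print(table.cells[19])       # [19, 663]
print(table.load_factor())   # 6/23
```

Colliding keys simply share a chain, so the table never fills up; the cost is that a search has to scan the chain, which is why the probe counts depend on α.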

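Linear Probing
If a collision happens, try the next cell, then the next, and so on. Take the keys 19, 392, 179, 663, 639, 321, whose hash values with m = 23 are 19, 1, 18, 19, 18, 22, and insert them in that order: 19 goes to cell 19, 392 to cell 1 and 179 to cell 18; 663 finds cell 19 occupied and moves to cell 20; 639 finds cells 18, 19 and 20 occupied and lands in cell 21; 321 goes to cell 22.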
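Note that if we get to the end of the hash table, we just wrap around. For example, inserting key 20 after the keys above would probe cells 20, 21 and 22, find them all occupied, and wrap around to cell 0.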
Number of probes in a successful search ≈ 1/2 + 1/(2(1-α)).
Number of probes in an unsuccessful search ≈ 1/2 + 1/(2(1-α)^2).
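
The insertion sequence above can be traced with a small Python sketch. It assumes h(k) = k mod m, a table of size 23 and no deletions; the function names are hypothetical. Key 20 is added at the end only to show the wrap-around.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key using h(k) = k mod m and linear probing; return the cell used."""
    m = len(table)
    pos = key % m
    for _ in range(m):
        if table[pos] is None:
            table[pos] = key
            return pos
        pos = (pos + 1) % m          # next cell, wrapping around at the end
    raise RuntimeError("hash table is full")


def linear_probe_search(table: list, key: int) -> int:
    """Return the cell holding key, or -1 if it is absent (assumes no deletions)."""
    m = len(table)
    pos = key % m
    for _ in range(m):
        if table[pos] is None:       # an empty cell ends the probe sequence
            return -1
        if table[pos] == key:
            return pos
        pos = (pos + 1) % m
    return -1


table = [None] * 23
placed = {k: linear_probe_insert(table, k) for k in (19, 392, 179, 663, 639, 321, 20)}
print(placed)
# {19: 19, 392: 1, 179: 18, 663: 20, 639: 21, 321: 22, 20: 0}
```

The run of occupied cells from 18 to 22 is a cluster: every new key hashing into it has to walk to its end, which is why the probe counts grow quickly as α approaches 1.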

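Double Hashing
Double hashing uses a second hash function s(k) to decide the step size when a collision occurs. For example, s(k) = 1 + k mod 23. If cell h(k) is occupied, try h(k)+s(k), then h(k)+2s(k), and so on, with all positions taken modulo m. The step s(k) must never be a multiple of m, otherwise the probe would never leave cell h(k); if m is also prime, the probe sequence eventually visits every cell.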
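
Here is a hedged Python sketch of insertion with double hashing. It assumes m = 23 and h(k) = k mod m. Note that with m = 23 the example step s(k) = 1 + k mod 23 can evaluate to 23, which is 0 modulo m, so this sketch swaps in s(k) = 1 + k mod (m - 1) instead; that choice, like the function name, is an assumption made for illustration rather than the post's own definition.

```python
def double_hash_insert(table: list, key: int) -> int:
    """Insert key with double hashing; return the cell used."""
    m = len(table)                   # m is assumed to be prime
    pos = key % m                    # h(k)
    step = 1 + key % (m - 1)         # assumed s(k): always between 1 and m-1
    for _ in range(m):
        if table[pos] is None:
            table[pos] = key
            return pos
        pos = (pos + step) % m       # h(k) + s(k), h(k) + 2s(k), ... (mod m)
    raise RuntimeError("hash table is full")


table = [None] * 23
placed = {k: double_hash_insert(table, k) for k in (19, 392, 179, 663, 639, 321)}
print(placed)
# {19: 19, 392: 1, 179: 18, 663: 0, 639: 20, 321: 22}
```

Compared with linear probing on the same keys, 663 and 639 jump away from the occupied cells around 18 and 19 instead of extending the cluster.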

Rabin-Karp String Search
The Rabin-Karp String Search algorithm is based on string hashing.

To search for a string p of length m in a larger string s, we compute hash(p) and then check every substring s[i..i+m-1] to see whether it has the same hash value. Even if the hash values match, the strings may still be different, so we then need to compare them character by character. (Here m is the length of p, not the size of a hash table.)
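
The idea can be sketched in Python as follows. The sketch uses a rolling hash, the usual way to avoid rehashing each substring from scratch in Rabin-Karp; the base 256, the modulus 1_000_003 and the function name rabin_karp are arbitrary choices assumed for illustration.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, q: int = 1_000_003) -> list:
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, q)       # weight of the window's leading character
    hp = ht = 0
    for i in range(m):               # hash of the pattern and of the first window
        hp = (hp * base + ord(pattern[i])) % q
        ht = (ht * base + ord(text[i])) % q
    matches = []
    for i in range(n - m + 1):
        # Equal hashes can be a coincidence, so confirm with a direct comparison.
        if hp == ht and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            ht = ((ht - ord(text[i]) * high) * base + ord(text[i + m])) % q
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Each window's hash is obtained from the previous one in constant time, so the expensive character-by-character comparison only runs when the hashes agree.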

When to use hashing
Hashing suits all sorts of information retrieval applications involving thousands to millions of keys.

PS
This post covers only the fundamentals of hashing. You can find more at https://en.wikipedia.org/wiki/Hash_function Good luck.
Questions are always welcome.
