Time/Space Tradeoff - Hashing

Introduction
Hashing is a standard way of implementing the abstract data type “dictionary”: a set of elements with the operations search, insertion and deletion, together with initialisation and rehashing. The elements can be anything, such as numbers, characters or character strings.

The elements of this set are called keys. Hashing is based on the idea of distributing the keys among a one-dimensional array H[0..m-1] called a hash table. The distribution is done by a hash function, which assigns to each key an integer between 0 and m-1 called its hash address: h: key -> {0, 1, ..., m-1}.

Hash Function
If the keys are integers and the hash table has size m, a common choice is h(n) = n mod m. Hash functions can also be defined for characters and strings.

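Suppose strings are made of the 26 uppercase letters and the hash table size is m = 101. We code each letter as a 5-bit binary number: A = 00000, B = 00001, ..., Z = 11001 (so L = 01011, I = 01000, N = 01101).

Concatenating the codes turns a string into one long binary number, which we then read as a decimal number: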
L I N : 01011 01000 01101 = 11533
h(LIN) = 11533 mod 101 = 19
Hence, 19 is the position of string LIN in the hash table.
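The LIN calculation can be reproduced with a short sketch. It assumes strings contain only the uppercase letters A to Z, coded A = 0 to Z = 25 as in the example; the function names are hypothetical, chosen for illustration.

```python
def letter_code(ch):
    """5-bit code of an uppercase letter: A -> 0, B -> 1, ..., Z -> 25."""
    if not "A" <= ch <= "Z":
        raise ValueError("this sketch only handles letters A-Z")
    return ord(ch) - ord("A")

def string_to_int(s):
    """Concatenate the 5-bit codes of s into one binary number."""
    value = 0
    for ch in s:
        value = (value << 5) | letter_code(ch)  # shift left 5 bits, append code
    return value

def string_hash(s, m=101):
    """Hash address of s: the concatenated number mod m."""
    return string_to_int(s) % m

def string_hash_horner(s, m=101):
    """Same address, but reducing mod m after every letter (Horner's rule)."""
    h = 0
    for ch in s:
        h = (h * 32 + letter_code(ch)) % m
    return h

print(string_to_int("LIN"))       # 11533
print(string_hash("LIN"))         # 19
print(string_hash_horner("LIN"))  # 19
```

For long strings the concatenated number grows very large. Because (a · 32 + c) mod m = ((a mod m) · 32 + c) mod m, reducing after each letter, as `string_hash_horner` does, gives the same address while keeping every intermediate value below 32m.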

Collision
In some cases, different keys may be mapped to the same hash table address. When this occurs we have a collision.

For example, let m = 23 and h(k) = k mod m.

Both 19 and 663 hash to address 19, since 663 = 28 × 23 + 19. How do we resolve collisions? Three common methods are separate chaining, linear probing and double hashing.
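To see which keys collide, we can compute h(k) = k mod 23 for each key and group keys that share an address. This sketch uses the six keys from the linear probing example; `hash_addresses` is a hypothetical helper name.

```python
def hash_addresses(keys, m=23):
    """Map each key to h(k) = k mod m and group keys sharing an address."""
    buckets = {}
    for k in keys:
        buckets.setdefault(k % m, []).append(k)
    return buckets

keys = [19, 392, 179, 663, 639, 321]
buckets = hash_addresses(keys)
# Keep only addresses that received more than one key, i.e. collisions.
collisions = {addr: ks for addr, ks in buckets.items() if len(ks) > 1}
print(collisions)  # {19: [19, 663], 18: [179, 639]}
```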

Separate Chaining
Each cell of the hash table holds a list (chain) of all the keys whose hash address is that cell. In the example above, cell 19 holds both 19 and 663.

We define the load factor α = n/m, where n is the number of keys stored in the hash table and m is its size. With chaining, α is the average chain length and may exceed 1.
Number of probes (key comparisons) in a successful search ≈ 1 + α/2.
Number of probes in an unsuccessful search ≈ α.
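Here is a minimal separate-chaining table, assuming integer keys, h(k) = k mod m, and Python lists standing in for the linked lists; the class and method names are illustrative, not from any library.

```python
class ChainedHashTable:
    """Separate chaining: each cell holds a list of keys with that address."""

    def __init__(self, m=23):
        self.m = m
        self.table = [[] for _ in range(m)]  # one empty chain per cell
        self.n = 0                           # number of stored keys

    def _h(self, key):
        return key % self.m

    def insert(self, key):
        chain = self.table[self._h(key)]
        if key not in chain:       # a dictionary stores each key once
            chain.append(key)
            self.n += 1

    def search(self, key):
        # Only the one chain at h(key) is scanned.
        return key in self.table[self._h(key)]

    def delete(self, key):
        chain = self.table[self._h(key)]
        if key in chain:
            chain.remove(key)
            self.n -= 1

    def load_factor(self):
        return self.n / self.m     # alpha = n / m

t = ChainedHashTable(23)
for k in [19, 392, 179, 663, 639, 321]:
    t.insert(k)
print(t.table[19], t.table[18])  # [19, 663] [179, 639]
```

A search scans only one chain, whose average length is α, which is where the 1 + α/2 and α estimates come from.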

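Linear Probing
If a collision happens, try the next cell, then the next, and so on, until an empty cell is found. Insert 19, 392, 179, 663, 639 and 321 (hash addresses 19, 1, 18, 19, 18 and 22 with m = 23) into the hash table: 663 finds cell 19 occupied and goes to cell 20, and 639 finds cells 18, 19 and 20 occupied and goes to cell 21. The final table holds 392 in cell 1, 179 in cell 18, 19 in cell 19, 663 in cell 20, 639 in cell 21 and 321 in cell 22.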

If we reach the end of the hash table, we wrap around to the beginning: with m = 23, the cell after cell 22 is cell 0.
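The probing rule, including the wrap-around, fits in a few lines. This sketch assumes integer keys, h(k) = k mod m, and a Python list with `None` marking empty cells; `lp_insert` and `lp_search` are hypothetical names.

```python
def lp_insert(table, key):
    """Insert key by linear probing; return the cell it ends up in."""
    m = len(table)
    idx = key % m                  # start at h(k) = k mod m
    for _ in range(m):             # at most m cells to try
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
        idx = (idx + 1) % m        # next cell, wrapping from m-1 to 0
    raise RuntimeError("hash table is full")

def lp_search(table, key):
    """Return the cell holding key, or -1 if it is absent."""
    m = len(table)
    idx = key % m
    for _ in range(m):
        if table[idx] is None:     # an empty cell ends the search
            return -1
        if table[idx] == key:
            return idx
        idx = (idx + 1) % m
    return -1

table = [None] * 23
for k in [19, 392, 179, 663, 639, 321]:
    lp_insert(table, k)
print(table[18:23])  # [179, 19, 663, 639, 321]
```

Inserting 45 afterwards (45 mod 23 = 22) finds cell 22 taken and wraps to cell 0. Deletion is left out on purpose: simply emptying a cell would cut the probe sequence of keys stored after it, which is why linear probing usually uses lazy deletion (marking cells as deleted).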
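Number of probes in a successful search ≈ (1/2)(1 + 1/(1 - α)).
Number of probes in an unsuccessful search ≈ (1/2)(1 + 1/(1 - α)^2).
These approximations require α < 1, since each cell holds at most one key, and both grow rapidly as α approaches 1.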
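To see how quickly linear probing degrades, we can evaluate the approximations above, reading them as S ≈ (1/2)(1 + 1/(1 - α)) and U ≈ (1/2)(1 + 1/(1 - α)^2), next to the chaining estimates 1 + α/2 and α. The function names are illustrative only.

```python
def chaining_probes(alpha):
    """(successful, unsuccessful) probe estimates for separate chaining."""
    return 1 + alpha / 2, alpha

def linear_probing_probes(alpha):
    """(successful, unsuccessful) probe estimates for linear probing."""
    if not 0 <= alpha < 1:
        raise ValueError("linear probing needs 0 <= alpha < 1")
    s = 0.5 * (1 + 1 / (1 - alpha))
    u = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return s, u

for alpha in (0.5, 0.75, 0.9):
    s, u = linear_probing_probes(alpha)
    print(alpha, chaining_probes(alpha), round(s, 2), round(u, 2))
```

At α = 0.5 linear probing needs about 1.5 and 2.5 probes, but at α = 0.9 the unsuccessful estimate rises to about 50.5. This is the time/space tradeoff in action: keeping the table sparser costs memory but keeps searches short.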

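Double Hashing
Double hashing uses a second hash function s(k) to choose the step size after a collision. If cell h(k) is occupied, try (h(k) + s(k)) mod m, then (h(k) + 2s(k)) mod m, and so on. The step s(k) must never be 0 (or a multiple of m) and should be relatively prime to m so that the probe sequence can reach every cell; choosing m prime guarantees this. For example, with m = 23 we can take s(k) = 1 + (k mod 22), which always gives a step between 1 and 22.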
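A sketch of double hashing, assuming a prime table size m, h(k) = k mod m and s(k) = 1 + (k mod (m - 1)). For m = 23 this is s(k) = 1 + (k mod 22), chosen so the step is always between 1 and 22 and never equals m. The names `dh_insert` and `dh_search` are hypothetical.

```python
def dh_insert(table, key):
    """Insert key with double hashing; return the cell it ends up in."""
    m = len(table)                 # assumed prime
    h = key % m                    # primary hash h(k)
    s = 1 + key % (m - 1)          # step s(k), always in 1..m-1
    for i in range(m):             # m distinct cells when m is prime
        idx = (h + i * s) % m      # h(k), h(k)+s(k), h(k)+2s(k), ... mod m
        if table[idx] is None or table[idx] == key:
            table[idx] = key
            return idx
    raise RuntimeError("no free cell found")

def dh_search(table, key):
    """Follow the same probe sequence; return the cell or -1."""
    m = len(table)
    h, s = key % m, 1 + key % (m - 1)
    for i in range(m):
        idx = (h + i * s) % m
        if table[idx] is None:
            return -1
        if table[idx] == key:
            return idx
    return -1

table = [None] * 23
for k in [19, 392, 179, 663, 639, 321]:
    dh_insert(table, k)
print(table[0], table[20])  # 663 639
```

Here 663 collides at cell 19 and, with step s(663) = 4, moves to (19 + 4) mod 23 = 0; 639 collides at cell 18 and, with step 2, moves to cell 20. Unlike linear probing, the two colliding keys jump to different places instead of piling up in neighbouring cells.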

Rabin-Karp String Search
The Rabin-Karp String Search algorithm is based on string hashing.

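To search for a pattern p of length m in a longer text s, we compute hash(p) and then check every substring s[i..i+m-1] to see whether it has the same hash value. Equal hash values do not guarantee equal strings, so on a match we still compare the two strings character by character. The method is efficient because the hash of each window can be updated from the previous window's hash in constant time (a rolling hash) instead of being recomputed from scratch.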
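A minimal Rabin-Karp sketch with a polynomial rolling hash. The base (256) and modulus (101) are assumptions for illustration, not values from the text; a larger prime modulus makes false hash matches rarer.

```python
def rabin_karp(text, pattern, base=256, mod=101):
    """Return every index where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:            # empty pattern treated as no match here
        return []
    high = pow(base, m - 1, mod)   # weight of the window's first character
    p_hash = w_hash = 0
    for i in range(m):             # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be a false positive, so compare the strings too.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches

print(rabin_karp("ABRACADABRA", "ABRA"))  # [0, 7]
```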

When to Use Hashing
Hashing suits all sorts of information retrieval applications involving thousands to millions of keys, because search, insertion and deletion take constant time on average.

Final Words (PS)
This post covers only the fundamentals of hashing. You can find more at https://en.wikipedia.org/wiki/Hash_function
Good luck, and questions are always welcome.

UoM_XiaoShuaiShuai
