Hashtable and HashMap Related Questions

1 Hash Function

(1) it always returns a number for an object

(2) two equal objects will always have the same number

(3) two unequal objects do not always have different numbers


One way to implement this in Java is hashCode(). The hashCode() method is defined in the Object class, so every class in Java inherits it. The hash code provides a numeric representation of an object (somewhat like the toString() method, which gives a text representation of an object).
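To make properties (2) and (3) concrete, here is a minimal sketch of a class that overrides equals() and hashCode() together. The class name Point and its fields are hypothetical, chosen only for illustration.

```java
import java.util.Objects;

// Hypothetical example class; the name Point and its fields are illustrative.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Built from the same fields as equals(), so equal points
    // always produce the same hash code (property 2).
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

Two Point objects with the same coordinates are equal and therefore share a hash code. Property (3) shows up even in the standard library: the unequal strings "Aa" and "BB" both have hash code 2112.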

A hash function converts a string (or a key of any other type) into an integer that is at least zero and smaller than the hash size. The goal when designing a hash function is to scatter the keys as irregularly as possible, so that similar keys do not cluster on the same index. A good hash function keeps collisions to a minimum. A widely used algorithm takes the magic number 33 and treats any string as a big integer in base 33, as follows:

Here is one way to write a hash function, using ASCII values and the magic number:


hashcode("abcd") = (ascii(a) * 33^3 + ascii(b) * 33^2 + ascii(c) * 33 + ascii(d)) % HASH_SIZE

= (97 * 33^3 + 98 * 33^2 + 99 * 33 + 100) % HASH_SIZE

= 3595978 % HASH_SIZE


Here HASH_SIZE is the capacity of the hash table (you can think of a hash table as an array with indices 0 to HASH_SIZE - 1).

Given a string as a key and the size of the hash table, return the hash value of this key.


Example
For key="abcd" and size=100, return 78

In the code, watch out for overflow: take the result mod HASH_SIZE at every step.

public int hashCode(char[] key, int HASH_SIZE) {
    final int base = 33;  // the magic number
    long res = 0;         // long, so res * base + key[i] cannot overflow
    for (int i = 0; i < key.length; ++i) {
        // Horner's rule: fold in one character per step and reduce
        // modulo HASH_SIZE right away to keep the value small.
        res = (res * base + key[i]) % HASH_SIZE;
    }
    return (int) res;
}


2 Collision & Rehash Question from LintCode

Collision: different objects (as judged by the equals() method) may have the same hash code.

Two ways to resolve collisions:

(1) Separate chaining: use linked lists, so the hash table is an array of lists.

(2) Linear probing: if the key cannot be inserted at index k, try the next slot k + 1; if that is occupied too, go on to k + 2, and so on.
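Linear probing can be sketched as follows. This is a minimal illustration under assumptions, not a full implementation: the class name LinearProbingSet is hypothetical, the capacity is fixed, and removal is not handled.

```java
// Hypothetical minimal open-addressing set of ints; names and the
// fixed capacity are illustrative, and removal is not handled.
final class LinearProbingSet {
    private final Integer[] slots;

    LinearProbingSet(int capacity) {
        slots = new Integer[capacity];
    }

    // Returns the index where key ends up, or -1 if the table is full.
    int insert(int key) {
        int start = Math.floorMod(key, slots.length);
        for (int step = 0; step < slots.length; step++) {
            int k = (start + step) % slots.length; // wrap around at the end
            if (slots[k] == null || slots[k] == key) {
                slots[k] = key;
                return k;
            }
        }
        return -1;
    }
}
```

With capacity 4, inserting 9 lands at index 1; inserting 21 also hashes to 1, finds it occupied, and moves on to index 2.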


Rehash question from LintCode

This problem mainly introduces rehashing and how a hash table stores its data. A hash table is really an array of singly linked lists (ListNode)!


The size of the hash table is not fixed at the very beginning. If the total number of keys becomes too large (e.g. size >= capacity / 10), we should double the capacity of the hash table and rehash every key. Say you have a hash table that looks like this:

size=3, capacity=4
[null, 21->9->null, 14->null, null]

The hash function is:

int hashcode(int key, int capacity) {
    return key % capacity;
}

Here we have three numbers, 9, 14 and 21, where 21 and 9 share the same position because they have the same hash code 1 (21 % 4 = 9 % 4 = 1). We store them in the hash table as a linked list. Rehashing this hash table with doubled capacity gives:

size=3, capacity=8
index: 0 1 2 3 4 5 6 7
hash table: [null, 9, null, null, null, 21, 14, null]

Given the original hash table, return the new hash table after rehashing.

Note
For a negative integer in the hash table, the position can be calculated as follows:

In C++/Java, calculating -4 % 3 directly gives -1. You can use (a % b + b) % b instead of a % b to get a non-negative result.

In Python, -1 % 3 gives 2 automatically.
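In Java, the adjustment from the note can be written as a small helper. The class and method names here are hypothetical, and the sketch assumes b is positive and small enough that a % b + b does not overflow. The standard library's Math.floorMod gives the same result for positive b.

```java
// Hypothetical helper; assumes b > 0.
final class NonNegativeMod {
    // Manual formula from the note: shift the remainder into [0, b).
    static int mod(int a, int b) {
        return (a % b + b) % b;
    }
}
```

For example, NonNegativeMod.mod(-4, 3) and Math.floorMod(-4, 3) both give 2, whereas -4 % 3 gives -1.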

Example
Given [null, 21->9->null, 14->null, null], return [null, 9->null, null, null, null, 21->null, 14->null, null]


The idea is simple, so the code is omitted.
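For readers who want to see it anyway, one possible sketch of the rehash is below. The ListNode type is a hypothetical stand-in for the singly linked list node described above, and the choice to append each key to the tail of its new list is an assumption made so that the output matches the example's ordering.

```java
final class Rehash {
    // Hypothetical node type, modeled on the singly linked list described above.
    static final class ListNode {
        final int val;
        ListNode next;
        ListNode(int val) { this.val = val; }
    }

    // Builds a new table of double capacity and re-inserts every key.
    static ListNode[] rehashing(ListNode[] table) {
        int newCapacity = table.length * 2;
        ListNode[] result = new ListNode[newCapacity];
        for (ListNode head : table) {
            for (ListNode node = head; node != null; node = node.next) {
                // floorMod keeps the index non-negative for negative keys.
                int index = Math.floorMod(node.val, newCapacity);
                ListNode copy = new ListNode(node.val);
                if (result[index] == null) {
                    result[index] = copy;
                } else {
                    // Append at the tail to preserve insertion order.
                    ListNode tail = result[index];
                    while (tail.next != null) tail = tail.next;
                    tail.next = copy;
                }
            }
        }
        return result;
    }
}
```

Applied to [null, 21->9->null, 14->null, null], this places 9 at index 1, 21 at index 5, and 14 at index 6 of the new table of capacity 8.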


A hash table offers expected constant-time performance for add, remove, contains, and size.

Collision: in the worst case all keys collide at the same index, and finding one of them means searching a list (linear time).

How to guarantee expected constant time?

Make sure the lists do not become too long. This is usually done with a load factor.

The load factor tracks the average length of the lists (number of entries divided by number of buckets). When it approaches a threshold set in advance, create a bigger array and rehash all elements from the old table into the new one.
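The load-factor check might look roughly like the sketch below. Everything here is an assumption for illustration: the class name ChainedIntSet, the initial capacity of 4, the threshold of 0.75, and the doubling policy.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical separate-chaining set of ints that resizes on a load factor.
final class ChainedIntSet {
    private static final double MAX_LOAD_FACTOR = 0.75; // assumed threshold
    private List<LinkedList<Integer>> buckets = newBuckets(4);
    private int size = 0;

    private static List<LinkedList<Integer>> newBuckets(int capacity) {
        List<LinkedList<Integer>> b = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) b.add(new LinkedList<>());
        return b;
    }

    int capacity() { return buckets.size(); }
    int size() { return size; }

    boolean contains(int key) {
        return buckets.get(Math.floorMod(key, buckets.size())).contains(key);
    }

    boolean add(int key) {
        if (contains(key)) return false;
        // Grow before inserting if the new entry would exceed the threshold.
        if ((double) (size + 1) / buckets.size() > MAX_LOAD_FACTOR) resize();
        buckets.get(Math.floorMod(key, buckets.size())).add(key);
        size++;
        return true;
    }

    // Doubles the bucket array and rehashes every stored key into it.
    private void resize() {
        List<LinkedList<Integer>> old = buckets;
        buckets = newBuckets(old.size() * 2);
        for (LinkedList<Integer> chain : old) {
            for (int key : chain) {
                buckets.get(Math.floorMod(key, buckets.size())).add(key);
            }
        }
    }
}
```

Starting from capacity 4, the first three insertions fit within the threshold; the fourth would push the load factor to 1.0, so the table doubles to capacity 8 first.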

