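ch8 - Hash Table

Supported operations: O(1) insert, O(1) lookup, and O(1) delete (average case).
In Java, know the differences between Hashtable (thread-safe, uses locking), HashMap (not thread-safe), and HashSet (keys only, no values).

Contents:

1. Hash function (know the common hash functions and how open hashing resolves collisions)
1.2 hash-function (128 in lintcode)
1.3 strstr-ii, string search (594 in lintcode) (★★★★★)
2. Rehashing
2.2 rehashing (129 in lintcode) (★★★★★)
2.3 LRU-cache, LRU caching policy (134 in lintcode) (★★★★★)
2.4 Other related problems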

1. Hash Function

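1). The hash function maps a key to an array index, and accessing an array by index is O(1).

2). The number of keys actually stored should be much smaller than the size of the hash array, ideally by a factor of 10 or more.

3). Well-known hash algorithms such as MD5, SHA-1, and SHA-2 are cryptographic hashes, used for things like integrity checks and signatures. They are not used in hash tables because they are too expensive to compute.

4). A commonly used hash function, for a string "abc" with characters a, b, c:

 hash("abc") = a * 31^2 + b * 31^1 + c * 31^0, taken modulo the table size

a. Taking the modulus at every step does not change the result for addition, subtraction, and multiplication (division is the exception). To avoid overflow, take the modulus while multiplying; do not call pow directly.

b. 31 is an empirical choice. Other values work, but multiplying by 31 tends to behave well. A prime is preferable; too large a multiplier slows the computation, and too small a multiplier causes too many collisions. Apache's underlying libraries, for example, use 33.

5). Integers and doubles: treat each byte as a character. An integer is 4 bytes, so hash it as if it were 4 characters.

6). Resolving hash collisions: even the best hash function will produce collisions.

**Two approaches:** Open Hashing vs Closed Hashing (both links have animations)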
https://www.cs.usfca.edu/~galles/visualization/ClosedHash.html
https://www.cs.usfca.edu/~galles/visualization/OpenHash.html

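A. Closed hashing - "take the slot" (also known as open addressing)

On deletion: the slot cannot simply be emptied, because that would cut off the probe sequence for keys stored after it. It is usually marked as deleted instead.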
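
A minimal sketch of closed hashing, assuming linear probing and a "deleted" marker (tombstone) per slot. The class name ProbingSet and its methods are hypothetical, chosen only for illustration; the notes do not prescribe a particular probing scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Closed hashing (open addressing) with linear probing for int keys.
class ProbingSet {
    enum State { EMPTY, USED, DELETED };
    std::vector<int> keys;
    std::vector<State> state;

    std::size_t slot(int key) const {
        return static_cast<std::size_t>(key) % keys.size();
    }

public:
    explicit ProbingSet(std::size_t capacity)
        : keys(capacity, 0), state(capacity, EMPTY) {
        assert(capacity > 0);
    }

    bool contains(int key) const {
        std::size_t i = slot(key);
        for (std::size_t step = 0; step < keys.size(); ++step) {
            if (state[i] == EMPTY) return false;   // end of the probe sequence
            if (state[i] == USED && keys[i] == key) return true;
            i = (i + 1) % keys.size();             // tombstones are skipped
        }
        return false;
    }

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        if (contains(key)) return false;
        std::size_t i = slot(key);
        for (std::size_t step = 0; step < keys.size(); ++step) {
            if (state[i] != USED) {                // EMPTY and DELETED slots are reusable
                keys[i] = key;
                state[i] = USED;
                return true;
            }
            i = (i + 1) % keys.size();
        }
        return false;
    }

    // Returns false if the key is not present.
    bool erase(int key) {
        std::size_t i = slot(key);
        for (std::size_t step = 0; step < keys.size(); ++step) {
            if (state[i] == EMPTY) return false;
            if (state[i] == USED && keys[i] == key) {
                state[i] = DELETED;                // leave a tombstone, not EMPTY
                return true;
            }
            i = (i + 1) % keys.size();
        }
        return false;
    }
};
```

With capacity 8, the keys 1, 9, and 17 all start probing at slot 1. Erasing 9 leaves a tombstone in slot 2, so a lookup for 17 still walks past it to slot 3.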

B. Open hashing - chaining (more common). Implementation: a hash function plus basic linked-list operations.

Each array slot is the head of a linked list.
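
The chaining idea can be sketched as follows. This is an illustrative set of int keys, not a full hash map; ChainedSet and its member names are hypothetical, and the hash is simply the key modulo the capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Open hashing (chaining): every bucket is the head of a singly linked list.
class ChainedSet {
    struct Node {
        int key;
        Node* next;
    };
    std::vector<Node*> buckets;

    std::size_t index(int key) const {
        long long cap = static_cast<long long>(buckets.size());
        return static_cast<std::size_t>((key % cap + cap) % cap);  // non-negative index
    }

public:
    explicit ChainedSet(std::size_t capacity) : buckets(capacity, nullptr) {
        assert(capacity > 0);
    }
    ChainedSet(const ChainedSet&) = delete;
    ChainedSet& operator=(const ChainedSet&) = delete;
    ~ChainedSet() {
        for (Node* head : buckets) {
            while (head) {
                Node* next = head->next;
                delete head;
                head = next;
            }
        }
    }

    bool contains(int key) const {
        for (Node* cur = buckets[index(key)]; cur; cur = cur->next) {
            if (cur->key == key) return true;
        }
        return false;
    }

    // Inserts at the head of the chain; returns false if already present.
    bool insert(int key) {
        if (contains(key)) return false;
        std::size_t i = index(key);
        buckets[i] = new Node{key, buckets[i]};
        return true;
    }

    // Unlinks the node from its chain; returns false if not found.
    bool erase(int key) {
        Node** link = &buckets[index(key)];
        while (*link) {
            if ((*link)->key == key) {
                Node* dead = *link;
                *link = dead->next;
                delete dead;
                return true;
            }
            link = &(*link)->next;
        }
        return false;
    }
};
```

Unlike closed hashing, deletion here needs no special marker: removing a node only relinks its own chain.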

1.2 hash-function (128 in lintcode)

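1). Problem

http://www.lintcode.com/zh-cn/problem/hash-function/
http://www.jiuzhang.com/solution/hash-function/

In data structures, a hash function converts a string (or any other type) into an integer that is at least zero and less than the size of the hash table. A good hash function produces as few collisions as possible. One widely used algorithm uses the value 33 and treats any string as a large base-33 integer, for example:

hashcode("abcd") = (ascii(a) * 33^3 + ascii(b) * 33^2 + ascii(c) * 33 + ascii(d)) % HASH_SIZE
                 = (97 * 33^3 + 98 * 33^2 + 99 * 33 + 100) % HASH_SIZE
                 = 3595978 % HASH_SIZE

Here HASH_SIZE is the size of the hash table (you can assume a hash table is an array indexed 0 ~ HASH_SIZE-1).
Given a string as the key and the size of the hash table, return the hash value of the string.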
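
As a worked check, the polynomial for "abcd" can be written in nested form (Horner's rule). This is what allows the hash to be built one character at a time, with a modulus after every step. H below is shorthand for HASH_SIZE, and x stands for the running value.

```latex
\begin{aligned}
97 \cdot 33^3 + 98 \cdot 33^2 + 99 \cdot 33 + 100
  &= ((97 \cdot 33 + 98) \cdot 33 + 99) \cdot 33 + 100 \\
  &= (3299 \cdot 33 + 99) \cdot 33 + 100 \\
  &= 108966 \cdot 33 + 100 \\
  &= 3595978 \\[4pt]
(x \cdot 33 + c) \bmod H &= ((x \bmod H) \cdot 33 + c) \bmod H
\end{aligned}
```

The last identity is why reducing the running value at each step gives the same result as reducing the full sum once at the end, while keeping intermediate values small.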

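2). Code

class Solution {
public:
    /*
     * @param key: A string you should hash
     * @param HASH_SIZE: An integer
     * @return: An integer
     */
    int hashCode(string &key, int HASH_SIZE) {
        // 64-bit accumulator so that res * 33 cannot overflow before the modulus
        long long res = 0;
        for (size_t i = 0; i < key.size(); ++i) {
            // multiply the running value by 33, add the next character,
            // and reduce modulo HASH_SIZE at every step
            res = (res * 33 % HASH_SIZE + key[i]) % HASH_SIZE;
        }
        return static_cast<int>(res);
    }
};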

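1.3 strstr-ii, string search (594 in lintcode) (★★★★★)

1). Problem

http://www.lintcode.com/zh-cn/problem/strstr-ii/
http://www.jiuzhang.com/solution/strstr-ii/

Implement strStr with O(n + m) time complexity.
strStr returns the position of the first character of the first occurrence of the target string in the source string. The target string has length m and the source string has length n. If the target does not occur in the source, return -1.

2). Code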

class Solution {
public:
    const int BASE = 1000000;

    /*
     * @param source: A source string
     * @param target: A target string
     * @return: An integer as index
     */
    int strStr2(const char* source, const char* target) {
        // write your code here
        if(source == NULL || target == NULL){
            return -1;
        }
        int m = strlen(target);
        int n = strlen(source);
        if(m==0){
            return 0;
        }

        // compute 31^m % BASE, the weight of the character that slides out of the window
        int power=1;
        for(int i=0;i<m;++i){
            power = power * 31 % BASE; 
        }

        //hashCode of target
        int targetCode = 0;
        for(int i=0;i<m;++i){
            targetCode = (targetCode * 31 % BASE + target[i]) % BASE; // plain assignment, not +=
        }

        //hashCode of source
        int hashCode = 0;
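        for(int i=0;i<n;++i){
            // extend the window on the right: hash("abc") -> hash("abcd")
            hashCode = (hashCode * 31 % BASE + source[i]) % BASE;

            // drop the leftmost character: hash("abcd") -> hash("bcd")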
            if(i>=m){
                hashCode -= source[i-m] * power % BASE;
                if(hashCode < 0){
                    hashCode += BASE;
                }
            }

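            // on a hash match, compare the actual characters to rule out a collision
            if(i>=m-1 && hashCode == targetCode){
                // compare in place; copying m characters plus '\0' into a
                // char[m] buffer would write one byte past its end
                if(strncmp(source + i - m + 1, target, m) == 0){
                    return i-m+1;
                }
            }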
        }

        return -1;
    }
};

2. Rehashing

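2.1. Load factor of a hash table

Load factor = number of stored elements / total allocated slots = size / capacity

As a rule of thumb, once the load factor exceeds 1/10 (an empirical value), the table should be rehashed.

The table does not have to be completely full to count as too small. For example, an array with 100 slots that already holds 10 numbers is considered full.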
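
The load-factor rule can be expressed as a small check. The helper name needsRehash is hypothetical, and the strict comparison follows the wording "exceeds 1/10"; whether a table at exactly 1/10 should also trigger a rehash is a design choice this sketch does not settle.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when size / capacity exceeds 1/10.
// Cross-multiplying (size * 10 > capacity) avoids floating point.
bool needsRehash(std::size_t size, std::size_t capacity) {
    assert(capacity > 0);
    return size * 10 > capacity;
}
```

For the table in the next problem (size=3, capacity=4), needsRehash(3, 4) is true, which is why that table gets doubled.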

2.2 Rehashing - 129 in lintcode

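1). Problem

http://www.lintcode.com/problem/rehashing/
http://www.jiuzhang.com/solutions/rehashing/

The capacity of a hash table is not known in advance. If the table stores too many elements (for example, more than one tenth of its capacity), we should double its capacity and redistribute all of the stored values. Suppose you have the following hash table: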
size=3, capacity=4

[null, 21, 14, null]
        ↓    ↓
        9   null
        ↓
       null

The hash function is:

int hashcode(int key, int capacity) {
     return key % capacity;
 }

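There are three numbers here: 9, 14, and 21. 21 and 9 share the same position because they have the same hash value 1 (21 % 4 = 9 % 4 = 1), so they are stored in the same linked list.

Rebuilding the hash table with double the capacity gives: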
size=3, capacity=8

index:   0    1    2    3     4    5    6   7
hash : [null, 9, null, null, null, 21, 14, null]

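Given a hash table, return the table after rehashing.

Notes
The index of a negative integer in the hash table can be computed as follows:

C++/Java: computing -4 % 3 directly gives -1. Use (a % b + b) % b instead to get a non-negative result.
Python: -1 % 3 directly gives 2.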
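
A short sketch of the C++ workaround from the note above. The helper name bucketIndex is hypothetical, and the sketch assumes capacity is small enough that adding it to the remainder cannot overflow an int.

```cpp
#include <cassert>

// Maps any int key, including negative ones, to an index in [0, capacity).
int bucketIndex(int key, int capacity) {
    assert(capacity > 0);
    // key % capacity lies in (-capacity, capacity); adding capacity and
    // reducing again moves it into [0, capacity)
    return (key % capacity + capacity) % capacity;
}
```

For example, bucketIndex(-4, 3) yields 2, whereas the raw expression -4 % 3 yields -1 in C++.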

2) Code

/**
 * Definition of ListNode
 * class ListNode {
 * public:
 *     int val;
 *     ListNode *next;
 *     ListNode(int val) {
 *         this->val = val;
 *         this->next = NULL;
 *     }
 * }
 */
class Solution {
public:
    /**
     * @param hashTable: A list of The first node of linked list
     * @return: A list of The first node of linked list which have twice size
     */   
    vector<ListNode*> rehashing(vector<ListNode*> hashTable) {
        // write your code here
        if(hashTable.size()==0){
            return hashTable;
        }

        int cap = hashTable.size();
        int newCap = cap * 2;
        // new table with twice the capacity, all buckets empty
        vector<ListNode*> resTable(newCap, NULL);

        for(int i=0;i < hashTable.size();++i){
            ListNode* head = hashTable[i];
            while(head){
                int val = head->val;
                // normalize so that negative values map to a non-negative index
                int newval = (val % newCap + newCap) % newCap;
                if(!resTable[newval]){
                    resTable[newval] = new ListNode(val);
                }
                else{
                    ListNode* cur = resTable[newval];
                    while(cur->next){
                        cur = cur->next;
                    }
                    cur->next = new ListNode(val);
                }
                head = head->next;
            }
        }
        return resTable;
    }
};

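2.3 LRU-cache, LRU caching policy - 134 in lintcode (★★★★★)

1) Problem

http://www.lintcode.com/problem/lru-cache/
http://www.jiuzhang.com/solutions/lru-cache/
Example: [2 1 3 2 5 3 6 7]

Design a data structure for a Least Recently Used (LRU) cache. It should support two operations: get and set.

get(key): if the key exists in the cache, return its value (values are normally positive); otherwise return -1.
set(key, value): set the value, or insert it if the key is not already present. When the cache is at capacity, remove the least recently used entry before inserting the new one.

2) Approach & code:

Removing the least recently used entry means that, once the cache is at capacity, we remove the element whose last use is furthest in the past.

This requires deletion from the middle, deletion from the head, and appending at the tail, which suits a linked list.

So we need a hash table plus a linked list. In Java this data structure is called LinkedHashMap. Either a singly or a doubly linked list works; with a singly linked list, the hash table stores the previous node.

LinkedHashMap = DoublyLinkedList + HashMap

 HashMap<key, DoublyListNode>
 DoublyListNode {
     prev, next, key, value;
 }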

The newest node is appended to the tail.
The eldest node is removed from the head.
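
For comparison with the singly linked version in these notes, here is a minimal sketch of the doubly-linked-list-plus-hash-map layout in C++. It uses std::list as the doubly linked list and stores list iterators in the hash map. The class name ListLRUCache is hypothetical, and the sketch assumes a positive capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>

class ListLRUCache {
    typedef std::list<std::pair<int, int> > List;   // (key, value); front = eldest
    List items;
    std::unordered_map<int, List::iterator> pos;    // key -> node in the list
    std::size_t capacity;

public:
    explicit ListLRUCache(int cap) : capacity(static_cast<std::size_t>(cap)) {
        assert(cap > 0);
    }

    int get(int key) {
        auto it = pos.find(key);
        if (it == pos.end()) return -1;
        // splice relinks the node at the tail in O(1); the iterator stays valid
        items.splice(items.end(), items, it->second);
        return it->second->second;
    }

    void set(int key, int value) {
        auto it = pos.find(key);
        if (it != pos.end()) {
            it->second->second = value;
            items.splice(items.end(), items, it->second);
            return;
        }
        if (items.size() == capacity) {
            pos.erase(items.front().first);   // eldest node leaves from the head
            items.pop_front();
        }
        items.push_back(std::make_pair(key, value));  // newest node goes to the tail
        pos[key] = std::prev(items.end());
    }
};
```

Because each node has a prev pointer, unlinking from the middle needs only the node itself, so the hash map can point directly at the node rather than at its predecessor.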

Singly linked list:
Would a singly linked list work?

Yes. Store each node's prev node in the hash table.
For example, when linked list = dummy->1->2->3->null,
hash[1] = dummy, hash[2] = node1

class keyValueNode{
public:
    int key,val;
    keyValueNode* next; // next node in the list; the hash table maps each key to the node before it
    keyValueNode(int _key,int _val){
        key = _key;
        val = _val;
        next = NULL;
    }
    keyValueNode(){
        key = 0;
        val = 0;
        next = NULL;
    }
};

class LRUCache {
private:
    void moveToTail(keyValueNode * prev){
        if(prev->next == tail){
            return;
        }

        keyValueNode* node = prev->next;
        prev->next = node->next;
        if(node->next != NULL){
            hash[node->next->key] = prev;
        }
        tail->next = node;
        node->next = NULL;
        hash[node->key] = tail;
        tail = node;
    }
public:
    unordered_map<int, keyValueNode*> hash;
    keyValueNode* head,*tail;
    int capacity,size;
    /*
    * @param capacity: An integer
    */
    LRUCache(int capacity) {
        // head is a dummy node; the real entries start at head->next
        this->head = new keyValueNode(0,0);
        this->tail = head;
        this->capacity = capacity;
        this->size = 0;
        hash.clear();
    }

    /*
     * @param key: An integer
     * @return: An integer
     */
    int get(int key) {
        // write your code here
        if(hash.find(key) == hash.end()){
            return -1;
        }

        moveToTail(hash[key]);
        return hash[key]->next->val;
    }

    /*
     * @param key: An integer
     * @param value: An integer
     * @return: nothing
     */
    void set(int key, int value) {
        // write your code here
        if(hash.find(key)!=hash.end()){
            hash[key]->next->val = value;
            moveToTail(hash[key]);
        }
        else{
            keyValueNode * node = new keyValueNode(key,value);
            tail->next = node;
            hash[key] = tail;
            tail = node;
            size++;
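            if(size > capacity){
                // evict the least recently used entry, the one right after the dummy head
                keyValueNode* eldest = head->next;
                hash.erase(eldest->key);
                head->next = eldest->next;
                if(head->next!=NULL){
                    hash[head->next->key] = head;
                }
                else{
                    tail = head; // list is empty again (only when capacity is 0)
                }
                delete eldest;
                size--;
            }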
        }
    }
};
