數據結構基礎(18) --哈希表的設計與實現

哈希表

根據設定的哈希函數 H(key)和所選中的處理衝突的方法，將一組關鍵字映射到一個有限的、地址連續的地址集 (區間) 上，並以關鍵字在地址集中的“映像”作爲相應記錄在表中的存儲位置，如此構造所得的查找表稱之爲“哈希表”。

構造哈希函數的方法

1. 直接定址法(數組)

哈希函數爲關鍵字的線性函數H(key) = key 或者 H(key) = a*key + b

此法僅適合於：地址集合的大小 == 關鍵字集合的大小

2. 數字分析法

假設關鍵字集合中的每個關鍵字都是由 s 位數字組成 (u1, u2, …, us)，分析關鍵字集中的全體，並從中提取分佈均勻的若干位或它們的組合作爲地址。

此方法僅適合於:能預先估計出全體關鍵字的每一位上各種數字出現的頻度。

3. 平方取中法

以關鍵字的平方值的中間幾位作爲存儲地址。求“關鍵字的平方值”的目的是“擴大差別” ，同時平方值的中間各位又能受到整個關鍵字中各位的影響。

此方法適合於:關鍵字中的每一位都有某些數字重複出現頻度很高的現象。

4. 摺疊法

將關鍵字分割成若干部分，然後取它們的疊加和爲哈希地址。有兩種疊加處理的方法：移位疊加和間界疊加。

此方法適合於:關鍵字的數字位數特別多;

5. 除留餘數法

設定哈希函數爲:{H(key) = key % p | 其中,p≤m(表長)並且p 應爲不大於 m 的素數或是不含 20 以下的質因子}

爲什麼要對 p 加限制？

例如:給定一組關鍵字爲：12, 39, 18, 24, 33,21，若取 p=9, 則他們對應的哈希函數值將爲：3, 3, 0, 6, 6, 3;

可見，若 p 中含質因子 3，則所有含質因子 3 的關鍵字均映射到“3 的倍數”的地址上，從而增加了“衝突”的可能。

6. 隨機數法

設定哈希函數爲:H(key) = Random(key)其中，Random 爲僞隨機函數;

通常，此方法用於對長度不等的關鍵字構造哈希函數。

(如果關鍵字並不是數字, 則還需先對其進行數字化處理。)

實際造表時，採用何種構造哈希函數的方法取決於建表的關鍵字集合的情況(包括關鍵字的範圍和形態)，總的原則是使產生衝突的可能性降到儘可能地小(下面我們將以除留餘數法構造哈希函數)。

處理衝突的方法

“處理衝突” 的實際含義是：爲產生衝突的地址尋找下一個哈希地址。

1. 開放定址法

爲產生衝突的地址 H(key) 求得一個地址序列：{ H0, H1, …, Hs|1≤ s≤m-1}

其中: H0 = H(key)

Hi = ( H(key) + di ) % m {i=1, 2, …, s}

對增量 di 有三種取法：

1) 線性探測再散列
di = c * i 最簡單的情況 c=1

2) 平方探測再散列
di = 1^2, -1^2, 2^2, -2^2, …,

3) 隨機探測再散列
di 是一組僞隨機數列或者di=i×H2(key) (又稱雙散列函數探測)

注意：增量 di 應具有“完備性”,即:產生的 Hi 均不相同，且所產生的s(m-1)個 Hi 值能覆蓋哈希表中所有地址。則要求：

※ 平方探測時的表長 m 必爲形如 4j+3 的素數（如: 7, 11, 19, 23, … 等）；

※ 隨機探測時的 m 和 di 沒有公因子。

2. 鏈地址法(又稱拉鍊法)

將所有哈希地址相同的記錄都鏈接在同一鏈表中(我們將採用的方法)。

哈希表的設計與實現

[cpp]view
plain copy

//哈希表設計  

template <typename HashedObj>  

class HashTable  

{  

public:  

    typedef typename vector<HashedObj>::size_type size_type;  

public:  

    explicit HashTable(int tableSize = 101)  

        : theList(tableSize), currentSize(0) {}  

    ~HashTable()  

    {  

        makeEmpty();  

    }  

    //判斷元素x是否存在於哈希表中  

    bool contains(const HashedObj &x) const;  

    void makeEmpty();  

    bool insert(const HashedObj &x);  

    bool remove(const HashedObj &x);  

private:  

    vector< list<HashedObj> > theList;  

    size_type currentSize;  

    void rehash();  

    int myHash(const HashedObj &x) const;  

};

哈希函數

[cpp]view
plain copy
 
//如果關鍵字並不是數字, 則需先對其進行數字化處理  

template <typename Type>  

int hash(Type key)  

{  

    return key;  

}  

template<>  

int hash<const string &>(const string &key)  

{  

    int hashVal = 0;  

    for (size_t i = 0; i < key.length(); ++i)  

    {  

        hashVal = 37 * hashVal * key[i];  

    }  

    return hashVal;  

}  

//哈希函數  

template <typename HashedObj>  

int HashTable<HashedObj>::myHash(const HashedObj &x) const  

{  

    //首先對key進行數字化處理  

    int hashVal = hash(x);  

    //計算哈希下標  

    hashVal = hashVal % theList.size();  

    if (hashVal < 0)  

        hashVal += theList.size();  

    return hashVal;  

}

哈希表的插入

[cpp]view
plain copy
 
//插入  

template <typename HashedObj>  

bool HashTable<HashedObj>::insert(const HashedObj &x)  

{  

    //首先找到應該插入的桶(鏈表)  

    list<HashedObj> &whichList = theList[ myHash(x) ];  

    //哈希表中已經存在該值了  

    if (find(whichList.begin(), whichList.end(), x) != whichList.end())  

        return false;  

    //插入桶中  

    whichList.push_back(x);  

    //如果此時哈希表已經"滿"了(所存儲的元素個數 = 哈希表的槽數)  

    //裝載因子 == 1, 爲了獲取更好的性能, 再哈希  

    if (++ currentSize > theList.size())  

        rehash();  

    return true;  

}

再哈希

[cpp]view
plain copy
 
//判斷是否是素數  

bool is_prime(size_t n)  

{  

    if (n == 1 || !n)  

        return 0;  

    for (size_t i = 2; i*i <= n; i++)  

        if (!(n%i))  

            return 0;  

    return 1;  

}  

//尋找下一個素數  

size_t nextPrime(size_t n)  

{  

    for (size_t i = n; ; ++i)  

    {  

        if (is_prime(i))  

            return i;  

    }  

    return -1;  

}

[cpp]view
plain copy
 
//再哈希  

template <typename HashedObj>  

void HashTable<HashedObj>::rehash()  

{  

    vector< list<HashedObj> > oldList = theList;  

    //以一個大於原表兩倍的第一個素數重新設定哈希桶數  

    theList.resize( nextPrime(2*theList.size()) );  

    //將原表清空  

    for (typename vector< list<HashedObj> >::iterator iter = theList.begin();  

            iter != theList.end();  

            ++ iter)  

        iter -> clear();  

    //將原表的數據插入到新表中  

    for (size_type i = 0; i < oldList.size(); ++i)  

    {  

        typename list<HashedObj>::iterator iter = oldList[i].begin();  

        while (iter != oldList[i].end())  

        {  

            insert(*iter ++);  

        }  

    }  

}

哈希表的查找

查找過程和造表過程一致。假設採用開放定址處理衝突，則查找過程爲：對於給定值 K，計算哈希地址 i = H(K),若 r[i] = NULL 則查找不成功,若 r[i].key = K 則查找成功否則 “求下一地址 Hi” ，直至 r[Hi] = NULL (查找不成功)或r[Hi].key = K (查找成功) 爲止。

而我們採用比較簡單的鏈地址法(也稱拉鍊法的查找實現):

[cpp]view
plain copy
 
//查找:判斷哈希表中是否存在該元素  

template <typename HashedObj>  

bool HashTable<HashedObj>::contains(const HashedObj &x) const  

{  

    const list<HashedObj> &whichList = theList[ myHash(x) ];  

    if (find(whichList.begin(), whichList.end(), x) != whichList.end())  

        return true;  

    return false;  

}

哈希表查找的分析:

從查找過程得知,哈希表查找的平均查找長度實際上並不等於零。決定哈希表查找的ASL的因素

1)選用的哈希函數；

2)選用的處理衝突的方法；

3)哈希表飽和的程度，裝載因子 α=n/m 值的大小(n:記錄數，m:表的長度)

一般情況下，可以認爲選用的哈希函數是“均勻”的，則在討論ASL時，可以不考慮它的因素。

因此，哈希表的ASL是處理衝突方法和裝載因子的函數。可以證明,查找成功時有下列結果

線性探測再散列:

隨機探測再散列:

鏈地址法

從以上結果可見：哈希表的平均查找長度是裝載因子的函數，而不是 n 的函數;這說明，用哈希表構造查找表時,可以選擇一個適當的裝填因子,使得平均查找長度限定在某個範圍內(這是哈希表所特有的特點).

哈希表的刪除操作

[cpp]view
plain copy
 
//刪除  

template <typename HashedObj>  

bool HashTable<HashedObj>::remove(const HashedObj &x)  

{  

    list<HashedObj> &whichList = theList[ myHash(x) ];  

    typename list<HashedObj>::iterator iter = find(whichList.begin(), whichList.end(), x);  

    //沒有找到該元素  

    if (iter == whichList.end())  

        return false;  

    whichList.erase(iter);  

    -- currentSize;  

    return true;  

}

清空哈希表

[cpp]view
plain copy
 
//清空哈希表  

template <typename HashedObj>  

void HashTable<HashedObj>::makeEmpty()  

{  

    for (typename vector< list<HashedObj> >::iterator iter = theList.begin();  

            iter != theList.end();  

            ++ iter)  

    {  

        iter -> clear();  

    }  

}

附1-測試代碼

[cpp]view
plain copy
 
int main()  

{  

    HashTable<int> iTable;  

    // 1 2 3 4 5 6 7 8 9 10  

    for (int i = 0; i < 10; ++i)  

        iTable.insert(i+1);  

    for (int i = 0; i < 10; ++i)  

        if (iTable.contains(i+1))  

            cout << i << ": contains..." << endl;  

        else  

            cout << i << ": not contains" << endl;  

    cout << endl;  

    //1 2  

    for (int i = 0; i < 10; ++i)  

        iTable.remove(i+3);  

    for (int i = 0; i < 10; ++i)  

        if (iTable.contains(i))  

            cout << i << ": contains..." << endl;  

        else  

            cout << i << ": not contains" << endl;  

    cout << endl;  

    // 6 8  

    iTable.makeEmpty();  

    iTable.insert(6);  

    iTable.insert(8);  

    for (int i = 0; i < 10; ++i)  

        if (iTable.contains(i))  

            cout << i << ": contains..." << endl;  

        else  

            cout << i << ": not contains" << endl;  

    return 0;  

}

附2-各類算法複雜度的比較

數據結構基礎(18) --哈希表的設計與實現

哈希表

構造哈希函數的方法

處理衝突的方法

哈希表的查找

哈希表的刪除操作

Socket編程實踐(13) --UNIX域協議

C++筆試題總結（一）

Socket編程實踐(5) --TCP粘包問題與解決

數據結構基礎(1) --Swap & Bubble-Sort & Select-Sort

數據結構基礎(8) --單鏈表的設計與實現(1)之基本操作

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結