Hash function applications (collected notes)

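Two metrics are mainly used to judge how good a hash function is:

(1) Distribution

This is the bucket usage, bucket_usage = (number of used buckets) / (total number of buckets). The higher this ratio, the better the keys are spread out, which indicates a good hash design.

(2) Average bucket length

This is avg_bucket_len, the average length of all used buckets. Ideally this value equals 1; the closer it is to 1, the fewer collisions occur, which indicates a good hash design.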
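The two metrics can be computed from per-bucket entry counts. The following is only a minimal sketch: the struct `bucket_stats`, the function `compute_bucket_stats`, and the `counts` array are names made up for illustration, not part of any benchmark used in this post.

```c
#include <stddef.h>

/* Summary of how evenly a hash function filled a table (illustrative type). */
struct bucket_stats {
    double usage;    /* used buckets / total buckets */
    double avg_len;  /* total entries / used buckets */
    size_t max_len;  /* length of the longest bucket */
};

/* counts[i] is the number of entries that landed in bucket i. */
struct bucket_stats compute_bucket_stats(const size_t *counts, size_t nbuckets)
{
    struct bucket_stats s = {0.0, 0.0, 0};
    size_t used = 0, total = 0;

    for (size_t i = 0; i < nbuckets; i++) {
        if (counts[i] == 0)
            continue;               /* empty buckets do not count as used */
        used++;
        total += counts[i];
        if (counts[i] > s.max_len)
            s.max_len = counts[i];
    }
    if (nbuckets > 0)
        s.usage = (double)used / (double)nbuckets;
    if (used > 0)
        s.avg_len = (double)total / (double)used;
    return s;
}
```

For counts {2, 0, 1, 3}, three of the four buckets are used, so the usage is 0.75, and the six entries spread over three used buckets give an average bucket length of 2.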

Hash functions are generally very cheap to compute, so computation time barely distinguishes one from another and is not compared here.

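f(x): a small change in x triggers an avalanche effect, which gives an even distribution and high bucket utilization.

[...] choice. Of course, it is best to run your own tests, since every application has different data. The other test groups gave similar results and are not listed here.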

Hash function   Buckets   Total hash calls   Max bucket length   Avg bucket length   Bucket usage %
simple_hash     10240     47198              16                  4.63                99.00%
RS_hash         10240     47198              16                  4.63                98.91%
JS_hash         10240     47198              15                  4.64                98.87%
PJW_hash        10240     47198              16                  4.63                99.00%
ELF_hash        10240     47198              16                  4.63                99.00%
BKDR_hash       10240     47198              16                  4.63                99.00%
SDBM_hash       10240     47198              16                  4.63                98.90%
DJB_hash        10240     47198              15                  4.64                98.85%
AP_hash         10240     47198              16                  4.63                98.96%
CRC_hash        10240     47198              16                  4.64                98.77%

Hashing strings:

/* A Simple Hash Function */
unsigned int simple_hash(char *str)
{
    register unsigned int hash;
    register unsigned char *p;

    for(hash = 0, p = (unsigned char *)str; *p ; p++)
        hash = 31 * hash + *p;

    return (hash & 0x7FFFFFFF);
}

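// For small everyday programs a narrower mask is enough. Note that & 0xFFFFFFF keeps the low 28 bits; to keep the value within three decimal digits, a mask such as & 0xFF (0 to 255) is needed.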

For example: "fsfdsfdfdfdqqqqqqqqqqqqqqqqqsssssssssssssssssssssssssssssssssssssssssssssssssssssssqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqd",

"fsdfdsf",

"sssss",

"sseeeeeeeeeeee",

"sssee"

The corresponding hash values are 111 218 104 248 104

Hashing integers:

static inline u32 hash_32(u32 val, unsigned int bits)
{
    /* On some cpus multiply is faster, on others gcc will do shifts */
    u32 hash = val * GOLDEN_RATIO_PRIME_32; /* 0x9e370001UL */

    /* High bits are more random, so use them. */
    return hash >> (32 - bits);
}

// BKDR Hash Function
unsigned int BKDRHash(char *str)
{
    unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
    unsigned int hash = 0;

    while (*str)
    {
        hash = hash * seed + (*str++);
    }

    return (hash & 0x7FFFFFFF);
}

hash_long is defined in <linux/hash.h> as follows:

/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
/* 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL

#if BITS_PER_LONG == 32
#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_32
#define hash_long(val, bits) hash_32(val, bits)
#elif BITS_PER_LONG == 64
#define hash_long(val, bits) hash_64(val, bits)
#define GOLDEN_RATIO_PRIME GOLDEN_RATIO_PRIME_64
#else
#error Wordsize not 32 or 64
#endif

static inline u64 hash_64(u64 val, unsigned int bits)
{
    u64 hash = val;

    /* Sigh, gcc can't optimise this alone like it does for 32 bits. */
    u64 n = hash;
    n <<= 18;
    hash -= n;
    n <<= 33;
    hash -= n;
    n <<= 3;
    hash += n;
    n <<= 3;
    hash -= n;
    n <<= 4;
    hash += n;
    n <<= 2;
    hash += n;

    /* High bits are more random, so use them. */
    return hash >> (64 - bits);
}

static inline u32 hash_32(u32 val, unsigned int bits)
{
    /* On some cpus multiply is faster, on others gcc will do shifts */
    u32 hash = val * GOLDEN_RATIO_PRIME_32;

    /* High bits are more random, so use them. */
    return hash >> (32 - bits);
}

static inline unsigned long hash_ptr(const void *ptr, unsigned int bits)
{
    return hash_long((unsigned long)ptr, bits);
}
#endif /* _LINUX_HASH_H */

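The functions above are interesting, so let's look at them closely.

First, the hashing method: the key is multiplied by a large number so that the product overflows, and whatever remains in the 32-bit or 64-bit variable is taken as the hash value. Because the hash table index has limited width, only the high bits of that hash value are used as the index. The high bits are chosen because they are more random, which reduces so-called "collisions". What is a collision? As the algorithm shows, keys and hash values do not map one to one. Two different keys may produce the same hash value, and that is called a "collision".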
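The multiply-then-keep-the-high-bits idea can be tried in user space. This is a minimal sketch rather than the kernel code itself: `hash_32_demo` and `DEMO_GOLDEN_RATIO_PRIME_32` are illustrative names, `uint32_t` stands in for the kernel's `u32`, and `bits` is assumed to be between 1 and 32.

```c
#include <stdint.h>

/* Same constant as GOLDEN_RATIO_PRIME_32, under an illustrative name. */
#define DEMO_GOLDEN_RATIO_PRIME_32 0x9e370001u

/* Multiply, let the product wrap around in 32 bits, then keep only the
 * top `bits` bits as the table index. Assumes 1 <= bits <= 32. */
static uint32_t hash_32_demo(uint32_t val, unsigned int bits)
{
    uint32_t hash = val * DEMO_GOLDEN_RATIO_PRIME_32;
    return hash >> (32 - bits);
}
```

With an 8-bit index, key 1 maps to 0x9e (the top byte of 0x9e370001) and key 2 maps to 0x3c, because 2 * 0x9e370001 wraps to 0x3c6e0002. With a 1-bit index, keys 1 and 3 both map to 1, which is a collision in the sense described above.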

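So what should this large multiplier be? According to the code above, it is 0x9e370001UL on 32-bit systems and 0x9e37fffffffc0001UL on 64-bit systems. Where do these numbers come from?

"Knuth suggests that, to get satisfactory results on a 32-bit machine, 2^32 should be divided at the golden ratio, and the multiplier should be the prime closest to that point. 0x9e370001UL is a prime close to 2^32*(sqrt(5)-1)/2, and it can be computed conveniently with additions and shifts, because it equals 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1. For 64-bit systems the number is 0x9e37fffffffc0001UL, which likewise equals 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1."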

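The code shows that on 32-bit systems the hash is computed with a plain multiplication, because gcc optimizes it automatically at compile time. On 64-bit systems gcc apparently has no comparable optimization, so the code uses shifts, additions and subtractions instead. First n = hash; then n is shifted left by 18 bits and hash -= n, which gives hash = hash * (1 - 2^18). The next term is -2^51, and since n has already been shifted left by 18 bits, it only needs another 33 bits, hence n <<= 33. Continuing in the same way term by term yields the final hash value.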
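To check that the shift sequence really is a multiplication by 0x9e37fffffffc0001, the two forms can be compared directly. This is a sketch for verification only: `mul_by_prime`, `shift_add_by_prime`, and `DEMO_GOLDEN_RATIO_PRIME_64` are illustrative names, and `uint64_t` stands in for the kernel's `u64`.

```c
#include <stdint.h>

/* Same constant as GOLDEN_RATIO_PRIME_64, under an illustrative name. */
#define DEMO_GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL

/* Direct form: one 64-bit multiplication, wrapping modulo 2^64. */
static uint64_t mul_by_prime(uint64_t val)
{
    return val * DEMO_GOLDEN_RATIO_PRIME_64;
}

/* Shift-and-add form, following the same steps as hash_64. */
static uint64_t shift_add_by_prime(uint64_t val)
{
    uint64_t hash = val;    /* + val * 1    */
    uint64_t n = val;

    n <<= 18; hash -= n;    /* - val * 2^18 */
    n <<= 33; hash -= n;    /* - val * 2^51 */
    n <<= 3;  hash += n;    /* + val * 2^54 */
    n <<= 3;  hash -= n;    /* - val * 2^57 */
    n <<= 4;  hash += n;    /* + val * 2^61 */
    n <<= 2;  hash += n;    /* + val * 2^63 */
    return hash;
}
```

Both functions return the same value for every input, since unsigned arithmetic wraps modulo 2^64 in both forms and the accumulated shifts add up to 18, 51, 54, 57, 61 and 63 bits.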

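Handling collisions:
Open addressing: h_i = (h(key) + d_i) % m, with i <= m - 1; linear probing uses d_i = i.
Resolving collisions with linear probing is easy to follow and simple to implement, but it has the following drawbacks:
① Overflow needs separate handling. A common approach is a separate overflow table that stores the records that no longer fit into the hash table. The simplest structure for this overflow table is a sequential list, searched sequentially.
② Deleting from a hash table built this way is awkward. To delete a record from hash table HT, one would expect to set its slot to empty, but that is not allowed; the slot can only be marked as deleted, otherwise later lookups would be affected.
③ Linear probing easily produces clustering, meaning that the records stored in the table form contiguous runs. With linear probing, the longer a contiguous run of occupied hash addresses is (that is, the more keys with different values sit at adjacent addresses), the more likely a newly inserted record is to collide with that run. Long runs therefore grow faster than short ones, so once clustering appears (together with collisions), it causes further clustering.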
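Drawback ② is easiest to see in code. The sketch below is a deliberately tiny linear-probing table with deletion markers; the table size `LP_SIZE`, the `lp_*` names, and the choice of h(key) = key % m are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define LP_SIZE 8   /* table size m, kept small for illustration */

enum lp_state { LP_EMPTY, LP_USED, LP_DELETED };

struct lp_table {
    enum lp_state state[LP_SIZE];
    unsigned int key[LP_SIZE];
};

/* Return the slot holding `key`, or -1. Probing stops at an EMPTY slot
 * but continues past DELETED ones, which is why a marker is required. */
static int lp_find(const struct lp_table *t, unsigned int key)
{
    for (size_t i = 0; i < LP_SIZE; i++) {
        size_t pos = (key % LP_SIZE + i) % LP_SIZE;
        if (t->state[pos] == LP_EMPTY)
            return -1;
        if (t->state[pos] == LP_USED && t->key[pos] == key)
            return (int)pos;
    }
    return -1;
}

/* Insert `key`; returns false only when no free slot is left. */
static bool lp_insert(struct lp_table *t, unsigned int key)
{
    if (lp_find(t, key) >= 0)
        return true;                      /* already present */
    for (size_t i = 0; i < LP_SIZE; i++) {
        size_t pos = (key % LP_SIZE + i) % LP_SIZE;
        if (t->state[pos] != LP_USED) {   /* EMPTY and DELETED are reusable */
            t->state[pos] = LP_USED;
            t->key[pos] = key;
            return true;
        }
    }
    return false;
}

/* Delete by marking the slot; it is never reset to EMPTY. */
static bool lp_delete(struct lp_table *t, unsigned int key)
{
    int pos = lp_find(t, key);
    if (pos < 0)
        return false;
    t->state[pos] = LP_DELETED;
    return true;
}
```

Inserting 1, 9 and 17 into a zero-initialized table places them in slots 1, 2 and 3, a small cluster. After deleting 9, key 17 is still found, because the probe walks over the marked slot 2 instead of stopping there. Had slot 2 been reset to empty, the lookup for 17 would end early and fail.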

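(2) Advantages of separate chaining
Compared with open addressing, separate chaining has the following advantages:
① Collision handling is simple and there is no clustering: keys that are not synonyms (keys with different hash addresses) never collide, so the average search length is shorter.
② The nodes of each chain are allocated dynamically, so chaining suits cases where the table size cannot be determined before the table is built.
③ To reduce collisions, open addressing requires a small load factor α, so a lot of space is wasted when the nodes are large. With chaining, α ≥ 1 is acceptable, and when nodes are large the extra pointer field is negligible, so chaining saves space.
④ Deleting a node from a chained hash table is easy: simply remove the node from its list. In a table built with open addressing, a deleted node's slot cannot simply be set to empty, because that would cut off the search path to synonym nodes inserted after it. The reason is that in every open addressing scheme an empty slot (an open address) is the condition for an unsuccessful search. Deletion in such a table can therefore only mark the node as deleted; the node cannot really be removed.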
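Points ② and ④ can be sketched with a small chained table. Everything here is illustrative: the bucket count `CH_BUCKETS`, the `ch_*` names, and the choice of h(key) = key % CH_BUCKETS are assumptions, and the nodes hold only a key to keep the example short.

```c
#include <stdbool.h>
#include <stdlib.h>

#define CH_BUCKETS 4   /* deliberately small so that chains form quickly */

struct ch_node {
    unsigned int key;
    struct ch_node *next;
};

struct ch_table {
    struct ch_node *bucket[CH_BUCKETS];
};

static bool ch_contains(const struct ch_table *t, unsigned int key)
{
    const struct ch_node *n = t->bucket[key % CH_BUCKETS];
    for (; n != NULL; n = n->next)
        if (n->key == key)
            return true;
    return false;
}

/* Push a new node at the head of its chain; nodes are allocated on demand,
 * so the table never needs to know the number of keys in advance. */
static bool ch_insert(struct ch_table *t, unsigned int key)
{
    if (ch_contains(t, key))
        return true;
    struct ch_node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->key = key;
    n->next = t->bucket[key % CH_BUCKETS];
    t->bucket[key % CH_BUCKETS] = n;
    return true;
}

/* Unlink and free the node; no deletion marker is needed. */
static bool ch_delete(struct ch_table *t, unsigned int key)
{
    struct ch_node **link = &t->bucket[key % CH_BUCKETS];
    while (*link != NULL) {
        if ((*link)->key == key) {
            struct ch_node *dead = *link;
            *link = dead->next;
            free(dead);
            return true;
        }
        link = &(*link)->next;
    }
    return false;
}
```

Keys 1, 5 and 9 all land in bucket 1 and simply form one chain. Deleting 5 from the middle unlinks a single node, and 1 and 9 stay reachable without any marker. The table also keeps working when it holds more keys than buckets, that is, with a load factor above 1.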

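(3) Disadvantages of separate chaining
The drawback of chaining is that the pointers need extra space. When nodes are small, open addressing is therefore more space-efficient, and if the space saved on pointers is used to enlarge the hash table, the load factor drops, which in turn reduces collisions under open addressing and improves the average search speed.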

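hashMap in C++

[Reposted] http://biancheng.dnbcw.info/c/170128.html

Before C++11, the standard library (std) offered only map, which is implemented as a balanced binary tree with O(log(n)) lookup and insertion, and no hash map. GNU C++ provides hash_map, a hash map implementation whose lookup and insertion are O(1) on average. For more on trading space for time, see: http://www.cnblogs.com/luxiaoxun/archive/2012/09/02/2667782.html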
#include <ext/hash_map>
#include <iostream>
#include <cstring>

using namespace std;
using namespace __gnu_cxx;

struct eqstr{
   bool operator()(const char *s1, const char *s2)const{
       return strcmp(s1,s2) == 0;
   }
};

int main(){
   hash_map<const char *,int,hash<const char *>,eqstr> months;
   months["january"] = 31;
   months["february"] = 28;
   months["march"] = 31;
   cout << "march -> " << months["march"] << endl;
}

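However, GNU hash_map's API is not compatible with the C++ STL. C++ TR1 (C++ Technical Report 1), an extension to the standard, implements a hash map with an API consistent with the STL. It is called unordered_map and lives in the header <tr1/unordered_map>; C++11 later standardized it as std::unordered_map in <unordered_map>. TR1 also provides regular expressions, smart pointers, hash tables and random number generators.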
#include <iostream>
#include <string>
#include <tr1/unordered_map>
using namespace std;

int main(){
   typedef std::tr1::unordered_map<int,string> hash_map;
   hash_map hm;
   hm.insert(std::pair<int,std::string>(0,"Hello"));
   hm[1] = "World";
   for(hash_map::const_iterator it = hm.begin(); it != hm.end(); ++it){
       cout << it->first << "-> " << it->second << endl;
   }
   return 0;
}

The usage is the same as map in C++ Primer (4th edition), but this one is somewhat faster. With MinGW the header is located under C:\MinGW\lib\gcc\mingw32\4.6.2\include\c++\tr1

#include <iostream>
#include <string>
#include <tr1/unordered_map>
using namespace std;

int main(){
    typedef std::tr1::unordered_map<string,int> hash_map;
    hash_map hm;
    hm.insert(make_pair("Hello",1));
    hm.insert(std::pair<std::string,int>("Hello2",1));
    hm.insert(hash_map::value_type("Hello2",1)); // key already exists, so the value is not replaced
    // To update an existing entry, use the iterator returned by insert
    pair<hash_map::iterator,bool> ret=hm.insert(hash_map::value_type("Hello2",1));
    if(!ret.second) ++ret.first->second;
    hm["world"] =1;
    ++hm["world"];
    for(hash_map::const_iterator it = hm.begin(); it != hm.end(); ++it){
        cout << it->first << "-> " << it->second << endl;
    }
    return 0;
}

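---------------------
Author: tanglanting12
Source: CSDN
Original: https://blog.csdn.net/szu_tanglanting/article/details/12406605
Copyright notice: this is an original article by the blogger; please include a link to the post when reposting.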
