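Data Structures (17): Hash Lookup

Hash Lookup

1. Basic Idea

Take the key as the independent variable and apply a fixed function h (the hash function) to compute h(key), which is used as the storage address of the data object.
Different keys may map to the same hash address, that is, $h(key_i) = h(key_j)$ with $key_i \neq key_j$. This is called a "collision", and it has to be handled by some collision resolution strategy.

2. Basic Tasks

Computing the position: construct a hash function that determines where a key is stored.
Resolving collisions: apply some strategy to handle several keys that map to the same position.

With a well-distributed hash function and a bounded load factor, the average lookup time is nearly constant: O(1).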

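3. Constructing Hash Functions

1. Considerations

Simple to compute, so that the key-to-address conversion is fast.
Addresses spread evenly over the address space, so that collisions are as rare as possible.

2. Numeric Keys

1. Direct addressing

Use a linear function of the key as the hash address: h(key) = a × key + b (a and b are constants).

2. Division method

The hash function is h(key) = key mod p (p is usually chosen to be a prime).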
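To see why a prime modulus is usually preferred, here is a minimal sketch that counts how many distinct addresses a batch of keys occupies for a given p. The names `DivisionHash` and `CountDistinctAddresses` are made up for this illustration, and keys are assumed to be non-negative.

```cpp
#include <set>
#include <vector>

// Division method: h(key) = key mod p (assumes key >= 0).
int DivisionHash(int key, int p) {
    return key % p;
}

// Counts how many different addresses the given keys occupy for modulus p.
int CountDistinctAddresses(const std::vector<int>& keys, int p) {
    std::set<int> used;
    for (int key : keys)
        used.insert(DivisionHash(key, p));
    return static_cast<int>(used.size());
}
```

For the keys 10, 20, 30, 40, 50, the modulus p = 10 sends every key to address 0, while the prime p = 11 gives five different addresses (10, 9, 8, 7, 6).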

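3. Digit analysis

Analyze how the digits of the numeric keys vary in each position, and take the positions that vary most randomly as the hash address.

4. Folding

Split the key into several parts with the same number of digits, then add the parts together.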
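A possible sketch of folding for decimal keys follows. The function name `FoldHash`, the choice to split from the right, and the final step of keeping only the low `digits` digits of the sum are assumptions made for this example; the text above only specifies "split and add".

```cpp
// Folding: cut the key into groups of `digits` decimal digits (starting from
// the right), add the groups, and keep the low `digits` digits of the sum.
int FoldHash(unsigned long long key, int digits) {
    unsigned long long base = 1;
    for (int i = 0; i < digits; ++i)
        base *= 10;                    // base = 10^digits
    unsigned long long sum = 0;
    while (key > 0) {
        sum += key % base;             // take the rightmost group
        key /= base;                   // drop it from the key
    }
    return static_cast<int>(sum % base);
}
```

For key 56793542 and 3-digit groups, the groups are 542, 793 and 056, their sum is 1391, and the resulting address is 391.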

5. Mid-square method

Square the key and take the middle few digits.
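The mid-square idea can be sketched on the decimal representation of the square. `MidSquareHash` is a hypothetical helper; it assumes the square fits in an `unsigned long long` and that `digits` is small enough for the result to fit in an `int`.

```cpp
#include <string>

// Mid-square: square the key, then return the middle `digits` decimal digits.
// If the square has no more than `digits` digits, the whole square is returned.
int MidSquareHash(unsigned long long key, int digits) {
    std::string sq = std::to_string(key * key);
    if (static_cast<int>(sq.size()) <= digits)
        return std::stoi(sq);
    std::size_t start = (sq.size() - digits) / 2;   // left edge of the middle
    return std::stoi(sq.substr(start, digits));
}
```

For example, 1234² = 1522756, and the middle three digits give the address 227.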

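3. String Keys

1. ASCII sum

$h(key) = \left(\sum_i key[i]\right) \bmod TableSize$

2. Shifting the first 3 characters

$h(key) = (key[0] \times 27^2 + key[1] \times 27 + key[2]) \bmod TableSize$

3. Shift method

$h(key) = \left(\sum_{i=0}^{n-1} key[n-i-1] \times 32^i\right) \bmod TableSize$

Example (shift method), before taking the remainder mod TableSize:

$h(\text{"abcde"}) = \text{'a'} \times 32^4 + \text{'b'} \times 32^3 + \text{'c'} \times 32^2 + \text{'d'} \times 32 + \text{'e'}$
$= (((\text{'a'} \times 32 + \text{'b'}) \times 32 + \text{'c'}) \times 32 + \text{'d'}) \times 32 + \text{'e'}$

The second (nested) form is what the code computes: each multiplication by 32 is a left shift by 5 bits.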

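// Index is an integer type (typedef int Index; in the implementations later on)
Index Hash( const char *Key,int TableSize){
    unsigned int h = 0;   // hash value, initialized to 0
    while ( *Key != '\0' )    // shift-and-add over every character
        h = ( h << 5) + *Key++;
    return h % TableSize;
}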
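To compare the ASCII sum with the shift method, the following sketch implements both. `SumHash` and `ShiftHash` are names chosen for this example, and the table size 1009 used in the note afterwards is an arbitrary prime.

```cpp
// ASCII sum: the order of characters does not matter, so anagrams collide.
unsigned int SumHash(const char* key, unsigned int tableSize) {
    unsigned int h = 0;
    while (*key != '\0')
        h += static_cast<unsigned char>(*key++);
    return h % tableSize;
}

// Shift method: each character is weighted by a power of 32,
// so the position of a character influences the result.
unsigned int ShiftHash(const char* key, unsigned int tableSize) {
    unsigned int h = 0;
    while (*key != '\0')
        h = (h << 5) + static_cast<unsigned char>(*key++);
    return h % tableSize;
}
```

With a table size of 1009, "abc" and "cba" both get address 294 under `SumHash`, while `ShiftHash` maps them to 654 and 682.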

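4. Collision Handling

1. Common strategies

Move to another position: open addressing
Group the objects that collide at one position together: separate chaining

2. Open addressing

When a collision occurs (the address is already occupied by another element), look for another empty address according to some rule.
On the i-th collision, the next address to probe is offset by $d_i$. The basic formula is $h_i(key) = (h(key) + d_i) \bmod TableSize$ (1 ≤ i < TableSize).
The choice of $d_i$ determines the collision resolution scheme.

1. Linear probing

Probe the next storage address cyclically with the increment sequence 1, 2, …, (TableSize - 1).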
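A minimal sketch of insertion with linear probing, assuming non-negative integer keys, h(key) = key mod TableSize, and -1 as the marker for an empty cell. `kEmpty` and `LinearProbeInsert` are illustrative names only.

```cpp
#include <vector>

const int kEmpty = -1;   // marker for an unused cell (assumes keys are >= 0)

// Inserts key with linear probing and returns the index where it ended up,
// or -1 when every cell has been probed without success (table is full).
int LinearProbeInsert(std::vector<int>& table, int key) {
    int size = static_cast<int>(table.size());
    int home = key % size;                    // h(key)
    for (int d = 0; d < size; ++d) {          // d = 0, 1, ..., TableSize - 1
        int pos = (home + d) % size;
        if (table[pos] == kEmpty || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

In a table of size 7, the keys 10, 17 and 24 all have home address 3, so they end up in cells 3, 4 and 5. This run of adjacent occupied cells is the clustering that linear probing tends to produce.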

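2. Quadratic probing

Probe the next storage address cyclically with the increment sequence $1^2, -1^2, 2^2, -2^2, \dots, q^2, -q^2$, where $q \le \lfloor TableSize/2 \rfloor$.

If the table length is a prime of the form 4k+3 (k a positive integer), quadratic probing can reach every cell of the hash table.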
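The coverage claim can be checked experimentally with a short sketch that counts the cells reachable from a home address. `CellsReachedByQuadraticProbing` is a hypothetical helper, and `home` is assumed to lie in [0, tableSize).

```cpp
#include <set>

// Counts the distinct cells visited by the probe sequence
// home, home + 1^2, home - 1^2, ..., home + q^2, home - q^2 (q <= tableSize / 2).
int CellsReachedByQuadraticProbing(int tableSize, int home) {
    std::set<int> reached;
    reached.insert(home % tableSize);
    for (int q = 1; q <= tableSize / 2; ++q) {
        int sq = (q * q) % tableSize;
        reached.insert((home + sq) % tableSize);
        // Add tableSize before the final modulo so the result is never negative.
        reached.insert(((home - sq) % tableSize + tableSize) % tableSize);
    }
    return static_cast<int>(reached.size());
}
```

For a table of size 11 (= 4×2+3) all 11 cells are reached, whereas for size 13 (= 4×3+1, prime but not of the form 4k+3) only 7 cells are reached.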

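3. Double hashing

$d_i = i \times h_2(key)$, where $h_2(key)$ is a second hash function, so the probe offsets are $h_2(key), 2h_2(key), 3h_2(key), \dots$

$h_2(key) \neq 0$ must hold for every key.

A common choice is $h_2(key) = p - (key \bmod p)$, where $p < TableSize$ and both p and TableSize are prime.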
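As a sketch of how the probe positions come out, the function below evaluates $h_i(key)$ with the first hash taken as key mod TableSize. That choice, the name `DoubleHashProbe`, and the values TableSize = 11 and p = 7 used in the note afterwards are assumptions for the example.

```cpp
// Position probed on the i-th attempt (i = 0 is the home address).
// Assumes key >= 0 and that p < tableSize are both primes.
int DoubleHashProbe(int key, int i, int tableSize, int p) {
    int home = key % tableSize;        // h(key)
    int step = p - (key % p);          // h2(key), always in 1..p, so never 0
    return (home + i * step) % tableSize;
}
```

With TableSize = 11 and p = 7, the keys 25 and 14 share the home address 3 but then diverge: 25 probes 3, 6, 9, 1, … and 14 probes 3, 10, 6, …, which is the property that separates double hashing from linear and quadratic probing.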

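4. Rehashing

When the table holds too many elements (the load factor α, the number of stored elements divided by TableSize, is too large), lookups become slower.

The remedy is to roughly double the size of the hash table, a process called "rehashing". When the table grows, every existing element has to be hashed again and placed into the new table.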
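A rough sketch of rehashing, using linear probing only to keep the example short. All names here (`kEmptyCell`, `IsPrime`, `NextPrimeAtLeast`, `Place`, `Rehash`) are invented for the illustration, keys are assumed non-negative, and picking the first prime at or above twice the old size is one possible growth policy, not the only one.

```cpp
#include <vector>

const int kEmptyCell = -1;   // marker for an unused cell (assumes keys are >= 0)

bool IsPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Smallest prime that is greater than or equal to n.
int NextPrimeAtLeast(int n) {
    while (!IsPrime(n)) ++n;
    return n;
}

// Places key with linear probing; assumes the table has at least one free cell.
void Place(std::vector<int>& cells, int key) {
    int size = static_cast<int>(cells.size());
    int pos = key % size;
    while (cells[pos] != kEmptyCell)
        pos = (pos + 1) % size;
    cells[pos] = key;
}

// Builds a table of roughly twice the size and re-inserts every stored key.
// Addresses must be recomputed because they depend on the table size.
std::vector<int> Rehash(const std::vector<int>& old) {
    int newSize = NextPrimeAtLeast(2 * static_cast<int>(old.size()));
    std::vector<int> bigger(newSize, kEmptyCell);
    for (int key : old)
        if (key != kEmptyCell)
            Place(bigger, key);
    return bigger;
}
```

Keys 10, 17 and 24 occupy cells 3, 4 and 5 in a table of size 7. After `Rehash` the table has 17 cells and the same keys sit at 10, 0 and 7, so the cluster disappears.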

3. Separate chaining

Store all keys that collide at a given position in the same singly linked list.

5. Abstract Data Type Definition

Type name: symbol table (SymbolTable)
Data object set: a symbol table is a set of "Name-Attribute" pairs
Operation set: Table ∈ SymbolTable, Name ∈ NameType, Attr ∈ AttributeType

Main operations:
- SymbolTable InitializeTable(int TableSize): create a symbol table of length TableSize
- Boolean IsIn(SymbolTable Table,NameType Name): check whether the name Name is in Table
- AttributeType Find(SymbolTable Table,NameType Name): get the attribute associated with the name Name in Table
- SymbolTable Modify(SymbolTable Table,NameType Name,AttributeType Attr): change the attribute of the name Name in Table to Attr
- SymbolTable Insert(SymbolTable Table,NameType Name,AttributeType Attr): insert a new name Name with its attribute Attr into Table
- SymbolTable Delete(SymbolTable Table,NameType Name): delete the name Name and its attribute from Table

1. Quadratic probing implementation

#include<iostream>
#include<stdlib.h>
#include<cmath>
#define MAXTABLESIZE 100000   // largest hash table length that may be allocated
typedef int Index;
typedef int ElementType; 
typedef Index Position;
typedef enum{   // cell holds a valid element, is empty, or holds a deleted element
	Legitimate,Empty,Deleted
} EntryType;  // cell state type

typedef struct HashEntry Cell;
struct HashEntry{   // one cell of the hash table
	ElementType Data;  // stored element
	EntryType Info;  // cell state
};

typedef struct HashTbl *HashTable;
struct HashTbl{  // hash table structure
	int TableSize;   // table size
	Cell *Cells;   // array of cells
};

using namespace std;

int NextPrime(int N);  // find the next prime
HashTable CreateTable( int TableSize); // create a hash table
Index Hash(int Key,int TableSize);   // hash function
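// Return the smallest odd prime greater than N (searching up to MAXTABLESIZE)
int NextPrime(int N){
	int p = (N%2)?N+2:N+1;  // start from the next odd number greater than N
	int i;
		
	while(p <= MAXTABLESIZE){
		for(i = (int)sqrt((double)p);i>2;i--)
			if(!(p%i))  // p is not a prime
				break;
		if(i<=2)   // no divisor found (i<=2 also covers p = 3, where the loop never runs)
			break; 
		p += 2;  // try the next odd number
	}
	return p;
}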

// Create a hash table
HashTable CreateTable( int TableSize){
	HashTable H;
	H = (HashTable)malloc(sizeof(struct HashTbl));
	// make sure the table length is a prime
	H->TableSize = NextPrime(TableSize);
	// allocate the cell array
	H->Cells = (Cell *)malloc(sizeof(Cell)*H->TableSize);
	// mark every cell as empty
	for(int i=0;i<H->TableSize;i++)
		H->Cells[i].Info = Empty;
	return H;
}

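// Find with quadratic probing: returns the cell holding Key, or the empty cell where Key belongs.
// Assumes the probe sequence can always reach an Empty cell; otherwise the loop does not terminate.
Position Find(HashTable H,ElementType Key){
	Position CurrentPos,NewPos; 
	int CNum = 0 ;   // number of collisions so far
	CurrentPos = NewPos = Hash(Key,H->TableSize);
	// keep probing while the cell is not empty and does not hold Key
	while(H->Cells[NewPos].Info != Empty && H->Cells[NewPos].Data != Key){
		if(++CNum % 2 ){ // odd collision count: offset +k^2 with k = (CNum+1)/2
			NewPos = CurrentPos + ((CNum+1)/2)*((CNum+1)/2);
			// if past the end of the table, subtract until back in range
			while(H->TableSize <= NewPos){
				NewPos -= H->TableSize; 
			}
		}else{  // even collision count: offset -k^2 with k = CNum/2
			NewPos = CurrentPos - (CNum/2)*(CNum/2);
			// if before the start of the table, add until back in range
			while(NewPos < 0){
				NewPos += H->TableSize; 
			}
		}
	} 
	return NewPos;
}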

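// Insert: returns true if Key was inserted, false if it was already present
bool Insert( HashTable H,ElementType Key){
	Position Pos = Find(H,Key);
	// the cell does not hold a valid element, so Key can be stored here
	if( H->Cells[Pos].Info != Legitimate){
		H->Cells[Pos].Info = Legitimate;
		H->Cells[Pos].Data = Key;
		return true;
	}
	return false;
} 

// Division-method hash function (assumes Key >= 0)
Index Hash(int Key,int TableSize){
	return Key % TableSize;
}


// Print every cell index, followed by the key if the cell is occupied
void output(HashTable H){
	for(int i=0;i<H->TableSize;i++){
		cout<<i;
		if(H->Cells[i].Info == Legitimate)  // Data is only meaningful in occupied cells
			cout<<" "<<H->Cells[i].Data;
		cout<<endl;
	}
} 

int main(){
	HashTable H = CreateTable(9);
	int N;
	cin>>N;
	for(int i=0;i<N;i++){
		int tmp;
		cin>>tmp;
		Insert(H,tmp);
	}
	output(H);
	free(H->Cells);
	free(H);
	return 0;
}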

2. Separate chaining implementation

#include<iostream>
#include<cstdlib>
#include<cmath>
#define MAXTABLESIZE 100000
typedef int Index;
typedef int ElementType;
typedef struct LNode *PtrToLNode;
struct LNode{   // singly linked list node
	ElementType Data;
	PtrToLNode Next;
}; 
typedef PtrToLNode Position;
typedef PtrToLNode List;

typedef struct TblNode *HashTable;  // hash table
struct TblNode{
	int TableSize;   // table size
	List Heads;  // array of list header nodes
}; 
using namespace std;

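int NextPrime(int N){
	int p = (N%2)?(N+2):(N+1);   // next odd number greater than N
	int i;
	while(p <= MAXTABLESIZE){
		for(i = (int)sqrt((double)p);i>2;i--)
			if(!(p%i))  // p is not a prime
				break;
		if(i<=2)  // found a prime (i<=2 also covers p = 3, where the loop never runs)
			break;
		p += 2; // next odd number
	}
	return p;
}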

// Create a hash table
HashTable CreateTable( int TableSize){
	HashTable H;
	H = (HashTable)malloc(sizeof(struct TblNode));
	H->TableSize = NextPrime(TableSize);
	// Heads is an array of list nodes, so the element size is sizeof(struct LNode)
	H->Heads = (List)malloc(sizeof(struct LNode) * H->TableSize);
	for(int i=0;i<H->TableSize;i++) 
		H->Heads[i].Next = NULL;  // header node: H->Heads[i] itself stores no data
	return H;
}

// Division-method hash function (assumes Key >= 0)
Index Hash(	int TableSize,ElementType Key){
	return  Key%TableSize;
}

// Find: returns the node holding Key, or NULL if Key is not in the table
Position Find(HashTable H,ElementType Key){
	Position p;
	Index pos;
	
	pos = Hash(H->TableSize,Key); 
	p = H->Heads[pos].Next;  // first real node of the chain
	while(p && p->Data != Key)
		p = p->Next;
	return p;
} 

// Insert: returns true if Key was inserted, false if it was already present
bool Insert(HashTable H,ElementType Key){
	Position p,NewCell;
	Index pos;
	
	p = Find(H,Key);
	if(!p){  // key not found, so it can be inserted
		NewCell = (Position)malloc(sizeof(struct LNode));
		NewCell->Data = Key;
		pos = Hash(H->TableSize,Key);   // home address in the table
		// insert the new node at the front of the chain
		NewCell->Next = H->Heads[pos].Next;
		H->Heads[pos].Next = NewCell;
		return true;
	}else{
		return false;
	}
}

void output(HashTable H){
	for(int i=0;i<H->TableSize;i++){
		cout<<i;
		List p = H->Heads[i].Next;  
		while(p){
			cout<<" "<<p->Data;
			p = p->Next;
		} 
		cout<<endl;
	}
}

void DestroyTable(HashTable H){
	Position P,tmp;
	for(int i=0;i<H->TableSize;i++){
		P = H->Heads[i].Next;
		while( P ){
			tmp = P->Next;
			free(P);
			P = tmp;
		}
	}
	free(H->Heads);
	free(H);
} 


int main(){
	HashTable H = CreateTable(9);
	int N;
	cin>>N;
	for(int i=0;i<N;i++){
		int tmp;
		cin>>tmp;
		Insert(H,tmp);
	}
	output(H);
	DestroyTable(H);
	return 0;
}
