Trie (C++ implementation)

Principle

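Start with an example: the strings abc, ab, abm, abcde and pm can be stored as a tree in which each edge is one character. The path root → a → b → c → d → e holds ab, abc and abcde (the nodes b, c and e are marked as string ends), b has a second child m for abm, and a separate path root → p → m holds pm.

That is the basic idea of a trie: it uses the common prefixes of strings to save storage space and to avoid as many needless string comparisons as possible.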
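To put a number on the saving, here is a minimal sketch that counts how many nodes a trie needs for a word list and compares it with the total number of characters. The names PrefixNode, countTrieNodes and countCharacters are made up for this illustration, and the children are kept in a std::map rather than the fixed array used later in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal node: children keyed by character.
struct PrefixNode {
    std::map<char, std::unique_ptr<PrefixNode>> next;
};

// Total characters needed when every word is stored separately.
std::size_t countCharacters(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (const std::string& w : words) total += w.size();
    return total;
}

// Inserts every word and returns how many nodes (root excluded) were created.
std::size_t countTrieNodes(const std::vector<std::string>& words) {
    PrefixNode root;
    std::size_t created = 0;
    for (const std::string& w : words) {
        PrefixNode* cur = &root;
        for (char ch : w) {
            std::unique_ptr<PrefixNode>& slot = cur->next[ch];
            if (!slot) {              // only unseen prefixes cost a new node
                slot.reset(new PrefixNode);
                ++created;
            }
            cur = slot.get();
        }
    }
    // Shared prefixes can only reduce the count, never increase it.
    assert(created <= countCharacters(words));
    return created;
}
```

For the five strings of the example, the trie needs 8 nodes below the root, while storing the strings separately takes 15 characters.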

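Applications

A trie, also called a word search tree, is typically used to count, sort and store large numbers of strings (although it is not limited to strings), which is why search engines often use it for word-frequency statistics over text.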
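As a rough illustration of the counting and sorting use case, the sketch below adds words to a trie and then walks it in pre-order, which yields each distinct word with its frequency in lexicographic order. FreqNode, addWord, collect and wordFrequencies are names invented for this example; it relies on std::map iterating its keys in ascending order.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct FreqNode {
    int count = 0;  // occurrences of the word that ends at this node
    std::map<char, std::unique_ptr<FreqNode>> next;  // keys iterate in sorted order
};

// Records one occurrence of word.
void addWord(FreqNode& root, const std::string& word) {
    FreqNode* cur = &root;
    for (char ch : word) {
        std::unique_ptr<FreqNode>& slot = cur->next[ch];
        if (!slot) slot.reset(new FreqNode);
        cur = slot.get();
    }
    ++cur->count;
}

// Pre-order walk: a word is emitted before any longer word that extends it.
void collect(const FreqNode& node, std::string& prefix,
             std::vector<std::pair<std::string, int>>& out) {
    if (node.count > 0) out.push_back(std::make_pair(prefix, node.count));
    for (const auto& entry : node.next) {
        prefix.push_back(entry.first);
        collect(*entry.second, prefix, out);
        prefix.pop_back();
    }
}

// Returns (word, frequency) pairs sorted by word.
std::vector<std::pair<std::string, int>> wordFrequencies(
        const std::vector<std::string>& words) {
    FreqNode root;
    for (const std::string& w : words) addWord(root, w);
    std::vector<std::pair<std::string, int>> result;
    std::string prefix;
    collect(root, prefix, result);
    assert(result.size() <= words.size());  // duplicates collapse into one entry
    return result;
}
```

Counting and sorting fall out of the same structure here: the count lives on the end node, and the order comes from visiting children by ascending character.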

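Design

A trie, also known as a prefix tree or dictionary tree, is an ordered tree used to store an associative array whose keys are usually strings. Unlike a binary search tree, a key is not stored directly in a node; it is determined by the node's position in the tree. All descendants of a node share a common prefix, namely the string that corresponds to that node, and the root corresponds to the empty string. In general, not every node has an associated value: only the keys that correspond to leaf nodes and to some internal nodes do.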
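The associative-array view can be sketched as follows. This is only an illustration under assumed names (MapNode, store, findNode, contains, valueOf); the implementation later in this article stores a counter per node instead of an arbitrary value. Note that the key never appears in a node: it is spelled out by the path, and a flag tells whether a node carries a value.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct MapNode {
    bool hasValue = false;  // true only if some key ends exactly at this node
    int value = 0;
    std::map<char, std::unique_ptr<MapNode>> next;
};

// Associates value with key, overwriting any earlier value.
void store(MapNode& root, const std::string& key, int value) {
    MapNode* cur = &root;
    for (char ch : key) {
        std::unique_ptr<MapNode>& slot = cur->next[ch];
        if (!slot) slot.reset(new MapNode);
        cur = slot.get();
    }
    cur->hasValue = true;
    cur->value = value;
}

// Follows key from the root; returns nullptr if the path does not exist.
const MapNode* findNode(const MapNode& root, const std::string& key) {
    const MapNode* cur = &root;
    for (char ch : key) {
        auto it = cur->next.find(ch);
        if (it == cur->next.end()) return nullptr;
        cur = it->second.get();
    }
    return cur;
}

// A path may exist as a mere prefix of other keys without holding a value.
bool contains(const MapNode& root, const std::string& key) {
    const MapNode* node = findNode(root, key);
    return node != nullptr && node->hasValue;
}

// Precondition: contains(root, key).
int valueOf(const MapNode& root, const std::string& key) {
    const MapNode* node = findNode(root, key);
    assert(node != nullptr && node->hasValue);
    return node->value;
}
```

After storing "tea" and "ten", the node for "te" exists because both keys pass through it, yet contains reports false for "te" since no value was stored there.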

A node can be designed like this:

// trie node definition
template <int Size>
class trieNode {
public:
	trieNode() : terminableSize(0), nodeSize(0) {
		for (int i = 0; i < Size; ++i)
			children[i] = NULL;
	}
	~trieNode() {
		// deleting a node recursively frees its whole subtree
		for (int i = 0; i < Size; ++i) {
			if (children[i] != NULL) {
				delete children[i];
				children[i] = NULL;
			}
		}
	}

public:
	int terminableSize; // number of stored strings that end at this node
	int nodeSize;  // number of children of this node
	trieNode *children[Size]; // pointers to the children, indexed by character slot
};


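The tree can be designed like this:

// trie definition
template <int Size>
class trie {
public:
	trie() : root(new trieNode<Size>) {}
	~trie() { delete root; } // the node destructor frees the whole tree

	int Index(char ch) { // position of a character in the children array
		// go through unsigned char so the result is never negative
		return static_cast<int>(static_cast<unsigned char>(ch) % Size);
	}

	void insert(const string& str); // insert the string str

	bool find(const string& str);  // look up the string str

	bool downNodeAlone(trieNode<Size> *ptr); // does the subtree at ptr hold exactly one string on a single chain?

	bool erase(const string& str);  // delete one copy of the string str

	int sizeAll(const trieNode<Size> *pNode); // count stored strings, duplicates included

	int sizeNoneRedundant(const trieNode<Size> *pNode); // count distinct stored strings

public:
	trieNode<Size> *root;
};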

The index of a character in the children array is computed as char % 26 (that is, with Size = 26), so in ASCII 'a' % 26 = 19 and 'b' % 26 = 20.
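Because the slots start at 19 rather than 0, it is worth checking that the modulo rule still gives every lowercase letter its own slot. The sketch below does that and, for comparison, shows the more common offset rule ch - 'a'. The function names are invented for this example, and ASCII encoding is assumed.

```cpp
#include <cassert>

// Slot rule used in this article: character code modulo 26.
int indexByModulo(char ch) {
    assert(ch >= 'a' && ch <= 'z');
    return ch % 26;
}

// Common alternative: offset from 'a', so 'a' maps to 0 and 'z' to 25.
int indexByOffset(char ch) {
    assert(ch >= 'a' && ch <= 'z');
    return ch - 'a';
}

// True when the 26 lowercase letters land in 26 different slots.
bool lowercaseSlotsAreDistinct() {
    bool used[26] = {false};
    for (char ch = 'a'; ch <= 'z'; ++ch) {
        int slot = indexByModulo(ch);
        if (used[slot]) return false;
        used[slot] = true;
    }
    return true;
}
```

The modulo rule is a rotation of the alphabet: 'h' (code 104) lands in slot 0 and 'g' in slot 25. It is collision-free only as long as the keys stay within 26 consecutive character codes; mixing upper and lower case, for instance, would make different characters share a slot.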

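Implementation

Insertion

Take inserting abc and then ab as an example. Inserting abc creates the chain root → a → b → c and sets terminableSize of node c to 1. Inserting ab afterwards creates no new node; it only raises terminableSize of node b to 1.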
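The same trace can be reproduced with a simplified, non-template stand-in for the node type. Node, slotOf, insertWord and nodeAt are names chosen for this sketch only; the fields mirror terminableSize, nodeSize and children from the design above.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for trieNode<26>.
struct Node {
    int terminableSize = 0;   // strings ending at this node
    int nodeSize = 0;         // number of non-null children
    Node* children[26] = {};  // all slots start as nullptr
    ~Node() { for (Node* c : children) delete c; }
};

// Same slot rule as the article: character code modulo 26.
int slotOf(char ch) {
    assert(ch >= 'a' && ch <= 'z');
    return ch % 26;
}

// Creates missing nodes along the path, then marks the end node.
void insertWord(Node& root, const std::string& str) {
    Node* cur = &root;
    for (char ch : str) {
        Node*& next = cur->children[slotOf(ch)];
        if (!next) {
            next = new Node;
            ++cur->nodeSize;
        }
        cur = next;
    }
    ++cur->terminableSize;
}

// Follows path from the root; returns nullptr if a node is missing.
const Node* nodeAt(const Node& root, const std::string& path) {
    const Node* cur = &root;
    for (char ch : path) {
        cur = cur->children[slotOf(ch)];
        if (!cur) return nullptr;
    }
    return cur;
}
```

After insertWord(root, "abc"), nodeAt(root, "ab") exists with terminableSize 0; after insertWord(root, "ab"), the same node has terminableSize 1 and still exactly one child.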


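Deletion

To delete a string, first check whether it is in the tree. If it is, walk down its path and, at each node, check whether everything from that node downward is a single chain (every node has at most one child) in which the leaf is the only end node. If so, that chain holds nothing but the string being deleted and can be removed as a whole; if not, move one node further down and repeat. If no such chain is found, the end node is still needed by other strings, so only its count is decremented.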
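An alternative way to reach the same result is to recurse to the end of the string and prune empty nodes on the way back up. The sketch below is not the approach of the reference code in this article; it is a compact variant under assumed names (Node, slotOf, insertWord, eraseOne) that may be easier to reason about.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Node {
    int terminableSize = 0;   // copies of the string ending here
    int nodeSize = 0;         // number of non-null children
    Node* children[26] = {};
    ~Node() { for (Node* c : children) delete c; }
};

int slotOf(char ch) {
    assert(ch >= 'a' && ch <= 'z');
    return ch - 'a';
}

void insertWord(Node& root, const std::string& s) {
    Node* cur = &root;
    for (char ch : s) {
        Node*& next = cur->children[slotOf(ch)];
        if (!next) { next = new Node; ++cur->nodeSize; }
        cur = next;
    }
    ++cur->terminableSize;
}

// Removes one copy of s; depth is the number of characters already consumed.
// Returns true if a copy was found and removed.
bool eraseOne(Node& node, const std::string& s, std::size_t depth = 0) {
    if (depth == s.size()) {
        if (node.terminableSize == 0) return false;  // s is not stored
        --node.terminableSize;
        return true;
    }
    int i = slotOf(s[depth]);
    Node* child = node.children[i];
    if (!child || !eraseOne(*child, s, depth + 1)) return false;
    // On the way back up, drop a child that no longer ends or leads to any string.
    if (child->terminableSize == 0 && child->nodeSize == 0) {
        delete child;
        node.children[i] = nullptr;
        --node.nodeSize;
    }
    return true;
}
```

The root is never deleted because pruning is always done by the parent, and a node shared with another string survives because either its terminableSize or its nodeSize is still positive.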

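Counting strings

There are two cases:

Counting with duplicates: for every end node, add its terminableSize.
Counting distinct strings: for every end node (terminableSize > 0), add 1.

Reference code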

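#include <cstddef>
#include <iostream>
#include <string>
using namespace std;

// trie node definition
template <int Size>
class trieNode {
public:
	trieNode() : terminableSize(0), nodeSize(0) {
		for (int i = 0; i < Size; ++i)
			children[i] = NULL;
	}
	~trieNode() {
		// deleting a node recursively frees its whole subtree
		for (int i = 0; i < Size; ++i) {
			if (children[i] != NULL) {
				delete children[i];
				children[i] = NULL;
			}
		}
	}

public:
	int terminableSize; // number of stored strings that end at this node
	int nodeSize;  // number of children of this node
	trieNode *children[Size]; // pointers to the children, indexed by character slot
};


// trie definition
template <int Size>
class trie {
public:
	trie() : root(new trieNode<Size>) {}
	~trie() { delete root; } // the node destructor frees the whole tree

	int Index(char ch) { // position of a character in the children array
		// go through unsigned char so the result is never negative
		return static_cast<int>(static_cast<unsigned char>(ch) % Size);
	}

	void insert(const string& str); // insert the string str

	bool find(const string& str);  // look up the string str

	bool downNodeAlone(trieNode<Size> *ptr); // does the subtree at ptr hold exactly one string on a single chain?

	bool erase(const string& str);  // delete one copy of the string str

	int sizeAll(const trieNode<Size> *pNode); // count stored strings, duplicates included

	int sizeNoneRedundant(const trieNode<Size> *pNode); // count distinct stored strings

public:
	trieNode<Size> *root;
};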

template <int Size>
void trie<Size>::insert(const string& str) {
	trieNode<Size> *cur = root;
	for (size_t i = 0; i < str.size(); ++i) {
		if (!cur->children[Index(str[i])]) {
			cur->children[Index(str[i])] = new trieNode<Size>;
			++cur->nodeSize;
		}
		cur = cur->children[Index(str[i])];
	}
	++cur->terminableSize;
}

template <int Size>
bool trie<Size>::find(const string& str) {
	trieNode<Size> *cur = root;
	for (size_t i = 0; i < str.size(); ++i) {
		if (!cur->children[Index(str[i])]) {
			return false;
		}
		cur = cur->children[Index(str[i])];
	}
	if (cur->terminableSize > 0)
		return true;
	return false;
}

// Returns true if the subtree rooted at ptr is a single chain
// that holds exactly one stored string (one copy, ending at the leaf or above it).
template<int Size>
bool trie<Size>::downNodeAlone(trieNode<Size> *ptr) {
	trieNode<Size> *cur = ptr;
	int terminableSum = 0;
	while (cur->nodeSize > 0) {
		terminableSum += cur->terminableSize;
		if (terminableSum > 1)
			return false;

		if (cur->nodeSize > 1)
			return false;
		// cur->nodeSize == 1: step to the only child
		for (int i = 0; i < Size; ++i) {
			if (cur->children[i]) {
				cur = cur->children[i];
				break;
			}
		}
	}
	terminableSum += cur->terminableSize; // the leaf counts as well
	return terminableSum == 1;
}


// delete one copy of the string str
template<int Size>
bool trie<Size>::erase(const string& str) {
	if (!find(str))
		return false;
	trieNode<Size> *cur = root;
	for (size_t i = 0; i < str.size(); ++i) {
		trieNode<Size> *child = cur->children[Index(str[i])];
		if (downNodeAlone(child)) {
			// The chain below child holds only str. Deleting child frees the
			// whole chain through the node destructor; the parent must forget it.
			delete child;
			cur->children[Index(str[i])] = NULL;
			--cur->nodeSize;
			return true;
		}
		cur = child;
	}
	// The end node is still needed (duplicates or longer strings): drop one copy.
	--cur->terminableSize;
	return true;
}


// count the strings stored in the trie (duplicates included)
template <int Size>
int trie<Size>::sizeAll(const trieNode<Size> *pNode) {
	if (pNode == NULL)
		return 0;
	int rev = pNode->terminableSize;
	for (int i = 0; i < Size; ++i) {
		rev += sizeAll(pNode->children[i]);
	}
	return rev;
}

// count the strings stored in the trie (duplicates counted once)
template <int Size>
int trie<Size>::sizeNoneRedundant(const trieNode<Size> *pNode) {
	if (pNode == NULL)
		return 0;
	int rev = 0;
	if (pNode->terminableSize > 0)
		rev = 1;
	if (pNode->nodeSize > 0) {
		for (int i = 0; i < Size; ++i) {
			rev += sizeNoneRedundant(pNode->children[i]);
		}
	}
	return rev;
}

int main()
{
	trie<26> t;
	t.insert("hello");
	t.insert("hello");
	t.insert("h");
	t.insert("h");
	t.insert("he");
	t.insert("hel");
	cout << "SizeALL:" << t.sizeAll(t.root) << endl;
	cout << "sizeNoneRedundant:" << t.sizeNoneRedundant(t.root) << endl;

	t.erase("h");

	cout << "\nSizeALL:" << t.sizeAll(t.root) << endl;
	cout << "sizeNoneRedundant:" << t.sizeNoneRedundant(t.root) << endl;

	return 0;
}

Result

Running the program prints:

SizeALL:6
sizeNoneRedundant:4

SizeALL:5
sizeNoneRedundant:4

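Implementation details

1. The tree does not visit and free its nodes one by one. Each node's destructor deletes its own children, so deleting one node frees the whole subtree below it.

2. For class templates, function templates and non-type template parameters, see: http://www.cnblogs.com/kaituorensheng/p/3601495.html

3. A letter is not stored as a character but as a position in the children array: if the pointer at that position is null, the letter is absent; otherwise it is present.

4. terminableSize stores how many strings end at this node, so that deletion can tell whether the same string was inserted more than once.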
