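LeetCode practice: I seem to have stumbled onto a curious topic, the Trie

Preface

In March, LeetCode launched a "daily problem" check-in challenge. I took part once, thought it was great fun, and then forgot all about it until the event was almost over. Today's problem is 820. Short Encoding of Words.

The problem reads as follows: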

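Given a list of words, we encode the list as an index string S and a list of indexes A.

For example, if the list is ["time", "me", "bell"], we can represent it as S = "time#bell#" and indexes = [0, 2, 5].

For each index, we recover a word by reading from that position in S until we reach a "#".

What is the length of the shortest string S that encodes the given word list?

Example:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: S = "time#bell#", indexes = [0, 2, 5].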

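In short, the task is to compress a set of strings into a single string, where "#" marks the end of each word. In the example, "time#" is the compressed form of both time and me, because "me" is a suffix of "time".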
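To make the decoding rule concrete, here is a minimal sketch of reading the words back out of S. The class name DecodeDemo and the method decode are made up for this illustration and are not part of the problem.

```java
import java.util.ArrayList;
import java.util.List;

class DecodeDemo {
    // For each index, read from that position up to (not including) the next '#'.
    static List<String> decode(String s, int[] indexes) {
        List<String> words = new ArrayList<>();
        for (int start : indexes) {
            int end = s.indexOf('#', start);
            words.add(s.substring(start, end));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(decode("time#bell#", new int[]{0, 2, 5}));
        // prints [time, me, bell]
    }
}
```

Index 2 points into the middle of "time", which is why "me" costs no extra characters.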

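LeetCode's official solution offers two approaches; this post focuses on the Trie (dictionary tree) approach. Before solving the problem, let's look at what a Trie is.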

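What is a Trie

This introduction to the Trie comes from Baidu Baike:

Also known as a word search tree or Trie, it is a tree structure and a variant of the hash tree. It is typically used to count, sort and store large numbers of strings (though not only strings), so search engines often use it for word-frequency statistics over text. Its advantage is that it uses the common prefixes of strings to cut query time and minimise needless string comparisons, giving better query efficiency than a hash tree.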

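Autocomplete is a typical example: typing a prefix brings up the stored words that begin with it.

In other words, a Trie builds a tree out of the common prefixes of strings, and that is what cuts the lookup time.

Drawn out, a Trie is a tree in which each edge carries one character and each path from the root spells a prefix of the stored words.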

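Handling Chinese is of course much more complicated, so lowercase English letters are used here to show how a Trie is defined. The code below comes from LeetCode 208. Implement Trie (Prefix Tree):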

class Trie {

    private TrieNode root;

    /** Initialize your data structure here. */
    public Trie() {
        root = new TrieNode();
    }
    
    /** Inserts a word into the trie. */
    public void insert(String word) {
        TrieNode node = root;
        for(int i=0; i<word.length(); i++){
            char currentChar = word.charAt(i);
            if(!node.containsKey(currentChar)){
                node.put(currentChar, new TrieNode());
            }
            node = node.get(currentChar);
        }
        node.setEnd();
    }
    
    /** Returns if the word is in the trie. */
    public boolean search(String word) {
        TrieNode node = searchPrefix(word);
        return node != null && node.isEnd();
    }

    private TrieNode searchPrefix(String word){
        TrieNode node = root;
        for(int i=0; i<word.length(); i++){
            char ch = word.charAt(i);
            if(node.containsKey(ch)){
                node = node.get(ch);
            }else{
                return null;
            }
        }
        return node;
    }
    
    /** Returns if there is any word in the trie that starts with the given prefix. */
    public boolean startsWith(String prefix) {
        TrieNode node = searchPrefix(prefix);
        return node!=null;
    }

    private class TrieNode{
        private TrieNode[] links;

        private final int R = 26;

        private boolean isEnd;

        public TrieNode(){
            links = new TrieNode[R];
        }

        public boolean containsKey(char ch){
            return links[ch-'a'] != null;
        }

        public TrieNode get(char ch){
            return links[ch-'a'];
        }

        public void put(char ch, TrieNode node){
            links[ch-'a'] = node;
        }

        public void setEnd(){
            isEnd = true;
        }

        public boolean isEnd(){
            return isEnd;
        }

    }
}

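In my view, a Trie is for problems involving string prefixes or suffixes, so two things are enough to handle most Trie problems: how a TrieNode is structured, and what kind of problem a Trie is meant to solve.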
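As a rough illustration of both points, the sketch below swaps the 26-slot array for a map of children (so it is not limited to lowercase letters) and uses the tree for the autocomplete case mentioned earlier. The names AutoComplete, Node, suggest and collect are invented for this example; it is one possible node layout, not the one LeetCode 208 uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AutoComplete {
    // Hypothetical node layout: children keyed by character instead of a fixed array.
    private static class Node {
        Map<Character, Node> children = new TreeMap<>();
        boolean isEnd;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isEnd = true;
    }

    // Returns every stored word that starts with the prefix, in character order.
    List<String> suggest(String prefix) {
        List<String> out = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return out; // no stored word has this prefix
        }
        collect(node, new StringBuilder(prefix), out);
        return out;
    }

    // Depth-first walk below the prefix node, rebuilding each word along the path.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isEnd) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        AutoComplete ac = new AutoComplete();
        for (String w : new String[]{"time", "tin", "tea", "bell"}) ac.insert(w);
        System.out.println(ac.suggest("ti")); // prints [time, tin]
    }
}
```

The map costs more per node than an array, but only the characters that actually occur take up space.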

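820. Short Encoding of Words

Back to the word-encoding problem mentioned above. The key is to record the longer strings first and then compare the shorter ones against them: a word that is a suffix of a longer word costs nothing extra. With that in mind, the problem can be solved even without a Trie: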

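import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class Solution {
    public int minimumLengthEncoding(String[] words) {
        // Start with every distinct word as a candidate
        Set<String> good = new HashSet<>(Arrays.asList(words));
        for(String w : words){
            // Drop every proper suffix of w: it is already covered by w itself
            for(int i=1; i<w.length(); i++){
                good.remove(w.substring(i));
            }
        }
        // Each remaining word costs its length plus one '#'
        int ans = 0;
        for(String g : good){
            ans += g.length() + 1;
        }
        return ans;
    }
}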

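As the code shows, without a Trie we can hold the strings that must be encoded in a set, good; selecting them means checking every proper suffix of every word and removing it from the set.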
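The "longer strings first" idea can also be written out literally. The sketch below is an illustrative alternative, not the official solution: it sorts by length and checks each word against the ones already kept with endsWith. The class name LongestFirst is made up, and the nested loop makes it slower than the set version on large inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LongestFirst {
    static int minimumLengthEncoding(String[] words) {
        // Work on a copy so the caller's array keeps its order
        String[] sorted = words.clone();
        Arrays.sort(sorted, (a, b) -> b.length() - a.length());

        List<String> kept = new ArrayList<>();
        int ans = 0;
        for (String w : sorted) {
            // w is free if some word already kept ends with it
            boolean covered = false;
            for (String k : kept) {
                if (k.endsWith(w)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                kept.add(w);
                ans += w.length() + 1; // the word plus its '#'
            }
        }
        return ans;
    }

    public static void main(String[] args) {
        System.out.println(minimumLengthEncoding(new String[]{"time", "me", "bell"})); // prints 10
    }
}
```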

A Trie, on the other hand, can filter the strings that need encoding during the matching walk itself:

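import java.util.Arrays;

class Solution {
    public int minimumLengthEncoding(String[] words) {
        int len = 0;
        Trie trie = new Trie();
        // Sort the word list by length, longest first
        Arrays.sort(words, (s1, s2)->s2.length()-s1.length());
        // Insert each word into the trie; insert returns the encoding length that word adds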
        for(String word: words){
            len += trie.insert(word);
        }
        return len;
    }
    
}
class Trie{
    private TrieNode root;
    public Trie(){
        root = new TrieNode();
    }
    public int insert(String word){
        TrieNode node = root;
        boolean isNew = false;
        for(int i=word.length()-1; i>=0; i--){
            char c = word.charAt(i);
            // A missing node means this word is not a suffix of any word inserted so far, so it must be encoded
            if(node.links[c-'a'] == null){
                node.links[c-'a'] = new TrieNode();
                isNew = true;
            }
            node = node.links[c-'a'];
        }
        return isNew?word.length()+1:0;
    }
}
class TrieNode{
    TrieNode[] links = new TrieNode[26];
    public TrieNode(){
    }
}
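Why sort longest first? The insert above only reports a cost when it creates a node, so the result depends on insertion order. The sketch below repeats the same back-to-front insertion in a compact form (OrderDemo, Node and total are names invented for this illustration) and runs it in both orders.

```java
import java.util.Arrays;

class OrderDemo {
    static class Node {
        Node[] links = new Node[26];
    }

    // Inserts the word back to front; returns length + 1 if any node was created, else 0.
    static int insert(Node root, String word) {
        Node node = root;
        boolean isNew = false;
        for (int i = word.length() - 1; i >= 0; i--) {
            int idx = word.charAt(i) - 'a';
            if (node.links[idx] == null) {
                node.links[idx] = new Node();
                isNew = true;
            }
            node = node.links[idx];
        }
        return isNew ? word.length() + 1 : 0;
    }

    // Sums the cost of inserting the words in exactly the order given.
    static int total(String[] words) {
        Node root = new Node();
        int len = 0;
        for (String w : words) len += insert(root, w);
        return len;
    }

    public static void main(String[] args) {
        String[] shortFirst = {"me", "time"};
        System.out.println(total(shortFirst)); // prints 8: "me" was counted before "time" arrived

        String[] longFirst = shortFirst.clone();
        Arrays.sort(longFirst, (a, b) -> b.length() - a.length());
        System.out.println(total(longFirst)); // prints 5: "me" creates no node after "time"
    }
}
```

With the short word first, "me" is charged 3 and "time" another 5; sorting first lets "me" ride along the path that "time" already built.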

(The solution to this problem draws on the LeetCode write-up "99% Trie 吐血攻略，包教包會".)

Other Trie problems include:

211. Add and Search Word - Data structure design

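Closing remarks

To the experts a Trie is nothing special, but I am an algorithms beginner. If anything in this post is wrong or hard to follow, please bear with me; corrections are welcome.

If this post helped your studies, please give it a like. It would be my biggest motivation~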
