Trie: String Analysis Algorithms

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏我們繼續來編程訓練,在《前端進階》這個系列裏面我們已經講過一些字符串的算法了。然後這篇文章我們就來一起學習,剩下的幾個字符串中比較細節的算法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"字符串分析算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在開始之前我們先來看看字符串算法的一個整體目錄。這裏我們從簡單到難的算法來排列,大概就分成這樣一個順序:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"字典樹","attrs":{}},{"type":"text","text":" ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大量高重複字符串的儲存與分析(完全匹配)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如說我們要處理 1 億個字符串,這裏面有多少出現頻率前 50 的這樣的字符串,1 億這個量我們還是可以用字典樹去處理的","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"再比如說大家做搜索關鍵詞,或者相同的字符串搜索類型的情況,很多時候我們就會需要用到類似字典樹這樣的一個結構","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"KMP","attrs":{}},{"type":"text","text":" 
","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在長字符串裏找模式(部分匹配) ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"它跟字典樹最大的區別就是字典樹是檢查兩個字符串是否完全匹配,而 KMP 是兩個字符串中,一個字符串是兩一個字符串的一部分,但是這個就會出現一個更爲複雜的問題。 ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果我們有一個長度爲 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"m","attrs":{}}],"attrs":{}},{"type":"text","text":" 的字符串和一個長度爲 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"n","attrs":{}}],"attrs":{}},{"type":"text","text":" 的字符串,然後讓他們兩個互相匹配,這個時候我們有兩種匹配方法","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一種就是暴力","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"破解法","attrs":{}}],"attrs":{}},{"type":"text","text":",它可能是","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"m","attrs":{}}],"attrs":{}},{"type":"text","text":" 乘以 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"n","attrs":{}}],"attrs":{}},{"type":"text","text":" 的時間複雜度,顯然這個算法的性能在大量的搜索字符的時候是不行的","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以後面幾位計算機專家研究出了 KMP 算法,而 KMP 就是三個人的名字的首字母,","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"K","attrs":{}}],"attrs":{}},{"type":"text","text":" 是高德納,一個著名的寫計算機程序設計的老爺子。加上另外兩個計算機專家共同發明了 KMP 算法。這個算法就是在一個長字符串裏面匹配一個短字符串,這個匹配算法的複雜度可以降到 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"m + 
n","attrs":{}}],"attrs":{}},{"type":"text","text":"。所以這個算法還是非常的厲害的。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Wildcard","attrs":{}},{"type":"text","text":" ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 KMP 的基礎上加了通配符的字符串模式","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通配符包括","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"問號","attrs":{}}],"attrs":{}},{"type":"text","text":" 表示匹配任意字符,","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"星號","attrs":{}}],"attrs":{}},{"type":"text","text":"表示匹配任意數量的任意字符","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在我們做一些文件查找的時候可能就會運用到 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Wildcard","attrs":{}}],"attrs":{}},{"type":"text","text":" 的這種通配符","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們也可以理解它爲一個弱一點的正則表達式,因爲相比正則它只有兩種通配符,並且這些通配符與正則有一個顯著的區別,就是 Wildcard 其實也是可以在 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"O(n)","attrs":{}}],"attrs":{}},{"type":"text","text":" 或者 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"O(m+n)","attrs":{}}],"attrs":{}},{"type":"text","text":"的時間複雜度內去處理的。這個現象是因爲 Wildcard 
matching relies on a greedy algorithm, which is what makes it so remarkable.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Regular expressions","attrs":{}},{"type":"text","text":" ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Regular expression engines generally rely on backtracking","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"They can be considered the ultimate form of general-purpose string pattern matching","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"State machines","attrs":{}},{"type":"text","text":" ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"General-purpose string analysis","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with regular expressions, state machines are more powerful in practice","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In theory, regular expressions and finite state machines are exactly equivalent","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The difference is that we can embed code in a finite state machine and do extra processing on the string as we go","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the other hand, regular expressions are convenient to write, while a finite state machine costs more to write","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null
,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"LL LR","attrs":{}},{"type":"text","text":" ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在簡單的匹配和分析的基礎上,如果我們要對字符串建立多層級的結構,我們就會使用 LL 和 LR 這樣的語法分析的算法","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"LL 在上一篇文章我們已經學習過了,但是 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"LR","attrs":{}}],"attrs":{}},{"type":"text","text":" 是還沒有的,實際上 LR 是一個比 LL 更強大的一個語法分析","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是通常我們簡單寫,就都用 LL 去寫,因爲 LR 它 的理論性比較強","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果同學們還記得的話,我們在講解 HTML 的語法分析的時候,我們用了一個 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"stack","attrs":{}}],"attrs":{}},{"type":"text","text":" 去處理,這個其實就是 LR 算法的一個簡化版。它其實是 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"LR(0)","attrs":{}}],"attrs":{}},{"type":"text","text":" 的語法,但是一般來說我們去處理都會用 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"LR(1)","attrs":{}}],"attrs":{}},{"type":"text","text":",而 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"LR(1)","attrs":{}}],"attrs":{}},{"type":"text","text":" 是相等於 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"LL(n)","attrs":{}}],"attrs":{}},{"type":"text","text":" 
, which makes it a very powerful parsing algorithm.","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Trie","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, let's understand what a trie (literally, a dictionary tree) actually is. When we meet a word we don't know, we look it up in a dictionary. When we do, we usually use the first letter of the word (for Chinese, typically the first letter of its pinyin) as an index to find roughly which page the word is on. What we are relying on here is dictionary order.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now repeat this index lookup over and over: once we have located the first letter, we look at which part of the dictionary the second letter belongs to, and so on. Finally we turn all the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"clues","attrs":{}}],"attrs":{}},{"type":"text","text":" we followed along the way into a tree structure. In other words, \"","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the act of looking something up in a dictionary becomes a tree structure","attrs":{}},{"type":"text","text":"\". ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"That tree structure is our dictionary tree","attrs":{}}],"attrs":{}},{"type":"text","text":", better known by the name 
\"","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Trie","attrs":{}}],"attrs":{}},{"type":"text","text":"\"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"例子分析","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來我們舉個例子:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a919a6f8f37b0acd29894aa82f2f13b8.gif","alt":null,"title":"","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如說現在我們有這 4 個字符串","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"javascript"},"content":[{"type":"text","text":"[\n '3499'\n '0015'\n '0002'\n '0007'\n]","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏它們都是等長的,不過不等長也沒有關係的,等一下我們再來了解爲什麼。那麼如果我們用字典樹來保存這個 4 
strings, how should we store them?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Level 1","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3c/3c516b9a6228c0fd2c6d1d51da2f325f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First look at the first character of every string. Only two characters appear in that position, ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"3","attrs":{}}],"attrs":{}},{"type":"text","text":", so the first level of our trie splits into the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"3","attrs":{}}],"attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":" 
branches.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Level 2","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6c/6ca43408db0c7801ea3979c929cacfae.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, the second level, where the branches are the characters ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"4","attrs":{}}],"attrs":{}},{"type":"text","text":" and ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":". In the first string, ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"4","attrs":{}}],"attrs":{}},{"type":"text","text":" is preceded by ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"3","attrs":{}}],"attrs":{}},{"type":"text","text":", and at this position there is no case where ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"4","attrs":{}}],"attrs":{}},{"type":"text","text":" is preceded by any other character.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So we can put ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"4","attrs":{}}],"attrs":{}},{"type":"text","text":" under the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"3","attrs":{}}],"attrs":{}},{"type":"text","text":" branch from the previous level. Then 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":" 也是一樣,前一個字符都是 0,所以放在我們的 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":" 的分支之下。(","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"這裏聽的有點蒙不要緊,到了最後看着動畫裏面的效果來理解,就會更加明確了。","attrs":{}},{"type":"text","text":")","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"第三層","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/df/dfdee3c61028f77f69f0bd286f8d7edb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":" 後面的分支,發生了一個變化,第二行的 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":" 後面出現了一個 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"1","attrs":{}}],"attrs":{}},{"type":"text","text":",然後第一行的 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"4","attrs":{}}],"attrs":{}},{"type":"text","text":" 後面又有一個 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"9","attrs":{}}],"attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以說我們最後出來的字典樹,在 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"4","attrs":{}}],"attrs":{}},{"type":"text","text":" 的後面產生了一個 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"9","attrs":{}}],"attrs":{}},{"type":"text","text":" 的分支,並且在 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":" 的分支上會產生了 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"1","attrs":{}}],"attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":"兩個分支。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"第四層","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/34/34e881dc0dcc4daa19287ee76aaf611a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同理,在第四層這裏 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"0","attrs":{}}],"attrs":{}},{"type":"text","text":" 的後面出現了 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"2","attrs":{}}],"attrs":{}},{"type":"text","text":" 和 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"7","attrs":{}}],"attrs":{}},{"type":"text","text":" 這兩種情況,而 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"1","attrs":{}}],"attrs":{}},{"type":"text","text":" 後面出現了 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"5","attrs":{}}],"attrs":{}},{"type":"text","text":" 這一種情況。最後的 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"9","attrs":{}}],"attrs":{}},{"type":"text","text":" 後面再次出現了 9,所以我們只需要再追加一個 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"9","attrs":{}}],"attrs":{}},{"type":"text","text":" 的分支即可。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後我們來看看整個字典樹的生成過程!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ca/ca94282fc59d5ed3c9ea6d651fbf9e13.gif","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1d/1d28fc2d088b8e4291e06ef90f58b4d7.gif","alt":null,"title":null,"style":[{"key":"wi
dth","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"代碼實現","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來我們看看在代碼中,可以如何實現這棵字典樹,以及看看字典樹有什麼樣的應用場景。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先我們來討論一下字典樹的存儲機制,這裏我們會用一個 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"空對象","attrs":{}}],"attrs":{}},{"type":"text","text":"來保存字典樹裏面的值。因爲我們字典樹在實際場景裏面就是一段字符串所以說我們會用一個","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"對象來作爲字典樹的節點","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當然如果大家願意的話,用 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Map","attrs":{}}],"attrs":{}},{"type":"text","text":" 也是可以的, ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Object","attrs":{}}],"attrs":{}},{"type":"text","text":" 和 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Map","attrs":{}}],"attrs":{}},{"type":"text","text":" 就是在 JavaScript 中最適合用來保存字典樹裏面的分支這種數據結構的。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲字典樹裏面只會存字符串,所以說用對象還是 Map 
makes no essential difference.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Constructor","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First we add a ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Trie","attrs":{}}],"attrs":{}},{"type":"text","text":" class and implement its constructor, ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"constructor()","attrs":{}}],"attrs":{}},{"type":"text","text":". To keep things clean we use ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Object.create(null)","attrs":{}}],"attrs":{}},{"type":"text","text":" to create the root object, which also avoids picking up anything from the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Object.prototype","attrs":{}}],"attrs":{}},{"type":"text","text":" prototype. (Since each key we store is a single character, pollution is not really a problem here, but this is a good habit: keep things as clean as you can.)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"javascript"},"content":[{"type":"text","text":"class Trie {\n /** Constructor: the root is an object with no prototype **/\n constructor() {\n this.root = Object.create(null);\n }\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Insert: adding nodes to the tree","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next we need to write an 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"insert()","attrs":{}}],"attrs":{}},{"type":"text","text":" 方法,這個方法的作用就是把一個字符串插入字典樹裏面。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個插入邏輯其實很簡單,我們去設一個變量 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"node","attrs":{}}],"attrs":{}},{"type":"text","text":"(也就是一個節點) ,一開始讓這個節點等於我們的 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"root","attrs":{}}],"attrs":{}},{"type":"text","text":" (這裏的 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"root","attrs":{}}],"attrs":{}},{"type":"text","text":" 就是我們樹結構的","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"根節點","attrs":{}}],"attrs":{}},{"type":"text","text":") 。然後我們就從 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"root","attrs":{}}],"attrs":{}},{"type":"text","text":" 根節點,逐級地把字符串放進這個樹的子樹節點裏面去。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏如果我們的主樹不存在的話,我們就先創建主樹,然後我們再讓 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"node","attrs":{}}],"attrs":{}},{"type":"text","text":" 到下一個層級去(相當於我們在查字典的時候,翻到對應的字母的位置)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後我們要注意的是,字符串是會有大量的重複的。比如我們的 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"ab","attrs":{}}],"attrs":{}},{"type":"text","text":" 和 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"abc","attrs":{}}],"attrs":{}},{"type":"text","text":" 其實它是兩個不同的字符串,所以說 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"ab","attrs":{}}],"attrs":{}},{"type":"text","text":" 後邊我們要有一個截止符。這個截止符我們就用 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"$","attrs":{}}],"attrs":{}},{"type":"text","text":" 來表示。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其實這裏我們用 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"$","attrs":{}}],"attrs":{}},{"type":"text","text":" 符是不合適的,因爲如果我們的字符串本身就支持 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"$","attrs":{}}],"attrs":{}},{"type":"text","text":" 這個內容的話,這樣就會出問題了。所以說其實一個更好的方案就是我們使用 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Symbol()","attrs":{}}],"attrs":{}},{"type":"text","text":"來 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"創建一個 Symbol","attrs":{}}],"attrs":{}},{"type":"text","text":"。這裏我們可以使用 Symbol 來處理,這樣就不會和我們字符裏面的 $ 符號衝突了。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用了 Symbol 的這種","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"不可重複","attrs":{}}],"attrs":{}},{"type":"text","text":"的特點,那麼我們就可以讓 node 節點最後的截止符更加嚴謹一些。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏就講完 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"insert()","attrs":{}}],"attrs":{}},{"type":"text","text":" 
method; now let's look at the code:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"javascript"},"content":[{"type":"text","text":"/** Unique terminator key; a Symbol cannot collide with any character **/\nlet $ = Symbol('$');\n\nclass Trie {\n /** Constructor **/\n constructor() {\n this.root = Object.create(null);\n }\n\n /**\n * Insert a word into the trie\n * @param {String} word the word to insert\n */\n insert(word) {\n let node = this.root;\n\n for (let c of word) {\n // Create the child node on first use, then descend into it\n if (!node[c]) node[c] = Object.create(null);\n node = node[c];\n }\n\n // node[$] counts how many times this exact word was inserted\n if (!($ in node)) node[$] = 0;\n\n node[$]++;\n }\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"randomWord: random words","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here we write a ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"randomWord()","attrs":{}}],"attrs":{}},{"type":"text","text":" function that produces a random word. Combined with our trie, it lets us easily analyze string data, for example to answer questions like \"which word occurs most often\" 
and so on.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/90/90c33c0b1f737e4c6216faae2e61ac08.gif","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Enough said, give it a like first!","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is the implementation of the function:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"javascript"},"content":[{"type":"text","text":"function randomWord(length) {\n let s = '';\n for (let i = 0; i < length; i++) {\n s += String.fromCharCode(Math.random() * 26 + 'a'.charCodeAt(0));\n }\n return s;\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What does the line ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"String.fromCharCode(Math.random() * 26 + 'a'.charCodeAt(0))","attrs":{}}],"attrs":{}},{"type":"text","text":" do? It picks a random letter from the 26-letter alphabet. Since there are 26 letters, we start from the character code of ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"a","attrs":{}}],"attrs":{}},{"type":"text","text":" and add 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"隨機數","attrs":{}}],"attrs":{}},{"type":"text","text":" * 26,這樣我們就可以得到一個隨機的數,並且這個數是在 0 - 26 之間。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/50/50becc5e38fbc854c9dd7bb36e7815a7.gif","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"好,有了這個隨機生成單詞的方法,我們就可以來生成大量的單詞,然後使用我們的 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"字典樹","attrs":{}}],"attrs":{}},{"type":"text","text":" 來實現一個統計分析功能了。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏我們來構建 10 萬個隨機單詞:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"javascript"},"content":[{"type":"text","text":"let trie = new Trie();\n\nfor (let i = 0; i < 100000; i++) {\n trie.insert(randomWord(4));\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後我們放入瀏覽器執行後,我們可以看到 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"字典樹","attrs":{}}],"attrs":{}},{"type":"text","text":"就生成好了:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b9/b90562f7e47c2bb3e4212315e7469284.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏代表什麼呢?如果我們還記得在 \"例子分析\" 部分講到的,這裏意思就是說我們是有 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"aaad","attrs":{}}],"attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"aaag","attrs":{}}],"attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"aaam","attrs":{}}],"attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"aaax","attrs":{}}],"attrs":{}},{"type":"text","text":"、","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"aaaz","attrs":{}}],"attrs":{}},{"type":"text","text":" 這樣的一些字符。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"好,現在我們想實現我們的業務需求,找出出現最多的隨機字符串該怎麼寫呢?","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"most 
word statistics function","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Back in our ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Trie","attrs":{}}],"attrs":{}},{"type":"text","text":" class, we add a ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"most()","attrs":{}}],"attrs":{}},{"type":"text","text":" method.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the most method we need to traverse the whole tree. The tree has no single end point, yet we have to visit every word stored in it, so how do we do that?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, to count all the words we need to visit every node of the tree recursively, recording each word's occurrence count as we go. But this is a trie, not an array of whole words, so we have to find where each word ends in the tree and also keep track of all the letters that make up that word.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To find where a word ends, we first check whether the current node has the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"$","attrs":{}}],"attrs":{}},{"type":"text","text":" end marker. If it has the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"$","attrs":{}}],"attrs":{}},{"type":"text","text":" end marker, the current position is where a word ends, and once we have such an end point we can look for the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"max","attrs":{}}],"attrs":{}},{"type":"text","text":" node, the one with the highest count. But locating the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"max","attrs":{}}],"attrs":{}},{"type":"text","text":" 
node alone does not tell us which word it belongs to.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So we need to give the recursive function ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"visit()","attrs":{}}],"attrs":{}},{"type":"text","text":" an extra ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"word","attrs":{}}],"attrs":{}},{"type":"text","text":" parameter. As we descend into each branch of the tree, we append the current node's letter to this word value, so by the time a branch has been fully visited, the accumulated value is the complete word.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, we use a variable ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"max","attrs":{}}],"attrs":{}},{"type":"text","text":" to record the highest occurrence count seen so far, updating it whenever a complete ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"word","attrs":{}}],"attrs":{}},{"type":"text","text":" turns out to occur more often.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feeling completely lost yet?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a6/a6b050f46c6b36fd26c6181d1b746e33.gif","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If you are not lost, well done, and I hope my explanation lets most readers follow the logic here. Keep in mind, though, that following this part of the algorithm requires some familiarity with 
\"","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"data structures","attrs":{}}],"attrs":{}},{"type":"text","text":"\", in particular \"","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"trees","attrs":{}}],"attrs":{}},{"type":"text","text":"\". If you have not studied this yet, I recommend brushing up on it.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Even if you are lost, take a look at the code anyway; it might suddenly click!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"javascript"},"content":[{"type":"text","text":"class Trie {\n  /** Constructor **/\n  constructor() {\n    this.root = Object.create(null);\n  }\n\n  /**\n   * Insert a word into the tree\n   * @param {String} word the word to insert\n   */\n  insert(word) {\n    let node = this.root;\n\n    for (let c of word) {\n      if (!node[c]) node[c] = Object.create(null);\n      node = node[c];\n    }\n\n    if (!($ in node)) node[$] = 0;\n\n    node[$]++;\n  }\n\n  /** Find the most frequent word */\n  most() {\n    let max = 0; // highest occurrence count seen so far\n    let maxWord = null; // the word with that count\n    /**\n     * Recursively visit the words in the tree\n     * @param {Object} node the current node\n     * @param {String} word the letters accumulated so far\n     */\n    let visit = (node, word) => {\n      // End-of-word marker found: update the maximum if this word occurs more often\n      if (node[$] && node[$] > max) {\n        max = node[$];\n        maxWord = word;\n      }\n\n      // Recurse into every child branch (for...in skips the symbol key $)\n      for (let p in node) {\n        visit(node[p], word + p);\n      }\n    };\n\n    visit(this.root, '');\n    console.log(maxWord, max);\n  }\n}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, in the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"console","attrs":{}}],"attrs":{}},{"type":"text","text":" we run 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"trie.most()","attrs":{}}],"attrs":{}},{"type":"text","text":", which prints the most frequent word:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2f/2f02780ebb1aa0ca49a0989c110bf407.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here we see that the word ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"maek","attrs":{}}],"attrs":{}},{"type":"text","text":" occurred most often among the 100,000 random words, 5 times in total.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given 26 to the 4th power (456,976) possible words, that count is actually quite high.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wait, 26 to the 4th power? Where does that come from? Looking back at the code that generates random words, each word has ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"4","attrs":{}}],"attrs":{}},{"type":"text","text":" letters and there are 26 letters to choose from, so how many different 4-letter words are there? Those who are good at math will know that the count is ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"the number of choices per position","attrs":{}}],"attrs":{}},{"type":"text","text":" raised to the n-th power. Here each word has 4 letters, so ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"n","attrs":{}}],"attrs":{}},{"type":"text","text":" is 
","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"4","attrs":{}}],"attrs":{}},{"type":"text","text":".","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If this is unfamiliar, have a look at the theory of \"permutations and combinations\" in mathematics.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I recommend going straight to this article: \"","attrs":{}},{"type":"link","attrs":{"href":"https://zhuanlan.zhihu.com/p/41855459","title":""},"content":[{"type":"text","text":"5 分鐘徹底瞭解排列組合","attrs":{}}]},{"type":"text","text":"\" (a five-minute guide to permutations and combinations)","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As for the ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Trie","attrs":{}}],"attrs":{}},{"type":"text","text":", here we only showed how to use a ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Trie","attrs":{}}],"attrs":{}},{"type":"text","text":" to find the word that occurs most often. In fact, with a ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"Trie","attrs":{}}],"attrs":{}},{"type":"text","text":" we can also find values such as the smallest and the largest entry in the tree.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At a scale of up to 
10,000 items, whether we want the maximum or the minimum, and however many such numbers there are, we can handle it fairly easily.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is what a trie can do when dealing with large inputs and string problems. A trie can be seen as a special case of a hash tree, and in the string domain the trie is the most direct application of the hash tree. If we were dealing with numbers, we could use other hashing schemes to build other kinds of hash trees. Since our main goal here is not to study algorithms in general, the point is to get a clear picture of this class of common string problems together.","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Did everyone get it? If you did, give it a like; if not, bookmark it; and if you think these articles are well worth reading, please like, bookmark, and share. Thank you!","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That's all for this installment. Next time we will take a closer look at the KMP algorithm!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/23/237db88ffb40799313e26780f79aad9e.gif","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The author has started livestreaming daily study sessions on Bilibili, from 06:00 in the morning to 11:30 at night. 
Come join us in the \"","attrs":{}},{"type":"link","attrs":{"href":"https://live.bilibili.com/22619211","title":""},"content":[{"type":"text","text":"livestream room","attrs":{}}]},{"type":"text","text":"\" and study together.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here we keep each other accountable, encourage each other, and work hard together on the path of lifelong learning, letting learning change our lives!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The road of learning can be dull and lonely, but I hope this brings each of us a bit more company and a bit more encouragement. Let's keep going together! (๑ •̀ㅂ•́)و","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I am from \"","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"技術銀河","attrs":{}},{"type":"text","text":"\", and my name is ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"三鑽","attrs":{}},{"type":"text","text":": \"We learn in order to grow, and we grow so as not to fall behind. Persistence brings success; failure only comes from giving up. Keep it up, everyone! See you next time!\"","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"heading","attrs":{"align":null,"level":1}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}