Analyzing the trie implementation of the Apriori algorithm

Original

2020-02-21 02:22

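Put simply, a trie is an ordered tree whose ordering can be alphabetical or numeric, and it is recursive: every subtree is itself a trie. This structure compresses strings that share a prefix very effectively and also makes merging subtrees easy, so Apriori's candidate generation and the pruning step that follows can both be carried out on a single tree. The algorithm and code come from the paper "A fast APRIORI implementation" by Ferenc Bodon. In another paper he says that the database, once read into memory, could also be stored in a trie, so that the database could be shrunk after each round of k-itemset generation; but because of "lack of time" this was never done -_-! Well, what can I say: that is academic research for you. Publishing papers is what counts. Implementation? Ahem, time to recruit a grad student?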
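To make the idea concrete, here is a minimal Python sketch of trie-based Apriori in the spirit described above. It is not Bodon's code (his implementation is in C++), and the names `Node`, `add_candidates`, `count_support`, `drop_infrequent`, and `apriori` are hypothetical. Each path from the root spells a sorted itemset. Size-k candidates are grown as new leaves by joining sibling leaves among the frequent (k-1)-itemsets, kept only if every (k-1)-subset is already a path in the trie, counted by a recursive walk over each basket, and then cut off if infrequent, so generation and pruning happen on the same tree. Children are held in a Python `dict` and sorted on traversal.

```python
from itertools import combinations


class Node:
    """One trie node; the path from the root spells a sorted itemset."""

    def __init__(self):
        self.children = {}  # item -> Node
        self.count = 0      # support of the itemset ending at this node


def has_path(root, items):
    """True if the sorted item sequence is a path in the trie."""
    node = root
    for item in items:
        node = node.children.get(item)
        if node is None:
            return False
    return True


def nodes_at_depth(node, depth, prefix=()):
    """Yield (prefix, node) for every node exactly `depth` levels below `node`."""
    if depth == 0:
        yield prefix, node
        return
    for item in sorted(node.children):
        yield from nodes_at_depth(node.children[item], depth - 1, prefix + (item,))


def add_candidates(root, k):
    """Grow depth-k candidate leaves from the frequent (k-1)-itemsets (k >= 2)."""
    new = []
    for prefix, parent in nodes_at_depth(root, k - 2):
        items = sorted(parent.children)
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                cand = prefix + (a, b)
                # Apriori pruning: every (k-1)-subset must already be frequent.
                if all(has_path(root, sub) for sub in combinations(cand, k - 1)):
                    new.append((parent.children[a], b))
    for leaf, b in new:  # insert after checking so the checks see only level k-1
        leaf.children[b] = Node()
    return len(new)


def count_support(node, basket, start, depth, k):
    """Increment every depth-k candidate that is a subset of the sorted basket."""
    if depth == k:
        node.count += 1
        return
    for i in range(start, len(basket)):
        child = node.children.get(basket[i])
        if child is not None:
            count_support(child, basket, i + 1, depth + 1, k)


def drop_infrequent(root, k, min_support):
    """Remove depth-k leaves whose support is below min_support."""
    for _, parent in nodes_at_depth(root, k - 1):
        for item in [x for x, c in parent.children.items() if c.count < min_support]:
            del parent.children[item]


def apriori(baskets, min_support):
    """Return {sorted itemset tuple: support} for all frequent itemsets."""
    baskets = [sorted(set(b)) for b in baskets]
    root = Node()
    for b in baskets:  # level 1: every single item is a candidate
        for item in b:
            root.children.setdefault(item, Node())
    k = 1
    while True:
        for b in baskets:
            count_support(root, b, 0, 0, k)
        drop_infrequent(root, k, min_support)
        k += 1
        if add_candidates(root, k) == 0:
            break
    return {prefix: node.count
            for d in range(1, k)
            for prefix, node in nodes_at_depth(root, d)}
```

For example, `apriori([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]], 3)` keeps every single item (support 4) and every pair (support 3); the candidate `("a", "b", "c")` occurs in only two baskets, so its leaf is removed and no size-4 candidates can be grown.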

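I came back to this implementation because, chatting with a classmate, we noted that laptops with 2 TB of RAM and 3 TB of disk are already on the market; booting into XP is instantaneous, and they cost roughly US$10,000. With hardware that cheap, what should we use it for? Computation! Large-scale data analysis that used to be possible only in government bureaucracies or universities is something a hacker can now tinker with at home. US$10,000 is expensive, but far cheaper than a minicomputer. Imagine using one TB to hold a trie of the database: assuming a compression ratio of ten (not unusual for personal names), that amounts to 10995116277760 bytes (10 × 2^40), enough to hold a trillion characters, which is plenty for any kind of analysis... Seen from this angle, the limits of progress in computing increasingly lie in software...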
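The byte figure above is the following arithmetic, reading "one TB" as one tebibyte (2^40 bytes) and applying the assumed 10:1 compression ratio:

```latex
2^{40}\ \text{bytes} \times 10 = 1\,099\,511\,627\,776 \times 10\ \text{bytes} = 10\,995\,116\,277\,760\ \text{bytes} \approx 1.1 \times 10^{13}\ \text{bytes}
```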

A better solution would be to apply trie, because map does not make use of the fact that two baskets can have the same prefixes. Hence insertion of a basket would be faster, and the memory need would be smaller, since the same prefixes would be stored just once. Because of the lack of time trie-based basket storing was not implemented and we do not delete a reduced basket from the map if it did not contain any candidate during some scan.
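The advantage the quote describes can be seen in a small sketch: baskets are deduplicated, sorted, and inserted into a prefix trie that keeps a counter on the node where each basket ends, compared with a `Counter` keyed by the sorted basket tuple, which plays the role of the map. `BasketNode`, `insert_basket`, `stored_items`, and `baskets_in` are hypothetical names for illustration.

```python
from collections import Counter


class BasketNode:
    """Trie node for stored baskets; baskets with a shared prefix share nodes."""

    def __init__(self):
        self.children = {}  # item -> BasketNode
        self.count = 0      # how many baskets end exactly at this node


def insert_basket(root, basket):
    """Insert one basket (deduplicated and sorted) and bump its end counter."""
    node = root
    for item in sorted(set(basket)):
        node = node.children.setdefault(item, BasketNode())
    node.count += 1


def stored_items(node):
    """Number of item slots in the trie (all nodes below `node`)."""
    return sum(1 + stored_items(child) for child in node.children.values())


def baskets_in(node, prefix=()):
    """Yield (basket, multiplicity) pairs back out of the trie."""
    if node.count:
        yield prefix, node.count
    for item in sorted(node.children):
        yield from baskets_in(node.children[item], prefix + (item,))


data = [[1, 2, 3], [1, 2, 4], [1, 2, 3, 5], [2, 1], [3, 2, 1]]

root = BasketNode()
for basket in data:
    insert_basket(root, basket)

as_map = Counter(tuple(sorted(set(b))) for b in data)  # map-based storage

print(stored_items(root))                      # 5 item slots in the trie
print(sum(len(key) for key in as_map))         # 12 item slots across map keys
print(dict(baskets_in(root)) == dict(as_map))  # True: same baskets and counts
```

The slot count is only a proxy for memory: in Python every node carries its own `dict`, so the byte-level saving would show up in a compact node layout rather than in this sketch.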

Our APRIORI implementation can be further improved
if trie is used to store reduced basket, and a reduced basket
is removed if it does not contain any candidate.
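The improvement in this last paragraph, dropping a reduced basket once it contains no candidate, can be sketched as a filter run before each counting scan. This assumes a "reduced basket" means a basket with every item that occurs in no current candidate stripped out; `reduce_baskets` is a hypothetical helper, and it tests containment with `itertools.combinations` for brevity, whereas a trie-based version would reuse the recursive walk used for support counting.

```python
from itertools import combinations


def reduce_baskets(baskets, candidates, k):
    """Strip items that occur in no candidate; drop baskets holding no candidate.

    baskets: iterable of item collections; candidates: the k-itemsets of this scan.
    """
    candidate_set = {tuple(sorted(c)) for c in candidates}
    useful = {item for c in candidate_set for item in c}
    kept = []
    for basket in baskets:
        reduced = tuple(sorted(set(basket) & useful))
        # combinations() of a sorted tuple yields sorted k-tuples, matching candidate_set;
        # a basket shorter than k yields none and is dropped.
        if any(sub in candidate_set for sub in combinations(reduced, k)):
            kept.append(reduced)
    return kept


baskets = [[1, 2, 3], [1, 4], [2, 3, 5], [4, 5]]
print(reduce_baskets(baskets, [(1, 2), (2, 3)], 2))  # [(1, 2, 3), (2, 3)]
```

Because the kept baskets only shrink from scan to scan, each later pass over the database touches fewer and shorter baskets.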

wineceramic
