信息聚合系統的數據庫後臺（比如RSS訂閱，feedly）應該如何設計？

原創

2020-02-21 03:25

我想起之前有研究生同學曾經參與一個實習項目，他們用SQL數據庫來實現一個RSS訂閱聚合系統，結果遇到了擴展性問題：當RSS源達到上千的時候，併發查詢性能就已經下降到不可接受。

之後我遇到的實用的信息聚合系統：Google閱讀器、以及Feedly。Feedly的官方博客裏說它的後臺是用HBase來存的。我不禁好奇其數據架構設計到底是怎麼做的。

首先，容易想到的是，爲每篇博客文章關聯RSS源id（博客訂閱的rss url地址），及文章id（直接使用url，或者數據庫生成列），每篇博客文章需要全局順序的編號，則每個用戶的聚合訂閱相對於文章id的一個列表。這樣用戶拉取新文章相對於對前面全局文章列表的一個selective sorted io copy。

不過既然所有的博客文章都是全局序存儲的（按更新或RSS爬蟲的爬取時間），其物理存儲怎麼做水平切分呢？

能想到的最簡單的，就是對RSS源id做DHT。然後每次拉取用戶訂閱的聚合源的更新時，需要做一個並行的Fork（Scatter）-Join（Merge）查詢。這樣大概能夠解決問題了。但是僅僅對RSS源id做DHT的話，還不能解決每個不同的RSS源文章數量不同、數據量不均勻，爲使得DHT底層物理存儲更均衡，可能還要細化設計。。。

發佈了396 篇原創文章 · 獲贊 68 · 訪問量 66萬+

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

相關文章

一場數據架構變革正在來臨

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null

2021-12-21 10:54:01

解讀數字化轉型下的數據安全：AI正在開闢新的可能性

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-19 14:03:54

雲原生數據庫企業Cockroach Labs再獲 2.73 億美元融資，估值高達50億美元

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-16 15:18:50

數千個數據庫、遍佈全國的物理機，京東物流全量上雲實錄 | 卓越技術團隊訪談錄

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1}},{"type":"blockquote","content":[{"type":"pa

2021-12-16 10:38:55

前車之鑑：聊聊我在基礎設施中掉過的坑

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-14 13:33:55

洞察數據庫變革趨勢，亞馬遜雲科技正在憑藉這項技術改變着遊戲規則

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-10 16:53:54

MongoDB發佈第三季度財報，雲數據庫收入增長加速

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-09 15:33:57

MySQL探祕(四):InnoDB的磁盤文件及落盤機制

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

程序员历小冰

2021-12-08 12:33:52

Oracle 大佬離職，怒噴 MySQL “糟糕的數據庫”

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-07 19:58:57

Jellyfish：爲Uber最大的存儲系統提供更節省成本的數據分層

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

Mohammed Khatib

2021-12-06 10:33:48

企業需要什麼樣的數據庫，One Size Fits All？

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-03 18:19:01

這個重要開源項目全靠一位低調的“怪老頭”維護！他和比爾蓋茨一樣撐起了計算機世界

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-12-03 14:23:56

數據庫事務的三個元問題

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null

2021-12-03 10:33:52

一個 Babelfish ，看懂雲數據庫的發展方向

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1}},{"type":"paragraph","attrs":{"indent":0,"nu

2021-12-01 18:43:50

數據庫內核雜談(二十一): 流處理系統簡介

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"typ

2021-11-24 10:38:57

24小時熱門文章

最新文章

最新評論文章