Applications and Challenges of ByteDance's Trillion-Scale Graph Database

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"I. Understanding Graph Databases"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Appetizer: A Hard Problem in the Company's Business Scenarios"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c5\/c586636f4c66fd99e01b53c666769aff.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, let me introduce one of ByteDance's products, Douyin, as the appetizer for this talk. As you know, Douyin recommends videos algorithmically, and that depends on storing some core foundational data: follow relationships between users, likes on videos, and users' address-book relationships. The recommendation algorithms are built on this relationship data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How should this data be stored? That is a very challenging problem in itself. From the algorithm side, recommending content or users based on user relationships brings large gains in core metrics."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Recommendation clearly relies on second-degree and multi-degree relationships and on patterns. Patterns can be combined in many ways, and the technical architecture needs to be able to evaluate queries over those combinations."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let us first look at the original implementation. Suppose I want second-degree relationships, that is, a query based on the follow relationships of second-degree neighbors. The previous approach was to dump user relationships from the online MySQL into Hive, join multiple Hive tables to produce the second-degree relationships, and then import the result into an online KV system used for recommendation. This process ran several times a day."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With Douyin at more than 600 million DAU, the data volume is enormous, so running this pipeline end to end is very costly in both time and compute."}]},{"ty
pe":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As a result, the data that the recommendation algorithm works with is already stale, possibly several days old, so recommendations are not real-time. Strategy iteration is also too expensive: a failure at any stage can cause the whole process to fail."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The third problem is that, for a product with daily activity as high as Douyin's, achieving high concurrency with millisecond-level query latency is quite challenging."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So what are the benefits of a graph database?"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Graph Databases vs. Relational Databases"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let us do a simple analysis: how does a graph database differ from a traditional relational database?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, a graph is very simple: it consists of the most basic elements, namely vertices, edges, and properties. In a relational database, querying by a relationship or by some specific condition usually means joining multiple tables."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For the same workload, a graph database answers the query by traversing the graph, which is more efficient."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because a graph database is built from vertices and edges, a traversal that starts from a given vertex dispatches messages at a finer granularity. On a large distributed architecture, this finer granularity means lower memory and network overhead. That is the system-level reason behind the efficiency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"co
ntent":[{"type":"text","text":"A simple example: how do I find Yiming's friends and then query how many employees each friend's company has?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In a relational database this usually requires several tables, such as a company table, an employment table, and an employee table. With MySQL, the query might be written as the SQL statement in the figure below."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/41\/415d0477c21f72cbd1c68f62cbc70dc3.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With a graph database and a query language such as Gremlin, a single line expresses the same query very directly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next I will briefly introduce the popular graph database languages and the overall graph database landscape."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Classifying Industry Graph Databases from Multiple Angles"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Graph databases can be classified by language ecosystem, architecture design, cluster scale, and use case:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c5\/c57774a2c65f2f73355b3613b15dfa0e.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"II. Applicable Scenarios and Examples"}]},{"type":"heading","attrs":{"align":null,"level":3},"con
tent":[{"type":"text","text":"1. ByteGraph's Development History and Applicable Business Data Models"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, here is the development history of ByteGraph inside ByteDance:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d6\/d6af2ce9347d99c1652bf628215888b0.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The current version of ByteGraph delivers million-level QPS query performance on a single machine and supports sorting on multiple dimensions, for example by follow time or by relationship closeness. In other words, separate indexes are built on time and on closeness."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also support fairly complex queries, such as multi-hop traversals. Starting from one vertex and going one hop, then two hops, three hops, and so on usually touches a very large number of vertices, so some of the intermediate subqueries can run concurrently to reduce latency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our query language is currently based on Gremlin. Gremlin is a Turing-complete language and can express arbitrary queries. As a company-wide product serving workloads at the million level, ByteGraph puts heavy emphasis on reliability: we support multi-datacenter disaster recovery and eventual consistency of data."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Categories of Business Scenarios Already in Production"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By business, we currently support more than 500 business clusters, running on over ten thousand servers."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/40\/40e4e03216b9b90c23a70ea0408897ee.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype"
,"value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, the earliest use case was online storage of Douyin's user relationships, such as friend relationships and follower lists. Recommendation is also built on this foundational data, for instance recommending people and videos in Douyin: multi-hop queries such as friends-of-friends are used to mine relationships, followed by algorithmic work such as association-rule analysis."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There is also the knowledge graph domain, which supports search and encyclopedia, the education team, the e-commerce team, and others, for recommendations on individual entities. In addition, IT systems use graphs to model dependencies between repos, or the network status between online services."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is an example: Douyin e-commerce can be modeled as a graph. The entities (vertices) and relationships (edges) break down as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fe\/feefb0a4c663ae4c2f9242d0086cfebc.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We build a very large graph over devices, products, influencers, and so on. On this graph we run many kinds of recommendation, offline and online analysis, and even online training based on graph neural networks."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"III. Data Model and Query Language"}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Directed Property Graph Modeling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ByteGraph is modeled as a directed property graph, which consists of vertices, edges, and properties. Vertices represent entities. Internally, ByteGraph uniquely identifies a vertex with a two-tuple, typically a user's uid plus the specific application it belongs to, such as Douyin, Huoshan, or Toutiao, with different applications forming different vertical scenarios. This two-tuple uniquely identifies an entity vertex, and the type denotes a class of vertices."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/05\/05baf2a654b3e5e677e7816b018fa7e5.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, an athlete or a team represents an entity in the real world. A type can be thought of as a table in a relational database: different types are different tables."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Properties describe a given entity, such as name, gender, and age. We require that vertices of the same type share the same schema."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Edges are similar: a relationship between vertices is mapped to an edge, which is usually described by a source vertex, a destination vertex, and an edge type, and together these describe an event. An edge can also carry many properties, such as start time, end time, and location, to describe that event."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Having briefly covered ByteGraph's graph modeling, let us look at its query language."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Gremlin Query Language Interface"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As mentioned, Gremlin is a Turing-complete language. It belongs to Apache as part of one of its projects. It defines the query semantics of Gremlin's various operators but places no hard constraints on how they are implemented, so the implementation is entirely up to each vendor."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ByteGraph currently supports a subset of Gremlin, with coverage of about 80%. The data model is the directed property graph model. Compared with a traditional RPC interface, Gremlin is more flexible and more expressive."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3c\/3c5e13508f11a05447ac00b28dcfa3dd.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the example above shows, Gremlin reads very close to natural language."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To summarize with the analogy of learning English, using the Gremlin query interface takes three steps: learning words, forming sentences, and speaking."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e5\/e5e011f8d4ce6dae2e62d4cd4477d832.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let us take another very concrete UGC scenario to show how different kinds of queries are expressed in Gremlin."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0d\/0dba2fd7f973ed66adc4df7514d5cb61.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As you can see, a Gremlin query is somewhat simpler than SQL and very direct. For example, to restrict the result to followed big-V accounts (verified influencers), I write a where step containing otherV. otherV refers to the vertex at the other end of the current follow edge, and the condition on that vertex goes inside the where step. We then sort in descending order by the timestamp tsUs and keep only the top 10 with limit(10). Finally, to get the names of these big-V authors, we take otherV again and read its value, where the property name is name."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ByteGraph also supports cross-cluster and cross-table queries. Here is another example from the UGC scenario."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ff\/ffb0c5af9f40270b24a836cddd1076b2.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Suppose that in the upstream vertical business, user relationships are stored in one table and like relationships in another. Now I want to query the list of articles liked by user C's friends (mutual follows). I start from user C in table2 and then look up the article names in table1. To do this I use the with Table semantics to restrict the lookup of vertex C to table2, and then, from all users who have a mutual follow with C, switch to table1 and list the articles they have liked."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So our Gremlin also supports cross-table query semantics, which enables queries across tables and even across clusters."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e7\/e7e6593e6e3a5ec1b03de275dcea74e8.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ByteGraph currently provides C++ and Go SDKs. The figure at the upper right shows examples of how a Gremlin-based graph query is written with SDKs such as Java, C++, and Python."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At present all four languages support the RPC interface, but the Gremlin interface is only available for C++ and Go."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next I will go deeper into ByteGraph's overall architecture."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"IV. ByteGraph Architecture and Implementation"}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. ByteGraph Overall Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The system has three layers: the query engine layer, the storage engine layer, and the disk storage layer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/71\/71cb0c17cc56e26dc081b5a341c2f12e.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The three layers are independent, and each can scale horizontally. For example, if your queries have complex semantics involving many steps, they are compute-heavy but may need relatively little memory and storage, so you can run more query engine (GQ) instances and fewer storage engine (GS) instances."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Overall, ByteGraph follows a classic architecture that separates compute from storage. The bottom layer is a distributed KV store, for which we use an internal distributed KV system called Abase. We also support other distributed KV stores, such as another internal product, ByteKV. The difference from Abase is that one emphasizes consistency while the other emphasizes reliability. So the backend engine is pluggable and can be swapped out."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Query engine layer"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This layer mainly handles user session management and service proxying. Its core function is to generate a logical query plan from the Gremlin request sent by the user, then produce a physical query plan from the logical one, and dispatch the corresponding subqueries through the executor. It is implemented in Go, with an emphasis on high concurrency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Storage engine layer"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This layer mainly handles data storage. It covers how to slice the data into graph partitions (graph shards) and how to organize the subgraph represented by each shard in a specific data structure. That structure should have reasonably low read and write amplification and a disk-friendly layout with sequential reads and writes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How to implement such a data structure is one of ByteGraph's core design questions. To make sure data is not lost, the storage engine layer supports a WAL (Write-Ahead Log). We also support transactions through a 1PC transaction protocol, currently at the Read Committed isolation level. This layer is written in C++."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Disk storage layer"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This layer currently relies on a third-party persistent store (KV store) within the company; the next version will use a self-developed graph-native storage."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. ByteGraph Read and Write Flow"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let us briefly analyze how a read or write query is routed once it arrives."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b1\/b141f45b0f34ab31c241ecdb3505e17f.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Suppose the GQ and GS layers each have several instances. When a write statement for X arrives, assume that under the simplest hash rule it is mapped to the GQ2 instance. After GQ2 receives the query, it forwards it to GS2 according to the routing rules, and GS2's cache layer checks whether X is present."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If it is not, the current page is loaded from the KV store and the write is applied to it. At the same time, a WAL record is written and persisted to the KV store so that the update cannot be lost."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A read is simpler because it only queries: a GQ instance determines which GS instance should hold the page for A. If the page is not found there, it is loaded from disk; if it is found, the result is returned directly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So the GS layer can be roughly thought of as a cache layer, but it is more than a simple cache, because it also supports transactions and protects against data loss."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. ByteGraph Implementation"}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1) Query Engine"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cc\/ccf00b013cdff1827cea840e8c3d578a.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is a brief look at ByteGraph's query engine; I will cover the query engine and the storage engine in turn. The first step in the query engine is parsing: an incoming query string is parsed into a query syntax tree, and a logical query plan is then generated using optimization rules such as RBO (rule-based optimization) and CBO (cost-based optimization)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Queries within one vertical business usually follow similar patterns, so to avoid generating the same query plan repeatedly, we cache query plans by template."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once the query plan is generated, the GQ layer interacts with the GS layer. Subqueries that can run in parallel are run in parallel; those that cannot are executed serially according to their dependencies."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The GQ layer also needs to understand the sharding logic of the storage layer so that it can work out where a given piece of data lives."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Likewise, when a query can use an index that has already been built in the storage layer, it clearly should not be evaluated in the query layer but pushed to the storage layer. So this also involves operator pushdown optimizations."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2) Query Optimizer"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/61\/616a6a7b436f9e6c13f1a5abf664c992.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The query optimizer works in two ways: rule-based optimization and cost-based optimization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cost-based optimization is illustrated in the figure on the right. Execution plan A is the obvious approach, a two-hop expand: first find my one-hop neighbors, then expand each of them to its own neighbors and check how many of those sets contain him. The other approach is to find my one-hop neighbors, find his one-hop in-neighbors, and then join the two sets. The costs of the two plans clearly differ by a large margin."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So after the user writes such a query, our optimizer can find the logical execution plan with the relatively lowest cost."}]},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3) Graph Partitioning Algorithms"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d2\/d2485a998fca2c6fe0fac87993c3411f.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ByteGraph supports several graph partitioning strategies. The simplest is consistent hashing on the source vertex and the edge type, and most scenarios currently use this partitioning algorithm."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Knowledge graph scenarios are characterized by a very large number of edge types, so once edges are split by type, the number of edges of each type is relatively small, small enough that a single machine can hold the full set of edges of that type. Vertices therefore do not need to be considered, and partitioning hashes purely on the edge type."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the knowledge graph scenario this greatly reduces the number of fan-out requests in multi-degree queries, that is, the network overhead, which in turn lowers latency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":
null},"content":[{"type":"text","text":"在社交場景當中,通常來說是一個點的度的分佈,但是有一些點它的度特別大,有一些點它的度特別小,甚至沒有人關注它。在這種情況下,我們是基於 Facebook16年的一篇論文,去實現了一個這樣的,一個social hash這樣的一個算法,來保證我們做多跳鄰居查詢的時候,它的網絡開銷是比較小的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以這種情況下, ByteGraph會優先讓整個圖導入之後去做一個離線的圖分區算法,然後做完了之後再把對應的點和邊,基於這個算法映射到不同的數據。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"4)ByteGraph存儲引擎實現"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/74\/748f25b83f98b02d820ac156fcd49618.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來再講一下存儲引擎的一些細節,整體上說存儲引擎這邊可以把整個系統組件劃分成這樣的幾層。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最上層是一個跟圖語有關的讀寫的接口,然後中間這一層是涉及到如何去支持數據的事務性,以及我們如何把一個數據映射成一個圖原生的存儲,它的數據layout是什麼樣子,我們在這一層把它給解決了,然後同時我們也支持 WAL,來保證數據的更新是能夠持久化的,不會有任何數據的更新的丟失。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最下面一層就是基於KV store這樣的接口,我們支持了不同類型的 KV store,比如說一些開源的HBase、RocksDB等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ① 
存儲結構(一)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"簡單講一下如何機遇KV系統能夠構建一個圖結構?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a8\/a8f73f585619e81b9e400785872a796a.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於 KV的一個建模,最簡單且直觀的方式就是一個KV對一條邊,同時它的寫放大也非常小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以寫放大就是當你想更新一條數據,這個數據可能是有一個字節數,比如說X,但是你實際上更新的這樣一個數據塊是Y,如果Y遠大於X的話,就是寫放大是非常大的,當前這樣建模它的寫放大是非常小的,因爲它的粒度很細,但是你可以想象它去做查詢,做一跳領域的查詢的時候,它的性能是退化的程度是非常大的,因爲它涉及到大量的隨機讀寫,它數據的局部性就沒有了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果用一個KV去保存一個起點的所有邊,顯然這個數據的局部性就會好,但是它的寫放大就會變得很大。比如說你改了當前對應的一個點上的一條邊,其實整個ege歷史都要被更改,這是我們設計上的一個權衡,需要做一個折中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"具體是怎麼做的,我們用一個類似於B樹的結構來建模Graph,對某一個點來說,一個點同一個邊type的所有的終點是一個存儲單元,也就是說我們把一個起點ID、起點type和邊type,基於它去group by,具有相同值的所有邊集合,我們會認爲它是邏輯上屬於一個分區的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果這個分區依然涉及到很多點怎麼辦呢?我們會把它作爲一個二級的拆分,所以因此會涉及到 
b樹的多層級。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假設一個點,它基於某一種關注關係,粉絲數是1000萬,其實可以想象用一級的一個page去存肯定是不夠的,我們會把它拆分成多page。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ② 存儲結構(二)"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d3\/d34101c300e8f5805625bc901d3a3567.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一層的page就叫Meta page,它其實只是去簡單記錄了一個映射,這1000萬個鄰居當中,我們基於每2000爲一片,每一片我們把它稱作爲一個Edge page ,每一個Edge page又存儲了2000個Edge,所以用這樣一個多級拆分的這樣的方式去降低了讀寫放大的問題,同時起到了一個非常平衡的設計。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總結下來就是單起點和某一種固定的邊類型組成了一個B樹,然後B樹的每一個節點是一個KV隊,然後這裏涉及到完整性上的話,我們會限制每一個B樹的寫者只能是唯一的,以防止併發的寫入導致 B樹邏輯上的破壞。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"剛纔說到寫放大的問題,我們具體在當前 B樹的建模上,依然其實會存在寫放大的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ③ 
Log management"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/90\/90b68ab05d51c71f59a7d5849453e89c.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How do we reduce write amplification further? When a write request arrives, we only write the WAL. If the request maps to a B-tree page that is in memory, the in-memory data obviously has to be updated, but the data on disk does not need to change at that point."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At this point a relatively small WAL record is written and persisted to the KV store. Only when the page is next loaded from disk into memory do we apply the newer WAL records to the old on-disk data to produce the latest version, which is then placed in memory. This is how write amplification is mitigated."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As mentioned earlier, to preserve B-tree integrity, each B-tree must have exactly one WAL log stream."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":" ④ Cache implementation"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/30\/30ea231f19c56283a1e84d0498d8f27c.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For caching, we implemented our own high-performance LRU 
Cache. The reason is straightforward: a database needs to be fairly general-purpose, and different vertical scenarios have different read\/write ratios and different read\/write QPS."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So our LRU Cache supports different policies, and the trigger threshold differs with access frequency. For example, once a physical machine has used 60% of its memory, going any higher may be risky, so LRU eviction kicks in and pages that are not used often are flushed to disk."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the data size stays the same but write traffic grows, one advantage of separating cache from storage is fast scale-out: the GS layer can be scaled out on its own to raise cache capacity, while the overall QPS capacity of the storage layer stays unchanged."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below shows the overall picture of ByteGraph's storage layer; this is what the single-machine in-memory engine looks like."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/17\/17e2984d330e74625893c98a7322d24c.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, the graph data is modeled per specific vertex and edge type, and each such combination is abstracted as a B-tree."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As reads and writes flow in, writes in particular update the data of some pages and also write a WAL record, which makes those pages dirty. We track dirty pages in a linked list, and once enough dirty data has accumulated we flush it to disk. Alongside this we maintain the WAL log stream, and there is also the LRU 
Cache, which keeps the physical machine's memory usage within a bounded range, with an upper and a lower threshold."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"V. Analysis of Key Issues"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. ByteGraph key issue 1: indexes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We currently support both global indexes and local indexes:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c6\/c6d9d0ce9882adcdc8c53600d4eaedbe.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Local index"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For a given source vertex and edge type, an index is built on an edge property, for example the user's age. With the default property, which is currently time, edges are sorted by time; if the index is built on another property such as age, edges are sorted by that property instead. These are two different B-tree organizations."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Global index"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given a property value, a global index returns the IDs of all vertices in the entire graph that have that value; that is the definition of a global index. This raises a data consistency problem. The system itself has distributed transaction capability, so we use distributed transactions to keep data and indexes consistent."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. ByteGraph key issue 2: hot-spot reads and writes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"conten
t":[{"type":"text","text":"An example: when a big-V (a top influencer) is livestreaming, many people may enter the live room. Going back to the earlier example, we use a graph to model the relationship between users and e-commerce merchants, so when different users enter a merchant's room, you can picture many edges being written into the graph. When many people enter one particular merchant, this creates a write hot spot, and the same applies to reads."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7c\/7c7beb76ee77cc435891d3d6ec2695ec.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. ByteGraph key issue 3: merging offline and online data flows"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/30\/30d5063a4fa660b3b4c5ba2721a86ca9.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Importing existing data: existing data may sit in different sources such as MySQL\/Hive\/Redis\/HBase. We bulk-load it into ByteGraph through MapReduce on an internal company platform. This is the offline import path for existing data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Real-time online writes: online services write in real time by calling our Gremlin SDK or the RPC 
SDK, or by writing to ByteGraph through a message queue such as Kafka."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Daily snapshots of online data: ByteGraph also supports daily snapshots. A full day of data is written to Hive, where upstream business teams use it for offline analysis or offline training."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":">>>>"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q&A"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q1: Are the query languages Gremlin and GSQL very different? Will GSQL become the standard?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A1: "},{"type":"text","text":"As far as I know, GSQL does show a trend toward becoming a standard, because it looks a lot like SQL. Personally, though, I find Gremlin friendlier to users and closer to natural language. It is also a pipeline-style language, so it lends itself naturally to query plan optimization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Relational databases are still mainstream, so people are more familiar with SQL and can move from SQL to GSQL easily. Whether or not GSQL becomes a standard, it does not change the fact that many large companies have already built different database products with Gremlin as the query language."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q2: Is HA supported?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A2: "},{"type":"text","text":"Not at the moment."}]},{"type":"paragraph","a
ttrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q3: Are basic graph computation operations supported, for example triangle counting\/listing?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A3: "},{"type":"text","text":"Not currently; we have a separate system for that."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q4: DGraph seems simpler. Why did you not choose DGraph?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A4: "},{"type":"text","text":"ByteDance has its own specific scenarios. We have data from both inside and outside China, the data volume is extremely large, and the QPS we must support is also very high, so no current open-source database meets our internal needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q5: Do you handle queries on super nodes?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A5: "},{"type":"text","text":"Yes. We model the graph with B-trees. Assuming a threshold of 2,000, a page that grows beyond 2,000 edges splits into two pages, and so on. On Douyin, People's Daily (人民日報) has more than 100 million followers, which makes it a super node; under ByteGraph we have seen no performance problems with it so far."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Q6: Are graph databases also divided into OLTP and OLAP? Or are they mainly used in OLTP scenarios?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A6: "},{"type":"text","text":"At present, OLTP and OLAP each have their own focus."}]},{"type":"paragraph","attrs":{"indent":
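0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the speaker:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dr. Chen Hongzhi (陳宏智) is currently a senior R&D engineer on the infrastructure team at ByteDance. He received his PhD from the Department of Computer Science and Engineering at The Chinese University of Hong Kong and his bachelor's degree from the School of Computer Science at Huazhong University of Science and Technology. He has published more than ten papers at top conferences and in journals in computer systems and databases (e.g. EuroSys\/SoCC\/SIGMOD\/KDD\/TPDS). His research interests are distributed systems, distributed computing, and large-scale graph storage\/computation\/training systems. He is a recipient of the Hong Kong PhD Fellowship and of the Microsoft Research “Star of Tomorrow” honor."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is reposted from the dbaplus community (ID: dbaplus)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/OC5ZH7Ve2GGjLTc9g2eXVw","title":"xxx","type":null},"content":[{"type":"text","text":"Applications and Challenges of ByteDance's Trillion-Scale Graph Database"}]}]}]}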