從 0 到 1,建設實時OLAP

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文主要介紹 BTC.com 團隊在實時 OLAP 方面的技術演進過程及生產優化實踐,內容如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"業務背景"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"機遇挑戰"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"架構演進"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"架構優化"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"未來展望"}]}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"業務背景"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 業務介紹 - ABCD"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fb\/fb0a17873e086d8a576328227494111b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BTC.com 是一家區塊鏈技術方案提供者,我們的業務主要分爲四個部分,總結來說就是 ABCD:A 是人工智能機器學習,B 是區塊鏈,C 代表雲,D 是數據。這些模塊不僅相互獨立的,也可以互相結合。近幾年人工智能、區塊鏈的加速發展與大數據在背後提供的支持息息相關。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 業務介紹 - 區塊鏈技術方案提供商"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/04\/048b2614d7e5b9757fc5996ad3fe54c4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"區塊鏈通俗來講可以理解爲一個不可逆的分佈式賬本,我們的作用是讓大家能更好的瀏覽賬本,挖掘賬本背後的信息數據。目前比特幣的數據量級大概在幾十億到百億,數據量大概在數十T,當然我們也有其他的一些業務,如以太坊貨幣、智能合約分析服務等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"整體而言我們是一家區塊鏈技術方案的提供商,提供挖礦的服務。與金融行業的銀行一樣,我們也有很多的 OLAP 需求,比如當黑客攻擊交易所或供應鏈進行資產轉移或者洗錢時,需要經過鏈上的操作,我們可以在鏈上對其進行分析,以及交易上的跟蹤,統計數據等,爲警方提供協助。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"機遇挑戰"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 之前的架構"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/02\/023f0be2f48d4169d425b5ec475bbc93.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大概 2018 年的時候,競爭對手比較少,我們整體的架構如上。底層是區塊鏈的節點,通過 Parser 不斷的解析到 MySQL ,再從 MySQL 抽取到 Hive 或者 Presto,從 Spark 跑各種定時任務分析數據,再通過可視化的查詢,得到報表或者數據。架構的問題也是顯而易見的:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不能做到實時處理數據"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存在單點問題,比方某一條鏈路突然掛掉,此時整個環節都會出現問題"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 遇到的需求與挑戰"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8b\/8b032a7662dd78183f1458c7bfa6a266.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"效率"},{"type":"text","text":",效率問題是非常常見的。我們的表大概在幾十億量級,跑這種 SQL ,可能需要很長時間, SQL 查詢比較慢,嚴重影響統計效率。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"實時"},{"type":"text","text":",數據不是實時的,需要等到一定的時間纔會更新,如昨天的數據今天才能看到。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"監控"},{"type":"text","text":",實時需求,如實時風控,每當區塊鏈出現一個區塊,我們就要對它進行分析,但是區塊出現的時間是隨機的。缺乏完整的監控,有時候作業突然壞了,或者是沒達到指標,我們不能及時知道。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 技術選型我們需要考慮什麼"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8f\/8f6751786a7fb0287e036d72d5ac8626.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在技術選型的時候我們需要考慮什麼呢?首先是縮容,2020年行情不太好,大家都在盡力縮減成本,更好的活下去。在成本有限的情況下,我們如何能做更多的東西,必須提高自身的效率,同時也要保證質量。所以我們需要找到一種平衡,在成本效率還有質量這三者之間進行一定的平衡。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"架構演進"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 技術選型"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cb\/cb0ea127b0a7d5d25618a36d24b4cae1.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"俗話說,工具選的好,下班下的早,關於是否引入 Flink,我們想了很久,它和 Spark 相比優勢在哪裏?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們實際調研以後,發現 Flink 還是有很多優勢,比方說靈活的窗口,精準的語義,低延遲,支持秒級的,實時的數據處理。因爲團隊本身更熟練 Python ,所以我們當時就選擇了 PyFlink ,有專業的開發團隊支撐,近幾個版本變化比較大,實現了很多功能。在實時 OLAP 方面,數據庫我們採用了 ClickHouse 。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 爲什麼使用 ClickHouse"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f2\/f286a1b8a52e691c4cfde09277251872.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲什麼要使用 ClickHouse ?首先是快,查詢的效率高。字節跳動,騰訊,快手等大公司都在用。同時我們也有 C++方面的技術積累,使用起來比較容易,成本不是太高。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 實時 OLAP 架構"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ac\/ac3a8730716996c9477ecf358f8fa1d2.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於以上的技術選型,我們就形成了上圖的架構,底層是數據源,包括區塊鏈的節點,通過 Parser 解析到 Kafka,Kafka 負責對接 Flink 和 Spark 任務,然後 Flink 把數據輸出到 MySQL 和 ClickHouse,支持報表導出,數據統計,數據同步,OLAP 統計等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據治理方面,我們參考了業界的分層,分成了原始層、明細層、彙總層以及應用層。我們還有機器學習的任務,這些都部署在 K8s 平臺之上。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. 架構演進歷程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們的架構演進過程如下圖,從 2018 年的 Spark 和 Hive ,到後來的 Tableau 可視化,今年接觸了 Flink ,下半年開始使用 ClickHouse ,後來 Flink 任務比較多了,我們開發了簡易的調度平臺,開發者只需要上傳任務,就會定時或者實時的跑任務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/95\/95de927552c1ef27c092caac37950d54.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. 架構演進思考"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/84\/84e63c0f929b1d7f851b7ae9d5dc33ca.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲什麼演進這麼慢,因爲區塊鏈的發展還沒有達到一定量級,無法像某些大公司有上億級別或者 PB 級別的數據量。我們的數據量沒有那麼大,區塊鏈是一個新鮮的事物,沒有一定的歷史。另外的問題就是資源問題,由於人員不足,人員成本上也有所控制。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"剛纔講的架構,我們總結了它適合怎樣的企業。首先是有一定的數據規模,比說某個企業 MySQL 只有幾千萬的數據,用 MySQL , Redis , MongoDB 都可以,就不適合這套架構。其次是需要一定的成本控制,這一整套成本算下來比 Spark 那一套會低很多。要有技術儲備,要開發瞭解相關的東西。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"區塊鏈數據的特點。數據量比較多,歷史數據基本上是不變的,實時數據相對來說是更有價值的,數據和時間存在一定的關聯。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"6. 實時 OLAP 產生的價值"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f2\/f27a9638466bd511ec887bc1cfd37ff9.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在實時 OLAP 上線後,基本滿足了業務需求,同時成本也在可控的範圍內。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"適合的是最好的,不要盲目追求新技術,比如數據湖,雖然好,但是我們的數據量級實際上用不到。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們不考慮建設技術中臺,我們的公司規模是中小型,部門溝通起來比較容易,沒有太多的隔閡,沒有發展到一定的組織規模,所以我們沒有打算髮展技術中臺,數據中臺,不盲目跟風上中臺。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們達到的效果是縮短了開發的時長,減少作業的運行時間。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"架構優化"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Flink 和 ClickHouse"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8a\/8a685b0b07ee285147b9576b69b3b4a3.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flink 和 ClickHouse 之間有一些聯動,我們自定義了三個工作。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自定義 sink 。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ClickHouse 要一次性插入很多數據,需要控制好寫入的頻次,優先寫入本地表,耗時比較多。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們主要用在智能合約的交易分析,新增的數據比較多,比較頻繁,每幾秒就有很多數據。數據上關聯比較多。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. ClickHouse 遇到的問題"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/de\/ded51dd21b761b6d04c59faba37ebf3d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"批量導入時失敗和容錯。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Upsert 的優化。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"開發了常用 UDF ,大家知道 ClickHouse 官方是不支持 UDF 的嗎?只能通過打補丁,保證 ClickHouse 不會掛。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們也在做一些開源方面的跟進,做一些補丁方面的嘗試,把我們業務上,技術上常用的 UDF ,集合在一起。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 批量導入策略"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/54\/544e738e4014ceb0b0dd08e4af80575c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"歷史數據,可以認爲是一種冷數據,相對來說不會經常改變。導入的時候按照大小切分,按照主鍵排序,類似於 bitcoind ,底層的 Checker 和 Fixer 工作,導入過程中及時進行報警和修復。比如導入某一數據失敗了,如何更好的及時發現,之前就只能人肉監控。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時數據,我們需要不斷解析實時數據,大家可能對重組,51%的概念不太熟悉,這裏簡單講一下,上圖最長的鏈也是最重要的鏈,它上面的一條鏈是一個重組並且分叉的一條鏈,當有一個攻擊者或者礦工去挖了上面的鏈,最終的結果會導致這條鏈被廢棄掉,拿不到任何獎勵。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果超過51%的算力,就會達到這樣的效果,成爲最長的鏈,這個是累計難度比較高的,此時我們會認爲數據導入失敗,同時我們會利用回撤的功能,不斷將其回滾和重組,直到滿足最完整的鏈。當然我們也會設置一些記錄和 CheckPoint ,這裏的 CheckPoint 和 Flink 的 CheckPoint 的概念也有所區別。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"它是區塊鏈方面的 CheckPoint ,區塊鏈有一個幣種叫 bch ,會定義 CheckPoint,當滿足一定的長度時,它就無法再進行回滾,避免了攻擊者的攻擊。我們主要是利用 CheckPoint 記錄信息,防止回滾,同時還會按照級別\/表記錄批量插入的失敗或者成功,如果失敗則會進行重試,以及報警回滾等操作。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. Upsert 的優化"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ca\/ca25d7d22207a72a5bb5c3e1c7c27bfd.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ClickHouse 不支持 Upsert ,主要在 SDK 方面做兼容,之前是直接往 MySQL 寫數據,目標是通過 SQL 語句修改對應的 SDK 增加臨時小表的 join ,通過 join 臨時小表,進行 Upsert 的操作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"舉個例子,區塊鏈地址賬戶餘額,就像銀行的賬戶餘額,必須非常精確。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. Kubernetes 方面優化"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cf\/cf2bd4c9ef8a5b8cad0b8a3ffba35064.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes 方面的優化。Kubernetes 是一個很完整的平臺。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"高可用的存儲,在早期的時候,我們就儘可能的將服務部署在 Kubernetes,包括 Flink 集羣,基礎業務組件,幣種節點,ClickHouse 節點,在這方面 ClickHouse 做的比較好,方便兼容,支持高可用操作。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"支持橫向擴展。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務發現方面,我們做了一些定製。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":" 6. 如何保證一致性?"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/51\/517c5f107f6d3de45882e20f9c9b1f20.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"採用 Final 進行查詢,等待數據合併完成。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在數據方面的話,實現冪等性,保證唯一性,通過主鍵排序,整理出來一組數據,再寫入。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"寫入異常時就及時修復和回填,保證最終一致性。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"7. 監控"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/85\/85262f672a0565bd79568718660ab49f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 Prometheus 作爲監控工具。使用方便,成本較低。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"五未來展望"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 從 1 到 2"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/bd\/bd5924a78706f27aaa07eb256159ab94.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"擴展更多的業務和數據。之前我們的業務模式比較單一,只有數據方面的統計,之後會挖掘更多信息,包括鏈上追蹤,金融方面的審計。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"賺更多的錢,儘可能的活下去,我們才能去做更多的事情,去探索更多的盈利模式。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"跟進 Flink 和 PyFlink 的生態,積極參與開源的工作,優化相關作業。探索多 sink 方面的工作,原生 Kubernetes 的實踐。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 從 2 到 3"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a3\/a39d7952dd1a9bd31478cb610cbb1c76.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據建模的規範,規定手段,操作。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flink 和機器學習相結合。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爭取拿到實時在線訓練的業務,Flink 做實時監控,是非常不錯的選擇。大公司都已經有相關的實踐。包括報警等操作。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總的來說的話,路漫漫其修遠兮,使用 Flink 真不錯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:DataFunTalk(ID:dataFunTalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/uuSD3mURa87V0BtmxLyvNA","title":"xxx","type":null},"content":[{"type":"text","text":"從 0 到 1,建設實時OLAP"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章