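To Help You Settle Your Database Selection, These Engineers Rewrote 260,000 Lines of Code

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"However opinionated an architect may be, choosing a database can still be a struggle."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Traditional SQL, NoSQL, or NewSQL? Should the architecture rest mainly on battle-tested relational databases, or lean toward so-called native distributed architectures? Once specific products come up, the options multiply: TiDB, OceanBase, PolarDB, TDSQL, GaussDB, MongoDB, and more. There are also many products built for newer scenarios, such as TDengine for time-series data and Nebula Graph for graph data, not to mention Oracle, the most old-school and most complete of them all."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Looked at from the angle of business scenarios or impending migration costs, the problem gets even more complicated. Choices about the underlying data layer and its architecture are often close to a one-shot deal: once a solution is settled on, replacing it later is extraordinarily expensive."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Roughly 10 years ago, this was not such a thorny matter. It came down to Oracle or IBM Db2, one of two. But times have changed. In an era of rapidly swelling data volumes and highly complex business, what really causes anxiety is not the selection itself but a digital transformation that pushes “cutting costs and raising efficiency” to the extreme."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So, apart from gritting your teeth and picking one database, or building your own storage solution on top of an existing database, is there really no other path? There is one, although it looks somewhat “invisible” now that distributed databases keep getting more attention: middleware combined with multiple commercial databases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That answer may disappoint you. The middleware approach appeared before distributed databases matured. Many architects in the industry therefore see it as technically unambitious, a mere “transitional strategy” that was, in essence, a stopgap until NewSQL databases matured."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But numerous production deployments in finance and other industries suggest that this may not be the case."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To explore this, we interviewed Zhang Liang, the author of Apache ShardingSphere, a leading global open-source project and database middleware product, and CEO of SphereEx, the commercial company behind it. He has long followed distributed architecture design and was previously an architecture expert at JD Technology and director of the architecture department at Dangdang. "},{"type":"text","marks":[{"type":"strong"}],"text":"Zhang Liang has his own distinctive take on selection and architecture design across the database industry."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Pluggable Architecture May Be the Answer"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Needs are diverse. Some users are suited to a distributed database, some are suited to database middleware, and some are even suited to using both. There is no absolute answer,” Zhang Liang said."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is an answer that cannot be wrong, but a corollary follows from it: "},{"type":"text","marks":[{"type":"strong"}],"text":"if scenarios are diverse and databases are diverse, architects should steer clear of database solutions with poor extensibility and compatibility wherever possible."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Likewise, Oracle ACE and database expert Han Feng has voiced a similar view on his personal public account: “(On database selection) To avoid the risks of choosing the wrong route or being locked in to a vendor, the more realistic approach is to choose a product compatible with a general-purpose protocol, and to use only standard database features in the application.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Taken together, the two views yield a few keywords: neutral, compatible, standard. For the many architects who find the selection hard to make, this makes the middleware route look more practical."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Unlike databases themselves, the middleware layer represented by ShardingSphere does not involve a storage engine. It has therefore shifted its focus from pure horizontal scaling to business support and flexibility, and this is also what allows it to manage heterogeneous databases in a unified way."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flexibility and compatibility are the core of a database middleware product and the key to solving the “database selection” problem. According to Zhang Liang, the main R&D focus over the past two years has been the pluggable architecture, with the aim of pushing flexibility and compatibility to the limit. The good news is that ShardingSphere's pluggable architecture is expected to be officially released in the near future."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"A pluggable architecture divides the whole system, at the architectural level, into a base and plugins: plugins are isolated from and do not affect one another, and the base can freely attach multiple plugins."},{"type":"text","text":" Pluggable architectures are common in the relatively lightweight front-end world, where they form part of the micro-frontend approach, but they are quite rare in infrastructure software."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zhang Liang believes the pluggable form is one of the main future trends for database middleware. For one thing, the product needs a high degree of flexibility; for another, many capabilities still have to be built, such as data security and a heterogeneous data gateway. The pluggable architecture has naturally become the core of the product."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That said, a pluggable architecture is dauntingly hard to design. The difficulty can be discussed in roughly two parts:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, a pluggable architecture is a thorough application of the OCP (Open-Closed Principle): it aims to meet new requirements solely by adding new modules, with zero modification to existing ones. This means the architecture must define clearly what is the base and what is a plugin. It is unaware of the layers above and below it, and everything is programmed against interfaces. In Zhang Liang's words: “It faces something entirely abstract and intangible, and involves no business details at all.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, after ShardingSphere moved to the pluggable architecture, sharding is no longer part of its core flow; it is attached to the service as one of the pluggable capabilities. For a database middleware, this almost amounts to redefining the product, and it runs against many people's preconceptions. After all, in many people's understanding, doesn't database middleware exist precisely for splitting databases and tables?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But in reality, "},{"type":"text","marks":[{"type":"strong"}],"text":"standalone databases still cover a great many scenarios, and database and table sharding is not a level-0 feature. This is a key insight that must be in place at the architectural level."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, as with microservices, any splitting of services raises the question of granularity. In a pluggable architecture, what needs to become a plugin is not necessarily limited to product features. Two-phase strongly consistent transactions and flexible transactions, for example, can also be made pluggable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on these splitting concerns, ShardingSphere divides its pluggable architecture into three layers: the kernel layer, the feature layer, and the ecosystem layer, with pluggable designs targeting the database kernel, enterprise features, and the database ecosystem respectively. The query optimizer, distributed transaction engine, and scheduling engine are pluggable modules of the kernel layer; data sharding, read/write splitting, database high availability, data encryption, and shadow databases are pluggable modules of the feature layer; database protocols and SQL dialects are pluggable modules of the ecosystem layer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond design difficulty and granularity, the sheer workload of building a pluggable architecture is staggering. ShardingSphere has more than 190 modules and nearly 430,000 lines of code, of which 290,000 lines are core Java code. Zhang Liang recalled: “To build the pluggable architecture, we kept less than one-tenth of the old code.” That means nearly 260,000 lines of core code were rewritten."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The pluggable architecture is one of the clearest signs of ShardingSphere's pursuit of flexibility, but that pursuit does not stop there."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, ShardingSphere also offers two deployment forms, JDBC and Proxy. JDBC is the standard interface through which Java accesses databases, while Proxy is the most common service form for middleware. The two can often be mixed within the same environment to meet the needs of multiple users with different types of access."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Offering several types of service interface serves different kinds of developers, and it also makes performance customizable: engineers can adjust the sharding key to fit the scenario and extract the most performance from each one. “All-in-one” database solutions, by contrast, often have to give up some flexibility and settle for a relatively middle-of-the-road design in exchange for a transparent, low-intrusion experience. This is where they differ from the database middleware approach."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"In a True Open-Source Project, the Community Has the Final Say"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When selecting a technology, another point to watch is who maintains the candidate product and what its DNA is: open source or closed source? This matters as much as understanding the product's technical design and philosophy."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today, however hot open source may be, most distributed middleware, distributed storage, and databases are closed source. That is an undisputed fact."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What we are seeing, though, is a wave of open-source startups emerging, with capital moving in quickly: SphereEx, 歐若數網 (vesoft), Neo4j, and the well-known PingCAP, for example. Many databases have also announced that they are opening their source code, such as OceanBase, Tendis, and openGauss."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Why Does Open Source Attract So Much Attention in Data Storage?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One possible answer is that open source leaves more room for imagination on the technical side and is friendlier to developers. Take ShardingSphere's pluggable architecture: finishing the architectural design is only the first step, and a huge amount of development work on different modules follows. For a startup that does not draw on the community, even an elegant pluggable architecture could turn into an R&D black hole."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Product neutrality is another factor behind the concentrated boom in open-source projects. In data storage especially, the best outcome may be no dependencies and portability across clouds, while the worst is being tightly bound to a single product."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, however good open source is, it cannot escape harsh reality. Two years in the open, a few hundred stars, plenty of money spent, and zero results: this is probably the norm for a great many open-source projects. The health of the community often directly decides whether an open-source project lives or dies, which means that even architects who want to make a choice do not have many good options."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"In Zhang Liang's view, whether an open-source project can succeed can be assessed along roughly three dimensions"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, whether the team can endure obscurity: does it truly believe in open source, or is it treating open source as a commercial shortcut? The simplest metric is how long the project has been running. “An open-source project has to get through a ‘quiet period’ before it takes off. After only six months or a year, there is no way to predict its future,” Zhang Liang said."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, some essential community-operations skills. Zhang Liang donated ShardingSphere to the Apache Software Foundation, and he also took the project to many events, such as the hackathons and Summer of Code run by Google. Besides users in China, this drew a large number of overseas developers and students into building the community."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Third, and most critical, a change in mindset. On a small scale, it is the move from “developing it ourselves” to “developing it with the community.” On a large scale, it is embracing open source in a real sense rather than just paying lip service."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“The development of the ShardingSphere project is guided by the community. For example, the community felt ShardingSphere should support shadow-database-based stress testing and observability, and ShardingSphere actually built them. None of this came from top-down project design. As long as demand surges and the request falls within the project's scope, it can be implemented. But if a company running an open-source project turns down community requests just because they supposedly deviate from the main design, that will most likely hurt the project's growth too, because it is not an open-source project truly rooted in its community.”"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Future Directions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Current middleware products in this space still focus mainly on horizontal sharding, elastic migration, and MySQL instance management."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To some extent, though, ShardingSphere may represent one of the core directions for database middleware: pluggability is the level-0 feature, and data sharding is only a level-1 feature. Combining open source with a pluggable architecture opens up room for imagination in database middleware on both the technical and product sides. Zhang Liang revealed that SQL auditing, a data-based permission engine, multi-tenancy, and TTL (Time To Live) will all be put on the development schedule."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, ShardingSphere has a concept under development called Database Mesh, which aims to deliver a cloud-native experience for databases moving to the cloud, although it still needs some development time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Database Mesh wraps a proxy layer on top of the database cluster to perform intelligent load balancing. A traditional load-balancing layer cannot recognize the characteristics of SQL and can only pass requests through by round-robin or weighting. Database Mesh, by contrast, matches each SQL statement against the labels of compute instances and chooses more intelligently which database compute or storage node to access."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For architects, what matters most is broadening one's horizons and imagination in technology selection. Distributed databases are less intrusive to the business, while the middleware approach avoids dependence on a vendor. Which to choose must be judged against the actual scenario. But that does not stop us from offering an interim conclusion:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Quite possibly, over the next 5 to 10 years, database middleware will remain one of the most important solutions for the underlying architecture, and it deserves serious evaluation by every architect."}]}]}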