One Babelfish, and a Clear View of Where Cloud Databases Are Heading

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the crown jewel of foundational software, database technology has long been a focus of developer attention. That attention is so intense that it has all but erased the divide between academia and industry: each major database paper can give rise to a wave of companies worth billions of dollars."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In recent years, cloud databases have become the focal point of the entire database industry. According to Gartner, Inc., by 2022, 75% of all databases will be deployed on or migrated to a cloud platform, with only 5% ever considered for repatriation to on-premises. IDC, for its part, expects more than 50% of the world's databases to run on public clouds by 2025; in the Chinese market the figure is even more striking, at over 70%."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/a3\/a389c7188f1892a8dede6457920dc11a.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That raises a question: if cloud databases, or cloud-native databases, are unquestionably the next big thing, what are their main technologies and directions today, and how should we read where they are going? "},{"type":"link","attrs":{"href":"https:\/\/babelfishpg.org\/","title":"xxx","type":null},"content":[{"type":"text","text":"Babelfish"}]},{"type":"text","text":", announced by Amazon Web Services in 2020, may offer some clues."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Babelfish: An Underrated Major Release"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/cn\/rds\/aurora\/babelfish\/","title":"xxx","type":null},"content":[{"type":"text","text":"Babelfish"}]},{"type":"text","text":" was announced at re:Invent 2020 by Amazon Web Services CEO Andy Jassy."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Put simply, Babelfish is an extension for the cloud database Amazon Aurora PostgreSQL that lets Aurora work with applications written for Microsoft SQL Server."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As soon as Babelfish was announced, many engineers posted YouTube videos saying they did not see the point. Migration services have been everywhere since cloud databases first appeared: almost every public cloud vendor offers one, although most target Oracle. A US company called EnterpriseDB specializes in migrations from Oracle to PostgreSQL, and proxy layers and SQL dialect conversion tools abound."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In fact, Amazon Web Services has migration services of its own, such as the "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/cn\/dms\/schema-conversion-tool\/","title":"xxx","type":null},"content":[{"type":"text","text":"AWS Schema Conversion Tool"}]},{"type":"text","text":" for schema migration and the "},{"type":"link","attrs":{"href":"https:\/\/aws.amazon.com\/cn\/dms\/","title":"xxx","type":null},"content":[{"type":"text","text":"AWS Database Migration Service"}]},{"type":"text","text":" for moving the stored data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So what is the point of Babelfish? To add yet another proxy layer and more backend processing cost?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In practice, migrating only the schema and the data leaves the job unfinished, because the applications built on top of the database have not moved. In the scenario Babelfish targets, applications built on Microsoft SQL Server talk to the database in T-SQL, which is an entirely different thing from the SQL dialect PostgreSQL speaks. Moving the application along with the database means rewriting that part of it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is why database migrations are so rare in the industry. It is not that teams do not want to migrate (nobody can guarantee that the original architecture choice stays right forever); the cost is simply too high."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A fairly typical migration plan gives a sense of that cost:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/a1\/a1250446e3de9a7ff32119404ab69229.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with such a heavyweight migration, would it not be far easier if the database were compatible out of the box? That is the main reason Babelfish exists."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many people may underestimate Babelfish because they see only its commercial significance and overlook how hard it is technically."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Oracle and PostgreSQL share many features, and converting between them is still hard; going from T-SQL to PostgreSQL is harder still. A faithful conversion has to handle many exceptionally intricate details, including the query language, stored procedures, static cursors, triggers, and more."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/eb\/eb98e109292f9f99bf58ef0d44f06a5a.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sébastien Stormacq of Amazon Web Services pointed out in a blog post that the MONEY type has four decimal places of precision in T-SQL but only two in PostgreSQL, a subtle difference that can cause rounding errors and have a significant impact on downstream processes such as financial reporting."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"He wrote: “In this case, Babelfish ensures that the semantics of SQL Server data types and T-SQL functionality are preserved: we created a MONEY data type that behaves as SQL Server applications expect.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Babelfish is implemented inside the PostgreSQL engine through hooks, which let it present itself as a different database (the alternative would be to modify code in many core areas of PostgreSQL). Its architecture is shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/fd\/fdefbb58806ae0f0613cdbf521b1a443.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The elegant part is that, by extending the executor layer of the database kernel, Babelfish lets T-SQL and pgSQL call each other. In other words, newly written PostgreSQL code can call the SQL Server code written for the existing application. For anyone who has written stored procedures, this is as much “science fiction” as the Babelfish name itself. Even with this most hardcore of implementation approaches, Babelfish is not fully compatible: some features and syntax, such as ADD SIGNATURE, are not yet implemented. As an Amazon engineer put it: “SQL Server has evolved over more than 30 years, and we do not expect to support every feature right away. Instead, we focus on the most common T-SQL commands and on returning the correct response or error message.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This shows how hard a migration accelerator of this kind is to build, and why open source is the best path for Babelfish: it lets enough developers take part in iterating on the product."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By the same token, a project this difficult is unlikely to be inconsequential. On the contrary, it may be one of the most important Amazon Web Services releases of 2020."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Has the Era of Database Fragmentation Really Arrived?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Amazon's cloud releases have steered the whole industry more than once. Amazon Redshift, announced in 2012, set the direction for cloud-native data warehouses; AWS Lambda, announced in 2014, set the direction for serverless (Gartner did not name serverless a future trend until 2019); and Amazon Aurora is itself a pioneering cloud-native database."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If Babelfish also points in a direction, it may be this: the era of database fragmentation has truly arrived."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because databases are so hard to build, the product category has long been controlled by a handful of companies, and the leader among them, Oracle, has kept raising the bar for building a commercial database at great speed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But the high prices and lock-in risk that came with this “unipolar” market have become increasingly hard for many enterprises to tolerate. Today, databases of every kind keep appearing (relational, key-value, time-series, graph, and more), which makes choosing difficult. Another notable phenomenon is that most cloud-native databases are built on PostgreSQL, yet much of the subsequent engineering effort has not gone into traditional goals such as high performance and high scalability."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Database compatibility, a feature that is hard to build and has nothing to do with performance, has nonetheless become an R&D priority for Amazon Web Services. In a sense, this suggests that the many kinds of databases now flourishing will be with the industry for a long time. People tend to assume that an industry moves from one dominant option to many and then, after the market has done its sorting, back to one. This time, though, the “unipolar” era may really be gone for good."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, Gartner's 2020 Magic Quadrant report places several companies in the Leaders position for cloud databases, with Amazon, Microsoft, and Google in the top three."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/73\/73256cff4a23bb9ea3eef657f3a5ee71.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yet as recently as 2019, the top three were Microsoft, Oracle, and Amazon. While first and third place fought it out, second place dropped out of the picture..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/66\/6641556e7bf8c348c8423c67d864541e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now that Amazon Aurora, with Babelfish behind it, is compatible with Microsoft SQL Server, the one that gets hurt is probably still Oracle. The walls between cloud databases are coming down, and competing is getting even harder for traditional commercial databases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Riding the tailwind of the fragmentation era, Amazon, having released Babelfish, has naturally become the new leader of the cloud database market."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Closing Thoughts"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The database industry is far from an endgame, and there will be no such thing as an endgame. But the advantages available to a cloud-native database are not limited to the database itself: Amazon Aurora Serverless provides elastic scaling, Amazon Aurora Global Database improves global data replication and business continuity, and Amazon DevOps Guru brings machine learning to application management. This is a “combined force” that takes the experience of running a database in the cloud to an entirely new level."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In cloud databases, this “combined force” will shape the market landscape to come."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On November 30, re:Invent 2021 arrives, and Adam Selipsky will appear for the first time as the new CEO of Amazon Web Services. The direction of the cloud database market should become clearer with it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}