MLSQL: Equip Kylin with ETL Capabilities in One Minute!

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At last week's Apache Kylin + MLSQL Meetup, we invited "},{"type":"text","marks":[{"type":"strong"}],"text":"祝威廉"},{"type":"text","text":", a technical expert from the MLSQL community, to give a talk. Kylin is well known for its strong analytical capabilities and rich surrounding ecosystem, but it still lacks some ETL capability of its own. In this talk, he demonstrated how to process data quickly from within Kylin, so that users can complete the whole pipeline of large-scale data analysis without leaving Kylin, and he also discussed further ways Kylin and MLSQL could work together in the future."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The following is a transcript of 祝威廉's talk at the Kylin Meetup."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/11\/bf\/11c2e1301d9883b7ea6262ffyyec8dbf.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This talk has three parts. First, I will take a high-level look at why Kylin and MLSQL complement each other. Second, I will give a demo showing how, by changing just a few lines of code in Kylin, it can support the MLSQL language and so take on some ETL work. Because the demo currently covers only one scenario, in the third part I will discuss more possibilities between Kylin and MLSQL. There are many places where the two can work together, and better results from that cooperation are something to look forward to."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Why Kylin and MLSQL Are Complementary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let me start with why Kylin and MLSQL are complementary. As the figure shows, the data analytics field today has three main parts:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fe\/53\/fe496b6d4060df45cdda5f02609yy053.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first is BI (Business Intelligence);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second is data processing, which includes the currently popular stream processing as well as traditional batch processing;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The third is AI (Artificial Intelligence)."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kylin covers the BI part very well and has a mature ecosystem: it supports most BI tools and provides some processing functions, such as multidimensional aggregation analysis. When users analyze data with Kylin, they are in effect using its data processing functions. The ETL side of data processing, however, is not something Kylin is currently suited to, for example some streaming data processing, and that is exactly what MLSQL can cover. MLSQL is not strong in BI, but it is relatively strong in data processing and AI. If Kylin and MLSQL are integrated, together they can cover essentially all three main areas of big data analytics."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Introduction to Kylin"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, a brief introduction to Kylin. Kylin is an OLAP system that supports "},{"type":"text","marks":[{"type":"strong"}],"text":"high concurrency and sub-second queries"},{"type":"text","text":", two advantages that many tools find hard to match. Kylin also has a "},{"type":"text","marks":[{"type":"strong"}],"text":"complete ecosystem"},{"type":"text","text":" and supports a variety of BI tools, such as Tableau and Power BI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/db\/f5\/db5626f906d45877b17388e9f37a10f5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the architecture diagram from the Kylin website shows, Kylin has a Cube Build Engine, the component that builds Cubes. Building a Cube requires a data source, and this raises two potential problems:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, the Build Engine focuses mainly on data modeling, and source data has to be processed before it is suitable for modeling, so Kylin lacks strong ETL capability;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, Kylin has to add support for many data sources one by one, whereas MLSQL already supports a large number of them. If the two are connected, Kylin no longer needs to build these two parts itself and can rely on existing tools instead, such as for the CSV data source mentioned earlier."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are currently two approaches:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Approach 1: build your own ETL infrastructure;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Approach 2: if Kylin itself supports this, users can get it done without leaving Kylin, which gives them a better experience."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Introduction to MLSQL"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Most Kylin users are probably not yet familiar with MLSQL, so let me give a brief introduction."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"I will cover two aspects:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/1e\/84\/1ee4f6b2253be6c86e46117d0c53dd84.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"First, MLSQL is a language designed for big data and AI."},{"type":"text","text":" Typically, people working with big data use Scala or Java, and people working on AI use Python, while the language used most for data processing is probably SQL. So why do we need a language like MLSQL? In short, writing code in Scala requires a solid command of the language, and Python also has a learning curve. SQL is simple, but it falls short when you want AI or more powerful ETL features. That gap is what makes a language like MLSQL necessary."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, as mentioned, MLSQL is a language, and every language needs an implementation, just as Java, mentioned above, is implemented by the JVM. MLSQL likewise has an engine that implements the language, and that engine is built on Spark. "},{"type":"text","marks":[{"type":"strong"}],"text":"MLSQL's goal is to be a unified language and platform that truly integrates data management, business analytics, and machine learning."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, a look at some MLSQL syntax to give you a feel for it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/f8\/d5\/f82dbae95094e4972feaf9dde7220ed5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, loading data. In SQL this requires a Catalog, and the default data source is probably Hive. Loading other data with SQL is cumbersome. MLSQL extends SQL here: with the MLSQL Load statement, for example, users can load any data source."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, you can use SQL for standard data processing and save the results. As the figure above shows, the left side is AI training. MLSQL integrates Python with SQL: Python code can work directly with SQL tables, and the results Python produces can be used as tables in subsequent SQL. This is only a brief overview. For more, see the official MLSQL documentation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"http:\/\/docs.mlsql.tech\/mlsql-console\/"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"How to Do ETL Without Leaving Kylin"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That was a brief introduction to Kylin and MLSQL. To integrate the two, the following criteria must be met:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"First, the integration cost must be low."},{"type":"text","text":" In the worst case, a small amount of code may need to change. In practice, I worked through this with 張智超, a Committer in the Kylin community, and in the best case the integration takes only two modified lines of code. Some new files may be added along the way, but the modified code may amount to only a few lines."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Second, MLSQL aims to fill Kylin's ETL gap first, so that ETL can be done directly inside Kylin."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/bd\/32\/bd989db67516490da949381512bf5032.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With these two points in mind, the figure above shows a simple annotation, "},{"type":"text","marks":[{"type":"strong"}],"text":"--%mlsql"},{"type":"text","text":", which can be followed by an MLSQL engine address. With this annotation, the entire script in the box is sent to the MLSQL engine for execution. Without it, the script runs through Kylin's normal execution logic. With this initial idea in place, we built a demo, so let's look at how it works."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Demo"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The interface in the demo should be very familiar: it is the Kylin web UI. Today's demo shows how a user can, from within Kylin, import a CSV file into Hive for Kylin to model. The whole ETL step is completed with only a few lines of code changed."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"http:\/\/mpvideo.qpic.cn\/0b78paaauaaauyajyq2z3bqfa6gdbj4aacqa.f10002.mp4?dis_k=882c3b642a1e2c26f0a2e1e467784012&dis_t=1617263061&spec_id=MzAwODE3ODU5MA%3D%3D1617263062&vid=wxv_1793346667862130694&format_id=10002"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond the demo above, looking ahead, MLSQL inside Kylin could connect to the Kylin API, or use a task scheduler such as Airflow, so that finished scripts can be saved and put into operation. That would effectively amount to a complete ETL toolset. Users would not need to build their own ETL platform and could complete the whole flow from the Kylin page. Once this is in place, users can work directly with Kylin data: they can use Kylin for acceleration, for example to get sub-second response times, and they can also write fairly complex data processing flows to inspect the data."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"More Ways for Kylin and MLSQL to Work Together"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The core problem this demo solves is ETL capability. In theory, users could also perform AI-related operations directly in the same interface. When users have data, or analysis results from Kylin, and want to do algorithm work on them, they can use the simple form of integration shown above. There are in fact several ways for Kylin and MLSQL to work together. Here are two:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The first is upstream integration."},{"type":"text","text":" If you think of this as a pipeline, MLSQL sits in front of Kylin. Kylin faces users, or reports, and MLSQL handles the ETL part for it."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"The second is downstream integration."},{"type":"text","text":" MLSQL supports multiple data sources, which means it can connect to Kylin's results and then join them with results from HDFS or Hive, or run other computations."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today's talk is mainly about upstream integration, which itself comes in two forms: shallow and deep."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/b9\/ec\/b9dda1218033491249af3c0e1391b8ec.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The demo shows shallow upstream integration, which is the normal pipeline: data from various sources goes through ETL into Hive or Kafka, Kylin then builds the Cube, and the result serves user queries, reports, and other more complex operations. The entire ETL stage can be handled by the MLSQL Engine. Users can of course also run tasks on the MLSQL Engine directly from a good console, or use other engines. In this demo, "},{"type":"text","marks":[{"type":"strong"}],"text":"Kylin users can dispatch ETL tasks from Kylin UI Query directly to the engine layer to prepare data for the Cube in advance. This is shallow integration"},{"type":"text","text":", so called because its impact is relatively small."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/60\/7b\/600a70f76d66c380b239907299f9c57b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second form is deep upstream integration. Compared with the shallow form, it adjusts the pipeline slightly: not only ETL tasks but even Cube build tasks can be dispatched to MLSQL. Kylin 4.0 already supports the Spark engine, so pushing Cube tasks down to MLSQL is fairly straightforward to implement."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A small spoiler: 鄭生俊 from Youzan (有讚) will talk about building Cube tasks into the Spark engine in Kylin 3.0 and running them on K8s, which has been verified to work. This is a fairly deep form of integration, equivalent to pushing all of Kylin's Cube tasks down to the MLSQL engine layer. It is also better suited to being implemented at the engine level. In the future, MLSQL could run as a long-lived service, or Kylin could start it to execute a task and then release the resources afterwards."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"That concludes the brief introduction to deep and shallow integration. Downstream integration will be covered in future technical blog posts and demos. From the MLSQL community's side: first, the community will try to introduce Cube building, which is the deep integration with Kylin described above; second, MLSQL will offer a friendlier way for Kylin to connect to it; and we may also contribute to the Kylin community a plugin system for embedding ETL capability, one that works not only with MLSQL and Spark but with other ETL tools as well, so that users can complete the entire process inside Kylin."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"祝海林 (祝威廉) is the author of MLSQL, a new-generation open source language for data and AI processing. He is currently a technical partner and senior architect at Kyligence, with more than 10 years of R&D experience. For the past 6 years he has focused on designing and developing a unified platform for data management, business analytics, and machine learning. He has long been active in open source communities including Spark, Ray, and Kylin, and has many years of experience running open source communities."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"This article is reposted from the WeChat official account ApacheKylin (ID: apachekylin)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original link"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzAwODE3ODU5MA==&mid=2653082286&idx=1&sn=3c3b19fcc6ade0fc86cdc72d0a2e40c8&chksm=80a4ac5fb7d32549f25c726b539e99db400b8149a378d13e1d313fc173c98fa91ab67338113d&token=1020922772&lang=zh_CN#rd","title":"","type":null},"content":[{"type":"text","text":"MLSQL 一分鐘讓 Kylin 裝備 ETL 能力!"}]}]}]}