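Jiuzhang Yunji (九章雲極) Releases Open-Source AutoML Toolkit: DAT Sharply Improves AI Modeling Efficiency, Cutting Modeling Time Tenfold

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Developing machine learning models runs into challenges such as scarce data, limited talent, and a high technical barrier. With AutoML, anyone, whether or not they have a machine learning background and even if they are a complete beginner, can train the models their work requires simply and efficiently. AutoML has even been called the next generation of machine learning systems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What is AutoML? AutoML is short for Automated\/Automatic Machine Learning: the machine builds models and tunes parameters automatically. AutoML can automate tasks such as neural architecture search, model selection, feature engineering, hyperparameter tuning, and model compression."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AutoML matters because it saves time and resources, eliminates much manual work, and lets data scientists deliver business value faster and more effectively. Many industry insiders believe that AutoML will be a focus of the next era of AI, and that it may well prove to be machine learning's “killer weapon”."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lu Yanxia, research director at IDC China, said that automated machine learning is one of the six major technology trends in artificial intelligence over the next five years. It is an important technical path for bringing AI applications into industry, and it will have a far-reaching effect on lowering the barrier to AI applications, cultivating AI talent, and growing the AI ecosystem."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AutoML has been popular in recent years. Overseas giants such as Google and Microsoft have entered the field, and in China a number of technology companies, including Jiuzhang Yunji and 4Paradigm (第四範式), have launched their own AutoML platforms. Domestic AutoML has plenty of room to grow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ recently learned that Jiuzhang Yunji has made a new move in open-source AutoML. On October 25, the company announced its two latest open-source AI projects: "},{"type":"text","marks":[{"type":"strong"}],"text":"DAT, an open-source product for self-service and automated modeling, "},{"type":"text","text":"and "},{"type":"text","marks":[{"type":"strong"}],"text":"DingoDB, a database built for high concurrency and real-time analytics."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Shang Mingdong, co-founder and CTO of Jiuzhang Yunji DataCanvas, said that in the eight years since its founding the company has worked on two core problems: making data analysis fast and making it simple. On one side, automating machine learning and deep learning pushes modeling capability down to more users and makes AI widely accessible. On the other, data analysis keeps getting faster and services more timely, moving from near-real-time to millisecond-level real-time response."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"DAT: Open-Source AutoML That Empowers AI"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to Yang Jian, head of DAT and senior architect at Jiuzhang Yunji DataCanvas, DAT (DataCanvas AutoML Toolkit) is an automated machine learning toolkit. It contains a series of powerful open-source AutoML tools, ranging from a general-purpose low-level AutoML framework to end-to-end automated modeling tools for structured and unstructured data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Jian said that the tools in DAT target different users and that each tool can be used on its own. By task, the DAT tool stack covers both structured and unstructured data. By audience, it serves professional AI practitioners as well as people with no AI background, who can use the corresponding AutoML tools; it meets the needs of AI users and also provides a framework for developers of AutoML tools. DAT is therefore not built for a single scenario. The aim is for AutoML to reach different groups of people and to deliver its capabilities to users from different angles and at every level."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All DAT projects are developed in the open. So far they have received more than 2,600 stars on GitHub and more than 60,000 installs and downloads from the community."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DAT addresses four major difficulties in machine learning modeling: data imbalance, concept drift, generalization, and large-scale data. DAT consists of DeepTables, Hypernets, HyperGBM, and Cooka."}]},{"type":"heading","attrs":{"align":null,"l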
evel":4},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DeepTables: A Deep Learning Tool for Structured Data Modeling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepTables is an easy-to-use deep learning tool that can train a high-quality model in just five lines of code. It works out of the box, has a flexible architecture, and can meet most enterprise needs for modeling structured data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepTables (DT) tackles the problem of deep learning performing poorly on structured data, and on a large number of public datasets it has outperformed traditional algorithms such as XGBoost and LightGBM."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DT introduces the following four main components:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Embedding, an important representation learning method in deep learning;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feature interaction layers, a series of sub-network architectures designed specifically for structured data, such as CIN, DCN, PNN, and FM, which learn and derive nonlinear feature interactions at scale;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feature extraction, which brings in a number of mechanisms, including the well-known extraction method from Transformer, to extract features;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GBM model fusion, which uses transfer learning and feature knowledge extraction to fold the information learned by GBM models into the neural network, further improving DeepTables' final modeling results."}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"str
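ong"}],"text":"Hypernets: A General-Purpose AutoML Framework"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hypernets is a general-purpose, low-level AutoML framework that helps users quickly develop AutoML tools for specialized domains."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hypernets addresses three key technical problems in automated modeling: search space representation, efficient search algorithms, and evaluation strategies. It can be combined with a variety of machine learning and deep learning frameworks to build specialized AutoML tools. It also provides an open training service framework that supports high-performance model training on a single node or in distributed mode, which greatly lowers the barrier to developing AutoML tools. Support for recent neural architecture search (NAS) algorithms also automates network architecture design for deep learning."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Jian called Hypernets the “heavy hitter” of DAT. It is a framework designed specifically for developers of AutoML tools, who can use it to assemble customized AutoML tools more freely. In practice, Hypernets provides most of the capabilities needed to develop an AutoML tool while leaving ample room for extension, so it can be tailored to specific modeling scenarios and greatly reduces the barrier and cost of AutoML tool development."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In enterprises today, every modeling scenario has its own particular requirements, and general-purpose AutoML tools struggle to handle them. With Hypernets, a custom AutoML tool can be built in a few hundred or even a few dozen lines of code."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As an example, Yang Jian said that an intern at the company with no AutoML background built an AutoML tool based on clustering algorithms from scratch on top of Hypernets in less than two weeks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Jian said Jiuzhang Yunji expects more AutoML tools to emerge to meet enterprises' fragmented, individualized modeling needs. The company hopes Hypernets will deliver the value expected of a piece of infrastructure in that process, and that models produced by automated modeling on the Hypernets framework will eventually surpass the current level of human experts."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"HyperGBM: An Automated Modeling Tool Based on GBM Models"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HyperGBM is an automated modeling tool built on the Hypernets framework that combines several advanced GBM models, including XGBoost, LightGBM, and CatBoost. It automates the full machine learning process: data preprocessing, feature generation, feature selection, hyperparameter optimization, model selection, and model ensembling. Besides one-click training, it can package the whole pipeline into a single model for one-click deployment, which removes much of the difficulty of moving to production. Its models are reported to have outperformed human experts on several public datasets and in customers' real business scenarios."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HyperGBM has many advanced features. Challenges that enterprises commonly face in modeling, such as data imbalance and concept drift, can be handled automatically in HyperGBM. For very large datasets it offers cluster-based distributed training, so enterprises can run automated modeling at scale. GPU-based hardware acceleration is also under development."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Jian said that modeling with HyperGBM has a low time cost: HyperGBM typically needs only about 10 times the duration of a single manual training run to complete the entire AutoML 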
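process. Manual modeling requires extensive hyperparameter tuning, along with data preprocessing, feature engineering, and so on, and repeated experiments; it often takes dozens or even hundreds of runs to find a reasonably satisfactory model. The whole modeling cycle usually takes weeks to months. With HyperGBM, it drops to a matter of days."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, manual modeling is limited by the ability of the individual doing it, which creates considerable uncertainty for an enterprise. HyperGBM's search is more stable by comparison and can find the best pipeline within a given search space. That is its advantage over manual modeling."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Cooka: A Lightweight AutoML System"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cooka is an open-source, interactive AutoML system with a friendly interface. Its resource requirements are low, and it runs on a laptop. Cooka integrates the HyperGBM and HyperDT AutoML tools behind a simple interface, so people with no specialist background can complete machine learning modeling work with ease, further lowering the barrier to using AutoML. Cooka makes HyperGBM and DeepTables easier to use."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DingoDB: A Real-Time Interactive Analytics Database"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DingoDB is a new-generation real-time analytics database that combines analysis and serving, an approach known as HSAP (Hybrid Serving & Analytical Processing). It supports high-frequency updates and queries, real-time interactive analysis, and real-time multidimensional analysis."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Origins and Design Goals of DingoDB"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Jiuzhang Yunji DataCanvas 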
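product director Hu Zongxing described the background to DingoDB's development in detail."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Jiuzhang Yunji's customers are mainly B2B clients in finance. While delivering to them, the team found that most enterprises use a Lambda architecture for data: batch computing is the main line of data processing and stream computing supplements it, and the two together support the development of data applications and the construction of the data middle platform."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Lambda architecture is the mainstream data architecture not only in enterprises but also at many internet companies. It has a number of technical shortcomings, however. Data is scattered across multiple storage engines, which makes it very hard to integrate. Having data in several engines also makes consistency and accuracy hard to guarantee, which adds reconciliation and validation work in production operations. And traditional big data and MPP architectures are weak at high-concurrency data serving and timely updates, so teams usually add caches and key-value stores at the data serving layer to speed up access and raise concurrency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With multiple storage engines, compute engines, and assorted caches, an enterprise's data platform architecture becomes extremely complex, and the cost of learning and operating it becomes very high."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“As the business evolved, Jiuzhang Yunji drew on the respective strengths of TP and AP systems and arrived at a new data architecture. It stores massive amounts of data while supporting high-concurrency queries and real-time analysis. That is the main reason DingoDB came about,” Hu Zongxing said. “Dingo 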
DB is not purely a solution to TP-style transactional problems, nor purely a solution to AP-style complex analysis. It addresses the space between TP and AP: providing high-concurrency data services and real-time analysis at the same time. Specifically, it targets three problems: data storage, high-concurrency data serving, and data computation.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Jiuzhang Yunji defines DingoDB as a real-time interactive analytics database. Hu Zongxing's team wants DingoDB to ingest and store data in real time and to give users a simple way to run analyses quickly and get timely responses."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Our goal is to make DingoDB an open-source distributed database that unifies analysis and serving, supports high-concurrency queries, updates, and deletes, and performs real-time interactive and multidimensional analysis,” Hu Zongxing said."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Core Technical Innovations in DingoDB"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to the team, DingoDB's core technologies are:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Standard SQL"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo supports ANSI 
SQL syntax, is compatible with TPC-H and TPC-DS, and connects seamlessly to Calcite clients and BI reporting tools."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Intelligent optimizer"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo supports row storage, column storage, and hybrid row-column storage, with multi-partition and replica mechanisms at the table level. Its SQL optimizer uses metadata about the data to produce an optimal execution plan and chooses between row and column storage automatically."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Real-time, high-frequency updates"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo performs Upsert and Delete operations on records by primary key. Because data uses a multi-partition replica mechanism, Upsert and Delete operations can be converted into Key-Value operations, which enables high-frequency updates."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hybrid row-column storage"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo supports row, column, and hybrid row-column storage formats. For multidimensional analysis, where timeliness matters, Dingo aggregates data in column-store mode for efficient analysis. For record-level queries and updates, it uses row-store mode to locate records quickly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Plugin-based import of many kinds of data"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To serve the data needs of different scenarios, Dingo uses a plugin model to support many types of Connector, such as Kafka, Pulsar, offline files, and HDFS, for seamless data ingestion and serving."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Separation of storage and compute, elastic deployment"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo persists data to S3 object storage and runs SQL execution plans as distributed computation, which separates storage from compute. Partitioning, multiple replicas, and distributed storage allow compute and storage to scale out independently and elastically."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DingoDB's Innovations"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"An intelligent optimizer that chooses between row and column storage"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo has a built-in intelligent SQL optimizer that automatically optimizes both analytical SQL and record-level SQL, choosing row-store or column-store mode according to the business scenario. Aggregations run in column-store mode for efficient analysis; record-level queries and updates use row-store mode to locate records quickly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"al
ign":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"High-frequency point lookups and updates"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To meet timeliness requirements, Dingo stores data in Key-Value form and uses its replica strategy to implement hybrid row-column storage. In high-frequency record-level scenarios such as data joins and record updates, it supports high-concurrency, high-frequency record-level queries and updates."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Multi-replica mechanism with elastic scaling of storage and compute"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo tables use a multi-partition, multi-replica mechanism for data safety and stability. Separating storage from compute also allows containerized deployments to scale out horizontally, giving compute and storage elasticity."}]}]}