DataCanvas Unveils Open-Source AutoML Toolkit DAT: Major Gains in AI Modeling Efficiency, Cutting Modeling Time Tenfold

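{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Developing machine learning models is hampered by scarce data resources, limited talent, and high technical barriers. With AutoML, anyone can train the models their work requires simply and efficiently, with or without a machine learning background and even as a complete beginner. AutoML has even been called the next generation of machine learning systems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What is AutoML? AutoML is short for Automated\/Automatic Machine Learning: it has machines carry out modeling and hyperparameter tuning automatically. With AutoML, tasks such as neural architecture search, model selection, feature engineering, hyperparameter optimization, and model compression can all be automated."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AutoML matters because it saves time and resources, eliminates much manual work, and lets data scientists deliver business value faster and more effectively. Many in the industry believe AutoML is bound to be a key focus of the next era for the AI field as a whole, and may well become machine learning's “game changer.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lu Yanxia, research director at IDC China, said that automated machine learning is one of the six major technology trends in artificial intelligence over the next five years. It is an important technical path for bringing AI applications into industry, and it will have a far-reaching impact on lowering the barrier to AI adoption, cultivating AI talent, and growing the AI ecosystem."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AutoML has been popular in recent years. Large overseas companies such as Google and Microsoft have entered the AutoML market. In China, a number of technology companies, including DataCanvas and 4Paradigm, have launched their own AutoML platforms, and domestic AutoML has considerable room to grow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfoQ recently learned that DataCanvas has made new moves in open-source AutoML. On October 25, DataCanvas announced two new open-source AI projects: "},{"type":"text","marks":[{"type":"strong"}],"text":"DAT, an open-source product for autonomous and automated modeling,"},{"type":"text","text":" and "},{"type":"text","marks":[{"type":"strong"}],"text":"DingoDB, a high-concurrency database capable of real-time analytics."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Shang Mingdong, co-founder and CTO of DataCanvas, said that in the eight years since its founding, DataCanvas has worked on two core problems: making data analysis both fast and simple. On one hand, the company automates machine learning and deep learning so that modeling capability reaches far more users and AI becomes broadly accessible. On the other, it makes data analysis ever faster and services ever more timely, moving from near-real-time to millisecond-level real-time responses."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"DAT: Open-Source AutoML Empowering AI"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to Yang Jian, head of DAT and senior architect at DataCanvas, DAT (DataCanvas AutoML Toolkit) is an automated machine learning toolkit. It contains a series of powerful open-source AutoML tools, ranging from a low-level general-purpose AutoML framework to end-to-end automatic modeling tools for structured and unstructured data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Jian said the tools in DAT serve different kinds of users, and each tool can be used on its own. By task, the DAT tool stack covers both structured and unstructured data. By audience, it serves professional AI practitioners and also lets people without an AI background use the corresponding AutoML tools. Besides meeting the needs of AI users, it offers frameworks for AutoML tool developers. DAT is therefore not a tool built for one particular scenario. The goal is to make AutoML available to different groups of people and to unlock its capabilities from every angle and at every level, empowering users."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All DAT projects are developed as open source. They have so far received more than 2,600 stars on GitHub and have been installed or downloaded more than 60,000 times by the community."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DAT tackles four major difficulties in machine learning modeling: imbalanced data, concept drift, generalization, and large-scale data. DAT consists of DeepTables, Hypernets, HyperGBM, and Cooka."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DeepTables: A Deep Learning Tool for Structured Data Modeling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepTables is an easy-to-use deep learning tool that can train a high-quality model with just five lines of code. It works out of the box, has a flexible architecture, and is simple to use, meeting most enterprise needs for structured data modeling."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DeepTables (DT) uses breakthrough techniques to overcome deep learning's poor performance on structured data, and it has outperformed traditional algorithms such as XGBoost and LightGBM on a large number of public datasets."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DT introduces four main components:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Embedding, an important representation learning method in deep learning;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feature interaction layers: a series of sub-network architectures designed specifically for structured data, such as CIN, DCN, PNN, and FM, that learn and derive large numbers of nonlinear feature interactions;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Feature extraction: several mechanisms for extracting features, including the well-known Transformer-based extraction method;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"GBM model fusion: using transfer learning and feature knowledge extraction, the information learned by GBM models is fused into the neural network to further improve the final modeling results of DeepTables."}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Hypernets: A General-Purpose AutoML Framework"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hypernets is a low-level, general-purpose AutoML framework that helps users quickly develop AutoML tools for specialized domains."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hypernets addresses the three key technical problems of automated modeling: representing the search space, searching it efficiently, and evaluating candidates. It can be combined with various machine learning and deep learning frameworks to build dedicated AutoML tools. It also provides an open training service framework that supports high-performance single-node and distributed model training, greatly lowering the barrier to developing AutoML tools. Support for the latest neural architecture search (NAS) algorithms also automates the design of deep learning network architectures."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Jian called Hypernets a “heavyweight” in DAT. Designed specifically for AutoML tool developers, it lets them assemble customized AutoML tools more freely. In practice, Hypernets provides most of the capabilities needed to build AutoML tools while leaving ample room for extension, so it can be customized to the needs of specific modeling scenarios, greatly reducing the barrier to and cost of developing AutoML tools."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In today's enterprises, every modeling scenario has its own particular requirements, which general-purpose AutoML tools struggle to accommodate. With Hypernets, a customized AutoML tool can be built in a few hundred or even a few dozen lines of code."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As an example, Yang Jian noted that an intern at the company with no AutoML background built a clustering-based AutoML tool from scratch on Hypernets in less than two weeks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Jian said DataCanvas believes more AutoML tools will emerge to meet enterprises' fragmented, individualized modeling needs. The company hopes Hypernets will deliver the value expected of foundational infrastructure in that process, and that models produced by automated modeling on the Hypernets framework will eventually surpass what human experts achieve today."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"HyperGBM: An Automatic Modeling Tool Based on GBM Models"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Built on the Hypernets framework, HyperGBM is an automatic modeling tool that integrates several advanced GBM models, including XGBoost, LightGBM, and CatBoost. It automates the entire machine learning process: data preprocessing, feature derivation, feature selection, hyperparameter optimization, model selection, and model ensembling. It supports one-click training and can also package the entire pipeline into a single model for one-click deployment, removing a major obstacle to putting models into production. Its models perform strongly, exceeding human experts on multiple public datasets and in real customer business scenarios."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HyperGBM offers many advanced features. Many challenges enterprises face in modeling, such as imbalanced data and concept drift, are handled automatically in HyperGBM. For very large datasets it provides cluster-based distributed training, meeting enterprise needs for automated modeling on massive data. HyperGBM is also developing GPU-based hardware acceleration."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yang Jian explained that HyperGBM keeps the time cost of modeling low: it typically completes the entire AutoML process in about 10 times the duration of a single manual training run. Manual modeling requires extensive hyperparameter tuning as well as data preprocessing and feature engineering, and it involves repeated iterative experiments, sometimes dozens or hundreds of them, before a reasonably satisfactory model is found. A full modeling cycle typically takes weeks to months. With HyperGBM, the modeling cycle shrinks to days."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, manual modeling is capped by the individual modeler's skill, which introduces considerable uncertainty for enterprises. HyperGBM's search is comparatively more stable and can find the best pipeline within a given search space. This is its advantage over manual modeling."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Cooka: A Lightweight AutoML System"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cooka is a user-friendly, open-source, interactive AutoML system with low resource requirements; it runs on an ordinary laptop. Cooka integrates the HyperGBM and HyperDT (DeepTables' AutoML tool) automated machine learning tools. Its interface is simple and easy to operate, so people with no specialized background can complete machine learning modeling with ease, further lowering the barrier to using AutoML. Cooka makes HyperGBM and DeepTables even easier to use."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DingoDB: A Real-Time Interactive Analytics Database"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DingoDB is a new-generation real-time analytics database that combines analytics and serving, an HSAP (Hybrid Serving & Analytical Processing) system. It supports high-frequency updates and queries, real-time interactive analysis, and real-time multidimensional analysis."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"The Origins and Design Goals of DingoDB"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hu Zongxing, product director at DataCanvas, described the background to DingoDB's development in detail."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DataCanvas's customers are mainly B2B financial-sector enterprises. While delivering projects to them, the team found that most enterprise data architectures follow the Lambda architecture. Batch processing is the main data-processing path, stream processing supplements it, and the two work together to support enterprise data application development and the building of data middle platforms."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Lambda architecture is mainstream not only in enterprises but also at many internet companies. It has several technical shortcomings, however. Data is scattered across multiple storage engines, which makes data integration very difficult. Keeping data in several storage engines also makes consistency and accuracy hard to guarantee, adding data reconciliation and validation work to production operations. In addition, traditional big data and MPP architectures are weak at high-concurrency data serving and timely updates, so caches and KV stores are usually added to the data serving layer to speed up access and increase serving concurrency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The combination of multiple storage engines, compute engines, and caches makes enterprise data platform architectures extremely complex and drives learning and operations costs very high."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“As the business evolved, DataCanvas drew on the respective strengths of TP and AP systems and derived a new data architecture. It can store massive amounts of data while serving high-concurrency queries and performing real-time analytics. That is the main reason DingoDB was created,” Hu Zongxing said. “DingoDB does not purely solve TP-style transactional problems, nor purely AP-style complex analytical problems. It addresses the space between TP and AP, providing high-concurrency data services together with real-time data analytics. Specifically, it solves problems in three areas: data storage, high-concurrency data serving, and data computation.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DataCanvas defines DingoDB as a real-time interactive analytics database. Hu Zongxing's team hopes that with DingoDB, data can be ingested and stored in real time, and that a streamlined approach will let users analyze data quickly and get timely responses to their analyses."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Our goal is to build DingoDB into an open-source distributed database that unifies analytics and serving: it supports high-concurrency queries, updates, and deletes, and enables real-time interactive analysis and multidimensional analysis in one system,” Hu Zongxing said."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"DingoDB's Core Technical Innovations"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to the company, DingoDB's core technologies include:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Standard SQL"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo supports ANSI SQL syntax, is compatible with TPC-H and TPC-DS, and connects seamlessly with Calcite clients and BI reporting tools."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Intelligent optimizer"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Dingo database supports row storage, column storage, and hybrid row-column storage, and tables support multiple partitions and replicas. Dingo's SQL optimizer produces the best execution plan from data metadata and automatically chooses between row and column storage."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Real-time, high-frequency updates"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Dingo database supports Upsert and Delete operations on records by primary key. Data is also stored with a multi-partition, multi-replica mechanism, and Upsert and Delete operations are translated into Key-Value operations, enabling high-frequency updates."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hybrid row-column storage:"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo supports row, column, and hybrid row-column storage. For multidimensional analysis, Dingo uses the columnar layout for aggregation, keeping computation timely and analysis efficient. For record-level queries and updates, it uses the row layout to locate records quickly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Plugin-based ingestion of many data types:"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To suit the data-access needs of different scenarios, Dingo uses a plugin model that supports many types of connectors, such as Kafka, Pulsar, offline files, and HDFS, enabling seamless data ingestion and serving."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Separation of storage and compute, elastic deployment:"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo persists data to S3 object storage and runs distributed computation from SQL-based execution plans, separating storage from compute. Data partitioning, the multi-replica model, and distributed storage let compute and storage scale out independently and elastically."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"DingoDB's innovations"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"An intelligent optimizer that chooses between row and column layouts"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Dingo database has a built-in intelligent SQL optimizer that automatically optimizes both analytical SQL and record-level SQL, selecting row or column storage for each business scenario. Dingo uses the columnar layout for aggregation to deliver efficient analysis, and the row layout to locate records quickly for record-level queries and updates."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"High-frequency point lookups and updates"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To meet data freshness requirements, Dingo stores data in a Key-Value model and uses its replica strategy to implement hybrid row-column storage. For high-frequency record-level scenarios such as data joins and record updates, it supports high-concurrency, high-frequency record-level queries and updates."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Multi-replica mechanism with elastic scaling of storage and compute"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dingo tables use multiple partitions and replicas, which keeps data safe and stable. Separating storage from compute also lets containerized deployments scale out horizontally, giving both compute and storage elasticity."}]}]}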
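The DingoDB description above says that Upsert and Delete by primary key are turned into Key-Value operations over multi-partition, multi-replica data. The Python sketch below illustrates only that general idea and is not DingoDB's code or API. The class names, the repr-based key encoding, the crc32 hash partitioning, and the write-to-every-replica policy are all hypothetical choices made for illustration.

```python
# Hypothetical sketch, not DingoDB's implementation or API: primary-key
# Upsert/Delete expressed as key-value puts/deletes on hash-partitioned replicas.
import zlib
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class Partition:
    # Each replica is modeled as an in-memory dict: encoded key -> row.
    replicas: List[Dict[bytes, Row]] = field(default_factory=list)


class KVTable:
    def __init__(self, primary_key: str, num_partitions: int = 4, num_replicas: int = 3) -> None:
        self.primary_key = primary_key
        self.partitions = [
            Partition([{} for _ in range(num_replicas)]) for _ in range(num_partitions)
        ]

    @staticmethod
    def encode_key(pk_value: Any) -> bytes:
        # Assumed encoding: repr() of the primary-key value as UTF-8 bytes.
        return repr(pk_value).encode("utf-8")

    def partition_for(self, key: bytes) -> Partition:
        # crc32 is stable across runs, so a key always lands in the same partition.
        return self.partitions[zlib.crc32(key) % len(self.partitions)]

    def upsert(self, row: Row) -> None:
        # Upsert = KV put: inserts a new key or overwrites an existing one.
        key = self.encode_key(row[self.primary_key])
        for replica in self.partition_for(key).replicas:
            replica[key] = dict(row)

    def delete(self, pk_value: Any) -> None:
        # Delete = KV delete; deleting an absent key is a no-op.
        key = self.encode_key(pk_value)
        for replica in self.partition_for(key).replicas:
            replica.pop(key, None)

    def get(self, pk_value: Any) -> Optional[Row]:
        # Point lookup reads the first replica; real systems use a leader or quorum.
        key = self.encode_key(pk_value)
        return self.partition_for(key).replicas[0].get(key)


table = KVTable(primary_key="id")
table.upsert({"id": 1, "balance": 100})
table.upsert({"id": 1, "balance": 250})  # same key: overwritten, not duplicated
table.delete(2)                          # absent key: no-op
print(table.get(1))                      # {'id': 1, 'balance': 250}
```

Because every write is a put or delete on one key, a record-level update touches a single partition rather than rewriting files. A real distributed store would also need replica consensus, failure recovery, and ordering of concurrent writes, which this sketch deliberately omits.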
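The optimizer descriptions above say that Dingo chooses row or column storage based on data metadata: columns for aggregation, rows for record-level access. One common way to frame such a choice is a cost model. The sketch below compares the estimated bytes read under each layout using table statistics. It is a hypothetical illustration, not DingoDB's optimizer. The SEEK_COST constant, the cost formulas, and the names TableStats, Query, estimate_cost, and choose_layout are all assumptions.

```python
# Hypothetical sketch, not DingoDB's optimizer: choose a row or column layout
# by comparing estimated read cost computed from table metadata.
from dataclasses import dataclass
from typing import Dict, FrozenSet

SEEK_COST = 4096  # assumed cost of one random access, in byte-equivalents


@dataclass
class TableStats:
    row_count: int
    column_bytes: Dict[str, int]  # average encoded width of each column


@dataclass(frozen=True)
class Query:
    columns: FrozenSet[str]  # columns the query reads or writes
    point_lookup: bool       # True if the predicate pins one primary-key value


def estimate_cost(stats: TableStats, query: Query, layout: str) -> int:
    touched = sum(stats.column_bytes[c] for c in query.columns)
    full_row = sum(stats.column_bytes.values())
    if layout == "row":
        # One seek fetches a whole row; a scan must read every full row.
        if query.point_lookup:
            return SEEK_COST + full_row
        return stats.row_count * full_row
    if layout == "column":
        # One seek per touched column; a scan reads only the touched columns.
        if query.point_lookup:
            return len(query.columns) * SEEK_COST + touched
        return stats.row_count * touched
    raise ValueError(f"unknown layout: {layout}")


def choose_layout(stats: TableStats, query: Query) -> str:
    # min() keeps the first option on a tie, so ties go to the row layout.
    return min(("row", "column"), key=lambda layout: estimate_cost(stats, query, layout))


stats = TableStats(
    row_count=1_000_000,
    column_bytes={"id": 8, "user": 32, "region": 16, "amount": 8, "ts": 8},
)
aggregate = Query(frozenset({"region", "amount"}), point_lookup=False)
lookup = Query(frozenset({"id", "user", "amount"}), point_lookup=True)
print(choose_layout(stats, aggregate))  # column
print(choose_layout(stats, lookup))     # row
```

This toy model has limits. A point lookup of a single column, for instance, comes out slightly cheaper in the column layout here. A production optimizer would also weigh index structures, write amplification, and data freshness.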