William McKnight on Columnar Databases

Original

omg2012

2020-06-23 02:34

http://www.infoq.com/news/2011/09/nosqlnow-columnar-databases

Columnar databases offer better data storage capabilities than traditional relational database management systems (RDBMS) for certain business use cases.

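In some business scenarios, columnar databases provide better data storage capabilities than traditional relational database management systems.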

William McKnight spoke at the NoSQL Now 2011 Conference last week about columnar databases and how they can be effective for certain data storage needs.

He said that data queries against RDBMS solutions, which are based on a row-wise design, end up reading a lot of data. Data input/output (I/O) has become the true bottleneck in data processing today, so when you do perform I/O, it is better to get more data while you are there. The real way to avoid the problem is to do only the I/O you really need. Columnar databases let a query pick just the columns it needs, instead of fetching whole rows and then leaving the other columns unused as overhead. They are a better fit for workloads that need only a small percentage of the overall column bytes.

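The bottleneck in data processing is data I/O. A columnar database can retrieve only the columns it needs, which reduces the amount of I/O and improves efficiency.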
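The I/O argument can be illustrated with a minimal sketch. It is not taken from the talk: the table, its column names, and the fixed-width binary layout are all assumptions made for illustration, and "bytes read" here simply means the size of the buffer each query has to scan.

```python
import struct

# Hypothetical order table; every name and value here is made up for illustration.
ROW_FORMAT = "<iiid"  # order_id, customer_id, quantity (4-byte ints), price (8-byte float)
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 20 bytes per row, no padding with "<"
rows = [(i, i % 100, i % 7, 9.99) for i in range(10_000)]

# Row-wise layout: each record is stored contiguously, one after another.
row_store = b"".join(struct.pack(ROW_FORMAT, *r) for r in rows)

# Column-wise layout: each column is stored contiguously, all in the same row order.
n = len(rows)
column_store = {
    "order_id": struct.pack(f"<{n}i", *(r[0] for r in rows)),
    "customer_id": struct.pack(f"<{n}i", *(r[1] for r in rows)),
    "quantity": struct.pack(f"<{n}i", *(r[2] for r in rows)),
    "price": struct.pack(f"<{n}d", *(r[3] for r in rows)),
}


def sum_quantity_from_rows(data):
    """Scan every full row to reach one field; returns (total, bytes_read)."""
    total = 0
    for offset in range(0, len(data), ROW_SIZE):
        _, _, quantity, _ = struct.unpack_from(ROW_FORMAT, data, offset)
        total += quantity
    return total, len(data)


def sum_quantity_from_columns(store):
    """Read only the quantity column; returns (total, bytes_read)."""
    data = store["quantity"]
    values = struct.unpack(f"<{len(data) // 4}i", data)
    return sum(values), len(data)


row_total, row_bytes = sum_quantity_from_rows(row_store)
col_total, col_bytes = sum_quantity_from_columns(column_store)
print(row_total == col_total)  # both layouts give the same answer
print(row_bytes, col_bytes)    # bytes scanned by each layout
```

In this toy layout the query needs 4 of the 20 bytes in each row, so the column-wise scan touches one fifth of the data. A real engine reads whole pages and adds compression, so actual ratios will differ.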

In columnar databases, data is stored column by column, with all columns kept in the same row order. William discussed the data page layout of a relational database record and compared it with that of a columnar database table. The row page design used in RDBMS databases carries some overhead, because queries rely on a row scan or an index scan, which can be expensive given all the data involved. He showed an example use case in which a query took 500,000 I/Os on a row-based database versus 235 I/Os on a columnar database.

There are different columnar data storage options, such as Decomposed Storage Model, Positional Representation, Modified B-Tree/Run Length Encoding, and Bitmap. He also talked about materialization strategies, which include Function of 'projection' and Early and Late Materialization.
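Two of these storage options, together with late materialization, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not how any particular vendor implements them: the columns `region`, `status`, and `amount` and all helper functions are hypothetical, and bitmaps are modelled as plain Python integers.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


def build_bitmaps(values):
    """One integer bitmap per distinct value; bit i is set when row i holds it."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def set_positions(bitmap):
    """Return the row positions whose bit is set, in ascending order."""
    found, position = [], 0
    while bitmap:
        if bitmap & 1:
            found.append(position)
        bitmap >>= 1
        position += 1
    return found


# Hypothetical columns of one table, all stored in the same row order.
region = ["EU", "EU", "EU", "US", "US", "EU", "APAC", "APAC"]
status = ["open", "open", "closed", "open", "closed", "closed", "open", "open"]
amount = [10, 20, 30, 40, 50, 60, 70, 80]

# Run-length encoding: long runs of equal values shrink to a single pair.
runs = rle_encode(region)

# Bitmap index: a predicate on two columns becomes a bitwise AND.
region_bitmaps = build_bitmaps(region)
status_bitmaps = build_bitmaps(status)
matching_rows = set_positions(region_bitmaps["EU"] & status_bitmaps["open"])

# Late materialization: the amount column is touched only now,
# and only at the positions that survived the filters.
total = sum(amount[i] for i in matching_rows)
print(runs)
print(matching_rows, total)
```

The filters run entirely on the encoded columns and produce a list of positions. An early-materialization strategy would instead assemble full rows first and filter afterwards.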

Some of the columnar database vendors are Vertica, ParAccel, Sybase IQ, InfoBright, Exasol, VectorWise and open source products like MonetDB and InfiniDB.

William said that relational row-based data warehouses and data marts will still be there. Alongside the data warehouse and Hadoop, you will have columnar databases to process the data a lot faster. He concluded the session by saying that database designers should start with good design principles and then decide whether to put the data in a row-based or a column-based solution.

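Database designers should first have good design principles, and then decide whether they need a row-based or a column-based data storage solution.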
