William McKnight on Columnar Databases

http://www.infoq.com/news/2011/09/nosqlnow-columnar-databases

Columnar databases offer better data storage capabilities for certain business use cases compared to the traditional relational database management systems (RDBMS). 

列式数据库在一些业务场景比传统关系型数据库管理系统提供更好的数据存储能力。

William McKnight spoke at the NoSQL Now 2011 Conference last week about the columnar databases and how they can be effective for certain data storage needs.

He said the data queries using RDBMS solutions (which are based on the row-wise design) send up a lot of data. Data Input/Output (I/O) has become the true bottleneck in the data processing needs today and when you do I/O, it’s better to get more data while you are there. The real way to avoid this problem is to only do the I/O that you really need. Columnar databases provide the ability to pick the columns needed instead of getting the whole row and not using the other columns (overhead) after the data retrieval. They offer a better solution in use cases where the work load needs a small percentage of the overall column bytes.

数据处理的瓶颈在数据IO,列式数据库可以只检索需要的列,减小了IO量,提高了效率。

 
In columnar databases, the data is stored in columns keeping all columns in the same order. William discussed the data page layout of relational database record and compared it with that of a column database table. There is some overhead involved in the row page design (in RDBMS databases) because the row scan or index scan is used for data queries and it can be an expensive option given all the data involved. He showed an example of a use case where the data query took 500,000 I/Os for a row-based database versus 235 I/Os for a Columnar database.

 
 
There are different columnar data storage options like Decomposed Storage Model, Positional Representation, Modified B-Tree/Row Length Encryption, and Bitmap. He also talked about materialization strategies which include Function of 'projection', Early and Late Materialization.

Some of the columnar database vendors are VerticaParAccelSybase IQInfoBright, Exasol, VectorWise and open source products like MonetDB and InfiniDB.

William said that the relational row based data warehouses and data marts will still be there. Beside the data warehouse and Hadoop, you will have column databases to process the data lot faster. He concluded the session by saying the database designers should start with good design principles and then decide if you want to put the data in row based or column based solution.

数据库设计师首先要有一个好的设计原则,然后再决定是需要基于行还是基于列的数据存储方案。

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章