Using the cache in Shark

Reposted from the official website.

Unlike Hive, Shark lets users exploit temporal locality in their queries by caching their working set of data, or in database terms, by creating in-memory materialized views. Common data types can be cached in a columnar format (as Java primitive arrays), which is very efficient for storage and garbage collection, yet provides maximum performance (orders of magnitude faster than reading data from disk).
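
To make the columnar idea concrete, here is a minimal Python sketch. It is not Shark's implementation (Shark stores columns as Java primitive arrays); the column names and values are invented for illustration. It converts row-oriented records into one typed array per column, so a scan over a single column reads one contiguous buffer instead of visiting many small row objects.

```python
from array import array

# Hypothetical row-oriented data: (page_id, hits) pairs, invented for illustration.
rows = [(1, 10), (2, 25), (3, 5), (1, 7)]

def to_columns(rows):
    """Convert a list of (page_id, hits) rows into one typed array per column."""
    page_ids = array("q", (r[0] for r in rows))  # 64-bit signed integers
    hits = array("q", (r[1] for r in rows))
    return {"page_id": page_ids, "hits": hits}

columns = to_columns(rows)

# A column scan reads one primitive buffer; no per-row objects are visited.
total_hits = sum(columns["hits"])
print(total_hits)
```

Storing each column as a primitive array also means the garbage collector sees a few large objects rather than one object per field, which is the storage and GC benefit the paragraph above refers to.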

To create a cached table from the rows (or a subset of the rows) of an existing table, set the shark.cache table property:

CREATE TABLE ... TBLPROPERTIES ("shark.cache" = "true") AS SELECT ...

We also extend HiveQL to provide a shortcut for this syntax. Simply append _cached to the table name when using CREATE TABLE AS SELECT, and that table will be cached in memory. To disable this shortcut, see the configuration options section. Below is an example:

CREATE TABLE logs_last_month_cached AS
SELECT * FROM logs WHERE time > date(...);

In other words, just create the table with the suffix _cached appended to its name, and make sure the parameter shark.cache.flag.checkTableName is set to true.

Note: shark.cache.flag.checkTableName takes 'true' or 'false' and controls whether tables whose names end in "_cached" are cached.
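
The two ways of marking a table as cached can be summarized in a small Python sketch. This is only an illustration of the rule as described here, not Shark's code: the function name should_cache and its parameters are hypothetical, and how the two mechanisms combine is an assumption based on the text above.

```python
def should_cache(table_name, tbl_properties=None, check_table_name=True):
    """Hypothetical helper mirroring the caching rule described in the text.

    A table is treated as cached when its "shark.cache" property is "true",
    or when name checking is enabled (the role played by
    shark.cache.flag.checkTableName) and the name ends in "_cached".
    """
    props = tbl_properties or {}
    if props.get("shark.cache") == "true":
        return True
    return check_table_name and table_name.endswith("_cached")

print(should_cache("logs_last_month_cached"))
print(should_cache("logs_last_month_cached", check_table_name=False))
print(should_cache("logs", {"shark.cache": "true"}))
```

With name checking turned off, only the explicit table property marks a table as cached in this sketch.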

Once this table has been created, we can query it like any other Hive table.

SELECT count(*) FROM logs_last_month_cached;

Note that queries which shuffle data require you to set the number of reducers. This will be automatically determined in the next version of Shark:

set mapred.reduce.tasks=[num_tasks];

SELECT page, count(*) c FROM logs_last_month_cached
GROUP BY page ORDER BY c DESC LIMIT 10;

In addition to caching, Shark employs a number of optimization techniques such as limit push downs and hash-based shuffle, which can provide significant speedups in query processing. You can directly inspect the execution plan for a given query using the explain statement:

explain SELECT * FROM logs_last_month ORDER BY timestamp LIMIT 10;
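
The limit push-down mentioned above can be illustrated with a short Python sketch. It is not Shark's code; the partition contents and the function name are invented for illustration. For a query of the form ORDER BY ... LIMIT k, each partition keeps only its k smallest rows, so only those candidates need to reach the final merge.

```python
import heapq

def top_k_with_limit_pushdown(partitions, k):
    """Return the k smallest values across all partitions, in ascending order."""
    # Push the limit down: each partition emits at most k sorted candidates.
    local = [heapq.nsmallest(k, part) for part in partitions]
    # Final step: merge the sorted candidate lists and keep the first k values.
    merged = heapq.merge(*local)
    return [value for _, value in zip(range(k), merged)]

# Hypothetical partitions of a timestamp column.
partitions = [[42, 7, 19], [3, 88, 21, 5], [60, 1]]
result = top_k_with_limit_pushdown(partitions, 3)
print(result)
```

The saving comes from the first step: however large a partition is, it contributes at most k rows to the final sort.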
