Choosing a Data Compression Method for HBase

Official documentation: http://hbase.apache.org/book.html#_which_compressor_or_data_block_encoder_to_use


The compression or codec type to use depends on the characteristics of your data. Choosing the wrong type could cause your data to take more space rather than less, and can have performance implications.
In general, you need to weigh your options between smaller size and faster compression/decompression. Following are some general guidelines, expanded from a discussion at Documenting Guidance on compression and codecs.

  • If you have long keys (compared to the values) or many columns, use a prefix encoder. FAST_DIFF is recommended, as more testing is needed for Prefix Tree encoding.

  • If the values are large (and not precompressed, such as images), use a data block compressor.

  • Use GZIP for cold data, which is accessed infrequently. GZIP compression uses more CPU resources than Snappy or LZO, but provides a higher compression ratio.

  • Use Snappy or LZO for hot data, which is accessed frequently. Snappy and LZO use fewer CPU resources than GZIP, but do not provide as high a compression ratio.

  • In most cases, enabling Snappy or LZO by default is a good choice, because they have a low performance overhead and provide space savings.

  • Before Snappy was released by Google in 2011, LZO was the default. Snappy has similar qualities to LZO but has been shown to perform better.
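The guidelines above translate into per-column-family settings. A minimal sketch in the HBase shell (the table and family names `t_hot`, `t_cold`, and `cf1` are placeholders, not from the original post):

```
# HBase shell: DATA_BLOCK_ENCODING and COMPRESSION are set per column family.
# FAST_DIFF prefix encoding plus Snappy for frequently accessed data:
create 't_hot', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}

# GZ (gzip) for infrequently accessed cold data:
create 't_cold', {NAME => 'cf1', COMPRESSION => 'GZ'}

# Change an existing table, then inspect the family settings:
alter 't_hot', {NAME => 'cf1', COMPRESSION => 'SNAPPY'}
describe 't_hot'
```

Note that changing COMPRESSION or DATA_BLOCK_ENCODING on an existing family only affects StoreFiles written afterwards; existing data picks up the new settings as it is rewritten during compaction.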


Configuring Snappy compression for HBase: http://blog.csdn.net/maomaosi2009/article/details/47019913
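Since Snappy and LZO rely on native libraries, it is worth confirming the codec is actually loadable on each RegionServer before enabling it on a table. HBase ships a small round-trip utility for this; the scratch paths below are placeholders, and any writable location works:

```
# Write and re-read a test file with the given codec; fails if the
# native library cannot be loaded.
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-check snappy

# The same check for gzip:
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/gzip-check gz
```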
