Choosing a Data Compression Method for HBase

Official documentation: http://hbase.apache.org/book.html#_which_compressor_or_data_block_encoder_to_use


The compression or codec type to use depends on the characteristics of your data. Choosing the wrong type could cause your data to take more space rather than less, and can have performance implications.
In general, you need to weigh your options between smaller size and faster compression/decompression. Following are some general guidelines, expanded from a discussion at Documenting Guidance on compression and codecs.

  • If you have long keys (compared to the values) or many columns, use a prefix encoder. FAST_DIFF is recommended, as more testing is needed for Prefix Tree encoding.

  • If the values are large (and not precompressed, such as images), use a data block compressor.

  • Use GZIP for cold data, which is accessed infrequently. GZIP compression uses more CPU resources than Snappy or LZO, but provides a higher compression ratio.

  • Use Snappy or LZO for hot data, which is accessed frequently. Snappy and LZO use fewer CPU resources than GZIP, but do not provide as high a compression ratio.

  • In most cases, enabling Snappy or LZO by default is a good choice, because they have a low performance overhead and provide space savings.

  • Before Snappy was released by Google in 2011, LZO was the default. Snappy has similar qualities to LZO but has been shown to perform better.
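The guidelines above translate into per-column-family settings. A minimal sketch in the HBase shell (the table and family names `t_hot`, `t_cold`, and `cf1` are placeholders, not from the original post):

```
# HBase shell: DATA_BLOCK_ENCODING and COMPRESSION are set per column family.
# FAST_DIFF prefix encoding plus Snappy for frequently accessed data:
create 't_hot', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}

# GZ (gzip) for infrequently accessed cold data:
create 't_cold', {NAME => 'cf1', COMPRESSION => 'GZ'}

# Change an existing table, then inspect the family settings:
alter 't_hot', {NAME => 'cf1', COMPRESSION => 'SNAPPY'}
describe 't_hot'
```

Note that changing COMPRESSION or DATA_BLOCK_ENCODING on an existing family only affects StoreFiles written afterwards; existing data picks up the new settings as it is rewritten during compaction.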


Configuring Snappy compression for HBase: http://blog.csdn.net/maomaosi2009/article/details/47019913
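Since Snappy and LZO rely on native libraries, it is worth confirming the codec is actually loadable on each RegionServer before enabling it on a table. HBase ships a small round-trip utility for this; the scratch paths below are placeholders, and any writable location works:

```
# Write and re-read a test file with the given codec; fails if the
# native library cannot be loaded.
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-check snappy

# The same check for gzip:
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/gzip-check gz
```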
