Data compression on HBase will make your MapReduce job fly

If you need to run MapReduce jobs on data stored in HBase, remember to turn on compression.


I/O speed is almost always a performance bottleneck, so focusing on I/O is generally a good place to start when tuning performance.

Data compression is one way to improve I/O performance.

The table below shows our case: LZO compression on HBase compared with no compression.

Compression algorithm   Record count   HDFS space usage (GB)   MapReduce job time
NONE                    400,000        190                     19 min 24 sec
LZO                     400,000        46                      9 min 34 sec

The job finished in about half the time, a roughly 2x speedup, and the data took about a quarter of the HDFS space. Impressive.
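To make the comparison concrete, here is a small sketch that works out the ratios from the table figures. The numbers are copied from the table; the variable names are only for illustration and are not part of any HBase or Hadoop API.

```python
# Figures copied from the table above (illustrative variable names).
none_seconds = 19 * 60 + 24   # job time without compression
lzo_seconds = 9 * 60 + 34     # job time with LZO
none_gb = 190                 # HDFS space without compression
lzo_gb = 46                   # HDFS space with LZO

# How many times faster the job ran, and the fraction of time saved.
speedup = none_seconds / lzo_seconds
time_saved = 1 - lzo_seconds / none_seconds

# How many times smaller the data is, and the fraction of space saved.
space_ratio = none_gb / lzo_gb
space_saved = 1 - lzo_gb / none_gb

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.1%}")
print(f"space ratio: {space_ratio:.2f}x, space saved: {space_saved:.1%}")
```

So "almost 100%" here means throughput roughly doubled: the job time dropped by about half, not to zero.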

As for the compression algorithm, Snappy is another option, and it appears to be faster than LZO.

See http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/ and http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of

