HBase Compression Algorithms: Setting and Changing Them

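Compression trades CPU for I/O throughput and disk space. Unless there is a specific reason not to, setting compression on each Column Family is recommended. There are three main algorithms: GZIP, LZO, and Snappy. The author recommends Snappy because it offers good encoding/decoding speed together with an acceptable compression ratio.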

HBase comes with support for a number of compression algorithms that can be enabled at the column family level. Enabling compression is recommended unless you have a reason not to do so, for example, when storing already compressed content such as JPEG images. For every other use case, compression will usually yield better overall performance, because the CPU overhead of compressing and decompressing is less than the cost of reading more data from disk.

Available Codecs

You can choose from a fixed list of supported compression algorithms. They have different qualities when it comes to compression ratio, as well as CPU and installation requirements.

Table 11.1. Comparison between compression algorithms

Algorithm      % remaining   Encoding    Decoding
GZIP           13.4%         21 MB/s     118 MB/s
LZO            20.5%         135 MB/s    410 MB/s
Zippy/Snappy   22.2%         172 MB/s    409 MB/s

Note that some of the algorithms have a better compression ratio, while others are faster at encoding and a lot faster at decoding. Depending on your use case, you can choose the one that suits you best.
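To make the trade-off concrete, here is a minimal sketch that plugs the figures from Table 11.1 into a simple read-time estimate. Several things in it are assumptions rather than anything stated in the source: the function name `read_seconds`, the hypothetical 100 MB/s disk, the reading of the decoding rates as uncompressed MB/s, and the model in which disk I/O and decoding happen one after the other with no caching or overlap.

```python
# Figures copied from Table 11.1:
# codec -> (fraction of original size remaining, encode MB/s, decode MB/s)
CODECS = {
    "GZIP": (0.134, 21.0, 118.0),
    "LZO": (0.205, 135.0, 410.0),
    "Snappy": (0.222, 172.0, 409.0),
}


def read_seconds(raw_mb, disk_mb_per_s, codec=None):
    """Estimate the seconds needed to read raw_mb of logical data.

    Assumptions (not from the source): disk I/O and decoding run one after
    the other, and the decoding rate is measured in uncompressed MB/s.
    """
    if codec is None:
        # No compression: every logical byte has to come off the disk.
        return raw_mb / disk_mb_per_s
    remaining, _encode, decode = CODECS[codec]
    io_time = raw_mb * remaining / disk_mb_per_s  # fewer bytes leave the disk
    cpu_time = raw_mb / decode                    # cost of decompressing them
    return io_time + cpu_time


if __name__ == "__main__":
    raw_mb, disk = 1000.0, 100.0  # hypothetical: 1000 MB read at 100 MB/s
    print(f"none    {read_seconds(raw_mb, disk):.2f} s")
    for name in CODECS:
        print(f"{name:<7} {read_seconds(raw_mb, disk, name):.2f} s")
```

Under these assumed numbers every codec reads faster than uncompressed data, with LZO and Snappy well ahead of GZIP. On a much faster disk the decoding term dominates and the advantage can shrink or reverse, which is why the choice depends on the use case.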

Enabling Compression

Enabling compression requires the installation of the JNI and native compression libraries (unless you only want to use the Java code based GZIP compression), as described above, and specifying the chosen algorithm in the column family schema.

One way to accomplish this is during the creation of the table. The possible values are listed in the section called “Column Families”:

  hbase(main):001:0> create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }
  0 row(s) in 1.1920 seconds

  hbase(main):012:0> describe 'testtable'
  DESCRIPTION                                                 ENABLED
  {NAME => 'testtable', FAMILIES => [{NAME => 'colfam1',      true
  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS
  => '3', COMPRESSION => 'GZ', TTL => '2147483647', BLOCKSIZE
  => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
  1 row(s) in 0.0400 seconds

The describe shell command is used to read back the schema of the newly created table. You can see that the compression is set to GZIP (using the shorter "GZ" value, as required). Another way to enable, change, or disable the compression algorithm is to use the alter command on an existing table:

  hbase(main):013:0> create 'testtable2', 'colfam1'
  0 row(s) in 1.1920 seconds

  hbase(main):014:0> disable 'testtable2'
  0 row(s) in 2.0650 seconds

  hbase(main):016:0> alter 'testtable2', { NAME => 'colfam1', COMPRESSION => 'GZ' }
  0 row(s) in 0.2190 seconds

  hbase(main):017:0> enable 'testtable2'
  0 row(s) in 2.0410 seconds

Note how the table was first disabled. This is necessary to perform the alteration of the column family definition. The final enable command brings the table back online.