Reposted from: http://blog.csdn.net/javajxz008/article/details/61173213
HBase has no equivalent of Hive's create-table-like statement (in Hive you can create a table with exactly the same structure as an existing one by running: create table tbl_test1 like tbl_test). In HBase the only workaround is the clumsy one: copy the existing table's structure out of the describe output and build a create statement from it. After a little tidying up, for example:
create 'solrHbase2', {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0',KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
Note that TTL => 'FOREVER' must be removed from the copied table structure before the create statement will run.
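The copy-and-tidy step can be scripted. Below is a minimal sketch that strips the TTL attribute from a describe-style column family definition and wraps it in a create statement; the DESC string is a shortened example, not the full output from the table above.

```shell
#!/bin/sh
# Example column family definition as it might appear in `describe` output.
# This is a trimmed sample for illustration, not the real solrHbase schema.
DESC="{NAME => 'f1', BLOOMFILTER => 'ROW', TTL => 'FOREVER', VERSIONS => '1', COMPRESSION => 'SNAPPY'}"

# Drop the TTL attribute (create rejects TTL => 'FOREVER') and
# prepend the create command for the new table.
CLEAN=$(printf '%s' "$DESC" | sed "s/TTL => 'FOREVER', //")
echo "create 'solrHbase2', $CLEAN"
```

The resulting line can be pasted straight into the HBase shell.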
1. Export data
Use: hbase org.apache.hadoop.hbase.mapreduce.Driver export tablename hdfspath
or: hbase org.apache.hadoop.hbase.mapreduce.Export tablename hdfspath
e.g.: hbase org.apache.hadoop.hbase.mapreduce.Driver export solrHbase /home/hdfs/export
This command accepts extra parameters, explained below:
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
Note: -D properties will be applied to the conf used.
For example:
-D mapred.output.compress=true (enable output compression)
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec (compression codec)
-D mapred.output.compression.type=BLOCK (compress by block)
Additionally, the following SCAN properties can be specified
to control/limit what is exported..
-D hbase.mapreduce.scan.column.family=<familyName> (column family)
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART> (start rowkey)
-D hbase.mapreduce.scan.row.stop=<ROWSTOP> (stop rowkey)
For performance consider the following properties:
-Dhbase.client.scanner.caching=100 (client scanner cache size, in rows)
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
For tables with very wide rows consider setting the batch size as below:
-Dhbase.export.scanner.batch=10 (scan batch size)
Running the command launches a MapReduce job. If you do not want to export the entire table, use -D hbase.mapreduce.scan.row.start=<ROWSTART> and -D hbase.mapreduce.scan.row.stop=<ROWSTOP> to restrict the export to a rowkey range. For example, to export only the rows in a given range:
hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.mapreduce.scan.row.start=00 -D hbase.mapreduce.scan.row.stop=0d solrHbase /home/hdfs/export
Here the start rowkey 00 and stop rowkey 0d are rowkey prefixes; this table was pre-split, which you can see in the HBase web console.
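The scan-range and compression options above can be combined into one invocation. The sketch below is a small helper that assembles such a command line (the function name build_export_cmd is mine, and the table, path, and rowkeys are the examples from this post); printing the command rather than running it keeps the sketch usable without a cluster.

```shell
#!/bin/sh
# Sketch: assemble an Export command that restricts the scan to a rowkey
# range and gzip-compresses the output by block.
build_export_cmd() {
  table="$1"; outdir="$2"; start="$3"; stop="$4"
  echo "hbase org.apache.hadoop.hbase.mapreduce.Export" \
       "-D mapred.output.compress=true" \
       "-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
       "-D mapred.output.compression.type=BLOCK" \
       "-D hbase.mapreduce.scan.row.start=$start" \
       "-D hbase.mapreduce.scan.row.stop=$stop" \
       "$table" "$outdir"
}

# Print the command for the range exported above.
build_export_cmd solrHbase /home/hdfs/export 00 0d
```

Pipe the output to sh (or paste it into a terminal) on a machine with the hbase client installed to actually run the job.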
2. Import data
Use: hbase org.apache.hadoop.hbase.mapreduce.Driver import tablename hdfspath
or: hbase org.apache.hadoop.hbase.mapreduce.Import tablename hdfspath
Import prints its own usage instructions as well.
Import the data we just exported into the new table:
hbase org.apache.hadoop.hbase.mapreduce.Import solrHbase2 /home/hdfs/export
This command also launches a MapReduce job; once it finishes, check the new table to confirm the data was imported successfully.
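One way to check the result is to compare row counts of the source and destination tables with HBase's RowCounter MapReduce tool. A minimal sketch, using the example table names from this post; DRY_RUN=echo makes the script print the commands instead of launching jobs, so it can be tried without a cluster.

```shell
#!/bin/sh
# Sketch: compare row counts after the import. RowCounter runs one
# MapReduce job per table; clear DRY_RUN on a real cluster to execute.
DRY_RUN=echo   # set to "" to actually launch the RowCounter jobs
for t in solrHbase solrHbase2; do
  $DRY_RUN hbase org.apache.hadoop.hbase.mapreduce.RowCounter "$t"
done
```

If the two jobs report the same ROWS counter, the import copied everything in the exported range.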