Reposted from: http://blog.csdn.net/javajxz008/article/details/61173213
HBase has no equivalent of Hive's create-table-like statement (in Hive you can create a table with exactly the same structure as an existing one by running: create table tbl_test1 like tbl_test). In HBase the only workaround is the clumsy one: copy the existing table's structure out of the describe output and build a create statement from it. After a little tidying up, for example:
create 'solrHbase2', {NAME => 'f1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0',KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
Note that TTL => 'FOREVER' must be removed from the copied table structure before the create statement will run.
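The copy-and-tidy step can be scripted. Below is a minimal sketch that strips the TTL attribute from a describe-style column family definition and wraps it in a create statement; the DESC string is a shortened example, not the full output from the table above.

```shell
#!/bin/sh
# Example column family definition as it might appear in `describe` output.
# This is a trimmed sample for illustration, not the real solrHbase schema.
DESC="{NAME => 'f1', BLOOMFILTER => 'ROW', TTL => 'FOREVER', VERSIONS => '1', COMPRESSION => 'SNAPPY'}"

# Drop the TTL attribute (create rejects TTL => 'FOREVER') and
# prepend the create command for the new table.
CLEAN=$(printf '%s' "$DESC" | sed "s/TTL => 'FOREVER', //")
echo "create 'solrHbase2', $CLEAN"
```

The resulting line can be pasted straight into the HBase shell.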
1. Export data
Use: hbase org.apache.hadoop.hbase.mapreduce.Driver export tablename hdfspath
or: hbase org.apache.hadoop.hbase.mapreduce.Export tablename hdfspath
e.g.: hbase org.apache.hadoop.hbase.mapreduce.Driver export solrHbase /home/hdfs/export
This command accepts extra parameters, explained below:
Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]
Note: -D properties will be applied to the conf used.
For example:
-D mapred.output.compress=true (enable output compression)
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec (compression codec)
-D mapred.output.compression.type=BLOCK (compress by block)
Additionally, the following SCAN properties can be specified
to control/limit what is exported..
-D hbase.mapreduce.scan.column.family=<familyName> (column family)
-D hbase.mapreduce.include.deleted.rows=true
-D hbase.mapreduce.scan.row.start=<ROWSTART> (start rowkey)
-D hbase.mapreduce.scan.row.stop=<ROWSTOP> (stop rowkey)
For performance consider the following properties:
-Dhbase.client.scanner.caching=100 (client scanner cache size, in rows)
-Dmapred.map.tasks.speculative.execution=false
-Dmapred.reduce.tasks.speculative.execution=false
For tables with very wide rows consider setting the batch size as below:
-Dhbase.export.scanner.batch=10 (scan batch size)
Running the command launches a MapReduce job. If you do not want to export the entire table, use -D hbase.mapreduce.scan.row.start=<ROWSTART> and -D hbase.mapreduce.scan.row.stop=<ROWSTOP> to restrict the export to a rowkey range. For example, to export only the rows in a given range:
hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.mapreduce.scan.row.start=00 -D hbase.mapreduce.scan.row.stop=0d solrHbase /home/hdfs/export
Here the start rowkey 00 and stop rowkey 0d are rowkey prefixes; this table was pre-split, which you can see in the HBase web console.
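The scan-range and compression options above can be combined into one invocation. The sketch below is a small helper that assembles such a command line (the function name build_export_cmd is mine, and the table, path, and rowkeys are the examples from this post); printing the command rather than running it keeps the sketch usable without a cluster.

```shell
#!/bin/sh
# Sketch: assemble an Export command that restricts the scan to a rowkey
# range and gzip-compresses the output by block.
build_export_cmd() {
  table="$1"; outdir="$2"; start="$3"; stop="$4"
  echo "hbase org.apache.hadoop.hbase.mapreduce.Export" \
       "-D mapred.output.compress=true" \
       "-D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
       "-D mapred.output.compression.type=BLOCK" \
       "-D hbase.mapreduce.scan.row.start=$start" \
       "-D hbase.mapreduce.scan.row.stop=$stop" \
       "$table" "$outdir"
}

# Print the command for the range exported above.
build_export_cmd solrHbase /home/hdfs/export 00 0d
```

Pipe the output to sh (or paste it into a terminal) on a machine with the hbase client installed to actually run the job.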
2. Import data
Use: hbase org.apache.hadoop.hbase.mapreduce.Driver import tablename hdfspath
or: hbase org.apache.hadoop.hbase.mapreduce.Import tablename hdfspath
Import prints its own usage instructions as well.
Import the data we just exported into the new table:
hbase org.apache.hadoop.hbase.mapreduce.Import solrHbase2 /home/hdfs/export
This command also launches a MapReduce job; once it finishes, check the new table to confirm the data was imported successfully.
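One way to check the result is to compare row counts of the source and destination tables with HBase's RowCounter MapReduce tool. A minimal sketch, using the example table names from this post; DRY_RUN=echo makes the script print the commands instead of launching jobs, so it can be tried without a cluster.

```shell
#!/bin/sh
# Sketch: compare row counts after the import. RowCounter runs one
# MapReduce job per table; clear DRY_RUN on a real cluster to execute.
DRY_RUN=echo   # set to "" to actually launch the RowCounter jobs
for t in solrHbase solrHbase2; do
  $DRY_RUN hbase org.apache.hadoop.hbase.mapreduce.RowCounter "$t"
done
```

If the two jobs report the same ROWS counter, the import copied everything in the exported range.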