HBase and Hive Integration

Background

HBase is a column-family-oriented NoSQL database that stores both structured and unstructured data. It has no native SQL support, and thanks to its strong write performance it is mainly used for storing real-time data.

Hive maps HDFS files to tables and lets you manage them with SQL-like statements. It does not support real-time updates and is mainly used for offline data warehousing.

Hive also provides an integration with HBase, which makes it possible to query HBase tables with HQL.

Configuration Steps
Hive version: Apache 1.2.2
HBase version: Apache 1.2.6

1. Copy the required HBase jars into Hive's lib directory

 cp /root/software/hbase-1.2.6/lib/hbase-it-1.2.6.jar  /root/software/hive-1.2.2/lib
 cp /root/software/hbase-1.2.6/lib/hbase-server-1.2.6.jar  /root/software/hive-1.2.2/lib
 cp /root/software/hbase-1.2.6/lib/hbase-hadoop2-compat-1.2.6.jar  /root/software/hive-1.2.2/lib
 cp /root/software/hbase-1.2.6/lib/hbase-hadoop-compat-1.2.6.jar  /root/software/hive-1.2.2/lib
 cp /root/software/hbase-1.2.6/lib/hbase-client-1.2.6.jar  /root/software/hive-1.2.2/lib
 cp /root/software/hbase-1.2.6/lib/hbase-common-1.2.6.jar  /root/software/hive-1.2.2/lib
 cp /root/software/hbase-1.2.6/lib/hbase-protocol-1.2.6.jar  /root/software/hive-1.2.2/lib
 cp /root/software/hbase-1.2.6/lib/htrace-core-3.1.0-incubating.jar  /root/software/hive-1.2.2/lib
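The eight cp commands above can be collapsed into a short loop. A minimal sketch, assuming the same /root/software install paths as in this article; the `echo` prefix makes it a dry run that only prints the commands, so remove it to actually copy:

```shell
# Same paths as in the steps above; adjust for your installation.
HBASE_HOME=/root/software/hbase-1.2.6
HIVE_HOME=/root/software/hive-1.2.2

# The HBase jars Hive needs; htrace is handled separately
# because it follows a different version scheme.
JARS="hbase-it hbase-server hbase-hadoop2-compat hbase-hadoop-compat hbase-client hbase-common hbase-protocol"

for j in $JARS; do
  # 'echo' makes this a dry run; delete it to perform the copy.
  echo cp "$HBASE_HOME/lib/$j-1.2.6.jar" "$HIVE_HOME/lib/"
done
echo cp "$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar" "$HIVE_HOME/lib/"
```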

2. Copy hbase-site.xml into Hive's conf directory

 cp /root/software/hbase-1.2.6/conf/hbase-site.xml  /root/software/hive-1.2.2/conf/hbase-site.xml

3. Edit hive-env.sh

 vim /root/software/hive-1.2.2/conf/hive-env.sh
 export HBASE_HOME=/root/software/hbase-1.2.6

Verification

1. Create an HBase table and insert data

# Create the table
hbase(main):001:0> create 't1', {NAME=>'f1'};
# Insert data
hbase(main):003:0> put 't1', '0001',  'f1:name', 'zhangsan' 
hbase(main):004:0> put 't1', '0001',  'f1:age', '30'
hbase(main):006:0> put 't1', '0002',  'f1:name', 'lisi' 
hbase(main):007:0> put 't1', '0002',  'f1:age', '29'

2. Create the Hive external table

hive> CREATE EXTERNAL TABLE hbase_t1(id int, name string, age int)
      STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f1:name,f1:age")
      TBLPROPERTIES("hbase.table.name" = "t1");
# Mapping between Hive external-table columns and HBase columns (id <--> rowkey, name <--> f1:name, age <--> f1:age)
"hbase.columns.mapping" = ":key,f1:name,f1:age"
# The HBase table backing this external table is t1
"hbase.table.name" = "t1"
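The pairing in hbase.columns.mapping is positional: the nth comma-separated entry binds to the nth Hive column, with :key denoting the HBase row key. A small shell sketch of that pairing rule (illustrative only, not part of the setup):

```shell
# Positional pairing of Hive columns to hbase.columns.mapping entries.
MAPPING=":key,f1:name,f1:age"
HIVE_COLS="id name age"

i=1
for col in $HIVE_COLS; do
  # Pick the i-th comma-separated entry from the mapping string.
  hb=$(echo "$MAPPING" | cut -d, -f$i)
  echo "$col <--> $hb"
  i=$((i + 1))
done
# prints: id <--> :key, name <--> f1:name, age <--> f1:age
```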

3. Query the HBase table through Hive SQL (note in the output below that the row keys '0001' and '0002' come back as 1 and 2, because the id column is declared int)

hive> select * from hbase_t1;
OK
1       zhangsan        30
2       lisi    29
Time taken: 0.313 seconds, Fetched: 2 row(s)

hive> select count(*) from hbase_t1;
Query ID = root_20200303102934_f5bea28c-3c06-4d43-ac15-be436853189e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1583248050275_0001, Tracking URL = http://master:8088/proxy/application_1583248050275_0001/
Kill Command = /root/software/hadoop-2.6.5/bin/hadoop job  -kill job_1583248050275_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-03-03 10:29:49,791 Stage-1 map = 0%,  reduce = 0%
2020-03-03 10:30:08,544 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 14.12 sec
2020-03-03 10:30:17,965 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 16.93 sec
MapReduce Total cumulative CPU time: 16 seconds 930 msec
Ended Job = job_1583248050275_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 16.93 sec   HDFS Read: 7030 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 16 seconds 930 msec
OK
2
Time taken: 44.955 seconds, Fetched: 1 row(s)