Apache HBase

Author: jiangzz  Phone: 15652034180  WeChat: jiangzz_wx  WeChat Official Account: jiangzz_wy

HBase

Overview

HBase is a database service built on top of Hadoop; it is a distributed, scalable big data store. Use Apache HBase™ when you need random, real-time read/write access to big data (HDFS can store massive amounts of data, but it manages that data at a very coarse granularity — it only supports uploading and downloading whole files and does not allow record-level modification of file contents). Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable; just as Bigtable builds on the distributed storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

HBase vs. HDFS: relationship and differences

HBase is a database service built on top of HDFS. It lets users operate on HDFS indirectly through the HBase database service, enabling fine-grained, record-level CRUD operations on data stored in HDFS.

HBase features (official)

  • Linear and modular scalability.
  • Strictly consistent reads and writes.
  • Automatic and configurable sharding of tables (automatic partitioning).
  • Automatic failover support between RegionServers.
  • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
  • Easy-to-use Java API for client access.
  • Block cache and Bloom filters for real-time queries.

Column Storage

NoSQL is a general term for non-relational data stores and usually covers the following types: key-value stores, document stores (JSON), column-oriented stores, and graph stores. The individual NoSQL products are unrelated to one another, differ greatly, and generally cannot be substituted for each other.

Use cases for column-oriented storage:

HBase can store data at the scale of billions of rows, but it does not support complex queries or transactions. So while HBase can hold massive amounts of data, the kinds of queries it can run over that data are quite limited.

What is the difference between column storage and row storage?
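The difference can be illustrated with a short, self-contained sketch (plain Java, not the HBase API; the class name and sample records are made up for illustration). The same three records are laid out once row by row and once column by column; in the column layout, scanning a single column such as age touches only that column's data.

public class RowVsColumnLayout {
    public static void main(String[] args) {
        String[][] rows = {
                {"001", "zhangsan", "18"},
                {"002", "lisi",     "19"},
                {"003", "wangwu",   "20"}
        };
        // Row-oriented layout: all fields of one record are stored contiguously.
        StringBuilder rowLayout = new StringBuilder();
        for (String[] r : rows) {
            rowLayout.append(String.join(",", r)).append(" | ");
        }
        // Column-oriented layout: all values of one column are stored contiguously,
        // so reading only the "age" column does not touch names or row keys.
        StringBuilder colLayout = new StringBuilder();
        for (int c = 0; c < 3; c++) {
            for (String[] r : rows) {
                colLayout.append(r[c]).append(",");
            }
            colLayout.append(" | ");
        }
        System.out.println("row store   : " + rowLayout);
        System.out.println("column store: " + colLayout);
    }
}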

HBase Installation

Basic HDFS Environment (Storage)

1. Install the JDK and configure the JAVA_HOME environment variable

[root@CentOS ~]# rpm -ivh jdk-8u171-linux-x64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:jdk1.8-2000:1.8.0_171-fcs        ################################# [100%]
Unpacking JAR files...
        tools.jar...
        plugin.jar...
        javaws.jar...
        deploy.jar...
        rt.jar...
        jsse.jar...
        charsets.jar...
        localedata.jar...
[root@CentOS ~]# vi .bashrc

JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH  

[root@CentOS ~]# source .bashrc 
[root@CentOS ~]# jps
1933 Jps

2. Disable the firewall

[root@CentOS ~]# systemctl stop firewalld # stop the service
[root@CentOS ~]# systemctl disable firewalld # disable start on boot
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@CentOS ~]# firewall-cmd --state
not running

3. Configure the hostname and the hostname-to-IP mapping

[root@CentOS ~]# cat /etc/hostname 
CentOS
[root@CentOS ~]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.186.150 CentOS

4. Configure passwordless SSH login

[root@CentOS ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:6yYiypvclJAZLU2WHvzakxv6uNpsqpwk8kzsjLv3yJA root@CentOS
The key's randomart image is:
+---[RSA 2048]----+
|  .o.            |
|  =+             |
| o.oo            |
|  =. .           |
| +  o . S        |
| o...=   .       |
|E.oo. + .        |
|BXX+o....        |
|B#%O+o o.        |
+----[SHA256]-----+
[root@CentOS ~]# ssh-copy-id CentOS
[root@CentOS ~]# ssh CentOS
Last failed login: Mon Jan  6 14:30:49 CST 2020 from centos on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Mon Jan  6 14:20:27 2020 from 192.168.186.1

5. Upload the Hadoop package and extract it to the /usr directory

[root@CentOS ~]# tar -zxf  hadoop-2.9.2.tar.gz -C /usr/

6. Configure the HADOOP_HOME environment variable

[root@CentOS ~]# vi .bashrc
HADOOP_HOME=/usr/hadoop-2.9.2
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
[root@CentOS ~]# source .bashrc    
[root@CentOS ~]# hadoop classpath # print the Hadoop classpath
/usr/hadoop-2.9.2/etc/hadoop:/usr/hadoop-2.9.2/share/hadoop/common/lib/*:/usr/hadoop-2.9.2/share/hadoop/common/*:/usr/hadoop-2.9.2/share/hadoop/hdfs:/usr/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/usr/hadoop-2.9.2/share/hadoop/hdfs/*:/usr/hadoop-2.9.2/share/hadoop/yarn:/usr/hadoop-2.9.2/share/hadoop/yarn/lib/*:/usr/hadoop-2.9.2/share/hadoop/yarn/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/*:/usr/hadoop-2.9.2/contrib/capacity-scheduler/*.jar

7. Edit core-site.xml

[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/core-site.xml
<!-- NameNode access endpoint -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://CentOS:9000</value>
</property>
<!-- base working directory for HDFS data -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop-2.9.2/hadoop-${user.name}</value>
</property>

8. Edit hdfs-site.xml

[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/hdfs-site.xml 
<!-- block replication factor -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<!-- host running the Secondary NameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>CentOS:50090</value>
</property>
<!-- maximum number of concurrent file transfers a DataNode can handle -->
<property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
</property>
<!-- DataNode server thread count (parallel request handling) -->
<property>
        <name>dfs.datanode.handler.count</name>
        <value>6</value>
</property>

9. Edit slaves

[root@CentOS ~]# vi /usr/hadoop-2.9.2/etc/hadoop/slaves 
CentOS

10. Format the NameNode to generate the fsimage

[root@CentOS ~]# hdfs namenode -format
[root@CentOS ~]# yum install -y tree
[root@CentOS ~]# tree /usr/hadoop-2.9.2/hadoop-root/
/usr/hadoop-2.9.2/hadoop-root/
└── dfs
    └── name
        └── current
            ├── fsimage_0000000000000000000
            ├── fsimage_0000000000000000000.md5
            ├── seen_txid
            └── VERSION

3 directories, 4 files

11. Start the HDFS service

[root@CentOS ~]# start-dfs.sh 

ZooKeeper Installation (Coordination)

1. Upload the ZooKeeper package and extract it to the /usr directory

[root@CentOS ~]# tar -zxf zookeeper-3.4.12.tar.gz -C /usr/

2. Configure ZooKeeper's zoo.cfg

[root@CentOS ~]# cd /usr/zookeeper-3.4.12/
[root@CentOS zookeeper-3.4.12]# cp conf/zoo_sample.cfg conf/zoo.cfg
[root@CentOS zookeeper-3.4.12]# vi conf/zoo.cfg 
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/root/zkdata
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

3. Create the ZooKeeper data directory

[root@CentOS ~]# mkdir /root/zkdata

4. Start the ZooKeeper service

[root@CentOS ~]# cd /usr/zookeeper-3.4.12/
[root@CentOS zookeeper-3.4.12]# ./bin/zkServer.sh start zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.12/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@CentOS zookeeper-3.4.12]# ./bin/zkServer.sh status zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper-3.4.12/bin/../conf/zoo.cfg
Mode: standalone

HBase Configuration and Installation (Database Service)

1. Upload the HBase package and extract it to the /usr directory

[root@CentOS ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/

2. Configure the HBASE_HOME environment variable

[root@CentOS ~]# vi .bashrc 
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.9.2
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME

[root@CentOS ~]# source .bashrc 
[root@CentOS ~]# hbase classpath # verify that HBase picks up the Hadoop classpath
/usr/hbase-1.2.4/conf:/usr/java/latest/lib/tools.jar:/usr/hbase-1.2.4:/usr/hbase-1.2.4/lib/activation-1.1.jar:/usr/hbase-1.2.4/lib/aopalliance-1.0.jar:/usr/hbase-1.2.4/lib/apacheds-i18n-2.0.0-M15.jar:/usr/hbase-1.2.4/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hbase-1.2.4/lib/api-asn1-api-1.0.0-M20.jar:/usr/hbase-1.2.4/lib/api-util-1.0.0-M20.jar:/usr/hbase-1.2.4/lib/asm-3.1.jar:/usr/hbase-1.2.4/lib/avro-
...
1.7.4.jar:/usr/hbase-1.2.4/lib/commons-beanutils-1.7.0.jar:/usr/hbase-1.2.4/lib/commons-
2.9.2/share/hadoop/yarn/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.9.2/share/hadoop/mapreduce/*:/usr/hadoop-2.9.2/contrib/capacity-scheduler/*.jar

3. Configure hbase-site.xml

[root@CentOS ~]# cd /usr/hbase-1.2.4/
[root@CentOS hbase-1.2.4]# vi conf/hbase-site.xml
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://CentOS:9000/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>CentOS</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>

4. Edit hbase-env.sh and set HBASE_MANAGES_ZK to false

[root@CentOS ~]# cd /usr/hbase-1.2.4/
[root@CentOS hbase-1.2.4]# grep -i HBASE_MANAGES_ZK conf/hbase-env.sh 
# export HBASE_MANAGES_ZK=true
[root@CentOS hbase-1.2.4]# vi conf/hbase-env.sh 
export HBASE_MANAGES_ZK=false
[root@CentOS hbase-1.2.4]# grep -i HBASE_MANAGES_ZK conf/hbase-env.sh 
export HBASE_MANAGES_ZK=false

export HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper instead of managing its own.

5. Edit the regionservers configuration file

[root@CentOS hbase-1.2.4]# vi conf/regionservers
CentOS

6. Start HBase

[root@CentOS hbase-1.2.4]# ./bin/start-hbase.sh 
starting master, logging to /usr/hbase-1.2.4/logs/hbase-root-master-CentOS.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
starting regionserver, logging to /usr/hbase-1.2.4/logs/hbase-root-1-regionserver-CentOS.out
[root@CentOS hbase-1.2.4]# jps
3090 NameNode
5027 HMaster
3188 DataNode
5158 HRegionServer
3354 SecondaryNameNode
5274 Jps
3949 QuorumPeerMain

Open http://ip:16010 in a browser to view the HBase web UI.


HBase Shell

  • Connect to the HBase shell
[root@CentOS hbase-1.2.4]# ./bin/hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

hbase(main):001:0>

Use help to list the supported shell commands: hbase(main):001:0> help

General commands

status, table_help, version, whoami

hbase(main):003:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
hbase(main):004:0> whoami
root (auth:SIMPLE)
    groups: root

hbase(main):005:0> version
1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017

namespace: a namespace is the equivalent of a database in a traditional RDBMS

alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

hbase(main):007:0> create_namespace 'baizhi',{'user'=>'zs'}
0 row(s) in 0.3920 seconds

hbase(main):009:0> alter_namespace 'baizhi', {METHOD => 'set', 'sex' => 'true'}
0 row(s) in 0.1430 seconds

hbase(main):010:0> describe_namespace 'baizhi'
DESCRIPTION
{NAME => 'baizhi', sex => 'true', user => 'zs'}
1 row(s) in 0.0050 seconds

hbase(main):011:0>  alter_namespace 'baizhi',{METHOD => 'unset', NAME=>'sex'}
0 row(s) in 0.1140 seconds

hbase(main):013:0> list_namespace
NAMESPACE
baizhi
default
hbase
3 row(s) in 0.1790 seconds

hbase(main):015:0> list_namespace '^b.*'
NAMESPACE
baizhi
1 row(s) in 0.0160 seconds

hbase(main):016:0> list_namespace_tables 'hbase'
TABLE
meta
namespace
2 row(s) in 0.1510 seconds

DDL (data definition language): commands that define data structures, covering table and namespace creation

alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

hbase(main):019:0> create 'baizhi:t_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',TTL=>300}
0 row(s) in 2.9600 seconds

=> Hbase::Table - baizhi:t_user

hbase(main):024:0> list
TABLE
baizhi:t_user
1 row(s) in 0.0560 seconds

=> ["baizhi:t_user"]

hbase(main):028:0> disable_all 'baizhi:t_u.*'
baizhi:t_user

Disable the above 1 tables (y/n)?
y
1 tables successfully disabled

hbase(main):029:0> drop
drop             drop_all         drop_namespace
hbase(main):029:0> drop_all 'baizhi:t_u.*'
baizhi:t_user

Drop the above 1 tables (y/n)?
y
1 tables successfully dropped

hbase(main):030:0> list
TABLE
0 row(s) in 0.0070 seconds

=> []

hbase(main):032:0> exists 'baizhi:t_user'
Table baizhi:t_user does not exist
0 row(s) in 0.0210 seconds

DML (data manipulation language): commands for managing data, typically the database CRUD operations

append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve


hbase(main):001:0> count 'baizhi:t_user'
0 row(s) in 1.8630 seconds

=> 0

hbase(main):002:0> t = get_table 'baizhi:t_user'
0 row(s) in 0.0000 seconds

=> Hbase::Table - baizhi:t_user
hbase(main):003:0> t.count
0 row(s) in 0.1140 seconds
=> 0

put

hbase(main):004:0> put 'baizhi:t_user','001','cf1:name','zhangsan'
0 row(s) in 0.7430 seconds

hbase(main):005:0> put 'baizhi:t_user','001','cf1:age',18
0 row(s) in 0.1120 seconds
# update (overwrites the previous value)
hbase(main):006:0> put 'baizhi:t_user','001','cf1:age',20 
0 row(s) in 0.0720 seconds

get

hbase(main):008:0> get 'baizhi:t_user','001'
COLUMN                              CELL
 cf1:age                            timestamp=1553961219305, value=20
 cf1:name                           timestamp=1553961181804, value=zhangsan

hbase(main):009:0> get 'baizhi:t_user','001',{COLUMN=>'cf1',VERSIONS=>3}
COLUMN                              CELL
 cf1:age                            timestamp=1553961219305, value=20
 cf1:age                            timestamp=1553961198084, value=18
 cf1:name                           timestamp=1553961181804, value=zhangsan
3 row(s) in 0.1540 seconds

hbase(main):010:0> get 'baizhi:t_user','001',{COLUMN=>'cf1',TIMESTAMP=>1553961198084}
COLUMN                              CELL
 cf1:age                            timestamp=1553961198084, value=18
1 row(s) in 0.0900 seconds

hbase(main):015:0> get 'baizhi:t_user','001',{COLUMN=>'cf1',TIMERANGE=>[1553961198084,1553961219306],VERSIONS=>3}
COLUMN                              CELL
 cf1:age                            timestamp=1553961219305, value=20
 cf1:age                            timestamp=1553961198084, value=18
2 row(s) in 0.0180 seconds

hbase(main):018:0> get 'baizhi:t_user','001',{COLUMN=>'cf1',FILTER => "ValueFilter(=, 'binary:zhangsan')"}
COLUMN                              CELL
 cf1:name                           timestamp=1553961181804, value=zhangsan
1 row(s) in 0.0550 seconds

hbase(main):019:0> get 'baizhi:t_user','001',{COLUMN=>'cf1',FILTER => "ValueFilter(=, 'substring:zhang')"}
COLUMN                              CELL
 cf1:name                           timestamp=1553961181804, value=zhangsan
1 row(s) in 0.0780 seconds

delete/deleteall

# delete all cells at or before the given timestamp
hbase(main):027:0> delete 'baizhi:t_user','001','cf1:age',1553961899630
0 row(s) in 0.1020 seconds
# delete all cells of cf1:age
hbase(main):031:0> delete 'baizhi:t_user','001','cf1:age'
0 row(s) in 0.0180 seconds

hbase(main):034:0> deleteall 'baizhi:t_user','001'
0 row(s) in 0.0360 seconds

hbase(main):035:0> t.count
0 row(s) in 0.0450 seconds
=> 0
hbase(main):036:0> get 'baizhi:t_user','001',{COLUMN=>'cf1',VERSIONS=>3}
COLUMN                              CELL
0 row(s) in 0.0130 seconds

scan

hbase(main):045:0> scan 'baizhi:t_user'
ROW                                 COLUMN+CELL
 001                                column=cf1:age, timestamp=1553962118964, value=21
 001                                column=cf1:name, timestamp=1553962147916, value=zs
 002                                column=cf1:age, timestamp=1553962166894, value=19
 002                                column=cf1:name, timestamp=1553962157743, value=ls
 003                                column=cf1:name, timestamp=1553962203754, value=zl
 005                                column=cf1:age, timestamp=1553962179379, value=19
 005                                column=cf1:name, timestamp=1553962192054, value=ww

hbase(main):054:0> scan 'baizhi:t_user',{ LIMIT => 2,STARTROW=>"003",REVERSED=>true}
ROW                                 COLUMN+CELL
 003                                column=cf1:name, timestamp=1553962203754, value=zl
 002                                column=cf1:age, timestamp=1553962166894, value=19
 002                                column=cf1:name, timestamp=1553962157743, value=ls

hbase(main):058:0> scan 'baizhi:t_user',{ LIMIT => 2,STARTROW=>"003",REVERSED=>true,VERSIONS=>3,TIMERANGE=>[1553962157743,1553962203790]}
ROW                                 COLUMN+CELL
 003                                column=cf1:name, timestamp=1553962203754, value=zl
 002                                column=cf1:age, timestamp=1553962166894, value=19
 002                                column=cf1:name, timestamp=1553962157743, value=ls
2 row(s) in 0.0810 seconds

truncate - truncate a table

hbase(main):072:0> truncate 'baizhi:t_user'
Truncating 'baizhi:t_user' table (it may take a while):
 - Disabling table...


Using the Java API with HBase

Maven

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.4</version>
</dependency>

Creating the HBase connection

private static Admin admin;// executes DDL operations
private static Connection conn;// executes DML operations
static {
    try {
        Configuration conf = new Configuration();
        conf.set("hbase.zookeeper.quorum","CentOS");
        conn= ConnectionFactory.createConnection(conf);
        admin=conn.getAdmin();

    } catch (IOException e) {
        e.printStackTrace();
    }
}
public static void close() throws IOException {
    admin.close();
    conn.close();
}

Namespace operations

// create
NamespaceDescriptor nd = NamespaceDescriptor.create("zpark")
    .addConfiguration("user","zhansgan")
    .build();
admin.createNamespace(nd);
// list
NamespaceDescriptor[] nds = admin.listNamespaceDescriptors();
for (NamespaceDescriptor descriptor : nds) { // loop variable renamed to avoid clashing with nd above
    System.out.println(descriptor.getName());
}
// delete
admin.deleteNamespace("zpark");

Table operations (key point)

TableName tname=TableName.valueOf("zpark:t_user");
HTableDescriptor td = new HTableDescriptor(tname);

// build column families cf1 and cf2
HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
cf1.setMaxVersions(3);
// use ROW+COL Bloom filter indexing, which uses more memory than the default ROW
cf1.setBloomFilterType(BloomType.ROWCOL);

HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
// set the TTL to 5 minutes (300 seconds)
cf2.setTimeToLive(300);
cf2.setInMemory(true);

// add the column families to the table descriptor
td.addFamily(cf1);
td.addFamily(cf2);


admin.createTable(td);
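
Dropping a table through the Java API mirrors the shell's disable/drop sequence; a minimal sketch, assuming the admin handle and the table created above:

// A table must be disabled before it can be deleted.
TableName tname = TableName.valueOf("zpark:t_user");
if (admin.tableExists(tname)) {
    if (!admin.isTableDisabled(tname)) {
        admin.disableTable(tname);
    }
    admin.deleteTable(tname);
}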

Data DML (key point)

// about 2.447 seconds for 1000 individual puts
TableName tname = TableName.valueOf("zpark:t_user");
Table table = conn.getTable(tname);
// build the Put
for(int i=0;i<1000;i++){
    DecimalFormat df = new DecimalFormat("0000");
    String rowKey = df.format(i);

    Put put=new Put(rowKey.getBytes());
    put.addColumn("cf1".getBytes(),"name".getBytes(), Bytes.toBytes("USER"+rowKey));
    put.addColumn("cf1".getBytes(),"age".getBytes(), Bytes.toBytes(i+""));
    put.addColumn("cf1".getBytes(),"sex".getBytes(), Bytes.toBytes((i%4==0)+""));
    put.addColumn("cf1".getBytes(),"salary".getBytes(), Bytes.toBytes(1000+(i/100.0)*100+""));

    table.put(put);
}
table.close();

Batch insert

TableName tname = TableName.valueOf("zpark:t_user");
BufferedMutator bufferedMutator=conn.getBufferedMutator(tname);
// build the Puts; about 0.549 seconds with the buffered writer
long begin=System.currentTimeMillis();
for(int i=0;i<1000;i++){
    DecimalFormat df = new DecimalFormat("0000");
    String rowKey = df.format(i);

    Put put=new Put(rowKey.getBytes());
    put.addColumn("cf1".getBytes(),"name".getBytes(), Bytes.toBytes("USER"+rowKey));
    put.addColumn("cf1".getBytes(),"age".getBytes(), Bytes.toBytes(i+""));
    put.addColumn("cf1".getBytes(),"sex".getBytes(), Bytes.toBytes((i%4==0)+""));
    put.addColumn("cf1".getBytes(),"salary".getBytes(), Bytes.toBytes(1000+(i/100.0)*100+""));

    bufferedMutator.mutate(put);
    if(i%500==0){
        bufferedMutator.flush();
    }
}
long end=System.currentTimeMillis();
bufferedMutator.close();
System.out.println(((end-begin)/1000.0)+" seconds");

GET

TableName tname = TableName.valueOf("zpark:t_user");
Table table = conn.getTable(tname);

Get get=new Get("0010".getBytes());

Result result = table.get(get);
while (result.advance()){
    Cell cell = result.current();
    String row = Bytes.toString(CellUtil.cloneRow(cell));
    String cf = Bytes.toString(CellUtil.cloneFamily(cell));
    String col = Bytes.toString(CellUtil.cloneQualifier(cell));
    String v = Bytes.toString(CellUtil.cloneValue(cell));
    long ts=cell.getTimestamp();
    System.out.println(row+"=>"+cf+":"+col+"\t"+v+" ts:"+ts);
}
table.close();

A second variant reads specific columns from the Result directly via getValue:

TableName tname = TableName.valueOf("zpark:t_user");
Table table = conn.getTable(tname);

Get get=new Get("0010".getBytes());

Result result = table.get(get);
String row=Bytes.toString(result.getRow());
String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
String age = Bytes.toString(result.getValue("cf1".getBytes(), "age".getBytes()));
String sex = Bytes.toString(result.getValue("cf1".getBytes(), "sex".getBytes()));
String salary = Bytes.toString(result.getValue("cf1".getBytes(), "salary".getBytes()));
System.out.println(row+"\t"+name+" "+age+" "+sex+" "+salary);
table.close();
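
The VERSIONS and TIMERANGE options used in the shell earlier have direct equivalents on Get; a hedged sketch (the commented-out time range bounds are placeholders):

// Read several versions of cf1:age for one row.
TableName tname = TableName.valueOf("zpark:t_user");
Table table = conn.getTable(tname);

Get get = new Get("0010".getBytes());
get.addColumn("cf1".getBytes(), "age".getBytes());
get.setMaxVersions(3);                       // like VERSIONS=>3 in the shell
// get.setTimeRange(minTs, maxTs);           // like TIMERANGE=>[minTs, maxTs]

Result result = table.get(get);
for (Cell cell : result.getColumnCells("cf1".getBytes(), "age".getBytes())) {
    System.out.println(Bytes.toString(CellUtil.cloneValue(cell)) + " ts:" + cell.getTimestamp());
}
table.close();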

Scan

TableName tname = TableName.valueOf("zpark:t_user");
Table table = conn.getTable(tname);
Scan scan = new Scan();

scan.setStartRow("0000".getBytes());
scan.setStopRow("0200".getBytes());
scan.addFamily("cf1".getBytes());
Filter filter1=new RowFilter(CompareFilter.CompareOp.EQUAL,new RegexStringComparator("09$"));
Filter filter2=new RowFilter(CompareFilter.CompareOp.EQUAL,new SubstringComparator("80"));
FilterList filter=new FilterList(FilterList.Operator.MUST_PASS_ONE,filter1,filter2);
scan.setFilter(filter);

ResultScanner rs = table.getScanner(scan);

for (Result result : rs) {
    String row=Bytes.toString(result.getRow());
    String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
    String age = Bytes.toString(result.getValue("cf1".getBytes(), "age".getBytes()));
    String sex = Bytes.toString(result.getValue("cf1".getBytes(), "sex".getBytes()));
    String salary = Bytes.toString(result.getValue("cf1".getBytes(), "salary".getBytes()));
    System.out.println(row+"\t"+name+" "+age+" "+sex+" "+salary);
}

table.close();
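
Deletes from the Java API correspond to the shell's delete and deleteall; a minimal sketch using the same connection:

// Delete all versions of one column, then delete an entire row.
TableName tname = TableName.valueOf("zpark:t_user");
Table table = conn.getTable(tname);

Delete delete = new Delete("0010".getBytes());
delete.addColumns("cf1".getBytes(), "age".getBytes()); // all versions of cf1:age (addColumn removes only the latest)
table.delete(delete);

table.delete(new Delete("0011".getBytes()));           // remove the whole row, like deleteall
table.close();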

MapReduce Integration with HBase (Key Point)

Jar dependencies

Since HBase 0.90.x, programs can resolve their runtime dependencies on their own — under the hood this works through conf.set("tmpjars", ...) — so you no longer need the -libjars option. You do, however, still have to resolve the dependencies needed at submission time: when a job reads data from HBase it has to compute input splits during job setup, so HADOOP_CLASSPATH must be configured.

[root@CentOS ~]# vi .bashrc

HADOOP_HOME=/usr/hadoop-2.6.0
JAVA_HOME=/usr/java/latest
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASSPATH=.
HBASE_MANAGES_ZK=false
export JAVA_HOME
export PATH
export CLASSPATH
export HADOOP_HOME
export HBASE_MANAGES_ZK
HADOOP_CLASSPATH=/root/mysql-connector-java-5.1.46.jar:`/usr/hbase-1.2.4/bin/hbase classpath`
export HADOOP_CLASSPATH
[root@CentOS ~]# source .bashrc
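
The "tmpjars" mechanism mentioned above is what TableMapReduceUtil fills in for you; the initTableMapperJob/initTableReducerJob calls in the driver below already do this by default, but it can also be invoked explicitly — a hedged sketch:

// Ship the HBase jars (and the jars containing the job's own classes) with the job.
// Internally this populates the "tmpjars" configuration entry.
Job job = Job.getInstance(conf);
TableMapReduceUtil.addDependencyJars(job);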

Maven


<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
</dependency>
<!--HBase dependencies-->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.4</version>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.4</version>
</dependency>

Job submission

public class CustomJobsubmitter extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // 1. create the Job instance
        Configuration conf = getConf();
        // enable map-side output compression
        conf.setBoolean("mapreduce.map.output.compress",true);
        conf.setClass("mapreduce.map.output.compress.codec", GzipCodec.class, CompressionCodec.class);
        // set the HBase connection parameters
        conf.set("hbase.zookeeper.quorum","CentOS");

        Job job=Job.getInstance(conf);

        job.setJarByClass(CustomJobsubmitter.class);

        // 2. set the input and output formats
        job.setInputFormatClass(TableInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);
        Scan scan = new Scan();
        scan.addFamily("cf1".getBytes());
        TableMapReduceUtil.initTableMapperJob(
            "zpark:t_user",
            scan,
            UserMapper.class,
            Text.class,
            DoubleWritable.class,
            job);

        TableMapReduceUtil.initTableReducerJob(
            "zpark:t_result",
            UserReducer.class,
            job
        );
        job.setNumReduceTasks(1);
        job.setCombinerClass(UserCombiner.class);

        job.waitForCompletion(true);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new CustomJobsubmitter(),args);
    }
}

UserMapper

public class UserMapper extends TableMapper<Text, DoubleWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
        String sex = Bytes.toString(value.getValue("cf1".getBytes(), "sex".getBytes()));
        Double salary = Double.parseDouble(Bytes.toString(value.getValue("cf1".getBytes(), "salary".getBytes())));
        context.write(new Text(sex),new DoubleWritable(salary));
    }
}

UserReducer

public class UserReducer extends TableReducer<Text, DoubleWritable,NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        double totalSalary=0.0;
        for (DoubleWritable value : values) {
            totalSalary+=value.get();
        }
        Put put = new Put(Bytes.toBytes(key.toString())); // Text.getBytes() may return extra trailing bytes beyond getLength()
        put.addColumn("cf1".getBytes(),"totalSalary".getBytes(), Bytes.toBytes(totalSalary+""));
        context.write(null,put);
    }
}

UserCombiner

public class UserCombiner extends Reducer<Text, DoubleWritable,Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
        double totalSalary=0.0;
        for (DoubleWritable value : values) {
            totalSalary+=value.get();
        }
        context.write(key,new DoubleWritable(totalSalary));
    }
}

Building an HBase Cluster

  • Make sure the clocks on all physical hosts are synchronized, otherwise the cluster setup will fail
[root@CentOSX ~]# date -s '2019-04-01 16:24:00'
Mon Apr  1 16:24:00 CST 2019
[root@CentOSX ~]# clock -w
  • Make sure HDFS is up and running (see the HDFS cluster setup guide)
  • Set up the HBase cluster
[root@CentOSX ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
[root@CentOSX ~]# vi /usr/hbase-1.2.4/conf/hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
                <name>hbase.rootdir</name>
                <value>hdfs://mycluster/hbase</value>
    </property>
    <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
    </property>
    <property>
                <name>hbase.zookeeper.quorum</name>
                <value>CentOSA,CentOSB,CentOSC</value>
    </property>
    <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
    </property>
</configuration>

  • Edit regionservers
[root@CentOSX ~]# vi /usr/hbase-1.2.4/conf/regionservers
CentOSA
CentOSB
CentOSC
  • Edit the environment variables
[root@CentOS ~]# vi .bashrc

HADOOP_HOME=/usr/hadoop-2.6.0
JAVA_HOME=/usr/java/latest
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASSPATH=.
export JAVA_HOME
export PATH
export CLASSPATH
export HADOOP_HOME

HBASE_MANAGES_ZK=false
HADOOP_CLASSPATH=`/usr/hbase-1.2.4/bin/hbase classpath`
export HBASE_MANAGES_ZK
export HADOOP_CLASSPATH

[root@CentOS ~]# source .bashrc
  • Start the HBase services
[root@CentOSX hbase-1.2.4]# ./bin/hbase-daemon.sh start master
[root@CentOSX hbase-1.2.4]# ./bin/hbase-daemon.sh start regionserver
