HBase
(Diagram: overview of the high-concurrency processing scheme)
1. Overview
Official site: http://hbase.apache.org/
HBase is a distributed, column-oriented storage system built on top of HDFS. It is a good fit when you need real-time read/write and random access to very large datasets.
Features
- Large: a table can have hundreds of millions of rows and millions of columns.
- Column-oriented: storage and access control are organized by column (family), and each column family is retrieved independently.
- Sparse: columns that are NULL take no storage space, so tables can be designed to be extremely sparse.
- Schema-free: every row has a sortable row key and any number of columns; columns can be added dynamically, and different rows in the same table can have completely different columns.
- Multi-versioned data: each cell can hold several versions of its data; by default the version number is assigned automatically and is the timestamp at which the cell was written.
- Single data type: all data in HBase is stored as uninterpreted byte strings; there are no typed columns.
Terminology
- Row Key: as in other NoSQL databases, the row key is the primary key used to look up records.
- Column Family: a collection of columns. Column families are part of the table schema (individual columns are not) and must be defined before the table is used.
- Timestamp: the storage unit identified in HBase by a row and a column is called a cell, and each cell holds multiple versions of the same data. Versions are indexed by timestamp, a 64-bit integer. The timestamp can be assigned by HBase automatically at write time, in which case it is the current system time in milliseconds, or it can be set explicitly by the client. An application that needs to avoid version conflicts must generate unique timestamps itself. Within a cell, versions are sorted by timestamp in descending order, so the newest data comes first (see the Java sketch after this list).
- Cell: the unit uniquely identified by {row key, column (= <family> + <qualifier>), version}. Cell data is untyped; everything is stored as raw bytes.
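To make the version mechanics concrete, here is a minimal Java sketch. It is hedged: it assumes the HBase 1.x client API, the baizhi:tt_user table created in section 3 (whose cf1 family keeps up to 3 versions), and a connection built as in section 4. Each returned Cell carries its own timestamp, newest first:

// Hedged sketch: read up to 3 versions of every cell in row "1"
Table table = connection.getTable(TableName.valueOf("baizhi:tt_user"));
Get get = new Get(Bytes.toBytes("1"));
get.setMaxVersions(3); // ask for up to 3 versions per cell instead of only the newest
Result result = table.get(get);
for (Cell cell : result.rawCells()) {
    System.out.println(Bytes.toString(CellUtil.cloneQualifier(cell))
            + " @" + cell.getTimestamp() // the version index: a 64-bit millisecond timestamp
            + " = " + Bytes.toString(CellUtil.cloneValue(cell)));
}
table.close();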
Differences between HBase and relational databases
- Data types: HBase stores everything as uninterpreted byte strings; there are no typed columns.
- Data operations: HBase supports only basic operations (put/get/scan/delete); there are no joins between tables.
- Storage model: HBase storage is organized by column family, whereas an RDBMS stores data row by row.
- Use cases: HBase suits very large datasets with fast lookups by row key; it is not designed for complex ad-hoc queries.
2. Environment Setup
Prerequisites
- Make sure HDFS is running
- Make sure ZooKeeper is running
# Expected processes
[root@hadoop zookeeper-3.4.10]# jps
4858 QuorumPeerMain
3571 SecondaryNameNode
3324 DataNode
3213 NameNode
4921 Jps
Installation, Configuration, and Startup
- Installation
[root@hadoop ~] tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
- Configuration
- Edit hbase-site.xml
[root@hadoop ~] vi /usr/hbase-1.2.4/conf/hbase-site.xml
# Add:
<!-- HDFS directory where HBase stores its data -->
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop:9000/hbase</value>
</property>
<!-- Run in (pseudo-)distributed mode -->
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<!-- ZooKeeper quorum hosts -->
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop</value>
</property>
<!-- ZooKeeper client port -->
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>
- Edit regionservers
[root@hadoop ~] vi /usr/hbase-1.2.4/conf/regionservers
# Add the hostname
hadoop
- Environment variables
[root@hadoop ~]# vim .bashrc
# Add the environment variables
# HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper instead of starting its own
HBASE_MANAGES_ZK=false
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.6.0
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME
export HBASE_MANAGES_ZK

# Apply the configuration
[root@hadoop ~]# source .bashrc
- Startup
[root@hadoop ~]# start-hbase.sh
starting master, logging to /usr/hbase-1.2.4/logs/hbase-root-master-hadoop.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
hadoop: starting regionserver, logging to /usr/hbase-1.2.4/logs/hbase-root-regionserver-hadoop.out

# Check the processes
[root@hadoop ~]# jps
4858 QuorumPeerMain
10231 HRegionServer    // serves the actual reads and writes of table data
3571 SecondaryNameNode
10365 Jps
3324 DataNode
10096 HMaster          // like the NameNode: manages table metadata and the RegionServers
3213 NameNode
- Web UI access: in HBase 1.x the HMaster web UI listens on port 16010 by default, e.g. http://hadoop:16010/
3. HBase Shell Operations
System commands
- Connect to the HBase server
[root@hadoop ~]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
- Check the system status
hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
- Help
hbase(main):010:0> help
HBase Shell, version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission
Namespace operations
- A namespace is similar to a database in an RDBMS; it is used to organize and manage HBase tables.
- List namespaces
hbase(main):002:0> list_namespace
NAMESPACE
default
hbase
- Create a namespace
hbase(main):004:0> create_namespace 'baizhi'
0 row(s) in 0.0820 seconds

hbase(main):006:0* list_namespace
NAMESPACE
baizhi
default
hbase
- List the tables in a namespace
hbase(main):007:0> list_namespace_tables 'baizhi'
- Drop a namespace
hbase(main):009:0> drop_namespace 'baizhi'
0 row(s) in 0.0660 seconds
Note: HBase will not drop a namespace that still contains tables.
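For reference, here is a hedged Java sketch of the same rule through the Admin API used in section 4 (method names from the HBase 1.x client):

// Hedged sketch: a namespace can only be dropped once it is empty
for (TableName t : admin.listTableNamesByNamespace("baizhi")) {
    admin.disableTable(t);  // tables must be disabled before deletion
    admin.deleteTable(t);
}
admin.deleteNamespace("baizhi"); // fails if the namespace still contains tables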
Table operations
- Create a table
# Option 1
hbase(main):011:0> create 't_user','cf1','cf2'
0 row(s) in 1.5170 seconds
=> Hbase::Table - t_user

# Option 2
# Note: VERSIONS=>3 tells HBase to keep up to 3 versions of each cell in that column family (the default is 1)
hbase(main):014:0> create 'baizhi:tt_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',VERSIONS=>3,TTL=>3600}
0 row(s) in 1.2650 seconds
=> Hbase::Table - baizhi:tt_user
- Describe a table
hbase(main):018:0> describe 'baizhi:tt_user'
Table baizhi:tt_user is ENABLED
baizhi:tt_user
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '3600 SECONDS (1 HOUR)', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.0520 seconds
- Drop a table
# Disable the table first
hbase(main):022:0> disable 't_user'
0 row(s) in 2.2620 seconds

# Drop the table
hbase(main):023:0> drop 't_user'
0 row(s) in 1.2750 seconds
Note: a table cannot be dropped until it has been disabled.
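The Java counterpart, as a hedged sketch against the 1.x Admin API from section 4:

// Hedged sketch: disable, then delete, mirroring the shell's disable + drop
TableName name = TableName.valueOf("t_user");
if (admin.isTableEnabled(name)) {
    admin.disableTable(name); // same as the shell's `disable`
}
admin.deleteTable(name);      // same as the shell's `drop`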
- List all tables
hbase(main):030:0> list
TABLE
baizhi:tt_user
1 row(s) in 0.0230 seconds
=> ["baizhi:tt_user"]
- Alter a table
hbase(main):039:0> alter 'baizhi:tt_user',{NAME=>'cf2',TTL=>1800}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9850 seconds
Data operations
- Put values
Syntax: put 'ns1:t1', 'r1', 'c1', 'value'
hbase(main):042:0* put 'baizhi:tt_user',1,'cf1:name','zs'
0 row(s) in 0.1580 seconds

hbase(main):049:0> put 'baizhi:tt_user',1,'cf1:age',18
0 row(s) in 0.0160 seconds
- Get values
Syntax: get 'ns1:t1', 'r1'
# Get a row by rowkey
hbase(main):001:0> get 'baizhi:tt_user',1
COLUMN                 CELL
 cf1:age               timestamp=1527299113905, value=18
 cf1:name              timestamp=1527299057422, value=zhangsan
2 row(s) in 0.3990 seconds

# Get by rowkey + column family
hbase(main):005:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1'}
COLUMN                 CELL
 cf1:age               timestamp=1527299113905, value=18
 cf1:name              timestamp=1527299057422, value=zhangsan
 cf1:name              timestamp=1527298815331, value=zs

# Get all versions by rowkey + column family + VERSIONS
hbase(main):005:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1',VERSIONS=>10}
COLUMN                 CELL
 cf1:age               timestamp=1527299113905, value=18
 cf1:name              timestamp=1527299057422, value=zhangsan
 cf1:name              timestamp=1527298815331, value=zs

# Get a specific column by rowkey + column family + qualifier
hbase(main):010:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1:name',VERSIONS=>10}
COLUMN                 CELL
 cf1:name              timestamp=1527299616029, value=zzzzzss
 cf1:name              timestamp=1527299608593, value=zss
 cf1:name              timestamp=1527299057422, value=zhangsan
3 row(s) in 0.0170 seconds

# Get the version at a specific timestamp by rowkey + column family + TIMESTAMP
hbase(main):025:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1',TIMESTAMP=>1527299608593}
COLUMN                 CELL
 cf1:name              timestamp=1527299608593, value=zss
1 row(s) in 0.0050 seconds
- Scan a table
hbase(main):029:0* scan 'baizhi:tt_user'
ROW                    COLUMN+CELL
 1                     column=cf1:age, timestamp=1527299113905, value=18
 1                     column=cf1:name, timestamp=1527299616029, value=zzzzzss
 1                     column=cf2:sex, timestamp=1527299875532, value=male

hbase(main):036:0> scan 'baizhi:tt_user',{COLUMNS=>'cf1',LIMIT=>1}
ROW                    COLUMN+CELL
 1                     column=cf1:age, timestamp=1527299113905, value=18
 1                     column=cf1:name, timestamp=1527299616029, value=zzzzzss
- Delete values
# Delete the cf1:name column of row 1
hbase(main):041:0> delete 'baizhi:tt_user',1,'cf1:name'
0 row(s) in 0.1630 seconds

# Check
hbase(main):042:0> scan 'baizhi:tt_user'
ROW                    COLUMN+CELL
 1                     column=cf1:age, timestamp=1527299113905, value=18
 1                     column=cf2:sex, timestamp=1527299875532, value=male
 2                     column=cf1:name, timestamp=1527300617838, value=xh

# Delete the whole row
hbase(main):048:0> deleteall 'baizhi:tt_user',1
0 row(s) in 0.0190 seconds

# Check
hbase(main):049:0> scan 'baizhi:tt_user'
ROW                    COLUMN+CELL
0 row(s) in 0.0280 seconds
4. Java API
Maven dependencies
<!-- HBase dependencies -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-protocol</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.4</version>
</dependency>
<!-- end of HBase dependencies -->
Tests
Create the connection and admin objects
/**
 * Set up the connection and admin objects
 */
private Connection connection = null;
private Admin admin = null;
@Before
public void init () throws IOException {
Configuration configuration = HBaseConfiguration.create();
// HBase services register themselves in ZooKeeper, so the connection is established through the ZooKeeper quorum
configuration.set("hbase.zookeeper.quorum","192.168.21.147");
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
}
@After
public void destroy() throws IOException {
if (admin != null) admin.close();
if (connection != null) connection.close();
}
Create a namespace
/**
 * Create a namespace (similar to a database in an RDBMS)
 * @throws IOException
 */
@Test
public void testCreateNamespace() throws IOException {
NamespaceDescriptor descriptor = NamespaceDescriptor.create("zpark").addConfiguration("author","wzh").build();
admin.createNamespace(descriptor);
}
Create a table
/**
 * Create a table
 * @throws IOException
 */
@Test
public void testCreateTable() throws IOException {
// Define the table name (TableName.valueOf is the non-deprecated form in 1.x)
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("zpark:u_user"));
// Define the column families
HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
// Keep up to 3 versions per cell
cf1.setMaxVersions(3);
HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
// Set the time-to-live (in seconds)
cf2.setTimeToLive(36000);
// Add the column families
tableDescriptor.addFamily(cf1);
tableDescriptor.addFamily(cf2);
// Create the table
admin.createTable(tableDescriptor);
}
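Re-running this test fails with a TableExistsException once the table exists. A hedged guard using Admin.tableExists (present in the 1.x API) keeps the test idempotent:

// Hedged sketch: only create the table if it is not already there
TableName name = TableName.valueOf("zpark:u_user");
if (!admin.tableExists(name)) {
    admin.createTable(tableDescriptor); // the descriptor built above
}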
Insert data
/**
 * Insert data
 * @throws IOException
 */
@Test
public void testPutData() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
// Put a single value
Put put = new Put("001".getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),"zs".getBytes());
table.put(put);
table.close();
}
@Test
public void testPutData2() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
// Insert multiple rows, one put (one RPC) at a time; rowkeys are zero-padded so they sort lexicographically (see the buffered sketch after this method)
for (int i = 2; i <10 ; i++) {
String rowkey = "com";
if (i < 10){
rowkey += ":00"+i;
} else if (i < 100){
rowkey += ":0"+i;
} else if(i < 1000) {
rowkey += ":"+i;
}
Put put = new Put(rowkey.getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),("zs"+i).getBytes());
put.addColumn("cf2".getBytes(),"age".getBytes(), Bytes.toBytes(i));
table.put(put);
}
table.close();
}
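testPutData2 issues one RPC per Put. For real bulk loads, the client-side write buffer is the idiomatic 1.x tool; the following is a hedged sketch (BufferedMutator buffers mutations locally and ships them in batches):

// Hedged sketch: batch the puts through a BufferedMutator
try (BufferedMutator mutator =
        connection.getBufferedMutator(TableName.valueOf("zpark:u_user"))) {
    for (int i = 2; i < 10; i++) {
        Put put = new Put(Bytes.toBytes(String.format("com:%03d", i))); // zero-padded rowkey
        put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("zs" + i));
        put.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("age"), Bytes.toBytes(i));
        mutator.mutate(put); // buffered locally, not yet sent
    }
    mutator.flush();         // ship the buffered puts to the RegionServers
}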
Update data
/**
 * Update data (a put to an existing rowkey writes a new cell version)
 * @throws IOException
 */
@Test
public void testUpdate() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
Put put = new Put("001".getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),"zs2".getBytes());
put.addColumn("cf2".getBytes(),"age".getBytes(),"18".getBytes());
table.put(put);
table.close();
}
Query
/**
 * Query data
 * @throws IOException
 */
@Test
public void testScan() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
// Get a single row
Get get = new Get("001".getBytes());
get.addColumn("cf1".getBytes(),"name".getBytes());
get.addColumn("cf2".getBytes(),"age".getBytes());
Result result = table.get(get);
System.out.println(Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes())));
System.out.println(Bytes.toString(result.getValue("cf2".getBytes(), "age".getBytes())));
table.close();
}
@Test
public void testScan2() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
// Scan a range of rows
Scan scan = new Scan();
// Restrict the scan to these column families
scan.addFamily("cf1".getBytes());
scan.addFamily("cf2".getBytes());
// Set the start (inclusive) and stop (exclusive) rowkeys
scan.setStartRow("com:002".getBytes());
scan.setStopRow("com:009".getBytes());
ResultScanner results = table.getScanner(scan);
for (Result result : results) {
String rowkey = Bytes.toString(result.getRow());
String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
Integer age =Bytes.toInt(result.getValue("cf2".getBytes(), "age".getBytes()));
System.out.println("rowkey|:"+rowkey+"name|"+name+"age|"+age);
}
table.close();
}
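The shell section covered delete and deleteall, but the Java examples above have no counterpart; here is a hedged sketch using the 1.x Delete class:

// Hedged sketch: Java equivalents of the shell's delete / deleteall
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));

Delete oneCell = new Delete(Bytes.toBytes("001"));
oneCell.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name")); // latest version of one column, like `delete`
table.delete(oneCell);

Delete wholeRow = new Delete(Bytes.toBytes("001")); // no columns added: the whole row, like `deleteall`
table.delete(wholeRow);
table.close();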
Complete test class
package com.baizhi;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
/**
* Unit test for simple App.
*/
public class HBaseTest {
/**
 * Set up the connection and admin objects
 */
private Connection connection = null;
private Admin admin = null;
@Before
public void init () throws IOException {
Configuration configuration = HBaseConfiguration.create();
// HBase services register themselves in ZooKeeper, so the connection is established through the ZooKeeper quorum
configuration.set("hbase.zookeeper.quorum","192.168.21.147");
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
}
@After
public void destroy() throws IOException {
if (admin != null) admin.close();
if (connection != null) connection.close();
}
/**
 * Create a namespace (similar to a database in an RDBMS)
 * @throws IOException
 */
@Test
public void testCreateNamespace() throws IOException {
NamespaceDescriptor descriptor = NamespaceDescriptor.create("zpark").addConfiguration("author","wzh").build();
admin.createNamespace(descriptor);
}
/**
 * Create a table
 * @throws IOException
 */
@Test
public void testCreateTable() throws IOException {
// Define the table name (TableName.valueOf is the non-deprecated form in 1.x)
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("zpark:u_user"));
// Define the column families
HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
// Keep up to 3 versions per cell
cf1.setMaxVersions(3);
HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
// Set the time-to-live (in seconds)
cf2.setTimeToLive(36000);
// Add the column families
tableDescriptor.addFamily(cf1);
tableDescriptor.addFamily(cf2);
// Create the table
admin.createTable(tableDescriptor);
}
/**
 * Insert data
 * @throws IOException
 */
@Test
public void testPutData() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
// Put a single value
Put put = new Put("001".getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),"zs".getBytes());
table.put(put);
table.close();
}
@Test
public void testPutData2() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
// Insert multiple rows, one put (one RPC) at a time; rowkeys are zero-padded so they sort lexicographically
for (int i = 2; i <10 ; i++) {
String rowkey = "com";
if (i < 10){
rowkey += ":00"+i;
} else if (i < 100){
rowkey += ":0"+i;
} else if(i < 1000) {
rowkey += ":"+i;
}
Put put = new Put(rowkey.getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),("zs"+i).getBytes());
put.addColumn("cf2".getBytes(),"age".getBytes(), Bytes.toBytes(i));
table.put(put);
}
table.close();
}
/**
 * Update data (a put to an existing rowkey writes a new cell version)
 * @throws IOException
 */
@Test
public void testUpdate() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
Put put = new Put("001".getBytes());
put.addColumn("cf1".getBytes(),"name".getBytes(),"zs2".getBytes());
put.addColumn("cf2".getBytes(),"age".getBytes(),"18".getBytes());
table.put(put);
table.close();
}
/**
 * Query data
 * @throws IOException
 */
@Test
public void testScan() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
// Get a single row
Get get = new Get("001".getBytes());
get.addColumn("cf1".getBytes(),"name".getBytes());
get.addColumn("cf2".getBytes(),"age".getBytes());
Result result = table.get(get);
System.out.println(Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes())));
System.out.println(Bytes.toString(result.getValue("cf2".getBytes(), "age".getBytes())));
table.close();
}
@Test
public void testScan2() throws IOException {
// Get the table
Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
// Scan a range of rows
Scan scan = new Scan();
// Restrict the scan to these column families
scan.addFamily("cf1".getBytes());
scan.addFamily("cf2".getBytes());
// Set the start (inclusive) and stop (exclusive) rowkeys
scan.setStartRow("com:002".getBytes());
scan.setStopRow("com:009".getBytes());
ResultScanner results = table.getScanner(scan);
for (Result result : results) {
String rowkey = Bytes.toString(result.getRow());
String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
Integer age =Bytes.toInt(result.getValue("cf2".getBytes(), "age".getBytes()));
System.out.println("rowkey|:"+rowkey+"name|"+name+"age|"+age);
}
table.close();
}
}