HBase
A high-concurrency processing solution, at a glance
1. Overview
Official site: http://hbase.apache.org/
HBase is a distributed, column-oriented storage system built on top of HDFS. It is a good fit when you need real-time reads and writes with random access to very large datasets.
Features
- Large: a single table can have hundreds of millions of rows and millions of columns.
- Column-oriented: storage and access control are organized per column family, and each family is retrieved independently.
- Sparse: columns that are NULL take up no storage, so tables can be designed to be extremely sparse.
- Schema-free: every row has a sortable row key and an arbitrary number of columns; columns can be added dynamically, and different rows in the same table can have completely different columns.
- Multi-versioned: each cell can hold several versions of its data; by default the version number is assigned automatically and is the cell's insertion timestamp.
- Untyped: all data in HBase is stored as uninterpreted byte strings; there are no column types.
Graphical overview:
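Because everything in HBase is an untyped byte string, the client is responsible for serializing and deserializing values. The sketch below illustrates that round trip in plain Java with no HBase dependency; in real client code the `org.apache.hadoop.hbase.util.Bytes` utility class plays this role, and the `BytesRoundTrip` class here is invented purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BytesRoundTrip {
    // serialize a String the way HBase clients typically do: UTF-8 bytes
    static byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    static String toString(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    // serialize an int as 4 big-endian bytes
    static byte[] toBytes(int v) { return ByteBuffer.allocate(4).putInt(v).array(); }
    static int toInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }

    public static void main(String[] args) {
        byte[] name = toBytes("zs");
        byte[] age = toBytes(18);
        // HBase itself never interprets these bytes; the client must know the type
        System.out.println(toString(name)); // zs
        System.out.println(toInt(age));     // 18
    }
}
```

The key point: a row that stores `18` as an int and a row that stores `"18"` as a string hold different bytes, and only the client knows which decoding to apply.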
Terminology
- Row Key: as in other NoSQL databases, the Row Key is the primary key used to look up records.
- Column Family: a group of columns. Column families are part of the table schema (individual columns are not) and must be defined before the table is used.
- Timestamp: the storage unit identified in HBase by a row and a column is called a Cell. Each Cell holds multiple versions of the same data, indexed by timestamp; the timestamp type is a 64-bit integer. The timestamp can be assigned by HBase automatically at write time (the current system time, in milliseconds) or set explicitly by the client. An application that needs to avoid version conflicts must generate unique timestamps itself. Within a Cell, versions are sorted by timestamp in descending order, so the newest data comes first.
- Cell: the unit uniquely identified by {row key, column (= <family> + <qualifier>), version}. Cell data is untyped and stored entirely as raw bytes.
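The "newest version first" ordering described under Timestamp can be modeled in plain Java with a map sorted by descending timestamp. This is only a sketch of the semantics; the `VersionedCell` class is invented for illustration and is not part of the HBase API.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class VersionedCell {
    // versions of one cell, keyed by timestamp, newest first
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());
    private final int maxVersions;

    public VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // like VERSIONS => n in a column family: discard the oldest versions
        while (versions.size() > maxVersions) {
            versions.remove(versions.lastKey()); // lastKey is the smallest timestamp
        }
    }

    // a plain get returns the newest version
    public String get() { return versions.firstEntry().getValue(); }

    // a get with an explicit timestamp returns that version, if still stored
    public String getAt(long timestamp) { return versions.get(timestamp); }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(100L, "zs");
        cell.put(200L, "zhangsan");
        System.out.println(cell.get()); // zhangsan
    }
}
```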
Differences between HBase and relational databases
- Data types: all values in HBase are stored as byte strings.
- Operations: HBase supports only basic create/read/update/delete; there are no joins between tables.
- Storage model: HBase uses column-oriented storage, while an RDBMS stores data row by row.
- Use case: HBase suits very large datasets that require high query throughput.
2. Environment Setup
Prerequisites
- Make sure HDFS is running
- Make sure ZooKeeper is running
# expected processes
[root@hadoop zookeeper-3.4.10]# jps
4858 QuorumPeerMain
3571 SecondaryNameNode
3324 DataNode
3213 NameNode
4921 Jps
Install, configure, and start
- Install
[root@hadoop ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
- Configure
- Edit hbase-site.xml
[root@hadoop ~]# vi /usr/hbase-1.2.4/conf/hbase-site.xml
# add the following properties
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop:9000/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>
- Edit regionservers
[root@hadoop ~]# vi /usr/hbase-1.2.4/conf/regionservers
# add the hostname
hadoop
- Environment variables
[root@hadoop ~]# vim .bashrc
# add the following environment variables
# HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper
HBASE_MANAGES_ZK=false
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.6.0
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME
export HBASE_MANAGES_ZK
# apply the configuration
[root@hadoop ~]# source .bashrc
- Start
[root@hadoop ~]# start-hbase.sh
starting master, logging to /usr/hbase-1.2.4/logs/hbase-root-master-hadoop.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
hadoop: starting regionserver, logging to /usr/hbase-1.2.4/logs/hbase-root-regionserver-hadoop.out
# check the processes
[root@hadoop ~]# jps
4858 QuorumPeerMain
10231 HRegionServer    // serves the actual table data reads and writes
3571 SecondaryNameNode
10365 Jps
3324 DataNode
10096 HMaster          // like the NameNode: manages table metadata and the RegionServers
3213 NameNode
- Web UI access (the HMaster web UI listens on port 16010 by default in HBase 1.x)
3. HBase Shell Operations
System commands
- Connect to the HBase server
[root@hadoop ~]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
- Check system status
hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
- Help
hbase(main):010:0> help
HBase Shell, version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission
Namespace operations
- A namespace is similar to a database in an RDBMS; it is used to organize and group HBase tables.
- List namespaces
hbase(main):002:0> list_namespace
NAMESPACE
default
hbase
- Create a namespace
hbase(main):004:0> create_namespace 'baizhi'
0 row(s) in 0.0820 seconds
hbase(main):006:0* list_namespace
NAMESPACE
baizhi
default
hbase
- List the tables in a namespace
hbase(main):007:0> list_namespace_tables 'baizhi'
- Drop a namespace
hbase(main):009:0> drop_namespace 'baizhi'
0 row(s) in 0.0660 seconds
Note: HBase refuses to drop a namespace that still contains tables.
Table operations
- Create a table
# method 1
hbase(main):011:0> create 't_user','cf1','cf2'
0 row(s) in 1.5170 seconds
=> Hbase::Table - t_user
# method 2
# note: VERSIONS => 3 keeps up to 3 versions of each cell in the family (the default is 1);
# TTL => 3600 expires cells one hour after they are written
hbase(main):014:0> create 'baizhi:tt_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',VERSIONS=>3,TTL=>3600}
0 row(s) in 1.2650 seconds
=> Hbase::Table - baizhi:tt_user
- Describe a table
hbase(main):018:0> describe 'baizhi:tt_user'
Table baizhi:tt_user is ENABLED
baizhi:tt_user
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '3600 SECONDS (1 HOUR)', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.0520 seconds
- Drop a table
# disable the table first
hbase(main):022:0> disable 't_user'
0 row(s) in 2.2620 seconds
# drop it
hbase(main):023:0> drop 't_user'
0 row(s) in 1.2750 seconds
Note: a table must be disabled before it can be dropped.
- List all tables
hbase(main):030:0> list
TABLE
baizhi:tt_user
1 row(s) in 0.0230 seconds
=> ["baizhi:tt_user"]
- Alter a table
hbase(main):039:0> alter 'baizhi:tt_user',{NAME=>'cf2',TTL=>1800}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9850 seconds
Data operations
- Put
Syntax: put 'ns1:t1', 'r1', 'c1', 'value'
hbase(main):042:0* put 'baizhi:tt_user',1,'cf1:name','zs'
0 row(s) in 0.1580 seconds
hbase(main):049:0> put 'baizhi:tt_user',1,'cf1:age',18
0 row(s) in 0.0160 seconds
- Get
Syntax: get 'ns1:t1', 'r1'
# get by row key
hbase(main):001:0> get 'baizhi:tt_user',1
COLUMN     CELL
 cf1:age   timestamp=1527299113905, value=18
 cf1:name  timestamp=1527299057422, value=zhangsan
2 row(s) in 0.3990 seconds
# get by row key + column family
hbase(main):005:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1'}
COLUMN     CELL
 cf1:age   timestamp=1527299113905, value=18
 cf1:name  timestamp=1527299057422, value=zhangsan
 cf1:name  timestamp=1527298815331, value=zs
# get by row key + column family + VERSIONS, returning all stored versions
hbase(main):005:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1',VERSIONS=>10}
COLUMN     CELL
 cf1:age   timestamp=1527299113905, value=18
 cf1:name  timestamp=1527299057422, value=zhangsan
 cf1:name  timestamp=1527298815331, value=zs
# get by row key + family:qualifier for one specific column
hbase(main):010:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1:name',VERSIONS=>10}
COLUMN     CELL
 cf1:name  timestamp=1527299616029, value=zzzzzss
 cf1:name  timestamp=1527299608593, value=zss
 cf1:name  timestamp=1527299057422, value=zhangsan
3 row(s) in 0.0170 seconds
# get by row key + column family at a specific timestamp
hbase(main):025:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1',TIMESTAMP=>1527299608593}
COLUMN     CELL
 cf1:name  timestamp=1527299608593, value=zss
1 row(s) in 0.0050 seconds
- Scan
hbase(main):029:0* scan 'baizhi:tt_user'
ROW  COLUMN+CELL
 1   column=cf1:age, timestamp=1527299113905, value=18
 1   column=cf1:name, timestamp=1527299616029, value=zzzzzss
 1   column=cf2:sex, timestamp=1527299875532, value=male
hbase(main):036:0> scan 'baizhi:tt_user',{COLUMNS=>'cf1',LIMIT=>1}
ROW  COLUMN+CELL
 1   column=cf1:age, timestamp=1527299113905, value=18
 1   column=cf1:name, timestamp=1527299616029, value=zzzzzss
- Delete
# delete the cf1:name column of row 1
hbase(main):041:0> delete 'baizhi:tt_user',1,'cf1:name'
0 row(s) in 0.1630 seconds
# verify
hbase(main):042:0> scan 'baizhi:tt_user'
ROW  COLUMN+CELL
 1   column=cf1:age, timestamp=1527299113905, value=18
 1   column=cf2:sex, timestamp=1527299875532, value=male
 2   column=cf1:name, timestamp=1527300617838, value=xh
# delete the whole row
hbase(main):048:0> deleteall 'baizhi:tt_user',1
0 row(s) in 0.0190 seconds
# verify
hbase(main):049:0> scan 'baizhi:tt_user'
ROW  COLUMN+CELL
0 row(s) in 0.0280 seconds
4. Java API
Maven dependencies
<!-- HBase dependencies -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-protocol</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.4</version>
</dependency>
<!-- end of HBase dependencies -->
Tests
Create the connection and admin objects
/**
 * Set up the connection and admin objects.
 */
private Connection connection = null;
private Admin admin = null;
@Before
public void init () throws IOException {
Configuration configuration = HBaseConfiguration.create();
    // HBase services register themselves in ZooKeeper, so the client connects through the zk quorum
configuration.set("hbase.zookeeper.quorum","192.168.21.147");
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
}
@After
public void destroy() throws IOException {
if (admin != null) admin.close();
if (connection != null) connection.close();
}
Create a namespace
/**
 * Create a namespace (similar to a database).
 * @throws IOException
 */
@Test
public void testCreateNamespace() throws IOException {
NamespaceDescriptor descriptor = NamespaceDescriptor.create("zpark").addConfiguration("author","wzh").build();
admin.createNamespace(descriptor);
}
Create a table
/**
 * Create a table.
 * @throws IOException
 */
@Test
public void testCreateTable() throws IOException {
    // table name
    HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("zpark:u_user"));
    // define the column families
    HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
    // keep up to 3 versions per cell
    cf1.setMaxVersions(3);
    HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
    // time-to-live in seconds
    cf2.setTimeToLive(36000);
    // add the families to the table
    tableDescriptor.addFamily(cf1);
    tableDescriptor.addFamily(cf2);
    // create the table
    admin.createTable(tableDescriptor);
}
Put
/**
 * Put data.
 * @throws IOException
 */
@Test
public void testPutData() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    // put a single value
    Put put = new Put("001".getBytes());
    put.addColumn("cf1".getBytes(), "name".getBytes(), "zs".getBytes());
    table.put(put);
    table.close();
}
@Test
public void testPutData2() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    // put a batch of rows; the row key's numeric suffix is zero-padded
    // so that lexicographic (byte) order matches numeric order
    for (int i = 2; i < 10; i++) {
        String rowkey = "com";
        if (i < 10) {
            rowkey += ":00" + i;
        } else if (i < 100) {
            rowkey += ":0" + i;
        } else if (i < 1000) {
            rowkey += ":" + i;
        }
        Put put = new Put(rowkey.getBytes());
        put.addColumn("cf1".getBytes(), "name".getBytes(), ("zs" + i).getBytes());
        put.addColumn("cf2".getBytes(), "age".getBytes(), Bytes.toBytes(i));
        table.put(put);
    }
    table.close();
}
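The if/else chain above zero-pads the numeric part of the row key so that lexicographic byte order matches numeric order (otherwise "com:10" would sort before "com:2", which matters for scans). A more compact equivalent, shown here as a hypothetical `RowKeys` helper, uses `String.format`:

```java
public class RowKeys {
    // zero-pad the id to three digits so lexicographic (byte) order matches numeric order
    static String rowkey(int i) {
        return String.format("com:%03d", i);
    }

    public static void main(String[] args) {
        System.out.println(rowkey(2));  // com:002
        System.out.println(rowkey(42)); // com:042
    }
}
```

With this padding, a scan from "com:002" (inclusive) to "com:009" (exclusive) returns rows 2 through 8 in numeric order.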
Update
/**
 * Update data (a put to an existing cell writes a new version).
 * @throws IOException
 */
@Test
public void testUpdate() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    Put put = new Put("001".getBytes());
    put.addColumn("cf1".getBytes(), "name".getBytes(), "zs2".getBytes());
    put.addColumn("cf2".getBytes(), "age".getBytes(), "18".getBytes());
    table.put(put);
    table.close();
}
Query
/**
 * Query.
 * @throws IOException
 */
@Test
public void testScan() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    // get a single row
    Get get = new Get("001".getBytes());
    get.addColumn("cf1".getBytes(), "name".getBytes());
    get.addColumn("cf2".getBytes(), "age".getBytes());
    Result result = table.get(get);
    System.out.println(Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes())));
    System.out.println(Bytes.toString(result.getValue("cf2".getBytes(), "age".getBytes())));
    table.close();
}
@Test
public void testScan2() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    // scan a range of rows
    Scan scan = new Scan();
    // families to scan
    scan.addFamily("cf1".getBytes());
    scan.addFamily("cf2".getBytes());
    // start row is inclusive, stop row is exclusive
    scan.setStartRow("com:002".getBytes());
    scan.setStopRow("com:009".getBytes());
    ResultScanner results = table.getScanner(scan);
    for (Result result : results) {
        String rowkey = Bytes.toString(result.getRow());
        String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
        Integer age = Bytes.toInt(result.getValue("cf2".getBytes(), "age".getBytes()));
        System.out.println("rowkey|" + rowkey + " name|" + name + " age|" + age);
    }
    table.close();
}
Complete test class
package com.baizhi;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

/**
 * HBase client API unit tests.
 */
public class HBaseTest {

    private Connection connection = null;
    private Admin admin = null;

    @Before
    public void init() throws IOException {
        Configuration configuration = HBaseConfiguration.create();
        // HBase services register themselves in ZooKeeper, so the client
        // connects through the zk quorum
        configuration.set("hbase.zookeeper.quorum", "192.168.21.147");
        connection = ConnectionFactory.createConnection(configuration);
        admin = connection.getAdmin();
    }

    @After
    public void destroy() throws IOException {
        if (admin != null) admin.close();
        if (connection != null) connection.close();
    }

    /**
     * Create a namespace (similar to a database).
     * @throws IOException
     */
    @Test
    public void testCreateNamespace() throws IOException {
        NamespaceDescriptor descriptor = NamespaceDescriptor.create("zpark")
                .addConfiguration("author", "wzh").build();
        admin.createNamespace(descriptor);
    }

    /**
     * Create a table.
     * @throws IOException
     */
    @Test
    public void testCreateTable() throws IOException {
        // table name
        HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("zpark:u_user"));
        // define the column families
        HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
        // keep up to 3 versions per cell
        cf1.setMaxVersions(3);
        HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
        // time-to-live in seconds
        cf2.setTimeToLive(36000);
        // add the families to the table
        tableDescriptor.addFamily(cf1);
        tableDescriptor.addFamily(cf2);
        // create the table
        admin.createTable(tableDescriptor);
    }

    /**
     * Put data.
     * @throws IOException
     */
    @Test
    public void testPutData() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        // put a single value
        Put put = new Put("001".getBytes());
        put.addColumn("cf1".getBytes(), "name".getBytes(), "zs".getBytes());
        table.put(put);
        table.close();
    }

    @Test
    public void testPutData2() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        // put a batch of rows; the row key's numeric suffix is zero-padded
        // so that lexicographic (byte) order matches numeric order
        for (int i = 2; i < 10; i++) {
            String rowkey = "com";
            if (i < 10) {
                rowkey += ":00" + i;
            } else if (i < 100) {
                rowkey += ":0" + i;
            } else if (i < 1000) {
                rowkey += ":" + i;
            }
            Put put = new Put(rowkey.getBytes());
            put.addColumn("cf1".getBytes(), "name".getBytes(), ("zs" + i).getBytes());
            put.addColumn("cf2".getBytes(), "age".getBytes(), Bytes.toBytes(i));
            table.put(put);
        }
        table.close();
    }

    /**
     * Update data (a put to an existing cell writes a new version).
     * @throws IOException
     */
    @Test
    public void testUpdate() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        Put put = new Put("001".getBytes());
        put.addColumn("cf1".getBytes(), "name".getBytes(), "zs2".getBytes());
        put.addColumn("cf2".getBytes(), "age".getBytes(), "18".getBytes());
        table.put(put);
        table.close();
    }

    /**
     * Query.
     * @throws IOException
     */
    @Test
    public void testScan() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        // get a single row
        Get get = new Get("001".getBytes());
        get.addColumn("cf1".getBytes(), "name".getBytes());
        get.addColumn("cf2".getBytes(), "age".getBytes());
        Result result = table.get(get);
        System.out.println(Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes())));
        System.out.println(Bytes.toString(result.getValue("cf2".getBytes(), "age".getBytes())));
        table.close();
    }

    @Test
    public void testScan2() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        // scan a range of rows
        Scan scan = new Scan();
        // families to scan
        scan.addFamily("cf1".getBytes());
        scan.addFamily("cf2".getBytes());
        // start row is inclusive, stop row is exclusive
        scan.setStartRow("com:002".getBytes());
        scan.setStopRow("com:009".getBytes());
        ResultScanner results = table.getScanner(scan);
        for (Result result : results) {
            String rowkey = Bytes.toString(result.getRow());
            String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
            Integer age = Bytes.toInt(result.getValue("cf2".getBytes(), "age".getBytes()));
            System.out.println("rowkey|" + rowkey + " name|" + name + " age|" + age);
        }
        table.close();
    }
}