HBase
A high-concurrency processing solution, at a glance
1. Overview
Official site: http://hbase.apache.org/
HBase is a distributed, column-oriented storage system built on top of HDFS. It is a good fit when you need real-time reads and writes with random access to very large datasets.
Features
- Large: a single table can have hundreds of millions of rows and millions of columns.
- Column-oriented: storage and access control are organized per column family, and each family is retrieved independently.
- Sparse: columns that are NULL take up no storage, so tables can be designed to be extremely sparse.
- Schema-free: every row has a sortable row key and an arbitrary number of columns; columns can be added dynamically, and different rows in the same table can have completely different columns.
- Multi-versioned: each cell can hold several versions of its data; by default the version number is assigned automatically and is the cell's insertion timestamp.
- Untyped: all data in HBase is stored as uninterpreted byte strings; there are no column types.
Graphical overview:
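Because everything in HBase is an untyped byte string, the client is responsible for serializing and deserializing values. The sketch below illustrates that round trip in plain Java with no HBase dependency; in real client code the `org.apache.hadoop.hbase.util.Bytes` utility class plays this role, and the `BytesRoundTrip` class here is invented purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BytesRoundTrip {
    // serialize a String the way HBase clients typically do: UTF-8 bytes
    static byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    static String toString(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    // serialize an int as 4 big-endian bytes
    static byte[] toBytes(int v) { return ByteBuffer.allocate(4).putInt(v).array(); }
    static int toInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }

    public static void main(String[] args) {
        byte[] name = toBytes("zs");
        byte[] age = toBytes(18);
        // HBase itself never interprets these bytes; the client must know the type
        System.out.println(toString(name)); // zs
        System.out.println(toInt(age));     // 18
    }
}
```

The key point: a row that stores `18` as an int and a row that stores `"18"` as a string hold different bytes, and only the client knows which decoding to apply.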
Terminology
- Row Key: as in other NoSQL databases, the Row Key is the primary key used to look up records.
- Column Family: a group of columns. Column families are part of the table schema (individual columns are not) and must be defined before the table is used.
- Timestamp: the storage unit identified in HBase by a row and a column is called a Cell. Each Cell holds multiple versions of the same data, indexed by timestamp; the timestamp type is a 64-bit integer. The timestamp can be assigned by HBase automatically at write time (the current system time, in milliseconds) or set explicitly by the client. An application that needs to avoid version conflicts must generate unique timestamps itself. Within a Cell, versions are sorted by timestamp in descending order, so the newest data comes first.
- Cell: the unit uniquely identified by {row key, column (= <family> + <qualifier>), version}. Cell data is untyped and stored entirely as raw bytes.
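The "newest version first" ordering described under Timestamp can be modeled in plain Java with a map sorted by descending timestamp. This is only a sketch of the semantics; the `VersionedCell` class is invented for illustration and is not part of the HBase API.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class VersionedCell {
    // versions of one cell, keyed by timestamp, newest first
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());
    private final int maxVersions;

    public VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // like VERSIONS => n in a column family: discard the oldest versions
        while (versions.size() > maxVersions) {
            versions.remove(versions.lastKey()); // lastKey is the smallest timestamp
        }
    }

    // a plain get returns the newest version
    public String get() { return versions.firstEntry().getValue(); }

    // a get with an explicit timestamp returns that version, if still stored
    public String getAt(long timestamp) { return versions.get(timestamp); }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(100L, "zs");
        cell.put(200L, "zhangsan");
        System.out.println(cell.get()); // zhangsan
    }
}
```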
Differences between HBase and relational databases
- Data types: all values in HBase are stored as byte strings.
- Operations: HBase supports only basic create/read/update/delete; there are no joins between tables.
- Storage model: HBase uses column-oriented storage, while an RDBMS stores data row by row.
- Use case: HBase suits very large datasets that require high query throughput.
2. Environment Setup
Prerequisites
- Make sure HDFS is running
- Make sure ZooKeeper is running
# expected processes
[root@hadoop zookeeper-3.4.10]# jps
4858 QuorumPeerMain
3571 SecondaryNameNode
3324 DataNode
3213 NameNode
4921 Jps
Install, configure, and start
- Install
[root@hadoop ~]# tar -zxf hbase-1.2.4-bin.tar.gz -C /usr/
- Configure
- Edit hbase-site.xml
[root@hadoop ~]# vi /usr/hbase-1.2.4/conf/hbase-site.xml
# add the following properties
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop:9000/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>
- Edit regionservers
[root@hadoop ~]# vi /usr/hbase-1.2.4/conf/regionservers
# add the hostname
hadoop
- Environment variables
[root@hadoop ~]# vim .bashrc
# add the following environment variables
# HBASE_MANAGES_ZK=false tells HBase to use the external ZooKeeper
HBASE_MANAGES_ZK=false
HBASE_HOME=/usr/hbase-1.2.4
HADOOP_HOME=/usr/hadoop-2.6.0
JAVA_HOME=/usr/java/latest
CLASSPATH=.
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME
export CLASSPATH
export PATH
export HADOOP_HOME
export HBASE_HOME
export HBASE_MANAGES_ZK
# apply the configuration
[root@hadoop ~]# source .bashrc
- Start
[root@hadoop ~]# start-hbase.sh
starting master, logging to /usr/hbase-1.2.4/logs/hbase-root-master-hadoop.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
hadoop: starting regionserver, logging to /usr/hbase-1.2.4/logs/hbase-root-regionserver-hadoop.out
# check the processes
[root@hadoop ~]# jps
4858 QuorumPeerMain
10231 HRegionServer    // serves the actual table data reads and writes
3571 SecondaryNameNode
10365 Jps
3324 DataNode
10096 HMaster          // like the NameNode: manages table metadata and the RegionServers
3213 NameNode
- Web UI access (the HMaster web UI listens on port 16010 by default in HBase 1.x)
3. HBase Shell Operations
System commands
- Connect to the HBase server
[root@hadoop ~]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
- Check system status
hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
- Help
hbase(main):010:0> help
HBase Shell, version 1.2.4, rUnknown, Wed Feb 15 18:58:00 CST 2017
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, list_security_capabilities, revoke, user_permission
Namespace operations
- A namespace is similar to a database in an RDBMS; it is used to organize and group HBase tables.
- List namespaces
hbase(main):002:0> list_namespace
NAMESPACE
default
hbase
- Create a namespace
hbase(main):004:0> create_namespace 'baizhi'
0 row(s) in 0.0820 seconds
hbase(main):006:0* list_namespace
NAMESPACE
baizhi
default
hbase
- List the tables in a namespace
hbase(main):007:0> list_namespace_tables 'baizhi'
- Drop a namespace
hbase(main):009:0> drop_namespace 'baizhi'
0 row(s) in 0.0660 seconds
Note: HBase refuses to drop a namespace that still contains tables.
Table operations
- Create a table
# method 1
hbase(main):011:0> create 't_user','cf1','cf2'
0 row(s) in 1.5170 seconds
=> Hbase::Table - t_user
# method 2
# note: VERSIONS => 3 keeps up to 3 versions of each cell in the family (the default is 1);
# TTL => 3600 expires cells one hour after they are written
hbase(main):014:0> create 'baizhi:tt_user',{NAME=>'cf1',VERSIONS=>3},{NAME=>'cf2',VERSIONS=>3,TTL=>3600}
0 row(s) in 1.2650 seconds
=> Hbase::Table - baizhi:tt_user
- Describe a table
hbase(main):018:0> describe 'baizhi:tt_user'
Table baizhi:tt_user is ENABLED
baizhi:tt_user
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '3600 SECONDS (1 HOUR)', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
2 row(s) in 0.0520 seconds
- Drop a table
# disable the table first
hbase(main):022:0> disable 't_user'
0 row(s) in 2.2620 seconds
# drop it
hbase(main):023:0> drop 't_user'
0 row(s) in 1.2750 seconds
Note: a table must be disabled before it can be dropped.
- List all tables
hbase(main):030:0> list
TABLE
baizhi:tt_user
1 row(s) in 0.0230 seconds
=> ["baizhi:tt_user"]
- Alter a table
hbase(main):039:0> alter 'baizhi:tt_user',{NAME=>'cf2',TTL=>1800}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9850 seconds
Data operations
- Put
Syntax: put 'ns1:t1', 'r1', 'c1', 'value'
hbase(main):042:0* put 'baizhi:tt_user',1,'cf1:name','zs'
0 row(s) in 0.1580 seconds
hbase(main):049:0> put 'baizhi:tt_user',1,'cf1:age',18
0 row(s) in 0.0160 seconds
- Get
Syntax: get 'ns1:t1', 'r1'
# get by row key
hbase(main):001:0> get 'baizhi:tt_user',1
COLUMN     CELL
 cf1:age   timestamp=1527299113905, value=18
 cf1:name  timestamp=1527299057422, value=zhangsan
2 row(s) in 0.3990 seconds
# get by row key + column family
hbase(main):005:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1'}
COLUMN     CELL
 cf1:age   timestamp=1527299113905, value=18
 cf1:name  timestamp=1527299057422, value=zhangsan
 cf1:name  timestamp=1527298815331, value=zs
# get by row key + column family + VERSIONS, returning all stored versions
hbase(main):005:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1',VERSIONS=>10}
COLUMN     CELL
 cf1:age   timestamp=1527299113905, value=18
 cf1:name  timestamp=1527299057422, value=zhangsan
 cf1:name  timestamp=1527298815331, value=zs
# get by row key + family:qualifier for one specific column
hbase(main):010:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1:name',VERSIONS=>10}
COLUMN     CELL
 cf1:name  timestamp=1527299616029, value=zzzzzss
 cf1:name  timestamp=1527299608593, value=zss
 cf1:name  timestamp=1527299057422, value=zhangsan
3 row(s) in 0.0170 seconds
# get by row key + column family at a specific timestamp
hbase(main):025:0> get 'baizhi:tt_user',1,{COLUMN=>'cf1',TIMESTAMP=>1527299608593}
COLUMN     CELL
 cf1:name  timestamp=1527299608593, value=zss
1 row(s) in 0.0050 seconds
- Scan
hbase(main):029:0* scan 'baizhi:tt_user'
ROW  COLUMN+CELL
 1   column=cf1:age, timestamp=1527299113905, value=18
 1   column=cf1:name, timestamp=1527299616029, value=zzzzzss
 1   column=cf2:sex, timestamp=1527299875532, value=male
hbase(main):036:0> scan 'baizhi:tt_user',{COLUMNS=>'cf1',LIMIT=>1}
ROW  COLUMN+CELL
 1   column=cf1:age, timestamp=1527299113905, value=18
 1   column=cf1:name, timestamp=1527299616029, value=zzzzzss
- Delete
# delete the cf1:name column of row 1
hbase(main):041:0> delete 'baizhi:tt_user',1,'cf1:name'
0 row(s) in 0.1630 seconds
# verify
hbase(main):042:0> scan 'baizhi:tt_user'
ROW  COLUMN+CELL
 1   column=cf1:age, timestamp=1527299113905, value=18
 1   column=cf2:sex, timestamp=1527299875532, value=male
 2   column=cf1:name, timestamp=1527300617838, value=xh
# delete the whole row
hbase(main):048:0> deleteall 'baizhi:tt_user',1
0 row(s) in 0.0190 seconds
# verify
hbase(main):049:0> scan 'baizhi:tt_user'
ROW  COLUMN+CELL
0 row(s) in 0.0280 seconds
4. Java API
Maven dependencies
<!-- HBase dependencies -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-protocol</artifactId>
<version>1.2.4</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.4</version>
</dependency>
<!-- end of HBase dependencies -->
Tests
Create the connection and admin objects
/**
 * Set up the connection and admin objects.
 */
private Connection connection = null;
private Admin admin = null;
@Before
public void init () throws IOException {
Configuration configuration = HBaseConfiguration.create();
    // HBase services register themselves in ZooKeeper, so the client connects through the zk quorum
configuration.set("hbase.zookeeper.quorum","192.168.21.147");
connection = ConnectionFactory.createConnection(configuration);
admin = connection.getAdmin();
}
@After
public void destroy() throws IOException {
if (admin != null) admin.close();
if (connection != null) connection.close();
}
Create a namespace
/**
 * Create a namespace (similar to a database).
 * @throws IOException
 */
@Test
public void testCreateNamespace() throws IOException {
NamespaceDescriptor descriptor = NamespaceDescriptor.create("zpark").addConfiguration("author","wzh").build();
admin.createNamespace(descriptor);
}
Create a table
/**
 * Create a table.
 * @throws IOException
 */
@Test
public void testCreateTable() throws IOException {
    // table name
    HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("zpark:u_user"));
    // define the column families
    HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
    // keep up to 3 versions per cell
    cf1.setMaxVersions(3);
    HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
    // time-to-live in seconds
    cf2.setTimeToLive(36000);
    // add the families to the table
    tableDescriptor.addFamily(cf1);
    tableDescriptor.addFamily(cf2);
    // create the table
    admin.createTable(tableDescriptor);
}
Put
/**
 * Put data.
 * @throws IOException
 */
@Test
public void testPutData() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    // put a single value
    Put put = new Put("001".getBytes());
    put.addColumn("cf1".getBytes(), "name".getBytes(), "zs".getBytes());
    table.put(put);
    table.close();
}
@Test
public void testPutData2() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    // put a batch of rows; the row key's numeric suffix is zero-padded
    // so that lexicographic (byte) order matches numeric order
    for (int i = 2; i < 10; i++) {
        String rowkey = "com";
        if (i < 10) {
            rowkey += ":00" + i;
        } else if (i < 100) {
            rowkey += ":0" + i;
        } else if (i < 1000) {
            rowkey += ":" + i;
        }
        Put put = new Put(rowkey.getBytes());
        put.addColumn("cf1".getBytes(), "name".getBytes(), ("zs" + i).getBytes());
        put.addColumn("cf2".getBytes(), "age".getBytes(), Bytes.toBytes(i));
        table.put(put);
    }
    table.close();
}
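The if/else chain above zero-pads the numeric part of the row key so that lexicographic byte order matches numeric order (otherwise "com:10" would sort before "com:2", which matters for scans). A more compact equivalent, shown here as a hypothetical `RowKeys` helper, uses `String.format`:

```java
public class RowKeys {
    // zero-pad the id to three digits so lexicographic (byte) order matches numeric order
    static String rowkey(int i) {
        return String.format("com:%03d", i);
    }

    public static void main(String[] args) {
        System.out.println(rowkey(2));  // com:002
        System.out.println(rowkey(42)); // com:042
    }
}
```

With this padding, a scan from "com:002" (inclusive) to "com:009" (exclusive) returns rows 2 through 8 in numeric order.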
Update
/**
 * Update data (a put to an existing cell writes a new version).
 * @throws IOException
 */
@Test
public void testUpdate() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    Put put = new Put("001".getBytes());
    put.addColumn("cf1".getBytes(), "name".getBytes(), "zs2".getBytes());
    put.addColumn("cf2".getBytes(), "age".getBytes(), "18".getBytes());
    table.put(put);
    table.close();
}
Query
/**
 * Query.
 * @throws IOException
 */
@Test
public void testScan() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    // get a single row
    Get get = new Get("001".getBytes());
    get.addColumn("cf1".getBytes(), "name".getBytes());
    get.addColumn("cf2".getBytes(), "age".getBytes());
    Result result = table.get(get);
    System.out.println(Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes())));
    System.out.println(Bytes.toString(result.getValue("cf2".getBytes(), "age".getBytes())));
    table.close();
}
@Test
public void testScan2() throws IOException {
    // get the table
    Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
    // scan a range of rows
    Scan scan = new Scan();
    // families to scan
    scan.addFamily("cf1".getBytes());
    scan.addFamily("cf2".getBytes());
    // start row is inclusive, stop row is exclusive
    scan.setStartRow("com:002".getBytes());
    scan.setStopRow("com:009".getBytes());
    ResultScanner results = table.getScanner(scan);
    for (Result result : results) {
        String rowkey = Bytes.toString(result.getRow());
        String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
        Integer age = Bytes.toInt(result.getValue("cf2".getBytes(), "age".getBytes()));
        System.out.println("rowkey|" + rowkey + " name|" + name + " age|" + age);
    }
    table.close();
}
Complete test class
package com.baizhi;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

/**
 * HBase client API unit tests.
 */
public class HBaseTest {

    private Connection connection = null;
    private Admin admin = null;

    @Before
    public void init() throws IOException {
        Configuration configuration = HBaseConfiguration.create();
        // HBase services register themselves in ZooKeeper, so the client
        // connects through the zk quorum
        configuration.set("hbase.zookeeper.quorum", "192.168.21.147");
        connection = ConnectionFactory.createConnection(configuration);
        admin = connection.getAdmin();
    }

    @After
    public void destroy() throws IOException {
        if (admin != null) admin.close();
        if (connection != null) connection.close();
    }

    /**
     * Create a namespace (similar to a database).
     * @throws IOException
     */
    @Test
    public void testCreateNamespace() throws IOException {
        NamespaceDescriptor descriptor = NamespaceDescriptor.create("zpark")
                .addConfiguration("author", "wzh").build();
        admin.createNamespace(descriptor);
    }

    /**
     * Create a table.
     * @throws IOException
     */
    @Test
    public void testCreateTable() throws IOException {
        // table name
        HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("zpark:u_user"));
        // define the column families
        HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
        // keep up to 3 versions per cell
        cf1.setMaxVersions(3);
        HColumnDescriptor cf2 = new HColumnDescriptor("cf2");
        // time-to-live in seconds
        cf2.setTimeToLive(36000);
        // add the families to the table
        tableDescriptor.addFamily(cf1);
        tableDescriptor.addFamily(cf2);
        // create the table
        admin.createTable(tableDescriptor);
    }

    /**
     * Put data.
     * @throws IOException
     */
    @Test
    public void testPutData() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        // put a single value
        Put put = new Put("001".getBytes());
        put.addColumn("cf1".getBytes(), "name".getBytes(), "zs".getBytes());
        table.put(put);
        table.close();
    }

    @Test
    public void testPutData2() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        // put a batch of rows; the row key's numeric suffix is zero-padded
        // so that lexicographic (byte) order matches numeric order
        for (int i = 2; i < 10; i++) {
            String rowkey = "com";
            if (i < 10) {
                rowkey += ":00" + i;
            } else if (i < 100) {
                rowkey += ":0" + i;
            } else if (i < 1000) {
                rowkey += ":" + i;
            }
            Put put = new Put(rowkey.getBytes());
            put.addColumn("cf1".getBytes(), "name".getBytes(), ("zs" + i).getBytes());
            put.addColumn("cf2".getBytes(), "age".getBytes(), Bytes.toBytes(i));
            table.put(put);
        }
        table.close();
    }

    /**
     * Update data (a put to an existing cell writes a new version).
     * @throws IOException
     */
    @Test
    public void testUpdate() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        Put put = new Put("001".getBytes());
        put.addColumn("cf1".getBytes(), "name".getBytes(), "zs2".getBytes());
        put.addColumn("cf2".getBytes(), "age".getBytes(), "18".getBytes());
        table.put(put);
        table.close();
    }

    /**
     * Query.
     * @throws IOException
     */
    @Test
    public void testScan() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        // get a single row
        Get get = new Get("001".getBytes());
        get.addColumn("cf1".getBytes(), "name".getBytes());
        get.addColumn("cf2".getBytes(), "age".getBytes());
        Result result = table.get(get);
        System.out.println(Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes())));
        System.out.println(Bytes.toString(result.getValue("cf2".getBytes(), "age".getBytes())));
        table.close();
    }

    @Test
    public void testScan2() throws IOException {
        // get the table
        Table table = connection.getTable(TableName.valueOf("zpark:u_user"));
        // scan a range of rows
        Scan scan = new Scan();
        // families to scan
        scan.addFamily("cf1".getBytes());
        scan.addFamily("cf2".getBytes());
        // start row is inclusive, stop row is exclusive
        scan.setStartRow("com:002".getBytes());
        scan.setStopRow("com:009".getBytes());
        ResultScanner results = table.getScanner(scan);
        for (Result result : results) {
            String rowkey = Bytes.toString(result.getRow());
            String name = Bytes.toString(result.getValue("cf1".getBytes(), "name".getBytes()));
            Integer age = Bytes.toInt(result.getValue("cf2".getBytes(), "age".getBytes()));
            System.out.println("rowkey|" + rowkey + " name|" + name + " age|" + age);
        }
        table.close();
    }
}