HBase表設計 - 二級索引Secondary Indexing

本文的主線 概念 => 主鍵 => 配置 => 索引

本文基於HBase表設計 - 生存時間Time To Live

概念

  • Secondary indexes are an orthogonal way to access data from its primary access path.

  • In HBase, you have a single index that is lexicographically sorted on the primary row key. Access to records in any way other than through the primary row requires scanning over potentially all the rows in the table to test them against your filter.

  • With secondary indexing, the columns or expressions you index form an alternate row key to allow point lookups and range scans along this new axis.

主鍵

  • 單一主鍵
CREATE TABLE IF NOT EXISTS t_gps (
    row_key VARCHAR NOT NULL PRIMARY KEY,
    id BIGINT,
    city VARCHAR,
    created_ts BIGINT
);

UPSERT INTO t_gps VALUES('3001-1608605947',3001,'beijing',1608605947);

UPSERT INTO t_gps VALUES('3001-1608615947',3001,'shanghai',1608615947);
EXPLAIN SELECT * FROM t_gps WHERE row_key > '3001-1608605947';
+--------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                            PLAN                                            | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+--------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER T_GPS ['3001-1608605947'] - [*]  | null            | null           | null         |
+--------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.018 seconds)
  • 複合主鍵
DROP TABLE t_gps;
CREATE TABLE IF NOT EXISTS t_gps (
    id BIGINT NOT NULL,
    city VARCHAR,
    created_ts BIGINT NOT NULL,
    CONSTRAINT pk PRIMARY KEY (id,created_ts)
);

UPSERT INTO t_gps VALUES(3001,'beijing',1608605947);

UPSERT INTO t_gps VALUES(3001,'shanghai',1608615947);
EXPLAIN SELECT * FROM t_gps WHERE id = 3001 AND created_ts > 1608605947;
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                             PLAN                                              | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER T_GPS [3001,1608605948] - [3001,*]  | null            | null           | null         |
+-----------------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.017 seconds)

配置

vim ~/hbase-2.0.0/conf/hbase-site.xml
<configuration>
    <property>
        <name>hbase.regionserver.wal.codec</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
    </property>
</configuration>
~/hbase-2.0.0/bin/stop-hbase.sh

~/hbase-2.0.0/bin/start-hbase.sh

索引

EXPLAIN SELECT * FROM t_gps WHERE created_ts > 1608605947;
+-----------------------------------------------------------------+-----------------+----------------+--------------+
|                              PLAN                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T_GPS  | null            | null           | null         |
|     SERVER FILTER BY CREATED_TS > 1608605947                    | null            | null           | null         |
+-----------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.017 seconds)
CREATE INDEX IF NOT EXISTS ts_idx ON t_gps (created_ts DESC);

EXPLAIN SELECT * FROM t_gps WHERE created_ts > 1608605947;
+-----------------------------------------------------------------+-----------------+----------------+--------------+
|                              PLAN                               | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+-----------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T_GPS  | null            | null           | null         |
|     SERVER FILTER BY CREATED_TS > 1608605947                    | null            | null           | null         |
+-----------------------------------------------------------------+-----------------+----------------+--------------+
2 rows selected (0.018 seconds)
DROP INDEX ts_idx ON t_gps;

CREATE INDEX IF NOT EXISTS ts_idx ON t_gps (created_ts DESC) INCLUDE (id, city);

EXPLAIN SELECT * FROM t_gps WHERE created_ts > 1608605947;
+---------------------------------------------------------------------------------------+-----------------+----------------+--------------+
|                                         PLAN                                          | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
+---------------------------------------------------------------------------------------+-----------------+----------------+--------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER TS_IDX [*] - [~1608605947]  | null            | null           | null         |
+---------------------------------------------------------------------------------------+-----------------+----------------+--------------+
1 row selected (0.023 seconds)

參考

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章