本文的主线 概念 => 分区 => 原理 => 优点 => 缺点 => 表格存储
本文基于Phoenix搭建
概念
HBase sequential write may suffer from region server hotspotting if your row key is monotonically increasing
Salting the row key provides a way to mitigate the problem
Phoenix provides a way to transparently salt the row key with a salting byte for a particular table. You need to specify this in table creation time by specifying a table property “SALT_BUCKETS” with a value from 1 to 256
分区
CREATE TABLE IF NOT EXISTS t_normal (
id VARCHAR PRIMARY KEY,
name VARCHAR,
age INTEGER,
address VARCHAR
);
UPSERT INTO t_normal VALUES('id1', 'XiaoWang', 22, 'London');
UPSERT INTO t_normal VALUES('id2', 'XiaoWeng', 18, 'New York');
./hbase-2.0.0/bin/hbase shell
scan 'T_NORMAL'
ROW COLUMN+CELL
id1 column=0:\x00\x00\x00\x00, timestamp=1609143024854, value=x
id1 column=0:\x80\x0B, timestamp=1609143024854, value=XiaoWang
id1 column=0:\x80\x0C, timestamp=1609143024854, value=\x80\x00\x00\x16
id1 column=0:\x80\x0D, timestamp=1609143024854, value=London
id2 column=0:\x00\x00\x00\x00, timestamp=1609143028213, value=x
id2 column=0:\x80\x0B, timestamp=1609143028213, value=XiaoWeng
id2 column=0:\x80\x0C, timestamp=1609143028213, value=\x80\x00\x00\x12
id2 column=0:\x80\x0D, timestamp=1609143028213, value=New York
2 row(s)
Took 0.1361 seconds
CREATE TABLE IF NOT EXISTS t_salt (
id VARCHAR PRIMARY KEY,
name VARCHAR,
age INTEGER,
address VARCHAR
) SALT_BUCKETS = 5;
UPSERT INTO t_salt VALUES('id1', 'XiaoWang', 22, 'London');
UPSERT INTO t_salt VALUES('id2', 'XiaoWeng', 18, 'New York');
./hbase-2.0.0/bin/hbase shell
scan 'T_SALT'
ROW COLUMN+CELL
\x00id1 column=0:\x00\x00\x00\x00, timestamp=1609143085641, value=x
\x00id1 column=0:\x80\x0B, timestamp=1609143085641, value=XiaoWang
\x00id1 column=0:\x80\x0C, timestamp=1609143085641, value=\x80\x00\x00\x16
\x00id1 column=0:\x80\x0D, timestamp=1609143085641, value=London
\x01id2 column=0:\x00\x00\x00\x00, timestamp=1609143089175, value=x
\x01id2 column=0:\x80\x0B, timestamp=1609143089175, value=XiaoWeng
\x01id2 column=0:\x80\x0C, timestamp=1609143089175, value=\x80\x00\x00\x12
\x01id2 column=0:\x80\x0D, timestamp=1609143089175, value=New York
2 row(s)
Took 0.5924 seconds
原理
new_row_key = (++index % BUCKETS_NUMBER) + original_row_key
优点
Using salted table with pre-split would help uniformly distribute write workload across all the region servers, thus improves the write performance
Reading from salted table can also reap benefits from the more uniform distribution of data
缺点
- When doing a parallel scan across all region servers, we can take advantage of this properties to perform a merge sort of the client side
表格存储
主键是数据表中每一行的唯一标识 主键由1到4个主键列组成
组成主键的第一个主键列又称为分区键
表格存储会根据数据表中每一行分区键的值所属的范围自动将一行数据分配到对应的分区和机器上 以达到负载均衡的目的