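I. Partitioning Overview
Partitioning means that the database splits a table into several smaller, more manageable parts according to a defined rule. It is mainly useful for managing very large tables.
MySQL partitioning has four main advantages:
- A partitioned table can hold more data than fits on a single disk or file system partition;
- Queries can be optimized. When the WHERE clause contains the partitioning condition, only the matching partitions are scanned, which narrows the search (partition pruning). Aggregates such as COUNT() and SUM() can also be computed per partition and then combined, which lends itself to parallel processing;
- Data that has expired or is no longer needed can be removed quickly by dropping its partition;
- Queries can be spread across multiple disks for higher throughput.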
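The pruning idea from the list above can be modelled in a few lines of Python. This is only a sketch of the principle, not of how the MySQL optimizer is implemented; the names BOUNDS, NAMES and partitions_to_scan are made up for illustration, and the bounds describe a hypothetical table partitioned by RANGE on an integer column.

```python
from bisect import bisect_right

# Hypothetical layout: exclusive upper bounds of four RANGE partitions.
BOUNDS = [6, 11, 16, float("inf")]   # inf stands in for MAXVALUE
NAMES = ["p0", "p1", "p2", "p3"]

def partitions_to_scan(low, high):
    """Return the partitions that can hold values in the closed range [low, high]."""
    first = bisect_right(BOUNDS, low)    # partition containing the lower end
    last = bisect_right(BOUNDS, high)    # partition containing the upper end
    return NAMES[first:last + 1]

print(partitions_to_scan(7, 9))     # ['p1']
print(partitions_to_scan(4, 12))    # ['p0', 'p1', 'p2']
```

A condition such as WHERE col BETWEEN 7 AND 9 therefore only needs to touch one of the four partitions in this model.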
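To check whether the running server supports partitioning, execute the statement below and look for a partition row with status ACTIVE (this applies to MySQL 5.5 through 5.7; in MySQL 8.0 partitioning is provided by the storage engine, so the plugin is no longer listed):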
SHOW PLUGINS;
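II. Partition Types
Since MySQL 5.5 there are five main partition types:
- RANGE partitioning: a row belongs to the partition whose range contains the column value;
- LIST partitioning: similar to RANGE, except that each partition is defined by a set of discrete values rather than a range; a row belongs to the partition whose set contains the column value;
- COLUMNS partitioning: comes in two forms, RANGE COLUMNS and LIST COLUMNS, which extend RANGE and LIST partitioning respectively;
- HASH partitioning: rows are distributed over a given number of partitions according to the value of an expression;
- KEY partitioning: similar to HASH partitioning, but the hashing function is supplied by MySQL.
These concepts are rather abstract; the examples below make them easier to follow.
1. RANGE partitioning
Syntax: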
partition by range (expr) (
partition pName values less than (val),
.....
)
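Here expr is a column or an expression based on a column, and it must evaluate to an integer type: TINYINT, SMALLINT, MEDIUMINT, INT (INTEGER) or BIGINT.
pName is the partition name; you can choose it freely.
less than (val): the partition holds rows whose expr value is less than val (and not less than the bound of the preceding partition).
val is the boundary value: an integer, or an expression that evaluates to an integer. Boundaries must be listed in strictly increasing order.
Example: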
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
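MAXVALUE stands for the largest possible value, so the last partition catches every row that does not fit an earlier one. (MAXVALUE is used to represent the least upper bound for the type of integer in question. -MAXVALUE represents the greatest lower bound.)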
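To make the VALUES LESS THAN rule concrete, here is a small Python model of how a store_id is assigned in the employees example above. It is a sketch of the rule only; PARTITIONS and range_partition are illustrative names, and math.inf is used as a stand-in for MAXVALUE.

```python
from bisect import bisect_right
import math

# Upper bounds from the employees example; math.inf plays the role of MAXVALUE.
PARTITIONS = [("p0", 6), ("p1", 11), ("p2", 16), ("p3", math.inf)]

def range_partition(store_id):
    """Return the first partition whose VALUES LESS THAN bound exceeds store_id."""
    bounds = [bound for _, bound in PARTITIONS]
    return PARTITIONS[bisect_right(bounds, store_id)][0]

for store_id in (1, 5, 6, 15, 16, 72):
    print(store_id, "->", range_partition(store_id))
```

Note that a boundary value itself (6, 11, 16) goes to the next partition, because the bound is exclusive.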
Example:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY RANGE ( YEAR(separated) ) (
PARTITION p0 VALUES LESS THAN (1991),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (2001),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
Example:
CREATE TABLE quarterly_report_status (
report_id INT NOT NULL,
report_status VARCHAR(20) NOT NULL,
report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) (
PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ),
PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ),
PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ),
PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ),
PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ),
PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ),
PARTITION p9 VALUES LESS THAN (MAXVALUE)
);
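Because RANGE partitioning accepts only integer values, the YEAR() and UNIX_TIMESTAMP() functions are used in the two examples above to convert the date and timestamp columns to integers.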
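The conversion step can be sketched in Python as follows. The helper names year_partition and unix_ts are made up for this illustration, and unix_ts assumes the session time zone is UTC; on a real server UNIX_TIMESTAMP() depends on the session time zone, so the numbers may differ.

```python
from bisect import bisect_right
from calendar import timegm
from datetime import date, datetime

def year_partition(separated):
    """Mimic PARTITION BY RANGE (YEAR(separated)) from the employees example."""
    bounds = [1991, 1996, 2001]          # p3 is VALUES LESS THAN MAXVALUE
    return f"p{bisect_right(bounds, separated.year)}"

def unix_ts(text):
    """Seconds since the epoch, ASSUMING the session time zone is UTC."""
    return timegm(datetime.strptime(text, "%Y-%m-%d %H:%M:%S").timetuple())

print(year_partition(date(1995, 12, 31)))   # p1
print(year_partition(date(9999, 12, 31)))   # p3
print(unix_ts("2008-01-01 00:00:00"))       # 1199145600
```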
2. LIST partitioning
Syntax:
partition by list(expr) (
partition pName values in (val1,val2,...,valx),
.....
)
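The main difference between LIST and RANGE partitioning is that in LIST partitioning each partition is defined by a list of discrete values, whereas in RANGE partitioning each partition covers a contiguous range.
Example: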
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY LIST(store_id) (
PARTITION pNorth VALUES IN (3,5,6,9,17),
PARTITION pEast VALUES IN (1,2,10,11,19,20),
PARTITION pWest VALUES IN (4,12,13,14,18),
PARTITION pCentral VALUES IN (7,8,15,16)
);
Note that if an inserted row cannot be assigned to any partition, the insert fails.
Example:
mysql> CREATE TABLE h2 (
-> c1 INT,
-> c2 INT
-> )
-> PARTITION BY LIST(c1) (
-> PARTITION p0 VALUES IN (1, 4, 7),
-> PARTITION p1 VALUES IN (2, 5, 8)
-> );
Query OK, 0 rows affected (0.11 sec)
mysql> INSERT INTO h2 VALUES (3, 5);
ERROR 1525 (HY000): Table has no partition for value 3
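With a transactional storage engine such as InnoDB, this error makes the whole statement fail and roll back, so take care with multi-row inserts: a single unmatched value means that no rows are stored. Watch out for NULL in particular: if NULL does not appear in any value list, inserting it fails as well.
In this situation you can use IGNORE, which skips the rows that match no partition and inserts the rest: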
mysql> INSERT IGNORE INTO h2 VALUES (2, 5), (6, 10), (7, 5), (3, 1), (1, 9);
Query OK, 3 rows affected (0.00 sec)
Records: 5 Duplicates: 2 Warnings: 0
mysql> SELECT * FROM h2;
+------+------+
| c1 | c2 |
+------+------+
| 7 | 5 |
| 1 | 9 |
| 2 | 5 |
+------+------+
3 rows in set (0.00 sec)
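The behaviour shown in the h2 session can be modelled roughly like this in Python. It is a simplified sketch under the assumption of an all-or-nothing (transactional) insert; LIST_PARTITIONS, list_partition and insert are illustrative names, not MySQL APIs.

```python
# Value lists from the h2 example above.
LIST_PARTITIONS = {"p0": {1, 4, 7}, "p1": {2, 5, 8}}

def list_partition(c1):
    """Return the partition whose value list contains c1, or None if there is none."""
    for name, values in LIST_PARTITIONS.items():
        if c1 in values:
            return name
    return None

def insert(rows, ignore=False):
    """Model a multi-row INSERT: all-or-nothing unless ignore=True."""
    placed = [(row, list_partition(row[0])) for row in rows]
    unmatched = [row for row, part in placed if part is None]
    if unmatched and not ignore:
        raise ValueError(f"Table has no partition for value {unmatched[0][0]}")
    return [row for row, part in placed if part is not None]

rows = [(2, 5), (6, 10), (7, 5), (3, 1), (1, 9)]
print(insert(rows, ignore=True))   # [(2, 5), (7, 5), (1, 9)]
```

A row whose c1 is None (NULL) is treated like any other unmatched value here, which mirrors the warning about NULL above.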
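3. COLUMNS partitioning
COLUMNS partitioning comes in two forms, RANGE COLUMNS and LIST COLUMNS, which are variants of RANGE and LIST partitioning respectively. COLUMNS partitioning allows several columns in the partitioning key and also supports non-integer columns. The supported data types are:
- All integer types: TINYINT, SMALLINT, MEDIUMINT, INT (INTEGER) and BIGINT (the same as for RANGE and LIST partitioning). Other numeric types such as DECIMAL or FLOAT are not supported;
- DATE and DATETIME. Other date and time types are not supported;
- String types: CHAR, VARCHAR, BINARY and VARBINARY. TEXT and BLOB are not supported.
3.1 RANGE COLUMNS partitioning
Syntax: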
partition by range columns( column_name [,column_name] [,...] ) (
partition pName values less than ( val [,val] [,...] ) ,
....
)
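RANGE COLUMNS partitioning differs from RANGE partitioning as follows:
- RANGE COLUMNS does not accept expressions, only column names (column_name);
- RANGE COLUMNS accepts more than one column;
- RANGE COLUMNS is not limited to integer columns; string, DATE and DATETIME columns are also supported.
Example: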
CREATE TABLE rcx (
a INT,
b INT,
c CHAR(3),
d INT
)
PARTITION BY RANGE COLUMNS(a,d,c) (
PARTITION p0 VALUES LESS THAN (5,10,'ggg'),
PARTITION p1 VALUES LESS THAN (10,20,'mmm'),
PARTITION p2 VALUES LESS THAN (15,30,'sss'),
PARTITION p3 VALUES LESS THAN (MAXVALUE,MAXVALUE,MAXVALUE)
);
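Note: the number of column_name entries must equal the number of val entries, and they correspond one to one. The comparison is made on the tuple as a whole, not on each value separately, as the following example shows: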
CREATE TABLE rc1 (
a INT,
b INT
)
PARTITION BY RANGE COLUMNS(a, b) (
PARTITION p0 VALUES LESS THAN (5, 12),
PARTITION p1 VALUES LESS THAN (MAXVALUE, MAXVALUE)
);
mysql> INSERT INTO rc1 VALUES (5,10), (5,11), (5,12);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT TABLE_SCHEMA,PARTITION_NAME,TABLE_ROWS
-> FROM INFORMATION_SCHEMA.PARTITIONS
-> WHERE TABLE_NAME = 'rc1';
+--------------+----------------+------------+
| TABLE_SCHEMA | PARTITION_NAME | TABLE_ROWS |
+--------------+----------------+------------+
| p | p0 | 2 |
| p | p1 | 1 |
+--------------+----------------+------------+
2 rows in set (0.00 sec)
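This shows that (5,10) and (5,11) went into p0 while (5,12) went into p1 (the PARTITIONS table in information_schema stores partition information for every table). The tuple comparison can be checked directly: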
mysql> SELECT (5,10) < (5,12), (5,11) < (5,12), (5,12) < (5,12);
+-----------------+-----------------+-----------------+
| (5,10) < (5,12) | (5,11) < (5,12) | (5,12) < (5,12) |
+-----------------+-----------------+-----------------+
| 1 | 1 | 0 |
+-----------------+-----------------+-----------------+
1 row in set (0.00 sec)
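Python compares tuples left to right in the same way, so the rc1 assignment can be reproduced with a short sketch. RC1 and range_columns_partition are illustrative names, and math.inf again stands in for MAXVALUE.

```python
import math

# Bounds of rc1: p0 LESS THAN (5, 12), p1 LESS THAN (MAXVALUE, MAXVALUE).
RC1 = [("p0", (5, 12)), ("p1", (math.inf, math.inf))]

def range_columns_partition(row):
    """Return the first partition whose bound tuple is greater than the row tuple."""
    for name, bound in RC1:
        if row < bound:      # tuples are compared element by element, left to right
            return name
    raise ValueError("no partition for row")

for row in [(5, 10), (5, 11), (5, 12), (4, 99)]:
    print(row, "->", range_columns_partition(row))
```

The last row, (4, 99), lands in p0 even though 99 is greater than 12: the first column already decides the comparison, which is exactly the point of comparing whole tuples.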
Example:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
);
ALTER TABLE employees PARTITION BY RANGE COLUMNS (hired) (
PARTITION p0 VALUES LESS THAN ('1970-01-01'),
PARTITION p1 VALUES LESS THAN ('1980-01-01'),
PARTITION p2 VALUES LESS THAN ('1990-01-01'),
PARTITION p3 VALUES LESS THAN ('2000-01-01'),
PARTITION p4 VALUES LESS THAN ('2010-01-01'),
PARTITION p5 VALUES LESS THAN (MAXVALUE)
);
3.2 LIST COLUMNS partitioning
LIST COLUMNS partitioning is a variant of LIST partitioning; the main difference is the wider range of supported column types.
CREATE TABLE customers_1 (
first_name VARCHAR(25),
last_name VARCHAR(25),
street_1 VARCHAR(30),
street_2 VARCHAR(30),
city VARCHAR(15),
renewal DATE
)
PARTITION BY LIST COLUMNS(city) (
PARTITION pRegion_1 VALUES IN('Oskarshamn', 'Högsby', 'Mönsterås'),
PARTITION pRegion_2 VALUES IN('Vimmerby', 'Hultsfred', 'Västervik'),
PARTITION pRegion_3 VALUES IN('Nässjö', 'Eksjö', 'Vetlanda'),
PARTITION pRegion_4 VALUES IN('Uppvidinge', 'Alvesta', 'Växjo')
);
CREATE TABLE customers_2 (
first_name VARCHAR(25),
last_name VARCHAR(25),
street_1 VARCHAR(30),
street_2 VARCHAR(30),
city VARCHAR(15),
renewal DATE
)
PARTITION BY LIST COLUMNS(renewal) (
PARTITION pWeek_1 VALUES IN('2010-02-01', '2010-02-02', '2010-02-03',
'2010-02-04', '2010-02-05', '2010-02-06', '2010-02-07'),
PARTITION pWeek_2 VALUES IN('2010-02-08', '2010-02-09', '2010-02-10',
'2010-02-11', '2010-02-12', '2010-02-13', '2010-02-14'),
PARTITION pWeek_3 VALUES IN('2010-02-15', '2010-02-16', '2010-02-17',
'2010-02-18', '2010-02-19', '2010-02-20', '2010-02-21'),
PARTITION pWeek_4 VALUES IN('2010-02-22', '2010-02-23', '2010-02-24',
'2010-02-25', '2010-02-26', '2010-02-27', '2010-02-28')
);
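Writing out every date by hand, as in customers_2, gets tedious. One possible way to generate the clauses is a small helper script; week_clauses below is a hypothetical helper written for this article, not part of any MySQL tool, and its output should be reviewed before being pasted into a CREATE TABLE statement.

```python
from datetime import date, timedelta

def week_clauses(start, weeks):
    """Build one 'PARTITION pWeek_n VALUES IN(...)' clause per 7-day block."""
    clauses = []
    for n in range(weeks):
        days = [start + timedelta(days=7 * n + d) for d in range(7)]
        values = ", ".join(f"'{day.isoformat()}'" for day in days)
        clauses.append(f"PARTITION pWeek_{n + 1} VALUES IN({values})")
    return ",\n".join(clauses)

# Four weeks starting on 2010-02-01, matching the customers_2 example.
print(week_clauses(date(2010, 2, 1), 4))
```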
4. HASH partitioning
Syntax:
partition by hash (expr) partitions num
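Here expr is an integer column or an expression that returns an integer, and num is a positive integer giving the number of partitions.
Example: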
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;
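HASH partitioning is used mainly to spread data evenly across the partitions.
The partition number N is computed as:
N = MOD(expr, num)
That is, expr is taken modulo the number of partitions num, and the remainder is the partition in which the row is stored.
Note: the values of expr need enough variety and should be distributed as uniformly (linearly) as possible; otherwise most rows end up in a single partition and partitioning brings little benefit.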
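The modulo rule and the skew problem can both be seen in a short Python sketch. hash_partition is an illustrative name, and the sketch assumes non-negative values (Python's % and MySQL's MOD() differ for negative operands).

```python
from collections import Counter
from datetime import date

def hash_partition(value, num):
    """N = MOD(expr, num): the remainder is the partition number."""
    return value % num

# PARTITION BY HASH( YEAR(hired) ) PARTITIONS 4, for a row hired in 2005
print(hash_partition(date(2005, 9, 15).year, 4))   # 1

# Store ids 1..100 spread evenly over the four partitions ...
print(Counter(hash_partition(s, 4) for s in range(1, 101)))
# ... but ids that are all multiples of 4 end up in partition 0 only.
print(Counter(hash_partition(s, 4) for s in range(4, 404, 4)))
```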
4.1 LINEAR HASH partitioning
Syntax:
partition by linear hash (expr) partitions num
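It differs from HASH partitioning in two ways:
- The syntax contains the extra keyword LINEAR;
- N is no longer computed with a simple modulus but with a linear powers-of-two algorithm; for details see the LINEAR HASH partitioning section of the MySQL Reference Manual.
Compared with HASH partitioning, the advantage is that adding, dropping, merging and splitting partitions is much faster, which can be useful for tables holding very large amounts of data (terabytes). The disadvantage is that data is less likely to be evenly distributed across partitions than with regular hash partitioning.
Example: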
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY LINEAR HASH( YEAR(hired) )
PARTITIONS 4;
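For readers who want to see the powers-of-two logic, here is a Python sketch of the algorithm as described in the MySQL Reference Manual: take the smallest power of two V that is at least num, compute N = expr & (V - 1), and halve V until N is a valid partition number. The function name linear_hash_partition is made up for this illustration.

```python
def linear_hash_partition(value, num):
    """Partition number for PARTITION BY LINEAR HASH(expr) PARTITIONS num."""
    v = 1 << (num - 1).bit_length()   # smallest power of two >= num
    n = value & (v - 1)
    while n >= num:                   # fold back until n is a valid partition number
        v >>= 1
        n &= v - 1
    return n

# With 6 partitions: YEAR = 2003 maps to 3, YEAR = 1998 first gives 6, then folds to 2.
print(linear_hash_partition(2003, 6))   # 3
print(linear_hash_partition(1998, 6))   # 2
```

When num is itself a power of two, as in the example with PARTITIONS 4, the result is the same as the plain modulus.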
5. KEY partitioning
Syntax:
partition by key( [column_name] [,column_name] [,...] ) partitions num
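KEY partitioning is similar to HASH partitioning, with these differences:
- KEY partitioning accepts only columns, not expressions;
- KEY partitioning supports multiple columns;
- KEY partitioning supports column types other than integers.
Example: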
CREATE TABLE k1 (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(20)
)
PARTITION BY KEY() -- uses the primary key by default
PARTITIONS 2;
CREATE TABLE k1 (
id INT NOT NULL,
name VARCHAR(20),
UNIQUE KEY (id)
)
PARTITION BY KEY() -- uses the unique key column
PARTITIONS 2;
CREATE TABLE tm1 (
s1 CHAR(32) PRIMARY KEY
)
PARTITION BY KEY(s1)
PARTITIONS 10;
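The idea of hashing one or more columns of any supported type into a partition number can be illustrated as below. This is only a conceptual sketch: zlib.crc32 is a stand-in chosen for the example, MySQL uses its own internal hashing function for KEY partitioning, and the partition numbers printed here will not match what a real server assigns. key_partition is an illustrative name.

```python
import zlib

def key_partition(columns, num):
    """Map a tuple of column values to a partition number in [0, num).

    zlib.crc32 is only a stand-in: MySQL uses its own internal hash
    function for KEY partitioning, which this does not reproduce.
    """
    data = "\x1f".join(str(c) for c in columns).encode("utf-8")
    return zlib.crc32(data) % num

print(key_partition(("abc",), 10))         # a CHAR key, as in table tm1
print(key_partition((42, "Smith"), 10))    # a hypothetical two-column key
```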
5.1 LINEAR KEY partitioning
The logic is analogous to LINEAR HASH partitioning.
Example:
CREATE TABLE tk (
col1 INT NOT NULL,
col2 CHAR(5),
col3 DATE
)
PARTITION BY LINEAR KEY (col1)
PARTITIONS 3;
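III. Notes
- If the table has a primary key (or any unique key), every column used in the partitioning expression must be part of that key.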
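MySQL requires that every column used in the partitioning expression be part of every unique key of the table, the primary key included. A quick way to reason about a planned schema is a check like the following sketch; partition_key_allowed is a hypothetical helper, and the column lists are supplied by hand rather than read from a server.

```python
def partition_key_allowed(partition_columns, unique_keys):
    """True if every partitioning column appears in every unique key.

    unique_keys is a list of column-name lists (the primary key counts as one).
    A table without any unique key has no restriction.
    """
    needed = set(partition_columns)
    return all(needed <= set(key) for key in unique_keys)

print(partition_key_allowed(["store_id"], []))                    # True: no keys at all
print(partition_key_allowed(["store_id"], [["id"]]))              # False: PRIMARY KEY (id) only
print(partition_key_allowed(["store_id"], [["id", "store_id"]]))  # True
```

So an employees table with PRIMARY KEY (id) could not be partitioned by store_id unless the key is changed to (id, store_id).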