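This article summarizes what I have learned recently while studying horizontally partitioned tables in MySQL. It lists the common operations on partitioned tables and compares read and write performance with and without partitioning, using real data.
I read many articles online, but most are too conceptual: they concentrate on listing the advantages of partitioned tables and pay little attention to hands-on operation, much like a university professor's paper. The only really good one I found is worth sharing: http://fanqiang.chinaunix.net/db/mysql/2006-05-08/4135.shtml.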
MySQL partitioned table operations (this example partitions by month):
1. Create a partitioned table:
-- Partitioning works with most storage engines; change ENGINE as needed.
-- Replace table_name and COLLECTTIME with your own table and datetime column.
CREATE TABLE `table_name` (
`EQUIPMENTID` char(17) NOT NULL,
`ATTRIBUTEID` char(4) NOT NULL,
`VALUE` varchar(20) NOT NULL,
`COLLECTTIME` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE (to_days(`COLLECTTIME`))
(PARTITION pmin VALUES LESS THAN (to_days('2010-01-01')),
PARTITION p201001 VALUES LESS THAN (to_days('2010-02-01')),
PARTITION p201002 VALUES LESS THAN (to_days('2010-03-01')),
PARTITION p201003 VALUES LESS THAN (to_days('2010-04-01')),
PARTITION p201004 VALUES LESS THAN (to_days('2010-05-01')),
PARTITION p201005 VALUES LESS THAN (to_days('2010-06-01')),
PARTITION p201006 VALUES LESS THAN (to_days('2010-07-01')),
PARTITION p201007 VALUES LESS THAN (to_days('2010-08-01')),
PARTITION p201008 VALUES LESS THAN (to_days('2010-09-01')),
PARTITION p201009 VALUES LESS THAN (to_days('2010-10-01')),
PARTITION p201010 VALUES LESS THAN (to_days('2010-11-01')),
PARTITION p201011 VALUES LESS THAN (to_days('2010-12-01')),
PARTITION p201012 VALUES LESS THAN (to_days('2011-01-01')),
PARTITION pmax VALUES LESS THAN MAXVALUE);
2. Partition an existing table:
-- Replace table_name and time_column with your own table and datetime column.
ALTER TABLE table_name
PARTITION BY RANGE (to_days(`time_column`))
(PARTITION pmin VALUES LESS THAN (to_days('2010-01-01')),
PARTITION p201001 VALUES LESS THAN (to_days('2010-02-01')),
PARTITION p201002 VALUES LESS THAN (to_days('2010-03-01')),
PARTITION p201003 VALUES LESS THAN (to_days('2010-04-01')),
PARTITION p201004 VALUES LESS THAN (to_days('2010-05-01')),
PARTITION p201005 VALUES LESS THAN (to_days('2010-06-01')),
PARTITION p201006 VALUES LESS THAN (to_days('2010-07-01')),
PARTITION p201007 VALUES LESS THAN (to_days('2010-08-01')),
PARTITION p201008 VALUES LESS THAN (to_days('2010-09-01')),
PARTITION p201009 VALUES LESS THAN (to_days('2010-10-01')),
PARTITION p201010 VALUES LESS THAN (to_days('2010-11-01')),
PARTITION p201011 VALUES LESS THAN (to_days('2010-12-01')),
PARTITION p201012 VALUES LESS THAN (to_days('2011-01-01')),
PARTITION pmax VALUES LESS THAN MAXVALUE);
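3. Drop a specific partition (dropping a partition deletes the data stored in it, so back it up first):
-- p0 stands for the name of the partition to drop.
ALTER TABLE table_name DROP PARTITION p0;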
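The backup step can be sketched as follows. This is only an illustration: it assumes the partitioned table is called part_data (the name used in the benchmark later in this article), that the partition to drop is pmin, which holds everything before 2010-01-01 in the layout above, and part_data_pmin_backup is a made-up name for the backup table.

```sql
-- Copy the rows of the partition into a plain (non-partitioned) backup table.
-- part_data_pmin_backup is a hypothetical name.
CREATE TABLE part_data_pmin_backup AS
SELECT *
FROM part_data
WHERE COLLECTTIME < '2010-01-01';

-- Compare the two row counts before dropping anything.
SELECT COUNT(*) FROM part_data_pmin_backup;
SELECT COUNT(*) FROM part_data WHERE COLLECTTIME < '2010-01-01';

-- Only then drop the partition together with its data.
ALTER TABLE part_data DROP PARTITION pmin;
```

The copy selects rows with a WHERE range rather than by partition name, since selecting directly from a named partition is not available in MySQL 5.1.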
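4. Append partitions
The MAXVALUE partition has to remain the last one, so drop it first, then add the new partitions and recreate the MAXVALUE partition in the same statement. Back up the data in the MAXVALUE partition before dropping it.
ALTER TABLE table_name DROP PARTITION pmax;
ALTER TABLE table_name
ADD PARTITION (
PARTITION p201201 VALUES LESS THAN (to_days('2012-02-01')),
PARTITION pmax VALUES LESS THAN MAXVALUE);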
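As an alternative that I did not use in the tests below, MySQL also provides ALTER TABLE ... REORGANIZE PARTITION, which splits an existing partition and moves its rows into the new ones, so the data in pmax does not have to be dropped and restored. A minimal sketch, assuming the table is named part_data:

```sql
-- Split pmax into a new monthly partition plus a new catch-all partition.
-- Rows already stored in pmax are redistributed, not deleted.
ALTER TABLE part_data
REORGANIZE PARTITION pmax INTO (
    PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

The statement has to copy the rows of pmax, so it is cheapest while pmax is still small.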
5. View the partition information of a table
SELECT
partition_name part,
partition_expression expr,
partition_description descr,
table_rows
FROM
INFORMATION_SCHEMA.PARTITIONS
WHERE
TABLE_SCHEMA = schema()
AND
TABLE_NAME = 'table_name';
6. View which partitions a query touches
EXPLAIN PARTITIONS
SELECT … FROM table_name WHERE …;
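For a concrete picture, here is a sketch of such a check against the part_data table used in the benchmark below. The date range is just an example value.

```sql
-- MySQL 5.1 to 5.6 syntax. EXPLAIN PARTITIONS is deprecated in 5.7,
-- where plain EXPLAIN already reports partitions, and removed in 8.0.
EXPLAIN PARTITIONS
SELECT COUNT(*)
FROM part_data
WHERE COLLECTTIME >= '2010-05-01'
  AND COLLECTTIME <  '2010-06-01';
```

The partitions column of the output lists the partitions the optimizer intends to read. If it lists every partition, the WHERE clause is not being used for pruning.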
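Performance comparison:
1. Test environment
CPU: Intel Pentium Dual-Core E5300
Hard disk: Western Digital Blue (320 GB, 7200 RPM, 16 MB cache)
Memory: Nanya Elixir DDR2 800 MHz 1 GB + Samsung DDR2 800 MHz 1 GB
Operating system: Windows XP
MySQL version: 5.1.57 (partitioned tables are supported from 5.1 onward)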
2. Table information
Table structure:
Name | Type | Length | Key
EQUIPMENTID | char | 17 | Primary key 1
ATTRIBUTEID | char | 4 | Primary key 2
VALUE | varchar | 20 |
COLLECTTIME | datetime | | Primary key 3
Total number of records: about 5.8 million
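A table with this structure could be declared roughly as follows. This is a sketch, not the exact DDL used in the tests: the engine and character set are assumptions taken from the creation example earlier, and only a few partitions are written out.

```sql
CREATE TABLE part_data (
    `EQUIPMENTID` CHAR(17)    NOT NULL,
    `ATTRIBUTEID` CHAR(4)     NOT NULL,
    `VALUE`       VARCHAR(20) NOT NULL,
    `COLLECTTIME` DATETIME    NOT NULL,
    -- MySQL requires every primary or unique key of a partitioned table
    -- to include all columns used in the partitioning expression,
    -- which is why COLLECTTIME is part of the key.
    PRIMARY KEY (`EQUIPMENTID`, `ATTRIBUTEID`, `COLLECTTIME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1   -- assumed, not stated for the test table
PARTITION BY RANGE (TO_DAYS(`COLLECTTIME`)) (
    PARTITION pmin    VALUES LESS THAN (TO_DAYS('2010-01-01')),
    PARTITION p201001 VALUES LESS THAN (TO_DAYS('2010-02-01')),
    -- further monthly partitions go here, one per month
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```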
Partition information (in the original table the partitions mainly used in the tests were highlighted in red):
part | expr | descr | table_rows
pmin | to_days(COLLECTTIME) | 734138 | 2686
p201001 | to_days(COLLECTTIME) | 734169 | 2511883
p201002 | to_days(COLLECTTIME) | 734197 | 192497
p201003 | to_days(COLLECTTIME) | 734228 | 811103
p201004 | to_days(COLLECTTIME) | 734258 | 82894
p201005 | to_days(COLLECTTIME) | 734289 | 109297
p201006 | to_days(COLLECTTIME) | 734319 | 555065
p201007 | to_days(COLLECTTIME) | 734350 | 742949
p201008 | to_days(COLLECTTIME) | 734381 | 525900
p201009 | to_days(COLLECTTIME) | 734411 | 89
p201010 | to_days(COLLECTTIME) | 734442 | 71665
p201011 | to_days(COLLECTTIME) | 734472 | 85964
p201012 | to_days(COLLECTTIME) | 734503 | 1612
p201101 | to_days(COLLECTTIME) | 734534 | 176
p201102 | to_days(COLLECTTIME) | 734562 | 253
p201103 | to_days(COLLECTTIME) | 734593 | 44824
p201104 | to_days(COLLECTTIME) | 734623 | 62324
p201105 | to_days(COLLECTTIME) | 734654 | 50658
p201106 | to_days(COLLECTTIME) | 734684 | 0
p201107 | to_days(COLLECTTIME) | 734715 | 0
p201108 | to_days(COLLECTTIME) | 734746 | 0
p201109 | to_days(COLLECTTIME) | 734776 | 0
p201110 | to_days(COLLECTTIME) | 734807 | 0
p201111 | to_days(COLLECTTIME) | 734837 | 0
p201112 | to_days(COLLECTTIME) | 734868 | 0
p201201 | to_days(COLLECTTIME) | 734899 | 0
p201202 | to_days(COLLECTTIME) | 734928 | 0
pmax | to_days(COLLECTTIME) | MAXVALUE | 921
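The descr values are the upper bounds of the partitions expressed as day numbers, that is, the result of TO_DAYS() on each boundary date. They can be converted back and forth with the built-in TO_DAYS() and FROM_DAYS() functions, for example:

```sql
-- Boundary date to day number
SELECT TO_DAYS('2010-01-01');   -- 734138, the descr value of pmin

-- Day number back to boundary date
SELECT FROM_DAYS(734169);       -- 2010-02-01, the upper bound of p201001
```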
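3. Query efficiency comparison
Tables compared: nopart_data (not partitioned) and part_data (partitioned)
Query: select count(*) from table_name where COLLECTTIME > start_time and COLLECTTIME < end_time
Each query time is the average of 3 runs.
Results: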
Start time | End time | Rows counted | Time without partitions | Time with partitions | Partitions touched
All | All | 5848859 | 6.26s | 9.58s | All
2010-5-1 | 2010-6-1 | 109086 | 7.04s | 0.48s | pmin,p201005
2010-6-1 | 2010-7-1 | 554695 | 8.34s | 0.38s | pmin,p201006
2010-7-1 | 2010-8-1 | 742565 | 7.57s | 0.43s | pmin,p201007
2010-5-1 | 2010-7-1 | 663781 | 7.07s | 0.51s | pmin,p201005,p201006
2010-6-1 | 2010-8-1 | 1297260 | 6.84s | 1.93s | pmin,p201006,p201007
2010-5-1 | 2010-8-1 | 1406346 | 6.97s | 2.30s | pmin,p201006,p201007,p201008
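Summary:
1) The partitioned table is clearly faster for queries restricted by time range (roughly 0.4s to 2.3s against about 7s), but slower for the full count (9.58s against 6.26s). Queries that span several partitions also take longer, so the granularity of the partitions needs care.
2) Every query also reads pmin (the first partition), so keep the amount of data in that partition as small as possible.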
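A likely explanation for the extra read of pmin, which I have not verified in these tests, is that TO_DAYS() returns NULL for invalid dates and RANGE partitioning stores NULL in the lowest partition, so the optimizer cannot rule that partition out. Under that assumption, a commonly used workaround is to put an always-empty partition in front of pmin. The partition name pnull is made up for this sketch:

```sql
-- Split pmin so that the lowest partition can only ever receive NULL results.
-- TO_DAYS() never returns a negative number, so pnull stays empty
-- and costs almost nothing when it is read.
ALTER TABLE part_data
REORGANIZE PARTITION pmin INTO (
    PARTITION pnull VALUES LESS THAN (0),
    PARTITION pmin  VALUES LESS THAN (TO_DAYS('2010-01-01'))
);
```

Running EXPLAIN PARTITIONS on a range query afterwards shows whether the first partition in the list is now pnull rather than pmin.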
4. Write efficiency comparison
COLLECTTIME | Time without partitions | Time with partitions
2010-5-22 15:36 | 0.05s | 0.03s
2010-6-22 15:36 | 0.02s | 0.05s
2010-7-22 15:36 | 0.03s | 0.03s
Summary:
1) Partitioning has no significant effect on inserting a single row.
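A single-row insert of the kind measured here looks like the following. The EQUIPMENTID, ATTRIBUTEID and VALUE values are invented for illustration; only the COLLECTTIME value comes from the table above.

```sql
-- The mysql command-line client prints the elapsed time after each statement.
-- Column values other than COLLECTTIME are hypothetical sample data.
INSERT INTO part_data (`EQUIPMENTID`, `ATTRIBUTEID`, `VALUE`, `COLLECTTIME`)
VALUES ('EQ000000000000001', 'A001', '12.5', '2010-05-22 15:36:00');
```

MySQL picks the target partition from TO_DAYS(COLLECTTIME), so this row is stored in p201005 without the statement naming a partition.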
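That is the summary of my first hands-on experience with MySQL. I have no deep insights to offer, only a small sense of accomplishment that I wanted to share.
Some questions about partitioned tables remain open:
1. Can the partitions of a table be placed on different hard disks, and does this work with InnoDB?
2. Can a table be partitioned horizontally on several conditions, similar to group by column1, column2...?
3. Can different partitions use different storage engines, for example InnoDB for the partitions currently in use and MyISAM for the old ones?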