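Preface
- MySQL: 5.7.17
- Storage engine: InnoDB
- Purpose: This article measures the execution time of deduplication queries that use DISTINCT and GROUP BY on a column, with and without an index on that column and with different numbers of distinct values, in order to compare the deduplication performance of the two in each scenario. The MySQL query cache was disabled during the experiment.
- Experiment table:
Table | Rows | Index on queried column | Distinct values in queried column | DISTINCT | GROUP BY |
---|---|---|---|---|---|
tab_1 | 100000 | N | 3 | | |
tab_2 | 100000 | Y | 3 | | |
tab_3 | 100000 | N | 10000 | | |
tab_4 | 100000 | Y | 10000 | | |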
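The article does not say how the query cache was disabled. The following is a minimal sketch of the usual options on MySQL 5.7, assuming an account that is allowed to set global variables; it is not necessarily the method used in this experiment.

```sql
-- Inspect the current query cache settings
-- (MySQL 5.7; the query cache no longer exists in MySQL 8.0).
SHOW VARIABLES LIKE 'query_cache%';

-- Option 1: disable the cache server-wide at runtime.
-- The new global value applies to connections opened afterwards.
SET GLOBAL query_cache_type = OFF;
SET GLOBAL query_cache_size = 0;

-- Option 2: disable the cache for the current session only.
SET SESSION query_cache_type = OFF;

-- Option 3: bypass the cache for a single statement
-- (run this once the test tables from step 1 exist).
SELECT SQL_NO_CACHE `value` FROM tab_1 GROUP BY `value`;
```

Whichever option is used, running `SHOW VARIABLES LIKE 'query_cache%';` again before timing confirms the setting in the session that executes the test queries.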
Procedure
1) Create the test tables
Table creation statements:
DROP TABLE IF EXISTS `tab_1`;
CREATE TABLE `tab_1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `tab_2`;
CREATE TABLE `tab_2` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_value` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `tab_3`;
CREATE TABLE `tab_3` LIKE `tab_1`;
DROP TABLE IF EXISTS `tab_4`;
CREATE TABLE `tab_4` LIKE `tab_2`;
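Because `CREATE TABLE ... LIKE` copies column definitions and indexes, tab_3 should end up without a secondary index and tab_4 with `idx_value`. A quick check, offered here as an optional sanity step rather than part of the original experiment, could look like this:

```sql
-- tab_3 is cloned from tab_1, so only the PRIMARY key should be listed.
SHOW INDEX FROM tab_3;

-- tab_4 is cloned from tab_2, so PRIMARY and idx_value should both be listed.
SHOW INDEX FROM tab_4;

-- Alternatively, print the full definition of a table.
SHOW CREATE TABLE tab_4;
```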
2) Generate the test data
Stored procedure for inserting rows:
DROP PROCEDURE IF EXISTS generateRandomData;
delimiter $$
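-- tblName: target table; field: target column; num: upper bound of the inserted values; count: number of rows to insert
CREATE PROCEDURE generateRandomData(IN tblName VARCHAR(30),IN field VARCHAR(30),IN num INT UNSIGNED,IN count INT UNSIGNED)
BEGIN
-- Loop counter
DECLARE i INT UNSIGNED DEFAULT 1;
-- Insert count rows, each holding a random integer between 1 and num
w1:WHILE i<=count DO
set i=i+1;
set @val = FLOOR(RAND()*num+1);
set @statement = CONCAT('INSERT INTO ',tblName,'(`',field,'`) VALUES(',@val,')');
PREPARE stmt FROM @statement;
EXECUTE stmt;
-- Release the prepared statement before the next iteration prepares a new one
DEALLOCATE PREPARE stmt;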
END WHILE w1;
END $$
delimiter ;
Call the procedure to generate random test data:
call generateRandomData('tab_1','value',3,100000);
INSERT INTO tab_2 SELECT * FROM tab_1;
call generateRandomData('tab_3','value',10000,100000);
INSERT INTO tab_4 SELECT * FROM tab_3;
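Before timing anything, it can help to confirm that each table really holds 100000 rows and the intended number of distinct values. The sketch below is an optional check that is not part of the original write-up; the aliases `tbl`, `total_rows` and `distinct_values` are illustrative names. Because the values are drawn at random, tab_3 and tab_4 may contain slightly fewer than 10000 distinct values.

```sql
-- Row count and number of distinct values for each test table.
SELECT 'tab_1' AS tbl, COUNT(*) AS total_rows, COUNT(DISTINCT `value`) AS distinct_values FROM tab_1
UNION ALL
SELECT 'tab_2', COUNT(*), COUNT(DISTINCT `value`) FROM tab_2
UNION ALL
SELECT 'tab_3', COUNT(*), COUNT(DISTINCT `value`) FROM tab_3
UNION ALL
SELECT 'tab_4', COUNT(*), COUNT(DISTINCT `value`) FROM tab_4;

-- Optional: refresh index statistics after the bulk load.
ANALYZE TABLE tab_1, tab_2, tab_3, tab_4;
```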
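3) Run the queries and record the execution times
The queries are listed below; the measured execution times are given in the results table in step 4: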
SELECT DISTINCT(`value`) FROM tab_1;
SELECT `value` FROM tab_1 GROUP BY `value`;
SELECT DISTINCT(`value`) FROM tab_2;
SELECT `value` FROM tab_2 GROUP BY `value`;
SELECT DISTINCT(`value`) FROM tab_3;
SELECT `value` FROM tab_3 GROUP BY `value`;
SELECT DISTINCT(`value`) FROM tab_4;
SELECT `value` FROM tab_4 GROUP BY `value`;
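The article does not state how the execution times were captured. One possible way, shown here only as a sketch, is session-level statement profiling, which is deprecated but still available in MySQL 5.7. The elapsed time that the mysql client prints after each statement is another option.

```sql
-- Enable statement profiling for the current session.
SET profiling = 1;

-- Run the pair of queries to compare (repeat for tab_2, tab_3 and tab_4).
SELECT DISTINCT(`value`) FROM tab_1;
SELECT `value` FROM tab_1 GROUP BY `value`;

-- List the profiled statements with their durations in seconds.
SHOW PROFILES;

-- Turn profiling off again.
SET profiling = 0;
```

Running each query several times and comparing the typical duration, rather than a single run, reduces the effect of buffer pool warm-up on differences of a few milliseconds.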
4) Results
Table | Rows | Index on queried column | Distinct values in queried column | DISTINCT | GROUP BY |
---|---|---|---|---|---|
tab_1 | 100000 | N | 3 | 0.058s | 0.059s |
tab_2 | 100000 | Y | 3 | 0.030s | 0.027s |
tab_3 | 100000 | N | 10000 | 0.072s | 0.073s |
tab_4 | 100000 | Y | 10000 | 0.047s | 0.049s |
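The MySQL manual describes DISTINCT as, in most cases, a special case of GROUP BY, which would be consistent with timings this close. One way to check this on a given server, added here as a suggestion and not taken from the original experiment, is to compare the execution plans of each query pair:

```sql
-- Without an index on `value`: compare the type and Extra columns of both plans.
EXPLAIN SELECT DISTINCT(`value`) FROM tab_1;
EXPLAIN SELECT `value` FROM tab_1 GROUP BY `value`;

-- With idx_value: check whether both plans use the index (key column).
EXPLAIN SELECT DISTINCT(`value`) FROM tab_2;
EXPLAIN SELECT `value` FROM tab_2 GROUP BY `value`;
```

If the two plans in a pair match in access type, key and the main Extra flags, similar execution times are the expected outcome.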
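Conclusion
In MySQL 5.7.17, DISTINCT and GROUP BY perform about the same for deduplication: across all four test tables, the measured times differ by at most 0.003s.
If anything in the procedure or the conclusion falls short, corrections are welcome. This conclusion is for reference only.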