Using the bit type in MySQL (1)

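MySQL's bit type supports lengths from 1 to 64 bits. We can use bit(1) to store a yes/no flag ("is it xxx?"), although tinyint is what we usually reach for.

But if we have several of these "is it xxx" columns, we normally need one column per flag, and one index per column as well.

Columns that only hold 1 and 0 are highly repetitive, so whether an index on them is used at all depends on the optimizer's analysis and trade-offs.

Let's take a simple requirement with 3 such columns and compare queries that use bit with queries that use tinyint.

We call the 3 types t1, t2, and t3. Normally we would create these 3 columns and set a column to 1 when that type is true, otherwise to 0.

MySQL treats the bit type as an integer. We can apply bitwise operations to it, or compare it directly with integers.

If we represent the flags with a single bit(3), the value is b'xxx', with each type occupying one bit. The corresponding integer value is: t1 * 1 + t2 * 2 + t3 * 4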
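As a quick illustration of that mapping, the statements below are a minimal sketch that can be run without any table. The column aliases are made up for this example.

```sql
-- Adding 0 puts a bit literal into a numeric context.
-- t1 = 1, t2 = 1, t3 = 0  ->  1*1 + 1*2 + 0*4 = 3
SELECT b'011' + 0 AS as_integer;          -- 3

-- Test individual flags with a bitwise AND against the mask for each bit.
SELECT (b'011' & b'001') <> 0 AS has_t1,  -- 1
       (b'011' & b'010') <> 0 AS has_t2,  -- 1
       (b'011' & b'100') <> 0 AS has_t3;  -- 0

-- BIN() renders an integer as a binary string, handy when reading stored values.
SELECT BIN(5) AS bits;                    -- '101': t1 and t3 are set
```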

First, define a table:

USE test;
CREATE TABLE `test`.`test_bit`(
    `id` int unsigned NOT NULL AUTO_INCREMENT,
    `types` bit(3) NOT NULL COMMENT 'Types (000): type 1 uses the first bit, type 2 the second bit, type 3 the third bit (counting from right to left)',
    `t1` tinyint NOT NULL COMMENT 'Is type 1. Yes=1, no=0',
    `t2` tinyint NOT NULL COMMENT 'Is type 2. Yes=1, no=0',
    `t3` tinyint NOT NULL COMMENT 'Is type 3. Yes=1, no=0',
    PRIMARY KEY(`id`)
) ENGINE=InnoDB COMMENT 'bit operation test';
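Because this test table stores the same flags twice (once in `types`, once in `t1`..`t3`), any later change has to touch both representations. The sketch below assumes a row with `id` = 1 exists, which is purely illustrative, and shows one way to set and clear a single flag with bitwise operators, plus a consistency check between the two representations.

```sql
-- Set the type-2 flag (second bit from the right) and keep t2 in step.
UPDATE `test`.`test_bit`
SET `types` = `types` | b'010', `t2` = 1
WHERE `id` = 1;

-- Clear the type-2 flag by keeping only bits 1 and 3.
UPDATE `test`.`test_bit`
SET `types` = `types` & b'101', `t2` = 0
WHERE `id` = 1;

-- Count rows where the bit column and the three tinyint columns disagree.
-- This should be 0 as long as both representations are maintained together.
SELECT COUNT(*) AS mismatches
FROM `test`.`test_bit`
WHERE `types` + 0 <> `t1` * 1 + `t2` * 2 + `t3` * 4;
```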

Now create a stored procedure to bulk-insert data:

USE test;
DROP PROCEDURE IF EXISTS `mysp_test_bit`;
DELIMITER $$
CREATE PROCEDURE `mysp_test_bit`(IN count int)
BEGIN
    DECLARE i int default 0;
    -- DECLARE rnd int;
    SET i = 0;
    WHILE i < count do
        INSERT INTO `test`.`test_bit`(`types`,`t1`,`t2`,`t3`)
        SELECT a.t , a.t1, a.t2, a.t3 
        FROM (
            SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- repeat 1
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- repeat 2
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- repeat 3
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- repeat 4
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- repeat 5
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- repeat 6
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
            -- repeat 7
            UNION ALL SELECT b'001' AS t,1 AS t1,0 AS t2,0 AS t3
            UNION ALL SELECT b'010' AS t,0 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'100' AS t,0 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'011' AS t,1 AS t1,1 AS t2,0 AS t3 UNION ALL SELECT b'101' AS t,1 AS t1,0 AS t2,1 AS t3
            UNION ALL SELECT b'110' AS t,0 AS t1,1 AS t2,1 AS t3 UNION ALL SELECT b'111' AS t,1 AS t1,1 AS t2,1 AS t3
        ) AS a  ORDER BY rand() LIMIT 0,50; -- randomly take 50 of the 56 rows
        SET i = i +1;
    END WHILE;
END$$
DELIMITER ;

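In this stored procedure, we repeat the values 1-7 eight times to get 56 rows, then randomly take 50 of them (the case where t1, t2, and t3 are all 0 is ignored for now).

Run it 20,000 times to generate 1,000,000 rows: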

CALL mysp_test_bit(20000);

With the 1,000,000 rows inserted, add the indexes:

ALTER TABLE `test`.`test_bit` ADD INDEX IX_types(`types`), ADD INDEX IX_t1(`t1`), ADD INDEX IX_t2(`t2`), ADD INDEX IX_t3(`t3`);
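To see how repetitive the indexed values are, the index statistics can be inspected once the indexes exist. This is only a sketch: the `Cardinality` column reported by `SHOW INDEX` is an estimate, so the exact figures may vary, but for this data it should be close to 7 for `IX_types` and close to 2 for each of `IX_t1`, `IX_t2`, and `IX_t3`.

```sql
-- Refresh the index statistics first so the estimates reflect the loaded data.
ANALYZE TABLE `test`.`test_bit`;

-- List every index on the table together with its estimated cardinality.
SHOW INDEX FROM `test`.`test_bit`;
```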

Let's look at how the data is distributed: [figure: data distribution]
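One way to see the distribution is a grouped count. The query below is a sketch (the aliases are chosen for this example). Since the procedure samples evenly from the 7 possible values, each value should account for roughly one seventh of the rows.

```sql
SELECT `types` + 0                      AS types_value,  -- integer form, 1..7
       LPAD(BIN(`types` + 0), 3, '0')   AS bits,         -- e.g. '011'
       COUNT(*)                         AS row_count
FROM `test`.`test_bit`
GROUP BY `types`
ORDER BY types_value;
```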

First, queries where a single condition must match

Start with t1=1

The corresponding bit pattern is b'xx1', which covers 001, 011, 101, and 111, i.e. the values 1, 3, 5, 7

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'001' =1;-- rows where bit 1 is set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(1,3,5,7);-- rows where bit 1 is set; takes about as long as the query above
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t1=1;-- rows where bit 1 is set

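Execution times over 4 runs: [figure: query timings]

In the figure above, the execution times of the third query form, the commonly used one, are outlined in red to make them easy to pick out.

Querying for t2=1

The corresponding bit pattern is b'x1x', which covers 010, 011, 110, and 111, i.e. the values 2, 3, 6, 7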

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'010' =2;-- rows where bit 2 is set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(2,3,6,7);-- rows where bit 2 is set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t2=1;-- rows where bit 2 is set

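Execution times over 4 runs: [figure: query timings]

Querying for t3=1

The corresponding bit pattern is b'1xx', which covers 100, 101, 110, and 111, i.e. the values 4, 5, 6, 7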

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'100' =4;-- rows where bit 3 is set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(4,5,6,7);-- rows where bit 3 is set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t3=1;-- rows where bit 3 is set

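Execution times over 4 runs: [figure: query timings]

Overall, when querying on a single condition, the separate-column approach is more efficient, because the bit approach either applies an operation to the column (generally not recommended) or uses an IN query.

Now let's query for rows where 2 columns match at the same time

Querying for t1 and t2 both equal to 1

The corresponding bit pattern is b'x11', which covers 011 and 111, i.e. the values 3, 7

Total matching rows: 285767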

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'011' =3;-- rows where bits 1 and 2 are both set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(3,7);-- rows where bits 1 and 2 are both set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t1=1 AND t2=1;-- rows where bits 1 and 2 are both set

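Execution times over 4 runs: [figure: query timings]

Querying for t2 and t3 both equal to 1

The corresponding bit pattern is b'11x', which covers 110 and 111, i.e. the values 6, 7

Total matching rows: 285683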

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'110' =6;-- rows where bits 2 and 3 are both set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(6,7);-- rows where bits 2 and 3 are both set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t2=1 AND t3=1;-- rows where bits 2 and 3 are both set

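Execution times over 4 runs: [figure: query timings]

Querying for t1 and t3 both equal to 1

The corresponding bit pattern is b'1x1', which covers 101 and 111, i.e. the values 5, 7

Total matching rows: 285641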

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'101' =5;-- rows where bits 1 and 3 are both set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` IN(5,7);-- rows where bits 1 and 3 are both set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t1=1 AND t3=1;-- rows where bits 1 and 3 are both set

Execution times over 4 runs: [figure: query timings]

Querying for t1, t2, and t3 all equal to 1

Total matching rows: 142836

SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` & b'111' =7;-- rows where bits 1, 2, and 3 are all set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE `types` = 7;-- rows where bits 1, 2, and 3 are all set
SELECT SQL_NO_CACHE COUNT(*) AS e FROM `test`.`test_bit` WHERE t1=1 AND t2=1 AND t3=1;-- rows where bits 1, 2, and 3 are all set

Execution times over 4 runs: [figure: query timings]
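As a cross-check that the three predicates really select the same rows, they can be counted side by side in a single pass over the table. This is a sketch with made-up aliases. It relies on MySQL evaluating a comparison to 1 or 0, so `SUM()` over a comparison counts the rows where it holds. All three counts should be equal, matching the total quoted above.

```sql
SELECT SUM((`types` & b'111') = 7)             AS by_mask,
       SUM(`types` = 7)                        AS by_value,
       SUM(t1 = 1 AND t2 = 1 AND t3 = 1)       AS by_columns
FROM `test`.`test_bit`;
```

This query is for verification only, not for timing, since it reads every row regardless of the indexes.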

Creating composite indexes on t1, t2, and t3

Using pairwise composite indexes, run the single-column, two-column, and three-column queries

ALTER TABLE `test`.`test_bit` DROP INDEX IX_t1, DROP INDEX IX_t2, DROP INDEX IX_t3;
ALTER TABLE `test`.`test_bit`  ADD INDEX IX_t1t2(t1,t2), ADD INDEX IX_t1t3(t1,t3), ADD INDEX IX_t2t3(t2,t3);

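Queries matching on a single column: [figure: query timings]

The single-column queries were run only once.

Queries matching on two columns at once: [figure: query timings]

The two-column queries were run only once.

Queries matching on all three columns at once: [figure: query timings]

The three-column queries were run 3 times.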

Using one composite index on all 3 columns

ALTER TABLE `test`.`test_bit` DROP INDEX IX_t1t2, DROP INDEX IX_t1t3, DROP INDEX IX_t2t3;
ALTER TABLE `test`.`test_bit`  ADD INDEX IX_t1t2t3(t1,t2,t3);
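With a single index on (t1, t2, t3), the leftmost-prefix rule matters: a condition that includes the leading column t1 can seek into the index by key value, while a condition on t2 and t3 alone cannot do so in the same way. The sketch below compares the two cases with EXPLAIN. The plan actually chosen depends on the MySQL version and on the table statistics, so treat the comments as expectations rather than guaranteed output.

```sql
-- Leading columns present: the index can be searched on (t1, t2).
EXPLAIN SELECT COUNT(*) FROM `test`.`test_bit` WHERE t1 = 1 AND t2 = 1;

-- No condition on the leading column t1: no key lookup on the index prefix,
-- although the optimizer may still scan the whole index because it covers the query.
EXPLAIN SELECT COUNT(*) FROM `test`.`test_bit` WHERE t2 = 1 AND t3 = 1;
```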

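Queries matching on a single column: [figure: query timings]

The single-column queries were run only once.

Queries matching on two columns at once: [figure: query timings]

The two-column queries were run only once.

Queries matching on all three columns at once: [figure: query timings]

The three-column queries were run 3 times.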

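You can also look at the EXPLAIN output for the 3 query forms, taking the 3-column composite index as an example:

  • types & b'011' =3 is executed as a Full Index Scan
  • types IN(3,7) is executed as an Index Range Scan
  • t1=1 AND t2=1 is executed as a Non-Unique Key Lookup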
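The three plans can be reproduced with plain EXPLAIN statements, as sketched below. The labels in the list are the names MySQL Workbench's visual explain uses. In the tabular EXPLAIN output they should correspond to the `type` values `index`, `range`, and `ref` respectively, assuming the optimizer picks the same plans on your data.

```sql
-- Expression on the column: every index entry has to be evaluated (type = index).
EXPLAIN SELECT COUNT(*) FROM `test`.`test_bit` WHERE `types` & b'011' = 3;

-- Explicit value list on the bare column: the index can be range-scanned (type = range).
EXPLAIN SELECT COUNT(*) FROM `test`.`test_bit` WHERE `types` IN (3, 7);

-- Equality on the leading columns of the composite index (type = ref).
EXPLAIN SELECT COUNT(*) FROM `test`.`test_bit` WHERE t1 = 1 AND t2 = 1;
```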

Finally, drop all the indexes and see what happens:

ALTER TABLE `test`.`test_bit`  DROP INDEX IX_t1t2t3, DROP INDEX IX_types;

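When around 3 columns need to hold 1/0 flags, a single indexed column combined with IN queries performs better than multiple indexes. The only cost is that building the query conditions is slightly more complicated.

As far as relying purely on MySQL's own optimization goes, this does help somewhat. After all, an index with highly repetitive values and an index with too many columns are both unfriendly to the optimizer.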
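The extra work in building the condition is turning "these flags must all be set" into the matching IN list. With only 3 bits there are just 8 possible values, so the list can be derived by filtering them against a mask. The sketch below does this in SQL. The user variable `@mask` and the derived table `all_values` are names invented for this example, and in practice the list would more likely be built in application code.

```sql
-- Flags that must all be set (here t1 and t2).
SET @mask = b'011' + 0;

-- Keep every 3-bit value that contains all the bits of the mask.
SELECT GROUP_CONCAT(v ORDER BY v) AS in_list
FROM (
    SELECT 0 AS v UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
    UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
) AS all_values
WHERE (v & @mask) = @mask;   -- '3,7'
```

The result is the list used in `types` IN(3,7) earlier. A mask of b'001' yields 1,3,5,7 in the same way.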
