DISTINCT running out of memory under the Infobright engine:
mysql> explain SELECT count(DISTINCT wuuid) as vnum FROM visit_info WHERE (1=1) and usertype in (-1) and (begintime >= 1380556800 and begintime <= 1387209600);
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
| 1 | SIMPLE | visit_info | ALL | NULL | NULL | NULL | NULL | 37778749 | Using where with pushed condition |
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
1 row in set (0.65 sec)
mysql> explain SELECT count(DISTINCT uuid) as vnum FROM visit_info WHERE (1=1) and usertype in (-1) and (begintime >= 1380556800 and begintime <= 1387209600);
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
| 1 | SIMPLE | visit_info | ALL | NULL | NULL | NULL | NULL | 37778749 | Using where with pushed condition |
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
1 row in set (0.00 sec)
mysql> SELECT count(DISTINCT uuid) as vnum FROM visit_info WHERE (1=1) and usertype in (-1) and (begintime >= 1380556800 and begintime <= 1387209600);
ERROR 9 (HY000): Brighthouse out of resources error: Insufficient memory/disk space
mysql> SELECT count( uuid) as vnum FROM visit_info WHERE (1=1) and usertype in (-1) and (begintime >= 1380556800 and begintime <= 1387209600);
+----------+
| vnum |
+----------+
| 37196097 |
+----------+
1 row in set (2.82 sec)
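How DISTINCT is implemented:
How does a database implement the DISTINCT operation? There are generally two basic approaches: sorting and hashing.
The sort-based approach sorts all rows of the table using the columns named in DISTINCT as the key, then iterates over the sorted rows one at a time.
Each row is compared by key with the previous row. If the keys are equal, the current row is discarded and iteration moves on to the next row;
if they differ, the row is output. A side effect of the sort-based approach is that the output comes out ordered by key.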
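The sort-based approach described above can be sketched in a few lines of Python. This is only an illustrative in-memory sketch, not how Infobright or MySQL implements it; the function name `sort_distinct` and the sample rows are made up for the example.

```python
from operator import itemgetter


def sort_distinct(rows, key):
    """Yield one row per distinct key, in key order (sort-based DISTINCT)."""
    previous = object()  # sentinel that never equals a real key
    # Sort everything by the DISTINCT key, then compare each row with the
    # previous one: equal key means duplicate, different key means output.
    for row in sorted(rows, key=key):
        current = key(row)
        if current != previous:
            yield row
            previous = current


# Hypothetical sample data: (uuid, begintime)
rows = [("u3", 1), ("u1", 2), ("u3", 3), ("u2", 4), ("u1", 5)]
print(list(sort_distinct(rows, key=itemgetter(0))))
# -> [('u1', 2), ('u2', 4), ('u3', 1)]
```

Note that the result is ordered by key, which is the side effect mentioned above. `sorted` here holds the whole input in memory, so this sketch does not address data sets larger than memory.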
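The hash-based approach uses the values of the columns named in DISTINCT as the hash key and distributes all rows of the table into buckets; rows with the same key land in the same bucket, so duplicates are identified naturally.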
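The hash-based approach can be sketched in a similar way. Again this is a minimal illustration under simplifying assumptions (everything fits in memory, Python's built-in `set` stands in for the hash buckets); `hash_distinct` is a hypothetical name, not an API of any database.

```python
from operator import itemgetter


def hash_distinct(rows, key):
    """Yield the first row seen for each distinct key (hash-based DISTINCT)."""
    seen = set()  # one entry per distinct key encountered so far
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row


# Hypothetical sample data: (uuid, begintime)
rows = [("u3", 1), ("u1", 2), ("u3", 3), ("u2", 4), ("u1", 5)]
print(list(hash_distinct(rows, key=itemgetter(0))))
# -> [('u3', 1), ('u1', 2), ('u2', 4)]
```

In this sketch the `seen` set grows with the number of distinct keys, so a column with tens of millions of distinct values needs a correspondingly large amount of memory. The output follows input order rather than key order.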
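A concrete implementation of the sort-based approach runs into a few questions:
1. How do you sort when the data set exceeds the memory limit?
2. How can the implementation keep data copying to a minimum?
3. If a Sort operator already exists, how can its code be reused?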
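One common answer to the first question is an external merge sort: sort bounded-size chunks in memory, spill each sorted run to disk, then merge the runs while dropping adjacent duplicates. The sketch below assumes the keys are strings without embedded newlines; `external_distinct`, `_spill` and `max_in_memory` are names invented for this example, and no claim is made that any particular engine works this way.

```python
import heapq
import tempfile


def _spill(keys):
    """Write one sorted run to a temporary file and return the open file."""
    run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    # Sort the lines themselves so each run is ordered exactly the way
    # heapq.merge will compare them later.
    run.writelines(sorted(k + "\n" for k in keys))
    run.seek(0)
    return run


def external_distinct(keys, max_in_memory):
    """Yield distinct string keys in sorted order with bounded memory."""
    runs, buffer = [], set()
    for k in keys:
        buffer.add(k)  # duplicates inside one run are dropped early
        if len(buffer) >= max_in_memory:
            runs.append(_spill(buffer))
            buffer = set()
    if buffer:
        runs.append(_spill(buffer))

    previous = None
    try:
        # Merge the sorted runs; equal keys end up adjacent, so comparing
        # with the previous key is enough to remove duplicates.
        for line in heapq.merge(*runs):
            k = line.rstrip("\n")
            if k != previous:
                yield k
                previous = k
    finally:
        for run in runs:
            run.close()


keys = ["u3", "u1", "u3", "u2", "u1", "u2", "u4"]
print(list(external_distinct(keys, max_in_memory=2)))
# -> ['u1', 'u2', 'u3', 'u4']
```

At any moment the sketch keeps at most `max_in_memory` keys in the buffer plus one pending line per run during the merge. The merge step reuses a generic sorted-merge routine (`heapq.merge`), which hints at how an existing Sort operator could be reused for DISTINCT.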
Question: which of the two approaches above uses less memory?
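As a final aside, DISTINCT and GROUP BY look quite similar, so where do they differ? Put simply, DISTINCT is a very weak form of GROUP BY. For details, see the article below, which was reposted from elsewhere:
Reference blog: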
http://blog.csdn.net/maray/article/details/7634543
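The remark that DISTINCT is a weak GROUP BY can be illustrated with a small sketch: a hash-based `group_by` that accepts optional aggregates, and a `distinct` that is simply `group_by` with no aggregates at all. Both functions and the sample rows are hypothetical and exist only to show the relationship.

```python
def group_by(rows, key_columns, aggregates=None):
    """Hash-based GROUP BY over dict rows.

    aggregates maps an output column name to (initial value, step function),
    where step(accumulator, row) returns the new accumulator.
    """
    aggregates = aggregates or {}
    groups = {}
    for row in rows:
        k = tuple(row[c] for c in key_columns)
        if k not in groups:
            groups[k] = {name: init for name, (init, _) in aggregates.items()}
        state = groups[k]
        for name, (_, step) in aggregates.items():
            state[name] = step(state[name], row)
    # One output row per group: key columns followed by aggregate results.
    return [dict(zip(key_columns, k), **state) for k, state in groups.items()]


def distinct(rows, columns):
    """DISTINCT is GROUP BY on the same columns with no aggregates."""
    return group_by(rows, columns)


# Hypothetical sample rows
rows = [
    {"uuid": "a", "usertype": -1},
    {"uuid": "b", "usertype": -1},
    {"uuid": "a", "usertype": -1},
]
print(distinct(rows, ["uuid"]))
# -> [{'uuid': 'a'}, {'uuid': 'b'}]
print(group_by(rows, ["uuid"], {"visits": (0, lambda acc, row: acc + 1)}))
# -> [{'uuid': 'a', 'visits': 2}, {'uuid': 'b', 'visits': 1}]
```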