Infobright: DISTINCT runs out of memory

A COUNT(DISTINCT ...) query on the Infobright engine exhausts memory:


mysql> explain SELECT count(DISTINCT wuuid) as vnum  FROM visit_info WHERE (1=1) and usertype in (-1) and (begintime >= 1380556800 and begintime <= 1387209600);
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows     | Extra                             |
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
|  1 | SIMPLE      | visit_info | ALL  | NULL          | NULL | NULL    | NULL | 37778749 | Using where with pushed condition |
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
1 row in set (0.65 sec)

mysql> explain  SELECT count(DISTINCT uuid) as vnum  FROM visit_info WHERE (1=1) and usertype in (-1) and (begintime >= 1380556800 and begintime <= 1387209600);
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows     | Extra                             |
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
|  1 | SIMPLE      | visit_info | ALL  | NULL          | NULL | NULL    | NULL | 37778749 | Using where with pushed condition |
+----+-------------+------------+------+---------------+------+---------+------+----------+-----------------------------------+
1 row in set (0.00 sec)


mysql> SELECT count(DISTINCT uuid) as vnum  FROM visit_info WHERE (1=1) and usertype in (-1) and (begintime >= 1380556800 and begintime <= 1387209600);


ERROR 9 (HY000): Brighthouse out of resources error: Insufficient memory/disk space

mysql> SELECT count( uuid) as vnum  FROM visit_info WHERE (1=1) and usertype in (-1) and (begintime >= 1380556800 and begintime <= 1387209600);
+----------+
| vnum     |
+----------+
| 37196097 |
+----------+
1 row in set (2.82 sec)


How DISTINCT is implemented:

How does a database implement the DISTINCT operation? There are generally two basic approaches: sorting and hashing.

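The sort approach sorts all rows of the table, using the columns named in DISTINCT as the key, and then iterates over them row by row.
Each row's key is compared with the key of the previous row: if they are equal, the current row is discarded and iteration moves on to the next row;
if they differ, the row is output. A side effect of the sort approach is that the output is ordered by the key.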
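
The sort approach described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the rows fit in memory as tuples; the function name `distinct_sorted` and the sample rows are made up for the example and say nothing about how Infobright implements it.

```python
def distinct_sorted(rows, key):
    """Sort-based DISTINCT: sort by key, then keep a row only when its
    key differs from the key of the previous row."""
    result = []
    previous = object()  # sentinel that never equals a real key
    for row in sorted(rows, key=key):
        k = key(row)
        if k != previous:
            result.append(row)
            previous = k
    return result


# Hypothetical sample data: (uuid, begintime) pairs.
rows = [("u3", 1), ("u1", 2), ("u3", 3), ("u2", 4), ("u1", 5)]
unique = distinct_sorted(rows, key=lambda r: r[0])
print([r[0] for r in unique])  # ['u1', 'u2', 'u3']
```

Note that the result comes out ordered by key, which is the side effect mentioned above.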


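The hash approach uses the values of the DISTINCT columns as the hash key and distributes all rows of the table into buckets; rows with the same key land in the same bucket, so duplicates are identified naturally.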
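
For comparison, here is a minimal sketch of the hash approach, using Python's built-in `set` as the hash table. The name `distinct_hashed` and the sample rows are assumptions made for illustration; a real engine would manage its own buckets and spill to disk.

```python
def distinct_hashed(rows, key):
    """Hash-based DISTINCT: remember every key seen so far in a hash set
    and keep a row only the first time its key appears."""
    seen = set()
    result = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


# Hypothetical sample data: (uuid, begintime) pairs.
rows = [("u3", 1), ("u1", 2), ("u3", 3), ("u2", 4), ("u1", 5)]
unique = distinct_hashed(rows, key=lambda r: r[0])
print([r[0] for r in unique])  # ['u3', 'u1', 'u2']
```

Unlike the sort approach, the output keeps first-seen order rather than key order, and the set holds one entry per distinct key, so its memory use grows with the number of distinct values.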

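A concrete implementation of the sort approach runs into several problems:

1. How do you sort when the data set exceeds the memory limit?

2. How can the implementation minimize data copying?

3. If a Sort operator already exists, how can its code be reused?

Question: which of the two approaches above uses less memory?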
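
One common answer to the first problem is an external merge sort: sort chunks that fit in memory, spill each sorted run to disk, then merge the runs while dropping duplicates. The sketch below is an illustration under stated assumptions, not the method of any particular engine: keys are strings without newlines, and `distinct_external`, `_spill` and `max_in_memory` are hypothetical names.

```python
import heapq
import tempfile


def _spill(sorted_keys):
    """Write one sorted run to a temporary file, one key per line."""
    run = tempfile.TemporaryFile(mode="w+")
    run.writelines(k + "\n" for k in sorted_keys)
    run.seek(0)
    return run


def distinct_external(keys, max_in_memory=3):
    """Yield distinct string keys in sorted order while sorting at most
    max_in_memory keys in memory at a time."""
    runs, chunk = [], []
    for k in keys:
        chunk.append(k)
        if len(chunk) >= max_in_memory:
            runs.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    try:
        previous = None
        # Merge the sorted runs lazily; compare lines without the newline.
        for line in heapq.merge(*runs, key=lambda s: s[:-1]):
            k = line[:-1]
            if k != previous:
                yield k
                previous = k
    finally:
        for run in runs:
            run.close()


print(list(distinct_external(["u3", "u1", "u3", "u2", "u1", "u4", "u2"])))
# ['u1', 'u2', 'u3', 'u4']
```

The merge step only needs one pending line per run in memory, which is why this pattern works for inputs larger than RAM, at the cost of extra disk I/O.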


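Finally, a side note: DISTINCT looks a lot like GROUP BY, so what is the difference between them? Put simply, DISTINCT is a very weak form of GROUP BY. For details, see the following article, which has been reposted around the web:


Reference blog: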

http://blog.csdn.net/maray/article/details/7634543
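
As a small illustration of the side note on DISTINCT versus GROUP BY, the sketch below runs both forms against an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for Infobright purely for convenience, and the five sample rows are invented; only the table and column names are borrowed from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit_info (uuid TEXT, usertype INTEGER)")
conn.executemany(
    "INSERT INTO visit_info VALUES (?, ?)",
    [("u1", -1), ("u2", -1), ("u1", -1), ("u3", -1), ("u2", -1)],
)

# DISTINCT only reports which keys exist.
distinct_rows = conn.execute(
    "SELECT DISTINCT uuid FROM visit_info ORDER BY uuid"
).fetchall()

# GROUP BY finds the same keys but can also aggregate within each group.
grouped_rows = conn.execute(
    "SELECT uuid, COUNT(*) FROM visit_info GROUP BY uuid ORDER BY uuid"
).fetchall()
conn.close()

print(distinct_rows)  # [('u1',), ('u2',), ('u3',)]
print(grouped_rows)   # [('u1', 2), ('u2', 2), ('u3', 1)]
```

Both queries produce the same set of keys; GROUP BY additionally carries per-group aggregates, which is the sense in which DISTINCT can be seen as a weak GROUP BY.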

