Optimizing joins of the form A.COLUMN LIKE B.COLUMN || '%'

Here is a SQL statement that takes 10 seconds to run:

SQL> select a0.id,
  2         a1.room_no,
  3         a1.user_name,
  4         a1.user_no,
  5         row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
  6    from vid_attachment a0
  7   inner join vid_room_log a1
  8      on a0.file_name like a1.room_md5 || '%'
  9   where a0.room_no is null
 10     and a1.room_md5 is not null;

no rows selected

Elapsed: 00:00:10.53

Execution Plan
----------------------------------------------------------
Plan hash value: 374412539

----------------------------------------------------------------------------------------------
| Id  | Operation           | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                |   728K|   146M|       |   116K  (1)| 00:23:16 |
|   1 |  WINDOW SORT        |                |   728K|   146M|   162M|   116K  (1)| 00:23:16 |
|   2 |   NESTED LOOPS      |                |   728K|   146M|       | 82835   (1)| 00:16:35 |
|*  3 |    TABLE ACCESS FULL| VID_ATTACHMENT |   592 | 74000 |       |   384   (1)| 00:00:05 |
|*  4 |    TABLE ACCESS FULL| VID_ROOM_LOG   |  1231 |   103K|       |   139   (0)| 00:00:02 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("A0"."ROOM_NO" IS NULL)
   4 - filter("A1"."ROOM_MD5" IS NOT NULL AND "A0"."FILE_NAME" LIKE
              "A1"."ROOM_MD5"||'%')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     305333  consistent gets
       1320  physical reads
          0  redo size
        524  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed

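The join condition between the two tables in this SQL is a0.file_name like a1.room_md5 || '%'.

Variable-length fuzzy matches such as LIKE, INSTR and SUBSTR can only be joined with nested loops (NL). A hash join is not possible, because a hash join needs an equality condition.

In the execution plan, VID_ATTACHMENT at Id=3 has 30091 rows left after filtering: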

SQL> select count(*) from VID_ATTACHMENT where room_no is not null;

  COUNT(*)
----------
     30091

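VID_ROOM_LOG is the driven (inner) table of the NL join and is accessed with a full table scan, so it has to be scanned 30091 times. That is why the SQL takes 10 seconds.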
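
To make the cost of that plan concrete, here is a minimal Python sketch of a nested loop join on a prefix match. It is only an illustration of the access pattern, not Oracle's implementation; the function name and the sample rows are hypothetical and not taken from the real tables.

```python
def nested_loop_prefix_join(outer_rows, inner_rows):
    """Join rows where the outer file name starts with the inner room_md5."""
    matches = []
    inner_rows_visited = 0
    for file_name in outer_rows:
        # The whole inner table is rescanned for every single outer row.
        for room_md5 in inner_rows:
            inner_rows_visited += 1
            if file_name.startswith(room_md5):
                matches.append((file_name, room_md5))
    return matches, inner_rows_visited


# Hypothetical sample data, invented for this sketch.
attachments = ["abc123_video.mp4", "def456_audio.mp3", "zzz999_doc.pdf"]
room_logs = ["abc123", "def4", "fff000", "abc1"]

matches, visited = nested_loop_prefix_join(attachments, room_logs)
print(matches)
print(visited)  # 3 outer rows * 4 inner rows = 12 inner rows visited
```

The work grows with (outer rows) x (inner rows), which is the same shape as the full scan of the driven table being repeated once per driving row.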

Now rewrite the SQL into an equivalent form:

SQL> select a0.id,
  2         a1.room_no,
  3         a1.user_name,
  4         a1.user_no,
  5         row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
  6    from (select a.*, b.min_len
  7            from vid_attachment a,
  8                 (select min(length(room_md5)) min_len from vid_room_log) b) a0
  9   inner join (select a.*, min(length(room_md5)) over() min_len
 10                 from vid_room_log a) a1
 11      on a0.file_name like a1.room_md5 || '%'
 12     and substr(a0.file_name, 1, a0.min_len) =
 13         substr(a1.room_md5, 1, a1.min_len)
 14   where a0.room_no is null
 15     and a1.room_md5 is not null;

no rows selected

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------
Plan hash value: 413666598

----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                |  7288 |  2142K|       |  1053   (1)| 00:00:13 |
|   1 |  WINDOW SORT              |                |  7288 |  2142K|  2344K|  1053   (1)| 00:00:13 |
|*  2 |   HASH JOIN               |                |  7288 |  2142K|       |   577   (1)| 00:00:07 |
|   3 |    NESTED LOOPS           |                |   592 | 81696 |       |   435   (1)| 00:00:06 |
|   4 |     VIEW                  |                |     1 |    13 |       |    51   (0)| 00:00:01 |
|   5 |      SORT AGGREGATE       |                |     1 |    39 |       |            |          |
|   6 |       INDEX FAST FULL SCAN| IDX_ROOMMD5    | 24623 |   937K|       |    51   (0)| 00:00:01 |
|*  7 |     TABLE ACCESS FULL     | VID_ATTACHMENT |   592 | 74000 |       |   384   (1)| 00:00:05 |
|*  8 |    VIEW                   |                | 24623 |  3919K|       |   141   (1)| 00:00:02 |
|   9 |     WINDOW BUFFER         |                | 24623 |  2067K|       |   141   (1)| 00:00:02 |
|  10 |      TABLE ACCESS FULL    | VID_ROOM_LOG   | 24623 |  2067K|       |   141   (1)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(SUBSTR("A"."FILE_NAME",1,INTERNAL_FUNCTION("B"."MIN_LEN"))=SUBSTR("A1"."ROOM_M
              D5",1,INTERNAL_FUNCTION("A1"."MIN_LEN")))
       filter("A"."FILE_NAME" LIKE "A1"."ROOM_MD5"||'%')
   7 - filter("A"."ROOM_NO" IS NULL)
   8 - filter("A1"."ROOM_MD5" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2017  consistent gets
          0  physical reads
          0  redo size
        524  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          0  rows processed

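On top of the original join condition a0.file_name like a1.room_md5 || '%', add

substr(a0.file_name, 1, a0.min_len) = substr(a1.room_md5, 1, a1.min_len)

Here min_len is the length of the shortest room_md5, so any file_name that begins with room_md5 must also agree with it on the first min_len characters, and the extra condition removes no matching rows. It gives the optimizer an equality predicate, so the two tables can be hash joined and the SQL finishes almost instantly.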
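
The idea behind the rewrite can be sketched in Python: hash both sides on a fixed-length prefix, then apply the original prefix test as a residual filter inside each bucket. This is a simplified model under my own assumptions (in-memory lists, made-up sample rows, hypothetical function names), not what the Oracle engine literally executes.

```python
from collections import defaultdict


def nested_loop_prefix_join(outer_rows, inner_rows):
    """Reference result: compare every outer row with every inner row."""
    return [(f, r) for f in outer_rows for r in inner_rows
            if r is not None and f.startswith(r)]


def hash_prefix_join(outer_rows, inner_rows):
    """Hash join on the first min_len characters, then filter by prefix."""
    inner_rows = [r for r in inner_rows if r is not None]  # room_md5 is not null
    if not inner_rows:
        return []
    # Same role as min(length(room_md5)) in the rewritten SQL.
    min_len = min(len(r) for r in inner_rows)

    # Build phase: one pass over the inner rows.
    buckets = defaultdict(list)
    for room_md5 in inner_rows:
        buckets[room_md5[:min_len]].append(room_md5)

    # Probe phase: one pass over the outer rows, equality lookup on the prefix.
    matches = []
    for file_name in outer_rows:
        for room_md5 in buckets.get(file_name[:min_len], []):
            # Residual filter, like the LIKE predicate kept in the join.
            if file_name.startswith(room_md5):
                matches.append((file_name, room_md5))
    return matches


# Hypothetical sample data, invented for this sketch.
attachments = ["abc123_video.mp4", "def456_audio.mp3", "zzz999_doc.pdf"]
room_logs = ["abc123", "def4", "fff000", "abc1", None]

hash_result = hash_prefix_join(attachments, room_logs)
nl_result = nested_loop_prefix_join(attachments, room_logs)
print(sorted(hash_result) == sorted(nl_result))  # True
```

Each table is read once, and the prefix test only runs against rows that landed in the same bucket, which is why the rewritten plan does so much less work.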

If the SQL is:

select a0.id,
       a1.room_no,
       a1.user_name,
       a1.user_no,
       row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
  from vid_attachment a0
 inner join vid_room_log a1
    on a0.file_name like  '%' || a1.room_md5 || '%'
 where a0.room_no is null
   and a1.room_md5 is not null;
   

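This case has no solution and cannot be optimized: with a leading '%' the match is no longer anchored to the start of file_name, so no equality condition on a prefix can be derived.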
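
A small Python check shows why the prefix trick breaks down here. The values are made up for illustration: the row matches the substring condition, yet the two prefixes differ, so an added prefix equality would wrongly discard it.

```python
# Hypothetical values, invented for this sketch.
file_name = "video_abc123.mp4"
room_md5 = "abc123"
min_len = len(room_md5)

# Equivalent of: file_name like '%' || room_md5 || '%'
contains = room_md5 in file_name

# Equivalent of: substr(file_name, 1, min_len) = substr(room_md5, 1, min_len)
prefix_equal = file_name[:min_len] == room_md5[:min_len]

print(contains)      # True: the row satisfies the substring match
print(prefix_equal)  # False: "video_" != "abc123", so the row would be lost
```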

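One last point: a relational database is fundamentally built for joining on equal values, not for fuzzy joins. Fuzzy joins should be ruled out when the tables are designed.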

 
