Optimizing joins of the form A.COLUMN LIKE B.COLUMN%

Here is a SQL statement that takes about 10 seconds to run:

SQL> select a0.id,
  2         a1.room_no,
  3         a1.user_name,
  4         a1.user_no,
  5         row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
  6    from vid_attachment a0
  7   inner join vid_room_log a1
  8      on a0.file_name like a1.room_md5 || '%'
  9   where a0.room_no is null
 10     and a1.room_md5 is not null;

no rows selected

Elapsed: 00:00:10.53

Execution Plan
----------------------------------------------------------
Plan hash value: 374412539

----------------------------------------------------------------------------------------------
| Id  | Operation           | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                |   728K|   146M|       |   116K  (1)| 00:23:16 |
|   1 |  WINDOW SORT        |                |   728K|   146M|   162M|   116K  (1)| 00:23:16 |
|   2 |   NESTED LOOPS      |                |   728K|   146M|       | 82835   (1)| 00:16:35 |
|*  3 |    TABLE ACCESS FULL| VID_ATTACHMENT |   592 | 74000 |       |   384   (1)| 00:00:05 |
|*  4 |    TABLE ACCESS FULL| VID_ROOM_LOG   |  1231 |   103K|       |   139   (0)| 00:00:02 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("A0"."ROOM_NO" IS NULL)
   4 - filter("A1"."ROOM_MD5" IS NOT NULL AND "A0"."FILE_NAME" LIKE
              "A1"."ROOM_MD5"||'%')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     305333  consistent gets
       1320  physical reads
          0  redo size
        524  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed

The join condition between the two tables in this SQL is a0.file_name like a1.room_md5 || '%'.

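A join on a variable-length fuzzy match such as LIKE, INSTR, or SUBSTR has no equality predicate, so it can only use nested loops (NL), not a hash join.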

In the execution plan, VID_ATTACHMENT (Id=3) has 30091 rows left after filtering:

SQL> select count(*) from VID_ATTACHMENT where room_no is not null;

COUNT(*)
----------
30091

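VID_ROOM_LOG is the inner (driven) table of the nested loop and is accessed by a full table scan, so it is scanned 30091 times. That is why the SQL takes 10 seconds.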
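To make the cost concrete, here is a minimal Python sketch of what a nested-loop join on a prefix match does. It is only a model of the join logic, not of Oracle's internals. The function name and the sample data are hypothetical, and it assumes room_md5 contains no LIKE wildcard characters (% or _), so that str.startswith behaves like the LIKE predicate.

```python
def nested_loop_prefix_join(attachments, room_logs):
    """Pair every file name with every room_md5 it starts with, nested-loop style."""
    matches = []
    comparisons = 0
    for file_name in attachments:          # outer (driving) row source
        for room_md5 in room_logs:         # inner table: fully scanned once per outer row
            comparisons += 1
            if file_name.startswith(room_md5):   # file_name LIKE room_md5 || '%'
                matches.append((file_name, room_md5))
    return matches, comparisons


# Hypothetical sample data, not taken from the real tables.
attachments = ["abc123_video.mp4", "def456_audio.mp3", "zzz999_misc.bin"]
room_logs = ["abc123", "def4", "fff000", "abc1"]

matches, comparisons = nested_loop_prefix_join(attachments, room_logs)
print(matches)
print(comparisons)  # always len(attachments) * len(room_logs)
```

The number of comparisons is the product of the two row counts, regardless of how many rows actually match. That product is what makes the original plan slow.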

Now rewrite the SQL into an equivalent form:

SQL> select a0.id,
  2         a1.room_no,
  3         a1.user_name,
  4         a1.user_no,
  5         row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
  6    from (select a.*, b.min_len
  7            from vid_attachment a,
  8                 (select min(length(room_md5)) min_len from vid_room_log) b) a0
  9   inner join (select a.*, min(length(room_md5)) over() min_len
 10                 from vid_room_log a) a1
 11      on a0.file_name like a1.room_md5 || '%'
 12     and substr(a0.file_name, 1, a0.min_len) =
 13         substr(a1.room_md5, 1, a1.min_len)
 14   where a0.room_no is null
 15     and a1.room_md5 is not null;

no rows selected

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------
Plan hash value: 413666598

----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                |  7288 |  2142K|       |  1053   (1)| 00:00:13 |
|   1 |  WINDOW SORT              |                |  7288 |  2142K|  2344K|  1053   (1)| 00:00:13 |
|*  2 |   HASH JOIN               |                |  7288 |  2142K|       |   577   (1)| 00:00:07 |
|   3 |    NESTED LOOPS           |                |   592 | 81696 |       |   435   (1)| 00:00:06 |
|   4 |     VIEW                  |                |     1 |    13 |       |    51   (0)| 00:00:01 |
|   5 |      SORT AGGREGATE       |                |     1 |    39 |       |            |          |
|   6 |       INDEX FAST FULL SCAN| IDX_ROOMMD5    | 24623 |   937K|       |    51   (0)| 00:00:01 |
|*  7 |     TABLE ACCESS FULL     | VID_ATTACHMENT |   592 | 74000 |       |   384   (1)| 00:00:05 |
|*  8 |    VIEW                   |                | 24623 |  3919K|       |   141   (1)| 00:00:02 |
|   9 |     WINDOW BUFFER         |                | 24623 |  2067K|       |   141   (1)| 00:00:02 |
|  10 |      TABLE ACCESS FULL    | VID_ROOM_LOG   | 24623 |  2067K|       |   141   (1)| 00:00:02 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(SUBSTR("A"."FILE_NAME",1,INTERNAL_FUNCTION("B"."MIN_LEN"))=SUBSTR("A1"."ROOM_M
              D5",1,INTERNAL_FUNCTION("A1"."MIN_LEN")))
       filter("A"."FILE_NAME" LIKE "A1"."ROOM_MD5"||'%')
   7 - filter("A"."ROOM_NO" IS NULL)
   8 - filter("A1"."ROOM_MD5" IS NOT NULL)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2017  consistent gets
          0  physical reads
          0  redo size
        524  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          0  rows processed

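On top of the original join condition a0.file_name like a1.room_md5 || '%', add

substr(a0.file_name, 1, a0.min_len) =substr(a1.room_md5, 1, a1.min_len)

Here min_len is the length of the shortest room_md5, so any file_name that starts with a room_md5 must also agree with it on the first min_len characters. The extra equality predicate therefore does not change the result, but it lets the two tables be hash joined, and the SQL finishes almost instantly.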
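The idea behind the rewrite can be sketched in Python as follows: bucket the room_md5 values by a fixed-length prefix, probe once per file name, and keep the original prefix test as a recheck. This is an illustrative model under stated assumptions, not how Oracle implements its hash join. The function names and sample data are hypothetical, and room_md5 is assumed to contain no LIKE wildcard characters.

```python
from collections import defaultdict


def prefix_hash_join(attachments, room_logs):
    """Emulate the rewritten query: hash on a fixed-length prefix, then recheck."""
    room_logs = [r for r in room_logs if r]        # room_md5 IS NOT NULL
    if not room_logs:
        return []
    min_len = min(len(r) for r in room_logs)       # min(length(room_md5))

    # Build phase: bucket room_md5 values by their first min_len characters.
    buckets = defaultdict(list)
    for room_md5 in room_logs:
        buckets[room_md5[:min_len]].append(room_md5)

    # Probe phase: one lookup per file name, then the original LIKE as a filter.
    matches = []
    for file_name in attachments:
        for room_md5 in buckets.get(file_name[:min_len], []):
            if file_name.startswith(room_md5):     # file_name LIKE room_md5 || '%'
                matches.append((file_name, room_md5))
    return matches


def brute_force_join(attachments, room_logs):
    """Reference result: compare every pair, as the nested-loop plan does."""
    return [(f, r) for f in attachments for r in room_logs if r and f.startswith(r)]


# Hypothetical sample data, not taken from the real tables.
attachments = ["abc123_video.mp4", "def456_audio.mp3", "zzz999_misc.bin"]
room_logs = ["abc123", "def4", "fff000", "abc1"]

hashed = prefix_hash_join(attachments, room_logs)
print(sorted(hashed) == sorted(brute_force_join(attachments, room_logs)))
```

Both functions return the same pairs, but the hashed version only rechecks candidates that share the first min_len characters. The shorter the shortest room_md5, the less selective the prefix and the more rows survive to the recheck.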

If the SQL is instead:

select a0.id,
a1.room_no,
a1.user_name,
a1.user_no,
row_number() over(partition by a0.id order by a1.room_enter_time desc) as fn
from vid_attachment a0
inner join vid_room_log a1
on a0.file_name like '%' || a1.room_md5 || '%'
where a0.room_no is null
and a1.room_md5 is not null;

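This case has no solution. With a leading wildcard, room_md5 can appear anywhere in file_name, so there is no fixed prefix to turn into an equality predicate and the query cannot be optimized.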
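A small Python check, using hypothetical values, shows why the prefix trick does not carry over to a '%' || room_md5 || '%' match: the value can occur in the middle of the file name, where the leading characters of the two strings are unrelated.

```python
# Hypothetical values, chosen so that room_md5 appears in the middle of file_name.
file_name = "2024_abc123_video.mp4"
room_md5 = "abc123"
min_len = 4

contains = room_md5 in file_name                          # LIKE '%' || room_md5 || '%'
same_prefix = file_name[:min_len] == room_md5[:min_len]   # the extra equality predicate

print(contains, same_prefix)
```

The row matches the LIKE condition but fails the prefix equality, so adding that predicate here would silently drop valid rows.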

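One last point: relational databases are fundamentally built for equality joins, not fuzzy joins. Fuzzy joins should be ruled out when the tables are designed.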
