[Oracle index] Considerations for choosing the leading column of a composite index

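Which column should lead a composite index depends on how the business SQL is written and on how the data in each column is distributed. Many books and online articles say the leading column should be the most selective one, but detached from the actual workload that advice is meaningless. This article works through a few common cases to show how to choose the leading column correctly. It is only a starting point; real applications present more complicated situations that need case-by-case analysis.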

1. When every column has an equality predicate, either column works equally well as the leading column

DROP TABLE t;
CREATE TABLE t
AS
SELECT * FROM dba_objects;
CREATE INDEX idx1_t ON t(owner,object_id);
CREATE INDEX idx2_t ON t(object_id,owner);
BEGIN
dbms_stats.gather_table_stats(ownname => USER,tabname => 'T',estimate_percent => 100,cascade => TRUE);
END;
/
dingjun123@ORADB> SELECT COUNT(DISTINCT owner),COUNT(DISTINCT object_id),COUNT(*) FROM t;
COUNT(DISTINCTOWNER) COUNT(DISTINCTOBJECT_ID) COUNT(*)
-------------------- ------------------------ ----------
33 75250 75251
1 row selected.

owner has 33 distinct values and object_id has 75250, so object_id is clearly more selective. Yet for the queries below, idx1_t and idx2_t perform identically (same COST and same CONSISTENT GETS).

dingjun123@ORADB> SELECT/*+index(t idx1_t)*/ * FROM t
2 WHERE owner='DINGJUN123' AND object_id=75677;
1 row selected.
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2071967826
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 97 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 97 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX1_T | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID"=75677)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1403 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

dingjun123@ORADB> SELECT/*+index(t idx2_t)*/ * FROM t
2 WHERE owner='DINGJUN123' AND object_id=75677;
1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3787301248
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 97 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 97 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX2_T | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=75677 AND "OWNER"='DINGJUN123')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1403 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

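Even so, keep in mind that the index does not exist only for these two SQL statements. Some queries may have a predicate on owner alone or on object_id alone, and then it matters whether owner or object_id is the leading column.
Other references to owner and object_id, such as GROUP BY, ORDER BY, or even the SELECT list, also need to be analyzed as a whole before the best index can be built.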
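
As a rough sketch of that check, the two single-column predicates can be run under autotrace and compared in the same way as the queries above. This assumes the same table t and indexes idx1_t and idx2_t, and a SQL*Plus session that is allowed to use AUTOTRACE; the literal values are simply reused from the earlier queries.

```sql
-- Show the plan and statistics without printing the result rows
SET AUTOTRACE TRACEONLY

-- Predicate on owner only: idx1_t (owner, object_id) can be range-scanned
-- on its leading column; idx2_t has owner in second position.
SELECT * FROM t WHERE owner = 'DINGJUN123';

-- Predicate on object_id only: idx2_t (object_id, owner) can be range-scanned
-- on its leading column; idx1_t has object_id in second position.
SELECT * FROM t WHERE object_id = 75677;

SET AUTOTRACE OFF
```

Which plan the optimizer actually picks depends on the statistics, so the output should be read rather than assumed.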

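2. When some columns have range conditions (>, >=, <, <=) or LIKE pattern matching and others have equality conditions, the equality column is generally the better leading column

-- Run 5 times to add several hundred thousand SYS rows
INSERT INTO t SELECT * FROM t WHERE owner='SYS';
COMMIT;
-- Re-gather statistics (omitted)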

dingjun123@ORADB> SELECT * FROM t
2 WHERE owner='DINGJUN123'
3 AND object_id>=107889;
1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 2071967826
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 96 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 96 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX1_T | 1 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID">=107889 AND "OBJECT_ID" IS
NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
1399 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

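The SQL above uses idx1_t. Look at the predicate section: there is only access, which means the index is fully exploited. owner is the leading column and is matched by equality, so the scan goes straight to the owner='DINGJUN123' entries and only has to examine the second index column, object_id. It starts at the first entry with object_id>=107889 and stops as soon as the entries no longer match owner='DINGJUN123', so no part of the index scan is wasted.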
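
If autotrace is not available, the predicate section can also be inspected with EXPLAIN PLAN and DBMS_XPLAN. The following is a minimal sketch, assuming the default PLAN_TABLE exists in the session; it shows the optimizer's estimated plan only, not the runtime statistics.

```sql
-- Store the estimated plan for the statement in the plan table
EXPLAIN PLAN FOR
SELECT * FROM t
WHERE  owner = 'DINGJUN123'
AND    object_id >= 107889;

-- Display the plan with its access/filter predicates
-- (arguments: plan table name, statement id, format)
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'BASIC +PREDICATE'));
```

A plan step that lists only access(...) uses every predicate to bound the scan; an additional filter(...) line means some entries are read and then discarded.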

dingjun123@ORADB> SELECT/*+index(t idx2_t)*/ * FROM t
2 WHERE owner='DINGJUN123'
3 AND object_id>=107889;
1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3787301248
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 96 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 96 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX2_T | 1 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID">=107889 AND "OWNER"='DINGJUN123' AND "OBJECT_ID" IS
NOT NULL)
filter("OWNER"='DINGJUN123')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
1399 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

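When idx2_t is forced, object_id is the leading column and the predicate section shows both access and filter, which means the index is not fully exploited. Because the condition on object_id is not an equality, the scan walks in index order through every entry with object_id>=107889,
so any entries in that range that do not satisfy owner='DINGJUN123' still have to be filtered out.
With a range condition on the leading column like this, if many of the entries with object_id>=107889 fail owner='DINGJUN123', the scan visits a large number of index entries for nothing, and COST and logical reads climb quickly,
so performance drops sharply. In this example almost every entry in the range satisfies the owner condition, so little is wasted. The next example is different: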

dingjun123@ORADB> SELECT * FROM t
2 WHERE owner='DINGJUN123'
3 AND object_id>=100;
2540 rows selected.
Elapsed: 00:00:00.15
Execution Plan
----------------------------------------------------------
Plan hash value: 2071967826
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2539 | 238K| 499 (0)| 00:00:06 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 2539 | 238K| 499 (0)| 00:00:06 |
|* 2 | INDEX RANGE SCAN | IDX1_T | 2539 | | 12 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID">=100 AND "OBJECT_ID" IS
NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
527 consistent gets
21 physical reads
0 redo size
268134 bytes sent via SQL*Net to client
2275 bytes received via SQL*Net from client
171 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2540 rows processed

idx1_t is still used, with no problem. Now look at what happens when idx2_t is forced.

dingjun123@ORADB> SELECT/*+index(t idx2_t)*/ * FROM t
2 WHERE owner='DINGJUN123'
3 AND object_id>=100;

2540 rows selected.
Elapsed: 00:00:00.33
Execution Plan
----------------------------------------------------------
Plan hash value: 3787301248
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2539 | 238K| 3762 (1)| 00:00:46 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 2539 | 238K| 3762 (1)| 00:00:46 |
|* 2 | INDEX RANGE SCAN | IDX2_T | 2539 | | 3274 (1)| 00:00:40 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID">=100 AND "OWNER"='DINGJUN123' AND "OBJECT_ID" IS
NOT NULL)
filter("OWNER"='DINGJUN123')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3763 consistent gets
0 physical reads
0 redo size
268134 bytes sent via SQL*Net to client
2275 bytes received via SQL*Net from client
171 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2540 rows processed

dingjun123@ORADB> SELECT COUNT(*) FROM t WHERE object_id >= 100;
COUNT(*)
----------
1032649

SELECT COUNT(*) FROM t WHERE object_id >= 100 returns 1032649 rows, but WHERE owner='DINGJUN123' AND object_id>=100 returns only 2540 rows. About a million index entries are walked and then filtered out, which is an enormous waste.
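
The amount of wasted work can be estimated with a single query before deciding on the column order. This is only a sketch against the same table t; the column aliases are illustrative names, not anything Oracle defines.

```sql
-- entries_in_range: rows the scan must walk when object_id leads the index
-- entries_kept:     rows that also satisfy the equality condition on owner
SELECT COUNT(*)                                         AS entries_in_range,
       COUNT(CASE WHEN owner = 'DINGJUN123' THEN 1 END) AS entries_kept
FROM   t
WHERE  object_id >= 100;
```

The larger the gap between the two counts, the more index entries idx2_t has to read and discard for this predicate, and the stronger the case for putting the equality column first.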

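3. When every column has a range comparison such as < or >
In this case, choose as the leading column the one whose predicate narrows the scan to a cardinality closest to the final result, so that the filter work after the index access is reduced. One of the columns will inevitably be checked by a filter after the access, and it is best if that filter has only a little data to discard.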

For example:

dingjun123@ORADB> SELECT * FROM t
2 WHERE owner>='DINGJUN123'
3 AND object_id>=107872;

37 rows selected.
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 3787301248
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 205 | 19680 | 43 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 205 | 19680 | 43 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX2_T | 205 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID">=107872 AND "OWNER">='DINGJUN123' AND "OBJECT_ID"
IS NOT NULL)
filter("OWNER">='DINGJUN123')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
14 consistent gets
0 physical reads
0 redo size
6468 bytes sent via SQL*Net to client
438 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
37 rows processed

dingjun123@ORADB> SELECT COUNT(*) FROM t WHERE object_id>=107872;
COUNT(*)
----------
37
1 row selected.

Disable index skip scan first, because owner has few distinct values and Oracle would otherwise choose a skip scan:
alter session set "_optimizer_skip_scan_enabled" = false;

dingjun123@ORADB> SELECT/*+index(t idx1_t)*/ * FROM t
2 WHERE owner>='DINGJUN123'
3 AND object_id>=107872;
37 rows selected.
Elapsed: 00:00:00.23
Execution Plan
----------------------------------------------------------
Plan hash value: 2071967826
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 205 | 19680 | 3740 (1)| 00:00:45 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 205 | 19680 | 3740 (1)| 00:00:45 |
|* 2 | INDEX RANGE SCAN | IDX1_T | 205 | | 3700 (1)| 00:00:45 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER">='DINGJUN123' AND "OBJECT_ID">=107872 AND "OWNER" IS
NOT NULL)
filter("OBJECT_ID">=107872)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3688 consistent gets
0 physical reads
0 redo size
6468 bytes sent via SQL*Net to client
438 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
37 rows processed

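Because owner>='DINGJUN123' matches a large number of rows while the final result is only 37 rows, filtering on object_id>=107872 has to do a great deal of work. Consistent gets rise from 14 to 3688 and COST from 43 to 3740 (the index scan step alone goes from 3 to 3700), so performance is poor.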
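
One way to apply this rule is to count how many rows each range predicate matches on its own and compare that with the combined result. The query below is a sketch against the same table t; the aliases are illustrative.

```sql
-- Rows matched by each range predicate alone, and by both together
SELECT COUNT(CASE WHEN owner >= 'DINGJUN123' THEN 1 END)  AS owner_range_rows,
       COUNT(CASE WHEN object_id >= 107872 THEN 1 END)    AS object_id_range_rows,
       COUNT(CASE WHEN owner >= 'DINGJUN123'
                   AND object_id >= 107872 THEN 1 END)    AS both_rows
FROM   t;
```

The column whose own count is closest to both_rows is the better candidate for the leading position, since the filter on the other column then has little left to discard.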

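Afterword:
Of course, choosing the order of the leading columns is complicated. You have to consider all the relevant predicates and the columns in the SELECT list together, as well as the ORDER BY and GROUP BY columns, for example when deciding the column order of a three-column composite index.
Later posts will add more points on how to create composite indexes.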
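
As a small illustration of the ORDER BY point, here is a hypothetical three-column index on the same table. The index name idx3_t and the choice of object_type and created are assumptions made for this sketch and are not part of the test case above.

```sql
-- Hypothetical index: equality columns first, ORDER BY column last
CREATE INDEX idx3_t ON t(owner, object_type, created);

-- Equality on the first two index columns, ordering on the third
EXPLAIN PLAN FOR
SELECT owner, object_type, created
FROM   t
WHERE  owner = 'DINGJUN123'
AND    object_type = 'TABLE'
ORDER  BY created;

-- Check whether the plan contains a SORT ORDER BY step
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'BASIC +PREDICATE'));
```

With equality on the leading columns, the index entries are already ordered by created, so the optimizer can often skip the sort. Whether it does so here depends on the data and statistics, so the plan should be checked rather than assumed.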
