[Oracle index] Considerations for choosing the leading column of a composite index

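The leading column of a composite index has to be chosen according to how the business SQL is written and how the column data is distributed. Many books and web articles say the leading column should be the most selective one, but that advice is meaningless once it is detached from the actual workload. This article works through a few common cases to show how to choose the leading column. It is only a starting point: real applications have more complex situations that need their own analysis.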

1. When every column has an equality predicate, either column can lead

DROP TABLE t;
CREATE TABLE t 
AS
SELECT * FROM dba_objects;
CREATE INDEX idx1_t ON t(owner,object_id);
CREATE INDEX idx2_t ON t(object_id,owner);
BEGIN
  dbms_stats.gather_table_stats(ownname => USER,tabname => 'T',estimate_percent => 100,cascade => TRUE);
END;
 /
dingjun123@ORADB>  SELECT COUNT(DISTINCT owner),COUNT(DISTINCT object_id),COUNT(*) FROM t;
COUNT(DISTINCTOWNER) COUNT(DISTINCTOBJECT_ID)   COUNT(*)
-------------------- ------------------------ ----------
                  33                    75250      75251
1 row selected.

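owner has 33 distinct values and object_id has 75250, so object_id is clearly the more selective column. Yet for the query below, idx1_t and idx2_t perform identically (same COST, same CONSISTENT GETS).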

dingjun123@ORADB>  SELECT/*+index(t idx1_t)*/ * FROM t
  2   WHERE owner='DINGJUN123' AND object_id=75677;
1 row selected.
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 2071967826
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    97 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |     1 |    97 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX1_T |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID"=75677)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
       1403  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


dingjun123@ORADB>   SELECT/*+index(t idx2_t)*/ * FROM t
  2   WHERE owner='DINGJUN123' AND object_id=75677;
1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3787301248
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    97 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |     1 |    97 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX2_T |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID"=75677 AND "OWNER"='DINGJUN123')
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
       1403  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
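
Why the two plans cost the same can be sketched with a toy model. This is only an illustration, not how Oracle stores a B-tree: it assumes a composite index behaves like a list of key tuples kept in sorted order, and the row values and the helper name `entries_visited` are made up for the example.

```python
from bisect import bisect_left, bisect_right

# Toy data (hypothetical values): 1000 SYS rows and 10 DINGJUN123 rows.
rows = [("SYS", i) for i in range(1, 1001)] + \
       [("DINGJUN123", i) for i in range(1001, 1011)]

# Model each composite index as its key tuples in sorted order.
idx1 = sorted((owner, oid) for owner, oid in rows)   # (owner, object_id)
idx2 = sorted((oid, owner) for owner, oid in rows)   # (object_id, owner)

def entries_visited(index, key):
    """Size of the contiguous slice of entries equal to the full key."""
    return bisect_right(index, key) - bisect_left(index, key)

# Equality on both columns pins down one exact key in either ordering.
print(entries_visited(idx1, ("DINGJUN123", 1005)))   # 1
print(entries_visited(idx2, (1005, "DINGJUN123")))   # 1
```

With equality on every indexed column the search key is complete, so the column order changes where the entry sits in the index but not how many entries have to be read.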


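      That said, keep in mind that this index does not exist only for these two statements. Other queries may have a predicate on owner alone or on object_id alone, and at that point it matters whether owner or object_id is the leading column.
    Other references to owner and object_id, such as GROUP BY, ORDER BY, or even the SELECT list, also need to be analyzed together before the best index can be decided.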
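
The single-column case can be illustrated with the same kind of toy model (a sorted list of key tuples standing in for the index; data and function names are hypothetical). It is a simplification: a real optimizer has other options for the second case, such as an index skip scan or a fast full scan.

```python
from bisect import bisect_left, bisect_right

# Toy data (hypothetical values): 1000 SYS rows and 10 DINGJUN123 rows.
rows = [("SYS", i) for i in range(1, 1001)] + \
       [("DINGJUN123", i) for i in range(1001, 1011)]
idx1 = sorted(rows)                                  # (owner, object_id)
idx2 = sorted((oid, owner) for owner, oid in rows)   # (object_id, owner)

def visited_by_leading_column(index, value):
    """Predicate on the leading column: matching entries are contiguous."""
    firsts = [entry[0] for entry in index]           # rebuilt per call for clarity
    return bisect_right(firsts, value) - bisect_left(firsts, value)

def visited_by_trailing_column(index, value):
    """Predicate on the second column only: no usable order, check every entry."""
    matches = [entry for entry in index if entry[1] == value]
    return len(index), len(matches)                  # (visited, returned)

print(visited_by_leading_column(idx1, "DINGJUN123"))   # 10
print(visited_by_trailing_column(idx2, "DINGJUN123"))  # (1010, 10)
```

In this model a query on owner alone reads 10 entries through (owner, object_id) but all 1010 through (object_id, owner), which is why the other statements using the index should influence the column order.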
 
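2. When some columns have non-equality predicates (>, >=, <, <=, LIKE pattern matching) and others have equality predicates, the equality column is usually the better leading column
-- Run the INSERT below 5 times; each run doubles the SYS rows, growing the table to over a million rows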

INSERT INTO t SELECT * FROM t WHERE owner='SYS';
COMMIT;
-- Re-gather statistics (omitted)
dingjun123@ORADB> SELECT * FROM t
  2  WHERE owner='DINGJUN123'
  3  AND object_id>=107889;
1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 2071967826
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    96 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |     1 |    96 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX1_T |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID">=107889 AND "OBJECT_ID" IS
              NOT NULL)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       1399  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

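      The SQL above uses idx1_t. Look at the predicate section: there is only an access predicate, which means the index is fully exploited. owner is the leading column and is compared by equality, so the scan goes straight to owner='DINGJUN123', starts at the first entry of the second column with object_id>=107889, and stops as soon as it leaves that owner's entries. No part of the index scan is wasted.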

dingjun123@ORADB> SELECT/*+index(t idx2_t)*/ * FROM t
  2  WHERE owner='DINGJUN123'
  3  AND object_id>=107889;
1 row selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3787301248
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |    96 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |     1 |    96 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX2_T |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID">=107889 AND "OWNER"='DINGJUN123' AND "OBJECT_ID" IS
              NOT NULL)
       filter("OWNER"='DINGJUN123')
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       1399  bytes sent via SQL*Net to client
        416  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

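      Here idx2_t is forced and object_id is the leading column. The predicate section shows both access and filter, which means the index is not fully exploited. Because object_id is not compared by equality, the scan walks in order through every entry with object_id>=107889, and any entry along the way that does not satisfy owner='DINGJUN123' has to be filtered out.
When the non-equality column leads like this, and many of the entries with object_id>=107889 do not satisfy owner='DINGJUN123', the scan reads a lot of index entries for nothing. COST and logical reads climb quickly and performance drops sharply. In this example almost every scanned entry satisfies the owner condition, so little is wasted. The next example is different.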

dingjun123@ORADB> SELECT * FROM t
  2  WHERE owner='DINGJUN123'
  3  AND object_id>=100;
2540 rows selected.
Elapsed: 00:00:00.15
Execution Plan
----------------------------------------------------------
Plan hash value: 2071967826
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |  2539 |   238K|   499   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |  2539 |   238K|   499   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | IDX1_T |  2539 |       |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER"='DINGJUN123' AND "OBJECT_ID">=100 AND "OBJECT_ID" IS
              NOT NULL)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        527  consistent gets
         21  physical reads
          0  redo size
     268134  bytes sent via SQL*Net to client
       2275  bytes received via SQL*Net from client
        171  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2540  rows processed


The optimizer still uses idx1_t and there is no problem. Now look at the same query with idx2_t forced.

dingjun123@ORADB> SELECT/*+index(t idx2_t)*/ * FROM t
  2  WHERE owner='DINGJUN123'
  3  AND object_id>=100;

2540 rows selected.
Elapsed: 00:00:00.33
Execution Plan
----------------------------------------------------------
Plan hash value: 3787301248
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |  2539 |   238K|  3762   (1)| 00:00:46 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |  2539 |   238K|  3762   (1)| 00:00:46 |
|*  2 |   INDEX RANGE SCAN          | IDX2_T |  2539 |       |  3274   (1)| 00:00:40 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID">=100 AND "OWNER"='DINGJUN123' AND "OBJECT_ID" IS
              NOT NULL)
       filter("OWNER"='DINGJUN123')
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3763  consistent gets
          0  physical reads
          0  redo size
     268134  bytes sent via SQL*Net to client
       2275  bytes received via SQL*Net from client
        171  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2540  rows processed

dingjun123@ORADB> SELECT COUNT(*) FROM t WHERE object_id >= 100;
  COUNT(*)
----------
   1032649


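SELECT COUNT(*) FROM t WHERE object_id >= 100 returns 1032649 rows, but WHERE owner='DINGJUN123' AND object_id>=100 returns only 2540. The range scan on idx2_t has to walk through about a million index entries and filter nearly all of them out, which is a huge waste.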
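
The access versus filter difference in this section can be reproduced with a small model. It is a sketch under stated assumptions: each index is a sorted list of key tuples, the data is invented, and `scan_owner_leading` / `scan_id_leading` are hypothetical helpers that count entries touched rather than Oracle block reads.

```python
from bisect import bisect_left

# Toy data (hypothetical values): SYS ids 1..1000, DINGJUN123 ids 2001..2010.
rows = [("SYS", i) for i in range(1, 1001)] + \
       [("DINGJUN123", i) for i in range(2001, 2011)]
idx1 = sorted(rows)                                  # (owner, object_id)
idx2 = sorted((oid, owner) for owner, oid in rows)   # (object_id, owner)

def scan_owner_leading(owner, min_id):
    """owner = :owner AND object_id >= :min_id using (owner, object_id)."""
    start = bisect_left(idx1, (owner, min_id))       # both columns position the scan
    visited = 0
    for o, _ in idx1[start:]:
        if o != owner:                               # left this owner's slice: stop
            break
        visited += 1
    return visited, visited                          # (visited, returned)

def scan_id_leading(owner, min_id):
    """Same predicates using (object_id, owner): owner can only filter."""
    start = bisect_left(idx2, (min_id,))             # (min_id,) sorts before (min_id, x)
    visited = idx2[start:]                           # runs to the end of the index
    returned = [e for e in visited if e[1] == owner]
    return len(visited), len(returned)

print(scan_owner_leading("DINGJUN123", 100))   # (10, 10)
print(scan_id_leading("DINGJUN123", 100))      # (911, 10)
print(scan_id_leading("DINGJUN123", 2001))     # (10, 10)
```

The last call mirrors the first example of this section: when the range on the leading column already excludes almost everything else, the filter discards little and both orderings cost about the same.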


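3. When every predicate is a range comparison such as < or >
  In this case, choose as the leading column the one whose predicate gets closest to the cardinality of the final result, so that less work is left for the filter after the index access. One of the columns will inevitably be applied as a filter after the access step, so that filter should have as little as possible to discard.
  
For example: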
dingjun123@ORADB> SELECT * FROM t
  2  WHERE owner>='DINGJUN123'
  3  AND object_id>=107872;

37 rows selected.
Elapsed: 00:00:00.00
Execution Plan
----------------------------------------------------------
Plan hash value: 3787301248
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   205 | 19680 |    43   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |   205 | 19680 |    43   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX2_T |   205 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_ID">=107872 AND "OWNER">='DINGJUN123' AND "OBJECT_ID"
              IS NOT NULL)
       filter("OWNER">='DINGJUN123')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
       6468  bytes sent via SQL*Net to client
        438  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         37  rows processed

dingjun123@ORADB> SELECT COUNT(*) FROM t WHERE object_id>=107872;
  COUNT(*)
----------
        37
1 row selected.


  Disable index skip scan first: owner has very few distinct values, so Oracle would otherwise choose a skip scan on idx1_t.
 alter session set "_optimizer_skip_scan_enabled" = false;
 
dingjun123@ORADB> SELECT/*+index(t idx1_t)*/ * FROM t
  2  WHERE owner>='DINGJUN123'
  3  AND object_id>=107872;
37 rows selected.
Elapsed: 00:00:00.23
Execution Plan
----------------------------------------------------------
Plan hash value: 2071967826
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   205 | 19680 |  3740   (1)| 00:00:45 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |   205 | 19680 |  3740   (1)| 00:00:45 |
|*  2 |   INDEX RANGE SCAN          | IDX1_T |   205 |       |  3700   (1)| 00:00:45 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OWNER">='DINGJUN123' AND "OBJECT_ID">=107872 AND "OWNER" IS
              NOT NULL)
       filter("OBJECT_ID">=107872)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3688  consistent gets
          0  physical reads
          0  redo size
       6468  bytes sent via SQL*Net to client
        438  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         37  rows processed


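owner>='DINGJUN123' matches a large number of index entries, but the final result is only a few dozen rows, so filtering on object_id>=107872 takes a lot of work. Consistent gets rise from 14 to 3688 and the total COST from 43 to 3740 (the index range scan step alone goes from 3 to 3700, more than a thousand times), and performance is poor.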
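
The two-range case can be modeled the same way. This is only a sketch: the sorted tuple lists stand in for the two indexes, the owners and ids are invented, and `range_scan` is a hypothetical helper that counts index entries touched, not Oracle logical reads.

```python
from bisect import bisect_left

# Toy data (hypothetical values): three owners with disjoint id ranges.
rows = [("APEX", i) for i in range(1, 201)] + \
       [("SYS", i) for i in range(201, 1201)] + \
       [("DINGJUN123", i) for i in range(1201, 1211)]
idx1 = sorted(rows)                                  # (owner, object_id)
idx2 = sorted((oid, owner) for owner, oid in rows)   # (object_id, owner)

def range_scan(index, lead_min, trail_min):
    """lead >= :lead_min AND trail >= :trail_min on an index keyed (lead, trail).

    The leading column's range decides where the scan starts; it then runs to
    the end of the index, and the trailing column's range is applied as a filter.
    """
    start = bisect_left(index, (lead_min, trail_min))
    visited = index[start:]
    returned = [e for e in visited if e[1] >= trail_min]
    return len(visited), len(returned)               # (visited, returned)

# owner >= 'DINGJUN123' AND object_id >= 1201
print(range_scan(idx1, "DINGJUN123", 1201))   # (1010, 10)
print(range_scan(idx2, 1201, "DINGJUN123"))   # (10, 10)
```

Both scans return the same 10 entries, but leading with the column whose range is narrower (object_id here) visits 10 entries instead of 1010, which is the pattern the two plans above show.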


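Postscript:
     Choosing the order of the columns is of course complicated. The predicates, the columns in the SELECT list, and the ORDER BY and GROUP BY columns all have to be considered together, for example when deciding the order of a three-column composite index.
     Later articles will add more points on how to create composite indexes.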