http://blog.csdn.net/tswisdom/article/details/7396826
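Note: low cardinality means that a column, or a combination of columns, has few distinct values, so it contains many repeated values. High cardinality means that the column or combination of columns has many distinct values, so it contains few repeated values.
Understanding where each type of index is appropriate can have a major impact on performance.
Conventional wisdom holds that bitmap indexes are best suited to columns with few distinct values, such as GENDER, MARITAL_STATUS, and RELATION. That assumption is not accurate. In practice, a bitmap index is a reasonable choice for most systems whose data is not frequently updated by many concurrent sessions. In fact, as shown below, a bitmap index on a column with 100 percent unique values (a candidate for primary key) is as efficient as a B-tree index.
This article provides examples and optimizer-related advice that apply to both index types, on low-cardinality as well as high-cardinality columns. The examples should help DBAs see that the suitability of a bitmap index depends on the application rather than on cardinality.
Comparing the Indexes
A bitmap index on a unique column has some drawbacks, one of which is the amount of space it needs (Oracle does not recommend it). The size of a bitmap index, however, depends not only on the cardinality of the indexed column but also on how the data is distributed. Consequently, a bitmap index on GENDER is smaller than a B-tree index on the same column, whereas a bitmap index on EMPNO is much larger than the corresponding B-tree index. Because decision-support systems (DSS) are accessed by far fewer users than OLTP systems, however, resources are not a problem for them.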
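To make the size argument concrete, here is a minimal Python sketch of a toy bitmap index. It is not Oracle's storage format: `build_bitmap_index`, `count_runs`, and `index_summary` are hypothetical helpers, the table has only 1,000 rows, and counting runs of identical bits is just a rough stand-in for how well a bitmap compresses. The sketch reports how many bitmaps the index holds (one per distinct value) and how many runs they contain in total.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Toy bitmap index: one list of 0/1 bits per distinct value, one bit per row."""
    row_count = len(column)
    bitmaps = defaultdict(lambda: [0] * row_count)
    for row, value in enumerate(column):
        bitmaps[value][row] = 1
    return dict(bitmaps)


def count_runs(bitmap):
    """Count runs of identical bits, a crude stand-in for compressed size."""
    runs = 1
    for previous, current in zip(bitmap, bitmap[1:]):
        if current != previous:
            runs += 1
    return runs


def index_summary(column):
    """Return (number of bitmaps, total runs across all bitmaps)."""
    index = build_bitmap_index(column)
    return len(index), sum(count_runs(bits) for bits in index.values())


ROWS = 1000  # toy table size, far smaller than the article's one million rows

gender_clustered = ["M"] * (ROWS // 2) + ["F"] * (ROWS // 2)  # equal values stored together
gender_alternating = ["M", "F"] * (ROWS // 2)                 # equal values interleaved
empno = list(range(1, ROWS + 1))                              # every value unique

print(index_summary(gender_clustered))    # (2, 4)
print(index_summary(gender_alternating))  # (2, 2000)
print(index_summary(empno))               # (1000, 2998)
```

In this toy model the GENDER index stays tiny when equal values are stored together and grows when they alternate, which is the data-distribution effect. The EMPNO index needs one bitmap per row, and in a real index each of those entries carries its own key and overhead, which is why a bitmap index on a unique column tends to be the larger structure.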
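To illustrate this, I created two tables, test_normal and test_random. A PL/SQL block inserts one million rows into test_normal, and the same rows are then inserted into test_random in random order.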
- Create table test_normal (empno number(10), ename varchar2(30), sal number(10));
- Begin
- For i in 1..1000000
- Loop
- Insert into test_normal
- values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
- If mod(i, 10000) = 0 then
- Commit;
- End if;
- End loop;
- End;
- /
- Create table test_random
- as
- select /*+ append */ * from test_normal order by dbms_random.random;
- SQL> select count(*) "Total Rows" from test_normal;
- Total Rows
- ----------
- 1000000
- Elapsed: 00:00:01.09
- SQL> select count(distinct empno) "Distinct Values" from test_normal;
- Distinct Values
- ---------------
- 1000000
- Elapsed: 00:00:06.09
- SQL> select count(*) "Total Rows" from test_random;
- Total Rows
- ----------
- 1000000
- Elapsed: 00:00:03.05
- SQL> select count(distinct empno) "Distinct Values" from test_random;
- Distinct Values
- ---------------
- 1000000
- Elapsed: 00:00:12.07
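Note that test_normal is well organized, whereas test_random was populated in random order, so its data is unorganized. In both tables every value in the EMPNO column is distinct, which makes the column a candidate for primary key. If you define it as the primary key, Oracle creates a B-tree index, because Oracle does not support bitmap indexes for primary keys.
To analyze the behavior of these indexes, we perform the following steps:
1. On test_normal:
   1.1 Create a bitmap index on the EMPNO column and run some equality queries.
   1.2 Create a B-tree index on the EMPNO column, run the same equality queries, and compare the physical and logical I/O needed to fetch the different result sets.
2. On test_random:
   2.1 Same as step 1.1.
   2.2 Same as step 1.2.
3. On test_normal:
   3.1 Same as step 1.1, but with range queries.
   3.2 Same as step 1.2, but with range queries. Compare the statistics.
4. On test_random:
   4.1 Same as step 3.1.
   4.2 Same as step 3.2.
5. On test_normal:
   5.1 Create a bitmap index on the SAL column and run some equality and range queries.
   5.2 Create a B-tree index on the SAL column, run the same equality and range queries (the same result sets as in step 5.1), and compare the I/O needed to fetch the results.
6. Add a GENDER column to both tables and populate it with three possible values: M (male), F (female), and null (unknown), assigned according to some condition.
7. Create a bitmap index on this column and run some equality queries.
8. Create a B-tree index on the GENDER column, run the same equality queries, and compare the results with step 7.
Steps 1 to 4 involve a high-cardinality column (100 percent distinct), step 5 a normal-cardinality column, and steps 7 and 8 a low-cardinality column.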
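Step 1.1 (on test_normal)
In this step, we create a bitmap index on test_normal and then check the size of the index, its clustering factor, and the size of the table. We then run some equality queries and note the I/O they need when the bitmap index is used.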
- SQL> create bitmap index normal_empno_bmx on test_normal(empno);
- Index created.
- Elapsed: 00:00:29.06
- SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
- Table analyzed.
- Elapsed: 00:00:19.01
- SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
- 2 from user_segments
- 3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX');
- SEGMENT_NAME Size in MB
- ------------------------------------ ---------------
- TEST_NORMAL 50
- NORMAL_EMPNO_BMX 28
- Elapsed: 00:00:02.00
- SQL> select index_name, clustering_factor from user_indexes;
- INDEX_NAME CLUSTERING_FACTOR
- ------------------------------ ---------------------------------
- NORMAL_EMPNO_BMX 1000000
- Elapsed: 00:00:00.00
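As you can see, the index is 28MB and its clustering factor equals the number of rows in the table. Now let's run some equality queries for different sets of values: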
- SQL> set autotrace traceonly
- SQL> select * from test_normal where empno=&empno;
- Enter value for empno: 1000
- old 1: select * from test_normal where empno=&empno
- new 1: select * from test_normal where empno=1000
- Elapsed: 00:00:00.01
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Card=1 Bytes=34)
- 2 1 BITMAP CONVERSION (TO ROWIDS)
- 3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 5 consistent gets
- 0 physical reads
- 0 redo size
- 515 bytes sent via SQL*Net to client
- 499 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
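Step 1.2 (on test_normal)
Now we drop the bitmap index on the EMPNO column and create a B-tree index. As before, we check the index size and clustering factor and run the same queries to compare the I/O.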
- SQL> drop index NORMAL_EMPNO_BMX;
- Index dropped.
- SQL> create index normal_empno_idx on test_normal(empno);
- Index created.
- SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
- Table analyzed.
- SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
- 2 from user_segments
- 3 where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX');
- SEGMENT_NAME Size in MB
- ---------------------------------- ---------------
- TEST_NORMAL 50
- NORMAL_EMPNO_IDX 18
- SQL> select index_name, clustering_factor from user_indexes;
- INDEX_NAME CLUSTERING_FACTOR
- ---------------------------------- ----------------------------------
- NORMAL_EMPNO_IDX 6210
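Clearly, the B-tree index on the EMPNO column of this table is smaller than the bitmap index. The clustering factor of the B-tree index is close to the number of blocks in the table, which is why the B-tree index is efficient for range queries.
Now we run the same queries using the B-tree index.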
- SQL> set autot trace
- SQL> select * from test_normal where empno=&empno;
- Enter value for empno: 1000
- old 1: select * from test_normal where empno=&empno
- new 1: select * from test_normal where empno=1000
- Elapsed: 00:00:00.01
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Card=1 Bytes=34)
- 2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)
- Statistics
- ----------------------------------------------------------
- 29 recursive calls
- 0 db block gets
- 5 consistent gets
- 0 physical reads
- 0 redo size
- 515 bytes sent via SQL*Net to client
- 499 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
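As the table below shows, for each EMPNO value the bitmap index and the B-tree index on this unique column need the same number of consistent reads and physical reads.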
Bitmap Consistent Reads | Bitmap Physical Reads | EMPNO | B-tree Consistent Reads | B-tree Physical Reads |
5 | 0 | 1000 | 5 | 0 |
5 | 2 | 2398 | 5 | 2 |
5 | 2 | 8545 | 5 | 2 |
5 | 2 | 98008 | 5 | 2 |
5 | 2 | 85342 | 5 | 2 |
5 | 2 | 128444 | 5 | 2 |
5 | 2 | 858 | 5 | 2 |
Step 2.1 (on test_random)
Now we repeat the same experiment on test_random:
- SQL> create bitmap index random_empno_bmx on test_random(empno);
- Index created.
- SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;
- Table analyzed.
- SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
- 2 from user_segments
- 3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');
- SEGMENT_NAME Size in MB
- ------------------------------------ ---------------
- TEST_RANDOM 50
- RANDOM_EMPNO_BMX 28
- SQL> select index_name, clustering_factor from user_indexes;
- INDEX_NAME CLUSTERING_FACTOR
- ------------------------------ ---------------------------------
- RANDOM_EMPNO_BMX 1000000
Again, the index statistics (size and clustering factor) are identical to those on test_normal:
- SQL> select * from test_random where empno=&empno;
- Enter value for empno: 1000
- old 1: select * from test_random where empno=&empno
- new 1: select * from test_random where empno=1000
- Elapsed: 00:00:00.01
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
- 2 1 BITMAP CONVERSION (TO ROWIDS)
- 3 2 BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 5 consistent gets
- 0 physical reads
- 0 redo size
- 515 bytes sent via SQL*Net to client
- 499 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
Step 2.2 (on test_random)
Now, as in step 1.2, we drop the bitmap index on the EMPNO column and create a B-tree index.
- SQL> drop index RANDOM_EMPNO_BMX;
- Index dropped.
- SQL> create index random_empno_idx on test_random(empno);
- Index created.
- SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;
- Table analyzed.
- SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
- 2 from user_segments
- 3 where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');
- SEGMENT_NAME Size in MB
- ---------------------------------- ---------------
- TEST_RANDOM 50
- RANDOM_EMPNO_IDX 18
- SQL> select index_name, clustering_factor from user_indexes;
- INDEX_NAME CLUSTERING_FACTOR
- ---------------------------------- ----------------------------------
- RANDOM_EMPNO_IDX 999830
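The index is the same size as the one on test_normal, but its clustering factor is very close to the number of rows, which makes the index inefficient for range queries. The clustering factor does not affect equality queries here, because the column values are unique and each key maps to exactly one row.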
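The clustering factor can be pictured as follows: walk the index entries in key order and count how often the next row lives in a different table block than the previous one. The Python sketch below illustrates that idea under stated assumptions and is not Oracle's code: `clustering_factor` is a hypothetical helper, the table is scaled down to 100,000 rows, and 161 rows per block is an estimate derived from the figures reported for test_normal (1,000,000 rows and a B-tree clustering factor of 6,210).

```python
import random

ROWS_PER_BLOCK = 161  # assumption: roughly 1,000,000 rows / 6,210 blocks


def clustering_factor(keys_in_table_order, rows_per_block=ROWS_PER_BLOCK):
    """Walk the keys in index (sorted) order and count table-block changes."""
    # Each entry pairs a key with the number of the block that holds its row.
    entries = sorted(
        (key, position // rows_per_block)
        for position, key in enumerate(keys_in_table_order)
    )
    factor = 0
    previous_block = None
    for _key, block in entries:
        if block != previous_block:
            factor += 1
            previous_block = block
    return factor


ROWS = 100_000  # scaled down from the article's 1,000,000 rows
ordered = list(range(1, ROWS + 1))   # rows stored in key order, like test_normal
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)  # rows stored in random order, like test_random

cf_ordered = clustering_factor(ordered)
cf_shuffled = clustering_factor(shuffled)

print(cf_ordered)   # 622, the number of blocks in the toy table
print(cf_shuffled)  # close to ROWS
```

With the rows stored in key order the count equals the number of blocks, and with the rows shuffled it approaches the number of rows, the same pattern as 6,210 versus 999,830 in the two B-tree indexes.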
Now we run the equality queries for the same set of values.
- SQL> select * from test_random where empno=&empno;
- Enter value for empno: 1000
- old 1: select * from test_random where empno=&empno
- new 1: select * from test_random where empno=1000
- Elapsed: 00:00:00.01
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
- 2 1 INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 5 consistent gets
- 0 physical reads
- 0 redo size
- 515 bytes sent via SQL*Net to client
- 499 bytes received via SQL*Net from client
- 2 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1 rows processed
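Once again, the results are almost identical to those of steps 1.1 and 1.2. For a unique column, data distribution does not affect the logical and physical I/O of equality queries.
Step 3.1 (on test_normal)
In this step we create the bitmap index again, as in step 1.1. We already know that its clustering factor equals the number of rows in the table. Now let's run some range queries.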
- SQL> select * from test_normal where empno between &range1 and &range2;
- Enter value for range1: 1
- Enter value for range2: 2300
- old 1: select * from test_normal where empno between &range1 and &range2
- new 1: select * from test_normal where empno between 1 and 2300
- 2300 rows selected.
- Elapsed: 00:00:00.03
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299 Bytes=78166)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451 Card=2299 Bytes=78166)
- 2 1 BITMAP CONVERSION (TO ROWIDS)
- 3 2 BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 331 consistent gets
- 0 physical reads
- 0 redo size
- 111416 bytes sent via SQL*Net to client
- 2182 bytes received via SQL*Net from client
- 155 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 2300 rows processed
Step 3.2 (on test_normal)
In this step, we run the same queries against test_normal with the B-tree index.
- SQL> select * from test_normal where empno between &range1 and &range2;
- Enter value for range1: 1
- Enter value for range2: 2300
- old 1: select * from test_normal where empno between &range1 and &range2
- new 1: select * from test_normal where empno between 1 and 2300
- 2300 rows selected.
- Elapsed: 00:00:00.02
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299 Bytes=78166)
- 2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299)
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 329 consistent gets
- 15 physical reads
- 0 redo size
- 111416 bytes sent via SQL*Net to client
- 2182 bytes received via SQL*Net from client
- 155 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 2300 rows processed
The results for different ranges are as follows:
Bitmap Consistent Reads | Bitmap Physical Reads | EMPNO (Range) | B-tree Consistent Reads | B-tree Physical Reads |
331 | 0 | 1-2300 | 329 | 0 |
285 | 0 | 8-1980 | 283 | 0 |
346 | 19 | 1850-4250 | 344 | 16 |
427 | 31 | 28888-31850 | 424 | 28 |
371 | 27 | 82900-85478 | 367 | 23 |
2157 | 149 | 984888-1000000 | 2139 | 35 |
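As you can see, the two indexes need almost the same logical and physical I/O. The last range (984888-1000000) returns about 15,000 rows, the most of any range tested. When we forced a full table scan (with the hint /*+ full(test_normal) */), the consistent reads and physical reads were 7,239 and 5,663 respectively.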
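The following Python sketch suggests why a range query on the ordered table touches so few table blocks, and how that changes when the same rows are stored in random order, as in test_random. It is a simulation under assumptions, not a measurement: `blocks_touched` is a hypothetical helper, 161 rows per block is estimated from the 1,000,000 rows and the clustering factor of 6,210 reported for test_normal, and index blocks are ignored.

```python
import random

ROWS_PER_BLOCK = 161  # assumption: roughly 1,000,000 rows / 6,210 blocks


def blocks_touched(keys_in_table_order, low, high, rows_per_block=ROWS_PER_BLOCK):
    """Count the distinct table blocks that hold a key in [low, high]."""
    return len({
        position // rows_per_block
        for position, key in enumerate(keys_in_table_order)
        if low <= key <= high
    })


ROWS = 1_000_000
ordered = list(range(1, ROWS + 1))  # rows stored in key order, like test_normal
shuffled = ordered[:]
random.Random(7).shuffle(shuffled)  # rows stored in random order, like test_random

ordered_blocks = blocks_touched(ordered, 1, 2300)
shuffled_blocks = blocks_touched(shuffled, 1, 2300)

print(ordered_blocks)   # 15: the 2,300 rows sit in adjacent blocks
print(shuffled_blocks)  # far larger: the same rows are scattered over the table
```

In the ordered layout the whole range fits in 15 neighbouring blocks, so either index type has little table data to fetch. In the shuffled layout the same 2,300 rows are spread over well over a thousand blocks.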
Step 4.1 (on test_random)
In this step, we run the range queries on test_random with the bitmap index. Here you will see the effect of the clustering factor.
- SQL>select * from test_random where empno between &range1 and &range2;
- Enter value for range1: 1
- Enter value for range2: 2300
- old 1: select * from test_random where empno between &range1 and &range2
- new 1: select * from test_random where empno between 1 and 2300
- 2300 rows selected.
- Elapsed: 00:00:08.01
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299 Bytes=78166)
- 2 1 BITMAP CONVERSION (TO ROWIDS)
- 3 2 BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 2463 consistent gets
- 1200 physical reads
- 0 redo size
- 111416 bytes sent via SQL*Net to client
- 2182 bytes received via SQL*Net from client
- 155 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 2300 rows processed
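Step 4.2 (on test_random)
In this step, we run the range queries on test_random with the B-tree index. Recall that the clustering factor of this index is close to the number of rows in the table. Here is the optimizer's output: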
- SQL> select * from test_random where empno between &range1 and &range2;
- Enter value for range1: 1
- Enter value for range2: 2300
- old 1: select * from test_random where empno between &range1 and &range2
- new 1: select * from test_random where empno between 1 and 2300
- 2300 rows selected.
- Elapsed: 00:00:03.04
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166)
- 1 0 TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166)
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 6415 consistent gets
- 4910 physical reads
- 0 redo size
- 111416 bytes sent via SQL*Net to client
- 2182 bytes received via SQL*Net from client
- 155 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 2300 rows processed
Because of the clustering factor, the optimizer chose a full table scan instead of using the index:
Bitmap Consistent Reads | Bitmap Physical Reads | EMPNO (Range) | B-tree Consistent Reads | B-tree Physical Reads |
2463 | 1200 | 1-2300 | 6415 | 4910 |
2114 | 31 | 8-1980 | 6389 | 4910 |
2572 | 1135 | 1850-4250 | 6418 | 4909 |
3173 | 1620 | 28888-31850 | 6456 | 4909 |
2762 | 1358 | 82900-85478 | 6431 | 4909 |
7254 | 3329 | 984888-1000000 | 7254 | 4909 |
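With the bitmap index, the optimizer chose a full table scan only for the last range (984888-1000000). With the B-tree index, it chose a full table scan for every range. The difference comes from the clustering factor: the optimizer does not consider the clustering factor when it generates an execution plan that uses a bitmap index, but it does for a B-tree index. In this scenario, the bitmap index performs more efficiently than the B-tree index.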
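The optimizer's choice for the B-tree index can be reproduced with a commonly cited approximation for the cost of a B-tree range scan: the cost of the index access plus selectivity times clustering factor for the table access. The article does not state this formula, so treat it as an assumption. In the Python sketch below, `btree_range_scan_cost` is a hypothetical helper, and the index cost of 8 for test_random is assumed to equal the value shown in the step 3.2 plan, since both B-tree indexes are the same size. The remaining numbers are taken from the execution plans and index statistics shown earlier.

```python
import math


def btree_range_scan_cost(index_cost, selectivity, clustering_factor):
    """Approximate cost: index access plus table blocks expected to be visited."""
    return index_cost + math.ceil(selectivity * clustering_factor)


SELECTIVITY = 2299 / 1_000_000  # Card=2299 in the plans for empno between 1 and 2300
INDEX_COST = 8                  # INDEX (RANGE SCAN) cost in the step 3.2 plan
FULL_SCAN_COST = 613            # TABLE ACCESS (FULL) cost in the step 4.2 plan

cost_normal = btree_range_scan_cost(INDEX_COST, SELECTIVITY, 6210)    # test_normal
cost_random = btree_range_scan_cost(INDEX_COST, SELECTIVITY, 999830)  # test_random

print(cost_normal)                   # 23
print(cost_random)                   # 2307
print(cost_random > FULL_SCAN_COST)  # True
```

Under these assumptions the estimate for test_normal matches the Cost=23 in the step 3.2 plan, while the estimate for test_random is well above the full table scan cost of 613, which is consistent with the optimizer abandoning the B-tree index there.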
The following steps reveal more interesting aspects of these indexes.
Step 5.1 (on test_normal)
Create a bitmap index on the SAL column of test_normal. This column has normal cardinality.
- SQL> create bitmap index normal_sal_bmx on test_normal(sal);
- Index created.
- SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
- Table analyzed.
Get the size of the index and its clustering factor:
- SQL>select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
- 2* from user_segments
- 3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');
- SEGMENT_NAME Size in MB
- ------------------------------ --------------
- TEST_NORMAL 50
- NORMAL_SAL_BMX 4
- SQL> select index_name, clustering_factor from user_indexes;
- INDEX_NAME CLUSTERING_FACTOR
- ------------------------------ ----------------------------------
- NORMAL_SAL_BMX 6001
Now run the queries, starting with an equality query:
- SQL> set autot trace
- SQL> select * from test_normal where sal=&sal;
- Enter value for sal: 1869
- old 1: select * from test_normal where sal=&sal
- new 1: select * from test_normal where sal=1869
- 164 rows selected.
- Elapsed: 00:00:00.08
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168 Bytes=4032)
- 2 1 BITMAP CONVERSION (TO ROWIDS)
- 3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 165 consistent gets
- 0 physical reads
- 0 redo size
- 8461 bytes sent via SQL*Net to client
- 609 bytes received via SQL*Net from client
- 12 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 164 rows processed
Next, a range query:
- SQL> select * from test_normal where sal between &sal1 and &sal2;
- Enter value for sal1: 1500
- Enter value for sal2: 2000
- old 1: select * from test_normal where sal between &sal1 and &sal2
- new 1: select * from test_normal where sal between 1500 and 2000
- 83743 rows selected.
- Elapsed: 00:00:05.00
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes=2001024)
- 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376 Bytes=2001024)
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 11778 consistent gets
- 5850 physical reads
- 0 redo size
- 4123553 bytes sent via SQL*Net to client
- 61901 bytes received via SQL*Net from client
- 5584 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 83743 rows processed
Now drop the bitmap index on test_normal and create a B-tree index.
- SQL> create index normal_sal_idx on test_normal(sal);
- Index created.
- SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;
- Table analyzed.
Check the index size and clustering factor:
- SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
- 2 from user_segments
- 3 where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX');
- SEGMENT_NAME Size in MB
- ------------------------------ ---------------
- TEST_NORMAL 50
- NORMAL_SAL_IDX 17
- SQL> select index_name, clustering_factor from user_indexes;
- INDEX_NAME CLUSTERING_FACTOR
- ------------------------------ ----------------------------------
- NORMAL_SAL_IDX 986778
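As the output above shows, the B-tree index is larger than the bitmap index on the same column, and its clustering factor is close to the number of rows in the table.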
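To make the clustering factor concrete, here is a minimal Python sketch. It is not Oracle's code; it assumes the usual description of the statistic, which is to walk the index entries in key order and count how often the table block changes. The function name and the toy block numbers are hypothetical.

```python
def clustering_factor(entries):
    """Count table-block changes while walking index entries in key order.

    `entries` is a list of (key, block_number) pairs, one per indexed row.
    """
    factor = 0
    previous_block = None
    for _key, block in sorted(entries):
        if block != previous_block:
            factor += 1
            previous_block = block
    return factor


# Keys stored in the same order as the table: 2 rows per block, 3 blocks.
ordered = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 2), (6, 2)]
# The same keys scattered across the blocks.
scattered = [(1, 0), (2, 1), (3, 2), (4, 0), (5, 1), (6, 2)]

print(clustering_factor(ordered))    # 3, the number of blocks
print(clustering_factor(scattered))  # 6, the number of rows
```

Read this way, a clustering factor of 986778 on a one-million-row table suggests that neighboring SAL values almost never share a table block.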
Now run the equality query first:
- SQL> set autot trace
- SQL> select * from test_normal where sal=&sal;
- Enter value for sal: 1869
- old 1: select * from test_normal where sal=&sal
- new 1: select * from test_normal where sal=1869
- 164 rows selected.
- Elapsed: 00:00:00.01
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168 Bytes=4032)
- 2 1 INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168)
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 177 consistent gets
- 0 physical reads
- 0 redo size
- 8461 bytes sent via SQL*Net to client
- 609 bytes received via SQL*Net from client
- 12 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 164 rows processed
Next, the range query:
- SQL> select * from test_normal where sal between &sal1 and &sal2;
- Enter value for sal1: 1500
- Enter value for sal2: 2000
- old 1: select * from test_normal where sal between &sal1 and &sal2
- new 1: select * from test_normal where sal between 1500 and 2000
- 83743 rows selected.
- Elapsed: 00:00:04.03
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes=2001024)
- 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376 Bytes=2001024)
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 11778 consistent gets
- 3891 physical reads
- 0 redo size
- 4123553 bytes sent via SQL*Net to client
- 61901 bytes received via SQL*Net from client
- 5584 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 83743 rows processed
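Running the queries against different sets of values gives the results below. The logical and physical I/O counts are largely the same for both index types.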
| Bitmap Consistent Reads | Bitmap Physical Reads | SAL (Equality) | B-tree Consistent Reads | B-tree Physical Reads | Rows Fetched |
| --- | --- | --- | --- | --- | --- |
| 165 | 0 | 1869 | 177 | | 164 |
| 169 | 163 | 3548 | 181 | | 167 |
| 174 | 166 | 6500 | 187 | | 172 |
| 75 | 69 | 7000 | 81 | | 73 |
| 177 | 163 | 2500 | 190 | | 175 |
| Bitmap Consistent Reads | Bitmap Physical Reads | SAL (Range) | B-tree Consistent Reads | B-tree Physical Reads | Rows Fetched |
| --- | --- | --- | --- | --- | --- |
| 11778 | 5850 | 1500-2000 | 11778 | 3891 | 83743 |
| 11765 | 5468 | 2000-2500 | 11765 | 3879 | 83328 |
| 11753 | 5471 | 2500-3000 | 11753 | 3884 | 83318 |
| 17309 | 5472 | 3000-4000 | 17309 | 3892 | 166999 |
| 39398 | 5454 | 4000-7000 | 39398 | 3973 | 500520 |
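For the range predicates, the optimizer chose a full table scan and did not use the index at all, whereas for the equality predicates it used the index. Once again, the logical and physical I/O is nearly the same for both index types.
The conclusion is that for a normal-cardinality column, the optimizer makes the same choice for both index types, and there is no significant difference in I/O.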
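One way to see why the optimizer switches plans between the equality and range predicates is a rough cost estimate. The Python sketch below is not Oracle internals. It assumes a simplified textbook approximation in which the table-access part of an index range scan costs about selectivity × clustering factor, and it takes the row count, cardinality estimates, clustering factor, index cost of 3, and full-scan cost of 601 from the listings above. The function name is hypothetical.

```python
TABLE_ROWS = 1_000_000       # rows in test_normal
CLUSTERING_FACTOR = 986_778  # reported by user_indexes for NORMAL_SAL_IDX
FULL_SCAN_COST = 601         # cost the plans report for TABLE ACCESS (FULL)
INDEX_SCAN_COST = 3          # cost the equality plan reports for the index step


def index_path_cost(estimated_rows):
    """Approximate cost of an index range scan plus table access by rowid.

    Assumption: table visits ~= selectivity * clustering factor, rounded up.
    Reusing INDEX_SCAN_COST for a wide range understates the index step,
    which only strengthens the conclusion for the range predicate.
    """
    # Ceiling division with integers avoids floating-point rounding.
    table_visits = -(-estimated_rows * CLUSTERING_FACTOR // TABLE_ROWS)
    return INDEX_SCAN_COST + table_visits


for label, card in [("sal = 1869", 168), ("sal between 1500 and 2000", 83_376)]:
    cost = index_path_cost(card)
    choice = "index" if cost < FULL_SCAN_COST else "full table scan"
    print(f"{label}: index path ~{cost}, full scan {FULL_SCAN_COST} -> {choice}")
```

Under these assumptions the equality predicate comes out at 169, which agrees with the Cost=169 in the plan, while the range predicate comes out in the tens of thousands, far above the full-scan cost of 601.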
Step 6 (Add a GENDER Column)
Before testing a low-cardinality column, we first add a GENDER column and update it with the values M, F, or null.
- SQL> alter table test_normal add GENDER varchar2(1);
- Table altered.
- SQL> select GENDER, count(*) from test_normal group by GENDER;
- S COUNT(*)
- - ----------
- F 333769
- M 499921
- 166310
- 3 rows selected.
The bitmap index on this column is about 570KB, as the following listing shows:
- SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER);
- Index created.
- Elapsed: 00:00:02.08
- SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
- 2 from user_segments
- 3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX');
- SEGMENT_NAME Size in MB
- ------------------------------ ---------------
- TEST_NORMAL 50
- NORMAL_GENDER_BMX .5625
- 2 rows selected.
By comparison, the B-tree index on the same column is 13MB, much larger than the bitmap index.
- SQL> create index normal_GENDER_idx on test_normal(GENDER);
- Index created.
- SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
- 2 from user_segments
- 3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX');
- SEGMENT_NAME Size in MB
- ------------------------------ ---------------
- TEST_NORMAL 50
- NORMAL_GENDER_IDX 13
- 2 rows selected.
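A back-of-the-envelope model helps explain the gap between the two sizes. The Python sketch below assumes an uncompressed bitmap with one bit per row for each distinct value, and a B-tree with one entry per non-null row. Oracle compresses bitmaps and adds block and segment overhead, so real sizes differ from these figures. The function names and the 12-byte entry size are hypothetical, chosen only for illustration.

```python
def raw_bitmap_bytes(row_count, distinct_values):
    """Uncompressed bitmap index size: one bit per row for each distinct value."""
    bytes_per_bitmap = (row_count + 7) // 8
    return distinct_values * bytes_per_bitmap


def btree_entry_bytes(indexed_rows, bytes_per_entry):
    """B-tree leaf payload: one entry (key plus rowid) per indexed row."""
    return indexed_rows * bytes_per_entry


ROWS = 1_000_000
NON_NULL_ROWS = 333_769 + 499_921  # F and M counts from the GROUP BY output
ASSUMED_ENTRY_BYTES = 12           # hypothetical: key, rowid, and per-entry overhead

# GENDER has three bitmaps: F, M, and null.
print(raw_bitmap_bytes(ROWS, 3))  # 375000
print(btree_entry_bytes(NON_NULL_ROWS, ASSUMED_ENTRY_BYTES))
```

In this model the bitmap cost grows with the number of distinct values, while the B-tree cost grows with the number of rows, which is why a low-cardinality column such as GENDER favors the bitmap so strongly.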
Now run equality queries. The optimizer does not use the index, whether it is a bitmap or a B-tree; it uses a full table scan instead.
- SQL> select * from test_normal where GENDER is null;
- 166310 rows selected.
- Elapsed: 00:00:06.08
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310 Bytes=4157750)
- 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310 Bytes=4157750)
- SQL> select * from test_normal where GENDER='M';
- 499921 rows selected.
- Elapsed: 00:00:16.07
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921 Bytes=12498025)
- 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=499921 Bytes=12498025)
- SQL>select * from test_normal where GENDER='F'
- /
- 333769 rows selected.
- Elapsed: 00:00:12.02
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Bytes=8344225)
- 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769 Bytes=8344225)
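Conclusion
Now that we understand how the optimizer responds to these techniques, let's look at the kind of application each index type suits best.
Keep the bitmap index on the GENDER column, create another bitmap index on the SAL column, and then run some queries. Then run the same queries with B-tree indexes on these columns.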
From the test_normal table, you need the employee numbers of all male employees whose salary equals any of the following values:
1000
1500
2000
2500
3000
3500
4000
4500
5000
Thus:
- SQL>select * from test_normal
- where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
This is a typical data warehouse query, and it should never be run on an OLTP system. Here are the results with bitmap indexes on both columns:
- SQL>select * from test_normal
- where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
- 1453 rows selected.
- Elapsed: 00:00:02.03
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754 Bytes=18850)
- 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198 Card=754 Bytes=18850)
- 2 1 BITMAP CONVERSION (TO ROWIDS)
- 3 2 BITMAP AND
- 4 3 BITMAP OR
- 5 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 6 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 7 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 8 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 9 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 10 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 11 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 12 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 13 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
- 14 3 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 1353 consistent gets
- 920 physical reads
- 0 redo size
- 75604 bytes sent via SQL*Net to client
- 1555 bytes received via SQL*Net from client
- 98 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1453 rows processed
Here are the results with B-tree indexes:
- SQL>select * from test_normal
- where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';
- 1453 rows selected.
- Elapsed: 00:00:03.01
- Execution Plan
- ----------------------------------------------------------
- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754 Bytes=18850)
- 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754 Bytes=18850)
- Statistics
- ----------------------------------------------------------
- 0 recursive calls
- 0 db block gets
- 6333 consistent gets
- 4412 physical reads
- 0 redo size
- 75604 bytes sent via SQL*Net to client
- 1555 bytes received via SQL*Net from client
- 98 SQL*Net roundtrips to/from client
- 0 sorts (memory)
- 0 sorts (disk)
- 1453 rows processed
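As you can see, with the B-tree indexes the optimizer chose a full table scan, whereas with the bitmap indexes it used the indexes. The difference in performance can be inferred from the I/O needed to fetch the result: 1353 consistent gets with the bitmap indexes versus 6333 with the full table scan.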
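The BITMAP OR, BITMAP AND, and BITMAP CONVERSION (TO ROWIDS) steps in the bitmap plan can be pictured with a small Python sketch. It is an illustration only, using Python integers as bitmaps and list positions as stand-ins for rowids. The sample rows and helper names are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Return {value: bitmap}, where bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return index


def to_row_ids(bitmap):
    """Convert a bitmap to the list of row positions whose bit is set."""
    row_ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            row_ids.append(position)
        bitmap >>= 1
        position += 1
    return row_ids


# Toy table: one (sal, gender) pair per row.
rows = [(1000, "M"), (1500, "F"), (1200, "M"), (2000, "M"), (1500, None), (1500, "M")]
sal_index = build_bitmap_index(sal for sal, _ in rows)
gender_index = build_bitmap_index(gender for _, gender in rows)

# sal IN (1000, 1500, 2000): OR together the bitmap of each listed value.
sal_bitmap = 0
for sal in (1000, 1500, 2000):
    sal_bitmap |= sal_index.get(sal, 0)

# AND with GENDER = 'M', then convert the surviving bits to row ids.
matches = to_row_ids(sal_bitmap & gender_index.get("M", 0))
print(matches)  # [0, 3, 5]
```

Only the rows whose bits survive the AND need to be fetched from the table, which is consistent with the lower consistent gets reported for the bitmap plan.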
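In summary, bitmap indexes suit decision support systems regardless of cardinality, for the following reasons:
- With bitmap indexes, the optimizer can efficiently answer queries that combine predicates with AND, OR, or XOR.
- With bitmap indexes, the optimizer can answer queries that search for or count nulls. Null values are indexed in a bitmap index, unlike in a B-tree index.
- Most important, bitmap indexes in a decision support system support ad hoc queries, whereas B-tree indexes do not. More specifically, if you have a table with 50 columns and users frequently query on 10 of them, either all 10 in combination or sometimes a single column, creating a suitable B-tree index is very difficult. If you create 10 bitmap indexes on these columns, all of these queries can be answered by the indexes, whether they reference all 10 columns, 4 or 6 of the 10, or only one.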
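The point about ad hoc queries in the list above can be made concrete with a little counting. The Python sketch below simply enumerates the non-empty combinations of 10 columns that a user might filter on; the column names are hypothetical, and the sketch says nothing about how many composite B-tree indexes Oracle would actually need.

```python
from itertools import combinations

# Hypothetical names for the 10 frequently queried columns.
columns = [f"col{i}" for i in range(1, 11)]

# Every non-empty subset of the 10 columns is a possible ad hoc predicate set.
predicate_sets = sum(
    1
    for size in range(1, len(columns) + 1)
    for _ in combinations(columns, size)
)

print(predicate_sets)  # 1023 possible column combinations
print(len(columns))    # 10 single-column bitmap indexes to combine at query time
```

Ten single-column bitmap indexes can be combined at query time for any of these combinations, whereas the key order of a composite B-tree index has to be fixed when the index is created.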
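B-tree indexes, by contrast, are well suited to OLTP systems, where users run routine queries. Because data in an OLTP system is frequently updated and deleted, bitmap indexes there can cause serious locking problems.
Both index types share one goal: returning results as fast as possible. But you should choose between them based on the type of application, not on the level of cardinality.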