Histogram statistics causing a bad execution plan

This afternoon a friend ran into a case involving the following SQL statement:

SELECT WORKITEMID
  FROM WFWIPARTICIPANT 
 WHERE PARTICIPANT IN ('771', '99999', '41', '146', '李錦');

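After statistics are gathered with DBMS_STATS, the statement does a full table scan, but it uses the index in both of the following situations:

analyze table WFWIPARTICIPANT  compute statistics   --- statistics gathered with ANALYZE: the index is used

analyze table WFWIPARTICIPANT  delete statistics    ---- statistics deleted: the index is also used, because Oracle falls back to dynamic sampling

The table structure is as follows: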

SQL> desc WFWIPARTICIPANT;
Name            Type          Nullable Default Comments 
--------------- ------------- -------- ------- -------- 
WIPARTICID      NUMBER                                  
WORKITEMID      NUMBER        Y                         
PARTICIPANTTYPE VARCHAR2(20)  Y                         
PARTICIPANT     VARCHAR2(256) Y                         
PARTICIPANT2    VARCHAR2(64)  Y                         
WORKITEMSTATE   NUMBER(2)     Y                         
PARTIINTYPE     VARCHAR2(20)  Y                         
EXTEND1         VARCHAR2(64)  Y     

There is also an index, WF_IDX_PARTICIPANT, on the PARTICIPANT column.
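
One way to confirm which columns an index covers is to query DBA_IND_COLUMNS. This is a minimal sketch, assuming the table is owned by TSSA (the schema used in the statistics calls in this post) and that the session is allowed to read the DBA views:

```sql
-- List every index on the table together with its columns, in key order.
select index_name, column_name, column_position
  from dba_ind_columns
 where table_owner = 'TSSA'
   and table_name  = 'WFWIPARTICIPANT'
 order by index_name, column_position;
```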

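I asked him to export the table with exp, and then imported it on my own machine to test. The test went as follows.

First, gather statistics with DBMS_STATS: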

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TSSA',TABNAME=>'WFWIPARTICIPANT',ESTIMATE_PERCENT=>100,CASCADE=>TRUE);

PL/SQL procedure successfully completed

Run the query:

SQL> SELECT WORKITEMID
  2    FROM WFWIPARTICIPANT
  3   WHERE PARTICIPANT IN ('771', '99999', '41', '146', '李錦');

60 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1708170390

-------------------------------------------------------------------------------------
| Id  | Operation         | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                 |   530 |  7420 |   103   (5)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| WFWIPARTICIPANT |   530 |  7420 |   103   (5)| 00:00:02 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("PARTICIPANT"='146' OR "PARTICIPANT"='41' OR
              "PARTICIPANT"='771' OR "PARTICIPANT"='99999' OR "PARTICIPANT"='李錦')

It did indeed do a full table scan, so I switched to ANALYZE to gather the statistics:

SQL> analyze table WFWIPARTICIPANT  delete statistics;

Table analyzed.

SQL> analyze table WFWIPARTICIPANT  compute statistics;

Table analyzed.

SQL> SELECT WORKITEMID
  2    FROM WFWIPARTICIPANT
  3   WHERE PARTICIPANT IN ('771', '99999', '41', '146', '李錦');

60 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1217134846

---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |    71 |   852 |    64   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |                    |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| WFWIPARTICIPANT    |    71 |   852 |    64   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | WF_IDX_PARTICIPANT |    71 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PARTICIPANT"='146' OR "PARTICIPANT"='41' OR "PARTICIPANT"='771' OR
              "PARTICIPANT"='99999' OR "PARTICIPANT"='李錦')

After analyzing the table with the ANALYZE command, the query did use an index range scan.
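
The two plans estimate 530 and 71 rows respectively, while the query actually returns 60 rows. A per-value count is one way to see how those 60 rows are spread over the IN-list values. This is a sketch only; it assumes the table is reachable from the current schema, as in the original query:

```sql
-- Actual number of rows for each value in the IN list.
select participant, count(*) as actual_rows
  from WFWIPARTICIPANT
 where participant in ('771', '99999', '41', '146', '李錦')
 group by participant
 order by actual_rows desc;
```

Values that do not occur in the table simply do not appear in the result.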

How to approach it: with this kind of problem, the first thing to check is the selectivity of the index:

SQL> select a.owner,a.index_name,a.index_type,partitioned,b.num_rows,b.distinct_keys,b.num_rows/b.distinct_keys avg_row_per_key
  2  ,b.distinct_keys/b.num_rows SELECTIVITY,b.last_analyzed,b.stale_stats from dba_indexes a,dba_ind_statistics b
  3   where a.owner=b.owner and a.index_name=b.index_name and a.index_name='WF_IDX_PARTICIPANT';

OWNER INDEX_NAME           INDEX_TYPE PARTITIONED   NUM_ROWS DISTINCT_KEYS AVG_ROW_PER_KEY SELECTIVITY LAST_ANALYZED STALE_STATS
----- -------------------- ---------- ----------- ---------- ------------- --------------- ----------- ------------- -----------
TSSA  WF_IDX_PARTICIPANT   NORMAL     NO               57820          4073 14.195924380063 0.070442753 2010-4-15 17: NO

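Note that the table has 57820 rows in total, but the indexed column has only 4073 distinct values, which is about 14 rows per key on average and an index selectivity of about 7%. With that many repeated values, the data in this column may well be unevenly distributed, so I guessed that the problem was related to a histogram. Now check whether the column has a histogram: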

SQL>  select owner,table_name,column_name,num_distinct,histogram,num_buckets from dba_tab_col_statistics
  2  where table_name='WFWIPARTICIPANT' and column_name='PARTICIPANT';

OWNER TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
----- ------------------------------ ------------------------------ ------------ --------------- -----------
TSSA  WFWIPARTICIPANT                PARTICIPANT                            4073 NONE                      1

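HISTOGRAM is NONE and NUM_BUCKETS is 1, which means the column has no histogram. So the statistics gathered by the ANALYZE command contain no histogram information. Next, gather statistics with DBMS_STATS again and see whether a histogram appears: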

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TSSA',TABNAME=>'WFWIPARTICIPANT',ESTIMATE_PERCENT=>100,CASCADE=>TRUE);

PL/SQL procedure successfully completed
SQL>  select owner,table_name,column_name,num_distinct,histogram,num_buckets from dba_tab_col_statistics
  2  where table_name='WFWIPARTICIPANT' and column_name='PARTICIPANT';

OWNER TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
----- ------------------------------ ------------------------------ ------------ --------------- -----------
TSSA  WFWIPARTICIPANT                PARTICIPANT                            4073 HEIGHT BALANCED         254
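
To look at what a histogram on this column actually contains, its endpoints can be listed from DBA_TAB_HISTOGRAMS. This is a sketch under the same assumptions as the queries above (owner TSSA, access to the DBA views). For a character column, ENDPOINT_VALUE holds a numeric encoding of the leading bytes of the value rather than the string itself:

```sql
-- One row per histogram endpoint for the PARTICIPANT column.
select endpoint_number, endpoint_value, endpoint_actual_value
  from dba_tab_histograms
 where owner       = 'TSSA'
   and table_name  = 'WFWIPARTICIPANT'
   and column_name = 'PARTICIPANT'
 order by endpoint_number;
```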

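Note NUM_BUCKETS=254: Oracle automatically gathered a histogram on this column (the call did not specify method_opt, so the default applied). I therefore suspected that the histogram was affecting the execution plan. Now I remove the histogram by regathering statistics with SIZE 1 for this column: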

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TSSA',TABNAME=>'WFWIPARTICIPANT',ESTIMATE_PERCENT=>100,DEGREE=>16,method_opt=>'for columns size 1 PARTICIPANT',CASCADE=>TRUE);

PL/SQL procedure successfully completed
SQL> select owner,table_name,column_name,num_distinct,histogram,num_buckets from dba_tab_col_statistics
  2  where table_name='WFWIPARTICIPANT' and column_name='PARTICIPANT';

OWNER TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT HISTOGRAM       NUM_BUCKETS
----- ------------------------------ ------------------------------ ------------ --------------- -----------
TSSA  WFWIPARTICIPANT                PARTICIPANT                            4073 NONE                      1

Run the query again:

SQL> SELECT WORKITEMID
  2    FROM WFWIPARTICIPANT
  3   WHERE PARTICIPANT IN ('771', '99999', '41', '146', '李錦');

60 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1217134846

---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |    71 |   994 |    64   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |                    |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| WFWIPARTICIPANT    |    71 |   994 |    64   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | WF_IDX_PARTICIPANT |    71 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("PARTICIPANT"='146' OR "PARTICIPANT"='41' OR "PARTICIPANT"='771' OR
              "PARTICIPANT"='99999' OR "PARTICIPANT"='李錦')

After the histogram statistics were removed, the optimizer chose the access path we expected.
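
The estimate of 71 rows in the index plan is consistent with the usual no-histogram arithmetic, in which each equality predicate is assumed to match NUM_ROWS / NUM_DISTINCT rows. A quick check with the figures reported earlier (57820 rows, 4073 distinct values, 5 values in the IN list), assuming the column contains no NULLs:

```sql
-- Uniform-distribution estimate: rows per distinct value times the number of IN-list values.
select round(5 * 57820 / 4073) as est_rows from dual;
```

This returns 71, the same figure as the Rows column of the index plan.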

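This case is a reminder to specify the parameters explicitly when gathering statistics on a table. When tuning an access path, the first things to check are the selectivity of the index, the type of the index, and any histogram statistics on the column. Histogram statistics do not always help, and of course they do not always hurt either; each case has to be judged on its own. If Oracle gathers a histogram automatically and we are not aware of it, performance diagnosis becomes much harder.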
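
As a sketch of what specifying the parameters explicitly might look like, the calls below first show the default METHOD_OPT and then regather with the parameter spelled out. Two assumptions to note: DBMS_STATS.GET_PREFS is available from 11g onward (on 10g, DBMS_STATS.GET_PARAM plays the same role), and 'FOR ALL COLUMNS SIZE 1' is only an illustrative setting that disables histograms on every column of the table, not a recommendation for every table.

```sql
-- Show the default METHOD_OPT that applies when the parameter is omitted
-- (11g and later; on 10g use dbms_stats.get_param('METHOD_OPT') instead).
select dbms_stats.get_prefs('METHOD_OPT') as default_method_opt from dual;

-- Regather with METHOD_OPT stated explicitly so that no histogram is created.
begin
  dbms_stats.gather_table_stats(
    ownname          => 'TSSA',
    tabname          => 'WFWIPARTICIPANT',
    estimate_percent => 100,
    method_opt       => 'FOR ALL COLUMNS SIZE 1',
    cascade          => true);
end;
/
```

Afterwards, the DBA_TAB_COL_STATISTICS query used earlier in this post can confirm that HISTOGRAM is NONE for the column.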
