The Correlated Columns Problem

First, create a table T:

create table t as select level as id ,level||'a' as a,level||level||'b' as b from dual connect by level<100;

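Here the value of column A determines the value of column B.

insert into t  select * from t; 
.............................. keep repeating this insert; each run doubles the row count (99 * 2^15 = 3244032, so 15 runs)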

SQL> select count(*) from t;

  COUNT(*)
----------
   3244032
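
With the table populated, the claim that A determines B can be checked directly. This is a small sketch, not part of the original test: if the number of distinct (A,B) pairs equals the number of distinct A values, every A maps to exactly one B. The `'|'` separator is only there to keep the concatenated pairs unambiguous.

```sql
-- Sketch: verify the functional dependency A -> B on table T.
-- ndv_ab = ndv_a means each value of A occurs with exactly one value of B.
select count(distinct a)             as ndv_a,
       count(distinct b)             as ndv_b,
       count(distinct a || '|' || b) as ndv_ab
  from t;
```

For the table built above all three counts should be 99, since the data is 99 generated rows duplicated over and over.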

create index idx1 on t(a);  
create index idx2 on t(a,b);   
  
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SCOTT',
                                tabname          => 'T',
                                estimate_percent => 100,
                                method_opt       => 'for all columns size skewonly',
                                no_invalidate    => FALSE,
                                degree           => 8,
                                cascade          => TRUE);
END;
/

SQL> select * from t where a='1a' and b='11b';

32768 rows selected.

Elapsed: 00:00:03.98

Execution Plan
----------------------------------------------------------
Plan hash value: 2303463401

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |   331 |  3972 |    84   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |   331 |  3972 |    84   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | IDX2 |   331 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"='1a' AND "B"='11b')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      11838  consistent gets
       7943  physical reads
          0  redo size
     441749  bytes sent via SQL*Net to client
      24424  bytes received via SQL*Net from client
       2186  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      32768  rows processed

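Because the CBO does not know that A and B are related, it treats the two predicates as independent and arrives at a cardinality of 331:

SQL> select 1/99/99*3244032 from dual; -- i.e. total rows * selectivity(a) * selectivity(b) = 3244032 * (1/99) * (1/99)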

1/99/99*3244032
---------------
     330.989899

But the query actually returns 32768 rows.
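
To see where the 1/99 selectivities come from, the column statistics gathered earlier can be inspected. This is a sketch that assumes you are connected as the owner of T (SCOTT in this post); `user_tab_col_statistics` is a standard dictionary view.

```sql
-- Sketch: show the per-column statistics the CBO uses for A and B.
-- Without a column group the optimizer only has these single-column figures,
-- so it multiplies the two selectivities as if the columns were independent.
select column_name, num_distinct, density, histogram
  from user_tab_col_statistics
 where table_name = 'T'
   and column_name in ('A', 'B');
```

Both columns should report NUM_DISTINCT = 99, which is what gives 3244032 / 99 / 99, roughly 331.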

SQL> select * from t where a='1a';

32768 rows selected.

Elapsed: 00:00:01.38

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 32768 |   384K|  1874   (8)| 00:00:23 |
|*  1 |  TABLE ACCESS FULL| T    | 32768 |   384K|  1874   (8)| 00:00:23 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"='1a')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      10120  consistent gets
       6312  physical reads
          0  redo size
     441749  bytes sent via SQL*Net to client
      24424  bytes received via SQL*Net from client
       2186  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      32768  rows processed

When the only predicate is where a='1a', the CBO gets the cardinality right. It is computed like this:

SQL> select 3244032/99 from dual;

3244032/99
----------
     32768 

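Clearly the plan for select * from t where a='1a' and b='11b' is wrong. It should use a full table scan, but the bad cardinality estimate made it pick the IDX2 index instead.

Oracle offers two ways to deal with correlated columns: dynamic sampling, and, starting with Oracle 11g, extended statistics gathered on the correlated columns.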

SQL> ALTER SESSION SET optimizer_dynamic_sampling=6;

Session altered.

SQL> set lines 200
SQL> set pages 200
SQL> set timi on
SQL> explain plan for select * from t where a='1a' and b='11b';

Explained.

Elapsed: 00:00:00.86
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 32776 |   384K|  1885   (8)| 00:00:23 |
|*  1 |  TABLE ACCESS FULL| T    | 32776 |   384K|  1885   (8)| 00:00:23 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"='1a' AND "B"='11b')

Note
-----
   - dynamic sampling used for this statement

17 rows selected.
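
The ALTER SESSION setting affects every statement in the session. A narrower alternative, sketched here and not part of the original test, is the statement-level `dynamic_sampling` hint with the same level 6. I have not re-run the plan with the hint on this table, so treat the outcome as something to verify.

```sql
-- Sketch: request dynamic sampling at level 6 for table T in this statement only.
explain plan for
select /*+ dynamic_sampling(t 6) */ *
  from t
 where a = '1a' and b = '11b';

-- The Note section should again mention dynamic sampling if the hint was honored.
select * from table(dbms_xplan.display);
```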

 

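With dynamic sampling enabled, Oracle's cardinality estimate is essentially correct (32776 estimated versus 32768 actual). I won't walk through 11g extended statistics here; if you are interested, try it yourself.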
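
For readers who want a starting point for the extended statistics approach, here is a minimal sketch. It assumes the SCOTT.T table from this post, Oracle 11g or later, and a session connected as SCOTT. It was not run as part of the original test, so the resulting plan is something to check, not a recorded result.

```sql
-- Sketch: create a column group on (A,B).
-- The function returns the system-generated extension name.
select dbms_stats.create_extended_stats('SCOTT', 'T', '(A,B)') as extension_name
  from dual;

-- Re-gather statistics so the column group gets its own statistics.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SCOTT',
                                tabname          => 'T',
                                estimate_percent => 100,
                                method_opt       => 'for all columns size skewonly',
                                no_invalidate    => FALSE,
                                cascade          => TRUE);
END;
/

-- Confirm the extension exists, then look at the plan again.
select extension_name, extension
  from user_stat_extensions
 where table_name = 'T';

explain plan for select * from t where a='1a' and b='11b';
select * from table(dbms_xplan.display);
```

With the column group in place the optimizer can use the number of distinct (A,B) combinations instead of multiplying the two single-column selectivities, so the estimate for the two-predicate query should move toward 32768.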

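My advice on correlated columns: can the value be derived in the application instead? If A determines B, don't create column B when designing the database. Generate B from A in the application, which also saves storage in the database.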

If column B really has to exist in the database, don't put both columns in the WHERE clause. In other words, don't write
select * from t where a='1a' and b='11b';

Write
select * from t where a='1a'  or  select * from t where b='11b'

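This goes a long way toward keeping the CBO from miscalculating cardinality. If this table is joined to several others, a single wrong cardinality estimate can throw off the execution plan for the whole statement and hurt its performance.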

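Dynamic sampling and extended statistics do solve the problem, but what if the product has to stay portable? My product has to support Oracle, DB2, and SQL Server, and perhaps later the Chinese database Dameng. What do you do when those databases have no dynamic sampling?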
