The Correlated Columns Problem

Original

robinson1988

2020-02-23 02:22

First, create a table T:

create table t as select level as id ,level||'a' as a,level||level||'b' as b from dual connect by level<100;

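Here the value of column A determines the value of column B.

insert into t select * from t;
... keep repeating the insert, doubling the data each time (15 times in total: 99 * 2^15 = 3244032 rows)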

SQL> select count(*) from t;

COUNT(*)
----------
3244032
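
The claim that A determines B can be checked directly against the data. The query below is a minimal sketch, assuming the table T built above: it lists any value of A that maps to more than one value of B, so it should return no rows if the dependency holds.

```sql
-- Sketch: look for values of A that are paired with more than one B.
-- An empty result means A functionally determines B in table T.
select a, count(distinct b) as distinct_b
  from t
 group by a
having count(distinct b) > 1;
```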

create index idx1 on t(a);
create index idx2 on t(a,b);

BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'SCOTT',
                                tabname          => 'T',
                                estimate_percent => 100,
                                method_opt       => 'for all columns size skewonly',
                                no_invalidate    => FALSE,
                                degree           => 8,
                                cascade          => TRUE);
END;
/

SQL> select * from t where a='1a' and b='11b';

32768 rows selected.

Elapsed: 00:00:03.98

Execution Plan
----------------------------------------------------------
Plan hash value: 2303463401

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |   331 |  3972 |    84   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |   331 |  3972 |    84   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | IDX2 |   331 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"='1a' AND "B"='11b')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      11838  consistent gets
       7943  physical reads
          0  redo size
     441749  bytes sent via SQL*Net to client
      24424  bytes received via SQL*Net from client
       2186  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      32768  rows processed

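Because the CBO does not know that A and B are correlated, it treats the two predicates as independent and estimates a cardinality of 331:

SQL> select 1/99/99*3244032 from dual; -- i.e. selectivity(A) * selectivity(B) * number of rows = (1/99) * (1/99) * 3244032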

1/99/99*3244032
---------------
330.989899

But the query actually returns 32768 rows.
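
The 331 figure can also be rebuilt from the dictionary statistics rather than from hard-coded numbers. This is only a sketch: it assumes you are connected as the owner of T, that statistics were gathered as shown earlier, and that the optimizer uses the simple 1/NDV selectivity for each column (a histogram may lead it to a slightly different number).

```sql
-- Sketch: estimated rows under the independence assumption,
-- num_rows * (1/NDV of A) * (1/NDV of B), taken from gathered statistics.
select t.num_rows,
       a.num_distinct as ndv_a,
       b.num_distinct as ndv_b,
       round(t.num_rows / a.num_distinct / b.num_distinct) as est_rows_independent
  from user_tables t
  join user_tab_col_statistics a
    on a.table_name = t.table_name and a.column_name = 'A'
  join user_tab_col_statistics b
    on b.table_name = t.table_name and b.column_name = 'B'
 where t.table_name = 'T';
```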

SQL> select * from t where a='1a';

32768 rows selected.

Elapsed: 00:00:01.38

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 32768 |   384K|  1874   (8)| 00:00:23 |
|*  1 |  TABLE ACCESS FULL| T    | 32768 |   384K|  1874   (8)| 00:00:23 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"='1a')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      10120  consistent gets
       6312  physical reads
          0  redo size
     441749  bytes sent via SQL*Net to client
      24424  bytes received via SQL*Net from client
       2186  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      32768  rows processed

If the predicate is only where a='1a', the CBO gets the cardinality right. It computes it like this:

SQL> select 3244032/99 from dual;

3244032/99
----------
32768

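Clearly, the execution plan for select * from t where a='1a' and b='11b' is wrong. It should use a full table scan, but because the cardinality was misestimated, the optimizer chose index IDX2.

Oracle offers two fixes for correlated columns: one is dynamic sampling, and the other, available from Oracle 11g, is gathering extended statistics on the correlated columns.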

SQL> ALTER SESSION SET optimizer_dynamic_sampling=6;

Session altered.

SQL> set lines 200
SQL> set pages 200
SQL> set timi on
SQL> explain plan for select * from t where a='1a' and b='11b';

Explained.

Elapsed: 00:00:00.86
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 32776 |   384K|  1885   (8)| 00:00:23 |
|*  1 |  TABLE ACCESS FULL| T    | 32776 |   384K|  1885   (8)| 00:00:23 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("A"='1a' AND "B"='11b')

Note
-----
   - dynamic sampling used for this statement

17 rows selected.
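
The session-level setting above affects every statement in the session. As a narrower alternative, the same sampling level can be requested for a single statement with the DYNAMIC_SAMPLING hint. This is a sketch, not something the original test ran, so the resulting estimate is not shown here.

```sql
-- Sketch: statement-level dynamic sampling at level 6, leaving the
-- session parameter optimizer_dynamic_sampling unchanged.
explain plan for
select /*+ dynamic_sampling(6) */ *
  from t
 where a = '1a' and b = '11b';

select * from table(dbms_xplan.display);
```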

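With dynamic sampling enabled, Oracle's cardinality estimate is essentially correct. I won't demonstrate 11g extended statistics here; if you are interested, try it yourself.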
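
For readers who want to try the extended-statistics route, here is a minimal sketch. It assumes Oracle 11g or later and the SCOTT.T table from this post, and it was not part of the original test, so no output is shown. The idea is to create a column group on (A,B) and regather statistics, so that the optimizer has a distinct-value count for the pair instead of multiplying the two single-column selectivities.

```sql
-- Sketch: create a column group (extension) on the correlated columns.
-- CREATE_EXTENDED_STATS returns the system-generated extension name.
select dbms_stats.create_extended_stats('SCOTT', 'T', '(A,B)') from dual;

-- Regather statistics, explicitly including the new column group.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'T',
    estimate_percent => 100,
    method_opt       => 'for all columns size skewonly for columns (a,b) size skewonly',
    no_invalidate    => FALSE,
    cascade          => TRUE);
END;
/

-- Confirm the extension exists (run as the table owner),
-- then look at the new estimate.
select extension_name, extension
  from user_stat_extensions
 where table_name = 'T';

explain plan for select * from t where a='1a' and b='11b';
select * from table(dbms_xplan.display);
```

If the column group statistics are picked up, one would expect the Rows estimate to move away from 331 and toward the real 32768, but verify this on your own system.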

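My advice on correlated columns is this: can the value be derived in the application? If A determines B, then don't create column B when designing the database. Generate B's value from A in the application instead, which also saves storage space in the database.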

If you really must keep column B in the database, then don't put both columns in the WHERE clause. In other words, don't write
select * from t where a='1a' and b='11b';

Write either
select * from t where a='1a';
or
select * from t where b='11b';

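This helps keep the CBO from miscalculating the cardinality. If this table is joined to several other tables, a wrong cardinality here will throw off the plan for the entire statement and degrade its performance.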

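Dynamic sampling and extended statistics do solve the problem, but what if the product has to stay portable? My product must support Oracle, DB2, and SQL Server at the same time, and perhaps later the Chinese database Dameng (DM) as well. What do you do when they have no dynamic sampling?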
