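13. Histograms

2013-08-08, Thursday afternoon


---------------Histograms--------------------------


Histogram information is one of the things collected when gathering statistics, and it has a major effect on execution plans.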


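The dbms_stats package analyzes tables and indexes at three levels:

1. The table itself: number of rows, row length, number of data blocks and so on; part of this can be seen in user_tables.

2. The columns: how many distinct (repeated) values a column has, how many NULLs it contains, and how the data is distributed in the column (the histogram).

3. The indexes: number of leaf blocks, index height, and the index clustering factor.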


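A histogram describes only how the data is distributed in a column:


How a histogram is built: when Oracle gathers a histogram, it divides the data in the column into a number of equal parts,

and each part is called a bucket.

This lets the CBO see how the values in the column are distributed,

and that distribution becomes an important input to the cost calculation.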


Histogram types:

1. Height-balanced histogram (HEIGHT BALANCED)

2. Frequency histogram (FREQUENCY)


Example:


1. Create a sample table


create table t as

select rownum as id,round(dbms_random.normal*1000) as val1,

100+round(ln(rownum/3.25+2)) as val2,

100+round(ln(rownum/3.25+2)) as val3,

dbms_random.string('P',250) as pad

from dual connect by level<=1000

order by dbms_random.value;


2. Add a primary key


SQL> alter table t add constraint t_pk primary key(id);


Table altered.


3. Create indexes


SQL> create index t_val1_i on t(val1);


Index created.


SQL> create index t_val2_i on t(val2);


Index created.


-- Analyze all columns for histograms and let Oracle decide the number of buckets (size auto)
SQL> exec dbms_stats.gather_table_stats(user,'t',estimate_percent=>100,method_opt=>'for all columns size auto',cascade=>true);


PL/SQL procedure successfully completed.


method_opt:

Accepts:

FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]

FOR COLUMNS [size clause] column|attribute [size_clause] [,column|attribute [size_clause]...]


size_clause is defined as size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}


- integer : Number of histogram buckets. Must be in the range [1,254].

- REPEAT : Collects histograms only on the columns that already have histograms.

- AUTO : Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.

- SKEWONLY : Oracle determines the columns to collect histograms based on the data distribution of the columns.


The default is FOR ALL COLUMNS SIZE AUTO. The default value can be changed using the SET_PARAM procedure.
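The default mentioned above can be read back without changing anything. The following is a minimal sketch, assuming a release where DBMS_STATS.GET_PARAM is available (it pairs with SET_PARAM); in 11g and later GET_PREFS is the preferred call. Neither query appears in the original session.

```sql
-- 10g: read the current default for METHOD_OPT (counterpart of SET_PARAM)
select dbms_stats.get_param('METHOD_OPT') from dual;

-- 11g and later: GET_PARAM is deprecated in favour of GET_PREFS
select dbms_stats.get_prefs('METHOD_OPT') from dual;
```

Both are read-only, so running them does not affect the statistics gathered in this walkthrough.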


Query the view:

select column_name,num_buckets,low_value,high_value,density,num_nulls,avg_col_len,histogram,num_distinct from user_tab_col_statistics where table_name='T'


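Result: histogram=NONE, meaning no histogram was generated.


The low and high values of column VAL1 are not readable: they are stored as RAW in Oracle's internal format.


Oracle provides functions to decode these values: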

select column_name,num_buckets,utl_raw.cast_to_number(low_value) as low_value,utl_raw.cast_to_number(high_value) as high_value,density,num_nulls,avg_col_len,histogram,num_distinct from user_tab_col_statistics where table_name='T'

--decoding numeric values


select column_name,num_buckets,utl_raw.cast_to_varchar2(low_value) as low_value,utl_raw.cast_to_varchar2(high_value) as high_value,density,num_nulls,avg_col_len,histogram,num_distinct from user_tab_col_statistics where table_name='T'

--decoding string values


SQL> set linesize 100

SQL> desc user_tab_col_statistics;

Name                                                  Null?    Type

----------------------------------------------------- -------- ------------------------------------

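TABLE_NAME                                                     VARCHAR2(30)   --table name

COLUMN_NAME                                                    VARCHAR2(30)   --column name

NUM_DISTINCT                                                   NUMBER         --number of distinct values in the column

LOW_VALUE                                                      RAW(32)        --lowest value

HIGH_VALUE                                                     RAW(32)        --highest value

DENSITY                                                        NUMBER         --density

NUM_NULLS                                                      NUMBER         --number of NULLs

NUM_BUCKETS                                                    NUMBER         --number of buckets

LAST_ANALYZED                                                  DATE           --time of the last analysis

SAMPLE_SIZE                                                    NUMBER         --sample size

GLOBAL_STATS                                                   VARCHAR2(3)    --whether the statistics are global (YES/NO)

USER_STATS                                                     VARCHAR2(3)    --whether the statistics were set by the user (YES/NO)

AVG_COL_LEN                                                    NUMBER         --average column length in bytes

HISTOGRAM                                                      VARCHAR2(15)   --histogram type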


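DENSITY: a number between 0 and 1. The closer it is to 0, the more rows a filter on the column removes (many distinct values, little duplication); the closer it is to 1, the fewer rows a filter removes (few distinct values, heavy duplication).


Without a histogram, DENSITY = 1/NUM_DISTINCT.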
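The relationship DENSITY = 1/NUM_DISTINCT can be checked directly against the dictionary. This is a small sketch rather than part of the original session; the column alias one_over_ndv is made up for illustration.

```sql
-- For columns without a histogram, DENSITY should match 1/NUM_DISTINCT
select column_name,
       num_distinct,
       density,
       1/num_distinct as one_over_ndv   -- illustrative alias
from   user_tab_col_statistics
where  table_name   = 'T'
and    histogram    = 'NONE'
and    num_distinct > 0;                -- avoid division by zero
```

At this point in the walkthrough every column of T reports histogram=NONE, so all of them should appear in the result.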


Force histogram generation:

SQL> exec dbms_stats.gather_table_stats(user,'t',estimate_percent=>100,method_opt=>'for all columns size SKEWONLY',cascade=>true);


PL/SQL procedure successfully completed.


SQL> set linesize 1000

SQL> select column_name,num_buckets,num_distinct,density,histogram from user_tab_col_statistics where table_name='T';


COLUMN_NAME                    NUM_BUCKETS NUM_DISTINCT    DENSITY HISTOGRAM

------------------------------ ----------- ------------ ---------- ---------------

ID                                       1         1000       .001 NONE

VAL1                                   254          871    .001288 HEIGHT BALANCED

VAL2                                     6            6      .0005 FREQUENCY

VAL3                                     6            6      .0005 FREQUENCY

PAD                                    254         1000       .001 HEIGHT BALANCED


SQL> select val2,count(1) from t group by val2 order by val2;


VAL2   COUNT(1)

---------- ----------

      101          8

      102         25

      103         68

      104        185

      105        502

      106        212


6 rows selected.


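FREQUENCY histogram: identical values are placed in the same bucket, one bucket per distinct value.

HEIGHT BALANCED: when the number of distinct values exceeds 254, a frequency histogram is not possible, so a height-balanced histogram is built instead; the buckets all have the same height, and the values inside a bucket can differ.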


-- this view stores cumulative row counts

select endpoint_value,endpoint_number

from user_tab_histograms where table_name='T' and column_name='VAL2'

order by endpoint_number;


ENDPOINT_VALUE ENDPOINT_NUMBER

-------------- ---------------

          101               8

          102              33

          103             101

          104             286

          105             788

          106            1000


6 rows selected.


Show both the cumulative value and the original per-value count:


select endpoint_value,endpoint_number,endpoint_number-lag(endpoint_number,1,0) over(order by endpoint_number) as yz

from user_tab_histograms where table_name='T' and column_name='VAL2'  --lag fetches the previous row's value, which is then subtracted

order by endpoint_number;


ENDPOINT_VALUE ENDPOINT_NUMBER         YZ

-------------- --------------- ----------

          101               8          8

          102              33         25

          103             101         68

          104             286        185

          105             788        502

          106            1000        212


6 rows selected.


How the CBO uses the frequency histogram to estimate precisely the number of rows returned by a filter on column val2:


SQL> explain plan set statement_id = '101' for select * from t where val2=101;


Explained.


SQL> explain plan set statement_id = '102' for select * from t where val2=102;


Explained.


SQL> explain plan set statement_id = '103' for select * from t where val2=103;


Explained.


SQL> explain plan set statement_id = '104' for select * from t where val2=104;


Explained.


SQL> explain plan set statement_id = '105' for select * from t where val2=105;


Explained.


SQL> explain plan set statement_id = '106' for select * from t where val2=106;


Explained.


SQL> select statement_id,cardinality from plan_table where id=0 order by statement_id;


STATEMENT_ID    CARDINALITY

------------------------------ -----------

101                                      8    --the Rows value in the execution plan is taken from the frequency histogram

102                                     25

103                                     68

104                                    185

105                                    502

106                                    212


6 rows selected.


SQL> select * from t where val2=101;


8 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 289244162


----------------------------------------------------------------------------------------

| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |          |     8 |  2136 |     3   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T        |     8 |  2136 |     3   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | T_VAL2_I |     8 |       |     1   (0)| 00:00:01 |

----------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  2 - access("VAL2"=101)


Statistics

----------------------------------------------------------

         1  recursive calls

         0  db block gets

        10  consistent gets

         0  physical reads

         0  redo size

      2757  bytes sent via SQL*Net to client

       400  bytes received via SQL*Net from client

         2  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

         8  rows processed   --endpoint_number-lag(endpoint_number,1,0) over(order by endpoint_number)


SQL> select * from t where val2=106;


212 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873


--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      |   212 | 56604 |    11   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T    |   212 | 56604 |    11   (0)| 00:00:01 |

--------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  1 - filter("VAL2"=106)



Statistics

----------------------------------------------------------

         1  recursive calls

         0  db block gets

        57  consistent gets

         0  physical reads

         0  redo size

     58464  bytes sent via SQL*Net to client

       554  bytes received via SQL*Net from client

        16  SQL*Net roundtrips to/from client

         0  sorts (memory)

         0  sorts (disk)

       212  rows processed


Simulating how a height-balanced histogram is computed:


select count(1),max(val2),bucket from(

select val2,ntile(5) over(order by val2) as bucket from t)

group by bucket;


 COUNT(1)  MAX(VAL2)     BUCKET

---------- ---------- ----------

      200        104          1

      200        105          2

      200        106          4

      200        106          5

      200        105          3


Force a height-balanced histogram:

SQL> exec dbms_stats.gather_table_stats(user,'t',estimate_percent=>100,method_opt=>'for all columns size 5',cascade=>true);


PL/SQL procedure successfully completed.


SQL> select column_name,num_buckets,num_distinct,density,histogram from user_tab_col_statistics where table_name='T';


COLUMN_NAME                    NUM_BUCKETS NUM_DISTINCT    DENSITY HISTOGRAM

------------------------------ ----------- ------------ ---------- ---------------

ID                                       5         1000       .001 HEIGHT BALANCED

VAL1                                     5          871    .001288 HEIGHT BALANCED

VAL2                                     5            6 .138244755 HEIGHT BALANCED

VAL3                                     5            6 .138244755 HEIGHT BALANCED

PAD                                      5         1000       .001 HEIGHT BALANCED


All columns now have height-balanced histograms.


SQL> select endpoint_value,endpoint_number from user_tab_histograms where table_name='T' and column_name='VAL2' order by endpoint_number;


ENDPOINT_VALUE ENDPOINT_NUMBER   --ENDPOINT_NUMBER now holds the bucket number instead of a cumulative row count

-------------- ---------------

          101               0

          104               1

          105               3

          106               5


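0: the starting point; the values start at 101.

1: bucket 1, maximum value 104; holds 101,102,103,104 (estimated at 50 rows per value)

2: bucket 2, holds 104,105; it is not listed because its end value is the same as that of the next bucket

3: bucket 3, maximum value 105; holds 105

4: bucket 4, holds 105,106; not listed, for the same reason as bucket 2

5: bucket 5, maximum value 106 (what is recorded is the end point of each range)



101: 200/4=50

102: 200/4=50

103: 200/4=50

104: 150

105: 100+200+100=400

106: 200+100=300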
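The bucket reading above can be reproduced from the dictionary with the same lag() trick used earlier for the frequency histogram. The query below is a sketch that is not part of the original session; the alias buckets_spanned is made up for illustration. Values that close more than one bucket are usually called popular values.

```sql
-- Number of buckets closed by each endpoint value;
-- a result greater than 1 marks a "popular" value
select endpoint_value,
       endpoint_number,
       endpoint_number
         - lag(endpoint_number, 1, 0) over (order by endpoint_number)
         as buckets_spanned             -- illustrative alias
from   user_tab_histograms
where  table_name  = 'T'
and    column_name = 'VAL2'
order  by endpoint_number;
```

For the endpoint numbers 0, 1, 3, 5 listed above, buckets_spanned comes out as 0, 1, 2, 2. Multiplying by the 200 rows per bucket gives 400 for value 105, which agrees with the hand estimate, but it is only a reading aid and should not be taken as the optimizer's actual formula.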


SQL> conn hr/hr

Connected.

SQL> explain plan set statement_id = '101' for select * from t where val2=101;


Explained.


SQL> explain plan set statement_id = '102' for select * from t where val2=102;


Explained.


SQL> explain plan set statement_id = '103' for select * from t where val2=103;


Explained.


SQL> explain plan set statement_id = '104' for select * from t where val2=104;


Explained.


SQL> explain plan set statement_id = '105' for select * from t where val2=105;


Explained.


SQL> explain plan set statement_id = '106' for select * from t where val2=106;


Explained.


SQL> select statement_id,cardinality from plan_table where id=0 order by statement_id;


STATEMENT_ID    CARDINALITY

------------------------------ -----------

101                                     50

102                                     50

103                                     50

104                                     50    --?

105                                    400

106                                    300


6 rows selected.


SQL> select * from t where val2=101;


8 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 289244162


----------------------------------------------------------------------------------------

| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |          |    50 | 13350 |    10   (0)| 00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| T        |    50 | 13350 |    10   (0)| 00:00:01 |

|*  2 |   INDEX RANGE SCAN          | T_VAL2_I |    50 |       |     1   (0)| 00:00:01 |

----------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  2 - access("VAL2"=101)


SQL>  select count(1) from t where val2=101;


 COUNT(1)

----------

        8    --the actual number of rows is 8


SQL>  select count(1) from t where val2=105;


 COUNT(1)

----------

      502

SQL> select * from t where val2=105;


502 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873


--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      |   400 |   104K|    11   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T    |   400 |   104K|    11   (0)| 00:00:01 |

--------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  1 - filter("VAL2"=105)


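The number of buckets cannot exceed 254. The misestimates above come from using too few buckets; increasing the number of buckets generally reduces the error.


The Rows column in the execution plan comes from the cardinality derived from the histogram, and that estimate can be off: the more buckets, the smaller the error; the larger the data volume, the larger the error.


When the error is large, even a small change in the data can change the histogram considerably.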
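To see the effect of the bucket count, one could ask for the maximum of 254 buckets on VAL2 alone. This is a hypothetical variation that is not in the original session. Since VAL2 has only 6 distinct values, one would expect Oracle to fall back to a FREQUENCY histogram here, as it did in the SKEWONLY run, but the outcome should be checked rather than assumed.

```sql
-- Hypothetical variation: request up to 254 buckets on VAL2 only
exec dbms_stats.gather_table_stats(user,'t',estimate_percent=>100,method_opt=>'for columns val2 size 254',cascade=>true)

-- Check which histogram type and bucket count were actually stored
select column_name,num_buckets,num_distinct,histogram
from   user_tab_col_statistics
where  table_name='T' and column_name='VAL2';
```

The walkthrough re-gathers all columns with size 5 in the next step, so trying this does not change the results that follow.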



SQL> update t set val2=105 where val2=106 and rownum<13;


12 rows updated.


SQL> commit;


Commit complete.


SQL> exec dbms_stats.gather_table_stats(user,'t',estimate_percent=>100,method_opt=>'for all columns size 5',cascade=>true);


PL/SQL procedure successfully completed.


SQL> select endpoint_value,endpoint_number from user_tab_histograms where table_name='T' and column_name='VAL2' order by endpoint_number;


ENDPOINT_VALUE ENDPOINT_NUMBER

-------------- ---------------

          101               0

          104               1

          105               4

          106               5


SQL> conn hr/hr

Connected.

SQL> explain plan set statement_id = '101' for select * from t where val2=101;


Explained.


SQL> explain plan set statement_id = '102' for select * from t where val2=102;


Explained.


SQL> explain plan set statement_id = '103' for select * from t where val2=103;


Explained.


SQL> explain plan set statement_id = '104' for select * from t where val2=104;


Explained.


SQL> explain plan set statement_id = '105' for select * from t where val2=105;


Explained.


SQL> explain plan set statement_id = '106' for select * from t where val2=106;


Explained.


SQL>  select statement_id,cardinality from plan_table where id=0 order by statement_id;


STATEMENT_ID                   CARDINALITY

------------------------------ -----------

101                                     80

102                                     80

103                                     80

104                                     80

105                                    600

106                                     80


6 rows selected.


SQL> select * from t where val2=105;


514 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873


--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      |   600 |   156K|    11   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T    |   600 |   156K|    11   (0)| 00:00:01 |

--------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  1 - filter("VAL2"=105)


-- distribution of the buckets
select * from user_tab_histograms where table_name='T';

-- histogram status of each column
select * from user_tab_columns where table_name='T';

-- histogram status of each column
select * from user_tab_col_statistics where table_name='T';



Delete the statistics of column ID, and with them its histogram:

SQL> exec dbms_stats.delete_column_stats(user,'T','ID');


PL/SQL procedure successfully completed.



-- confirm that histogram=NONE for column ID
select * from user_tab_columns where table_name='T';


Re-gather statistics with a single bucket per column (size 1), which means no histograms are built:


SQL> exec dbms_stats.gather_table_stats(user,'t',estimate_percent=>100,method_opt=>'for all columns size 1',cascade=>true);


PL/SQL procedure successfully completed.



-- confirm that the columns no longer have histograms
select * from user_tab_columns where table_name='T';


SQL> select endpoint_value,endpoint_number from user_tab_histograms where table_name='T' and column_name='VAL2' order by endpoint_number;


ENDPOINT_VALUE ENDPOINT_NUMBER

-------------- ---------------

          101               0

          106               1



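Having no histogram means there is only one bucket: endpoint_number has just the two positions 0 and 1, which delimit that single bucket, and the minimum and maximum values are still recorded.


Another data dictionary view shows the same information:

select * from user_tab_col_statistics where table_name='T';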


Question: when there is no histogram, where does the cardinality come from?


SQL> select endpoint_value,endpoint_number from user_tab_histograms where table_name='T' and column_name='VAL2' order by endpoint_number;


ENDPOINT_VALUE ENDPOINT_NUMBER

-------------- ---------------

          101               0

          106               1


SQL> conn hr/hr

Connected.

SQL> explain plan set statement_id = '101' for select * from t where val2=101;


Explained.


SQL> explain plan set statement_id = '102' for select * from t where val2=102;


Explained.


SQL> explain plan set statement_id = '103' for select * from t where val2=103;


Explained.


SQL> explain plan set statement_id = '104' for select * from t where val2=104;


Explained.


SQL> explain plan set statement_id = '105' for select * from t where val2=105;


Explained.


SQL> explain plan set statement_id = '106' for select * from t where val2=106;


Explained.


SQL> select statement_id,cardinality from plan_table where id=0 order by statement_id;


STATEMENT_ID                   CARDINALITY

------------------------------ -----------

101                                    167

102                                    167

103                                    167

104                                    167

105                                    167

106                                    167


6 rows selected.


SQL> select * from t where val2=103;


68 rows selected.



Execution Plan

----------------------------------------------------------

Plan hash value: 1601196873


--------------------------------------------------------------------------

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |      |   167 | 44589 |    11   (0)| 00:00:01 |

|*  1 |  TABLE ACCESS FULL| T    |   167 | 44589 |    11   (0)| 00:00:01 |

--------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------


  1 - filter("VAL2"=103)


SQL> select count(1) from t where val2=103;


 COUNT(1)

----------

       68


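Summary: when there is no histogram there is only one bucket, and Oracle assumes a uniform distribution

     when it estimates cardinality. If the column values really are evenly distributed, this works fine;

     if they are skewed, the estimates, and with them the execution plan, can be badly off.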
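The uniform-distribution estimate can be recomputed from the dictionary. The query below is a sketch based on the commonly described rule of non-null rows divided by distinct values; it is not taken from the original session, and the alias est_rows_per_value is made up for illustration.

```sql
-- Uniform-distribution estimate: non-null rows divided by distinct values
select t.num_rows,
       c.num_nulls,
       c.num_distinct,
       round((t.num_rows - c.num_nulls) / c.num_distinct)
         as est_rows_per_value          -- illustrative alias
from   user_tables t
join   user_tab_col_statistics c on c.table_name = t.table_name
where  t.table_name  = 'T'
and    c.column_name = 'VAL2';
```

With 1000 rows, no NULLs and 6 distinct values, the expression evaluates to round(1000/6) = 167, the cardinality reported for every value of VAL2 in the last test.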

