Why doesn't sample_size match estimate_percent when gathering statistics with dbms_stats?


jarry_gao

2020-02-20 23:18

Take a table c1 with 10,000 rows:
create table c1 as select * from dba_objects where rownum <10001;

exec dbms_stats.gather_table_stats(user,'c1',estimate_percent=> 1);
By that logic, sample_size should be about 10000 * 1% = 100 rows.
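To make the naive expectation concrete, here is a minimal Python simulation of a 1% row sample on 10,000 rows. It assumes each row is kept independently with probability 1%. That is a simplified model of `SAMPLE (1)`, not Oracle's actual implementation, and the function name `bernoulli_row_sample` is hypothetical.

```python
import random


def bernoulli_row_sample(total_rows: int, percent: float, seed: int = 42) -> int:
    """Return how many of total_rows survive when each row is kept with probability percent/100.

    Simplified model of a row-level SAMPLE (percent) clause, not Oracle internals.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    p = percent / 100.0
    return sum(1 for _ in range(total_rows) if rng.random() < p)


if __name__ == "__main__":
    print("expected:", 10000 * 1 / 100)  # 100.0
    print("simulated:", bernoulli_row_sample(10000, 1.0))
```

Whatever the seed, the simulated count stays in the neighbourhood of 100 rows. That is far from what user_tables actually reports.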

In practice, however:
select sample_size from user_tables where table_name ='C1';
returns a value somewhere between roughly 4400 and 5900.

For a table with 500,000 rows, the sample size is also around 5000.

So what is going on?

Let's turn on SQL trace:

alter session set sql_trace=true;
exec dbms_stats.gather_table_stats(user,'c1',estimate_percent=> 1);
alter session set sql_trace=false;

select sample_size from user_tables where table_name ='C1';
4725

select value from v$diag_info where name='Default Trace File';
E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_ora_23220.trc
This is where the trace file is written.

cd E:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\
e:\

tkprof orcl_ora_23220.trc a.trc

Open a.trc and find the following section:
select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
no_substrb_pad */count(*), count("OBJECT_ID"), count(distinct "OBJECT_ID"),
sum(sys_op_opnsize("OBJECT_ID")), substrb(dump(min("OBJECT_ID"),16,0,32),1,
120), substrb(dump(max("OBJECT_ID"),16,0,32),1,120)
from
"TEST"."C1" sample ( 1.0000000000) t

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00         10         12          0           1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        4      0.01       0.01         10         12          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 90 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
         1          1          1 SORT AGGREGATE (cr=12 pr=10 pw=0 time=5504 us)
        64         64         64   VIEW VW_DAG_0 (cr=12 pr=10 pw=0 time=4775 us cost=5 size=4680 card=60)
        64         64         64    HASH GROUP BY (cr=12 pr=10 pw=0 time=4702 us cost=5 size=240 card=60)
        64         64         64     TABLE ACCESS SAMPLE C1 (cr=12 pr=10 pw=0 time=1031 us cost=4 size=240 card=60)

********************************************************************************

SQL ID: 50zv2cg4980b7 Plan Hash: 2789923169

select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
no_substrb_pad */count(*), count("OBJECT_ID"), count(distinct "OBJECT_ID"),
sum(sys_op_opnsize("OBJECT_ID")), substrb(dump(min("OBJECT_ID"),16,0,32),1,
120), substrb(dump(max("OBJECT_ID"),16,0,32),1,120)
from
"TEST"."C1" sample ( 78.1250000000) t

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.04       0.03          0         12          0           1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total        4      0.04       0.04          0         12          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 90 (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
         1          1          1 SORT AGGREGATE (cr=12 pr=0 pw=0 time=36494 us)
      4725       4725       4725   VIEW VW_DAG_0 (cr=12 pr=0 pw=0 time=47839 us cost=5 size=365040 card=4680)
      4725       4725       4725    HASH GROUP BY (cr=12 pr=0 pw=0 time=29064 us cost=5 size=18720 card=4680)
      4725       4725       4725     TABLE ACCESS SAMPLE C1 (cr=12 pr=0 pw=0 time=9405 us cost=4 size=18720 card=4680)
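Picking these numbers out of a long tkprof report by eye is tedious. A small Python helper can pull out each `sample (N)` percentage together with the row count on the matching TABLE ACCESS SAMPLE line.

This is a hypothetical sketch that relies on the layout of the excerpt above, not on any documented tkprof format. The names `SAMPLE_RE`, `ROWS_RE` and `sample_passes` are our own.

```python
import re
from typing import List, Tuple

# The SAMPLE clause in the recursive dbms_stats query, e.g. '"TEST"."C1" sample ( 78.1250000000) t'
SAMPLE_RE = re.compile(r"\bsample\s*\(\s*(\d+(?:\.\d+)?)\s*\)", re.IGNORECASE)
# First column (Rows (1st)) of a 'TABLE ACCESS SAMPLE' row-source line
ROWS_RE = re.compile(r"^\s*(\d+)\s+\d+\s+\d+\s+TABLE ACCESS SAMPLE\b", re.MULTILINE)


def sample_passes(tkprof_text: str) -> List[Tuple[float, int]]:
    """Pair each SAMPLE percentage with the rows its TABLE ACCESS SAMPLE step returned.

    Assumes each sampled statement has exactly one TABLE ACCESS SAMPLE line and that
    statements and their row-source sections appear in the same order.
    """
    percents = [float(p) for p in SAMPLE_RE.findall(tkprof_text)]
    rows = [int(r) for r in ROWS_RE.findall(tkprof_text)]
    return list(zip(percents, rows))


if __name__ == "__main__":
    excerpt = '''
"TEST"."C1" sample ( 1.0000000000) t
        64         64         64     TABLE ACCESS SAMPLE C1 (cr=12 pr=10 pw=0)
"TEST"."C1" sample ( 78.1250000000) t
      4725       4725       4725     TABLE ACCESS SAMPLE C1 (cr=12 pr=0 pw=0)
'''
    for pct, n in sample_passes(excerpt):
        print(f"sample ({pct}%) returned {n} rows")
```

To run it on the real report, pass the file contents instead, e.g. `sample_passes(open("a.trc").read())`.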

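The 4725 on the last line (TABLE ACCESS SAMPLE) is the sample size this gather obtained.

So Oracle first counts the rows in a sample taken at the requested 1% and gets 64 rows.
It then counts again with a 78.125% sample, which returns exactly 4725 rows. That is the sample_size we saw earlier.

So why 78.125?

64*78.125=5000

The answer: Oracle divides 5000 by the row count returned by the 1% pass and scales the percentage accordingly, i.e. 1% * 5000 / 64 = 78.125%.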
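The same arithmetic can be written as a tiny Python function. The 5000-row target is inferred from these traces rather than taken from Oracle documentation. The cap at 100% and the error on an empty first pass are our own assumptions, and `adjusted_percent` is a hypothetical name.

```python
def adjusted_percent(first_percent: float, first_rows: int, target_rows: int = 5000) -> float:
    """Percentage for the second pass, scaled so it is expected to return about target_rows.

    target_rows=5000 is the value inferred from the traces in this post, not a documented
    Oracle constant. Capping at 100 is our own assumption for tables too small to reach it.
    """
    if first_rows <= 0:
        raise ValueError("the first pass returned no rows, so the ratio is undefined")
    return min(100.0, first_percent * target_rows / first_rows)


if __name__ == "__main__":
    print(adjusted_percent(1.0, 64))  # 78.125, matching 'sample ( 78.1250000000)' in the trace
```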

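So has Oracle simply built in a target of 5000?

Trying another table and another percentage, the same formula still holds.

In other words, if Oracle finds that the requested percentage would not yield 5000 rows, it automatically enlarges the sample to aim for about 5000 rows. The second pass is still a random sample, so the row count lands near 5000 rather than exactly on it (4725 here). This explains why the results earlier ranged from about 4400 to 5900. The 500,000-row table already yields about 5000 rows at 1%, so its sample size ends up near 5000 as well.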
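Putting the two passes together gives a rough Python model of the behaviour described above. It is a sketch under stated assumptions, not Oracle's code:

- rows are sampled independently;
- the 5000-row target (`TARGET_ROWS`) is the value inferred in this post;
- `count_sample` and `two_pass_sample` are hypothetical names.

```python
import random
from typing import Tuple

TARGET_ROWS = 5000  # inferred from the traces in this post; an assumption, not a documented constant


def count_sample(total_rows: int, percent: float, rng: random.Random) -> int:
    """Rows kept when each row is independently kept with probability percent/100."""
    p = percent / 100.0
    return sum(1 for _ in range(total_rows) if rng.random() < p)


def two_pass_sample(total_rows: int, percent: float, seed: int = 0) -> Tuple[float, int]:
    """Model of the observed behaviour; returns (final_percent, sampled_rows).

    Pass 1 samples at the requested percent. If it returns fewer than TARGET_ROWS rows,
    pass 2 resamples at percent * TARGET_ROWS / pass1_rows (capped at 100).
    """
    rng = random.Random(seed)
    first = count_sample(total_rows, percent, rng)
    if first >= TARGET_ROWS or first == 0:
        return percent, first  # enough rows already (or nothing to scale from)
    new_percent = min(100.0, percent * TARGET_ROWS / first)
    return new_percent, count_sample(total_rows, new_percent, rng)


if __name__ == "__main__":
    for rows in (10_000, 500_000):
        pct, n = two_pass_sample(rows, 1.0)
        print(f"{rows} rows at 1%: final sample {pct:.3f}% -> {n} rows")
```

Changing the seed moves the 10,000-row result around within a range of a few thousand rows centred near 5000. The 500,000-row table stays close to 5000 because its 1% pass is already about that size.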
