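To understand dimensions in Oracle, first distinguish a dimension from a dimension table. A dimension table is a table: like any other table in a relational database, it holds data and occupies physical storage. A dimension, by contrast, is purely a logical structure that defines hierarchical relationships between one column (or a group of columns) of a dimension table and other columns. A dimension stores only this definition, so you can think of it as a special kind of constraint. A dimension is therefore never mandatory, but creating one matters a great deal for rewriting some of the more complex queries in a data warehouse, and query rewrite is one of the most effective performance optimizations a data warehouse has.
Because data volumes in a data warehouse are huge, aggregations are often precomputed and stored in materialized views. It is not feasible, however, to build a materialized view for every possible aggregation over every dimension: storage does not allow it, and neither does refresh time. So when Oracle rewrites a SQL statement that contains an aggregation, we want it to use an existing materialized view even if the aggregation conditions do not match exactly. The hierarchy between the levels defined in a dimension is what lets the optimizer decide whether a roll-up or drill-down query can be rewritten, while the attributes defined in a dimension enable rewrite for queries that group by a different column.
A typical dimension definition looks like this: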
CREATE DIMENSION products_dim
LEVEL product IS (products.prod_id)
LEVEL subcategory IS (products.prod_subcategory)
LEVEL category IS (products.prod_category)
HIERARCHY prod_rollup (
product CHILD OF
subcategory CHILD OF
category
)
ATTRIBUTE product_info LEVEL product DETERMINES
(products.prod_name, products.prod_desc,
prod_weight_class, prod_unit_of_measure,
prod_pack_size, prod_status, prod_list_price, prod_min_price)
ATTRIBUTE subcategory DETERMINES
(prod_subcategory, prod_subcategory_desc)
ATTRIBUTE category DETERMINES
(prod_category, prod_category_desc);
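A dimension has three important components: level, hierarchy, and attribute. A level treats one column or a group of columns as a unit. A hierarchy defines the relationships between levels; a parent level and its child level are in a 1:N relationship, and a single dimension can define several hierarchies. An attribute declares that a level determines other columns (a 1:1 relationship from the level's point of view), but the relationship need not be reversible. In the example above, the product level (prod_id), through the product_info attribute, determines prod_name, but prod_name is not required to determine prod_id.
Moreover, the columns behind the levels need not come from the same table. In a snowflake schema the dimension table may be normalized into several smaller tables, so the levels of one dimension may be based on columns from different tables. In that case the dimension must specify a JOIN KEY that names the column relating the tables. For example: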
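Oracle does not enforce the 1:N relationship that a hierarchy declares, so it can be worth checking the data by hand. The query below is a minimal sketch of such a check for the subcategory-to-category step of prod_rollup, assuming the products table used in the example above exists with those columns. It returns no rows when every subcategory belongs to exactly one category.

```sql
-- List subcategories that roll up to more than one category.
-- Any row returned here contradicts "subcategory CHILD OF category".
select prod_subcategory,
       count(distinct prod_category) as parent_count
  from products
 group by prod_subcategory
having count(distinct prod_category) > 1;
```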
CREATE DIMENSION customers_dim
LEVEL customer IS (customers.cust_id)
LEVEL city IS (customers.cust_city)
LEVEL state IS (customers.cust_state_province)
LEVEL country IS (countries.country_id)
LEVEL subregion IS (countries.country_subregion)
LEVEL region IS (countries.country_region)
HIERARCHY geog_rollup (
customer CHILD OF
city CHILD OF
state CHILD OF
country CHILD OF
subregion CHILD OF
region
JOIN KEY (customers.country_id) REFERENCES country);
Unless the SKIP WHEN NULL clause is specified, null values are not allowed in any level.
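As an illustration of where the clause goes, the sketch below marks the city level as skippable, so a customer row with a null cust_city would roll up directly to the state level. The dimension name customers_skip_dim is hypothetical, the columns are taken from the customers_dim example above, and the statement assumes a release that supports SKIP WHEN NULL (10g or later).

```sql
-- customers_skip_dim is a made-up name used only for this illustration.
CREATE DIMENSION customers_skip_dim
  LEVEL customer IS (customers.cust_id)
  LEVEL city     IS (customers.cust_city) SKIP WHEN NULL
  LEVEL state    IS (customers.cust_state_province)
  HIERARCHY geog_rollup (
    customer CHILD OF
    city     CHILD OF
    state
  );
```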
Use dbms_dimension.describe_dimension to view the definition of a dimension.
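Use dbms_dimension.validate_dimension to check whether the relationships declared in a dimension actually hold in the data. Before calling it, run utldim.sql to create the dimension_exceptions table; if the check finds violations, the offending records appear in dimension_exceptions. In 9i, validate_dimension is in the dbms_olap package.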
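The two calls can be sketched as follows for the products_dim example. This assumes a 10g or later database, that utldim.sql has already been run in the current schema, and that the dimension is owned by the current user; the statement_id value 'products_check' is an arbitrary label chosen for this illustration.

```sql
-- describe_dimension prints through DBMS_OUTPUT, so enable it first.
set serveroutput on
exec dbms_dimension.describe_dimension('PRODUCTS_DIM');

-- Full (non-incremental) validation that also checks for nulls in levels.
begin
  dbms_dimension.validate_dimension(
    dimension    => 'PRODUCTS_DIM',
    incremental  => false,
    check_nulls  => true,
    statement_id => 'products_check');
end;
/

-- No rows means no violations were recorded for this run.
select *
  from dimension_exceptions
 where statement_id = 'products_check';
```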
####################################
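In a data warehouse environment we usually rely on the query rewrite capability of materialized views to speed up reporting queries. Sometimes, however, query rewrite cannot infer how the columns in a query relate to each other, and performance suffers. Suppose we have a sales table that stores order details: transaction date, customer ID, and sales amount. We create a materialized view that stores sales totals by month. If we then query sales by quarter or by year, can Oracle rewrite the query to use the view? We know that a transaction date implies a month, a month implies a quarter, and a quarter implies a year, but Oracle cannot work out these relationships by itself. It therefore cannot use the materialized view to answer the quarterly or yearly query and reads the base table instead, which hurts performance.
This is where a dimension comes in. A dimension declares parent-child relationships between columns so that the optimizer can translate between them and use materialized-view query rewrite to speed up reporting queries. Below we first create the sales transaction table sales, with columns for transaction date, customer ID, and sales amount; it holds a little over 420,000 rows. We then create a second table, time_hierarchy, which maps each transaction date to its month, quarter, and year. After that we will see what a dimension can do.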
Roby@XUE> create table sales
2 (trans_date date, cust_id int, sales_amount number );
Table created.
Roby@XUE> insert /*+ APPEND */ into sales
2 select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,
3 mod(rownum,100) CUST_ID,
4 abs(dbms_random.random)/100 SALES_AMOUNT
5 from all_objects
6 /
5926 rows created.
Roby@XUE> commit;
Commit complete.
Roby@XUE> begin
2 for i in 1 .. 6
3 loop
4 insert /*+ APPEND */ into sales
5 select trans_date, cust_id, abs(dbms_random.random)/100 SALES_AMOUNT
6 from sales;
7 commit;
8 end loop;
9 end;
10 /
PL/SQL procedure successfully completed.
Roby@XUE> select count(*) from sales;
COUNT(*)
----------
426672
Create the index-organized table time_hierarchy, which maps each transaction date (DAY) to its month (MMYYYY), quarter (QTR_YYYY), and year (YYYY).
Roby@XUE> create table time_hierarchy
2 (day primary key, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)
3 organization index
4 as
5 select distinct
6 trans_date DAY,
7 cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,
8 to_char(trans_date,'mon-yyyy') MON_YYYY,
9 'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'
10 || to_char(trans_date,'yyyy') QTR_YYYY,
11 cast( to_char( trans_date, 'yyyy' ) as number ) YYYY
12 from sales
13 /
Table created.
Next we create a materialized view, mv_sales, that stores each customer's sales total for each month.
Roby@XUE> create materialized view mv_sales
2 build immediate
3 refresh on demand
4 enable query rewrite
5 as
6 select sales.cust_id, sum(sales.sales_amount) sales_amount,
7 time_hierarchy.mmyyyy
8 from sales, time_hierarchy
9 where sales.trans_date = time_hierarchy.day
10 group by sales.cust_id, time_hierarchy.mmyyyy
11 /
Materialized view created.
We analyze the base tables so that the optimizer can use materialized-view query rewrite:
Roby@XUE> analyze table sales compute statistics;
Table analyzed.
Roby@XUE> analyze table time_hierarchy compute statistics;
Table analyzed.
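On current releases, optimizer statistics are normally gathered with the dbms_stats package rather than with ANALYZE. A minimal equivalent of the two statements above, assuming both tables are owned by the current user, might look like this:

```sql
-- Gather optimizer statistics for the two base tables with default settings.
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'SALES');
  dbms_stats.gather_table_stats(ownname => user, tabname => 'TIME_HIERARCHY');
end;
/
```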
Enable query rewrite for the session:
Roby@XUE> alter session set query_rewrite_enabled=true;
Session altered.
Roby@XUE> alter session set query_rewrite_integrity=trusted;
Session altered.
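The trusted setting matters for what follows. A dimension is a declaration that Oracle does not verify, so dimension-based rewrite is only considered when query_rewrite_integrity is trusted or stale_tolerated, not under enforced. One way to confirm the values in effect is sketched below; it assumes the session has been granted access to v$parameter.

```sql
-- Show the rewrite-related parameters currently in effect for this session.
select name, value
  from v$parameter
 where name in ('query_rewrite_enabled', 'query_rewrite_integrity');
```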
Now we total sales by month:
Roby@XUE> select time_hierarchy.mmyyyy, sum(sales_amount)
2 from sales, time_hierarchy
3 where sales.trans_date = time_hierarchy.day
4 group by time_hierarchy.mmyyyy
5 /
MMYYYY SUM(SALES_AMOUNT)
---------- -----------------
12006 4.0574E+11
12007 1.2297E+10
22006 3.6875E+11
32006 3.9507E+11
42006 3.7621E+11
52006 3.8549E+11
62006 3.6641E+11
72006 3.8110E+11
82006 3.8502E+11
92006 3.7278E+11
102006 3.7983E+11
112006 3.7210E+11
122006 3.8364E+11
13 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=327 Bytes=8502)
1 0 SORT (GROUP BY) (Cost=4 Card=327 Bytes=8502)
2 1 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=327 Bytes=8502)
Statistics
----------------------------------------------------------
17 recursive calls
0 db block gets
25 consistent gets
4 physical reads
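We can see that the query was rewritten: it reads the materialized view directly instead of the base tables. It needs only 25 logical I/Os, so performance is very good.
What happens if we now query sales totals by quarter?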
Roby@XUE> select time_hierarchy.qtr_yyyy, sum(sales_amount)
2 from sales, time_hierarchy
3 where sales.trans_date = time_hierarchy.day
4 group by time_hierarchy.qtr_yyyy
5 /
QTR_YYYY SUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY2006 1.1696E+12
Q1 FY2007 1.2297E+10
Q2 FY2006 1.1281E+12
Q3 FY2006 1.1389E+12
Q4 FY2006 1.1356E+12
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1681 Card=5 Bytes=145)
1 0 SORT (GROUP BY) (Cost=1681 Card=5 Bytes=145)
2 1 NESTED LOOPS (Cost=35 Card=426672 Bytes=12373488)
3 2 TABLE ACCESS (FULL) OF 'SALES' (Cost=35 Card=426672
4 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)
Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
428048 consistent gets
599 physical reads
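This time the query goes straight to the base tables and incurs 428,048 logical I/Os, which does not meet our performance needs.
Next we create a dimension, time_hierarchy_dim, to tell the optimizer that in the time_hierarchy table the DAY column implies MMYYYY, MMYYYY implies QTR_YYYY, and QTR_YYYY implies YYYY. We then rerun the query above and see how the execution plan changes.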
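When a query is not rewritten as expected, dbms_mview.explain_rewrite can report why. The sketch below assumes that utlxrw.sql has been run to create rewrite_table in the current schema; the statement_id value 'qtr_rollup' is an arbitrary label chosen for this illustration.

```sql
-- Ask Oracle why the quarterly query can or cannot be rewritten against MV_SALES.
begin
  dbms_mview.explain_rewrite(
    query        => 'select time_hierarchy.qtr_yyyy, sum(sales_amount)
                       from sales, time_hierarchy
                      where sales.trans_date = time_hierarchy.day
                      group by time_hierarchy.qtr_yyyy',
    mv           => 'MV_SALES',
    statement_id => 'qtr_rollup');
end;
/

-- Read the explanation messages in order.
select message
  from rewrite_table
 where statement_id = 'qtr_rollup'
 order by sequence;
```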
Roby@XUE> create dimension time_hierarchy_dim
2 level day is time_hierarchy.day
3 level mmyyyy is time_hierarchy.mmyyyy
4 level qtr_yyyy is time_hierarchy.qtr_yyyy
5 level yyyy is time_hierarchy.yyyy
6 hierarchy time_rollup
7 (
8 day child of
9 mmyyyy child of
10 qtr_yyyy child of
11 yyyy
12 )
13 attribute mmyyyy
14 determines mon_yyyy;
Dimension created.
Roby@XUE> select time_hierarchy.qtr_yyyy, sum(sales_amount)
2 from sales, time_hierarchy
3 where sales.trans_date = time_hierarchy.day
4 group by time_hierarchy.qtr_yyyy
5 /
QTR_YYYY SUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY2006 1.1696E+12
Q1 FY2007 1.2297E+10
Q2 FY2006 1.1281E+12
Q3 FY2006 1.1389E+12
Q4 FY2006 1.1356E+12
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=14 Card=5 Bytes=195)
1 0 SORT (GROUP BY) (Cost=14 Card=5 Bytes=195)
2 1 HASH JOIN (Cost=7 Card=1157 Bytes=45123)
3 2 VIEW (Cost=4 Card=46 Bytes=598)
4 3 SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)
5 4 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)
6 2 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=327
Statistics
----------------------------------------------------------
193 recursive calls
0 db block gets
49 consistent gets
2 physical reads
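After the dimension is created, Oracle understands how months roll up into quarters, and the query uses the materialized view. Logical I/O drops from 428,048 to 49, a large improvement.
Likewise, let us total sales by year: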
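The plan above (a hash join between MV_SALES and a de-duplicated scan of the time_hierarchy index) corresponds roughly to the hand-written query below. This is an approximation for illustration, not the text Oracle generates internally. It is only correct if each MMYYYY value maps to exactly one QTR_YYYY value, which is precisely what the dimension asserts.

```sql
-- Roll the monthly totals in MV_SALES up to quarters.
select t.qtr_yyyy, sum(mv.sales_amount)
  from mv_sales mv,
       (select distinct mmyyyy, qtr_yyyy
          from time_hierarchy) t   -- one row per month, with its quarter
 where mv.mmyyyy = t.mmyyyy
 group by t.qtr_yyyy;
```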
Roby@XUE> select time_hierarchy.yyyy, sum(sales_amount)
2 from sales, time_hierarchy
3 where sales.trans_date = time_hierarchy.day
4 group by time_hierarchy.yyyy
5 /
YYYY SUM(SALES_AMOUNT)
---------- -----------------
2006 4.5721E+12
2007 1.2297E+10
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=2 Bytes=66)
1 0 SORT (GROUP BY) (Cost=10 Card=2 Bytes=66)
2 1 HASH JOIN (Cost=7 Card=478 Bytes=15774)
We now create a customer_hierarchy table that relates customer ID, zip code, and region. We will then query monthly, quarterly, or yearly sales by zip code or by region.
Roby@XUE> create table customer_hierarchy
2 ( cust_id primary key, zip_code, region )
3 organization index
4 as
5 select cust_id,
6 mod( rownum, 6 ) || to_char(mod( rownum, 1000 ), 'fm0000') zip_code,
7 mod( rownum, 6 ) region
8 from ( select distinct cust_id from sales)
9 /
Table created.
Roby@XUE> analyze table customer_hierarchy compute statistics;
Table analyzed.
Recreate the materialized view so that it stores monthly sales totals by zip code.
Roby@XUE> drop materialized view mv_sales;
Materialized view dropped.
Roby@XUE> create materialized view mv_sales
2 build immediate
3 refresh on demand
4 enable query rewrite
5 as
6 select customer_hierarchy.zip_code,
7 time_hierarchy.mmyyyy,
8 sum(sales.sales_amount) sales_amount
9 from sales, time_hierarchy, customer_hierarchy
10 where sales.trans_date = time_hierarchy.day
11 and sales.cust_id = customer_hierarchy.cust_id
12 group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
13 /
Materialized view created.
Roby@XUE> set autotrace traceonly
Roby@XUE> select customer_hierarchy.zip_code,
2 time_hierarchy.mmyyyy,
3 sum(sales.sales_amount) sales_amount
4 from sales, time_hierarchy, customer_hierarchy
5 where sales.trans_date = time_hierarchy.day
6 and sales.cust_id = customer_hierarchy.cust_id
7 group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
8 /
1216 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=409 Bytes=20450)
1 0 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 Bytes=20450)
Statistics
----------------------------------------------------------
28 recursive calls
0 db block gets
116 consistent gets
5 physical reads
When we aggregate by zip code and month, the optimizer reads the materialized view, and performance is good. What happens if we query yearly sales by region?
Roby@XUE> select customer_hierarchy.region,
2 time_hierarchy.yyyy,
3 sum(sales.sales_amount) sales_amount
4 from sales, time_hierarchy, customer_hierarchy
5 where sales.trans_date = time_hierarchy.day
6 and sales.cust_id = customer_hierarchy.cust_id
7 group by customer_hierarchy.region, time_hierarchy.yyyy
8 /
9 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1681 Card=9 Bytes=261)
1 0 SORT (GROUP BY) (Cost=1681 Card=9 Bytes=261)
2 1 NESTED LOOPS (Cost=35 Card=426672 Bytes=12373488)
3 2 NESTED LOOPS (Cost=35 Card=426672 Bytes=8106768)
4 3 TABLE ACCESS (FULL) OF 'SALES' (Cost=35 Card=426672
5 3 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)
6 2 INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_7828' (UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
428047 consistent gets
745 physical reads
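Performance suffers badly. As before, we create a dimension, sales_dimension, to describe how customer ID, zip code, and region relate. Because time_hierarchy_dim is dropped first, sales_dimension also redeclares the time hierarchy: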
Roby@XUE> drop dimension time_hierarchy_dim
2 /
Dimension dropped.
Roby@XUE> create dimension sales_dimension
2 level cust_id is customer_hierarchy.cust_id
3 level zip_code is customer_hierarchy.zip_code
4 level region is customer_hierarchy.region
5 level day is time_hierarchy.day
6 level mmyyyy is time_hierarchy.mmyyyy
7 level qtr_yyyy is time_hierarchy.qtr_yyyy
8 level yyyy is time_hierarchy.yyyy
9 hierarchy cust_rollup
10 (
11 cust_id child of
12 zip_code child of
13 region
14 )
15 hierarchy time_rollup
16 (
17 day child of
18 mmyyyy child of
19 qtr_yyyy child of
20 yyyy
21 )
22 attribute mmyyyy
23 determines mon_yyyy;
Dimension created.
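To confirm that both hierarchies were recorded, the data dictionary can be queried. The sketch below assumes the dimension is owned by the current user and uses the USER_DIM_CHILD_OF view, which lists each child-parent step of every hierarchy.

```sql
-- List each child -> parent step of both hierarchies in SALES_DIMENSION.
select hierarchy_name, position, child_level_name, parent_level_name
  from user_dim_child_of
 where dimension_name = 'SALES_DIMENSION'
 order by hierarchy_name, position;
```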
Returning to the previous query, we can see that performance has improved dramatically:
Roby@XUE> set autotrace on
Roby@XUE> select customer_hierarchy.region,
2 time_hierarchy.yyyy,
3 sum(sales.sales_amount) sales_amount
4 from sales, time_hierarchy, customer_hierarchy
5 where sales.trans_date = time_hierarchy.day
6 and sales.cust_id = customer_hierarchy.cust_id
7 group by customer_hierarchy.region, time_hierarchy.yyyy
8 /
REGION YYYY SALES_AMOUNT
---------- ---------- ------------
0 2006 7.3144E+11
0 2007 4484956329
1 2006 7.8448E+11
2 2006 7.7257E+11
2 2007 4684418980
3 2006 7.7088E+11
4 2006 7.8004E+11
4 2007 3127953246
5 2006 7.3273E+11
9 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=15 Card=9 Bytes=576)
1 0 SORT (GROUP BY) (Cost=15 Card=9 Bytes=576)
2 1 HASH JOIN (Cost=10 Card=598 Bytes=38272)
3 2 VIEW (Cost=3 Card=100 Bytes=700)
4 3 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
5 4 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)
6 2 HASH JOIN (Cost=7 Card=598 Bytes=34086)
7 6 VIEW (Cost=4 Card=19 Bytes=133)
8 7 SORT (UNIQUE) (Cost=4 Card=19 Bytes=133)
9 8 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828'
10 6 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409
Statistics
----------------------------------------------------------
364 recursive calls
0 db block gets
88 consistent gets
0 physical reads
Roby@XUE> set autot trace
Roby@XUE> select customer_hierarchy.region,
2 time_hierarchy.qtr_yyyy,
3 sum(sales.sales_amount) sales_amount
4 from sales, time_hierarchy, customer_hierarchy
5 where sales.trans_date = time_hierarchy.day
6 and sales.cust_id = customer_hierarchy.cust_id
7 group by customer_hierarchy.region, time_hierarchy.qtr_yyyy;
27 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=22 Bytes=154
1 0 SORT (GROUP BY) (Cost=23 Card=22 Bytes=1540)
2 1 HASH JOIN (Cost=11 Card=1447 Bytes=101290)
3 2 VIEW (Cost=3 Card=100 Bytes=700)
4 3 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
5 4 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE) (
6 2 HASH JOIN (Cost=7 Card=1447 Bytes=91161)
7 6 VIEW (Cost=4 Card=46 Bytes=598)
8 7 SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)
9 8 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UN
10 6 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 B
Statistics
----------------------------------------------------------
10 recursive calls
0 db block gets
19 consistent gets
0 physical reads
Roby@XUE> select customer_hierarchy.region,
2 time_hierarchy.mon_yyyy,
3 sum(sales.sales_amount) sales_amount
4 from sales, time_hierarchy, customer_hierarchy
5 where sales.trans_date = time_hierarchy.day
6 and sales.cust_id = customer_hierarchy.cust_id
7 group by customer_hierarchy.region, time_hierarchy.mon_yyyy;
75 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=56 Bytes=386
1 0 SORT (GROUP BY) (Cost=41 Card=56 Bytes=3864)
2 1 HASH JOIN (Cost=11 Card=3775 Bytes=260475)
3 2 VIEW (Cost=4 Card=120 Bytes=1440)
4 3 SORT (UNIQUE) (Cost=4 Card=120 Bytes=1440)
5 4 INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_7828' (UNIQ
6 2 HASH JOIN (Cost=6 Card=409 Bytes=23313)
7 6 VIEW (Cost=3 Card=100 Bytes=700)
8 7 SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
9 8 INDEX (FULL SCAN) OF 'SYS_IOT_TOP_7833' (UNIQUE)
10 6 TABLE ACCESS (FULL) OF 'MV_SALES' (Cost=2 Card=409 B
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
14 consistent gets
0 physical reads