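What query rewrite means, and how much dimensions help when combined with materialized views
Today I was reading the Concepts guide and could not really follow the part about dimensions. I searched on Baidu, found an experiment on asktom, and ran it myself. After doing so I finally understood
what query rewrite means and how much a dimension helps when combined with a materialized view: together they can improve query performance a great deal, which matters most in data warehouses.
First, an introduction to dimensions. Here is the definition from the Concepts guide: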
A dimension table is a logical structure that defines hierarchical (parent/child) relationships between pairs of columns or column sets. For example, a dimension can indicate that within a row the city column implies the value of the state column, the state column implies the value of the country column, and so on.
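In short, a dimension is purely a logical structure with three main parts. A level names a column or set of columns as one unit. A hierarchy defines the parent/child ordering between levels. An attribute declares a 1:1 relationship between a level and some other column.
Dimensions matter once query rewrite is enabled for materialized views. With query rewrite, a SQL statement that contains aggregate functions can be redirected to a matching materialized view, which already holds the precomputed result, so the query runs faster.
Sometimes, though, the query does not match the materialized view's definition. For example, the view aggregates by month and the query asks for totals by quarter. Then the materialized view is not used and performance drops sharply.
A dimension solves this problem. It declares the hierarchy between day, month, quarter, year and so on, so a query by quarter can be answered by rolling up the monthly rows.
Now let the experiment speak.
First, create a table with more than 10 million rows: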
SQL> desc sales;
 Name            Null?    Type
 --------------- -------- ------------
 TRANS_DATE               DATE
 CUST_ID                  NUMBER(38)
 SALES_AMOUNT             NUMBER
SQL> select count(*) from sales;
COUNT(*)
----------
14680064
Create an index-organized table that holds the relationships between day, month, quarter and year:
create table time_hierarchy(day primary key,monthy,qtr_yyyy,year) organization index as
select distinct to_char(trans_date,'yyyy-mm-dd'),to_char(trans_date,'yyyy-mm'),to_char(trans_date,'Q'),to_char(trans_date,'yyyy')
from sales;
Create a materialized view that stores each customer's sales per month:
create materialized view mv_sales
build immediate
refresh on demand
enable query rewrite
as
select sales.cust_id,time_hierarchy.monthy,sum(sales.sales_amount) sales_amount
from sales,time_hierarchy
where to_char(sales.trans_date,'yyyy-mm-dd')=time_hierarchy.day
group by sales.cust_id,time_hierarchy.monthy;
Gather statistics on the base tables so that the cost-based optimizer can consider query rewrite:
analyze table sales compute statistics;
analyze table time_hierarchy compute statistics;
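ANALYZE still works for this, but on current releases Oracle recommends DBMS_STATS for gathering optimizer statistics. A minimal equivalent sketch, assuming all three objects live in the current schema. Including MV_SALES is my own addition, prompted by the "dynamic sampling used for this statement" note in the plans further down, which suggests the materialized view itself had no statistics:

```sql
-- Sketch only: gathers statistics with DBMS_STATS instead of ANALYZE.
-- Assumes SALES, TIME_HIERARCHY and MV_SALES are owned by the current user.
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'SALES');
  dbms_stats.gather_table_stats(ownname => user, tabname => 'TIME_HIERARCHY');
  -- Not in the original experiment: statistics on the materialized view itself.
  dbms_stats.gather_table_stats(ownname => user, tabname => 'MV_SALES');
end;
/
```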
Enable query rewrite for the session:
alter session set query_rewrite_enabled=true;
alter session set query_rewrite_integrity=trusted;
Total sales by month:
SQL> edit
Wrote file afiedt.buf
1 select time_hierarchy.monthy,sum(sales_amount) from scott.sales,scott.time_hierarchy
2 where to_char(scott.sales.trans_date,'yyyy-mm-dd')=scott.time_hierarchy.day
3* group by scott.time_hierarchy.monthy
SQL> /
MONTHY SUM(SALES_AMOUNT)
--------------------- -----------------
1981-12 4141875200
1987-04 3145728000
1981-05 2988441600
1982-01 1363148800
1981-09 2883584000
1987-05 1153433600
1981-02 2988441600
1981-11 5242880000
1981-04 3119513600
1980-12 838860800
1981-06 2569011200
11 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3566649941
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 252 | 4 (25)| 00:00:01 |
| 1 | HASH GROUP BY | | 14 | 252 | 4 (25)| 00:00:01 |
| 2 | MAT_VIEW REWRITE ACCESS FULL| MV_SALES | 14 | 252 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
6 physical reads
0 redo size
835 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
11 rows processed
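As you can see, the optimizer used the materialized view instead of the base tables, with only 3 consistent gets, so the query is very fast.
Now query sales by quarter instead of by month: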
SQL> edit
Wrote file afiedt.buf
1 select time_hierarchy.qtr_yyyy,sum(sales_amount) from scott.sales,scott.time_hierarchy
2 where to_char(scott.sales.trans_date,'yyyy-mm-dd')=scott.time_hierarchy.day
3* group by scott.time_hierarchy.qtr_yyyy
SQL> /
QTR SUM(SALES_AMOUNT)
--- -----------------
1 4351590400
3 2883584000
4 1.0224E+10
2 1.2976E+10
Execution Plan
----------------------------------------------------------
Plan hash value: 3402703070
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 84 | 10095 (10)| 00:02:02 |
| 1 | HASH GROUP BY | | 4 | 84 | 10095 (10)| 00:02:02 |
|* 2 | HASH JOIN | | 14M| 294M| 9329 (3)| 00:01:52 |
| 3 | INDEX FULL SCAN | SYS_IOT_TOP_58953 | 13 | 143 | 1 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| SALES | 14M| 140M| 9256 (2)| 00:01:52 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("TIME_HIERARCHY"."DAY"=TO_CHAR(INTERNAL_FUNCTION("SALES"."TRANS_DAT
E"),'yyyy-mm-dd'))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
41368 consistent gets
41362 physical reads
0 redo size
683 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed
This time the plan goes to the base tables rather than the materialized view, with 41368 consistent gets, and the query is slow.
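When a query unexpectedly skips a materialized view like this, DBMS_MVIEW.EXPLAIN_REWRITE can report the reason. The sketch below is not part of the original experiment. It assumes REWRITE_TABLE has been created in the current schema with the utlxrw.sql script shipped with the database, that the tables are in the current schema (so the scott. prefix is dropped), and the statement_id value 'qtr_check' is just a label picked for this example:

```sql
-- One-time setup: creates REWRITE_TABLE in the current schema.
@?/rdbms/admin/utlxrw.sql

begin
  dbms_mview.explain_rewrite(
    query        => 'select time_hierarchy.qtr_yyyy, sum(sales_amount)
                       from sales, time_hierarchy
                      where to_char(sales.trans_date,''yyyy-mm-dd'') = time_hierarchy.day
                      group by time_hierarchy.qtr_yyyy',
    mv           => 'MV_SALES',
    statement_id => 'qtr_check');  -- arbitrary label for this run
end;
/

-- Read back the optimizer's explanation for this statement.
select message
  from rewrite_table
 where statement_id = 'qtr_check'
 order by sequence;
```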
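Next, create a dimension that declares the hierarchy between day, month, quarter and year, so that query rewrite can be used for this query.
The dimension definition: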
create dimension time_hierarchy_dim
level day is time_hierarchy.day
level monthy is time_hierarchy.monthy
level qtr_yyyy is time_hierarchy.qtr_yyyy
level year is time_hierarchy.year
hierarchy time_rollup
(
day child of
monthy child of
qtr_yyyy child of
year
)
attribute monthy
determines monthy;
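A dimension is only a declaration. With query_rewrite_integrity=trusted, Oracle takes these parent/child relationships on faith and does not check them against the data, so it is worth checking them by hand. A minimal sketch in plain SQL, not from the original experiment: each query looks for a child value that maps to more than one parent.

```sql
-- Months that map to more than one quarter (expected: no rows).
select monthy, count(distinct qtr_yyyy) as quarters
  from time_hierarchy
 group by monthy
having count(distinct qtr_yyyy) > 1;

-- Quarters that map to more than one year.
select qtr_yyyy, count(distinct year) as years
  from time_hierarchy
 group by qtr_yyyy
having count(distinct year) > 1;
```

Note that qtr_yyyy is filled with to_char(trans_date,'Q'), which holds only the quarter number. With the months shown in this experiment (for example 1980-12 and 1981-12 both fall in quarter 4), the second query would be expected to return rows, so the quarter-to-year step of the hierarchy should be treated with caution. The month-to-quarter step, which is the one the quarterly query relies on, holds.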
Run the quarterly sales query again:
SQL> edit
Wrote file afiedt.buf
1 select time_hierarchy.qtr_yyyy,sum(sales_amount) from scott.sales,scott.time_hierarchy
2 where to_char(scott.sales.trans_date,'yyyy-mm-dd')=scott.time_hierarchy.day
3* group by scott.time_hierarchy.qtr_yyyy
SQL> /
QTR SUM(SALES_AMOUNT)
--- -----------------
1 4351590400
3 2883584000
4 1.0224E+10
2 1.2976E+10
Execution Plan
----------------------------------------------------------
Plan hash value: 1315230953
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 42 | 6 (34)| 00:00:01 |
| 1 | HASH GROUP BY | | 3 | 42 | 6 (34)| 00:00:01 |
| 2 | VIEW | | 3 | 42 | 6 (34)| 00:00:01 |
| 3 | HASH UNIQUE | | 3 | 114 | 6 (34)| 00:00:01 |
|* 4 | HASH JOIN | | 17 | 646 | 5 (20)| 00:00:01 |
| 5 | INDEX FULL SCAN | SYS_IOT_TOP_58953 | 13 | 104 | 1 (0)| 00:00:01 |
| 6 | MAT_VIEW REWRITE ACCESS FULL| MV_SALES | 14 | 420 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("MONTHY"="MV_SALES"."MONTHY")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
683 bytes sent via SQL*Net to client
469 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4 rows processed
The materialized view is used again, and logical reads drop to just 4 (from 41368 without the dimension), a very noticeable improvement.
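To make the rewritten plan easier to read, here is roughly what it corresponds to if written by hand: join the monthly rows in MV_SALES to the distinct month-to-quarter mapping and sum again by quarter. This is only an illustration of the idea, not the exact statement the optimizer generates, and it assumes the view's sum column is exposed under the name sales_amount:

```sql
-- Hand-written rollup from monthly totals to quarterly totals.
-- q holds one row per (month, quarter) pair taken from the hierarchy table.
select q.qtr_yyyy, sum(mv.sales_amount) as sales_amount
  from mv_sales mv,
       (select distinct monthy, qtr_yyyy from time_hierarchy) q
 where q.monthy = mv.monthy
 group by q.qtr_yyyy;
```

The dimension is what tells the optimizer that this month-to-quarter join is safe, so that it can apply the same transformation on its own.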