Oracle Parallel Execution Principles

Let's start with the explanation in the official Oracle documentation:

Parallel execution enables the application of multiple CPU and I/O resources to the execution of a single database operation. It dramatically reduces response time for data-intensive operations on large databases typically associated with a decision support system (DSS) and data warehouses. You can also implement parallel execution on an online transaction processing (OLTP) system for batch processing or schema maintenance operations such as index creation. Parallel execution is sometimes called parallelism. Parallelism is the idea of breaking down a task so that, instead of one process doing all of the work in a query, many processes do part of the work at the same time. An example of this is when four processes combine to calculate the total sales for a year, each process handles one quarter of the year instead of a single process handling all four quarters by itself. The improvement in performance can be quite significant. Parallel execution improves processing for:

    Queries requiring large table scans, joins, or partitioned index scans       

    Creation of large indexes        

    Creation of large tables (including materialized views)        

    Bulk insertions, updates, merges, and deletions       

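In other words, parallel execution uses multiple CPUs and I/O resources to complete a single database operation. The key sentence is the one about breaking a task down: many processes share work that would otherwise be done by a single process. Parallel execution can reduce response time, but the benefit depends heavily on your system resources. If resources are short, parallel execution can perform even worse than serial execution while consuming more resources.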
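The four-quarters example from the quotation can be sketched in Python. This is only an analogy built on a made-up SALES list of (month, amount) rows. A "coordinator" function splits the rows by quarter, four worker threads each sum one quarter, and the coordinator adds up the partial results. The Python threads stand in for Oracle slave processes: they show the split-and-combine idea, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows: (month, amount); three sales per month.
SALES = [(month, 100 * month) for month in range(1, 13) for _ in range(3)]

def split_by_quarter(rows):
    """Coordinator step: break the task into one chunk per quarter."""
    chunks = {q: [] for q in (1, 2, 3, 4)}
    for month, amount in rows:
        chunks[(month - 1) // 3 + 1].append(amount)
    return [chunks[q] for q in (1, 2, 3, 4)]

def parallel_yearly_total(rows, dop=4):
    """Each worker sums one quarter; the coordinator combines the partial sums."""
    chunks = split_by_quarter(rows)
    with ThreadPoolExecutor(max_workers=dop) as pool:
        partials = list(pool.map(sum, chunks))
    return partials, sum(partials)

partials, total = parallel_yearly_total(SALES)
print(partials, total)  # [1800, 4500, 7200, 9900] 23400
```

The parallel result must match a plain serial `sum` over all rows. Only the way the work is divided changes.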

How Oracle parallel execution works

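  When the CBO decides that a session's statement will run in parallel, Oracle turns the session's server process into the parallel execution coordinator (the query coordinator). At instance startup, Oracle uses the parameter PARALLEL_MIN_SERVERS to decide how many slave processes to pre-create. If a statement needs more slave processes than were created at startup, the coordinator starts additional ones. The coordinator then breaks the object to be processed into pieces and hands them to the slave processes. When the slaves finish, they send their results back to the server process, which processes the data and returns it to the client.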

  Let's look at a diagram (Figure 1):

  


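    Oracle executes the SQL in the figure with a degree of parallelism (DOP) of 2, so it uses two slave processes, P1 and P2, to scan the customer table. After the scan, Oracle starts two more processes, P3 and P4. P1 and P2 send the rows they scanned to P3 and P4, which perform the GROUP BY. When P3 and P4 finish, they send their data back to P1 and P2, which then perform the ORDER BY. (P1 and P2 have been idle since finishing the scan, so Oracle does not start new processes.) Finally, the data is sent to the coordinator, which returns it to the user.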
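The flow in Figure 1 can be imitated in plain Python. This is a single-threaded simulation, and several details are invented for illustration: the CUSTOMERS rows, the crc32-based hash, and the range boundary "M".

1. P1/P2 scan slices of the table and hash-distribute rows by the GROUP BY key to P3/P4.
2. P3/P4 aggregate their rows and range-distribute the results back to P1/P2.
3. P1/P2 sort their ranges.
4. The coordinator concatenates the sorted pieces in consumer order.

```python
import zlib
from bisect import bisect_right

# Hypothetical rows of the customer table: (cust_last_name, amount).
CUSTOMERS = [("Smith", 10), ("Jones", 5), ("Brown", 7), ("Smith", 3),
             ("Adams", 1), ("Jones", 2), ("Clark", 4), ("Brown", 6)]

def hash_consumer(key, dop):
    # Deterministic hash: equal GROUP BY keys always reach the same consumer.
    return zlib.crc32(key.encode()) % dop

def range_consumer(key, boundaries):
    # Range distribution for ORDER BY: keys below boundaries[0] go to consumer 0, etc.
    return bisect_right(boundaries, key)

def run_query(rows, dop=2, boundaries=("M",)):
    # P1..P<dop>: each scans its own slice of the table.
    scan_slices = [rows[i::dop] for i in range(dop)]

    # Scanners send rows to the GROUP BY set (P3, P4) by hashing the key.
    group_inputs = [[] for _ in range(dop)]
    for piece in scan_slices:
        for name, amount in piece:
            group_inputs[hash_consumer(name, dop)].append((name, amount))

    # P3, P4: aggregate, then send results back to the idle scanners by range.
    sort_inputs = [[] for _ in range(len(boundaries) + 1)]
    for part in group_inputs:
        totals = {}
        for name, amount in part:
            totals[name] = totals.get(name, 0) + amount
        for name, total in totals.items():
            sort_inputs[range_consumer(name, boundaries)].append((name, total))

    # P1, P2: sort their ranges; the coordinator concatenates them in order.
    return [row for part in sort_inputs for row in sorted(part)]

print(run_query(CUSTOMERS))
# [('Adams', 1), ('Brown', 13), ('Clark', 4), ('Jones', 7), ('Smith', 13)]
```

Hash distribution guarantees that every row for a given name reaches the same GROUP BY slave. Range distribution guarantees that concatenating the sorted pieces yields a globally sorted result.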


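That is a typical example of parallel execution. But how do the parallel processes communicate with each other? The Oracle documentation describes it as follows: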

 To execute a query in parallel, Oracle Database generally creates a set of producer parallel execution servers and a set of consumer parallel execution servers. The producer server retrieves rows from tables and the consumer server performs operations such as join, sort, DML, and DDL on these rows. Each server in the producer set has a connection to each server in the consumer set. The number of virtual connections between parallel execution servers increases as the square of the degree of parallelism.

Each communication channel has at least one, and sometimes up to four memory buffers, which are allocated from the shared pool. Multiple memory buffers facilitate asynchronous communication among the parallel execution servers.

A single-instance environment uses at most three buffers for each communication channel. An Oracle Real Application Clusters environment uses at most four buffers for each channel. Figure 8-3 illustrates message buffers and how producer parallel execution servers connect to consumer parallel execution servers.

 

Figure 8-3 Parallel Execution Server Connections and Buffers



When a connection is between two processes on the same instance, the servers communicate by passing the buffers back and forth in memory (in the shared pool). When the connection is between processes in different instances, the messages are sent using external high-speed network protocols over the interconnect. In Figure 8-3, the DOP equals the number of parallel execution servers, which in this case is n. Figure 8-3 does not show the parallel execution coordinator. Each parallel execution server actually has an additional connection to the parallel execution coordinator. It is important to size the shared pool adequately when using parallel execution. If there is not enough free space in the shared pool to allocate the necessary memory buffers for a parallel server, it fails to start.
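The quoted rules give a rough upper bound on the message-buffer memory one parallel statement may need. The sketch below is an estimate, not Oracle's internal formula, and it rests on these assumptions:

- There is one producer set and one consumer set, each of size DOP.
- Each server also has one connection to the coordinator.
- Every channel holds its maximum number of buffers: 3 on a single instance, 4 on RAC.
- Each buffer is PARALLEL_EXECUTION_MESSAGE_SIZE bytes (16 KB by default in 11g).

```python
def channel_count(dop):
    """Channels for one producer set and one consumer set, each of size dop.

    dop * dop producer-to-consumer channels (the quadratic growth quoted above),
    plus one extra channel from each of the 2 * dop servers to the coordinator.
    """
    return dop * dop + 2 * dop

def max_buffer_bytes(dop, rac=False, message_size=16384):
    """Upper-bound estimate of message-buffer memory taken from the shared pool."""
    buffers_per_channel = 4 if rac else 3  # per-channel maximum quoted above
    return channel_count(dop) * buffers_per_channel * message_size

for dop in (2, 4, 8, 16):
    print(dop, channel_count(dop), max_buffer_bytes(dop) // 1024, "KB")
```

The `dop * dop` term dominates as DOP grows, which is why the documentation stresses sizing the shared pool adequately for parallel execution.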

Reading a parallel execution plan


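   First look at the Operation column: the PX operations show that Oracle is executing the statement in parallel. Using the usual rules for reading execution plans, together with the parallel execution flow described above, the plan can be read as follows:

       1. Table t1 is read with a full table scan, but not by a single process: PX BLOCK ITERATOR means the slave processes iterate over the table in chunks of data blocks.

       2. Next comes PX SEND RANGE: Oracle sends the scanned rows to the next set of processes, distributing them by range.

       3. The next set of processes receives the data (PX RECEIVE) and sorts it in parallel.

       4. Oracle then sends the sorted data to the server process acting as query coordinator (PX SEND QC (ORDER)), and the server process returns the data to the user.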
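PX BLOCK ITERATOR can be pictured as a work queue of block-range granules. In this Python sketch, the table size in blocks, the granule size, and the queue-based hand-out are all assumptions made for illustration. Each thread plays a slave process that keeps taking the next granule until none are left, so faster slaves simply scan more granules.

```python
import queue
import threading

def make_granules(total_blocks, granule_size):
    """Split blocks 0..total_blocks-1 into contiguous (start, end) block ranges."""
    return [(start, min(start + granule_size, total_blocks))
            for start in range(0, total_blocks, granule_size)]

def parallel_scan(total_blocks, granule_size, dop):
    work = queue.Queue()
    for granule in make_granules(total_blocks, granule_size):
        work.put(granule)
    scanned = [[] for _ in range(dop)]  # blocks read by each slave

    def slave(i):
        while True:
            try:
                start, end = work.get_nowait()
            except queue.Empty:
                return  # no granules left: this slave is finished
            scanned[i].extend(range(start, end))

    threads = [threading.Thread(target=slave, args=(i,)) for i in range(dop)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return scanned

print(make_granules(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

However the granules end up spread across the slaves, every block is read exactly once.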

Next, let's explain what the IN-OUT column means:

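P->S (Parallel to Serial): a parallel operation sends data to a serial operation. Typically, the parallel slave processes send data to the parallel execution coordinator.

P->P (Parallel to Parallel): a parallel operation sends data to another parallel operation, for example when two sets of slave processes exchange data.

PCWP (Parallel Combined With Parent): the same slave processes execute this operation and its parent operation in parallel, so no communication is needed.

PCWC (Parallel Combined With Child): the same slave processes execute this operation and its child operation in parallel, so no communication is needed.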

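PCWC and PCWP are the harder ones to understand, so let's interpret them against the execution plan:

PCWC: TABLE ACCESS FULL is the child operation of PX BLOCK ITERATOR. PX BLOCK ITERATOR is marked PCWC because it and its child are performed by the same slave processes.

PCWP: PX RECEIVE is the child operation of SORT ORDER BY. PX RECEIVE is marked PCWP because it and its parent are performed by the same slave processes.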

This is my own understanding. If anything is wrong, corrections are welcome.
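To make the IN-OUT values concrete, here is a small Python sketch that labels the operations of the plan described above: a full scan of t1, range distribution, a parallel sort, and an ordered send to the coordinator. The labelling rules are a simplified heuristic that covers only the operations in this article. They are not how Oracle itself derives the column.

```python
# The plan described above, top to bottom (indentation omitted).
PLAN = [
    "PX COORDINATOR",
    "PX SEND QC (ORDER)",
    "SORT ORDER BY",
    "PX RECEIVE",
    "PX SEND RANGE",
    "PX BLOCK ITERATOR",
    "TABLE ACCESS FULL T1",
]

def in_out(operation):
    """Simplified heuristic for the IN-OUT column (covers only these operations)."""
    if operation == "PX COORDINATOR":
        return ""      # runs serially in the query coordinator itself
    if operation.startswith("PX SEND QC"):
        return "P->S"  # slaves send rows to the serial coordinator
    if operation.startswith("PX SEND"):
        return "P->P"  # one slave set sends rows to another slave set
    if operation == "PX BLOCK ITERATOR":
        return "PCWC"  # the same slaves also run its child, the table scan
    return "PCWP"      # the same slaves also run the parent operation

for op in PLAN:
    print(f"{op:<22} {in_out(op)}")
```

The only points where data crosses between process sets are the two PX SEND rows. Every other row is combined with a neighbouring operation inside the same slave set.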

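Next, let's look at how data is distributed between the process sets:

range: the producers send specific ranges of rows to different consumers, using dynamic range partitioning to decide which row goes to which consumer (for an ORDER BY, the ranges are based on the ORDER BY columns).

loop (round-robin): rows are spread evenly across the consumers (in each round, a producer sends one row to the next consumer).

hash: the producers use a hash function to send rows to the consumers, applying hash partitioning dynamically to decide which row goes to which consumer (for a GROUP BY, the hash is computed on the GROUP BY columns).

QC (random): each producer sends all of its rows to the query coordinator in no particular order. This is the most common method.

QC (order): each producer sends all of its rows to the query coordinator, and the order matters. A parallel ORDER BY uses this method to send data to the query coordinator (the server process).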
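The distribution methods can be sketched as ordinary partitioning functions. In this Python sketch, the consumer count, the range boundaries, and the crc32 hash are assumptions; Oracle chooses its own range boundaries and hash function at run time. QC (random) is modelled as a shuffle, since the coordinator makes no promise about arrival order.

```python
import random
import zlib
from bisect import bisect_right
from itertools import cycle

def dist_range(rows, key, boundaries):
    """RANGE: consumer i gets keys between boundaries[i-1] and boundaries[i]."""
    out = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        out[bisect_right(boundaries, key(row))].append(row)
    return out

def dist_round_robin(rows, consumers):
    """LOOP / ROUND-ROBIN: one row to each consumer in turn."""
    out = [[] for _ in range(consumers)]
    for target, row in zip(cycle(range(consumers)), rows):
        out[target].append(row)
    return out

def dist_hash(rows, key, consumers):
    """HASH: equal keys always go to the same consumer."""
    out = [[] for _ in range(consumers)]
    for row in rows:
        out[zlib.crc32(str(key(row)).encode()) % consumers].append(row)
    return out

def qc_random(parts, seed=0):
    """QC (RANDOM): the coordinator accepts rows in whatever order they arrive."""
    rows = [row for part in parts for row in part]
    random.Random(seed).shuffle(rows)
    return rows

def qc_order(parts):
    """QC (ORDER): read consumer 0's rows first, then consumer 1's, and so on."""
    return [row for part in parts for row in part]

rows = list(range(1, 11))
print(dist_round_robin(rows, 3))  # [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
print(qc_order([sorted(p) for p in dist_range(rows, lambda x: x, [4, 7])]))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The last line shows why a parallel ORDER BY pairs RANGE with QC (ORDER). Once each range is sorted locally, reading the ranges in order yields a fully sorted result.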

Oracle 11g initialization parameters related to parallel execution

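PARALLEL_ADAPTIVE_MULTI_USER: default TRUE. Oracle dynamically adjusts the DOP of SQL statements according to the system load.

PARALLEL_DEGREE_LIMIT: default CPU, i.e. CPU_COUNT × PARALLEL_THREADS_PER_CPU × number of instances available. When Oracle adjusts the DOP automatically, this is the maximum DOP it may use.

PARALLEL_DEGREE_POLICY: default MANUAL. Oracle uses this parameter to enable automatic DOP.

PARALLEL_EXECUTION_MESSAGE_SIZE: default 16 KB. Specifies the size of the buffers used by the parallel execution servers to communicate among themselves and with the query coordinator. These buffers are allocated out of the shared pool.

PARALLEL_MAX_SERVERS: 80 on my 11g database (in 11g the default is derived from CPU_COUNT, PARALLEL_THREADS_PER_CPU and the memory-management settings). This is the maximum number of parallel execution processes the instance can use. When the processes started at instance startup are not enough, Oracle can start more, but never more than this number.

PARALLEL_MIN_SERVERS: default 0. The number of parallel execution processes started when the instance starts.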
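Here is a rough Python model of how these parameters interact when one statement asks for slave processes. Everything in it is a simplifying assumption, and the function names are hypothetical. In particular, when too few slaves are available this model just grants fewer. Oracle instead downgrades the DOP, or fails with an error if PARALLEL_MIN_PERCENT cannot be met.

A plan with a producer set and a consumer set needs two slaves per unit of DOP. That is why the DOP-2 example in Figure 1 uses four slave processes (P1 to P4).

```python
def degree_limit_cpu(cpu_count, threads_per_cpu, instances=1):
    """PARALLEL_DEGREE_LIMIT = CPU: CPU_COUNT x PARALLEL_THREADS_PER_CPU x instances."""
    return cpu_count * threads_per_cpu * instances

def plan_slaves(dop, slave_sets, idle, in_use, max_servers):
    """Hypothetical model of acquiring slave processes for one statement.

    dop * slave_sets slaves are wanted (2 sets for a producer/consumer plan).
    idle: already started slaves (e.g. the PARALLEL_MIN_SERVERS pool) not in use.
    in_use: slaves busy with other statements.
    New slaves are spawned only while the instance total stays within
    PARALLEL_MAX_SERVERS; this model simply grants fewer when the cap is hit.
    """
    wanted = dop * slave_sets
    spawnable = max(max_servers - in_use - idle, 0)
    granted = min(wanted, idle + spawnable)
    reused = min(granted, idle)
    return {"wanted": wanted, "granted": granted,
            "reused": reused, "spawned": granted - reused}

print(plan_slaves(dop=2, slave_sets=2, idle=0, in_use=0, max_servers=80))
# {'wanted': 4, 'granted': 4, 'reused': 0, 'spawned': 4}
```

With PARALLEL_MIN_SERVERS at its default of 0, every slave in this example must be spawned on demand.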


There are other parameters as well; see the official Oracle documentation. Here is the default configuration on my local database:

SQL> show parameter parallel         

NAME                                 TYPE        VALUE         
------------------------------------ ----------- ------------------------------         
fast_start_parallel_rollback         string      LOW         
parallel_adaptive_multi_user         boolean     TRUE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU         
parallel_degree_policy               string      MANUAL         
parallel_execution_message_size      integer     16384         
parallel_force_local                 boolean     FALSE
parallel_instance_group              string         
parallel_io_cap_enabled              boolean     FALSE
parallel_max_servers                 integer     80         
parallel_min_percent                 integer     0         

NAME                                 TYPE        VALUE         
------------------------------------ ----------- ------------------------------         
parallel_min_servers                 integer     0         
parallel_min_time_threshold          string      AUTO         
parallel_server                      boolean     FALSE
parallel_server_instances            integer     1         
parallel_servers_target              integer     32         
parallel_threads_per_cpu             integer     2         
recovery_parallelism                 integer     0

Here we focus on PARALLEL_DEGREE_POLICY, a parameter new in Oracle 11g (Release 2) that lets Oracle adjust the degree of parallelism automatically according to system resources.

This parameter has three values: LIMITED, AUTO and MANUAL.

First, run the following:


SQL> create table t1 as select * from dba_objects;

Table created.

SQL> create table t2 as select * from dba_objects;

Table created.

SQL> alter table t1 parallel 4;

Table altered.

SQL> alter table t2 parallel(degree default);

Table altered.

SQL> select table_name,degree from user_tables where table_name in('T1','T2');         

TABLE_NAME                     DEGREE         
------------------------------ --------------------         
T1                                      4         
T2                                DEFAULT
First, let's see how parallelism is handled when the parameter keeps its default value (MANUAL):
SQL> select count(*) from t1;         

  COUNT(*)         
----------         
     75446         

SQL> select * from v$pq_sesstat where statistic='Allocation Height';         

STATISTIC                      LAST_QUERY SESSION_TOTAL         
------------------------------ ---------- -------------         
Allocation Height                       4             0         

SQL> select count(*) from t2;         

  COUNT(*)         
----------         
     75447         

SQL> select * from v$pq_sesstat where statistic='Allocation Height';         

STATISTIC                      LAST_QUERY SESSION_TOTAL         
------------------------------ ---------- -------------         
Allocation Height                       8             0

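As you can see, with the default value Oracle does not adjust the DOP automatically. It uses exactly the DOP the user set: T1 ran with 4, and T2, whose degree is DEFAULT, ran with the default DOP of 8.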

So what does Oracle do when the parameter is set to LIMITED?

SQL> alter session set parallel_degree_policy=limited;       

Session altered.

SQL> select count(*) from t1;       

  COUNT(*)       
----------       
     75446       

SQL> select * from v$pq_sesstat where statistic='Allocation Height';       

STATISTIC                      LAST_QUERY SESSION_TOTAL       
------------------------------ ---------- -------------       
Allocation Height                       4             0       

SQL> select count(*) from t2;       

  COUNT(*)       
----------       
     75447       

SQL> select * from v$pq_sesstat where statistic='Allocation Height';       

STATISTIC                      LAST_QUERY SESSION_TOTAL       
------------------------------ ---------- -------------       
Allocation Height                       0             0

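So with LIMITED, Oracle adjusts the DOP of tables whose degree is DEFAULT but leaves tables with an explicitly set degree alone. From this we can guess that AUTO adjusts both, and indeed, as shown below, Oracle adjusts all of them. Note that for T2 the "adjustment" was to run the query serially (Allocation Height 0). This is most likely because, with automatic DOP, Oracle runs a statement serially when its estimated execution time is below PARALLEL_MIN_TIME_THRESHOLD. That parameter is AUTO (10 seconds) in the listing above, and a count over about 75,000 rows takes far less time.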

SQL> alter session set parallel_degree_policy=auto;       

Session altered.

SQL> select count(*) from t1;       

  COUNT(*)       
----------       
     75446       

SQL> select * from v$pq_sesstat where statistic='Allocation Height';       

STATISTIC                      LAST_QUERY SESSION_TOTAL       
------------------------------ ---------- -------------       
Allocation Height                       0             0       

SQL> select count(*) from t2;       

  COUNT(*)       
----------       
     75447       

SQL> select * from v$pq_sesstat where statistic='Allocation Height';       

STATISTIC                      LAST_QUERY SESSION_TOTAL       
------------------------------ ---------- -------------       
Allocation Height                       0             0
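The three experiments can be summarised in a small Python model. It is a simplification built on assumptions:

- The estimated serial time and the `auto_dop` value are invented inputs. Oracle derives both from optimizer estimates.
- Serial execution is returned as 0, to match Allocation Height.
- For the small T1/T2 tables, the serial estimate is assumed to fall below the 10-second threshold.

Under these assumptions, the model reproduces the values seen above.

```python
def effective_dop(policy, table_degree, default_dop, est_serial_secs,
                  auto_dop, degree_limit, min_time_threshold=10.0):
    """Simplified model of PARALLEL_DEGREE_POLICY (returns 0 for serial).

    table_degree: an int from ALTER TABLE ... PARALLEL n, or "DEFAULT".
    est_serial_secs, auto_dop: hypothetical optimizer estimates.
    min_time_threshold: PARALLEL_MIN_TIME_THRESHOLD (AUTO means 10 seconds).
    degree_limit: PARALLEL_DEGREE_LIMIT, the cap on an automatic DOP.
    """
    policy = policy.upper()
    automatic = policy == "AUTO" or (policy == "LIMITED" and table_degree == "DEFAULT")
    if not automatic:
        # MANUAL, or LIMITED with an explicit degree: use the table's setting.
        dop = default_dop if table_degree == "DEFAULT" else table_degree
        return dop if dop > 1 else 0
    if est_serial_secs < min_time_threshold:
        return 0  # too cheap to be worth parallelising: run serially
    return min(auto_dop, degree_limit)

# T1 has degree 4, T2 has DEFAULT; both are small (assumed ~1 s serial).
for policy in ("MANUAL", "LIMITED", "AUTO"):
    print(policy, [effective_dop(policy, d, default_dop=8, est_serial_secs=1.0,
                                 auto_dop=4, degree_limit=8) for d in (4, "DEFAULT")])
```

A larger `est_serial_secs` would make AUTO pick a parallel DOP instead, capped by PARALLEL_DEGREE_LIMIT.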

Operations that can use parallel execution

Access methods: Some examples are table scans, index fast full scans, and partitioned index range scans.

Join methods: Some examples are nested loop, sort merge, hash, and star transformation.

DDL statements:

Some examples are CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX, REBUILD INDEX PARTITION, and MOVE/SPLIT/COALESCE PARTITION.

You can typically use parallel DDL where you use regular DDL. There are, however, some additional details to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB columns. (Keep this restriction in mind.)

All of these DDL operations can be performed in NOLOGGING mode for either parallel or serial execution.

The CREATE TABLE statement for an index-organized table can be run with parallel execution either with or without an AS SELECT clause.

Different parallelism is used for different operations. Parallel CREATE (partitioned) TABLE AS SELECT and parallel CREATE (partitioned) INDEX statements run with a degree of parallelism (DOP) equal to the number of partitions. (We will test this point separately.)

DML statements: 

Parallel query:

Miscellaneous SQL operations:

  Some examples are GROUP BY,NOT IN,SELECT DISTINCT,UNION,UNION ALL,CUBE, and ROLLUP, plus aggregate and table functions.

SQL*Loader

Parallel query:

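For a query to run in parallel, it must meet the following conditions (either of the first two is enough to request parallelism):

 The SQL statement contains a hint such as PARALLEL or PARALLEL_INDEX.

 An object referenced in the SQL statement has a parallel attribute set.

In a multi-table join, at least one table is accessed by a full table scan or by an INDEX RANGE SCAN that spans partitions.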

Parallel DDL:

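Parallel DDL relies on direct-path operations. The data is not passed through the buffer cache to be written out later. Instead, an operation such as CREATE TABLE AS SELECT creates new extents and writes directly into them, so the data goes straight from the query to disk, into these newly allocated extents. Because each parallel execution server allocates and fills its own extents, parallel DDL can easily fragment space in the tablespace. In the days of dictionary-managed tablespaces this wasted space. In locally managed tablespaces, the two extent allocation methods (UNIFORM and AUTOALLOCATE) give different results, and for parallel operations Oracle favors automatic extent allocation (AUTOALLOCATE).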
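A little arithmetic shows why UNIFORM extents and parallel direct-path writes can waste space. For illustration only, the sketch assumes that each parallel execution server fills its own freshly allocated extents, and that the unused tail of each server's last extent is not trimmed. The sizes used are made up.

```python
import math

MB = 1024 * 1024

def uniform_extent_waste(bytes_per_server, extent_bytes):
    """Unused bytes in each server's last UNIFORM extent (no extent trimming assumed)."""
    waste = []
    for written in bytes_per_server:
        extents = math.ceil(written / extent_bytes)
        waste.append(extents * extent_bytes - written)
    return waste

# Serial load: one process writes 40 MB into 8 MB extents -> exactly 5 extents.
print(sum(uniform_extent_waste([40 * MB], 8 * MB)) // MB)      # 0
# DOP 4: four servers write 10 MB each; each needs 2 extents (16 MB) -> 6 MB unused each.
print(sum(uniform_extent_waste([10 * MB] * 4, 8 * MB)) // MB)  # 24
```

The same 40 MB of data leaves up to one partly empty extent per server, so the potential waste grows with the DOP and the extent size.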

The following DDL statements can be executed in parallel:

CREATE INDEX

CREATE TABLE ... AS SELECT

ALTER INDEX ... REBUILD

ALTER TABLE ... [MOVE|SPLIT|COALESCE] PARTITION

ALTER INDEX ... [REBUILD|SPLIT] PARTITION

SQL> create index idx_tt1 on t1(object_id) parallel 4;   

Index created.
 

SQL ID: aq91k6zr8au5q   
Plan Hash: 1439620960   
create index idx_tt1 on t1(object_id) parallel 4   


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------   
Parse        1      0.01       0.34          2         21          0           0   
Execute      1      0.01       1.46         23          9       1044           0   
Fetch        0      0.00       0.00          0          0          0           0   
------- ------  -------- ---------- ---------- ---------- ----------  ----------   
total        2      0.03       1.81         25         30       1044           0   

Misses in library cache during parse: 1   
Optimizer mode: ALL_ROWS   
Parsing user id: 84     

Rows     Row Source Operation   
-------  ---------------------------------------------------   
      4  PX COORDINATOR  (cr=5 pr=0 pw=0 time=51 us)   
      0   PX SEND QC (ORDER) :TQ10001 (cr=0 pr=0 pw=0 time=0 us)   
      0    INDEX BUILD NON UNIQUE IDX_TT1 (cr=0 pr=0 pw=0 time=0 us)(object id 0)   
      0     SORT CREATE INDEX (cr=0 pr=0 pw=0 time=0 us)   
      0      PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us cost=83 size=1165905 card=89685)   
      0       PX SEND RANGE :TQ10000 (cr=0 pr=0 pw=0 time=0 us cost=83 size=1165905 card=89685)   
      0        PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us cost=83 size=1165905 card=89685)   
      0         TABLE ACCESS FULL T1 (cr=0 pr=0 pw=0 time=0 us cost=83 size=1165905 card=89685)

Using parallelism with CTAS:
 


 

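The example above comes from the Oracle documentation: Oracle first scans the source table in parallel and then creates the target table in parallel.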

Parallel DML

 To use parallel DML, you must enable it explicitly:

 

SQL> alter session enable parallel dml;  
  
Session altered.

The Oracle documentation explains this as follows:

This mode is required because parallel DML and serial DML have different locking, transaction, and disk space requirements and parallel DML is disabled for a session by default.


UPDATE, MERGE, and DELETE operations are executed in parallel only when the table is partitioned.


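For INSERT, parallelism only makes sense with INSERT ... SELECT. The syntax is: insert /*+ parallel(t,4) */ into t select /*+ parallel(t1,4) */ * from t1. The INSERT part and the SELECT part can be parallelized separately; they are independent of each other.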



That concludes this summary of parallel execution.


References: Oracle official documentation

《Oracle 10g性能分析及優化》 (Oracle 10g Performance Analysis and Optimization)



