Explain Plan with Parallel Processing

First table creation

Let's recreate our gigabyte-sized table and do a full table scan in parallel.

CREATE TABLE T2 (
  row_id INTEGER NOT NULL,
  row_pad VARCHAR2( 1000 )
)
NOLOGGING
PCTFREE 90
NOPARALLEL;
Table created.
Elapsed: 00:00:00.01
BEGIN
  FOR i IN 1..130000
  LOOP
    INSERT INTO t2
    VALUES ( i, RPAD( TO_CHAR(i),1000,'*') );
  END LOOP;
END;
/
PL/SQL procedure successfully completed.
Elapsed: 00:00:48.60

EXECUTE DBMS_STATS.GATHER_TABLE_STATS( -
  ownname => 'TYSON', -
  tabname => 'T2', -
  method_opt => 'FOR ALL COLUMNS SIZE AUTO', -
  degree => 8, -
  cascade => TRUE, -
  estimate_percent => 100 -
);
PL/SQL procedure successfully completed.

Elapsed: 00:00:40.09

 

Checking Table Size

Let's check the size of the table using the user_segments view.

column segment_name format A15
column gig format 99.999

SELECT segment_name,
ROUND( bytes/(1024*1024*1024), 3 ) AS Gig
FROM user_segments
WHERE segment_name = 'T2';

SEGMENT_NAME        GIG
--------------- -------
T2                1.000

You can also get the size of the table from the blocks column of the user_tables view.

SELECT table_name,
ROUND( (blocks*8192)/(1024*1024*1024), 3 ) AS Gig
FROM user_tables
WHERE table_name = 'T2';
TABLE_NAME          GIG
--------------- -------
T2                 .995

In this case the size is slightly smaller using the blocks column.  The blocks column gives the number of blocks below the "high water mark" of the table.  These are the blocks that have been used.  They either currently contain data, or they did at some point in the past.  The blocks above the high water mark have never been used.  The "bytes" column in the user_segments view includes the size of all blocks in the segment, both below and above the high water mark.
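If you want to see the blocks above the high water mark directly, the DBMS_SPACE.UNUSED_SPACE procedure reports them.  The block below is a minimal sketch, assuming that T2 is in your own schema, that it is not partitioned, and that SERVEROUTPUT is on.  The variable names are mine and are only for illustration.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- OUT parameters of DBMS_SPACE.UNUSED_SPACE (local names are illustrative)
  l_total_blocks   NUMBER;
  l_total_bytes    NUMBER;
  l_unused_blocks  NUMBER;
  l_unused_bytes   NUMBER;
  l_last_file_id   NUMBER;
  l_last_block_id  NUMBER;
  l_last_block     NUMBER;
BEGIN
  DBMS_SPACE.UNUSED_SPACE(
    segment_owner             => USER,
    segment_name              => 'T2',
    segment_type              => 'TABLE',
    total_blocks              => l_total_blocks,
    total_bytes               => l_total_bytes,
    unused_blocks             => l_unused_blocks,
    unused_bytes              => l_unused_bytes,
    last_used_extent_file_id  => l_last_file_id,
    last_used_extent_block_id => l_last_block_id,
    last_used_block           => l_last_block
  );
  -- total_blocks covers the whole segment; unused_blocks are above the high water mark
  DBMS_OUTPUT.PUT_LINE( 'Blocks in segment:            ' || l_total_blocks );
  DBMS_OUTPUT.PUT_LINE( 'Blocks above high water mark: ' || l_unused_blocks );
END;
/
```

The difference between the two numbers may not match the blocks column of user_tables exactly, because the segment also contains a few blocks used for space management.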

The blocks column in the user_tables view is populated by the gather_table_stats() procedure.  As such, it can be an estimate if the sampling of the table is less than 100 percent.  In the gather_table_stats() procedure call for table T2 above, the estimate_percent parameter was set to 100 percent, so we can be confident that the value in the blocks column is exact.

When a table is read by full table scan, all the blocks up to the high water mark are read.  The blocks above the high water mark are not read, so the blocks column can be a more accurate measure of the amount of data read than the bytes column of the user_segments view.  However, table stats can get stale if data has been loaded into the table since the last time stats were gathered, so the blocks column can get out of date if stats have not been gathered recently.
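A quick way to judge whether the blocks value can still be trusted is to look at when the stats were gathered and whether Oracle has flagged them as stale.  This is a sketch using the user_tab_statistics view; the STALE_STATS flag relies on Oracle's tracking of DML against the table, so it can lag behind recent loads.

```sql
column table_name  format A15
column stale_stats format A11

-- last_analyzed: when stats were last gathered
-- stale_stats:   YES if Oracle considers the stats stale, NO otherwise
SELECT table_name,
       num_rows,
       blocks,
       last_analyzed,
       stale_stats
FROM user_tab_statistics
WHERE table_name = 'T2';
```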

In the example above, the number of blocks was multiplied by 8192 (8 kilobytes).  This is the block size for the tablespace that table T2 is in.  You can get the block size from the block_size column of the sys.dba_tablespaces view.

SELECT tablespace_name, block_size
FROM sys.dba_tablespaces
WHERE tablespace_name = 'USERS';
TABLESPACE_NAME                BLOCK_SIZE
------------------------------ ----------
USERS                                8192

The tablespace that a table is in can be found from the user_tables view.

SELECT table_name, tablespace_name
FROM user_tables
WHERE table_name = 'T2';
TABLE_NAME      TABLESPACE_NAME
--------------- ---------------------
T2              USERS

You can also get the tablespace_name from the user_segments view.
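Putting these pieces together, the size calculation can pick up the block size instead of hard-coding 8192.  The following sketch assumes a non-partitioned table (for a partitioned table the tablespace_name in user_tables is null) and uses the user_tablespaces view, which does not require access to the DBA views.

```sql
column table_name      format A15
column tablespace_name format A15
column gig             format 99.999

-- Multiply the blocks below the high water mark by the tablespace's own block size
SELECT t.table_name,
       t.tablespace_name,
       ts.block_size,
       ROUND( (t.blocks * ts.block_size)/(1024*1024*1024), 3 ) AS Gig
FROM user_tables t
JOIN user_tablespaces ts
  ON ts.tablespace_name = t.tablespace_name
WHERE t.table_name = 'T2';
```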

 

Execution plan for full table scan in serial mode

Now, we look at the execution plan of a full table scan that runs in serial mode.

EXPLAIN PLAN FOR
SELECT count(*) FROM t2;

SELECT * 
FROM TABLE( dbms_xplan.display( NULL, NULL, 'typical', NULL ));

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 | 35374   (1)| 00:07:05 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T2   |   130K| 35374   (1)| 00:07:05 |
-------------------------------------------------------------------

 

Execution plan for full table scan in parallel

Then, we compare that with an execution plan of a full table scan in parallel.

EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(6) */ count(*) FROM t2;

SELECT *
FROM TABLE( dbms_xplan.display( NULL, NULL, 'typical', NULL ));

--------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |  4908   (0)| 00:00:59 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   130K|  4908   (0)| 00:00:59 |  Q1,00 | PCWC |            |
|   6 |       TABLE ACCESS FULL| T2       |   130K|  4908   (0)| 00:00:59 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------

There are two things we notice right away about the parallel execution plan: it has more rows and more columns than the plan without parallel processing.  Your execution plans are going to be longer, wider, and more complicated.

I'll tell you right now that for basic performance tuning, the single most important fact that you get from these execution plans is the verification that your table will be accessed in parallel, if parallel processes are available.  I will describe what the new operations and column values are, but they are not very significant for performance tuning.  You'll be better off spending your limited time checking the selectivity of join and filter conditions and comparing those to the column statistics and available indexes.  That doesn't change with parallel processing.

That your table is being accessed in parallel is important information.  Full table scans are much faster in parallel and the optimizer knows this.  It will be more inclined to choose a full table scan over index access if parallel processing is available.  In my production database, a query has to select about 0.1 percent of the rows in a table or fewer before index access is faster than a full table scan.  That's one row in a thousand.  There have been several times that I created indexes and found that the optimizer didn't choose them or, worse yet, that the queries ran slower when they used the indexes.

Notice that I said "if parallel processes are available" a couple of paragraphs ago.  Just because your query can use parallel processes does not mean that it will get them.  If all available parallel processes are already in use, that is, if the number of "servers busy" is equal to parallel_max_servers, then your query will get no parallel processes and will have to run in serial mode.  It is also possible that your query will get some, but not all, of the parallel processes that it requests.  In that case, the request for parallel processes is said to be "partially downgraded".
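To see how close the instance is to that limit, you can compare the server counts with the parallel_max_servers parameter.  This is a sketch that assumes you have SELECT access to the v$ views.  The statistic names in v$pq_sysstat are padded with blanks, which is why the filter uses LIKE.

```sql
column statistic format A30
column name      format A45
column value     format A15

-- Parallel servers currently busy and idle
SELECT statistic, value
FROM v$pq_sysstat
WHERE statistic LIKE 'Servers%';

-- The upper limit on parallel servers for the instance
SELECT name, value
FROM v$parameter
WHERE name = 'parallel_max_servers';

-- How often parallel operations have been downgraded since instance startup
SELECT name, value
FROM v$sysstat
WHERE name LIKE 'Parallel operations%';
```

The last query gives a rough idea of how often requests have been fully or partially downgraded on your system.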

Note also that the execution plan does not indicate how many parallel processes your query requested, only that parallel processing will be used if available.
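The requested and granted degree can be checked while the statement is running.  As a sketch, run the following from a second session while the parallel query executes in the first; it assumes SELECT access to v$px_session.

```sql
-- req_degree: degree of parallelism the statement asked for
-- degree:     degree of parallelism it was actually given
SELECT qcsid,
       sid,
       server_group,
       server_set,
       server#,
       req_degree,
       degree
FROM v$px_session
ORDER BY qcsid, server_group, server_set, server#;
```

If degree is lower than req_degree, the request was downgraded.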

 

Operation "PX BLOCK ITERATOR"

Line 5 of the execution plan has operation "PX BLOCK ITERATOR".   This is the operation that assigns work to the parallel processes.  In this case, they are assigned a set of blocks to read from the table.  This set of blocks is called a granule.  When a parallel process finishes reading a granule, it gets assigned another one if more need to be read.  Otherwise, it waits until all the other parallel processes are done reading their granules.  This disk IO is represented by line 6 with operation, "TABLE ACCESS FULL".

A granule can be a range of blocks, or it can be a partition if the table is partitioned.  Usually, it is a range of blocks, even in a partitioned table.  Only certain operations use partition granules.

 

Operation "PX SEND QC (RANDOM)"

Line 3 of the execution plan is "PX SEND QC (RANDOM)".  This line takes the result of line 4, SORT AGGREGATE, and sends it to the Query Coordinator.  The sort was done by the parallel processes.  When they send their results to the Query Coordinator, the order in which they send them is not important.  Hence, it is random.

The "Name" column of the execution plan usually has the names of table and index segments.  In line 3 it has the value ":TQ10000".  The "TQ" stands for Table Queue.  Parallel processes communicate using memory structures known as table queues.  These table queues are numbered because there can be more than one in an execution plan.  
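After a parallel statement has actually been executed, the traffic through each table queue can be inspected in v$pq_tqstat.  This is a sketch under two assumptions: the view is queried from the same session that ran the parallel statement, and the statement really did run in parallel.

```sql
-- Run the statement for real (not just EXPLAIN PLAN)
SELECT /*+ PARALLEL(6) */ count(*) FROM t2;

column server_type format A12
column process     format A10

-- One row per process per table queue, showing rows and bytes sent or received
SELECT dfo_number,
       tq_id,
       server_type,
       process,
       num_rows,
       bytes
FROM v$pq_tqstat
ORDER BY dfo_number, tq_id, server_type DESC, process;
```

The tq_id column corresponds to the number at the end of the table queue name, and server_type shows whether a process was sending to the queue or receiving from it.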

 

Operation "PX COORDINATOR"

Line 2 of the execution plan has the operation "PX COORDINATOR".  This refers to the session that ran the SELECT statement.  It may be a SQL*Plus, TOAD, or some other application session.  It is not one of the parallel processes.  You can see it in the v$session view, alongside the parallel processes that work for it.

 INST_ID   SID USERNAME   OSUSER               PROGRAM              MODULE     SQL_ID
-------- ----- ---------- -------------------- -------------------- ---------- -------------
       1   126 TYSON      KTQUAD\Kevin Tyson   ORACLE.EXE (P000)    SQL*Plus   63u4adswnn43d
       1   116 TYSON      KTQUAD\Kevin Tyson   ORACLE.EXE (P001)    SQL*Plus   63u4adswnn43d
       1   132 TYSON      KTQUAD\Kevin Tyson   ORACLE.EXE (P002)    SQL*Plus   63u4adswnn43d
       1   112 TYSON      KTQUAD\Kevin Tyson   ORACLE.EXE (P003)    SQL*Plus   63u4adswnn43d
       1   130 TYSON      KTQUAD\Kevin Tyson   ORACLE.EXE (P004)    SQL*Plus   63u4adswnn43d
       1   122 TYSON      KTQUAD\Kevin Tyson   ORACLE.EXE (P005)    SQL*Plus   63u4adswnn43d
       1   134 TYSON      KTQUAD\Kevin Tyson   sqlplusw.exe         SQL*Plus   63u4adswnn43d

The last line with sqlplusw.exe (yes, I'm connecting to an Oracle 11 database with an Oracle 10 client) in the program column is the session that ran the select statement.  It is called the query coordinator.  It spawns the parallel processes, assigns tasks to them, collects the results that they deliver, and sends those results to the client.
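A listing like the one above can be produced with a query along these lines.  This is only a sketch: I am assuming the gv$session view (because of the INST_ID column) and a filter on the user name and module, and the session that runs the query will show up in the result as well unless it uses a different module.

```sql
column username format A10
column osuser   format A20
column program  format A20
column module   format A10

-- Sessions of one user that are currently executing a statement
SELECT inst_id,
       sid,
       username,
       osuser,
       program,
       module,
       sql_id
FROM gv$session
WHERE username = 'TYSON'
  AND module = 'SQL*Plus'
  AND sql_id IS NOT NULL
ORDER BY program;
```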

In the above execution plan, the query coordinator receives the output of the parallel processes, does another sort aggregate in serial, and then returns the result of the SELECT statement to the application.

 

Column TQ

We have a new column in this execution plan called "TQ".  This stands for Table Queue.  Each set of parallel processes gets a table queue, so this column indicates which set of parallel processes implements the activity specified in the "Operation" column.  In the parallel execution plan above, every operation executed in parallel has the same table queue, "Q1,00".  This means that all four operations executed in parallel are implemented by the same set of parallel processes.  SQL statements can have more than one set of parallel processes.  In that case, the number after the "Q1," will be incremented for each new set of parallel processes.

SQL statements can have a dozen or more sets of parallel processes.  However, only two sets of parallel processes can execute at any one time.  These two sets are called senders and receivers.  The senders communicate with the receivers through table queues.  Each set of parallel processes can have a number of sessions equal to the degree of parallelism specified for the SQL statement.  Since two sets can execute at the same time, a SQL statement can have a number of sessions equal to two times the degree.
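You can watch the two sets from another session while a parallel statement is running.  The sketch below counts the processes in each server set per query coordinator; it assumes SELECT access to v$px_session.  With a degree of 6 and both sets active, I would expect two rows of six processes each, twelve sessions in addition to the query coordinator.

```sql
-- The query coordinator's own row has no server_set, so it is filtered out
SELECT qcsid,
       server_set,
       COUNT(*)    AS processes,
       MAX(degree) AS degree
FROM v$px_session
WHERE server_set IS NOT NULL
GROUP BY qcsid, server_set
ORDER BY qcsid, server_set;
```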

The lines in the execution plan without a set of parallel processes in the TQ column are not executed in parallel.

 

Column In-Out

The IN-OUT column can take on the following values:

  • PCWP: Parallel Combine with Parent
  • PCWC: Parallel Combine with Child
  • P->S: Parallel to Serial
  • S->P: Serial to Parallel
  • P->P: Parallel to Parallel

The values PCWP and  PCWC indicate intra-operation parallelism.  For every PCWP there will be a PCWC. These pairs of operations are executed by the same set of parallel processes.                   

The value P->P indicates inter-operation parallelism.  This communication takes place between two different sets of parallel processes.  The value P->S always occurs in a SQL statement when the parallel processes send their data to the query coordinator.  The value S->P occurs when a table is read in serial mode, such as a small table, and then its data is sent to an operation that is executed in parallel.

 

Column PQ Distrib

The inter-operation parallel communications can use different methods.  Each operation with one of the three inter-operation values in the In-Out column will also have a value in the PQ Distrib column, where Distrib stands for Distribution.  This communication takes place between a set of sender processes (the producers) and a set of receiver processes (the consumers).  Keep in mind that a "set" is a single process for an operation that runs in serial.  The following methods are used by the producers to distribute data among the consumers when they send it.

  • BROADCAST: Each producer sends every row to each of the consumers.
  • ROUND-ROBIN: Producers send rows one at a time to the consumers as if they are in a circle.
  • RANGE: Producers send rows in a different range to different consumers.  This is useful for an ORDER BY operation.
  • HASH: A hash function is used to determine which consumer a producer sends a row to.  This is useful for hash joins.
  • QC (RAND): All producers send their rows to the query coordinator.  Order is not important, so it is called random.
  • QC (ORDER): All producers send their rows to the query coordinator.  Order is important.  For instance, the data may have already been sorted.
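
To see some of these distribution methods in a plan, you can explain statements that need a second set of parallel processes.  The two statements below are a sketch using the T2 table from this article.  I would expect the GROUP BY to show a HASH distribution and the ORDER BY to show a RANGE distribution, each with P->P in the IN-OUT column, but the plan you actually get depends on the Oracle version and the statistics.

```sql
-- A parallel GROUP BY typically redistributes rows by hash
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(6) */ row_pad, count(*)
FROM t2
GROUP BY row_pad;

SELECT *
FROM TABLE( dbms_xplan.display( NULL, NULL, 'typical', NULL ));

-- A parallel ORDER BY typically redistributes rows by range
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(6) */ row_id
FROM t2
ORDER BY row_id;

SELECT *
FROM TABLE( dbms_xplan.display( NULL, NULL, 'typical', NULL ));
```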