In this Document
Purpose |
Details |
Concepts |
Symptoms |
Modes of Transaction Recovery |
Parallel Transaction Recovery |
Serial Transaction Recovery |
Detection of Transaction Recovery |
Identify Dead Transactions and their Sizes |
Identify undo segments containing dead transactions |
Identify the Time for transaction recovery to complete |
Query to identify the number of parallel Recovery Slaves |
Identify Objects involved in Transaction Recovery |
Data Dictionary Tables |
Dump the UNDO block |
Disable Transaction Recovery |
Suggestions |
Information to collect for Oracle Support |
Additional Resources |
References |
Applies to:
Oracle Server - Enterprise Edition - Version 9.2.0.8 and laterInformation in this document applies to any platform.
Purpose
To discuss the concepts, modes and detection of Database Transaction recovery.
Details
Concepts
Transaction recovery involves rolling back all uncommitted transactions of a failed instance. These are "in-progress" transactions that did not commit and that Oracle needs to undo. It is possible for uncommitted transactions to get saved to disk. In this case, Oracle uses undo data to reverse the effects of any changes that were written to the datafiles but not yet committed.
Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the crash or introduced by redo application during cache recovery. This normally happens during the Roll backward phase when the DB is restarted. Transaction Recovery can be performed by either the Server process which initiated the Transaction or the SMON process (in case the Server process is dead).
SMON process takes over the recovery when
- Server process is dead / crashed.
- Instance itself is crashed
Undo blocks (whether in rollback segments or automatic undo tablespaces) record database actions that may need to be undone during certain database operations. In database recovery, the undo blocks roll back the effects of uncommitted transactions previously applied by the rolling forward phase.
After the roll forward, any changes that were not committed must be undone. Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the failure or introduced by redo application during cache recovery. This process is called rolling back or transaction recovery.
Fig 1: Basic Recovery Steps: Rolling Forward and Rolling Back
Symptoms
- High CPU Utilization by SMON process
- Database may hang during large transaction recovery.
- If Database is shutdown abort, then the database may hang during consequent startup.
- Database repeatedly crashes while open.
- Database internal errors which could result in transaction recovery failure.
Modes of Transaction Recovery
Parallel Transaction Recovery
Recovery occurs in parallel mode. Several parallel slave processes will be spawned and will be involved in recovery. This is also termed as Fast Start Parallel Rollback. The background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes. To enable Parallel recovery mode, set the parameter FAST_START_PARALLEL_ROLLBACK to LOW / HIGH.
Values:
- FALSE - Parallel rollback is disabled
- LOW - Limits the maximum degree of parallelism to 2 * CPU_COUNT
- HIGH -Limits the maximum degree of parallelism to 4 * CPU_COUNT
If you change the value of this parameter, then transaction recovery will be stopped and restarted with the new implied degree of parallelism. Fast-start parallel rollback is mainly useful when a system has transactions that run a long time before committing, especially parallel INSERT, UPDATE, and DELETE operations. When SMON discovers that the amount of recovery work is above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes: process 1 rolls back one transaction, process 2 rolls back a second transaction, and so on. The threshold is the point at which parallel recovery becomes cost-effective, in other words, when parallel recovery takes less time than serial recovery.
Oracle® Database Reference 11g Release 2 (11.2)
Part Number E25513-02
http://docs.oracle.com/cd/E11882_01/server.112/e25513/initparams089.htm#REFRN10059
Serial Transaction Recovery
Detection of Transaction Recovery
Identify Dead Transactions and their Sizes
select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%'
order by ktuxesiz asc;
===========
ktuxeusn – Undo Segment Number
ktuxeslt – Slot number
ktuxesqn – Sequence
ktuxesta – State
ktuxesiz – Undo Blocks Remaining
ktuxecfl – Flag
===========
USN SLOT SEQ STATE UNDO
------- ---------- ------- ---------- ----------
9 9 335 ACTIVE 10337
Identify undo segments containing dead transactions
select useg.segment_name, useg.segment_id, useg.tablespace_name, useg.status
from dba_rollback_segs useg
where useg.segment_id in (select unique ktuxeusn
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%');
SEGMENT_NAME SEGMENT_ID TABLESPACE_NAME STATUS
-------------------- ----------------- --------------------------- ------------
_SYSSMU9$ 9 UNDOTBS1 ONLINE
select usn,name
from v$rollname
where usn in (select unique ktuxeusn
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%');
Identify the Time for transaction recovery to complete
select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;
USN STATE Total Done ToDo Estimated time to Complete
-------- ---------------- -------- -------- -------- ---------------------------------
5 RECOVERING 16207 14693 1514 11-may-2012 08:05:40
4 RECOVERED 7453 7453 0 11-may-2012 08:05:16
9 RECOVERED 10337 10337 0 11-may-2012 08:05:16
Query to identify the number of parallel Recovery Slaves
select * from v$fast_start_servers;
STATE UNDOBLOCKSDONE PID XID
----------- -------------------------- ---------- ----------------
RECOVERING 54 18 001F005C00001BD6
RECOVERING 0 20 001F005C00001BD6
RECOVERING 0 21 001F005C00001BD6
RECOVERING 0 22 001F005C00001BD6
Column STATE shows the state of the server being IDLE or RECOVERING. If only 1 process is in state RECOVERING while the other processes are in state IDLE, then you should disable Parallel Transaction Recovery. Normally these queries will show the progress of the transaction recovery.
If there is no progress in the number of blocks, then we can consider disabling the parallel recovery and let the recovery continue in serial transaction.
Identify Objects involved in Transaction Recovery
Data Dictionary Tables
select decode(px.qcinst_id,NULL,username,' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "Slave Set",
to_char(s.sid) "SID",
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
px.req_degree "Requested DOP",
px.degree "Actual DOP"
from gv$px_sessionpx, gv$session s
where px.sid=s.sid (+)
and px.serial#=s.serial#
order by 5 , 1 desc ;
ORA User QC/Slav Slave Set SID QC SID Requested DOP Actual DOP
---------- ----------- ---------- ------ -------- ------------------- -----------------
QC 159 159
- p007 (Slave) 1 137 159 8 8
- p006 (Slave) 1 138 159 8 8
- p005 (Slave) 1 135 159 8 8
- p004 (Slave) 1 132 159 8 8
- p003 (Slave) 1 136 159 8 8
- p002 (Slave) 1 130 159 8 8
- p001 (Slave) 1 133 159 8 8
- p000 (Slave) 1 145 159 8 8
select distinct current_obj#
from v$active_session_history
where SESSION_ID=145
and SESSION_SERIAL#=15; -- (This Query to be run multiple times and you should be able to see the Current Object id consistently)
CURRENT_OBJ#
-------------------
77211
select Object_name, object_type
from dba_objects
where object_id=77211;
OBJECT_NAME OBJECT_TYPE
--------------- ----------------
EMPLOYEES TABLE
Note: This method helps when the recovery is Parallel Transaction Recovery.
Dump the UNDO block
1.Identify the UNDO name using the undo segment name and transaction details.
select sysdate, b.nameuseg, a.ktuxeusnxid_usn, a.ktuxesltxid_slot, a.ktuxesqnxid_seq
from x$ktuxe a, undo$ b
where a.ktuxesta = 'ACTIVE'
and a.ktuxecfl like '%DEAD%'
and a.ktuxeusn = b.us#;
2 Dump the Undo Block
Alter system dump undo block ‘<undo seg name>’ xid xidusn xidslot xidsqn;
Eg:
sqlplus / as sysdba
oradebug setmypid
Alter system dump undo block ‘_SYSSMU16$’ xid 16 21 1408186;
oradebug tracefile_name
3 Upload the resultant Trace file to Oracle Support.
This will help Oracle Support to review the UNDO dump and identify the Object ID.
Disable Transaction Recovery
- Disabling Transaction recovery could temporarily open up the database.
- SMON will stop performing all the transaction recovery.
- There is no way to stop only one Transaction recovery / few transaction recoveries.
- When the transaction recovery is disabled, any data modified by this dead transaction but required by other transactions will be recovered on "as needed" basis.
- We always suggest to disable the transaction recovery only with the help of Oracle Support.
NOTE: Oracle does not recommend to Disable the Transaction recovery and open the Database for Business Use.
Suggestions
- Check the long running Transactions before shutting down the Database
- DBA and Application team to be aware of any long running Jobs.
- Plan for the downtime and inform application team accordingly so that they do not schedule business critical / long running jobs.
- Do not Panic. Transaction recovery is part of Database normal Operations
Information to collect for Oracle Support
1) Complete Database Alert log (at least logs from last 2 startups)
2) SMON Trace file (if available)
3) Undo Block dump Trace file (if taken).
4) Output of following Queries (3 outputs in intervals of at least 5 min)
=============
spool /tmp/transaction_recovery.html
set markup html on
set time on
alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';
select sysdate from dual;
show parameter fast_start_parallel_rollback
select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State,ktuxesiz Undo from x$ktuxe where ktuxesta <> 'INACTIVE' and ktuxecfl like '%DEAD%' order by ktuxesiz asc;
select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo", decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400))
"Estimated time to complete" from v$fast_start_transactions;
select * from v$fast_start_servers;
select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta from x$ktuxe where ktuxecfl = 'DEAD';
set markup html off
spool off
=============
Additional Resources
Note 238507.1 How to Disable Parallel Transaction Recovery When Parallel Txn Recovery is Active
Note 144332.1 Parallel Rollback may hang database, Parallel query servers get 100% cpu
Note 414242.1 Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery
Note 416823.1 Diagnoing BUG 5016142 enqueue ENQ - SS contention
Note 461480.1 FAQ – Automatic Undo Management (AUM) / System Managed Undo (SMU)
Note 266159.1 SMON: ABOUT TO RECOVER UNDO SEGMENT %s messages in alert log
Note 1060831.6 Rollback Is Taking A Long Time After Session Killed
Note 11790175.8 Bug 11790175 - SMON spins in rollback of LOB due to double allocated block like bug 11814891
Note 8240762.8 Bug 8240762 - Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON
may spin to recover transaction
Note 11902008.8 Bug 11902008 - SMON may crash with ORA-00600 [kcbgcur_3] or ORA-600 [kcbnew_3] during
Transaction recovery