Dead Connection Detection (DCD) Explained [ID 151972.1]

  Modified 17-AUG-2009     Type BULLETIN     Status PUBLISHED


Checked for relevance on 5-FEB-2009.


DEAD CONNECTION DETECTION
=========================

OVERVIEW
--------

Dead Connection Detection (DCD) is a feature of SQL*Net 2.1 and later, including
Oracle Net8 and Oracle Net Services. DCD detects when a partner in a SQL*Net V2 client/server
or server/server connection has terminated unexpectedly, and flags the dead session
so PMON can release the resources associated with it.

DCD is intended primarily for environments in which clients power down their
systems without disconnecting from their Oracle sessions, a problem
characteristic of networks with PC clients.

DCD is initiated on the server when a connection is established. At this
time SQL*Net reads the SQL*Net parameter files and sets a timer to generate an
alarm. The timer interval is set by providing a non-zero value in minutes for
the SQLNET.EXPIRE_TIME parameter in the sqlnet.ora file. The Database and Listener
need to be restarted after any DCD changes.
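
For example, a minimal sqlnet.ora entry enabling DCD with a 10-minute probe
interval (the value of 10 minutes is purely illustrative) would be:

# sqlnet.ora on the database server
# probe sessions that have been idle for 10 minutes
SQLNET.EXPIRE_TIME = 10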

When the timer expires, SQL*Net on the server sends a "probe" packet to the
client. (In the case of a database link, the destination of the link
constitutes the server side of the connection.) The probe is essentially an
empty SQL*Net packet and does not represent any form of SQL*Net level data,
but it creates data traffic on the underlying protocol.

If the client end of the connection is still active, the probe is discarded,
and the timer mechanism is reset. If the client has terminated abnormally,
the server will receive an error from the send call issued for the probe, and
SQL*Net on the server will signal the operating system to release the
connection's resources.

On Unix servers, the sqlnet.ora file must be in either $TNS_ADMIN or
$ORACLE_HOME/network/admin. Neither /etc nor /var/opt/oracle alone is valid.
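
For example, on a Unix server you can confirm which file the server processes
will read (assuming TNS_ADMIN and ORACLE_HOME are set in the server's
environment):

echo $TNS_ADMIN
ls -l $ORACLE_HOME/network/admin/sqlnet.ora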

It should also be noted that in SQL*Net 2.1.x, an active orphan process
(one processing a query, for example) will not be killed until the query
completes. In SQL*Net 2.2, orphaned resources will be released regardless of
activity.

This is a server feature only. The client may be running any supported
SQL*Net V2 release.


THE FUNCTION OF THE PROTOCOL STACK
----------------------------------

While Dead Connection Detection is set at the SQL*Net level, it relies heavily
on the underlying protocol stack for its successful execution. For example,
you might set SQLNET.EXPIRE_TIME=1 in the sqlnet.ora file, but it is unlikely
that an orphaned server process will be cleaned up immediately upon expiration
of that interval.

TCP/IP, for example, is a connection-oriented protocol, and as such, the
protocol will implement some level of packet timeout and retransmission in an
effort to guarantee the safe and sequenced order of data packets. If a timely
acknowledgement is not received in response to the probe packet, the TCP/IP
stack will retransmit the packet some number of times before timing out. After
TCP/IP gives up, then SQL*Net receives notification that the probe failed.

The time that it takes TCP/IP to timeout is dependent on the TCP/IP stack, and
timeouts of many minutes are entirely common. This has been an area of concern
for many customers, as repeated retransmissions at the protocol layer can cause what
could be a significant lag between the expiration of the DCD interval and the
time when the orphaned process is actually killed.

The easiest way to determine if the protocol stack is causing such a delay
involves testing different DCD intervals.


TESTING THE PROTOCOL STACK
--------------------------
Set the SQLNET.EXPIRE_TIME parameter to 1 minute and note the time required to
clean up an orphaned server process. Then set SQLNET.EXPIRE_TIME to 5 minutes
and again observe the time required to clean up the shadow. If the TCP/IP
timeout is the reason the server resources do not get released, the time to
clean up the shadow should increase by about 4 minutes.
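
As a hypothetical test sequence (the process name pattern shown will vary by
platform and instance name, and <SID> is a placeholder):

# first run:  SQLNET.EXPIRE_TIME = 1  in sqlnet.ora
# second run: SQLNET.EXPIRE_TIME = 5  in sqlnet.ora
# after terminating the client abnormally, watch for the orphaned
# dedicated server (shadow) process to disappear:
ps -ef | grep "oracle<SID> (LOCAL=NO)" | grep -v grep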

If the TCP/IP retransmission timeout is indeed the problem, the Operating
System kernel can be tuned to reduce the interval for and number of packet
retransmissions (on many Unix platforms, the file
/usr/include/netinet/tcp_timer.h contains the configuration parameters).

Reducing the interval and number of retransmissions may impact other system
components, since in effect you are shrinking the window allowed for
connections to process data, possibly resulting in inadvertent loss of
connections during periods of heavy system load. Slower connections from
remote sites may be impacted by this change.

Kernel parameters that may affect retransmission include but are not limited
to TCP_TTL, TCPTV_PERSMIN, TCPTV_MAX, and TCP_LINGERTIME.
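
On Linux, for example, the number of retransmissions attempted on an
established connection can be inspected (and, with care, lowered) through
sysctl; the exact parameter names differ between platforms and are shown here
only as an illustration:

# how many times TCP retransmits before giving up on an established connection
sysctl net.ipv4.tcp_retries2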

*** To avoid disrupting other system processes, it is important to contact the
appropriate vendor for assistance in tuning the operating system kernel or
protocol stack. ***


MONITORING DEAD CONNECTION DETECTION
------------------------------------
The best way to determine if DCD is enabled and functioning properly is to
generate a server trace and search the file for the DCD probe packet. To
generate a server trace, set TRACE_LEVEL_SERVER=16 and
TRACE_DIRECTORY_SERVER=<path> in sqlnet.ora on the server (note the location
of the sqlnet.ora file). The resulting trace file will have a filename of
svr_<PID>.trc and will be located in the specified directory.
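
A hypothetical sqlnet.ora fragment for the server (the trace directory shown
is only an example path):

# sqlnet.ora on the database server
TRACE_LEVEL_SERVER = 16
TRACE_DIRECTORY_SERVER = /u01/app/oracle/network/trace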


Is DCD Enabled?
---------------
For pre-Oracle8i versions, enable level 16 SQL*Net server tracing and search
the resultant server trace file for an entry like the following:

osntns: Enabling dead connection detection (1 min)

The timer interval listed should match the value of SQLNET.EXPIRE_TIME.

For Oracle8i onwards, you should see the following:

nstimini: entry
nstimig: entry
nstimig: normal exit
nstimini: initializing NLTM in asynchronous mode
nstimini: normal exit
nstimstart: entry


Is DCD Working?
---------------
Search the server trace file for DCD probe packets. They will appear in the
form of empty data packets, as follows:

nstimexp: entry
nstimexp: timer expired at 05-OCT-95 12:15:05
nsdo: entry
nsdo: cid=0, opcode=67, *bl=0, *what=1, uflgs=0x2, cflgs=0x3
nsdo: nsctx: state=8, flg=0x621c, mvd=0
nsdo: gtn=93, gtc=93, ptn=10, ptc=2048
nsdoacts: entry
nsdofls: entry
nsdofls: DATA flags: 0x0
nsdofls: sending NSPTDA packet
nspsend: entry
nspsend: plen=10, type=6
nttwr: entry
nttwr: socket 4 had bytes written=10
nttwr: exit
nspsend: 10 bytes to transport
nspsend:packet dump
nspsend:00 0A 00 00 06 00 00 00 |........|
nspsend:00 00 00 00 00 00 00 00 |........|
nspsend: normal exit
nsdofls: exit (0)
nsdoacts: flushing transport
nttctl: entry
nsdoacts: normal exit
nsdo: normal exit
nstimexp: normal exit

The entry:

nspsend:00 0A 00 00 06 00 00 00 |........|
nspsend:00 00 00 00 00 00 00 00 |........|

represents the probe packet. Note that DCD packets are 10 bytes long when they
are issued to the protocol stack. Once the protocol header and trailer bytes
for the underlying protocols have been added, the packet could be approximately
70 bytes long.

If DCD is enabled, you will see these probe packets written to the trace file
when the timer expires. If the server is a UNIX system, it might be useful to
establish a connection and tail the trace file:

tail -f svr_<PID>.trc

The time elapsed after each probe packet is written to the server trace should
match the SQLNET.EXPIRE_TIME value.
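
To list just the probe expirations and their timestamps, the trace can be
filtered, for example:

grep "nstimexp: timer expired" svr_<PID>.trc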

Note: from version 9.2.0.4.0 onwards, DCD probe packets are no longer traced in
SQL*Net trace files, however DCD packets can be observed using other forms of
tracing, such as network sniffer tracing.
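
For example, a sniffer such as tcpdump could be used to watch for the small
probe packets (the interface name and the default listener port 1521 are
assumptions; adjust them for your environment):

tcpdump -i eth0 -X port 1521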


KNOWN PROBLEMS OR LIMITATIONS
-----------------------------
- Of the few reported problems, perhaps the most significant is DCD's poor
performance on Windows NT. Dead connections are cleaned up only when the
server is rebooted and the database is restarted. Exactly how well DCD works
on NT depends on the client's protocol implementation. SQL*Net v2.3 has
improved the performance over earlier releases.

This has been logged as port-specific Bug#303578.


- On SCO Unix, a problem was reported in which server processes spin, consuming
large amounts of CPU, once the DCD timer expires. The problem is due to improper
signal handling and can be eliminated by disabling DCD.

This is port-specific Bug#293264.

- Orphaned resources are not released if only the client application is
terminated. Only after the client PC has been rebooted does DCD release these
resources. For example, if a Windows application is killed yet Windows remains
running, the probe packet may be received and discarded as if the connection is
still active. As it currently stands, it appears that DCD detects dead client
machines, but not dead client processes.

This is logged as generic Bug#280848.

- The SQL*Net V2 implementation on MVS does not use the generic DCD mechanism,
and therefore the SQLNET.EXPIRE_TIME parameter does not apply. The KEEPALIVE
function of IBM's TCP/IP is used instead. This was implemented prior to
development of DCD.

This is documented in port-specific Bug#301318.

- DCD relies heavily on issuing probe packets during any phase of the connection.
This is not possible with some protocols that run half-duplex. Hence, DCD is
not enabled on protocols like APPC/LU6.2.

This is not a bug, but is rather the intended design.

- Local connections using BEQ protocol adapters are not supported with DCD.
Local connections using the IPC protocol adapters are supported with DCD.

- Bug#1388806: On Windows NT, DCD fails after 16 connections.


A FINAL NOTE...
--------------
On most operating systems (including recent versions of Windows), if a process exits
abnormally or is killed by an administrator, the OS will still gracefully
clean up resources associated with that process including the network
connection(s). It will tell the server on the other end that it is closing
the network connection. DCD is still useful for times when there are problems
with the physical network (e.g. ethernet cable falls off the machine) or if
the OS kernel panics and crashes (e.g. blue screen of death) before it can
close the network connections. DCD can also have a side benefit with certain
load-balancing hardware that prematurely aborts connections it considers to
have been idle for too long: the periodic probe packet makes those connections
appear active.

Under no circumstances should you rely 100% on Dead Connection Detection.
It was developed to handle clients that have abnormally exited. Clients should
always exit their applications gracefully. It is the responsibility of the
application developer to make this possible. DCD is intended only to clean up
after abnormal events.

DCD is much more resource-intensive than similar mechanisms at the protocol
level, so if you depend on DCD to clean up all dead processes, that will put
an undue load on the server.
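
For comparison, the protocol-level alternative on Linux, for example, is TCP
keepalive, whose behaviour is controlled by kernel parameters such as the
following (shown only as an illustration; names differ on other platforms):

# idle time in seconds before the kernel sends the first keepalive probe
sysctl net.ipv4.tcp_keepalive_time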

Clearly it is advantageous to exit applications cleanly in the first place.
