Oracle Library Cache Lock 解決思路

一. Library Cache Lock

Library cacheHandle 裏保存了lock 和 pin 的信息。而且在Library cache handle 和child cursor 上都有lock 和pin。它們稱爲library cache lock和library cache pin。

Library cachelock/pin是用來控制對librarycache object的併發訪問的。Lock管理併發，pin管理一致性，lock是針對於librarycache handle, 而pin是針對於heap。

當我們想要訪問某個library cache object，我們首先要獲得這個指向這個object的handle的lock，獲得這個lock之後我們就需要pin住指向這個object的heap。

當我們對包，存儲過程，函數，視圖進行編譯的時候，Oracle就會在這些對象的handle上面首先獲得一個library cache lock，然後再在這些對象的heap上獲得pin，這樣就能保證在編譯的時候其它進程不會來更改這些對象的定義，或者將對象刪除。

當一個session對SQL語句進行硬解析的時候，這個session就必須獲得librarycache lock，這樣其他session就不能夠訪問或者更改這個SQL所引用的對象。如果這個等待事件花了很長時間，通常表明共享池太小(由於共享池太小，需要搜索free的chunk，或者將某些可以被移出的object page out，這樣要花很長時間)，當然了，也有可能另外的session正在對object進行修改(比如split 分區),而當前session需要引用那個table，那麼這種情況下我們必須等另外的session進行完畢。

Library Cache lock有3中模式：

（1）Share(S): 當讀取一個library cache object的時候獲得

（2）Exclusive(X): 當創建/修改一個library cache object的時候獲得

（3）Null(N)：用來確保對象依賴性

比如一個進程想要編譯某個視圖，那麼就會獲得一個共享鎖，如果我們要create/drop/alter某個對象，那麼就會獲得exclusive lock。Null鎖非常特殊，我們在任何可以執行的對象(cursor，function)上面都擁有NULL鎖，我們可以隨時打破這個NULL鎖，當這個NULL鎖被打破了,就表示這個對象被更改了，需要從新編譯。

NULL鎖主要的目的就是標記某個對象是否有效。比如一個SQL語句在解析的時候獲得了NULL 鎖，如果這個SQL的對象一直在共享池中，那麼這個NULL鎖就會一直存在下去，當這個SQL語句所引用的表被修改之後，這個NULL鎖就被打破了，因爲修改這個SQL語句的時候會獲得Exclusive 鎖，由於NULL鎖被打破了，下次執行這個SQL的時候就需要從新編譯。

Library Cache pin有2種模式：

（1）Share(S): 讀取object heap

（2）Exclusive(X)：修改object heap

當某個session想要讀取object heap，就需要獲得一個共享模式的pin，當某個session想要修改object heap，就需要獲得排他的pin。當然了在獲得pin之前必須獲得lock。

在Oracle10gR2中，library cache pin被library cache mutex 所取代。

Library cache latch用來控制對library cache object的併發訪問。前面已經提到，我們要訪問library cacheobject之前必須獲得librarycache lock， lock不是一個原子操作(原子操作就是在操作程中不會被打破的操作，很明顯這裏的lock可以被打破), Oracle爲了保護這個lock，引入了library cache latch機制，也就是說在獲得library cachelock之前，需要先獲得library cache latch，當獲得library cache lock之後就釋放librarycache latch。

如果某個librarycache object沒有在內存中，那麼這個lock就不能被獲取，這個時候需要獲得一個library cache load lock latch，然後再獲取一個librarycache load lock,當load lock獲得之後就釋放library cache load lock latch。

librarycache latch受隱含參數_KGL_LATCH_COUNT的控制，默認值爲大於等於系統中CPU個數的最小素數，但是Oracle對其有一個硬性限制，該參數不能大於67。

注意：我們去查詢_kgl_latch_count有時候顯示爲0,這是一個bug。

Oracle利用下面算法來確定library cache object handle是由哪個子latch來保護的：

latch#= mod(bucket#, #latches)

也就是說用哪個子latch去保護某個handle是根據那個handle所在的bucket號，以及總共有多少個子latch來進行hash運算得到的。

MOS 的文檔【122793.1】裏說導致librarycache lock通常有2種原因：

（1）A DML operation that is hangingbecause the table which is accessed is currently undergoing changes (ALTERTABLE). This may take quite a long time depending on the size of the table andthe type of the modification (e.g. ALTER TABLE x MODIFY (col1 CHAR(200) on atable with thousands of records)

In this case,V$LOCK will show that the session doing the 'ALTER TABLE' with an exclusive DMLenqueue lock on the table object (LMODE=6, TYPE=TM where ID1 is the OBJECT_IDof the table). The waiting session however does not show up in V$LOCK yet so inan environment with a lot of concurrent sessions the V$LOCK information will beinsufficient to track down the culprit blocking your operation.

（2）The compilation of package willhang on Library Cache Lock and Library Cache Pin if any users are executing aprocedure/function defined in the same package.

更多內容參考：

OracleLibrary cache 內部機制說明

http://www.cndba.cn/Dave/article/1381

OracleLibrary Cache 的lock 與 pin 說明

http://www.cndba.cn/Dave/article/1378

OracleNamespace 說明

http://www.cndba.cn/Dave/article/1383

一次librarycache pin故障的解決過程

http://www.cndba.cn/Dave/article/502

二. 處理Library cache lock

2.1 使用hanganalyze + systemstat 分析

Systemstat 事件包含每個oracle 進程的詳細信息。當操作hang住時，可以新開一個窗口，使用該事件，捕獲相關信息。

Systemdump 級別說明：

LEVEL參數：

10 Dump all processes (IGN state)

5 Level 4 + Dump all processes involved in wait chains (NLEAF state)

4 Level 3 + Dump leaf nodes (blockers) in wait chains(LEAF,LEAF_NW,IGN_DMP state)

3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)

1-2 Only HANGANALYZE output, no process dump at all

level 266= SYSTEM STATE (level=10, withshort stacks) = level 10 + short stacks

level 266 在level 10的基礎上包含了進程的short stacks信息

Oracle 9.2.0.1 之後，執行如下腳本：

$sqlplus '/ as sysdba'
oradebugs etmypid
oradebug unlimit
oradebug dump systemstate 266

oradebug tracefile_name

systemstat 226級別在9.2.0.6 之前不可用，所以在之前的版本可以使用如下命令：

alter session set max_dump_file_size=unlimited; alter session set events 'immediate trace name systemstate level 10'

先執行hanganalyze，如下：

SQL> oradebug setmypid

SQL>oredebug unlimit

SQL> oradebug setinst all

SQL> oradebug -g def hanganalyze 3;

SQL>oradebug tracefile_name

如下文件裏其他session都被1169的阻塞：

State of ALL nodes

([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):

[1001]/1/1002/9/c00000063d7aff78/9720/NLEAF/[1169]

[1159]/1/1160/51635/c00000063d8dfc68/19539/NLEAF/[1169]

[1160]/1/1161/15627/c000000631959658/8818/NLEAF/[1169]

[1162]/1/1163/27931/c0000006398d7810/20170/NLEAF/[1169]

[1165]/1/1166/4003/c0000006358f4d58/22069/NLEAF/[1169]

[1166]/1/1167/45511/c0000006398d4868/15674/NLEAF/[1169]

[1167]/1/1168/46253/c00000063d8d9d18/29492/NLEAF/[1169]

[1169]/1/1170/9233/c0000006358f1db0/9434/LEAF_NW/

[1170]/1/1171/43901/c0000006398d18c0/13246/NLEAF/[1169]

[1171]/1/1172/53701/c00000063d8d6d70/13794/NLEAF/[1169]

[1172]/1/1173/23737/c000000631950760/25188/NLEAF/[1169]

[1173]/1/1174/28801/c0000006358eee08/24770/NLEAF/[1169]

[1175]/1/1176/25017/c00000063d8d3dc8/18795/NLEAF/[1169]

[1177]/1/1178/3/c0000006358ebe60/10170/NLEAF/[1169]

這裏sess_srno 是v$session 中的serial#.

Ospid 是系統進程號。

找到了sid和serial# 就可以查看對應session 的信息，是什麼操作。如果session 沒有sql_id, 那麼可以進一步使用oradebug systemdump 對應的進程。來查看信息。

SYS@dave2(db2)> oradebug setospid 9434

Oracle pid: 18, Unix processpid: 27028, image: oracledave2@db2

SYS@dave2(db2)> oradebug unlimit

Statement processed.

SYS@dave2(db2)> oradebug dump systemstate 10

Statement processed.

SYS@dave2(db2)> oradebug TRACEFILE_NAME

/u01/app/oracle/admin/dave2/udump/dave2_ora_27028.trc

SYS@dave2(db2)> oradebug close_trace

Statement processed.

然後使用awk來分析systemdump 的trace：

Oracle 使用ass.awk 工具查看system state dump 說明

http://www.cndba.cn/Dave/article/1523

這裏也可以直接用systemdump 查看所有的進程信息。

2.2 查看X$KGLLK表

The X$KGLLK table (accessibleonly as SYS/INTERNAL) contains all the library object locks (both held &requested) for all sessions and is more complete than the V$LOCK view althoughthe column names don't always reveal their meaning.

--X$KGLLK 表只能被SYS/INTERNAL用戶訪問，其包含所有library object locks的信息（held和requested）。

--查看等待事件爲librarycache lock的session 的session address (SADDR):

SQL>select sid,saddr from v$session where event='library cache lock';
SID SADDR
---------- --------
16 572ed244

--從x$kgllk查看具體的鎖信息：

select kgllkhdl Handle, kgllkreq Request,kglnaobj Object

from x$kgllk

where kgllkses = '572ed244'

and kgllkreq > 0;

HANDLE REQUEST OBJECT
-------- ---------- -------------------
62d064dc 2EMPLOYEES

KGLLKREQ: This will show you the library cache lock requested by this session(KGLLKREQ > 0)

KGLNAOBJ:contains the first 80 characters of the name of the object.

KGLLKHDL:corresponds with the 'handle address' of the object

--然後根據KGLLKHDL從X$KGLLK查看KGLLKMOD > 0的session，其正在持有該鎖：

select kgllkses saddr, kgllkhdl handle,kgllkmod mod, kglnaobj object

from x$kgllk lock_a

where kgllkmod > 0

andexists (select lock_b.kgllkhdl

from x$kgllk lock_b

where kgllkses = '572ed244'/* blocked session*/

and lock_a.kgllkhdl =lock_b.kgllkhdl

and kgllkreq > 0);

SADDR HANDLE MOD OBJECT
-------- ----------- ------- --------

572eac94 62d064dc 3 EMPLOYEES

--查看所有blocked的session：

selectsid, username,terminal, program

from v$session

where saddr in

(select kgllkses

from x$kgllk lock_a

where kgllkreq > 0

andexists (select lock_b.kgllkhdl

from x$kgllk lock_b

where kgllkses = '572eac94'/* blocking session*/

and lock_a.kgllkhdl =lock_b.kgllkhdl

and kgllkreq = 0));

--查看所有持有librarycache pin 或者lock的session 在做什麼：

SELECT s.sid, kglpnmod"Mode",kglpnreq "Req", SPID "OS Process"

FROM v$session_wait w,x$kglpn p, v$session s, v$process o

WHERE p.kglpnuse =s.saddr

AND kglpnhdl = w.p1raw

and w.event like'%library cache %'

and s.paddr = o.addr

2.3 處理問題

一般來說，使用2.1 或者2.2 的方法都可以找到library cache lock的根源，確定是哪個session 導致的，如我們上面的hanganalyze中，是1169的session。我們只需要kill 掉這個session，其他的問題就會自動解決了。

先在DB級別kill session，如果kill 不了，在os 級別kill。

alter systemkill session '1170,9233';

注意在os 級別kill 之前，先用ps 命令查看一下該進程，如果是DB 進程，不可隨意kill，否則會導致系統crash。

ps –ef|grep 9434

kill -9 9434

Oracle Library Cache Lock 解決思路

一. Library Cache Lock

二. 處理Library cache lock

2.1 使用hanganalyze + systemstat 分析

2.2 查看X$KGLLK表

2.3 處理問題

MySQL 核心模塊揭祕 | 18 期 | 鎖在內存里長什麼樣*

使用perf工具生成火焰圖

響應式界面控件DevExtreme * 更強的數據分析和可視化功能

大齡程序員思考

HttpSecurity 是如何組裝過濾器鏈的

數說海南——近6年海南各市縣人口簡單看

長序列中Transformers的高級注意力機制總結

WebStorm 創建 Vue 項目

nuget添加readme

Oracle Library Cache Lock 解決思路

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結