SQL運行時報錯ora-04030 (QERHJ hash-joi,kllcqas:kllsltba)的診斷案例

本文源自:https://blog.csdn.net/cnje5551/article/details/100230162

某公安系統開發人員在plsql中執行查詢時報ORA-04030: 在嘗試分配 123416 字節 (QERHJ hash-joi,kllcqas:kllsltba) 時進程內存不足。數據庫是aix上的rac 11.2.0.4。當時朋友找到我,我當時給他的建議是直接把報錯的查詢直接在數據庫服務器上執行看是否報錯,因爲在服務器上用sqlplus來執行是沒有經過監聽程序的,但後面給我的答覆是在服務器上執行也報錯。所以就遠程連接來處理。

查看alert.log文件發現如下錯誤信息:

Errors in file /u01/app/oracle/diag/rdbms/test/test/incident/incdir_156708/test_ora_15859922_i156708.trc

ORA-04030: 在嘗試分配 123416 字節 (QERHJ hash-joi,kllcqas:kllsltba) 時進程內存不足

從alert.log文件查看到的非缺省值參數信息如下 :

System parameters with non-default values:
  processes                = 1500
  sessions                 = 2272
  _realfree_heap_pagesize_hint= 256K
  _use_realfree_heap       = TRUE
  spfile                   = "+DATA_DG/xtrk/spfilextrk.ora"
  sga_target               = 14G
  control_files            = "+DATA_DG/xtrk/controlfile/current.261.851142581"
  control_files            = "+DATA_DG/xtrk/controlfile/current.260.851142581"
  db_block_size            = 8192
  _external_scn_rejection_threshold_hours= 1
  compatible               = "11.2.0.4.0"
  log_archive_format       = "%t_%s_%r.dbf"
  cluster_database         = TRUE
  db_create_file_dest      = "+DATA_DG"
  db_recovery_file_dest    = "+DATA_DG"
  db_recovery_file_dest_size= 400G
  thread                   = 1
  undo_tablespace          = "UNDOTBS1"
  instance_number          = 1
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  dispatchers              = "(PROTOCOL=TCP) (SERVICE=xtrkXDB)"
  remote_listener          = "test-scan:1521"
  audit_file_dest          = "/u01/app/oracle/admin/test/adump"
  audit_trail              = "DB"
  db_name                  = "test"
  open_cursors             = 600
  pga_aggregate_target     = 10G
  diagnostic_dest          = "/u01/app/oracle"

如是查看錯誤跟蹤文件:

Dump continued from file: /u01/app/oracle/diag/rdbms/xtrk/xtrk1/trace/xtrk1_ora_15859922.trc
ORA-04030: 在嘗試分配 123416 字節 (QERHJ hash-joi,kllcqas:kllsltba) 時進程內存不足

========= Dump for incident 156708 (ORA 4030) ========
----- Beginning of Customized Incident Dump(s) -----
=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------

*** 2014-08-27 15:07:09.388
77%   86 MB, 727 chunks: "kllcqas:kllsltba          "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 7% 8449 KB,  80 chunks: "QERHJ Bit vector          "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 6% 6771 KB,   4 chunks: "kllcqc:kllcqslt           "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 5% 5880 KB, 407 chunks: "free memory               "  SQL
         QERHJ hash-joi  ds=11160a378  dsprt=110bcb0f8
 2% 2056 KB,   2 chunks: "kllcqgf:kllsltba          "  SQL
         kxs-heap-w      ds=110bcb0f8  dsprt=1109f0120
 1%  651 KB,  28 chunks: "permanent memory          "  
         pga heap        ds=110005210  dsprt=0
 0%  376 KB,   1 chunk : "kfk_kfkio_freeq           "  
         KFK_IO_SUBHEAP  ds=1108dae38  dsprt=110005210
 0%  336 KB,   1 chunk : "kfkosd_p                  "  
         KFK_IO_SUBHEAP  ds=1108dae38  dsprt=110005210
 0%  259 KB,   8 chunks: "free memory               "  
         pga heap        ds=110005210  dsprt=0
 0%  253 KB,   2 chunks: "free memory               "  
         top call heap   ds=11011f360  dsprt=0

從上面報錯進程內存分配信息來看報錯的chunks: "kllcqas:kllsltba " 大小是86M

=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
111 MB total:
   110 MB commented, 652 KB permanent
   606 KB free (0 KB in empty extents),
     109 MB,   1 heap:    "session heap   "            92 KB free held
------------------------------------------------------
Summary of subheaps at depth 1
110 MB total:
   110 MB commented, 96 KB permanent
    94 KB free (0 KB in empty extents),
     108 MB,   1 heap:    "kxs-heap-w     "            84 KB free held
------------------------------------------------------
Summary of subheaps at depth 2
109 MB total:
   108 MB commented, 45 KB permanent
    32 KB free (4 KB in empty extents),
     106 MB,   4 heaps:   "QERHJ hash-joi "            10 KB free held
------------------------------------------------------
Summary of subheaps at depth 3
106 MB total:
   100 MB commented, 41 KB permanent
  5880 KB free (0 KB in empty extents),
      91 MB, 727 chunks:  "kllcqas:kllsltba          " 5257 KB free held
    8786 KB,  80 chunks:  "QERHJ Bit vector          " 336 KB free held

從111 MB total可知出錯進程只分配了111M的內存

=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------
 
Dump of Real-Free Memory Allocator Heap [0x110959bf0]
mag=0xfefe0001 flg=0x5000003 fds=0x0 blksz=262144
blkdstbl=0x110959c00, iniblk=8192 maxblk=524288 numsegs=15
In-use num=402 siz=116391936, Freeable num=0 siz=0, Free num=0 siz=0
 
==========================================
INSTANCE-WIDE PRIVATE MEMORY USAGE SUMMARY
------------------------------------------
 
Dumping Work Area Table (level=1)
=====================================
 
  Global SGA Info
  ---------------
 
    global target:    10240 MB
    auto target:       8997 MB
    max pga:           2047 MB
    pga limit:         4095 MB
    pga limit known:  0
    pga limit errors:     0
 
    pga inuse:          275 MB
    pga alloc:          343 MB
    pga freeable:        11 MB
    pga freed:         4150 MB
    pga to free:          0 
    broker request:       0
 
    pga auto:            32 MB
    pga manual:           0 MB
 
    pga alloc  (max):  3043 MB
    pga auto   (max):    32 MB
    pga manual (max):     0 MB
 
    # workareas     :     4
    # workareas(max):     5
 
 

================================
PER-PROCESS PRIVATE MEMORY USAGE
--------------------------------
 
Private memory usage per Oracle process

當出現ora-04030時該實例分配的總PGA大小如下

-------------------------
Top 10 processes:
-------------------------
(percentage is of 420 MB total allocated memory)
27% pid 51: 111 MB used of 112 MB allocated  < = CURRENT PROC
 5% pid 13: 18 MB used of 22 MB allocated 
 5% pid 14: 18 MB used of 22 MB allocated 
 5% pid 36: 18 MB used of 22 MB allocated 
 5% pid 38: 18 MB used of 22 MB allocated 
 5% pid 39: 18 MB used of 22 MB allocated 
 5% pid 40: 18 MB used of 22 MB allocated 
 4% pid 10: 16 MB used of 17 MB allocated 
 4% pid 12: 13 MB used of 17 MB allocated 
 4% pid 20: 12 MB used of 15 MB allocated 

從上面的percentage is of 420 MB total allocated memory可知這個實例的PGA只分配了420M。從27% pid 51: 111 MB used of 112 MB allocated < = CURRENT PROC可知當前進程分配了112M,使用了111M。而pid=51,那麼接下來查看pid=51的詳細會話信息:

------------------------------------
Begin session detail for pid 51
  sid: 430 ser: 5 audsid: 400022 user: 239/TC_RKXT
    flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 51 O/S info: user: grid, term: UNKNOWN, ospid: 15859922
    image: oracle@xtrkdb1
  client details:
    O/S info: user: Administrator, term: RKXXWEB, ospid: 4120:2956
    machine: RKXX\RKXXWEB program: plsqldev.exe
    application name: PL/SQL Developer, hash value=1190136663
    action name: SQL Window - New, hash value=3399691616
  current SQL:
  SELECT
 b.master_relation 戶關係, b.hu_id_new 戶號,a.name 姓名,a.pid 身份證號碼,b.when_in_logged 入戶時間, b.in_category 入戶理由, b.dob,
a.photo_flag 有無相片, substr(ORGNAME,7,10) 所屬派出所,
b.quxcun_id,c.name 居委會名稱
from t_person a, t_huji b, tc_jcyw.t_org o ,tc_jcyw.t_quxcun c
where a.person_id = b.person_id
      and b.org_id = o.sunit_code
      and b.quxcun_id=c.quxcun_id
      and b.zxbz = '0'
      and a.dob < =to_date( '1996.07.01','yyyy.mm.dd' )
and not Exists(select * from t_pid_accept m where m.iscancel is null and m.pid = a.pid )
End session detail for pid 51
------------------------------------
 
================
SWAP INFORMATION
----------------
swap info: free_mem = 22025.57M rsv = 128.00M 
           alloc = 44.48M avail = 32768.00M swap_free = 32723.52M
----- End of Customized Incident Dump(s) -----

*** 2014-08-27 15:07:09.417
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=1gbsquwh58w4k) -----
SELECT
 b.master_relation 戶關係, b.hu_id_new 戶號,a.name 姓名,a.pid 身份證號碼,b.when_in_logged 入戶時間, b.in_category 入戶理由, b.dob,
a.photo_flag 有無相片, substr(ORGNAME,7,10) 所屬派出所,
b.quxcun_id,c.name 居委會名稱
from t_person a, t_huji b, tc_jcyw.t_org o ,tc_jcyw.t_quxcun c
where a.person_id = b.person_id
      and b.org_id = o.sunit_code
      and b.quxcun_id=c.quxcun_id
      and b.zxbz = '0'
      and a.dob <=to_date( '1996.07.01','yyyy.mm.dd' )
and not Exists(select * from t_pid_accept m where m.iscancel is null and m.pid = a.pid )

從上面的信息可知swap大小爲32768M,SGA爲14G,PGA爲10G,當這個實例的總PGA爲分配了420M,當這個會話進程只分配111M內存時不應該會報告ora-04030錯誤.這時我還是親自在服務器上執行上面的報錯查詢發現不報錯,而開始我要朋友做測試,他是要別人做的,但給我的答案是報錯,看來還是得親自測試。這種情況在MOS有一篇文章(ID 758131.1)就是描述這種情況的並且說在aix下是一個特定的問題。造成這個問題的原因可能是因爲操作系統資源限制,說當CRS啓動它的資源時是使用root用戶可能會出現ora-04030錯誤,如果是資源通過sqlplus,lsnrctl來啓動則沒有操作系統資源限制。如果我查看/etc/security/limits文件內容如下:

# cat limits

*
* Sizes are in multiples of 512 byte blocks, CPU time is in seconds
*
* fsize      - soft file size in blocks
* core       - soft core file size in blocks
* cpu        - soft per process CPU time limit in seconds
* data       - soft data segment size in blocks
* stack      - soft stack segment size in blocks
* rss        - soft real memory usage in blocks
* nofiles    - soft file descriptor limit
* fsize_hard - hard file size in blocks
* core_hard  - hard core file size in blocks
* cpu_hard   - hard per process CPU time limit in seconds
* data_hard  - hard data segment size in blocks
* stack_hard - hard stack segment size in blocks
* rss_hard   - hard real memory usage in blocks
* nofiles_hard - hard file descriptor limit
*
* The following table contains the default hard values if the
* hard values are not explicitly defined:
*
*   Attribute        Value
*   ==========    ============
*   fsize_hard    set to fsize
*   cpu_hard      set to cpu
*   core_hard         -1
*   data_hard         -1
*   stack_hard      8388608 
*   rss_hard          -1
*   nofiles_hard      -1
*
* NOTE:  A value of -1 implies "unlimited"
*

default:
        fsize = 2097151
        core = 2097151
        cpu = -1
        data = 262144
#       rss = 65536
        rss = -1
        stack = 65536
        nofiles = 2000

root:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

oracle:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

發現我朋友已經修改了root用戶和oracle用戶的權限但爲什麼還是報ora-04030錯誤了。原因爲在oracle 11g rac安裝不像以前的版本,現在安裝Grid Infrastructure 和Oracle數據庫軟件推薦使用不同的用戶來安裝。通過查看這裏用grid用戶來安裝Grid Infrastructure ,oracle用戶來安裝的oracle數據庫軟。下面的limits文件朋友只修改了root,oracle兩個用戶,但並沒有修改grid用戶對操作系統資源的使用限制。而在rac中,監聽程序是集羣軟件來啓動的也就是由grid用戶來管理的。由於修改grid用戶對操作系統資源的使用限制由於在limits文件中增加以下內容:

grid:
        fsize = -1
        data = -1
        stack = -1
        core = -1
        rss = -1
        nofiles = -1
        stack = -1
        stack_hard = -1
        data_hard = -1
        cpu = -1
        core_hard = -1
        cpu_hard = -1
        fsize_hard = -1

由於重啓rac,在重啓之後再執行上面報錯語句一切正常了,造成這個問題的根本原因是沒有修改oracle,grid用戶對操作系統資源的使用限制所造成的,而這個問題應該是可以避免的,因爲對這種操作系統資源的使用限制其實在安裝rac之前是要做修改的在安裝文檔中也是有的,所以安裝也是需要仔細閱讀官方文檔的。

 

來自 “ ITPUB博客 ” ,鏈接:http://blog.itpub.net/26015009/viewspace-1261123/,如需轉載,請註明出處,否則將追究法律責任。

轉載於:http://blog.itpub.net/26015009/viewspace-1261123/

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章