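My platform is Red Hat AS 3 with Oracle 9.2.0.4.
The other applications on it are Apache, Resin, and so on.
I had found earlier that after Apache has been running for a long time it fails with out-of-shared-memory errors. The exact error messages are: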
[Fri Apr 13 06:00:03 2007] [error] shm.create(): error creating shm 2 No such file or directory
[Fri Apr 13 06:00:03 2007] [error] shm.create(): error creating shm /home/apache/logs/shm.file
[Fri Apr 13 06:00:03 2007] [warn] pid file /home/apache/logs/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Fri Apr 13 06:00:03 2007] [emerg] (28)No space left on device: Couldn't create accept lock
So I wrote a script that periodically checks for this condition and cleans it up, and it has worked well ever since.
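A while ago we launched a small new application, also served by Apache. There was nowhere else to put it, so it went onto the Oracle machine, where it had been running fine.
This morning I got a message that the Apache service for this new application had stopped. I opened the log and saw the shared memory problem again, so without a second thought I ran my old script on that system and restarted Apache. OK, the system was working again.
A few minutes later a much bigger problem arrived: I was told the Oracle service was down. I checked right away. ps -ef | grep oracle showed that all the Oracle processes were gone, and the alert log contained the following: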
Errors in file /opt/oracle/admin/sc1/bdump/sc1_reco_5195.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Fri Apr 13 10:10:46 2007
Errors in file /opt/oracle/admin/sc1/bdump/sc1_smon_5193.trc:
ORA-27157: OS post/wait facility removed
ORA-27300: OS system dependent operation:semop failed with status: 43
ORA-27301: OS failure message: Identifier removed
ORA-27302: failure occurred at: sskgpwwait1
Fri Apr 13 10:10:46 2007
RECO: terminating instance due to error 27157
Fri Apr 13 10:10:46 2007
Errors in file /opt/oracle/admin/sc1/udump/sc1_ora_23824.trc:
ORA-27153: wait operation failed
ORA-27300: OS system dependent operation:semop failed with status: 22
ORA-27301: OS failure message: Invalid argument
ORA-27302: failure occurred at: sskgpwwait2
Fri Apr 13 10:10:46 2007
Errors in file /opt/oracle/admin/sc1/bdump/sc1_lgwr_5189.trc:
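So a system-level problem had brought Oracle down. Thinking back over what I had just done, I suspected that the script had also removed Oracle's IPC resources by mistake (the semop failures above point at its semaphores). Fortunately the database started normally. After starting it, I checked the semaphores: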
[root@oracle]# ipcs -s
------ Semaphore Arrays --------
key semid owner perms nsems
0x00000000 4849664 nobody 600 1
0x00000000 4882433 nobody 600 1
0x00000000 4915202 nobody 600 1
0x00000000 4947971 nobody 600 1
0x00000000 4980740 nobody 600 1
0xbeae576c 5111813 oracle 640 201
0xbeae576d 5144582 oracle 640 201
0xbeae576e 5177351 oracle 640 201
0xbeae576f 5210120 oracle 640 201
0xbeae5770 5242889 oracle 640 201
0x00000000 5275658 nobody 600 1
0x00000000 5308427 nobody 600 1
0x00000000 5341196 nobody 600 1
0x00000000 5373965 nobody 600 1
0x00000000 5406734 nobody 600 1
0x00000000 5439503 nobody 600 1
0x00000000 5472272 nobody 600 1
0x00000000 5505041 nobody 600 1
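Sure enough, Oracle has its own semaphores there, and my script never checked who owned what it was removing. To delete only the semaphores that belong to the Apache user, you can do this (substitute the account your httpd runs as; in the listing above that is nobody):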
ipcs -s | grep apache | perl -e 'while (<STDIN>) {@a=split(/\s+/); print `ipcrm sem $a[1]`}'
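A somewhat safer variant, as a minimal sketch rather than the script I actually ran: match the owner column exactly instead of grepping the whole line, and print the removal commands for review before executing anything. The function names sem_ids_for_owner and print_ipcrm_commands are made up for this example, and it assumes the five-column ipcs -s layout shown above. It prints ipcrm -s, which is the spelling in current util-linux; the one-liner above uses the older ipcrm sem form.

```shell
#!/bin/sh
# Read `ipcs -s` output on stdin and print the semaphore IDs whose owner
# column is exactly the given user. Assumed columns: key semid owner perms nsems.
# Header and separator lines are skipped because their first field is not 0x...
sem_ids_for_owner() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Read semaphore IDs on stdin and print (not run) one ipcrm command per ID,
# so the list can be checked by eye before anything is removed.
print_ipcrm_commands() {
    while read -r semid; do
        echo "ipcrm -s $semid"
    done
}
```

Typical use would be `ipcs -s | sem_ids_for_owner nobody | print_ipcrm_commands`, and only after checking the output, piping it on to `sh`. Because the owner is compared as a whole field, the oracle rows can never be selected by accident.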