問題描述:
10月25日上午濱州網通的工程師報告OSS應用系統運行緩慢,具體操作是通過OSS系統查詢話單,很長時間才能返回結果,嚴重影響了客戶的正常使用。
問題處理:
1.登陸數據庫主機,用sar命令看到idle的值持續爲0,CPU的資源已經耗盡:
bz_db1# sar 2 4
SunOS kest 5.8 Generic_108528-19 sun4u 10/26/04
10:56:46 %usr %sys %wio %idle
10:56:48 1 4 95 0
10:56:50 1 5 94 0
10:56:52 0 6 93 0
10:56:54 1 6 93 0
Average 1 5 94 0
2.使用TOP命令看到有兩個明顯佔用CPU利用率過高的進程,以下是top命令的結果:
bz_db1# top
last pid: 1664;load averages: 3.26, 3.24, 3.69
159 processes: 152 sleeping, 2 running, 2 zombie, 1 stopped, 2 on cpu
CPU states: 1.5% idle, 72.5% user, 17.9% kernel, 8.0% iowait, 0.0% swap
Memory: 2.0G real, 233M free, 2.0G swap in use, 3.4G swap free
PID USERNAME THR PR NCE SIZE RES STATE TIME FLTS CPU COMMAND
27420 oracle 1 10 0 1.3G 1.2G cpu01 22.9H 2 31.94% oracle
27418 oracle 1 10 0 1.3G 1.2G run 23.0H 6 26.86% oracle
5943 oracle 1 59 0 1.3G 1.2G sleep 25:26 37 4.92% oracle
6295 oracle 1 55 0 1.3G 1.2G run 25:14 74 4.90% oracle
7778 oracle 1 43 0 1.3G 1.2G sleep 11:43 110 4.86% oracle
13270 oracle 1 59 0 1.3G 1.2G sleep 210.6H 0 0.96% oracle
13056 oracle 1 48 0 1.3G 1.2G sleep 303:30 0 0.37% oracle
10653 root 1 58 0 2560K 1624K cpu00 0:00 0 0.32% top
18827 oracle 1 58 0 1.3G 1.2G sleep 18.4H 0 0.31% oracle
12748 oracle 258 58 0 1.3G 1.2G sleep 555:14 0 0.21% oracle
10634 oracle 1 59 0 1.3G 1.2G sleep 0:01 0 0.21% oracle
28458 oracle 1 58 0 1.3G 1.2G sleep 535:02 0 0.18% oracle
13075 oracle 1 59 0 1.3G 1.2G sleep 326:33 0 0.15% oracle
13173 oracle 1 58 0 1.3G 1.2G sleep 593:07 0 0.13% oracle
4927 oracle 1 59 0 1.3G 1.2G sleep 33.4H 0 0.11% oracle
可以看到這兩個進程號分別是27420和27418.
3.捕獲佔用CPU利用率過高的SQL語句:
以下用到了我總結的SQL語句:
SQL>set line 240
SQL>set verify off
SQL>column sid format 999
SQL>column pid format 999
SQL>column S_# format 999
SQL>column username format A9 heading "ORA User"
SQL>column program format a29
SQL>column SQL format a60
SQL>COLUMN OSname format a9 Heading "OS User"
SQL>SELECT P.pid pid,S.sid sid,P.spid spid,S.username username,
S.osuser osname,P.serial# S_#,P.terminal,P.program program,
P.background,S.status,RTRIM(a.sql_text) SQL
FROM v$process P, v$session S,v$sqlarea A WHERE P.addr = s.paddr
AND S.sql_address = a.address (+) AND P.spid LIKE '%&1%';
Enter value for 1: 27420(注意這裏應輸入佔用CPU最高的進程對應的PID)
得到以下SQL語句:
Select nvl(sum(LOCALCHARGE),0),nvl(sum(usage),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20041016
and LOCALCHARGE>0 and caller like '0543886%';
27418進程對應的SQL語句如下:
select nvl(sum(LOCALCHARGE),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20041016 and caller like '0543888%';
4.使用相關用戶連接到數據庫,檢查其執行計劃:
SQL>connect wacos/oss
Connected.
SQL>@?/rdbms/admin/utlxplan.sql
Table created.
SQL>set autotrace on
SQL>set timing on
SQL>Select nvl(sum(LOCALCHARGE),0),nvl(sum(usage),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20041016
and LOCALCHARGE>0 and caller like '0543886%';
NVL(SUM(LOCALCHARGE),0) NVL(SUM(USAGE),0)
----------------------- -----------------
0 0
Elapsed: 00:02:56.37
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=13435 Card=1 Bytes=5
3)
1 0 SORT (AGGREGATE)
2 1 PARTITION RANGE (ALL)
3 2 TABLE ACCESS (FULL) OF 'LOCALUSAGE' (Cost=13435 Card=1
81 Bytes=9593)
Statistics
----------------------------------------------------------
258 recursive calls
0 db block gets
88739 consistent gets
15705 physical reads
0 redo size
580 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
1 rows processed
發現對localusage表做了全表掃描,什麼記錄也沒有返回居然用了2分多鐘。
SQL> select nvl(sum(LOCALCHARGE),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20040816 and caller like '0543888%';
NVL(SUM(LOCALCHARGE),0)
-----------------------
27.6
Elapsed: 00:03:56.46
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=13435 Card=1 Bytes=4
0)
1 0 SORT (AGGREGATE)
2 1 PARTITION RANGE (ALL)
3 2 TABLE ACCESS (FULL) OF 'LOCALUSAGE' (Cost=13435 Card=3
615 Bytes=144600)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
88588 consistent gets
15615 physical reads
0 redo size
507 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
這個SQL語句有結果返回,發現也是對localusage表做了全表掃描,但速度也很慢,用了3分多鐘。
SQL> select count(*) from localusage;
COUNT(*)
----------
5793776
該表有579萬多條記錄,數據量很大,全表掃描已經不再適合。
5.檢查該表的類型:
SQL> SELECT INDEX_NAME, TABLE_NAME, STATUS, PARTITIONED FROM USER_INDEXES WHERE TABLE_NAME='LOCALUSAGE';
INDEX_NAME TABLE_NAME STATUS PAR
------------------------------ ------------------------------ -------- ---
I_LOCALUSAGE_SID LOCALUSAGE N/A YES
UI_LOCALUSAGE_ST_SEQ LOCALUSAGE N/A YES
SQL> SELECT index_name,table_name,locality FROM user_part_indexes where table_name='LOCALUSAGE';
INDEX_NAME TABLE_NAME LOCALI
------------------------------ ------------------------------ ------
I_LOCALUSAGE_SID LOCALUSAGE LOCAL
UI_LOCALUSAGE_ST_SEQ LOCALUSAGE LOCAL
發現該表是分區表,並在SERVICEID,STARTIME和CDRSEQUENCE列上建立了分區索引,索引類型是local索引。
6.查看分區索引的索引鍵值:
SQL> select INDEX_NAME,COLUMN_NAME,INDEX_OWNER from dba_ind_columns where TABLE_NAME='LOCALUSAGE';
INDEX_NAME COLUMN_NAME INDEX_OWNER
-------------------- -------------------- ------------------------------
I_LOCALUSAGE_SID SERVICEID WACOS
UI_LOCALUSAGE_ST_SEQ STARTTIME WACOS
UI_LOCALUSAGE_ST_SEQ CDRSEQUENCE WACOS
發現在endtime和caller列上都沒有建立索引,這也是導致SQL語句做全表掃描的最終原因。
7.決定創建新的分區索引以消除全表掃描:
(1).首先查看localusage表分區情況:
SQL> select PARTITION_NAME,tablespace_name from user_tab_partitions where table_name='LOCALUSAGE';
PARTITION_NAME TABLESPACE_NAME
------------------------------ ------------------------------
LOCALUSAGE_200312 WACOS
LOCALUSAGE_200401 WACOS
LOCALUSAGE_200402 WACOS
LOCALUSAGE_200404 WACOS
LOCALUSAGE_200405 WACOS
LOCALUSAGE_200406 WACOS
LOCALUSAGE_200407 WACOS
LOCALUSAGE_200409 WACOS
LOCALUSAGE_200410 WACOS
LOCALUSAGE_200411 WACOS
LOCALUSAGE_200403 WACOS
LOCALUSAGE_200408 WACOS
LOCALUSAGE_200412 WACOS
13 rows selected.
(2).在caller列上創建local分區索引:
SQL>set timing on
SQL>create index I_LOCALUSAGE_CALLER on localusage(caller)
LOCAL
(
PARTITION LOCALUSAGE_200312,
PARTITION LOCALUSAGE_200401,
PARTITION LOCALUSAGE_200402,
PARTITION LOCALUSAGE_200404,
PARTITION LOCALUSAGE_200405,
PARTITION LOCALUSAGE_200406,
PARTITION LOCALUSAGE_200407,
PARTITION LOCALUSAGE_200409,
PARTITION LOCALUSAGE_200410,
PARTITION LOCALUSAGE_200411,
PARTITION LOCALUSAGE_200403,
PARTITION LOCALUSAGE_200408,
PARTITION LOCALUSAGE_200412
)
TABLESPACE wacos
STORAGE(
INITIAL 6553600
NEXT 6553600
MAXEXTENTS unlimited
PCTINCREASE 0)
PCTFREE 5
NOLOGGING;
Index created.
Elapsed: 00:06:27.90 (由於數據量比較大,耗時6分鐘)
8.再次查看執行計劃:
SQL>Select nvl(sum(LOCALCHARGE),0),nvl(sum(usage),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20041016
and LOCALCHARGE>0 and caller like '0543886%';
NVL(SUM(LOCALCHARGE),0) NVL(SUM(USAGE),0)
----------------------- -----------------
0 0
Elapsed: 00:00:03.00
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=22 Card=1 Bytes=53)
1 0 SORT (AGGREGATE)
2 1 PARTITION RANGE (ALL)
3 2 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'LOCALUSAGE' (Cost=22 Card=181 Bytes=9593)
4 3 INDEX (RANGE SCAN) OF 'I_LOCALUSAGE_CALLER' (NON-UNIQUE) (Cost=14 Card=65063)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
16813 consistent gets
569 physical reads
0 redo size
580 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
這次走了索引後速度明顯快多了,用了3秒鐘就返回了結果。
SQL>select nvl(sum(LOCALCHARGE),0) from LOCALUSAGE where to_char(ENDTIME,'YYYYMMDD')=20040816 and caller like '0543888%';
NVL(SUM(LOCALCHARGE),0)
-----------------------
27.6
Elapsed: 00:00:24.73
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=22 Card=1 Bytes=40)
1 0 SORT (AGGREGATE)
2 1 PARTITION RANGE (ALL)
3 2 TABLE ACCESS (BY LOCAL INDEX ROWID) OF 'LOCALUSAGE' (Cost=22 Card=3615 Bytes=144600)
4 3 INDEX (RANGE SCAN) OF 'I_LOCALUSAGE_CALLER' (NON-UNIQUE) (Cost=14 Card=65063)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
129336 consistent gets
7241 physical reads
0 redo size
507 bytes sent via SQL*Net to client
651 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
這個SQL語句走了索引,用了24秒鐘返回結果,性能明顯提高了很多.