Repairing Oracle Corrupt Blocks

 
Database block corruption can be classified by the object the corrupt block belongs to: user data, data dictionary, undo, control file, redo, LOB, index, and so on. It can also be classified by cause, as physical corruption or logical corruption.

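Physical corruption
Common physical block corruptions include a mismatch between the block header and the block tail (fractured/incomplete blocks), an invalid checksum, and a block whose contents are all zeros. They may be accompanied by errors ORA-1578 and ORA-1110.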

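To detect physical corruption promptly and pinpoint its cause, Oracle recommends setting the initialization parameter DB_BLOCK_CHECKSUM=TYPICAL (the default). Physical corruption is generally caused by errors or damage in the underlying OS/disk subsystem that alter the data block, which is then marked as corrupt.
An invalid checksum is a common form of physical corruption. With DB_BLOCK_CHECKSUM=TYPICAL (the default), DBWR computes a checksum when it writes a block to disk and stores it in the block at byte offsets 16 and 17. When the block is read back from disk, Oracle recomputes the checksum and XORs it with the stored value. A nonzero result means the block was modified after it was written, so the block is corrupt.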
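The write-then-verify cycle described above can be modeled in a few lines. The following is a minimal Python sketch, not Oracle's actual implementation: it assumes the checksum is a 16-bit XOR over the whole block, stored little-endian at byte offsets 16 and 17, and the names `xor_fold`, `stamp_checksum`, and `is_intact` are made up for illustration.

```python
import struct

CHKVAL_OFFSET = 16  # bytes 16-17 hold the stored checksum (from the text above)


def xor_fold(block: bytes) -> int:
    """XOR every 16-bit little-endian word of the block together."""
    result = 0
    for (word,) in struct.iter_unpack("<H", block):
        result ^= word
    return result


def stamp_checksum(block: bytes) -> bytes:
    """Model of the write side: zero the checksum field, fold, store the result."""
    buf = bytearray(block)
    struct.pack_into("<H", buf, CHKVAL_OFFSET, 0)
    struct.pack_into("<H", buf, CHKVAL_OFFSET, xor_fold(bytes(buf)))
    return bytes(buf)


def is_intact(block: bytes) -> bool:
    """Model of the read side: folding the block together with its stored
    checksum equals (recomputed XOR stored), which is zero only if they match."""
    return xor_fold(block) == 0


# Hypothetical 8192-byte block filled with a repeating pattern.
good = stamp_checksum(bytes(range(256)) * 32)
bad = bytearray(good)
bad[100] ^= 0xFF  # simulate a byte changed underneath the database
print(is_intact(good), is_intact(bytes(bad)))
```

Under these assumptions, any change that is not cancelled by another change in the same bit positions leaves a nonzero fold, which is the condition the text describes as a corrupt block.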

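I. Causes of corrupt blocks


1. Hardware problems


To process a data block, an Oracle process first reads it into physical memory; when processing is done, a designated process writes it back to disk. A memory fault or CPU miscalculation during this cycle scrambles the block in memory, and the damage ends up in the block written to disk. A fault in the storage subsystem corrupts blocks in the same way.


2. Operating system bugs


Oracle processes read and write data blocks through operating system kernel calls (system calls). A defect in those calls can cause an Oracle process to write invalid content.


3. Operating system I/O errors or caching problems


4. Memory or paging problems


5. Oracle software bugs


Specific versions of the Oracle software may contain bugs that corrupt block contents.


6. Non-Oracle processes disturbing Oracle shared memory


While a block is held in the host's physical memory, a non-Oracle process can overwrite the shared memory region Oracle uses, so the block written back to disk is garbled.


7. Abnormal shutdown, power loss, or killed services


These events terminate processes abnormally, which breaks the integrity of data blocks and produces corrupt blocks.
Note: this is also why a sudden power loss can leave a database unable to start.
As the list shows, corruption has many possible causes. Finding the exact one takes considerable analysis and troubleshooting, and sometimes several reproductions. On a production system, however, the priority is to shorten downtime, so an emergency workaround is applied as quickly as possible to restore availability. That destroys the failure scene and makes root-cause analysis harder still.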

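II. Preventing (checking for) corrupt blocks


1. For physical corruption caused by Oracle bugs, Oracle flags the bugs as notable issues (marked with * or +) and provides the corresponding patches.
2. Check with RMAN. The following command checks whether a datafile contains corrupt blocks without producing any backup output:
RMAN> BACKUP CHECK LOGICAL VALIDATE DATAFILE n ;
3. Check with the dbv utility or with ANALYZE:
dbv file=d:\oracle\oradata\mydb\RONLY.DBF blocksize=8192
ANALYZE TABLE tablename VALIDATE STRUCTURE CASCADE;
ANALYZE checks for corrupt blocks but does not mark them as corrupt; the results are written to a user trace file under USER_DUMP_DEST.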
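4. Check with the data dictionary views and the DBMS_REPAIR package
① Look up the object using the file_id and block_id reported in the alert log: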
SELECT tablespace_name, segment_type, owner, segment_name
FROM dba_extents
WHERE file_id = &fileid
and &blockid between block_id AND block_id + blocks - 1;

If V$DATABASE_BLOCK_CORRUPTION contains rows, run this query to find the objects that contain the corrupted blocks:
SELECT e.owner,
e.segment_type,
e.segment_name,
e.partition_name,
c.file#,
greatest(e.block_id, c.block#) corr_start_block#,
least(e.block_id + e.blocks - 1, c.block# + c.blocks - 1) corr_end_block#,
least(e.block_id + e.blocks - 1, c.block# + c.blocks - 1) -
greatest(e.block_id, c.block#) + 1 blocks_corrupted,
null description
FROM dba_extents e, v$database_block_corruption c
WHERE e.file_id = c.file#
AND e.block_id <= c.block# + c.blocks - 1
AND e.block_id + e.blocks - 1 >= c.block#
UNION
SELECT s.owner,
s.segment_type,
s.segment_name,
s.partition_name,
c.file#,
header_block corr_start_block#,
header_block corr_end_block#,
1 blocks_corrupted,
'Segment Header' description
FROM dba_segments s, v$database_block_corruption c
WHERE s.header_file = c.file#
AND s.header_block between c.block# and c.block# + c.blocks - 1
UNION
SELECT null owner,
null segment_type,
null segment_name,
null partition_name,
c.file#,
greatest(f.block_id, c.block#) corr_start_block#,
least(f.block_id + f.blocks - 1, c.block# + c.blocks - 1) corr_end_block#,
least(f.block_id + f.blocks - 1, c.block# + c.blocks - 1) -
greatest(f.block_id, c.block#) + 1 blocks_corrupted,
'Free Block' description
FROM dba_free_space f, v$database_block_corruption c
WHERE f.file_id = c.file#
AND f.block_id <= c.block# + c.blocks - 1
AND f.block_id + f.blocks - 1 >= c.block#
order by file#, corr_start_block#;
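The GREATEST/LEAST expressions in the query compute the intersection of two block ranges: an extent (or free-space chunk) and a corrupt range. The same logic as a Python sketch, using hypothetical extent values; `overlap` is an illustrative name, not part of any Oracle API.

```python
def overlap(ext_start, ext_blocks, corr_start, corr_blocks):
    """Return (first_block, last_block, count) of the corrupt blocks that fall
    inside the extent, or None if the two ranges do not intersect."""
    ext_end = ext_start + ext_blocks - 1
    corr_end = corr_start + corr_blocks - 1
    # Same test as the WHERE clause: each range starts before the other ends.
    if ext_start > corr_end or ext_end < corr_start:
        return None
    first = max(ext_start, corr_start)  # greatest(e.block_id, c.block#)
    last = min(ext_end, corr_end)       # least(e.block_id + e.blocks - 1, ...)
    return first, last, last - first + 1


# Hypothetical extent covering blocks 128-135, with block 133 corrupt.
print(overlap(128, 8, 133, 1))  # (133, 133, 1)
```

A corrupt range that spans two extents yields one row per extent, each clipped to that extent's boundaries.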

② Choose a tablespace and create the repair table in it:
BEGIN
 DBMS_REPAIR.ADMIN_TABLES (
 TABLE_NAME => 'REPAIR_TABLE',
 TABLE_TYPE => dbms_repair.repair_table,
 ACTION => dbms_repair.create_action,
 TABLESPACE => '&tablespace_name');
END;
/
③ Check the specified <schema>.<object> and count its corrupt blocks (specify PARTITION_NAME as well to check at the partition level):
set serveroutput on
DECLARE num_corrupt INT;
BEGIN
 num_corrupt := 0;
 DBMS_REPAIR.CHECK_OBJECT (
 SCHEMA_NAME => '&schema_name',
 OBJECT_NAME => '&object_name',
 REPAIR_TABLE_NAME => 'REPAIR_TABLE',
 corrupt_count => num_corrupt);
 DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt));
END;
/

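5. Exporting the whole database with exp can also detect corrupt blocks.
It does not detect corruption in the following cases:
① Corrupt blocks above the HWM
② Corrupt blocks in indexes
③ Corrupt blocks in the data dictionary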


III. Repair methods


1. The database is configured with DB_BLOCK_CHECKSUM=TYPICAL, so the checksum is verified whenever a block is read from disk:

SQL> show parameter DB_BLOCK_CHECKSUM

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_block_checksum                    string      TYPICAL

 

2. A query on table dept hits a corrupt block and fails with ORA-1578 and ORA-1110; the corrupt block is file # 4, block # 133.

SQL> select * from dept;
 select * from dept
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 133)
ORA-01110: data file 4: '/u01/app/oracle/oradata/orcl/users01.dbf'

 

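3. The alert log records the same error in more detail and shows that block (file # 4, block # 133) is corrupt because its checksum is invalid. The checksum stored in the block is 0x8167 (computed by DBWR at the last write to disk), the checksum recomputed on read is 0x8122, and XORing the two gives 0x45 (the computed block checksum). Because the two checksums differ (the XOR is nonzero), the block was modified after it was written and is corrupt.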
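The arithmetic in this step is easy to verify with a short Python snippet (the variable names are illustrative):

```python
stored = 0x8167      # check value in block header, from the alert log
recomputed = 0x8122  # checksum recomputed on read, as stated above

diff = stored ^ recomputed
print(hex(diff))     # 0x45
print(diff != 0)     # True: the block fails the checksum test
```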

Alert log messages:

Hex dump of (file 4, block 133) in trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_20892.trc
Corrupt block relative dba: 0x01000085 (file 4, block 133)
Bad check value found during multiblock buffer read  <<<<<<<<<<<<<< the block is corrupt because its checksum is invalid
Data in bad block:
 type: 6 format: 2 rdba: 0x01000085
 last change scn: 0x0000.0023d69a seq: 0x5 flg: 0x06
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xd69a0605
 check value in block header: 0x8167   <<<<<<<<<<<<<< checksum stored in the block is 0x8167
 computed block checksum: 0x45         <<<<<<<<<<<<<< 0x8167 XOR 0x8122 = 0x45
Reading datafile '/u01/app/oracle/oradata/orcl/users01.dbf' for corruption at rdba: 0x01000085 (file 4, block 133)
Reread (file 4, block 133) found same corrupt data (no logical check)
Sun Mar 23 22:53:40 2014
Corrupt Block Found
         TSN = 4, TSNAME = USERS
         RFN = 4, BLK = 133, RDBA = 16777349
         OBJN = 14343, OBJD = 14343, OBJECT = DEPT, SUBOBJECT = 
         SEGMENT OWNER = JAMES, SEGMENT TYPE = Table Segment         <<<<<<<<<<<<<< the object the corrupt block belongs to (see OBJN above)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_20892.trc  (incident=182595):
ORA-01578: ORACLE data block corrupted (file # 4, block # 133)
ORA-01110: data file 4: '/u01/app/oracle/oradata/orcl/users01.dbf'
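The alert log identifies the block three ways: as `rdba: 0x01000085`, as `RDBA = 16777349`, and as (file 4, block 133). A small Python sketch shows how they relate, assuming the usual smallfile-tablespace encoding in which the upper 10 bits of the 32-bit RDBA are the relative file number and the lower 22 bits are the block number. `decode_rdba` is an illustrative helper, not an Oracle API.

```python
BLOCK_BITS = 22  # assumed: low 22 bits = block number, high 10 bits = relative file number


def decode_rdba(rdba):
    """Split a relative data block address into (relative file#, block#)."""
    return rdba >> BLOCK_BITS, rdba & ((1 << BLOCK_BITS) - 1)


print(decode_rdba(0x01000085))  # (4, 133)
print(0x01000085)               # 16777349, the decimal RDBA shown in the log
```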

4.1 The corresponding trace file orcl_ora_20892.trc also dumps the block; the checksum stored in the block is 0x8167 (chkval).

Block dump from disk:
buffer tsn: 4 rdba: 0x01000085 (4/133)
scn: 0x0000.0023d69a seq: 0x05 flg: 0x06 tail: 0xd69a0605
frmt: 0x02 chkval: 0x8167 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1


4.2 The stored checksum can also be inspected with dd: the bytes at offsets 16 and 17 hold the checksum value 0x8167.

$ dd if=/u01/app/oracle/oradata/orcl/users01.dbf bs=8192 count=1 skip=133 of=/tmp/dd133.out

$ od -x /tmp/dd133.out
0000000 a206 0000 0085 0100 d69a 0023 0000 0605
0000020 8167 0000 0001 0000 3807 0000 2fef 000c
        ^^^^
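To make the od output easier to read, the first 18 bytes can be unpacked into named fields. This Python sketch assumes a little-endian platform (so od -x shows each pair of bytes swapped) and the commonly described block header layout of type, format, spare, RDBA, SCN base, SCN wrap, sequence, flag, and checksum. The field names are illustrative.

```python
import struct

# First nine 16-bit words printed by od -x above.
od_words = "a206 0000 0085 0100 d69a 0023 0000 0605 8167".split()

# od -x prints host-order words; on a little-endian host the low byte is first on disk.
raw = b"".join(struct.pack("<H", int(w, 16)) for w in od_words)

names = ("type", "frmt", "spare", "rdba", "scn_base", "scn_wrap", "seq", "flg", "chkval")
header = dict(zip(names, struct.unpack("<BBHIIHBBH", raw)))

for name in ("type", "rdba", "scn_base", "seq", "flg", "chkval"):
    print(f"{name:8s} 0x{header[name]:x}")

# Size of everything before chkval, i.e. the byte offset of the checksum.
print("chkval offset:", struct.calcsize("<BBHIIHBB"))  # 16
```

The unpacked values line up with the trace file dump in 4.1: rdba 0x01000085, scn 0x0000.0023d69a, seq 0x05, flg 0x06, chkval 0x8167.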

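Solution:

5. A corrupt data block can be repaired from a backup, or skipped with DBMS_REPAIR.SKIP_CORRUPT_BLOCKS.
 Method 1: RMAN block recovery
An up-to-date RMAN backup set must exist. Then run the following commands:

RMAN> backup validate datafile 4;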

RMAN> run {blockrecover datafile 4 block 133;}

SQL> select * from dept;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     DALIAN
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

 

Method 2: patch the header with bbed


Case 2: bad block header


Errors in the alert log:

Reading datafile '/oradata/datafiles/oadb/oa01.dbf' for corruption at rdba: 0x016d4dd5 (file 5, block 2969045)
Reread (file 5, block 2969045) found same corrupt data (no logical check)
Tue Aug 18 10:53:51 2015
Corrupt Block Found
        TSN = 6, TSNAME = OA
        RFN = 5, BLK = 2969045, RDBA = 23940565
        OBJN = 95690, OBJD = 95690, OBJECT = EDOC_BASE_WORKFLOW, SUBOBJECT = 
        SEGMENT OWNER = INSPUROA, SEGMENT TYPE = Table Segment
Tue Aug 18 10:55:03 2015
Hex dump of (file 5, block 2969045) in trace file /u01/app/oracle/diag/rdbms/oadb/oadb/trace/oadb_ora_4565.trc
Corrupt block relative dba: 0x016d4dd5 (file 5, block 2969045)
Bad header found during buffer read
Data in bad block:
 type: 117 format: 0 rdba: 0x20206b73
 last change scn: 0x2020.20202020 seq: 0x20 flg: 0x20
 spare1: 0x64 spare2: 0x69 spare3: 0x0
 consistency value in tail: 0x4d240601
 check value in block header: 0x5f49
 block checksum disabled
Reading datafile '/oradata/datafiles/oadb/oa01.dbf' for corruption at rdba: 0x016d4dd5 (file 5, block 2969045)
Reread (file 5, block 2969045) found same corrupt data (no logical check)
Tue Aug 18 10:55:03 2015
Corrupt Block Found
        TSN = 6, TSNAME = OA
        RFN = 5, BLK = 2969045, RDBA = 23940565
        OBJN = 95690, OBJD = 95690, OBJECT = EDOC_BASE_WORKFLOW, SUBOBJECT = 
        SEGMENT OWNER = INSPUROA, SEGMENT TYPE = Table Segment
Tue Aug 18 10:57:29 2015
Hex dump of (file 5, block 2969045) in trace file /u01/app/oracle/diag/rdbms/oadb/oadb/trace/oadb_ora_21708.trc
Corrupt block relative dba: 0x016d4dd5 (file 5, block 2969045)
Bad header found during buffer read
Data in bad block:
 type: 117 format: 0 rdba: 0x20206b73
 last change scn: 0x2020.20202020 seq: 0x20 flg: 0x20
 spare1: 0x64 spare2: 0x69 spare3: 0x0
 consistency value in tail: 0x4d240601
 check value in block header: 0x5f49
 block checksum disabled
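The header values in this dump can be cross-checked against the tail. A Python sketch, assuming the commonly described rule that the 4-byte tail is built from the low 16 bits of the SCN base, the block type, and the sequence number; `expected_tail` is an illustrative helper.

```python
def expected_tail(scn_base, block_type, seq):
    """Tail value implied by the header: low 16 bits of SCN base, then type, then seq."""
    return ((scn_base & 0xFFFF) << 16) | (block_type << 8) | seq


# Earlier case (file 4, block 133): header and tail agree; only the checksum failed.
print(hex(expected_tail(0x0023d69a, 0x06, 0x05)))  # 0xd69a0605

# This dump (file 5, block 2969045): type 117, seq 0x20, SCN base 0x20202020.
print(hex(expected_tail(0x20202020, 117, 0x20)))   # 0x20207520; the tail on disk is 0x4d240601
```

Here the header and tail disagree. The header bytes are mostly 0x20 (an ASCII space), which suggests that the start of the block was overwritten with text-like data, while the tail still ends in type 0x06 and a small sequence number.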

Perform the repair based on the error messages

Method 1

Reading datafile '/oradata/datafiles/oadb/oa01.dbf' for corruption at rdba: 0x016d4dd5 (file 5, block 2969045)
Reread (file 5, block 2969045) found same corrupt data (no logical check)
Corrupt Block Found
        TSN = 6, TSNAME = OA
        RFN = 5, BLK = 2969045, RDBA = 23940565
        OBJN = 95690, OBJD = 95690, OBJECT = EDOC_BASE_WORKFLOW, SUBOBJECT = 
        SEGMENT OWNER = INSPUROA, SEGMENT TYPE = Table Segment

This confirms that datafile 5, oa01.dbf, contains a corrupt block.
1. View the corrupt block information:

SQL> select * from v$database_block_corruption;

    FILE#    BLOCK#    BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        5    2969045          1                  0 CORRUPT

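The corrupt block is block 2969045. Check the backup logs (incremental and full) to confirm that the backups are complete.
2. Validate datafile 5 with RMAN: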

RMAN> backup validate datafile 5;

Starting backup at 18-AUG-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=982 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00005 name=/oradata/datafiles/oadb/oa01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:05:35
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    FAILED 0              1840        4190720        9484751217293
  File Name: /oradata/datafiles/oadb/oa01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data      0              2842014        
  Index      0              182983          
  Other      1              1163883       

validate found one or more corrupt blocks
See trace file /u01/app/oracle/diag/rdbms/oadb/oadb/trace/oadb_ora_13513.trc for details
Finished backup at 18-AUG-15
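One way to read this report: in this output, the per-type block counts plus the empty blocks add up to Blocks Examined. A quick Python check using only the figures printed above (the variable names are illustrative):

```python
processed = {"Data": 2842014, "Index": 182983, "Other": 1163883}
failing = {"Data": 0, "Index": 0, "Other": 1}
empty_blocks = 1840
blocks_examined = 4190720

total = sum(processed.values()) + empty_blocks
print(total == blocks_examined)  # True
print(sum(failing.values()))     # 1
```

The single failing block is counted under Other, which fits a block whose header no longer identifies it as a data or index block.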


3. Repair the block with RMAN:
RMAN> blockrecover datafile 5 block 2969045;
4. Query the corruption view again:
SQL> select * from v$database_block_corruption;

no rows selected

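Method 2: skip the corrupt blocks with DBMS_REPAIR.SKIP_CORRUPT_BLOCKS, then export the remaining data in the dept table and rebuild the table
Set the multiblock read count to 1:
alter session set db_file_multiblock_read_count=1;

Optionally mark the corrupt blocks found by the check: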
select BLOCK_ID, CORRUPT_TYPE, CORRUPT_DESCRIPTION
from REPAIR_TABLE;
 
REM Mark the identified blocks as corrupted ( Soft Corrupt - reference Note 1496934.1 )
DECLARE num_fix INT;
BEGIN
 num_fix := 0;
 DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
 SCHEMA_NAME => '&schema_name',
 OBJECT_NAME=> '&object_name',
 OBJECT_TYPE => dbms_repair.table_object,
 REPAIR_TABLE_NAME => 'REPAIR_TABLE',
 FIX_COUNT=> num_fix);
 DBMS_OUTPUT.PUT_LINE('num fix: ' || to_char(num_fix));
END;
/
Skip the corrupt blocks during subsequent scans and DML on the object:
BEGIN
 DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
 SCHEMA_NAME => '&schema_name',
 OBJECT_NAME => '&object_name',
 OBJECT_TYPE => dbms_repair.table_object,
 FLAGS => dbms_repair.SKIP_FLAG);
END;
/
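Notes:
After DBMS_REPAIR has been used to skip corrupt blocks, index scans may raise errors; if they do, rebuild the affected indexes. With a unique index, re-inserting the same data may fail with ORA-1.
To clear the skip flag after dbms_repair.SKIP_FLAG has been enabled, so that the corrupt blocks are accessed again, run DBMS_REPAIR.SKIP_CORRUPT_BLOCKS with dbms_repair.NOSKIP_FLAG.
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS on its own only skips blocks that raise ORA-1578. For other kinds of corruption, first run ADMIN_TABLES, CHECK_OBJECT, and FIX_CORRUPT_BLOCKS to mark the blocks as corrupt.
After SKIP_CORRUPT_BLOCKS, the corrupt blocks can be cleaned out of the table with "alter table <name> MOVE" instead of re-creating or truncating the table; then remove the skip flag with dbms_repair.NOSKIP_FLAG. Note that the data in the corrupt blocks is lost.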
 
SQL> alter session set db_file_multiblock_read_count=1;
 
SQL> create table dept_new as select * from dept;
