Linux: structure needs cleaning

I recently ran into a similar failure: Bug 624293 - XFS internal error / mount: Structure needs cleaning.

The container engine failed to start, reporting "structure needs cleaning" under /home/robot/docker. The Linux system log showed the same error.

Start with the basic questions: why does "structure needs cleaning" appear, when does it appear, and how do we recover the environment?
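For orientation: "structure needs cleaning" is the text of errno 117 (EUCLEAN), which the kernel returns once XFS has detected corrupt on-disk metadata. A minimal sketch of how to confirm which block device is affected before attempting any repair (the path /home/robot/docker is from this case; everything else is generic):

df /home/robot/docker        # find the backing block device
grep ' xfs ' /proc/mounts    # confirm the filesystem type and mount options
dmesg | grep -i xfs          # look for the XFS internal error / corruption report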

Try to repair:

[root@scheat tmp]# xfs_check /dev/vdb

xfs_check: cannot init perag data (117)

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_check. If you are unable to mount the filesystem, then use the xfs_repair -L option to destroy the log and attempt a repair.

Note that destroying the log may cause  corruption -- please attempt a mount of the  filesystem before doing this.

[root@scheat tmp]# xfs_repair /dev/vdb

Phase 1 - find and verify superblock...

Phase 2 - using internal log

        - zero log...

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed.  Mount the filesystem to replay the log, and unmount it before re-running  xfs_repair.  If you are unable to mount  the filesystem, then use the -L option to  destroy the log and attempt a repair.

Note that destroying the log may cause corruption -- please attempt a mount of the  filesystem before doing this.

[root@scheat tmp]# xfs_metadump -g /dev/vdb ./dev-vdb.dump

xfs_metadump: cannot init perag data (117)

Copying log                                               

[root@scheat tmp]#

Nothing helped.
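For completeness, what the repair tools are asking for before -L is a mount/unmount cycle, so the kernel itself can replay the journal; only if that also fails should the log be zeroed. A hedged sketch of that step (the mount point /mnt is arbitrary; in this case it evidently did not help, which is what forces -L below):

mount /dev/vdb /mnt       # a successful mount replays the log automatically
umount /mnt               # unmount so the repair tools get exclusive access to the device
xfs_repair -n /dev/vdb    # -n = dry run: report problems without modifying anything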

Going forward with the -L repair:

xfs_repair -L /dev/vdb

Lots of errors!
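After a forced -L repair it is worth verifying the result before trusting the filesystem again; a second pass should come back clean (this is exactly what is asked further down). A minimal sketch, with /mnt again as an arbitrary mount point:

xfs_repair -n /dev/vdb                       # second pass, read-only; should report no further errors
mount /dev/vdb /mnt && ls /mnt/lost+found    # orphaned files, if any, get reconnected under lost+found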

Timeline of the problem:

- Everything went fine while I was installing a new virtual fileserver.

- The host has a 3Ware controller in it:

I have a 3Ware 9690SA-8I controller with 4 x 2 TB disks (RAID 10 for data) and 2 x 320 GB disks (for the OS).

Then I rebooted to clean the system and check that everything was OK. At that point one disk disappeared from the RAID 10, most likely because I had not set it to a fixed link speed of 1.5 Gbps. I rebuilt the array, but I could not mount it because of metadata problems!

I also saw this message:

Aug 15 20:30:05 scheat kernel: Filesystem "vdb": Disabling barriers, trial barrier write failed

Did this filesystem problem happen only because of the disappeared disk and the wrong link speed, or do I need to change something else?

Thanks for the help.

The array controller should be taking care of any data integrity problems.

The underlying principles

Q: What is the problem with the write cache on journaled filesystems?

https://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F

Many drives use a write back cache in order to speed up the performance of writes. However, there are conditions such as power failure when the write cache memory is never flushed to the actual disk. Further, the drive can destage data from the write cache to the platters in any order that it chooses. This causes problems for XFS and journaled filesystems in general because they rely on knowing when a write has completed to the disk. They need to know that the log information has made it to disk before allowing metadata to go to disk. When the metadata makes it to disk, then the transaction can effectively be deleted from the log, resulting in movement of the tail of the log and thus freeing up some log space. So if the writes never make it to the physical disk, then the ordering is violated and the log and metadata can be lost, resulting in filesystem corruption.
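A quick way to see whether barriers are actually in effect on a given XFS mount, as a hedged sketch for kernels of that era (the device name is from this case):

dmesg | grep -i barrier    # XFS logs "Disabling barriers ..." when a trial barrier write fails
grep vdb /proc/mounts      # "nobarrier" in the mount options means write ordering is not enforced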

With hard disk cache sizes of currently (Jan 2009) up to 32MB, that can be a lot of valuable information. In a RAID with 8 such disks this adds up to 256MB, and the chance of having filesystem metadata in the cache is so high that you have a very high chance of big data losses on a power outage. In one sentence: the larger the disk cache, the greater the chance of losing data.

With a single hard disk and barriers turned on (on=default), the drive write cache is flushed before and after a barrier is issued. A powerfail "only" loses data in the cache but no essential ordering is violated, and corruption will not occur.

With a RAID controller with battery backed controller cache and cache in write back mode, you should turn off barriers - they are unnecessary in this case, and if the controller honors the cache flushes, it will be harmful to performance. But then you *must* disable the individual hard disk write cache in order to ensure to keep the filesystem intact after a power failure. The method for doing this is different for each RAID controller. See the section about RAID controllers below.
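For reference, "turn off barriers" on kernels of that era simply meant the nobarrier mount option (it has since been removed from current kernels). A sketch, with /data as a hypothetical mount point:

mount -o nobarrier /dev/vdb /data    # one-off mount without barrier writes
# or persistently in /etc/fstab:
# /dev/vdb  /data  xfs  defaults,nobarrier  0 0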

The problem is now clear

That's clear; I already mentioned that the controller may have triggered the problem.

But last night I got another XFS internal error during an rsync job:

----Once again, that is not directory block data that is being dumped there. It looks like a partial path name ("/Pm.Reduzieren/S"), which tends to indicate that the directory read has returned uninitialised data.

Did the filesystem repair cleanly? If you run xfs_repair a second time, did it find more errors or was it clean? I.e. is this still corruption left over from the original incident, or is it new corruption?

----The filesystem repair did work fine, all was OK. The second one was a new problem.

LSI / 3Ware are now replacing the controller, the BBU board, and also the battery, because they don't know what happened.


There were no problems on the host.

I have now disabled the write cache according to the FAQ: /cX/uX set cache=off

But I am not sure how to disable the individual hard disk cache.
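For directly attached SATA/SAS disks the drive's own write cache can usually be toggled with hdparm; disks behind a hardware RAID controller like the 3Ware normally have to be configured through the vendor's CLI instead, since the kernel does not see the physical drives. A hedged sketch for the directly attached case (/dev/sda is illustrative):

hdparm -W /dev/sda      # query the current write-cache setting
hdparm -W 0 /dev/sda    # disable the drive's volatile write cache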

The final reveal

File system errors can be a little tricky to narrow down. In some of the more rare cases a drive might be writing out bad data. However, per the logs I didn't see any indication of a drive problem, and not one has reallocated a sector. I see that all four are running at the 1.5Gb/s link speed now.

Sometimes the problem can be traced back to the controller and/or the BBU. I did notice something pretty interesting in the driver message log and the controller's advanced diagnostic.

According to the driver message log, the last Health Check [capacity test] was done on Aug 10th:

Aug 10 21:40:35 enif kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x0051): Battery health check started:.

However, the controller's advanced log shows this:

/c6/bbu Last Capacity Test        = 10-Jul-2010

There is an issue between the controller and the BBU, and we need to understand which component is at fault. If this is a live server you may want to replace both components. Or, if you can perform some troubleshooting, power the system down and remove the BBU and its daughter PCB from the RAID controller. Then ensure the write cache setting remains enabled and see if there is a reoccurrence. If so, the controller is bad. If not, it's the BBU that we need to replace.
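The same information can be pulled from the 3Ware CLI directly, which makes the controller/BBU mismatch above easier to spot. A sketch, assuming the controller is /c6 as in the log excerpts (check tw_cli's built-in help for the exact syntax on your firmware version):

tw_cli /c6 show            # summary of units, ports and BBU status
tw_cli /c6/bbu show all    # details, including battery status and "Last Capacity Test"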

Just for information: the problem was a bug in the virtio driver with disks over 2 TB!

Bug 605757 - 2tb virtio disk gets massively corrupted filesystems

*** This bug has been marked as a duplicate of bug 605757 ***
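If you suspect the same class of bug, the first thing to check is simply whether the virtio disk crosses the 2 TB mark; a minimal sketch:

blockdev --getsize64 /dev/vdb     # disk size in bytes, straight from the block layer
lsblk -b -o NAME,SIZE /dev/vdb    # same, per device (lsblk may not exist on very old systems)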
