hung_task_timeout_secs

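The problem is simple, but I had not run into it before, probably because I rarely install databases on Red Hat. Recording it here:
A customer has a server virtualized with VMware software, with a Red Hat virtual machine built on top of it. The VM was initially given 16 GB of memory; after 12 GB of memory was added, the VM's memory was raised to 20 GB.
After the change, the host kept reporting errors: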
Nov 5 13:05:41 RedHat5 kernel: INFO: task oracle:22439 blocked for more than 120 seconds.
Nov 5 13:05:41 RedHat5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
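After looking up some references, my understanding of this parameter is that the kernel reports a task that has stayed blocked for longer than this timeout.
The error message itself offers a simple workaround, which is to disable the 120-second check: echo 0 > /proc/sys/kernel/hung_task_timeout_secs
I then asked the host engineer, whose suggestion was to disable the warning as the message describes.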
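Before disabling the check it can help to see what the timeout is currently set to. The following is a minimal sketch, not the engineer's procedure: the helper names `show_hung_task_timeout` and `set_hung_task_timeout` and the `HUNG_TASK_FILE` override are hypothetical, introduced only so the commands can be tried against a scratch file first.

```shell
# Path to the tunable named in the kernel message.
# HUNG_TASK_FILE is a hypothetical override so the helpers can be dry-run on a scratch file.
HUNG_TASK_FILE="${HUNG_TASK_FILE:-/proc/sys/kernel/hung_task_timeout_secs}"

# Print the current timeout in seconds (0 means the check is disabled).
show_hung_task_timeout() {
    cat "$HUNG_TASK_FILE"
}

# Write a new timeout; needs root when pointed at the real /proc file.
set_hung_task_timeout() {
    echo "$1" > "$HUNG_TASK_FILE"
}

# Example, as root:
#   show_hung_task_timeout      # current value
#   set_hung_task_timeout 0     # same effect as the echo in the kernel message
```

A value written under /proc lasts only until the next reboot. Note also that setting it to 0 only silences the message; it does not make the blocked I/O finish any faster.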

After further inquiry, the following explanation was given:
This is a known bug. By default Linux uses up to 40% of the available memory for file system caching.
After this mark has been reached, the file system flushes all outstanding data to disk, causing all subsequent I/O to become synchronous.
By default there is a time limit of 120 seconds for flushing this data to disk.
In the case here the I/O subsystem is not fast enough to flush the data within 120 seconds.
This especially happens on systems with a lot of memory.
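To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes the whole 20 GB of this VM counts as "available memory" and that all dirty data must reach disk within the 120-second window, both of which are simplifications; the function name `required_flush_rate` is hypothetical.

```shell
# required_flush_rate MEM_MB RATIO_PCT TIMEOUT_S
# Prints the sustained write rate in MB/s (rounded down) needed to flush
# RATIO_PCT percent of MEM_MB within TIMEOUT_S seconds.
required_flush_rate() {
    dirty_mb=$(( $1 * $2 / 100 ))   # size of the cache that has to be flushed
    echo $(( dirty_mb / $3 ))
}

required_flush_rate 20480 40 120   # 40% of 20 GB is 8192 MB
required_flush_rate 20480 10 120   # 10% of 20 GB is 2048 MB
```

Under these assumptions, the more memory the machine has, the more data can pile up before the flush starts, which is why the problem appeared only after the memory was increased.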

The problem is solved in later kernels and there is no "fix" from Oracle.
I fixed this by lowering the mark for flushing the cache from 40% to 10% by setting "vm.dirty_ratio=10" in /etc/sysctl.conf.
This setting does not influence overall database performance since you hopefully use Direct IO and bypass the file system cache completely.
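One possible way to apply that setting without editing the file by hand is sketched below. The helper `set_sysctl_conf` is hypothetical and not part of any standard tool; it replaces an existing assignment of the key or appends a new one, and takes an optional file argument so it can be tried on a copy first.

```shell
# set_sysctl_conf KEY VALUE [FILE]
# Replace an existing "KEY = ..." line in FILE, or append one if absent.
# FILE defaults to /etc/sysctl.conf (root required for the real file).
set_sysctl_conf() {
    key="$1"; value="$2"; file="${3:-/etc/sysctl.conf}"
    tmp="$(mktemp)"
    # Keep every line except a previous assignment of this key.
    # (Dots in KEY act as regex wildcards here, which is harmless for sysctl names.)
    grep -Ev "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example, as root:
#   set_sysctl_conf vm.dirty_ratio 10
#   sysctl -p                  # reload /etc/sysctl.conf
#   sysctl vm.dirty_ratio      # confirm the running value
```

Running `sysctl -p` afterwards loads the file into the running kernel, so no reboot is needed for the new ratio to take effect.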
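In short: Linux sets aside up to 40% of available memory for the system cache. When that cached data is flushed, the I/O becomes synchronous and cannot finish within the 120-second timeout, so the threshold is lowered from 40% to 10% to avoid the timeout.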

