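It is usually hard to know the current state of a server's hard disks, and the job typically falls to data-center staff doing physical inspections. Can the same check be done in software? MegaCli can do it: the "Media Error Count" and "Other Error Count" values it reports are generally used to judge whether a disk in the array has a problem.
Media Error Count indicates possible disk errors, for example bad sectors. Any nonzero value deserves attention, and the larger the value, the higher the risk. Other Error Count indicates that the disk may be loose and may need to be reseated.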
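As a rough sketch of how these two counters can be read, the function below scans text shaped like `MegaCli64 -PDList` output and prints the slots whose counters are nonzero. The name `flag_bad_disks` and the sample text are made up for illustration; only the `Slot Number`, `Media Error Count`, and `Other Error Count` field names come from the text above.

```shell
#!/bin/bash
# flag_bad_disks: hypothetical helper. Reads PDList-style text on stdin and
# prints one line per slot whose Media or Other Error Count is nonzero.
flag_bad_disks() {
    awk -F': *' '
        /^Slot Number/       { slot = $2 + 0 }
        /^Media Error Count/ { if ($2 + 0 > 0) printf "slot %d media_errors=%d\n", slot, $2 }
        /^Other Error Count/ { if ($2 + 0 > 0) printf "slot %d other_errors=%d\n", slot, $2 }
    '
}

# Illustrative sample, not captured from a real controller.
sample_pdlist() {
    cat <<'EOF'
Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Slot Number: 1
Media Error Count: 12
Other Error Count: 0
Slot Number: 2
Media Error Count: 0
Other Error Count: 3
EOF
}

sample_pdlist | flag_bad_disks
```

On a real host the input would presumably come from `sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog` instead of `sample_pdlist`.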
For detailed documentation on MegaCLI, see http://www.ttlsa.com/tools/megacli-tool-query-raid-status/
Discovery script
#!/bin/bash
### raid_id_discover.sh
### wuhf
# Zabbix low-level discovery: print every MegaCli slot number as JSON.
RAID_stats() {
    local num=0
    # Slot numbers of all physical disks on all adapters.
    DISK=($(sudo /usr/local/MegaCli/MegaCli64 -pdlist -aALL | grep "Slot Number" | awk -F":" '{print $2}'))
    printf '{\n\t"data":[\n'
    for key in "${DISK[@]}"; do
        if [[ "$num" -lt "$((${#DISK[@]} - 1))" ]]; then
            # Not the last entry: end the line with a comma.
            printf '\t\t{"{#RAID_ID}":"%s"},\n' "$key"
            num=$((num + 1))
        else
            # Last entry: no trailing comma, so the JSON stays valid.
            printf '\t\t{"{#RAID_ID}":"%s"}\n' "$key"
        fi
    done
    printf '\t]\n}\n'
}
RAID_stats
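The JSON that Zabbix low-level discovery expects from this script holds one `{#RAID_ID}` object per disk. A more compact way to produce the same shape is to let awk place the commas. This is only a sketch: `slots_to_lld` is a hypothetical function that takes slot numbers on stdin rather than calling MegaCli itself.

```shell
#!/bin/bash
# slots_to_lld: hypothetical helper. Reads one slot number per line on stdin
# and prints a Zabbix low-level discovery document keyed by {#RAID_ID}.
slots_to_lld() {
    awk '
        BEGIN { printf "{\"data\":[" }
        # Skip blank lines; prefix every entry except the first with a comma.
        NF    { printf "%s{\"{#RAID_ID}\":\"%s\"}", (n++ ? "," : ""), $1 }
        END   { printf "]}\n" }
    '
}

printf '0\n1\n2\n' | slots_to_lld
```

In the discovery script itself, the input would be the output of the `grep "Slot Number" | awk -F":" '{print $2}'` pipeline.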
Key configuration
#raid.conf
UserParameter=raid_discover,bash /usr/local/zabbix/libexec/raid_id_discover.sh
UserParameter=raid_degraded,sudo /usr/local/MegaCli/MegaCli64 -AdpAllInfo -aALL -NoLog | grep "Degraded" | awk '{print $NF}'
UserParameter=raid_failed_disks,sudo /usr/local/MegaCli/MegaCli64 -AdpAllInfo -aALL -NoLog | grep "Failed Disks" | awk '{print $NF}'
# grep -w keeps slot 1 from also matching slots 10 to 19.
UserParameter=raid_MEC[*],sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -w -A 8 "Slot Number: $1" | grep "Media Error Count" | awk '{print $NF}'
UserParameter=raid_OEC[*],sudo /usr/local/MegaCli/MegaCli64 -PDList -aAll -NoLog | grep -w -A 8 "Slot Number: $1" | grep "Other Error Count" | awk '{print $NF}'
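The two `[*]` keys pick out one disk's block by slot number. A substring pattern such as `Slot Number: 1` also matches slots 10 to 19, so an exact comparison is safer on larger arrays. The sketch below shows the idea on made-up PDList-style text; `pd_counter` and `sample` are illustrative names, not part of MegaCli or Zabbix.

```shell
#!/bin/bash
# pd_counter SLOT FIELD: hypothetical helper. Reads PDList-style text on stdin
# and prints the value of FIELD for the first disk whose slot equals SLOT.
pd_counter() {
    awk -F': *' -v slot="$1" -v field="$2" '
        BEGIN { current = -1 }
        $1 == "Slot Number" { current = $2 + 0 }
        # Numeric comparison, so slot 1 never matches slot 10.
        !found && $1 == field && current == slot + 0 { print $2 + 0; found = 1 }
    '
}

# Illustrative sample, not captured from a real controller.
sample() {
    cat <<'EOF'
Slot Number: 1
Media Error Count: 0
Other Error Count: 2
Slot Number: 10
Media Error Count: 7
Other Error Count: 0
EOF
}

sample | pd_counter 1  "Media Error Count"
sample | pd_counter 10 "Media Error Count"
```

With the sample above, the two calls print 0 and 7 respectively, one value per key as Zabbix expects.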
Permissions
chmod 755 /usr/local/zabbix/libexec/raid_id_discover.sh
chown zabbix.zabbix /usr/local/zabbix/libexec/raid_id_discover.sh
chown zabbix.zabbix /usr/local/zabbix/etc/zabbix_agentd.conf.d/raid.conf
echo "zabbix ALL=(root) NOPASSWD:ALL" >> /etc/sudoers
sed -i 's/^Defaults.*.requiretty/#Defaults requiretty/' /etc/sudoers
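The sudoers line above grants the zabbix user passwordless sudo for every command. A narrower rule limited to the MegaCli binary is a common alternative. The sketch below only prints such a rule so it can be reviewed first; `sudoers_rule` is a hypothetical helper, not something the original setup uses.

```shell
#!/bin/bash
# sudoers_rule USER COMMAND: hypothetical helper that prints a sudoers entry
# letting USER run COMMAND (with any arguments) as root without a password.
sudoers_rule() {
    printf '%s ALL=(root) NOPASSWD: %s\n' "$1" "$2"
}

sudoers_rule zabbix /usr/local/MegaCli/MegaCli64
```

One way to use it, assuming the system reads drop-in files from /etc/sudoers.d, would be to write the output to a file there and check it with `visudo -cf <file>` before relying on it.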
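Template import
Notes:
To understand the template, you first need to understand the MegaCLI commands in detail; plenty of tutorials can be found through Baidu.
The template I provide was run on Zabbix 3.0 and may not be compatible with earlier versions. Once you understand what each key means, you can build your own custom template.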