A strange problem in a CPU/memory monitoring script

At first, whenever nagios ran the monitoring script, the following error would show up intermittently:

[: too many arguments

The script was originally written like this:

guard_info=`ps aux | grep guard | grep -v grep | grep guard_replica_id | awk '{print $2}'`
echo guard_info:$guard_info
if [ "$guard_info" = "" ]; then
  echo "guard is not found"
  exit 0
fi
vsz=`echo $guard_info | awk '{print $3}'`
echo vsz:$vsz
cpu=`echo $guard_info | awk '{print $4}' | awk -F"." '{print $1}'`
echo cpu:$cpu


if [ $vsz -gt 606192 ]; then
  echo "vsz as large as $vsz"
  exit 2
elif [ $cpu -gt 5 ]; then
  echo "cpu percentage is $cpu"
  exit 2
else
  exit 0
fi

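It finally turned out that guard itself forks child processes. Between the return of fork() and the call to execve there is a short time window in which the child is still a copy of its parent, so two processes with the same name and the same arguments exist at once. If the script runs inside that window, it picks up both of them.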
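The error message itself can be reproduced without guard at all. Below is a minimal sketch, assuming that during the window the variable ended up holding two values instead of one. The sample numbers are made up for illustration, and the exact wording of the message is what bash prints (other shells word it differently).

```shell
#!/bin/bash
# Hypothetical values: pretend one process was captured, then two at once.
one="1234"
two="1234 1235"

# With a single word, the unquoted test receives exactly three arguments.
[ $one -gt 100 ] && echo "single value: ok"

# With two words, the unquoted expansion hands [ four arguments, which
# bash rejects with "[: too many arguments" (exit status 2).
[ $two -gt 100 ] 2>/dev/null
status_two=$?
echo "two values: test exited with status $status_two"
```

Since the window is tiny, most runs only ever see the first case, which matches the error showing up only now and then.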

The fix is to sort the matching guard processes by elapsed time and take the one that has been running the longest. The script ended up like this:

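# -C guard selects by command name; the trailing "=" on each column suppresses its header.
# --sort -etime is meant to list the longest-running process first, and head -1 keeps only that one.
guard_info=`ps -C guard -o pid=,etime=,vsz=,%cpu= --sort -etime | head -1`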
echo guard_info:$guard_info
if [ "$guard_info" = "" ]; then
  echo "guard is not found"
  exit 0
fi
vsz=`echo $guard_info | awk '{print $3}'`
echo vsz:$vsz
cpu=`echo $guard_info | awk '{print $4}' | awk -F"." '{print $1}'`
echo cpu:$cpu


if [ $vsz -gt 606192 ]; then
  echo "vsz as large as $vsz"
  exit 2
elif [ $cpu -gt 5 ]; then
  echo "cpu percentage is $cpu"
  exit 2
else
  exit 0
fi
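As a further hardening step, which is not part of the original fix, the fields could be split with read and validated before they reach the numeric tests. That way an unexpected line produces a clean UNKNOWN result instead of a shell error. This is only a sketch under assumptions: the function name check_guard_line is hypothetical, the thresholds are copied from the script above, and status 3 follows the usual Nagios plugin convention for UNKNOWN.

```shell
#!/bin/bash
# check_guard_line is a hypothetical helper: it takes one line in the
# "pid etime vsz %cpu" layout and returns a Nagios-style status
# (0 OK, 2 CRITICAL, 3 UNKNOWN).
check_guard_line() {
  local pid etime vsz cpu extra
  # read splits on whitespace; anything beyond four fields lands in "extra".
  read -r pid etime vsz cpu extra <<EOF
$1
EOF

  cpu=${cpu%%.*}   # drop the fractional part, e.g. 3.7 -> 3

  # Refuse anything that is not a plain non-negative integer.
  case "$vsz" in ''|*[!0-9]*) echo "unexpected vsz: '$vsz'"; return 3 ;; esac
  case "$cpu" in ''|*[!0-9]*) echo "unexpected cpu: '$cpu'"; return 3 ;; esac
  if [ -n "$extra" ]; then
    echo "unexpected extra fields: '$extra'"
    return 3
  fi

  if [ "$vsz" -gt 606192 ]; then
    echo "vsz as large as $vsz"
    return 2
  elif [ "$cpu" -gt 5 ]; then
    echo "cpu percentage is $cpu"
    return 2
  fi
  echo "guard ok: vsz=$vsz cpu=$cpu"
  return 0
}

# Sample lines are made up for illustration.
check_guard_line "4321 02:11:05 120000 0.3"; echo "status=$?"
check_guard_line "4321 02:11:05 700000 0.3"; echo "status=$?"
check_guard_line "4321 02:11:05 120000 0.3 4322 00:00 120000 0.0"; echo "status=$?"
```

In the real script the argument would be "$guard_info", quoted so that it arrives as a single string, and the script would exit with the returned status.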
