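Monitoring WebLogic Services with Nagios
1. Introduction
This article describes how to use Nagios to monitor the running state of WebLogic services, specifically WebLogic Server and WebLogic JDBC connection pools. The stock Nagios plugins do not provide WebLogic monitoring, so we write our own script against the Nagios Plugin API to extend the plugin set with the checks we need. WebLogic runtime status is obtained through JMX.
The article draws on the Nagios Plugin section of the official Nagios 3 documentation and on the JMX and command-line sections of the official WebLogic documentation. The WebLogic version used is 8.14.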
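2. Nagios Plugin API Overview
A Nagios plugin, whether written as a script (shell, Perl, and so on) or as a compiled C executable, must do at least two things:
1) Exit with a return value.
2) Print at least one line of text to standard output (STDOUT).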
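Return value definitions:
Plugin Return Code | Service State | Host State
0 | OK | UP
1 | WARNING | UP or DOWN/UNREACHABLE*
2 | CRITICAL | DOWN/UNREACHABLE
3 | UNKNOWN | DOWN/UNREACHABLE
* With the use_aggressive_host_checking option enabled, a return code of 1 results in a host state of DOWN or UNREACHABLE; otherwise it results in UP.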
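The output must contain at least one line of text, and it should mainly describe the state of the monitored application or service.
For example: DISK OK - free space: / 3326 MB (56%);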
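The two requirements above can be sketched as a small shell function. This is only an illustration of the return-code and output convention: check_value and its threshold arguments are hypothetical names, not part of Nagios or of the script presented later.

```shell
#!/bin/sh
# Return codes defined by the Nagios Plugin API.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_value VALUE WARN CRIT  (hypothetical helper, for illustration only)
# Prints one status line and returns the matching plugin return code.
check_value() {
    if [ $# -ne 3 ]; then
        echo "UNKNOWN - usage: check_value value warn crit"
        return $STATE_UNKNOWN
    fi
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL - value is $1 (critical threshold $3)"
        return $STATE_CRITICAL
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING - value is $1 (warning threshold $2)"
        return $STATE_WARNING
    fi
    echo "OK - value is $1"
    return $STATE_OK
}

# A real plugin would finish with:  check_value "$@"; exit $?
check_value 5 10 20
```

Nagios derives the service state from the return code; the text line is what appears in the status display.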
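3. How WebLogic Is Monitored
We obtain WebLogic's runtime status from the command line by calling WebLogic's weblogic.Admin class. This class is powerful: it can also be used to manage and configure WebLogic.
The following are a few commonly used commands.
1) Get the server's running state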
$ java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} get -pretty \
-mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${SERVER_NAME},Type=ServerRuntime"
2) Get the JDBC pool's running state
$ java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} GET -pretty \
-mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${POOL_NAME},ServerRuntime=${SERVER_NAME},Type=JDBCConnectionPoolRuntime"
Replace the ${...} variables in these commands with the actual values for your environment.
Variable | Description
${URL} | WebLogic URL, for example t3://192.168.1.2:7002
${USER_NAME} | User name
${PASS_WORD} | Password
${DOMAIN_NAME} | Name of the WebLogic domain, for example mydomain
${SERVER_NAME} | Server name
${POOL_NAME} | JDBC pool name
Before running these commands, set JAVA_HOME, add $JAVA_HOME/bin to PATH, and add WebLogic's weblogic81/server/lib/weblogic.jar to CLASSPATH.
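The two MBean object names used in the commands above follow a fixed pattern, so they can be built from the domain, server, and pool names. The sketch below shows that pattern only; server_mbean and pool_mbean are hypothetical helper names, and the java call is left as a comment because it needs a reachable WebLogic instance.

```shell
#!/bin/sh
# server_mbean DOMAIN SERVER  (hypothetical helper)
# Builds the ServerRuntime object name used by the first command.
server_mbean() {
    printf '%s:Location=%s,Name=%s,Type=ServerRuntime\n' "$1" "$2" "$2"
}

# pool_mbean DOMAIN SERVER POOL  (hypothetical helper)
# Builds the JDBCConnectionPoolRuntime object name used by the second command.
pool_mbean() {
    printf '%s:Location=%s,Name=%s,ServerRuntime=%s,Type=JDBCConnectionPoolRuntime\n' \
        "$1" "$2" "$3" "$2"
}

server_mbean mydomain myserver
pool_mbean mydomain myserver mypool

# With weblogic.jar on CLASSPATH, the names would be passed on like this (not run here):
# java weblogic.Admin -url "$URL" -username "$USER_NAME" -password "$PASS_WORD" \
#     GET -pretty -mbean "$(server_mbean mydomain myserver)"
```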
4. The Shell Script
With the monitoring method in hand, we write our own shell script following the Nagios Plugin API rules. The full script is as follows:
check_wls.sh
#!/bin/ksh
#check_wls.sh --jdbcpool url username password domainname servername poolname
#check_wls.sh --server url username password domainname servername
PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1749 $' | sed -e 's/[^0-9.]//g'`
. $PROGPATH/utils.sh
print_usage() {
    echo "Usage:"
    echo " $PROGNAME --jdbcpool url username password domainname servername poolname"
    echo " $PROGNAME --server url username password domainname servername"
    echo " $PROGNAME --help"
    echo " $PROGNAME --version"
}
print_help() {
    print_revision $PROGNAME $REVISION
    echo ""
    print_usage
    echo ""
    echo "Check Weblogic status"
    echo ""
    echo "--jdbcpool url username password domainname servername poolname"
    echo " Check Weblogic JDBC Pool"
    echo "--server url username password domainname servername"
    echo " Check Weblogic Server"
}
# JAVA_HOME and CLASSPATH must be set by the caller (for example, the nrpe startup script).
if [[ -z "$JAVA_HOME" ]]
then
    echo "Please set JAVA_HOME!"
    exit $STATE_UNKNOWN
fi
if [[ -z "$CLASSPATH" ]]
then
    echo "Please set CLASSPATH!"
    exit $STATE_UNKNOWN
else
    # Check that CLASSPATH mentions weblogic.jar.
    N=$(echo "$CLASSPATH" | grep -c "weblogic.jar")
    if [[ "$N" = "0" ]]
    then
        echo "Please add weblogic.jar to CLASSPATH!"
        exit $STATE_UNKNOWN
    fi
fi
PATH=$JAVA_HOME/bin:$PATH
export PATH
JDBC_TYPE="JDBCConnectionPoolRuntime"
SERVER_TYPE="ServerRuntime"
cmd="$1"
# Information options
case "$cmd" in
    --help | -h)
        print_help
        exit $STATE_OK
        ;;
    --version | -V)
        print_revision $PROGNAME $REVISION
        exit $STATE_OK
        ;;
esac
case "$cmd" in
    --server)
        URL=${2}
        USER_NAME=${3}
        PASS_WORD=${4}
        DOMAIN_NAME=${5}
        SERVER_NAME=${6}
        SERVER_INFO="${DOMAIN_NAME}:${SERVER_NAME}"
        RE=`java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} get -pretty \
            -mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${SERVER_NAME},Type=${SERVER_TYPE}"`
        # The script treats output with at least one line starting with "-" as a successful reply.
        N=$(printf '%s\n' "${RE}" | grep -c '^-')
        if [[ "$N" -lt "1" ]]
        then
            # error: flatten the output onto one line for Nagios
            ERR_INFO=$(printf '%s\n' "${RE}" | awk '{ printf "%s ", $0 }')
            echo "CRITICAL - ${ERR_INFO}"
            exit $STATE_CRITICAL
        fi
        if [[ "$N" -ge "1" ]]
        then
            HEALTH_STATE=""
            RUN_STATE=""
            #HealthState State
            # ksh runs the last stage of a pipeline in the current shell, so the
            # variables set inside this loop are still visible after it.
            printf '%s\n' "${RE}" | while read NAME VALUE
            do
                #echo "NAME:${NAME} VALUE:${VALUE}"
                case "${NAME}" in
                    HealthState:)
                        HEALTH_STATE=${VALUE}
                        ;;
                    State:)
                        RUN_STATE=${VALUE}
                        ;;
                esac
            done
            #echo "HEALTH_STATE:${HEALTH_STATE}"
            #echo "RUN_STATE:${RUN_STATE}"
            HEALTH_STATE_INFO=${HEALTH_STATE}
            # Keep only the value of the first "key:value" pair in HealthState.
            HEALTH_STATE=$(echo "${HEALTH_STATE_INFO}" | awk -F, '{ print $1 }' | awk -F: '{ print $2 }')
            #echo "HEALTH_STATE:${HEALTH_STATE}"
            #HEALTH_OK HEALTH_WARN HEALTH_CRITICAL HEALTH_FAILED
            if [[ "${RUN_STATE}" != "RUNNING" ]]
            then
                echo "CRITICAL - ${SERVER_INFO} State is ${RUN_STATE}"
                exit $STATE_CRITICAL
            fi
            case "${HEALTH_STATE}" in
                HEALTH_OK)
                    ;;
                HEALTH_WARN)
                    echo "WARN - ${SERVER_INFO} HealthState is ${HEALTH_STATE_INFO}"
                    exit $STATE_WARNING
                    ;;
                HEALTH_CRITICAL)
                    echo "CRITICAL - ${SERVER_INFO} HealthState is ${HEALTH_STATE_INFO}"
                    exit $STATE_CRITICAL
                    ;;
                HEALTH_FAILED)
                    echo "FAILED - ${SERVER_INFO} HealthState is ${HEALTH_STATE_INFO}"
                    exit $STATE_CRITICAL
                    ;;
            esac
        fi
        echo "OK - ${SERVER_INFO} State is ${RUN_STATE},HealthState is ${HEALTH_STATE_INFO}"
        exit $STATE_OK
        ;;
    --jdbcpool)
        URL=${2}
        USER_NAME=${3}
        PASS_WORD=${4}
        DOMAIN_NAME=${5}
        SERVER_NAME=${6}
        POOL_NAME=${7}
        POOL_INFO="${DOMAIN_NAME}:${SERVER_NAME}:${POOL_NAME}"
        RE=`java weblogic.Admin -url ${URL} -username ${USER_NAME} -password ${PASS_WORD} GET -pretty \
            -mbean "${DOMAIN_NAME}:Location=${SERVER_NAME},Name=${POOL_NAME},ServerRuntime=${SERVER_NAME},Type=${JDBC_TYPE}"`
        # The script treats output with at least one line starting with "-" as a successful reply.
        N=$(printf '%s\n' "${RE}" | grep -c '^-')
        if [[ "$N" -lt "1" ]]
        then
            # error: flatten the output onto one line for Nagios
            ERR_INFO=$(printf '%s\n' "${RE}" | awk '{ printf "%s ", $0 }')
            echo "CRITICAL - ${ERR_INFO}"
            exit $STATE_CRITICAL
        fi
        if [[ "$N" -ge "1" ]]
        then
            POOL_STATE=""
            WAIT_CNT=""
            RUN_STATE=""
            # As in the --server branch, this loop relies on ksh running the last
            # pipeline stage in the current shell.
            printf '%s\n' "${RE}" | while read NAME VALUE
            do
                #PoolState WaitingForConnectionCurrentCount State
                #echo "NAME:${NAME} VALUE:${VALUE}"
                case "${NAME}" in
                    PoolState:)
                        POOL_STATE=${VALUE}
                        ;;
                    WaitingForConnectionCurrentCount:)
                        WAIT_CNT=${VALUE}
                        ;;
                    State:)
                        RUN_STATE=${VALUE}
                        ;;
                esac
            done
            #echo "POOL_STATE:${POOL_STATE}"
            #echo "WAIT_CNT:${WAIT_CNT}"
            #echo "RUN_STATE:${RUN_STATE}"
            if [[ "${POOL_STATE}" != "true" ]]
            then
                echo "CRITICAL - ${POOL_INFO} PoolState is ${POOL_STATE}"
                exit $STATE_CRITICAL
            fi
            if [[ "${RUN_STATE}" != "Running" ]]
            then
                echo "CRITICAL - ${POOL_INFO} State is ${RUN_STATE}"
                exit $STATE_CRITICAL
            fi
            if [[ "${WAIT_CNT}" -gt "0" ]]
            then
                echo "WARNING - ${POOL_INFO} WaitingForConnectionCurrentCount is ${WAIT_CNT}"
                exit $STATE_WARNING
            fi
        else
            #error
            ERR_INFO=$(printf '%s\n' "${RE}" | awk '{ printf "%s ", $0 }')
            echo "CRITICAL - ${ERR_INFO}"
            exit $STATE_CRITICAL
        fi
        echo "OK - ${POOL_INFO} State is ${RUN_STATE},PoolState is ${POOL_STATE},WaitingForConnectionCurrentCount is ${WAIT_CNT}"
        exit $STATE_OK
        ;;
    *)
        print_usage
        exit $STATE_UNKNOWN
        ;;
esac
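The parsing step of the script can be tried without a WebLogic instance. The sketch below feeds a hand-written sample through the same "name value" reading logic. The sample text is an assumption inferred from how the script parses the reply (a line of dashes, then "Name: value" lines); it is not captured weblogic.Admin output, and get_attr is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sample shaped like the reply the script expects (not real output).
SAMPLE='---------------------------
MBeanName: "mydomain:Location=myserver,Name=myserver,Type=ServerRuntime"
    State: RUNNING
    HealthState: State:HEALTH_OK,ReasonCode:[]'

# get_attr NAME  (hypothetical helper)
# Reads "Name: value" lines on stdin and prints the value of the first match.
get_attr() {
    while read -r name value; do
        if [ "$name" = "$1:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

state=$(printf '%s\n' "$SAMPLE" | get_attr State)
# Same reduction as the script: first comma-separated field, then the part after the colon.
health=$(printf '%s\n' "$SAMPLE" | get_attr HealthState | awk -F, '{ print $1 }' | awk -F: '{ print $2 }')
echo "State=$state Health=$health"
```

Because get_attr is called inside a command substitution, this variant does not depend on ksh keeping loop variables alive after a pipeline, which the script above does.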
5. Configuring WebLogic Monitoring
Upload check_wls.sh to the libexec directory of the Nagios installation and create a symbolic link to it named check_wls.
$ ln -s ./check_wls.sh ./check_wls
Add the corresponding command definitions to the nrpe configuration file.
The WebLogic settings used in this example are:
Variable | Value
${URL} | t3://172.2.10.2:7001
${USER_NAME} | weblogic
${PASS_WORD} | weblogic
${DOMAIN_NAME} | mydomain
${SERVER_NAME} | myserver
${POOL_NAME} | mypool
Edit nrpe.cfg and add the following:
$ vi ./nrpe.cfg
... .... ... .... ... .... ... .... ... .... ... ....
#check weblogic [check_wls]
command[check_wls_server_myserver]=/usr/local/nagios/libexec/check_wls --server t3://172.2.10.2:7001 weblogic weblogic mydomain myserver
command[check_wls_jdbcpool_mypool]=/usr/local/nagios/libexec/check_wls --jdbcpool t3://172.2.10.2:7001 weblogic weblogic mydomain myserver mypool
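When several servers or pools are monitored, the nrpe command lines all follow the same pattern as the two above. As a sketch, they could be generated rather than typed by hand. The function names and the WLS_USER and WLS_PASS variables are hypothetical; the plugin path, URL, and credentials are the example values from this section.

```shell
#!/bin/sh
PLUGIN=/usr/local/nagios/libexec/check_wls
URL=t3://172.2.10.2:7001
WLS_USER=weblogic   # hypothetical variable names for the example credentials
WLS_PASS=weblogic

# nrpe_server_cmd DOMAIN SERVER  (hypothetical helper)
nrpe_server_cmd() {
    printf 'command[check_wls_server_%s]=%s --server %s %s %s %s %s\n' \
        "$2" "$PLUGIN" "$URL" "$WLS_USER" "$WLS_PASS" "$1" "$2"
}

# nrpe_pool_cmd DOMAIN SERVER POOL  (hypothetical helper)
nrpe_pool_cmd() {
    printf 'command[check_wls_jdbcpool_%s]=%s --jdbcpool %s %s %s %s %s %s\n' \
        "$3" "$PLUGIN" "$URL" "$WLS_USER" "$WLS_PASS" "$1" "$2" "$3"
}

# Print the definitions for the example domain; redirect to a file to review them
# before adding them to nrpe.cfg.
nrpe_server_cmd mydomain myserver
nrpe_pool_cmd mydomain myserver mypool
```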
Add the environment variables (CLASSPATH, JAVA_HOME) to the nrpe startup script:
... .... ... .... ... .... ... .... ... .... ... ....
JAVA_HOME=/data/bea/bea/jdk142_05
export JAVA_HOME
CLASSPATH=/data/bea/bea/weblogic81/server/lib/weblogic.jar
export CLASSPATH
... .... ... .... ... .... ... .... ... .... ... ....
Edit nagios.cfg on the monitoring host and add the following:
$ vi ./nagios.cfg
... .... ... .... ... .... ... .... ... .... ... ....
# Define a host for the local machine
define host{
    use linux-box ; Name of host template to use
    ; This host definition will inherit all variables that are defined
    ; in (or inherited by) the linux-box host template definition.
    host_name sol_172.2.10.2
    alias sol_172.2.10.2
    address 172.2.10.2
}
#the check_wls_server_myserver on the remote host.
define service{
    use generic-service
    host_name sol_172.2.10.2
    service_description Weblogic Server myserver
    check_command check_nrpe!check_wls_server_myserver
}
#the check_wls_jdbcpool_mypool on the remote host.
define service{
    use generic-service
    host_name sol_172.2.10.2
    service_description Weblogic JDBCPool mypool
    check_command check_nrpe!check_wls_jdbcpool_mypool
}
Verify that the configuration is correct.
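One way to check the plugin side by hand is to run a check command directly and look at both its output line and its return code. The sketch below wraps that in a small helper; run_check and state_name are hypothetical names, and the demonstration uses a stand-in command so it runs without WebLogic.

```shell
#!/bin/sh
# state_name CODE: translate a plugin return code into its service state.
# Codes outside 0-2 are shown as UNKNOWN here.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# run_check COMMAND [ARGS...]  (hypothetical helper)
# Runs a plugin command and prints its state, return code, and output.
run_check() {
    rc=0
    out=$("$@") || rc=$?
    echo "[$(state_name "$rc")] rc=$rc: $out"
}

# Demonstration with a stand-in command instead of the real plugin:
run_check sh -c 'echo "CRITICAL - demo"; exit 2'

# On the remote host the real call would look like this (not run here):
# run_check /usr/local/nagios/libexec/check_wls --server t3://172.2.10.2:7001 weblogic weblogic mydomain myserver
```

On the monitoring host, Nagios can also validate its own configuration files with its -v option, for example nagios -v followed by the path to nagios.cfg; the exact binary and file locations depend on the installation.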
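Restart the nagios service on the monitoring host and the nrpe service on the remote host.
View the monitoring status in a web browser such as IE.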
Figure 5.1
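That completes the configuration.
6. Conclusion
This article has presented one way to monitor WebLogic applications with Nagios: a custom shell script written to the Nagios Plugin API rules. It also briefly described the configuration process and provided the shell source. Corrections are welcome.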