Nagios: a network monitoring application

What is Nagios

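Nagios is a system and network monitoring application. It monitors hosts and services under the conditions you specify and sends alerts when their state gets worse and again when it recovers.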

Nagios was originally designed to run on Linux, but it also runs on other Unix-like systems.

Further features of Nagios include:

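1. Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.);
2. Monitoring of host resources (processor load, disk usage, etc.);
3. A simple plugin design that lets users easily develop their own service checks;
4. Parallelized service checks;
5. The ability to define a network host hierarchy using "parent" hosts, which makes it possible to detect and distinguish between hosts that are down and hosts that are unreachable;
6. Notification of contacts when service or host problems occur and when they are resolved (via email, SMS, or user-defined methods);
7. The ability to define event handlers that run when host or service events occur, for example to gather more information for locating a problem;
8. Automatic log file rotation;
9. Support for redundant monitoring hosts;
10. An optional web interface for viewing current network status, notification and problem history, log files, etc.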
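Item 3 rests on a simple convention: a plugin is any executable that prints a line of status text and reports its result through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), and text after a `|` is treated as performance data. As an illustration only, here is a minimal sketch of such a check written as a POSIX shell function; the name check_value and its arguments are hypothetical, and it assumes integer values where a larger value is worse (like a count of logged-in users).

```shell
# Nagios plugin return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# check_value LABEL VALUE WARN CRIT  (hypothetical helper)
# Alerts when VALUE is greater than WARN or CRIT, in the style of check_users -w/-c.
check_value() {
    label=$1 value=$2 warn=$3 crit=$4
    case $value in
        ''|*[!0-9]*)
            printf '%s UNKNOWN - value "%s" is not an integer\n' "$label" "$value"
            return "$STATE_UNKNOWN" ;;
    esac
    if [ "$value" -gt "$crit" ]; then
        state=CRITICAL rc=$STATE_CRITICAL
    elif [ "$value" -gt "$warn" ]; then
        state=WARNING rc=$STATE_WARNING
    else
        state=OK rc=$STATE_OK
    fi
    # Status text shown in the web interface, then '|' and
    # performance data in the form label=value;warn;crit
    printf '%s %s - value is %s | %s=%s;%s;%s\n' \
        "$label" "$state" "$value" "$label" "$value" "$warn" "$crit"
    return "$rc"
}
```

To use it as a real plugin, the function would go into an executable script in /usr/local/nagios/libexec ending with something like `check_value USERS "$(who | wc -l | tr -d ' ')" 20 50; exit $?`, and a command definition in commands.cfg would then call that script.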

2.2. System requirements

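The only requirement for running Nagios is a machine running Linux (or a Unix variant) with a C compiler. TCP/IP must be configured correctly, because most service checks are performed over the network.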

You are not required to use the CGIs included with Nagios. If you do want to use them, you need the following software installed:

1. A web server (preferably Apache)
2. Thomas Boutell's gd library, version 1.6.3 or higher (required by the statusmap and trends CGIs)
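Before building, it may help to confirm that the required tools are actually on the PATH. A minimal sketch, assuming that looking for commands such as gcc, make and httpd is a good-enough proxy for the requirements above; it does not check library versions such as gd's, and the helper name missing_commands is hypothetical:

```shell
# missing_commands CMD...  (hypothetical helper)
# Prints each named command that cannot be found on the PATH.
# Returns 0 when all of them are present, 1 otherwise.
missing_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `missing_commands gcc make httpd` prints whichever of those commands is missing and returns non-zero if any is.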

Installing Nagios


1. Preparation
a. Install the dependencies
]# yum -y install httpd gcc glibc glibc-common gd gd-devel
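b. Create the user and group, and add nagios and apache to nagcmd (-a appends the group instead of replacing the user's existing supplementary groups)
[root@localhost ~]# useradd nagios
[root@localhost ~]# groupadd nagcmd
[root@localhost ~]# usermod -a -G nagcmd nagios
[root@localhost ~]# usermod -a -G nagcmd apache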
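To confirm that nagios and apache really ended up in nagcmd, you can inspect the group list that `id -nG` reports. A small sketch; in_group is a hypothetical helper name:

```shell
# in_group USER GROUP  (hypothetical helper)
# Succeeds if GROUP appears in the group list that `id -nG` reports for USER.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}
```

For example, `for u in nagios apache; do in_group "$u" nagcmd || echo "$u is not in nagcmd"; done` prints nothing when both users are members.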

2. Build and install Nagios
]# tar zxvf nagios-3.2.0.tar.gz
]# cd nagios-3.2.0
]# ./configure --with-command-group=nagcmd
]# make all
The following message appears; continue as it instructs:
*** Compile finished ***

If the main program and CGIs compiled without any errors, you
can continue with installing Nagios as follows (type 'make'
without any arguments for a list of all possible options):

make install
- This installs the main program, CGIs, and HTML files

make install-init
- This installs the init script in /etc/rc.d/init.d

make install-commandmode
- This installs and configures permissions on the
directory for holding the external command file

make install-config
- This installs *SAMPLE* config files in /usr/local/nagios/etc
You'll have to modify these sample files before you can
use Nagios. Read the HTML documentation for more info
on doing this. Pay particular attention to the docs on
object configuration files, as they determine what/how
things get monitored!

make install-webconf
- This installs the Apache config file for the Nagios
web interface


*** Support Notes *******************************************

If you have questions about configuring or running Nagios,
please make sure that you:

- Look at the sample config files
- Read the HTML documentation
- Read the FAQs online at http://www.nagios.org/faqs

before you post a question to one of the mailing lists.
Also make sure to include pertinent information that could
help others help you. This might include:

- What version of Nagios you are using
- What version of the plugins you are using
- Relevant snippets from your config files
- Relevant error messages from the Nagios log file

For more information on obtaining support for Nagios, visit:

http://www.nagios.org/support/

*************************************************************

Enjoy.


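What each install target installs, and where:
make install ------------- /usr/local/nagios/share/ (the monitoring web pages)
make install-init -------- /etc/init.d/nagios
make install-commandmode - permissions on the external command file directory
make install-config ------ /usr/local/nagios/etc/ (the Nagios configuration files)
make install-webconf ----- /etc/httpd/conf.d/nagios.conf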


Read this file to see the URL aliases and access settings for the monitoring pages:
/etc/httpd/conf.d/nagios.conf

Configure the user names and passwords used for authentication:
[root@localhost nagios-3.2.0]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
[root@localhost nagios-3.2.0]# htpasswd /usr/local/nagios/etc/htpasswd.users user1
New password:
Re-type new password:
Adding password for user user1
[root@localhost nagios-3.2.0]# cat /usr/local/nagios/etc/htpasswd.users
nagiosadmin:FuD2.sNj9En4c
user1:hDlnlLQBCPmCA

Then edit the Apache configuration so that index.php is tried first (line 391):
]# vim /etc/httpd/conf/httpd.conf
DirectoryIndex index.php index.html index.html.var
]# service httpd restart

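Because the monitoring site is written in PHP and CGI, Apache must support both. CGI support comes from mod_cgi, one of Apache's own modules, included when Apache is installed; PHP support comes from a third-party module, libphp5.so, which the php package installs for Apache.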

]# rpm -qf /etc/httpd/modules/libphp5.so
php-5.1.6-27.el5
]# rpm -qf /etc/httpd/modules/mod_cgi.so
httpd-2.2.3-43.el5




]# yum -y install php
]# service httpd restart
Stopping httpd: [  OK  ]
Starting httpd: [  OK  ]

Then open http://192.168.1.254/nagios/ in a browser.

Configuring Nagios

1. Install the monitoring plugins
]# tar zxvf nagios-plugins-1.4.13.tar.gz
]# cd nagios-plugins-1.4.13
]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
]# make && make install
The plugins are installed in this directory:
]# ls /usr/local/nagios/libexec/
check_apt check_imap check_pop
check_breeze check_ircd check_procs
check_by_ssh check_ldap check_real
check_clamd check_ldaps check_rpc
check_cluster check_load check_sensors
check_dhcp check_log check_smtp
check_dig check_mailq check_ssh
check_disk check_mrtg check_swap
check_disk_smb check_mrtgtraf check_tcp
check_dns check_nagios check_time
check_dummy check_nntp check_udp
check_file_age check_nt check_ups
check_flexlm check_ntp check_users
check_ftp check_ntp_peer check_wave
check_http check_ntp_time negate
check_icmp check_nwstat urlize
check_ide_smart check_oracle utils.pm
check_ifoperstatus check_overcr utils.sh
check_ifstatus check_ping
]#


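These plugins are what actually do the monitoring. They can check:
a. a host's private services, such as CPU and disk;
b. public services on neighboring hosts, such as HTTP, FTP, Samba, ...
The following sections cover:
1. How to monitor the local host
2. How to monitor other hosts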
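Each of these plugins reports its result through its exit code: 0 is OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. When trying a plugin by hand, a wrapper such as the following prints the state next to the plugin's output. It is a sketch, not part of the plugin package; state_name and run_check are hypothetical names, and mapping any other exit code to UNKNOWN is this helper's own choice.

```shell
# state_name CODE: translate a plugin exit code into a state name
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # this helper's choice for any other code
    esac
}

# run_check CMD [ARG...]: run a plugin, print "STATE: output",
# and pass the plugin's exit code through unchanged
run_check() {
    rc=0
    output=$("$@" 2>&1) || rc=$?
    printf '%s: %s\n' "$(state_name "$rc")" "$output"
    return "$rc"
}
```

For example, `run_check /usr/local/nagios/libexec/check_load -w 5,4,3 -c 10,6,4` runs the load plugin and prefixes its output with the resulting state.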

Monitoring the local host



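]# cd /usr/local/nagios/etc/objects
commands.cfg: defines command names and their command-line syntax. Files such as localhost.cfg refer to a command by its name in check_command.
contacts.cfg: defines contacts (names and email addresses) and contact groups. Other configuration files refer to them by name (in contacts or contact_groups).
templates.cfg: defines templates. Files such as localhost.cfg inherit a template's settings with "use <template name>".
timeperiods.cfg: defines time periods such as workhours and 24x7, which are referred to by name (for example in check_period or notification_period).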

A file for monitoring the local host exists by default:
/usr/local/nagios/etc/objects/localhost.cfg
localhost.cfg

[root@localhost objects]# cat /usr/local/nagios/etc/objects/localhost.cfg
# HOST DEFINITION
# Define a host for the local machine
define host{
use linux-server ; Name of host template to use
host_name localhost
alias localhost
address 127.0.0.1
}
# Note: the template named after use (linux-server here) must already be defined in one of the other .cfg files.
# HOST GROUP DEFINITION
define hostgroup{
hostgroup_name linux-servers ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members localhost ; Comma separated list of hosts that belong to this group
}
# SERVICE DEFINITIONS
# Define a service to "ping" the local machine

define service{
use local-service ; Name of service template to use
host_name localhost
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}

# Note: the command in check_command must be defined in commands.cfg.
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Root Partition
check_command check_local_disk!20%!10%!/
}



# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Current Users
check_command check_local_users!20!50
}


# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}


define service{
use local-service ; Name of service template to use
host_name localhost
service_description Total Running Processes
check_command check_local_procs!3!5!R
}

# Define a service to check the load on the local machine.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}



# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
use local-service ; Name of service template to use
host_name localhost
service_description Swap Usage
check_command check_local_swap!20!10
}



# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
use local-service ; Name of service template to use
host_name localhost
service_description SSH
check_command check_ssh
notifications_enabled 0
}



# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
#
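# Step 1: the check_http and check_dns commands must be defined in
# /usr/local/nagios/etc/objects/commands.cfg. The sample commands.cfg already
# defines check_http (as -I $HOSTADDRESS$ $ARG1$), so edit that definition
# instead of adding a second one. Both commands must pass on $ARG2$, because
# the services below put their options in the second !-separated argument:
#define command{
# command_name check_http
# command_line $USER1$/check_http -I $ARG1$ $ARG2$
# }
#define command{
# command_name check_dns
# command_line $USER1$/check_dns -H $ARG1$ $ARG2$
#}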


define service{
use local-service ; Name of service template to use
host_name localhost
service_description HTTP
check_command check_http!localhost!-u /test.html -t 3 -s "TEST"
# Make sure the command succeeds when run by hand from the command line:
#[root@localhost libexec]# ./check_http -I localhost -u /test.html -t 3 -s "TEST"
#HTTP OK HTTP/1.1 200 OK - 0.001 second response time |time=0.000792s;;;0.000000 size=266B;;;0


notifications_enabled 0
}

define service{
use local-service ; Name of service template to use
host_name localhost
service_description DNS
check_command check_dns!localhost!-s localhost -w 2 -c 10
notifications_enabled 0
}
# Make sure the command succeeds when run by hand from the command line:
#[root@localhost libexec]# ./check_dns -H localhost -s localhost -w 1 -c 3
#DNS WARNING: 1.006 second response time. localhost returns 127.0.0.1|time=1.006259s;;;0.000000


The check_dns and check_http commands used above are ones you define in commands.cfg, and the plugins they call, $USER1$/check_dns and $USER1$/check_http, must exist!

$USER1$ is a macro defined in resource.cfg:
]# grep USER1 /usr/local/nagios/etc/resource.cfg
$USER1$=/usr/local/nagios/libexec
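When Nagios runs a check, it splits the check_command value on `!`, substitutes the pieces after the command name into $ARG1$, $ARG2$, ... of that command's command_line, and also replaces macros such as $USER1$ (from resource.cfg) and $HOSTADDRESS$ (the host's address). The sketch below imitates this for just those macros, so you can see the command line a definition will produce before testing it by hand; expand_check is a hypothetical helper, and real Nagios supports many more macros.

```shell
# expand_check COMMAND_LINE CHECK_COMMAND USER1 HOSTADDRESS  (hypothetical helper)
# Prints COMMAND_LINE with $USER1$, $HOSTADDRESS$ and $ARGn$ replaced, where
# $ARGn$ is the n-th '!'-separated field after the command name.
# Macros not handled here are left untouched.
expand_check() {
    CMD=$1 CHECK=$2 USER1=$3 HOSTADDR=$4 awk '
        # literal (non-regex) replacement of every occurrence of "from"
        function replace_all(s, from, to,    out, i) {
            out = ""
            while ((i = index(s, from)) > 0) {
                out = out substr(s, 1, i - 1) to
                s = substr(s, i + length(from))
            }
            return out s
        }
        BEGIN {
            cmd = ENVIRON["CMD"]
            n = split(ENVIRON["CHECK"], field, "!")
            cmd = replace_all(cmd, "$USER1$", ENVIRON["USER1"])
            cmd = replace_all(cmd, "$HOSTADDRESS$", ENVIRON["HOSTADDR"])
            for (k = 2; k <= n; k++)
                cmd = replace_all(cmd, "$ARG" (k - 1) "$", field[k])
            print cmd
        }'
}
```

With the sample definition `$USER1$/check_http -I $HOSTADDRESS$ $ARG1$`, host address 127.0.0.1 and `check_http!localhost!-u /test.html -t 3 -s "TEST"`, it prints `/usr/local/nagios/libexec/check_http -I 127.0.0.1 localhost`: everything after the second `!` is never used, because that command_line has no $ARG2$.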



[root@localhost nagios]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@localhost nagios]# /etc/init.d/nagios start
Starting nagios: done.
[root@localhost nagios]#

http://192.168.1.254/nagios/index.php

Note 1: DNS must be running and able to resolve the name localhost; only then will the following command succeed:
[root@localhost libexec]# ./check_dns -H localhost -s localhost -w 1 -c 3
#DNS WARNING: 1.006 second response time. localhost returns 127.0.0.1|time=1.006259s;;;0.000000

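Note 2: httpd must be running, and /var/www/html/test.html must contain the string TEST that check_http looks for with -s "TEST":
cd /var/www/html
touch test.html
echo TEST >> test.html
vim /etc/httpd/conf/httpd.conf
DirectoryIndex test.html index.php index.html index.html.var
service httpd restart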
Note 3: Because SSH is checked, sshd must be running.




Monitoring other hosts

1. Add a cfg_file line for the new host to nagios.cfg:
]# grep 101 /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/192.168.1.101.cfg
2. Create that file by copying localhost.cfg:
]# cd /usr/local/nagios/etc/objects/
[root@localhost objects]# cp -a localhost.cfg 192.168.1.101.cfg
3. Edit 192.168.1.101.cfg to define how the host is monitored.
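Copying localhost.cfg means each host_name localhost, the address 127.0.0.1, and the hostgroup have to be changed by hand; the hostgroup needs a new name so that it does not clash with the linux-servers group already defined in localhost.cfg (the file below uses linux-servers1). A sketch that automates the host-specific part, assuming the copy keeps the sample's one-directive-per-line layout; make_host_cfg is a hypothetical helper, and it deliberately leaves alias and check_command arguments (such as the localhost in check_http!localhost!...) for you to review:

```shell
# make_host_cfg SOURCE HOST_NAME ADDRESS HOSTGROUP  (hypothetical helper)
# Prints SOURCE (a copy of localhost.cfg) with the host-specific values
# replaced; every other line, including check_command arguments, is kept.
make_host_cfg() {
    NEW_NAME=$2 NEW_ADDR=$3 NEW_GROUP=$4 awk '
        # replace the first literal occurrence of "from" on the current line
        function swap(from, to,    i) {
            i = index($0, from)
            if (i > 0) $0 = substr($0, 1, i - 1) to substr($0, i + length(from))
        }
        $1 == "host_name"      && $2 == "localhost"     { swap("localhost", ENVIRON["NEW_NAME"]) }
        $1 == "members"        && $2 == "localhost"     { swap("localhost", ENVIRON["NEW_NAME"]) }
        $1 == "address"        && $2 == "127.0.0.1"     { swap("127.0.0.1", ENVIRON["NEW_ADDR"]) }
        $1 == "hostgroup_name" && $2 == "linux-servers" { swap("linux-servers", ENVIRON["NEW_GROUP"]) }
        { print }
    ' "$1"
}
```

Run from /usr/local/nagios/etc/objects, for example: `make_host_cfg localhost.cfg baidu 192.168.1.101 linux-servers1 > 192.168.1.101.cfg`.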



[root@localhost libexec]# ./check_dns -H www.baidu.com -s 192.168.1.101 -w 1 -c 3
(192.168.1.101 does have DNS configured and can resolve www.baidu.com.)
Monitoring another host's public services: 192.168.1.101.cfg

[root@localhost objects]# grep '^[^#]' 192.168.1.101.cfg
define host{
use linux-server ; Name of host template to use
host_name baidu
alias 101 host
address 192.168.1.101
}
define hostgroup{
hostgroup_name linux-servers1 ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members baidu ; Comma separated list of hosts that belong to this group
}
define service{
use local-service ; Name of service template to use
host_name baidu
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service ; Name of service template to use
host_name baidu
service_description SSH
check_command check_ssh
notifications_enabled 0
}
define service{
use local-service ; Name of service template to use
host_name baidu
service_description HTTP
check_command check_http!192.168.1.101!-u /test.html -t 3 -s "TEST"
notifications_enabled 0
}
define service{
use local-service ; Name of service template to use
host_name baidu
service_description DNS
check_command check_dns!www.baidu.com!-s dns.up1.com -w 5 -c 10
notifications_enabled 0
}

About use xxx: it refers to a name that has already been defined!
[root@localhost objects]# pwd
/usr/local/nagios/etc/objects
[root@localhost objects]# grep 'name.*local-service' *
templates.cfg: name local-service ; The name of this service template
[root@localhost objects]# vim templates.cfg
[root@localhost objects]# grep 'name.*generic-service' *
templates.cfg: name generic-service ; The 'name' of this service template
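A template can itself use another template, so a service's settings may come from a chain of them; in the sample templates.cfg, local-service itself uses generic-service, which is why generic-service is looked up next above. The sketch below follows such a chain through the name and use directives of the given files. template_chain is a hypothetical helper; it follows only the first template when use lists several, and assumes each directive sits on its own line.

```shell
# template_chain TEMPLATE FILE...  (hypothetical helper)
# Prints TEMPLATE followed by each template it inherits from via "use".
template_chain() {
    start=$1
    shift
    START=$start awk '
        { sub(/;.*/, "") }                          # drop inline ; comments
        /^[ \t]*#/ { next }                         # skip comment lines
        /define[ \t]+[a-z]+[ \t]*[{]/ { name = ""; parent_of = ""; next }
        $1 == "name" { name = $2 }
        $1 == "use"  { split($2, u, ","); parent_of = u[1] }
        /[}]/ { if (name != "") parent[name] = parent_of; name = ""; parent_of = "" }
        END {
            t = ENVIRON["START"]; out = t; seen[t] = 1
            while ((t in parent) && parent[t] != "" && !(parent[t] in seen)) {
                t = parent[t]; seen[t] = 1; out = out " " t
            }
            print out
        }
    ' "$@"
}
```

With the sample templates.cfg, `template_chain local-service /usr/local/nagios/etc/objects/*.cfg` should print `local-service generic-service`.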
Monitoring other hosts' private services with NRPE


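[192.168.1.254] ------------------> [192.168.1.15]
monitoring host                      remote host
check_nrpe                           NRPE process (under xinetd)

On the remote host 192.168.1.15, configure NRPE so that it collects the host's private information itself and hands the results to the monitoring host.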
1. Create the nagios user
2. Install the Nagios plugins
]# tar zxvf nagios-plugins-1.4.13.tar.gz
]# cd nagios-plugins-1.4.13
]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
]# make
]# make install
This installs the plugin files:
/usr/local/nagios/libexec/check_*
3. Install xinetd, because the nrpe service runs under xinetd
4. Install NRPE
]# tar zxvf nrpe-2.12.tar.gz
]# cd nrpe-2.12
]# evince docs/NRPE.pdf           # read the NRPE documentation
]# ./configure
]# make all                       # compile
]# make install-plugin            # installs /usr/local/nagios/libexec/check_nrpe
]# make install-daemon            # installs /usr/local/nagios/bin/nrpe
]# make install-daemon-config     # installs the NRPE config file /usr/local/nagios/etc/nrpe.cfg
]# make install-xinetd            # installs /etc/xinetd.d/nrpe
Allow connections only from localhost and the monitoring server:
]# vim /etc/xinetd.d/nrpe
only_from = 127.0.0.1 192.168.1.254
Register the nrpe port:
]# vim /etc/services
nrpe 5666/tcp # NRPE
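xinetd checks the service name used in /etc/xinetd.d/nrpe against /etc/services, which is what the line above provides. A quick way to confirm the entry is a sketch like the following; service_port is a hypothetical helper, and on glibc systems `getent services nrpe` performs the same lookup:

```shell
# service_port NAME [FILE]  (hypothetical helper)
# Prints the TCP port registered for NAME in FILE (default /etc/services).
service_port() {
    awk -v svc="$1" '$1 == svc && $2 ~ /\/tcp$/ { split($2, p, "/"); print p[1]; exit }' \
        "${2:-/etc/services}"
}
```

After the edit, `service_port nrpe` should print 5666.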
5. Configure NRPE: define which of this host's services can be checked
]# vim /usr/local/nagios/etc/nrpe.cfg

command[check_users]=/usr/local/nagios/libexec/check_users -w 2 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk_boot]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_disk_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda2
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_run_procs]=/usr/local/nagios/libexec/check_procs -w 3 -c 10 -s R
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 50% -c 30%
6. Start the service
[root@15 ~]# /etc/init.d/xinetd restart
[root@15 ~]# netstat -tnlp | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 20956/xinetd
7. Test
On this host, check that NRPE accepts connections on 127.0.0.1, port 5666:
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.12

On this host, check that the commands defined in nrpe.cfg work:
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_users
USERS WARNING - 5 users currently logged in |users=5;2;10;0
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_load
OK - load average: 0.13, 0.04, 0.17|load1=0.130;15.000;30.000;0; load5=0.040;10.000;25.000;0; load15=0.170;5.000;20.000;0;
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk_boot
DISK OK - free space: /boot 82 MB (88% inode=99%);| /boot=11MB;78;88;0;98
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk_root
DISK OK - free space: / 24436 MB (53% inode=97%);| /=21495MB;38744;43587;0;48431
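Rather than testing each command by hand, the command names can be read from nrpe.cfg and passed to check_nrpe one at a time. A sketch, assuming the command[name]=... syntax shown above; nrpe_commands is a hypothetical helper, and commented-out lines are skipped:

```shell
# nrpe_commands FILE  (hypothetical helper)
# Prints the name of every active command[name]=... line in an nrpe.cfg file.
nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}
```

For example: `for c in $(nrpe_commands /usr/local/nagios/etc/nrpe.cfg); do /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c "$c"; done`.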


++++++++++++++++++++++++++
On the monitoring server (192.168.1.254):
]# yum -y install openssl-devel

1. Install the NRPE plugin (/usr/local/nagios/libexec/check_nrpe)
]# tar zxvf nrpe-2.12.tar.gz
]# cd nrpe-2.12
]# evince docs/NRPE.pdf
]# ./configure
]# make all                  # compile
]# make install-plugin       # installs /usr/local/nagios/libexec/check_nrpe
]# ls /usr/local/nagios/libexec/check_nrpe
Test: connect to NRPE on the remote host's port 5666 and check that it answers:
]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.15
NRPE v2.12
]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.15 -c check_users
USERS WARNING - 5 users currently logged in |users=5;2;10;0

2. Register the host's configuration file in nagios.cfg
]# grep 15 /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/192.168.1.15.cfg
3. Define the services to monitor


]# cp -a localhost.cfg 192.168.1.15.cfg

]# vim /usr/local/nagios/etc/objects/commands.cfg
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

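Then write the host's configuration file (its contents are listed in the next subsection), check the configuration, and restart Nagios:
]# vim /usr/local/nagios/etc/objects/192.168.1.15.cfg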
[root@localhost nrpe-2.12]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@localhost nrpe-2.12]# service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.

192.168.1.15.cfg

define host{
use linux-server ; Name of host template to use
host_name 1.15
alias 15
address 192.168.1.15
}
define hostgroup{
hostgroup_name linux-servers2 ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members 1.15 ; Comma separated list of hosts that belong to this group
}
# SERVICE DEFINITIONS
# Define a service to "ping" the local machine

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}


# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Root Partition
check_command check_nrpe!check_disk_root
}


define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Boot Partition
check_command check_nrpe!check_disk_boot
}

# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Current Users
check_command check_nrpe!check_users
}


# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 processes.

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Total Processes
check_command check_nrpe!check_total_procs
}


define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Total Running Processes
check_command check_nrpe!check_run_procs
}

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Total Zombie Processes
check_command check_nrpe!check_zombie_procs
}
# Define a service to check the load on the local machine.

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Current Load
check_command check_nrpe!check_load
}



# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Swap Usage
check_command check_nrpe!check_swap
}



# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description SSH
check_command check_ssh
notifications_enabled 0
}



# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
#define command{
# command_name check_http
# command_line $USER1$/check_http -I $ARG1$ $ARG2$
# }
#define command{
# command_name check_dns
# command_line $USER1$/check_dns -H $ARG1$ $ARG2$
#}


define service{
use local-service ; Name of service template to use
host_name 1.15
service_description HTTP
check_command check_http!192.168.1.15!-u /test.html -t 3 -s "TEST"
#[root@localhost libexec]# ./check_http -I localhost -u /test.html -t 3 -s "TEST"
#HTTP OK HTTP/1.1 200 OK - 0.001 second response time |time=0.000792s;;;0.000000 size=266B;;;0

notifications_enabled 1
}

define service{
use local-service ; Name of service template to use
host_name 1.15
service_description DNS
check_command check_dns!192.168.1.15!-s 192.168.1.15 -w 2 -c 10
notifications_enabled 1
}
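To monitor DNS, the DNS server must be able to resolve its own name in both directions (forward and reverse), for example:
dns.baidu.com.   IN  A    192.168.1.15     ; forward zone baidu.com
15               IN  PTR  dns.baidu.com.   ; reverse zone 1.168.192.in-addr.arpa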
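The short owner name 15 in the PTR record is relative to the reverse zone, assumed here to be 1.168.192.in-addr.arpa, so the full name being looked up is the address's octets in reverse order followed by in-addr.arpa. A small sketch that builds that name; ptr_name is a hypothetical helper, and `dig -x 192.168.1.15 @192.168.1.15` queries the same name against the server itself:

```shell
# ptr_name IPV4  (hypothetical helper)
# Prints the in-addr.arpa name under which the PTR record for IPV4 lives;
# prints nothing if the argument does not have four dot-separated parts.
ptr_name() {
    printf '%s\n' "$1" |
        awk -F. 'NF == 4 { printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}
```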