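Setting up nagios + ncpa monitoring

ncpa is the monitoring agent that nagios has released in recent years. It has matured steadily and is meant to replace the aging nrpe.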

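First, the strengths of nagios:

1. It is the industry standard of monitoring and has focused on alerting for nearly twenty years (it was born in 1999).
As people in the industry put it, there is a shadow of nagios behind every monitoring system.

2. Good design never goes out of date: there is no database.
Compared with the bloat of zabbix, nagios is a model of the unix philosophy: do one thing and do it well.
With no database in the design, no database can hold it back.

3. It is written in C and performs very well.
Before nagios 4.0 it used a model similar to apache prefork, which limited performance for a time. Until event-driven models appeared, that was still the best option available.
Since nagios 4.0 it uses an event model similar to nginx, which brings a qualitative jump in performance at a very small memory cost; 10k+ is no problem.

4. It has an excellent and very flexible plugin mechanism.
nagios has accumulated a huge number of community-contributed plugins over more than a decade, and writing your own plugin is also easy.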

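Where ncpa improves on nrpe:

1. It supports passive monitoring, that is, ncpa reports to nagios on its own initiative (through nrdp).

2. Like snmp, ncpa needs almost no configuration and ships with the basic metrics built in, such as cpu, memory, services and processes,
whereas nrpe needs a pile of checks defined on the client and then defined once more on the nagios server, which is very tedious.

3. Existing nagios plugins can still be used.

4. With a little scripting, the nagios server can scan for ncpa clients with nmap and add the basic checks automatically.

5. Apart from python2.7 it has no environment dependencies and does not intrude on the system in any way.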
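
To make point 2 concrete: the built-in metrics are served by the ncpa listener over an HTTPS API on port 5693, so they can be queried without defining anything on the client. The following is a minimal sketch, assuming the token mytoken and the client address used later in this article. ncpa_url is a hypothetical helper written for this illustration and is not part of ncpa.

```shell
#!/usr/bin/env bash
# ncpa_url is a hypothetical helper: it only assembles the URL of an ncpa API endpoint.
# Arguments: host, token, metric path (for example cpu/percent), optional query string.
ncpa_url() {
  local host=$1 token=$2 metric=$3 query=${4:-}
  local url="https://${host}:5693/api/${metric}?token=${token}"
  if [ -n "$query" ]; then
    url="${url}&${query}"
  fi
  printf '%s\n' "$url"
}

# Assumed values: client 192.168.1.50, token mytoken
ncpa_url 192.168.1.50 mytoken cpu/percent aggregate=avg
```

The printed URL can be passed to curl, for example curl -k "$(ncpa_url 192.168.1.50 mytoken cpu/percent aggregate=avg)". The -k flag is there on the assumption that the listener still uses its default self-signed certificate.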

This article describes active monitoring based on nagios + ncpa, as a replacement for nrpe.

Environment

Server: CentOS 7 + nagios 4  IP: 192.168.1.200
Client: CentOS 7 + ncpa 2.0.6   IP: 192.168.1.50

Client configuration

1. Install ncpa

rpm -ivh https://assets.nagios.com/downloads/ncpa/ncpa-2.0.6.el7.x86_64.rpm

2. Start the ncpa services

/etc/init.d/ncpa_passive start
/etc/init.d/ncpa_listener start
chkconfig ncpa_listener on
chkconfig ncpa_passive on

3. Open firewall port 5693 on the client

iptables -A INPUT -p tcp --dport 5693 -j ACCEPT

or, to accept connections from the nagios server only:

iptables -A INPUT -s 192.168.1.200 -p tcp --dport 5693 -j ACCEPT

Server configuration

Install nagios (abridged)

yum install epel-release -y
yum install nagios httpd php php-pecl-zendopcache fping nmap -y
systemctl enable httpd nagios
systemctl start httpd nagios
iptables -A INPUT -p tcp --dport 80 -j ACCEPT

mkdir -p /etc/nagios/bin
mkdir -p /etc/nagios/hosts
mkdir -p /etc/nagios/services
mkdir -p /etc/nagios/template
echo "cfg_dir=/etc/nagios/hosts" >> /etc/nagios/nagios.cfg
echo "cfg_dir=/etc/nagios/services" >> /etc/nagios/nagios.cfg
service nagios restart
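
One caveat with the echo >> lines above: running them a second time appends the same cfg_dir line again. The following is a minimal sketch of an append that is safe to repeat. add_cfg_dir is a hypothetical helper, and the demonstration writes to a throwaway file rather than the real nagios.cfg.

```shell
#!/usr/bin/env bash
# add_cfg_dir is a hypothetical helper: append a cfg_dir line to a nagios
# config file only if that exact line is not already there.
add_cfg_dir() {
  local cfg_file=$1 dir=$2
  local line="cfg_dir=${dir}"
  # -x: match the whole line, -F: fixed string, -q: no output
  if ! grep -qxF "$line" "$cfg_file"; then
    printf '%s\n' "$line" >> "$cfg_file"
  fi
}

# Demonstration on a temporary file standing in for /etc/nagios/nagios.cfg
demo_cfg=$(mktemp)
add_cfg_dir "$demo_cfg" /etc/nagios/hosts
add_cfg_dir "$demo_cfg" /etc/nagios/hosts      # second call changes nothing
add_cfg_dir "$demo_cfg" /etc/nagios/services
cat "$demo_cfg"
```

On a real server the first argument would be /etc/nagios/nagios.cfg.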

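I. Automatic host discovery

Automatic discovery means scanning the LAN with a scanner:

1. If an IP is already monitored, skip it;

2. If it is a new IP, create a configuration file from a fixed template and notify the administrator;

3. If an IP disappears after it has been discovered, nagios raises an alert and notifies the administrator.

This closes the loop on LAN IP management.

Configure automatic host discovery with fping

Create the host template file /etc/nagios/template/host.cfg with the following content: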

define host {
    host_name                       HOST
    address                         HOST
    check_command                   check-host-alive
    max_check_attempts              3
    check_interval                  5
    retry_interval                  1
    check_period                    24x7
    contacts                        nagiosadmin
    notification_interval           60
    notification_period             24x7
    notifications_enabled           1
}
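
The template is turned into a real host definition by replacing the HOST placeholder with an address using sed. The following is a minimal sketch of that substitution on a shortened copy of the template. render_template is a hypothetical helper name used only here.

```shell
#!/usr/bin/env bash
# render_template is a hypothetical helper: replace every HOST placeholder
# in a template read from stdin with the given address.
render_template() {
  local host=$1
  sed "s/HOST/${host}/g"
}

# A shortened stand-in for /etc/nagios/template/host.cfg
rendered=$(printf 'define host {\n    host_name    HOST\n    address      HOST\n}\n' | render_template 192.168.1.50)
printf '%s\n' "$rendered"
```

Because the substitution matches the bare string HOST anywhere, a template written this way should not contain nagios macros such as $HOSTADDRESS$, which would be rewritten as well.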


Create the script /etc/nagios/bin/find-hosts.sh with the following content:

#!/usr/bin/env bash
# Discover live hosts with fping and create a nagios host definition for each new one.

if [ ! -f /usr/sbin/fping ];then
  yum install fping -y
fi

network=$1

echo_usage() {
  echo -e "\e[1;31mUsage: $0 [network] \e[0m"
  echo -e "example: \e[1;32m $0 192.168.0.0/24 \e[0m"
  echo
  exit 3
}

if [ -z "$network" ];then
  echo_usage
fi

########################################################
########################################################

dir=/etc/nagios/hosts
host_template=/etc/nagios/template/host.cfg
result=$(mktemp /tmp/fping-XXXXXX)
mkdir -p "$dir"
# -a: print only the hosts that are alive, -g: generate the target list from the network
fping -a -q -g "$network" > "$result"

# Create a definition only for hosts that do not have one yet
i=0
while read -r host;do
  if [ ! -f "$dir/$host.cfg" ];then
    echo "new host found $host"
    #mailx -s "new host found :$host" root@localhost 
    sed "s/HOST/$host/g" "$host_template" > "$dir/$host.cfg"
    i=$((i + 1))
  fi
done < "$result"
rm -f "$result"

if [ "$i" -eq 0 ];then
  echo no new host found
  exit 0
fi

# Restart nagios only when the new configuration passes validation
if nagios -v /etc/nagios/nagios.cfg | grep -q "Things look okay";then
  echo "nagios configuration is OK"
  sleep 1
  service nagios restart
  echo "nagios restart successfully"
else
  echo "nagios configuration check failed, not restarting. please check"
  exit 1
fi

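Run this script from a scheduled task and host monitoring is added automatically. The script can also be modified to email the administrator each time a new machine is found.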
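
As an illustration of the scheduled task, the sketch below prints a cron line in the /etc/cron.d format, which includes a user field. The ten-minute interval and the 192.168.1.0/24 network are assumptions chosen for the example, and cron_entry is a hypothetical helper.

```shell
#!/usr/bin/env bash
# cron_entry is a hypothetical helper: print an /etc/cron.d style line that
# runs find-hosts.sh as root every N minutes for one network.
cron_entry() {
  local minutes=$1 network=$2
  printf '*/%s * * * * root /etc/nagios/bin/find-hosts.sh %s\n' "$minutes" "$network"
}

# Assumed schedule: every 10 minutes, scanning 192.168.1.0/24
cron_entry 10 192.168.1.0/24
```

Redirecting the output to a file under /etc/cron.d (the file name is up to you) installs the job.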

II. Automatic service discovery

Use nmap + check_ncpa for automatic service discovery

1. Download check_ncpa

wget https://assets.nagios.com/downloads/ncpa/check_ncpa.tar.gz
tar zxvf check_ncpa.tar.gz
cp check_ncpa.py /usr/lib64/nagios/plugins/
cp check_ncpa.py /usr/bin/

2. Configure check_ncpa

Create the file /etc/nagios/conf.d/check_ncpa.cfg with the following content:

# 'check_ncpa' command definition
define command{
  command_name check_ncpa
  command_line $USER1$/check_ncpa.py -H $HOSTADDRESS$ -P 5693 -t mytoken $ARG1$
}

3. Test check_ncpa.py

python check_ncpa.py -H 192.168.1.50 -P 5693 -t mytoken -l
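
When testing by hand, the exit code of the plugin matters as much as the text it prints, because nagios derives the service state from it. The following is a minimal sketch of the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). state_name is a hypothetical helper, and a subshell stands in for a real check.

```shell
#!/usr/bin/env bash
# state_name is a hypothetical helper: translate a plugin exit code into the
# state name defined by the nagios plugin convention.
state_name() {
  case "$1" in
    0) echo OK ;;
    1) echo WARNING ;;
    2) echo CRITICAL ;;
    3) echo UNKNOWN ;;
    *) echo INVALID ;;   # anything else is outside the convention
  esac
}

# Stand-in for a real check: a subshell that exits with code 2
rc=0
( exit 2 ) || rc=$?
state_name "$rc"
```

With a real client, the subshell would be replaced by the check_ncpa.py command being tested.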

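4. Create the service discovery templates

Routine checks fall into two kinds: basic ones such as CPU, swap, load and disk, and services such as nginx.

Create the file /etc/nagios/template/ncpa-service.cfg with the following content: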

define service {
    host_name                              HOST
    service_description                    SERVICE
    check_command                          check_ncpa!-M service/SERVICE
    max_check_attempts                     3
    check_interval                         5
    retry_interval                         1
    check_period                           24x7
    notification_interval                  60
    notification_period                    24x7
    contacts                               nagiosadmin
}

Create the file /etc/nagios/template/ncpa-basic.cfg with the following content:

# Monitor uptime to catch unexpected reboots
define service {
    host_name                    HOST
    service_description          system uptime
    check_command                check_ncpa!-M system/uptime -w @60:120 -c @1:60
    max_check_attempts           3
    check_interval               5
    retry_interval               1
    check_period                 24x7
    notification_interval        60
    notification_period          24x7
    contacts                     nagiosadmin
}
# Monitor CPU usage
define service {
    host_name                    HOST
    service_description          CPU Usage
    check_command                check_ncpa!-M cpu/percent -w 50 -c 80 -q 'aggregate=avg'
    max_check_attempts           3
    check_interval               5
    retry_interval               1
    check_period                 24x7
    notification_interval        60
    notification_period          24x7
    contacts                     nagiosadmin
}
# Monitor swap
define service {
    host_name                    HOST
    service_description          swap Usage
    check_command                check_ncpa!-M memory/swap -w 512 -c 1024 -u mb
    max_check_attempts           3
    check_interval               5
    retry_interval               1
    check_period                 24x7
    notification_interval        60
    notification_period          24x7
    contacts                     nagiosadmin
}
# Monitor the total number of processes
define service {
    host_name                    HOST
    service_description          Process Count
    check_command                check_ncpa!-M processes -w 500 -c 1000
    max_check_attempts           3
    check_interval               5
    retry_interval               1
    check_period                 24x7
    notification_interval        60
    notification_period          24x7
    contacts                     nagiosadmin
}

# Monitor disk space
define service {
    host_name                    HOST
    service_description          Disk Usage
    check_command                check_ncpa!-M 'plugins/check_disk' -a "-w 20 -c 10 --local"
    max_check_attempts           3
    check_interval               5
    retry_interval               1
    check_period                 24x7
    notification_interval        60
    notification_period          24x7
    contacts                     nagiosadmin
}

# Monitor system load
define service {
    host_name                    HOST
    service_description          Load average
    check_command                check_ncpa!-M 'plugins/check_load' -a "-w 8,4,4 -c 12,8,8"
    max_check_attempts           3
    check_interval               5
    retry_interval               1
    check_period                 24x7
    notification_interval        60
    notification_period          24x7
    contacts                     nagiosadmin
}

# Monitor zombie processes
define service {
    host_name               HOST
    service_description     Zombie Processes
    check_command           check_ncpa!-M 'plugins/check_procs' -a "-w 3 -c 5 -s Z"
    max_check_attempts      3
    check_interval          5
    retry_interval          1
    check_period            24x7
    notification_interval   60
    notification_period     24x7
    contacts                nagiosadmin
}
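
Thresholds such as @60:120 in the uptime check use the nagios plugin range format: a plain range alerts when the value falls outside it, and a leading @ alerts when the value falls inside it. The sketch below evaluates that format for integers only. It illustrates the convention and is not ncpa's own implementation; in_alert_range is a hypothetical helper.

```shell
#!/usr/bin/env bash
# in_alert_range is a hypothetical helper that evaluates an integer value against
# a threshold written in the nagios plugin range format: [@]start:end
# Returns 0 (true) when the value should raise an alert, 1 otherwise.
in_alert_range() {
  local value=$1 range=$2
  local inside_alerts=0 outside=0 start end

  # A leading @ inverts the rule: alert when the value is inside the range
  if [[ $range == @* ]]; then
    inside_alerts=1
    range=${range#@}
  fi

  if [[ $range == *:* ]]; then
    start=${range%%:*}
    end=${range#*:}
  else
    # A bare number N means the range 0:N
    start=0
    end=$range
  fi
  start=${start:-0}

  # "~" as start means negative infinity, an empty end means positive infinity
  if [[ $start != "~" ]] && (( value < start )); then outside=1; fi
  if [[ -n $end ]] && (( value > end )); then outside=1; fi

  if (( inside_alerts )); then
    (( ! outside ))
  else
    (( outside ))
  fi
}

# The uptime thresholds from the template: critical @1:60, warning @60:120
for uptime in 30 90 300; do
  if in_alert_range "$uptime" "@1:60"; then
    echo "uptime=${uptime}s critical"
  elif in_alert_range "$uptime" "@60:120"; then
    echo "uptime=${uptime}s warning"
  else
    echo "uptime=${uptime}s ok"
  fi
done
```

Read this way, the uptime check is critical during the first minute after a reboot, warning during the second, and OK afterwards, while -w 50 -c 80 on the CPU check alerts above 50 and 80.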

Create the auto-discovery script /etc/nagios/bin/find-ncpa.sh with the following content:

#!/usr/bin/env bash
# Scan for hosts running the ncpa listener and generate their service definitions.

if [ ! -f /usr/bin/nmap ];then
  yum install nmap -y
fi

network=$1

usage() {
  echo -e "\e[1;31mUsage: $0 [ip|ip-range|network] \e[0m"
  echo -e "example1: \e[1;32m $0 192.168.0.100 \e[0m"
  echo -e "example2: \e[1;32m $0 192.168.1.1-200 \e[0m"
  echo -e "example3: \e[1;32m $0 192.168.2.0/24 \e[0m"
  echo
  exit 0
}

if [ -z "$network" ];then
  usage
fi


dir="/etc/nagios/services"
ncpa_basic_template="/etc/nagios/template/ncpa-basic.cfg"
ncpa_service_template="/etc/nagios/template/ncpa-service.cfg"

# Hosts with TCP port 5693 open are assumed to run the ncpa listener
nmap -sS -p 5693 --open "$network" |awk '/Nmap scan report for/{print $5}' > /tmp/ncpa_hosts.txt


while read -r host;do
  if [ ! -f "$dir/$host.cfg" ];then
    # Basic checks (CPU, swap, load and so on) come first
    touch "$dir/$host.cfg"
    sed "s/HOST/$host/g" "$ncpa_basic_template" >> "$dir/$host.cfg"
    # List the services ncpa reports as running, skipping templated (@) and systemd units
    /usr/bin/check_ncpa.py -H "$host" -t mytoken -M services -l |awk '/running/{print $1}' |tr -d \" |tr -d \: |egrep -v "@|systemd" > "/tmp/$host.servicelist.txt"

    # One service definition per running service
    while read -r service;do
      sed -e "s/HOST/$host/g" -e "s/SERVICE/$service/g"  "$ncpa_service_template" >> "$dir/$host.cfg"
    done < "/tmp/$host.servicelist.txt"
    rm -f "/tmp/$host.servicelist.txt"
  fi
done < /tmp/ncpa_hosts.txt

rm -f /tmp/ncpa_hosts.txt

# Restart nagios only when the new configuration passes validation
if nagios -v /etc/nagios/nagios.cfg | grep -q "Things look okay";then
  echo "nagios configuration is OK"
  sleep 1
  service nagios restart
  echo "nagios restart successfully"
else
  echo "nagios configuration check failed, not restarting. please check"
  exit 1
fi
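
The script takes the fifth field of each "Nmap scan report for" line as the address. When nmap can resolve a name for a host, that line contains the name followed by the IP in parentheses, so the fifth field becomes the name. The following is a minimal sketch of a parser that handles both shapes. scan_report_ips is a hypothetical helper, and the sample text with its host name is made up for the demonstration.

```shell
#!/usr/bin/env bash
# scan_report_ips is a hypothetical helper: read nmap's normal output on stdin
# and print one IP address per "Nmap scan report for" line.
scan_report_ips() {
  awk '/^Nmap scan report for/ {
    ip = $NF              # last field: 1.2.3.4 or (1.2.3.4)
    gsub(/[()]/, "", ip)  # drop the parentheses if present
    print ip
  }'
}

# Sample input imitating both line shapes; the host name is invented
sample='Nmap scan report for 192.168.1.50
Host is up (0.00040s latency).
Nmap scan report for web01.example.lan (192.168.1.51)
Host is up (0.00051s latency).'

printf '%s\n' "$sample" | scan_report_ips
```

Another option is to pass -n to nmap, which turns off DNS resolution so that every report line carries a bare IP.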

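Business monitoring

Automatic discovery takes away much of the work, but checks specific to a business application still have to be added by hand.

For example, check whether nginx has been restarted (whether it has been running for more than 1800 seconds):

# Monitor how long the process has been running
define service {
    host_name                      HOST
    service_description            nginx uptime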
    check_command                  check_ncpa!-M plugins/check_procs -a "-a nginx -m ELAPSED -w @1800:3600 -c @1:1800"
    max_check_attempts             3
    check_interval                 5
    retry_interval                 1
    check_period                   24x7
    notification_interval          60
    notification_period            24x7
    contacts                       nagiosadmin
}

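php-fpm uses a dynamic process model: a master process is started as root, the child processes are owned by an unprivileged user, and their number changes over time. Monitoring the running time of the master process is therefore enough, and the definition follows the same pattern:

# Monitor php-fpm
define service {
    host_name                     HOST
    service_description           php-fpm uptime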
    check_command                 check_ncpa!-M plugins/check_procs -a "-u root -a php-fpm -m ELAPSED -w @1800:3600 -c @1:1800"
    max_check_attempts            3
    check_interval                5
    retry_interval                1
    check_period                  24x7
    notification_interval         60
    notification_period           24x7
    contacts                      nagiosadmin
}
