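ganglia is a monitoring system that watches and displays status information for the nodes in a cluster, such as CPU, memory, disk utilization, I/O load, and network traffic. It can also present historical data as graphs through PHP pages.
A ganglia server can collect the data of every client on the same network segment through a single client, and a ganglia cluster server can collect the data of all the clients below it through a single server.
ganglia depends on a web server to display the cluster status, uses rrdtool to store data and draw the graphs, needs expat for XML parsing, and needs libconfuse to parse its configuration files. The Apache httpd installation must also support PHP 4 or later, and a few other packages are required as well.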
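To make the data flow a little more concrete, here is a minimal Python sketch of reading the kind of XML report that gmond publishes (on TCP port 8649 by default) and that the server side collects. The sample document is hand-written and heavily abbreviated for illustration, not captured output, and the function name `metrics_by_host` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample in the shape of a gmond XML report.
# Real reports carry many more attributes and metrics.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.4.0" SOURCE="gmond">
  <CLUSTER NAME="wei cluster">
    <HOST NAME="server2.example.com" IP="172.25.85.2">
      <METRIC NAME="mem_free" VAL="51234" TYPE="float" UNITS="KB"/>
      <METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=""/>
    </HOST>
    <HOST NAME="server3.example.com" IP="172.25.85.3">
      <METRIC NAME="mem_free" VAL="40960" TYPE="float" UNITS="KB"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def metrics_by_host(xml_text):
    """Return {host_name: {metric_name: value_string}} from a gmond-style report."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result


if __name__ == "__main__":
    for name, metrics in sorted(metrics_by_host(SAMPLE_XML).items()):
        print(name, metrics)
```

Values are kept as strings on purpose, since a report can mix numeric and textual metrics.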
server2.example.com 172.25.85.2
server3.example.com 172.25.85.3
This ganglia setup is built on top of the previously installed nagios.
1. Installing and configuring ganglia
server2 (monitoring host):
yum install rpm-build -y
rpmbuild -tb ganglia-3.4.0.tar.gz ### resolve the missing dependencies it reports
yum install gcc-c++ python-devel pcre-devel expat-devel apr-devel \
    rrdtool-devel-1.3.8-6.el6.x86_64.rpm \
    libconfuse-2.6-3.el6.x86_64.rpm \
    libconfuse-devel-2.6-3.el6.x86_64.rpm -y
rpmbuild -tb ganglia-3.4.0.tar.gz
rpmbuild -tb ganglia-web-3.4.2.tar.gz
cd /root/rpmbuild/RPMS/noarch
yum install php php-gd -y
rpm -ivh ganglia-web-3.4.2-1.noarch.rpm
cd /root/rpmbuild/RPMS/x86_64
ls
ganglia-devel-3.4.0-1.x86_64.rpm ganglia-gmetad-3.4.0-1.x86_64.rpm ganglia-gmond-3.4.0-1.x86_64.rpm ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm libganglia-3.4.0-1.x86_64.rpm
rpm -ivh *
scp ganglia-gmond-3.4.0-1.x86_64.rpm ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm libganglia-3.4.0-1.x86_64.rpm 172.25.85.3:/root
cd /root
scp libconfuse-2.6-3.el6.x86_64.rpm libconfuse-devel-2.6-3.el6.x86_64.rpm 172.25.85.3:/root
server3:
cd /root
rpm -ivh ganglia-gmond-3.4.0-1.x86_64.rpm \
    ganglia-gmond-modules-python-3.4.0-1.x86_64.rpm \
    libganglia-3.4.0-1.x86_64.rpm \
    libconfuse-2.6-3.el6.x86_64.rpm \
    libconfuse-devel-2.6-3.el6.x86_64.rpm
server2:
vim /etc/ganglia/gmetad.conf
data_source "wei cluster" localhost
/etc/init.d/gmetad start
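As a rough illustration of what the data_source line in gmetad.conf carries, the Python sketch below splits such a line into the source name, an optional polling interval, and the list of hosts to poll. It assumes the documented form `data_source "name" [polling interval] host[:port] ...` with a default interval of 15 seconds and a default port of 8649. The function name `parse_data_source` is hypothetical and this is not how gmetad itself parses the file.

```python
import shlex

DEFAULT_PORT = 8649      # assumed default gmond port
DEFAULT_INTERVAL = 15    # assumed default polling interval, in seconds


def parse_data_source(line):
    """Split a gmetad.conf data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)  # shlex keeps the quoted name "wei cluster" together
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = tokens[1], tokens[2:]
    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():       # an all-digit token right after the name is the interval
        interval = int(rest[0])
        rest = rest[1:]
    hosts = []
    for item in rest:
        host, _, port = item.partition(":")
        hosts.append((host, int(port) if port else DEFAULT_PORT))
    return name, interval, hosts


if __name__ == "__main__":
    print(parse_data_source('data_source "wei cluster" localhost'))
```

Listing more than one host on the line gives gmetad alternatives to poll for the same cluster, which is why the result is a list.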
vim /etc/ganglia/gmond.conf
/etc/init.d/gmond start
server3:
vim /etc/ganglia/gmond.conf
/etc/init.d/gmond start
server2:
cd /var/www/html/gweb
ls
/etc/init.d/httpd start
Verify:
http://172.25.85.2/gweb
2.server2:
cd /var/lib/ganglia/rrds
cd wei\ cluster/
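The directories under rrds are worth a look because gmetad keeps one .rrd file per metric in each host directory, so the file names there are the metric names that can later be passed to check_ganglia.py with -m. A small Python sketch of that lookup follows. The helper name `list_metrics` is hypothetical, and the demo uses a throwaway directory so that it runs without ganglia installed.

```python
import os
import tempfile


def list_metrics(host_dir):
    """Return sorted metric names, one per <metric>.rrd file found in host_dir."""
    return sorted(
        os.path.splitext(name)[0]
        for name in os.listdir(host_dir)
        if name.endswith(".rrd")
    )


if __name__ == "__main__":
    # Stand-in for /var/lib/ganglia/rrds/<cluster>/<host>; the files are empty placeholders.
    demo_dir = tempfile.mkdtemp()
    for metric in ("mem_free", "disk_free_percent_rootfs"):
        open(os.path.join(demo_dir, metric + ".rrd"), "w").close()
    print(list_metrics(demo_dir))
```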
cd /root
tar zxf ganglia-3.4.0.tar.gz
cd /root/ganglia-3.4.0/contrib
cp check_ganglia.py /usr/local/nagios/libexec/
cd /usr/local/nagios/libexec/
chown nagios.nagios check_ganglia.py
cd /usr/local/nagios/libexec
vim check_ganglia.py
ganglia_host = '172.25.85.2'
if critical > warning:
    # Higher values are worse (for example load or usage).
    if value >= critical:
        print "CHECKGANGLIA CRITICAL: %s is %.2f" % (metric, value)
        sys.exit(2)
    elif value >= warning:
        print "CHECKGANGLIA WARNING: %s is %.2f" % (metric, value)
        sys.exit(1)
    else:
        print "CHECKGANGLIA OK: %s is %.2f" % (metric, value)
        sys.exit(0)
else:
    # Lower values are worse (for example free space or free memory).
    if critical >= value:
        print "CHECKGANGLIA CRITICAL: %s is %.2f" % (metric, value)
        sys.exit(2)
    elif warning >= value:
        print "CHECKGANGLIA WARNING: %s is %.2f" % (metric, value)
        sys.exit(1)
    else:
        print "CHECKGANGLIA OK: %s is %.2f" % (metric, value)
        sys.exit(0)
./check_ganglia.py -h server2.example.com -m disk_free_percent_rootfs -w 20 -c 10
cd /var/lib/ganglia/rrds/wei\ cluster/server2.example.com
/usr/local/nagios/libexec/check_ganglia.py -h server2.example.com -m disk_free_percent_rootfs -w 20 -c 10
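The threshold logic edited into check_ganglia.py can be tried on its own. The sketch below restates it as a standalone Python function that returns the exit code nagios expects from a plugin (0 for OK, 1 for WARNING, 2 for CRITICAL). The function name `classify` is hypothetical and is not part of the script.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # exit codes nagios expects from a plugin


def classify(value, warning, critical):
    """Map a metric value to a nagios state, in either threshold direction."""
    if critical > warning:
        # Higher is worse, e.g. -w 80 -c 90 on a usage metric.
        if value >= critical:
            return CRITICAL
        if value >= warning:
            return WARNING
        return OK
    # Lower is worse, e.g. -w 20 -c 10 on disk_free_percent_rootfs.
    if critical >= value:
        return CRITICAL
    if warning >= value:
        return WARNING
    return OK


if __name__ == "__main__":
    for sample in (35.0, 15.0, 8.0):
        print(sample, classify(sample, warning=20, critical=10))
```

With -w 20 -c 10 the critical threshold is below the warning threshold, so the second branch applies and a shrinking free-space percentage moves the service from OK to WARNING to CRITICAL.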
server2:
cd /usr/local/nagios/etc/objects
vim commands.cfg
define command {
    command_name    check_ganglia
    command_line    $USER1$/check_ganglia.py -h $HOSTADDRESS$ -m $ARG1$ -w $ARG2$ -c $ARG3$
}
vim host.cfg
define host {
    use             linux-server
    host_name       server4.example.com
    address         172.25.85.4
}
define hostgroup {
    hostgroup_name  linux-servers   ; The name of the hostgroup
    alias           Linux Servers   ; Long name of the group
    members         server2.example.com,server3.example.com   ; Comma separated list of hosts that belong to this group
}
define hostgroup {
    hostgroup_name  ganglia-servers
    alias           ganglia-servers
    members         server4.example.com
}
vim service.cfg
define servicegroup {
    servicegroup_name   ganglia-metrics
    alias               Ganglia Metrics
}
define service {
    use                 ganglia-server
    service_description Root partition free percentage
    check_command       check_ganglia!disk_free_percent_rootfs!20!10
}
define service {
    use                 ganglia-server
    service_description Free memory
    check_command       check_ganglia!mem_free!50000!30000
}
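To see how a check_command value such as check_ganglia!disk_free_percent_rootfs!20!10 lines up with the command_line of the check_ganglia command, the Python sketch below imitates the macro substitution: the text before the first ! selects the command, and the remaining fields become $ARG1$, $ARG2$, and so on. This is only an illustration of the idea, not nagios code. The function name `expand_check_command` is hypothetical, and the value used for $USER1$ is an assumption (the libexec directory the script was copied to).

```python
def expand_check_command(check_command, command_line, user1, host_address):
    """Return (command_name, expanded_command_line) for a check_command value."""
    parts = check_command.split("!")
    name, args = parts[0], parts[1:]
    macros = {"$USER1$": user1, "$HOSTADDRESS$": host_address}
    for index, arg in enumerate(args, start=1):
        macros["$ARG%d$" % index] = arg   # $ARG1$, $ARG2$, ...
    expanded = command_line
    for macro, value in macros.items():
        expanded = expanded.replace(macro, value)
    return name, expanded


if __name__ == "__main__":
    name, line = expand_check_command(
        "check_ganglia!disk_free_percent_rootfs!20!10",
        "$USER1$/check_ganglia.py -h $HOSTADDRESS$ -m $ARG1$ -w $ARG2$ -c $ARG3$",
        user1="/usr/local/nagios/libexec",   # assumed value of $USER1$
        host_address="172.25.85.4",          # address of server4.example.com
    )
    print(name)
    print(line)
```

The expanded line has the same shape as the manual test run of check_ganglia.py, which makes it easy to reproduce a failing service check by hand.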
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
/etc/init.d/nagios reload
http://172.25.85.2/nagios
The resources of server4 are now being monitored.