Environment:
System
CentOS 6.0
The Hadoop cluster has 3 servers
server01 -> master 192.168.255.128
server02 -> slave 192.168.255.130
server03 -> slave 192.168.255.131
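Software repository: EPEL
Use the ganglia packages from the EPEL repository directly (compiling and installing it yourself is a bit of a hassle).
1. Install the EPEL repository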
- wget http://download.fedora.redhat.com/pub/epel/6/x86_64/epel-release-6-5.noarch.rpm -P /usr/local/src
- rpm -ivh /usr/local/src/epel-release-6-5.noarch.rpm
- rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
2. Install gmetad and gmond on the Ganglia server
- yum install ganglia ganglia-devel ganglia-gmetad ganglia-gmond ganglia-web ganglia-gmond-python
The required dependencies are installed automatically.
3. The other servers (acting as clients) only need gmond
- yum install ganglia ganglia-gmond
4. Configure Ganglia's gmetad
- cd /etc/ganglia
- vi gmetad.conf
- data_source "ganglia_hadoop" 192.168.255.128 192.168.255.130 192.168.255.131
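Only the data_source line needs to be changed. Its format is:
data_source "name" ip01:port01 ip02:port02 ...
Note: the IP addresses that follow the name are the monitored hosts, and the number after each colon is the gmond port that gmetad connects to on that host (8649 by default, which is also what is used when the port is omitted, as in the line above).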
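As a small illustration of the format described above, the following sketch assembles a data_source line with explicit ports from a host list. The function name make_data_source is made up for this example and is not part of Ganglia; the cluster name and IP addresses are the ones used in this article.

```shell
#!/bin/sh
# Build a gmetad data_source line from a cluster name, a port, and a host list.
# make_data_source is a hypothetical helper, not something Ganglia ships.
make_data_source() {
    name=$1
    port=$2
    shift 2
    line="data_source \"$name\""
    for host in "$@"; do
        line="$line $host:$port"
    done
    printf '%s\n' "$line"
}

# Example: the three servers from this article on the default gmond port.
make_data_source ganglia_hadoop 8649 \
    192.168.255.128 192.168.255.130 192.168.255.131
```

The printed line can be pasted into /etc/ganglia/gmetad.conf in place of the port-less form.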
Start the service and enable it at boot
- service gmetad start
- chkconfig gmetad on
5. Configure the gmond client on all servers (using multicast)
- vi /etc/ganglia/gmond.conf
- cluster {
- name = "ganglia_hadoop"
- ...
Just set the cluster name to the name used in gmetad's data_source line.
Start the service
- service gmond start
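To double-check the naming rule from this step, a rough sketch like the one below compares the cluster name in gmond.conf with the data_source names in gmetad.conf. The helper names are invented for this example, and it assumes the cluster block contains the first uncommented name = "..." line in gmond.conf, which may not hold for a heavily customised file.

```shell
#!/bin/sh
# Hypothetical helpers for comparing Ganglia config names.
# Assumption: the cluster block holds the first name = "..." line in gmond.conf.

# Print the first quoted name = "..." value found in a gmond.conf file.
gmond_cluster_name() {
    sed -n 's/^[[:space:]]*name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Print every uncommented data_source name in a gmetad.conf file, one per line.
gmetad_source_names() {
    sed -n 's/^[[:space:]]*data_source[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Succeed only if the gmond cluster name equals one of the data_source names.
names_match() {
    cluster=$(gmond_cluster_name "$1")
    [ -n "$cluster" ] && gmetad_source_names "$2" | grep -qxF -- "$cluster"
}

# Usage on the Ganglia server (paths from this article):
#   names_match /etc/ganglia/gmond.conf /etc/ganglia/gmetad.conf && echo "names match"
```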
6. Configure nginx
- vi /usr/local/nginx/conf/vhosts/ganglia.conf
- server
- {
- listen 80;
- server_name your.domain.name;
- index index.html index.htm index.php;
- root /usr/share/ganglia;
- location ~ ^(.*)\/\.svn\/
- {
- deny all;
- }
- location ~ .*\.(php|php5)?$
- {
- # fastcgi_pass unix:/tmp/php-cgi.sock;
- fastcgi_pass php_server01;
- fastcgi_index index.php;
- include fcgi.conf;
- }
- location ~ .*\.(gif|jpg|jpeg|png|bmp|swf)$
- {
- expires 30d;
- access_log off;
- }
- location ~ .*\.(js|css)?$
- {
- expires 1h;
- access_log off;
- }
- log_format ganglia '$remote_addr - $remote_user [$time_local] [$request_time] "$request"'
- '$status $body_bytes_sent "$http_referer"'
- '"$http_user_agent" $http_x_forwarded_for';
- access_log off;
- }
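The document root is /usr/share/ganglia.
You can additionally use nginx to require a username and password and to restrict access by IP.
Visit http://your.domain.name
This produces an error: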
Notice: Undefined variable: private in /usr/share/ganglia/auth.php on line 27
The cause: my php-fpm runs as the user nobody, and auth.php calls fopen on the file private_clusters, which is a symlink to /etc/ganglia/private_clusters. Check who owns that file:
- ls -l /etc/ganglia/private_clusters
- -rw-r----- 1 root apache 1222 Feb 17 2010 /etc/ganglia/private_clusters
The group owner is apache, so php-fpm cannot read the file. Changing the group to the one php-fpm runs as fixes it.
- chown root:nobody /etc/ganglia/private_clusters
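To confirm that the ownership change took effect, a quick check along these lines may help. The helper names are invented for this example, and it relies on GNU stat, which is an assumption about the system (it is what CentOS ships).

```shell
#!/bin/sh
# Hypothetical helpers; they rely on GNU stat (-c), as found on CentOS.

# Print "owner:group" of a file.
owner_and_group() {
    stat -c '%U:%G' "$1"
}

# Succeed if the file's group has read permission.
group_can_read() {
    perms=$(stat -c '%A' "$1") || return 1   # e.g. -rw-r-----
    [ "$(printf '%s' "$perms" | cut -c5)" = "r" ]
}

# Usage after the chown above:
#   owner_and_group /etc/ganglia/private_clusters
#   group_can_read /etc/ganglia/private_clusters && echo "group can read"
```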
7. Monitor Hadoop
The Hadoop version I use is hadoop-0.20.205.0.tar.gz. In this version the Ganglia settings have moved to hadoop-metrics2.properties.
Edit the configuration file
- vi $HADOOP_HOME/conf/hadoop-metrics2.properties
- # for Ganglia 3.1 support
- *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
- *.sink.ganglia.period=10
- # default for supportsparse is false
- *.sink.ganglia.supportsparse=true
- *.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
- *.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
- namenode.sink.ganglia.servers=239.2.11.71:8649
- datanode.sink.ganglia.servers=239.2.11.71:8649
- jobtracker.sink.ganglia.servers=239.2.11.71:8649
- tasktracker.sink.ganglia.servers=239.2.11.71:8649
- maptask.sink.ganglia.servers=239.2.11.71:8649
- reducetask.sink.ganglia.servers=239.2.11.71:8649
You only need to uncomment the relevant lines in the Ganglia section of the file.
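If gmond in your setup listens on a different address than the one shown above (239.2.11.71:8649 is, as far as I know, gmond's default multicast channel), the six servers lines can be regenerated rather than edited by hand. This is only a sketch; ganglia_server_lines is a made-up helper and not part of Hadoop.

```shell
#!/bin/sh
# Print one <daemon>.sink.ganglia.servers line per Hadoop daemon prefix.
# ganglia_server_lines is a hypothetical helper, not part of Hadoop.
ganglia_server_lines() {
    target=$1   # host:port of the gmond channel, e.g. 239.2.11.71:8649
    for daemon in namenode datanode jobtracker tasktracker maptask reducetask; do
        printf '%s.sink.ganglia.servers=%s\n' "$daemon" "$target"
    done
}

# Example: reproduce the six lines used in this article.
ganglia_server_lines 239.2.11.71:8649
```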
Note: which of the following class lines you leave commented out depends on your Ganglia version
# for Ganglia 3.0 support
# *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink30
#
# for Ganglia 3.1 support
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
The hadoop-metrics2.properties file must be changed on every server in the Hadoop cluster.
Restart Hadoop
- stop-all.sh
- start-all.sh
8. Open the Ganglia monitoring page and you will see the Hadoop metrics
For example: dfs.datanode metrics
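The same information can be checked from the command line, since gmond dumps its state as XML over TCP. The sketch below filters that XML for metric names with a given prefix. list_metrics is a made-up helper, and the usage line assumes nc is installed and that gmond accepts TCP connections on its default port 8649.

```shell
#!/bin/sh
# Hypothetical helper: read Ganglia XML on stdin and print the names of
# METRIC elements whose NAME attribute starts with the given prefix.
list_metrics() {
    prefix=$1
    grep -o '<METRIC NAME="[^"]*"' \
        | sed 's/^<METRIC NAME="//; s/"$//' \
        | awk -v p="$prefix" 'index($0, p) == 1' \
        | sort -u
}

# Usage against a live gmond (assumes nc is installed):
#   nc 192.168.255.128 8649 | list_metrics dfs.
```

If the list comes back empty after Hadoop has been restarted, the servers lines in hadoop-metrics2.properties are the first thing to re-check.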