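This article is reposted from: http://www.ttlsa.com/mongodb/nagios-check_mongodb-plugin-to-monitor-mongodb/
Any service that runs in production needs monitoring to match: you have to be able to tell whether the service is healthy and to collect the metrics that describe it.
This article shows how to monitor a MongoDB database with nagios-plugin-mongodb: https://github.com/mzupan/nagios-plugin-mongodb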
1. Download the check_mongodb Nagios plugin
|
# cd /usr/local/nagios/libexec/
# wget --no-check-certificate https://github.com/mzupan/nagios-plugin-mongodb/archive/master.zip
# unzip master.zip
# mv nagios-plugin-mongodb-master nagios-plugin-mongodb
# chown -R nagios:nagios nagios-plugin-mongodb/
|
2. Install the MongoDB Python driver
The EPEL repository must be enabled first; see the article 《CentOS / RHCE 可供使用的yum》.
|
# yum install pymongo.x86_64
|
Alternatively, download the source package and build it yourself.
|
# wget --no-check-certificate -O mongo-python-driver-master.zip https://github.com/mongodb/mongo-python-driver/archive/master.zip
# unzip mongo-python-driver-master.zip
# cd mongo-python-driver-master
# python setup.py install
|
Or install it with Python's easy_install (easy_install pymongo).
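Whichever installation route is used, it is worth confirming that the interpreter that will run the plugin can actually find the driver. The following is a minimal sketch of such a check; the helper name `driver_installed` is made up for illustration and is not part of the plugin.

```python
import importlib.util


def driver_installed(module_name="pymongo"):
    """Return True if this interpreter can locate the named module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if driver_installed():
        print("pymongo is importable")
    else:
        print("pymongo is missing; install it via yum, from source, or easy_install")
```

Run it with the same Python binary that Nagios uses for check_mongodb.py, since a driver installed for a different interpreter will not be found.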
3. check_mongodb.py options
|
# ./check_mongodb.py --help
usage: check_mongodb.py [options]

This Nagios plugin checks the health of mongodb.

options:
  -h, --help            show this help message and exit
  -H HOST, --host=HOST  The hostname you want to connect to
  -P PORT, --port=PORT  The port mongodb is runnung on
  -u USER, --user=USER  The username you want to login as
  -p PASSWD, --pass=PASSWD
                        The password you want to use for that user
  -W WARNING, --warning=WARNING
                        The warning threshold we want to set
  -C CRITICAL, --critical=CRITICAL
                        The critical threshold we want to set
  -A ACTION, --action=ACTION
                        The action you want to take
  --max-lag             Get max replication lag (for replication_lag action
                        only)
  --mapped-memory       Get mapped memory instead of resident (if resident
                        memory can not be read)
  -D, --perf-data       Enable output of Nagios performance data
  -d DATABASE, --database=DATABASE
                        Specify the database to check
  --all-databases       Check all databases (action database_size)
  -s, --ssl             Connect using SSL
  -r, --replicaset      Connect to replicaset
  -q QUERY_TYPE, --querytype=QUERY_TYPE
                        The query type to check
                        [query|insert|update|delete|getmore|command] from
                        queries_per_second
  -c COLLECTION, --collection=COLLECTION
                        Specify the collection to check
  -T SAMPLE_TIME, --time=SAMPLE_TIME
                        Time used to sample number of pages faults
|
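To make the option list concrete, here is a small sketch that assembles a check_mongodb.py command line from named parameters. It uses only flags that appear in the help output above; the function name `build_check_argv` and the default plugin path are assumptions made for this example.

```python
def build_check_argv(host, port, action, warning=None, critical=None,
                     user=None, password=None, database=None,
                     collection=None, perf_data=False,
                     plugin="./check_mongodb.py"):
    """Build an argument list for check_mongodb.py (hypothetical helper)."""
    argv = [plugin, "-H", host, "-P", str(port), "-A", action]
    if user is not None:
        argv += ["-u", user]
    if password is not None:
        argv += ["-p", password]
    if warning is not None:
        argv += ["-W", str(warning)]
    if critical is not None:
        argv += ["-C", str(critical)]
    if database is not None:
        argv += ["-d", database]
    if collection is not None:
        argv += ["-c", collection]
    if perf_data:
        argv.append("-D")  # ask the plugin for Nagios performance data
    return argv


if __name__ == "__main__":
    print(" ".join(build_check_argv("10.1.11.155", 27017, "connect",
                                    warning=2, critical=4)))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting problems.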
All actions of the Nagios MongoDB monitoring plugin:
Pass any one of the following actions with the -A option: 'connect', 'connections', 'replication_lag', 'replication_lag_percent', 'replset_state', 'memory', 'memory_mapped', 'lock', 'flushing', 'last_flush_time', 'index_miss_ratio', 'databases', 'collections', 'database_size', 'database_indexes', 'collection_indexes', 'queues', 'oplog', 'journal_commits_in_wl', 'write_data_files', 'journaled', 'opcounters', 'current_lock', 'replica_primary', 'page_faults', 'asserts', 'queries_per_second', 'chunks_balance', 'connect_primary', 'collection_state', 'row_count'
|
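connect - the default action; checks that a connection can be made
connections - checks the percentage of open database connections
memory - checks memory usage
memory_mapped - checks mapped memory usage
lock - checks the percentage of time spent locked
flushing - checks the average flush time (in milliseconds)
last_flush_time - checks the last flush time (in milliseconds)
index_miss_ratio - checks the index miss ratio
databases - checks the total number of databases
collections - checks the total number of collections
database_size - checks the size of a specific database
database_indexes - checks the index size of a specific database
collection_indexes - checks the index size of a collection
replication_lag - checks the replication lag (in seconds)
replication_lag_percent - checks the replication lag (as a percentage)
replset_state - checks the state of the replica set
replica_primary - checks the primary server of the replica set
queries_per_second - checks the number of queries per second
connect_primary - checks the connection to the primary server of a set
collection_state - checks the state of a specific collection in a database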
|
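Most of these actions compare a measured value with the -W and -C thresholds and report the outcome through the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below shows that convention for a "higher is worse" metric. It is an illustration only: the helper names are hypothetical, and the plugin's own comparison logic may differ per action.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING",
                CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def nagios_status(value, warning, critical):
    """Map a 'higher is worse' value onto a Nagios exit code."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def format_result(label, value, warning, critical):
    """Return (exit_code, status_line) in the usual plugin output style."""
    status = nagios_status(value, warning, critical)
    return status, "%s - %s: %s" % (STATUS_NAMES[status], label, value)


if __name__ == "__main__":
    code, line = format_result("connect time", 3, 2, 4)
    print(line)
    raise SystemExit(code)
```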
4. Define the Nagios commands
|
# vim /usr/local/nagios/etc/objects/commands.cfg
define command {
    command_name    check_mongodb
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$
}

define command {
    command_name    check_mongodb_database
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -d $ARG8$
}

define command {
    command_name    check_mongodb_collection
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -d $ARG8$ -c $ARG9$
}

define command {
    command_name    check_mongodb_replicaset
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -r $ARG8$
}

define command {
    command_name    check_mongodb_query
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -q $ARG8$
}
|
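In a service definition, the check_command value is the command name followed by "!"-separated arguments, which Nagios substitutes for $ARG1$, $ARG2$ and so on in the command_line. The sketch below imitates that substitution so a definition can be checked by eye before Nagios is reloaded. It is a simplified model, not Nagios's own macro processor, and the $USER1$ value is assumed to be /usr/local/nagios/libexec.

```python
import re


def expand_check_command(check_command, command_line,
                         user1="/usr/local/nagios/libexec"):
    """Expand $ARGn$ and $USER1$ roughly the way Nagios would."""
    # The first field is the command name, the rest are ARG1, ARG2, ...
    args = check_command.split("!")[1:]

    def substitute(match):
        index = int(match.group(1))
        # Arguments that were not supplied expand to an empty string.
        return args[index - 1] if 1 <= index <= len(args) else ""

    expanded = re.sub(r"\$ARG(\d+)\$", substitute, command_line)
    return expanded.replace("$USER1$", user1)


if __name__ == "__main__":
    line = ("$USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ "
            "-u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$")
    print(expand_check_command(
        "check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect!2!4",
        line))
```

Printing the expanded line makes it easy to spot an argument that has landed in the wrong option slot.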
5. Create the service checks
5.1 Check Connection: every MongoDB instance in the cluster should be monitored.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Connect Check
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect!2!4
}
|
5.2 Check Percentage of Open Connections: checks what percentage of the available connections is in use.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Free Connections
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connections!70!80
}
|
5.3 Check Replication Lag: checks the replication delay.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replication_lag!15!30
}
|
5.4 Check Replication Lag Percentage: checks the replication lag as a percentage. If the check reaches 100%, a full resync is required.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag Percentage
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replication_lag_percent!50!75
}
|
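One way to read the percentage is as the replication lag divided by the time span covered by the primary's oplog: once the lag equals that span, the secondary can no longer catch up from the oplog, which is why 100% implies a full resync. The sketch below works under that assumption. The function name and both inputs are illustrative and are not taken from the plugin's source.

```python
def replication_lag_percent(lag_seconds, oplog_window_seconds):
    """Lag as a percentage of the oplog time window (assumed definition)."""
    if oplog_window_seconds <= 0:
        raise ValueError("oplog window must be positive")
    return 100.0 * lag_seconds / oplog_window_seconds


if __name__ == "__main__":
    # A secondary 30 minutes behind a primary whose oplog spans one hour.
    print(replication_lag_percent(1800, 3600))
```

With the thresholds used in the service definition above, this example value of 50% would sit exactly at the warning level.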
5.5 Check Memory Usage: checks memory usage.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Memory Usage
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!memory!20!28
}
|
5.6 Check Mapped Memory Usage: checks MongoDB's mapped memory usage.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Mapped Memory Usage
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!memory_mapped!20!28
}
|
5.7 Check Lock Time Percentage: checks the percentage of time spent locked. Noticeable lock time usually means the database is overloaded.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Lock Percentage
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!lock!5!10
}
|
5.8 Check Average Flush Time: checks the average flush time. A high average flush time means the database is handling a heavy write load.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Flush Average
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!flushing!100!200
}
|
5.9 Check Last Flush Time: checks the most recent flush time. A high value suggests the server may be under I/O pressure and may need faster disks.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Last Flush Time
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!last_flush_time!200!400
}
|
5.10 Check status of mongodb replicaset: checks the state of the MongoDB replica set.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB state
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replset_state!0!0
}
|
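For context, each replica set member reports its state as a numeric code. The lookup below lists the documented MongoDB member states so a raw code can be turned into a readable name. How the plugin maps these states onto OK, WARNING, or CRITICAL is not shown here, and the helper name `state_name` is hypothetical.

```python
# Replica set member state codes as documented by MongoDB.
REPLSET_STATES = {
    0: "STARTUP",
    1: "PRIMARY",
    2: "SECONDARY",
    3: "RECOVERING",
    4: "FATAL",       # reported by older MongoDB releases only
    5: "STARTUP2",
    6: "UNKNOWN",
    7: "ARBITER",
    8: "DOWN",
    9: "ROLLBACK",
    10: "REMOVED",
}


def state_name(code):
    """Translate a numeric member state into its name."""
    return REPLSET_STATES.get(code, "unrecognized state %d" % code)


if __name__ == "__main__":
    for code in (1, 2, 8):
        print(code, state_name(code))
```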
5.11 Check status of index miss ratio: checks the index miss ratio. If the value is high, consider adding indexes.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Index Miss Ratio
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!index_miss_ratio!.005!.01
}
|
5.12 Check number of databases and number of collections
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Number of databases
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!databases!300!500
}

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Number of collections
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collections!300!500
}
|
5.13 Check size of a database: checks the size of a database, which lets you track how fast the data is growing.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Database size db_ttlsa_posts
    check_command           check_mongodb_database!10.1.11.155!27017!check_mongodb!www.ttlsa.com!database_size!300!500!db_ttlsa_posts
}
|
5.14 Check index size of a database: checks the index size of a database.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Database index size db_ttlsa_posts
    check_command           check_mongodb_database!10.1.11.155!27017!check_mongodb!www.ttlsa.com!database_indexes!50!100!db_ttlsa_posts
}
|
5.15 Check index size of a collection: checks the index size of a single collection.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Database index size db_ttlsa_posts
    check_command           check_mongodb_collection!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collection_indexes!50!100!db_ttlsa_posts!posts
}
|
5.16 Check the primary server of replicaset: checks which server is the primary of the replica set.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Replicaset Master Monitor: replset_ttlsa
    check_command           check_mongodb_replicaset!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replica_primary!0!1!replset_ttlsa
}
|
5.17 Check the number of queries per second: checks how many operations of one type the server handles per second. The available types are: query|insert|update|delete|getmore|command
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Updates per Second
    check_command           check_mongodb_query!10.1.11.155!27017!check_mongodb!www.ttlsa.com!queries_per_second!200!150!update
}
|
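MongoDB's operation counters are cumulative since server start, so a per-second figure has to be derived from two samples taken some time apart. The sketch below shows that arithmetic. It assumes the two counter readings and the elapsed time are already available; the function name is hypothetical, and the way the plugin stores its previous sample is not modelled.

```python
def ops_per_second(previous_count, current_count, elapsed_seconds):
    """Rate of operations between two readings of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    if current_count < previous_count:
        # The counter went backwards, e.g. after a server restart;
        # no meaningful rate can be derived from this pair of samples.
        return 0.0
    return (current_count - previous_count) / float(elapsed_seconds)


if __name__ == "__main__":
    # 600 updates recorded over a 60 second interval.
    print(ops_per_second(1000, 1600, 60))
```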
5.18 Check Primary Connection
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Connect Check
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect_primary!2!4
}
|
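5.19 Check Collection State: checks the state of a collection. Note that with the check_mongodb command from step 4, the last two arguments here fill $ARG6$ and $ARG7$ (the -W and -C slots); passing the database and collection through -d and -c requires a command definition that maps them to those options.
|
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Collection State
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collection_state!db_ttlsa_posts!posts
}
|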
If you repost this article, please credit 運維生存時間: http://www.ttlsa.com/html/4188.html