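I. Overview
For writes, OpenTSDB already has the open-source tcollector collection framework, so this article does not dwell on the write API. It focuses on the commonly used read endpoints, which are handy for running custom checks and statistics over monitoring data.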
II. Common endpoints
OpenTSDB address: 192.168.180.128:16002
*1. List all metrics*
http://192.168.180.128:16002/api/suggest?max=1000&q=&type=metrics
Response:
["test.delete","test.delete2"]
*2. List all tag keys (tagk)*
http://192.168.180.128:16002/api/suggest?max=1000&q=&type=tagk
["host","idc"]
*3. List all tag values (tagv)*
http://192.168.180.128:16002/api/suggest?max=1000&q=&type=tagv
["0.0.0.1","0.0.0.2","test1"]
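The three suggest calls above differ only in the `type` parameter, so they can be wrapped in one small helper. The following is a minimal sketch, not part of the original setup: the function names `suggest_url` and `suggest` are made up for illustration, and only the standard library is used. Note that `max` caps the number of returned names, so the list is complete only when there are no more than `max` entries.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://192.168.180.128:16002"  # OpenTSDB address used in this article


def suggest_url(kind, prefix="", limit=1000, base_url=BASE_URL):
    """Build an /api/suggest URL; kind is 'metrics', 'tagk' or 'tagv'."""
    if kind not in ("metrics", "tagk", "tagv"):
        raise ValueError("kind must be 'metrics', 'tagk' or 'tagv'")
    params = urlencode({"max": limit, "q": prefix, "type": kind})
    return f"{base_url}/api/suggest?{params}"


def suggest(kind, prefix="", limit=1000, base_url=BASE_URL, timeout=10):
    """Call /api/suggest and return the JSON array of names (hypothetical helper)."""
    with urlopen(suggest_url(kind, prefix, limit, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `suggest("tagk")` requests the same URL as item 2, and `suggest("metrics", prefix="test.")` narrows the result to names starting with `test.`.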
*4. Given a metric, list all of its tagk*
http://192.168.180.128:16002/api/search/lookup?limit=1000&m=test.delete
The result contains every tagk/tagv pair recorded under this metric, so it has to be post-processed to get the full set of tagk.
{"type":"LOOKUP","metric":"test.delete","tags":[],"limit":1000,"time":14.0,"results":[{"tsuid":"C66AC1000001000001000002000002","metric":"test.delete","tags":{"host":"0.0.0.1","idc":"test1"}}],"startIndex":0,"totalResults":1}
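The post-processing mentioned above amounts to collecting the keys of every `tags` object in `results`. A minimal sketch, assuming the response has already been fetched and has the shape shown above; `tag_keys` is a hypothetical helper name:

```python
import json


def tag_keys(lookup_response):
    """Return the sorted set of tagk found in an /api/search/lookup response."""
    keys = set()
    for result in lookup_response.get("results", []):
        keys.update(result.get("tags", {}).keys())
    return sorted(keys)


# The sample response from this article.
sample = json.loads(
    '{"type":"LOOKUP","metric":"test.delete","tags":[],"limit":1000,"time":14.0,'
    '"results":[{"tsuid":"C66AC1000001000001000002000002","metric":"test.delete",'
    '"tags":{"host":"0.0.0.1","idc":"test1"}}],"startIndex":0,"totalResults":1}'
)
print(tag_keys(sample))  # ['host', 'idc']
```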
*5. Given a metric and a tagk, list all of its tagv*
http://192.168.180.128:16002/api/search/lookup?limit=3000&m=test.delete{idc=*}
{"type":"LOOKUP","metric":"test.delete","tags":[{"key":"idc","value":"*"}],"limit":3000,"time":13.0,"results":[{"tsuid":"C66AC1000001000001000002000002","metric":"test.delete","tags":{"host":"0.0.0.1","idc":"test1"}}],"startIndex":0,"totalResults":1}
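As with item 4, the response lists whole time series, so the tagv for one tagk still have to be pulled out and de-duplicated. A minimal sketch under the same assumption about the response shape; `tag_values` is a hypothetical helper name:

```python
import json


def tag_values(lookup_response, tagk):
    """Return the sorted, de-duplicated tagv recorded for one tagk."""
    values = set()
    for result in lookup_response.get("results", []):
        tags = result.get("tags", {})
        if tagk in tags:
            values.add(tags[tagk])
    return sorted(values)


# The sample response from this article.
sample = json.loads(
    '{"type":"LOOKUP","metric":"test.delete","tags":[{"key":"idc","value":"*"}],'
    '"limit":3000,"time":13.0,"results":[{"tsuid":"C66AC1000001000001000002000002",'
    '"metric":"test.delete","tags":{"host":"0.0.0.1","idc":"test1"}}],'
    '"startIndex":0,"totalResults":1}'
)
print(tag_values(sample, "idc"))  # ['test1']
```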
*6. Given a metric, fix one tagk=tagv pair and list all tagv of another tagk*
(For example: for test.delete, list every value of host where idc=test1.)
http://192.168.180.128:16002/api/search/lookup?limit=3000&m=test.delete%7Bhost%3D*,idc%3Dtest1%7D
{"type":"LOOKUP","metric":"test.delete","tags":[{"key":"host","value":"*"},{"key":"idc","value":"test1"}],"limit":3000,"time":4.0,"results":[{"tsuid":"C66AC1000001000001000002000002","metric":"test.delete","tags":{"host":"0.0.0.1","idc":"test1"}}],"startIndex":0,"totalResults":1}
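The lookup URLs in items 4 to 6 share one pattern: the metric name, optionally followed by `{tagk=tagv,...}`, percent-encoded into the `m` parameter. The sketch below builds such URLs with the standard library; `lookup_url` is a hypothetical helper, and leaving `*` and `,` unencoded is a choice made only to match the URL shown in item 6.

```python
from urllib.parse import quote

BASE_URL = "http://192.168.180.128:16002"  # OpenTSDB address used in this article


def lookup_url(metric, tags=None, limit=3000, base_url=BASE_URL):
    """Build an /api/search/lookup URL; tags maps tagk to a tagv or '*'."""
    query = metric
    if tags:
        pairs = ",".join(f"{k}={v}" for k, v in tags.items())
        query = f"{metric}{{{pairs}}}"
    # Encode '{', '}' and '=' but keep '*' and ',' readable.
    return f"{base_url}/api/search/lookup?limit={limit}&m={quote(query, safe='*,')}"


print(lookup_url("test.delete", {"host": "*", "idc": "test1"}))
```

Passing no tags reproduces the form used in item 4, for example `lookup_url("test.delete", limit=1000)`.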
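III. Use case: inspection
- Feature 1: find hosts in the CMDB that have no monitoring configured:
Query all monitored hosts through /api/search/lookup?limit=3000&m=monitor.alive{endpoint=*}, compare them with the hosts in the CMDB, and the difference is the list of hosts without monitoring.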
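- Feature 2: find database instances in the CMDB that have no monitoring configured:
Query all monitored hosts and their database port numbers through /api/search/lookup?limit=3000&m=mysql.alive{endpoint=*}, compare them with the database instances in the CMDB, and the difference is the list of instances without monitoring.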
- Feature 3: find hosts that reported no data for a given metric within a given time window:
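# Uses the /api/query endpoint; this dict is the JSON request body.
# ipList is the list of IPs to check; starttime and endtime bound the query window.
tsdbQuery = {
    'start': starttime,
    'end': endtime,
    'queries': [{
        'aggregator': 'avg',
        'downsample': '1h-avg',
        'metric': 'monitor.alive',
        'rate': False,  # a boolean, not the string 'false'
        'filters': [{
            'type': 'literal_or',
            'tagk': 'endpoint',
            'filter': "|".join(ipList),
            'groupBy': True  # a boolean; yields one result series per endpoint
        }]
    }]
}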
Pick out the endpoints that have no data in the result; these are the hosts where data collection failed.
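- Feature 4: list the top 10 hosts by full-day average CPU, disk, and memory usage:
Use the same endpoint as Feature 3; just change the metric to the corresponding CPU, disk, or memory metric.
Feature n: the monitoring data returned by /api/query can feed all sorts of applications, depending on your needs.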
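The inspection features above reduce to three steps: POST a query to /api/query, read the endpoints out of the result, then take a set difference or a ranking. The following is a rough sketch under stated assumptions: all function names are hypothetical, the response is assumed to be the usual list of series objects with `tags` and `dps` fields, and the sample data at the bottom is invented purely for illustration.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://192.168.180.128:16002"  # OpenTSDB address used in this article


def run_query(tsdb_query, base_url=BASE_URL, timeout=30):
    """POST a query dict to /api/query and return the decoded JSON list."""
    req = Request(
        f"{base_url}/api/query",
        data=json.dumps(tsdb_query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def endpoints_with_data(query_result, tagk="endpoint"):
    """Endpoints that have at least one data point in the result."""
    return {
        series["tags"][tagk]
        for series in query_result
        if series.get("dps") and tagk in series.get("tags", {})
    }


def missing(expected, seen):
    """Items in expected that are absent from seen (Features 1 to 3)."""
    return sorted(set(expected) - set(seen))


def top_n_by_average(query_result, n=10, tagk="endpoint"):
    """Rank endpoints by the mean of their data points (Feature 4)."""
    averages = []
    for series in query_result:
        values = list(series.get("dps", {}).values())
        endpoint = series.get("tags", {}).get(tagk)
        if values and endpoint is not None:
            averages.append((endpoint, sum(values) / len(values)))
    averages.sort(key=lambda item: item[1], reverse=True)
    return averages[:n]


# Invented sample shaped like an /api/query response, for illustration only.
result = [
    {"metric": "monitor.alive", "tags": {"endpoint": "0.0.0.1"},
     "dps": {"1700000000": 1, "1700003600": 1}},
    {"metric": "monitor.alive", "tags": {"endpoint": "0.0.0.2"}, "dps": {}},
]
ip_list = ["0.0.0.1", "0.0.0.2", "0.0.0.3"]
print(missing(ip_list, endpoints_with_data(result)))  # ['0.0.0.2', '0.0.0.3']
```

The same `missing` helper covers Features 1 and 2: pass the CMDB hosts or instances as `expected` and the values extracted from the lookup response as `seen`. Series that are absent from the result and series with an empty `dps` are both treated as "no data" here.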