Background
The test results of our in-house client load-testing tool are written as structured log files, and our performance monitoring now needs to be real-time and centralized. We therefore need a collection agent that can gather structured log files on a schedule and in batches, and the Telegraf Logparser plugin happens to meet this need.
Telegraf logparser
The Logparser plugin streams and parses the given log files; it currently supports "grok" patterns and regular-expression patterns.
Grok parser
The best way to become familiar with the grok parser is to refer to the logstash documentation:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
The Telegraf parser uses a slightly modified version of logstash "grok" patterns, in the following format:
%{<capture_syntax>[:<semantic_name>][:<modifier>]}
- capture_syntax: defines the grok pattern used to parse the input line
- semantic_name: names the resulting field or tag
- modifier: extends the data type the parsed item is converted to, or applies other special handling
By default, all named captures are converted to string fields. If a pattern does not have a semantic name, it is not captured. A timestamp modifier can be used to turn a capture into the timestamp of the parsed metric; if no timestamp is parsed, the metric is created with the current time.
Note: every line must capture at least one field. A pattern that converts all captures into tags will produce points that cannot be written to the time-series database.
- Available modifiers:
- string (default if nothing is specified)
- int
- float
- duration (ie, 5.23ms gets converted to int nanoseconds)
- tag (converts the field into a tag)
- drop (drops the field completely)
- Timestamp modifiers:
- ts (This will auto-learn the timestamp format)
- ts-ansic (“Mon Jan _2 15:04:05 2006”)
- ts-unix (“Mon Jan _2 15:04:05 MST 2006”)
- ts-ruby (“Mon Jan 02 15:04:05 -0700 2006”)
- ts-rfc822 (“02 Jan 06 15:04 MST”)
- ts-rfc822z (“02 Jan 06 15:04 -0700”)
- ts-rfc850 (“Monday, 02-Jan-06 15:04:05 MST”)
- ts-rfc1123 (“Mon, 02 Jan 2006 15:04:05 MST”)
- ts-rfc1123z (“Mon, 02 Jan 2006 15:04:05 -0700”)
- ts-rfc3339 (“2006-01-02T15:04:05Z07:00”)
- ts-rfc3339nano (“2006-01-02T15:04:05.999999999Z07:00”)
- ts-httpd (“02/Jan/2006:15:04:05 -0700”)
- ts-epoch (seconds since unix epoch, may contain decimal)
- ts-epochmilli (milliseconds since unix epoch, may contain decimal)
- ts-epochnano (nanoseconds since unix epoch)
- ts-syslog (“Jan 02 15:04:05”, parsed time is set to the current year)
- ts-“CUSTOM”
The custom time format must be enclosed in quotes and must be a representation of the "reference time", Mon Jan 2 15:04:05 -0700 MST 2006.
To match a comma decimal separator, use a period in the pattern string. For example, %{TIMESTAMP:timestamp:ts-"2006-01-02 15:04:05.000"} can be used to match "2018-01-02 15:04:05,000".
For more details, refer to:
https://golang.org/pkg/time/#Parse
Telegraf has many built-in patterns of its own and supports most of logstash's built-in patterns. Golang regular expressions do not support lookahead or lookbehind, so logstash patterns that depend on them are not supported.
If you need to build and debug a pattern to match your logs, https://grokdebug.herokuapp.com is very useful!
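To make the capture and modifier mechanics concrete, here is a minimal Python sketch (an illustrative re-implementation, not Telegraf's actual code): each %{SYNTAX:name:modifier} capture becomes a named regex group, and the modifier decides the value's type and whether it becomes a field or a tag.

```python
import re

# A few built-in grok patterns, reduced to plain regexes (illustrative subset).
GROK_PATTERNS = {
    "WORD": r"\w+",
    "NUMBER": r"-?\d+(?:\.\d+)?",
}

def compile_grok(pattern: str) -> re.Pattern:
    """Rewrite %{SYNTAX:name[:modifier]} captures into named regex groups."""
    def to_group(m: re.Match) -> str:
        syntax, name = m.group(1), m.group(2)
        return f"(?P<{name}>{GROK_PATTERNS[syntax]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)(?::\w+)?\}", to_group, pattern))

def apply_modifiers(pattern: str, groups: dict) -> tuple[dict, dict]:
    """Split captures into tags and typed fields according to each modifier."""
    tags, fields = {}, {}
    for _syntax, name, modifier in re.findall(r"%\{(\w+):(\w+)(?::(\w+))?\}", pattern):
        value = groups[name]
        if modifier == "tag":
            tags[name] = value
        elif modifier == "int":
            fields[name] = int(value)
        elif modifier == "float":
            fields[name] = float(value)
        elif modifier == "drop":
            continue
        else:  # default: string field
            fields[name] = value
    return tags, fields

pattern = "%{WORD:scene:tag},%{NUMBER:version:float},%{NUMBER:current:int}"
line = "TestConfig1,5.0,1"
groups = compile_grok(pattern).match(line).groupdict()
tags, fields = apply_modifiers(pattern, groups)
print(tags)    # {'scene': 'TestConfig1'}
print(fields)  # {'version': 5.0, 'current': 1}
```

This mirrors the rule stated above: unmodified captures stay string fields, while :tag moves a capture out of the fields and into the tag set.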
Example
We can use logparser to turn the log lines generated by Telegraf itself into metrics.
To do this, we need to configure Telegraf to write its logs to a file. This can be done with the agent.logfile parameter or by configuring syslog.
[agent]
logfile = "/var/log/telegraf/telegraf.log"
Logparser configuration:
[[inputs.logparser]]
files = ["/var/log/telegraf/telegraf.log"]
[inputs.logparser.grok]
measurement = "telegraf_log"
patterns = ['^%{TIMESTAMP_ISO8601:timestamp:ts-rfc3339} %{TELEGRAF_LOG_LEVEL:level:tag}! %{GREEDYDATA:msg}']
custom_patterns = '''
TELEGRAF_LOG_LEVEL (?:[DIWE]+)
'''
Log contents:
2018-06-14T06:41:35Z I! Starting Telegraf v1.6.4
2018-06-14T06:41:35Z I! Agent Config: Interval:3s, Quiet:false, Hostname:"archer", Flush Interval:3s
2018-02-20T22:39:20Z E! Error in plugin [inputs.docker]: took longer to collect than collection interval (10s)
2018-06-01T10:34:05Z W! Skipping a scheduled flush because there is already a flush ongoing.
2018-06-14T07:33:33Z D! Output [file] buffer fullness: 0 / 10000 metrics.
Data collected into InfluxDB:
telegraf_log,host=somehostname,level=I msg="Starting Telegraf v1.6.4" 1528958495000000000
telegraf_log,host=somehostname,level=I msg="Agent Config: Interval:3s, Quiet:false, Hostname:\"somehostname\", Flush Interval:3s" 1528958495001000000
telegraf_log,host=somehostname,level=E msg="Error in plugin [inputs.docker]: took longer to collect than collection interval (10s)" 1519166360000000000
telegraf_log,host=somehostname,level=W msg="Skipping a scheduled flush because there is already a flush ongoing." 1527849245000000000
telegraf_log,host=somehostname,level=D msg="Output [file] buffer fullness: 0 / 10000 metrics." 1528961613000000000
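The transformation above can be sketched in Python (an illustrative stand-in for the grok pattern, not Telegraf's implementation; TIMESTAMP_ISO8601 is simplified to the exact form these log lines use):

```python
import re
from datetime import datetime, timezone

# Regex equivalent of the grok pattern above:
# TIMESTAMP_ISO8601 -> simplified ISO form, TELEGRAF_LOG_LEVEL -> [DIWE], GREEDYDATA -> .*
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) (?P<level>[DIWE])! (?P<msg>.*)$"
)

def parse_telegraf_line(line: str) -> dict:
    """Parse one Telegraf log line into a tag, a field, and a nanosecond timestamp."""
    m = LINE_RE.match(line)
    ts = datetime.strptime(m.group("timestamp"), "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return {
        "measurement": "telegraf_log",
        "tags": {"level": m.group("level")},       # level:tag -> tag
        "fields": {"msg": m.group("msg")},         # msg -> string field
        "time_ns": int(ts.timestamp()) * 1_000_000_000,
    }

point = parse_telegraf_line("2018-06-14T06:41:35Z I! Starting Telegraf v1.6.4")
print(point["tags"])     # {'level': 'I'}
print(point["time_ns"])  # 1528958495000000000
```

Note how the nanosecond timestamp matches the first point in the InfluxDB output above.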
Hands-on practice
Log format
A sample of the structured logs to be collected is shown below:
TestConfig1,5.0,2019/3/6 17:48:23,2019/3/6 17:48:30,demo_1,open,3,1,6.8270219,openscreen>validatestage
TestConfig2,5.0,2019/3/6 17:48:33,2019/3/6 17:48:40,demo_2,open,3,2,6.9179322,openscreen>validatestage
TestConfig3,5.0,2019/3/6 17:48:43,2019/3/6 17:50:23,demo_1,open,3,3,100.1237885,switchscreen>validatestag
TestConfig3,5.0,2019/3/6 17:48:43,2019/3/6 17:50:23,demo_1,open,3,3,100.1237885,switchscreen>validatestag
TestConfig3,5.0,2019/3/6 17:48:43,2019/3/6 17:50:23,demo_1,open,3,3,100.1237885,switchscreen>validatestag
TestConfig3,5.0,2019/3/6 17:48:43,2019/3/6 17:50:23,demo_1,open,3,3,100.1237885,switchscreen>validatestag
TestConfig3,5.0,2019/3/6 17:48:43,2019/3/6 17:50:23,demo_1,open,3,3,100.1237885,switchscreen>validatestag
TestConfig3,5.0,2019/3/6 17:48:43,2019/3/6 17:50:23,demo_1,open,3,3,100.1237885,switchscreen>validatestag
Note: these logs are generated in batches; each client load-test run produces a *.log file in the current directory. When collecting the data, column names need to be assigned to the corresponding columns.
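The column naming can be sanity-checked outside Telegraf with Python's csv module; the names below are the ones this article assigns, and the trailing step-path column, which the grok pattern leaves uncaptured, is collected under a hypothetical "extra" key:

```python
import csv
from io import StringIO

# Column names assigned to the comma-separated log (chosen for this log format).
FIELDNAMES = ["scene", "version", "begtime", "endtime", "canvasName",
              "canvasCase", "totaltimes", "current", "time_consuming"]

log = ("TestConfig1,5.0,2019/3/6 17:48:23,2019/3/6 17:48:30,"
       "demo_1,open,3,1,6.8270219,openscreen>validatestage\n")

# Columns beyond the named ones land in row["extra"] as a list.
reader = csv.DictReader(StringIO(log), fieldnames=FIELDNAMES, restkey="extra")
row = next(reader)
print(row["scene"])           # TestConfig1
print(row["time_consuming"])  # 6.8270219
print(row["extra"])           # ['openscreen>validatestage']
```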
Telegraf configuration
Configure telegraf.conf:
[[inputs.logparser]]
## Log files to parse.
## These accept standard unix glob matching rules, but with the addition of
## ** as a "super asterisk". ie:
## /var/log/**.log -> recursively find all .log files in /var/log
## /var/log/*/*.log -> find all .log files with a parent dir in /var/log
## /var/log/apache.log -> only tail the apache log file
files = ["C:\\Release\\TestConfigLog\\*.log"]
## Read files that currently exist from the beginning. Files that are created
## while telegraf is running (and that match the "files" globs) will always
## be read from the beginning.
from_beginning = false
## Method used to watch for file updates. Can be either "inotify" or "poll".
watch_method = "poll"
## Parse logstash-style "grok" patterns:
## Telegraf built-in parsing patterns: https://goo.gl/dkay10
[inputs.logparser.grok]
## This is a list of patterns to check the given log file(s) for.
## Note that adding patterns here increases processing time. The most
## efficient configuration is to have one pattern per logparser.
## Other common built-in patterns are:
## %{COMMON_LOG_FORMAT} (plain apache & nginx access logs)
## %{COMBINED_LOG_FORMAT} (access logs + referrer & agent)
patterns = ['%{WORD:scene},%{NUMBER:version:float},%{TS_WIN:begtime},%{TS_WIN:endtime},%{WORD:canvasName},%{WORD:canvasCase},%{NUMBER:totaltimes:int},%{NUMBER:current:int},%{NUMBER:time_consuming:float}']
## Name of the outputted measurement name.
measurement = "bigscreen"
## Full path(s) to custom pattern files.
## custom_pattern_files = []
## Custom patterns can also be defined here. Put one pattern per line.
custom_patterns = 'TS_WIN %{YEAR}/%{MONTHNUM}/%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?'
## Timezone allows you to provide an override for timestamps that
## don't already include an offset
## e.g. 04/06/2016 12:41:45 data one two 5.43µs
##
## Default: "" which renders UTC
## Options are as follows:
## 1. Local -- interpret based on machine localtime
## 2. "Canada/Eastern" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
## 3. UTC -- or blank/unspecified, will return timestamp in UTC
timezone = "Local"
Notes:
- files with a *.log glob handles matching multiple file objects in the current directory
- watch_method = "poll" makes Telegraf poll for file updates
- custom_patterns defines a custom time-format pattern for matching the log's timestamps
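Conceptually, watch_method = "poll" behaves like a tail loop that remembers its read offset. A simplified Python sketch (not Telegraf's actual implementation; file truncation and rotation are ignored here):

```python
import os

def poll_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Read any lines appended to `path` since `offset`; return them and the new offset."""
    if not os.path.exists(path) or os.path.getsize(path) <= offset:
        return [], offset
    with open(path, "r", encoding="utf-8") as f:
        f.seek(offset)
        lines = [line.rstrip("\n") for line in f.readlines()]
        return lines, f.tell()

# Usage sketch: poll every few seconds, as watch_method = "poll" does conceptually.
# offset = 0
# while True:
#     lines, offset = poll_new_lines("demo.log", offset)
#     for line in lines:
#         print(line)
#     time.sleep(3)
```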
The metric data generated in InfluxDB is as follows:
> select * from bigscreen limit 5
name: bigscreen
time begtime canvasCase canvasName current endtime host path scene time_consuming totaltimes version
---- ------- ---------- ---------- ------- ------- ---- ---- ----- -------------- ---------- -------
1552296231630588200 2019/3/6 17:48:43 open demo_1 3 2019/3/6 17:50:23 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\1.log TestConfig3 100.1237885 3 5
1552296231630588201 2019/3/6 17:48:43 open demo_1 3 2019/3/6 17:50:23 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\1.log TestConfig3 100.1237885 3 5
1552296231630588202 2019/3/6 17:48:43 open demo_1 3 2019/3/6 17:50:23 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\1.log TestConfig3 100.1237885 3 5
1552296231631587700 2019/3/6 17:48:43 open demo_1 3 2019/3/6 17:50:23 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\1.log TestConfig3 100.1237885 3 5
1552297570005076300 2019/3/6 17:48:23 open demo_1 1 2019/3/6 17:48:30 DESKTOP-MLD0KTS C:\Users\htsd\Desktop\VBI5\Release\TestConfigLog\12.log TestConfig1 6.8270219 3 5
The column names are all ones we defined ourselves.
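The rows above reach InfluxDB as line protocol. As an illustrative sketch (a hypothetical helper with simplified escaping, not Telegraf's serializer), one parsed record becomes a line like this:

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, time_ns: int) -> str:
    """Render one InfluxDB line-protocol point: measurement,tags fields timestamp."""
    def esc(s: str) -> str:
        # Escape characters significant in tag keys/values (simplified).
        return s.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")

    tag_part = ",".join(f"{esc(k)}={esc(v)}" for k, v in sorted(tags.items()))
    field_parts = []
    for k, v in sorted(fields.items()):
        if isinstance(v, bool):
            field_parts.append(f"{k}={str(v).lower()}")
        elif isinstance(v, int):
            field_parts.append(f"{k}={v}i")   # integers carry an 'i' suffix
        elif isinstance(v, float):
            field_parts.append(f"{k}={v}")
        else:
            field_parts.append(f'{k}="{v}"')  # strings are double-quoted
    return f"{measurement},{tag_part} {','.join(field_parts)} {time_ns}"

line = to_line_protocol(
    "bigscreen",
    {"host": "DESKTOP-MLD0KTS", "scene": "TestConfig3"},
    {"current": 3, "time_consuming": 100.1237885},
    1552296231630588200,
)
print(line)
# bigscreen,host=DESKTOP-MLD0KTS,scene=TestConfig3 current=3i,time_consuming=100.1237885 1552296231630588200
```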
Grafana setup
The overall approach is to present the data in a single table, with filtering on selected fields.
Set up template variables to support filtering by field:
Create a dashboard and choose the table panel:
Define the data source:
Set the table column styles and format the time column.
Apply threshold-based highlighting to the response-time column (green, yellow, red).
The resulting dynamic effect looks like this:
Summary
This article used a simple example to show how Telegraf + InfluxDB + Grafana can monitor structured logs in real time. Unstructured log collection is also supported; interested readers can try it out themselves.
Related resources:
https://github.com/zuozewei/PerformanceTest-Examples/tree/master/Performance%20Monitoring/Telegraf-InfluxDB-Grafana-Logparser