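ELK Notes 4: Parsing with grok Regular Expressions
1 How to split a line with grok
A grok splitting rule can be built in one of the following ways.
1) Identify the split markers, then work outward from each marker, to the left or to the right, extracting one field at a time. Regex metacharacters must be escaped; otherwise, when such a character serves as a split marker, the pattern is easily parsed incorrectly.
2) Alternatively, extract the fields one by one from left to right.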
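As a rough illustration of why the escaping matters, the sketch below expands a grok-style pattern into a Python regular expression. The GROK table, the grok_to_regex helper, and the sample line are assumptions made for this note and are not part of Logstash; DATA and GREEDYDATA follow the stock definitions, while NUMBER is simplified.

```python
import re

# Stand-ins for a few grok patterns. DATA and GREEDYDATA match the stock
# definitions; NUMBER is a simplified version used only in this sketch.
GROK = {
    "DATA": r".*?",
    "GREEDYDATA": r".*",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
}

def grok_to_regex(pattern):
    """Expand %{NAME:field} into a named group and %{NAME} into a plain group."""
    def expand(m):
        name, field = m.group(1), m.group(2)
        body = GROK[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, pattern)

line = "[web01][42] started"  # hypothetical log line

# Escaped: \[ is a literal bracket that acts as a split marker.
good = re.compile(grok_to_regex(r"\[%{DATA:host}]\[%{NUMBER:code}] %{GREEDYDATA:msg}"))
fields = good.match(line).groupdict()
print(fields)

# Unescaped: [ opens a character class, so the field definition is swallowed
# and no named field is left in the compiled pattern.
bad = re.compile(grok_to_regex(r"[%{DATA:host}]"))
print(dict(bad.groupindex), bad.match(line))
```

The unescaped variant still compiles, which is what makes the mistake easy to miss: it simply stops being the pattern that was intended.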
2 grok splitting examples
Case 1
Content:
2016/04/27 12:22:50 OSPF: AdjChg: Nbr 220.220.220.220 on g-or2-a0bjt:10.61.61.61: Init -> Deleted (InactivityTimer)
Pattern:
2016/04/27 12:22:50 OSPF: AdjChg: Nbr 220.220.220.220 on g-or2-a0bjt:10.61.61.61: Init -> Deleted (InactivityTimer)
Note: the pattern needs a literal space before OSPF, otherwise that space ends up inside timestamp; it also needs a space before on, otherwise parsing fails.
Result:
{
"neighborid": "220.220.220.220",
"data": "Deleted (InactivityTimer)",
"srcstat": " Init ",
"ip": "10.61.61.61",
"type": "AdjChg",
"interface": " g-or2-a0bjt",
"timestamp": "2016/04/27 12:22:50"
}
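The field-by-field pattern for Case 1 is not shown in these notes, so the following is only a guess at one expression that yields the fields listed in the result, written as a Python regex with named groups. The IP helper is a simplified IPv4 matcher and, like the pattern itself, is an assumption of this sketch.

```python
import re

line = ("2016/04/27 12:22:50 OSPF: AdjChg: Nbr 220.220.220.220 on "
        "g-or2-a0bjt:10.61.61.61: Init -> Deleted (InactivityTimer)")

IP = r"\d{1,3}(?:\.\d{1,3}){3}"  # simplified IPv4 matcher (assumption)

# Literal markers (" OSPF: ", ": Nbr ", " on", ":", "->") anchor each field.
pattern = re.compile(
    rf"(?P<timestamp>.*?) OSPF: (?P<type>.*?): Nbr (?P<neighborid>{IP})"
    rf" on(?P<interface>.*?):(?P<ip>{IP}):(?P<srcstat>.*?)-> (?P<data>.*)"
)
fields = pattern.match(line).groupdict()
print(fields)

# Without the space before OSPF, the space is captured into timestamp.
loose = re.match(r"(?P<timestamp>.*?)OSPF:", line)
print(repr(loose.group("timestamp")))
```

Because nothing consumes the spaces around the interface name and the source state, they stay inside those fields, which is consistent with the leading and trailing spaces visible in the result.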
Case 2
Content: [Jul 11 10:22:59][123.123.123.123]<14>[2016-07-11 10:22:59,591][client.log][INFO]bak found in cache, skip it, test_data_2035_20160711_0500
Pattern 1:
\[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[%{GREEDYDATA:ts}]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}
Note: the [ character must be escaped.
Result:
{
"head": "Jul 11 10:22:59",
"logtype": "client.log",
"data": "bak found in cache, skip it, test_data_2035_20160711_0500",
"level": "INFO",
"clientip": "123.123.123.123",
"pid": "14",
"ts": "2016-07-11 10:22:59,591"
}
Pattern 2: drops the redundant second timestamp by matching it as literal text
\[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[2016-07-11 10:22:59,591]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}
Result:
{
"head": "Jul 11 10:22:59",
"logtype": "client.log",
"data": "bak found in cache, skip it, test_data_2035_20160711_0500",
"level": "INFO",
"clientip": "123.123.123.123",
"pid": "14"
}
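Pattern 2 hard-codes one specific timestamp, so it only matches lines that carry exactly that value. A more general way to drop a field is to match it without capturing it. The sketch below shows the idea in Python with a non-capturing group; in grok the rough equivalent would be a pattern reference without a field name, which under the default settings should not produce a field. The [A-Z]+ stand-in for LOGLEVEL is a simplification assumed here.

```python
import re

line = ("[Jul 11 10:22:59][123.123.123.123]<14>[2016-07-11 10:22:59,591]"
        "[client.log][INFO]bak found in cache, skip it, "
        "test_data_2035_20160711_0500")

pattern = re.compile(
    r"\[(?P<head>.*?)\]\[(?P<clientip>.*?)\]<(?P<pid>\d+)>"
    r"\[(?:.*?)\]"  # second timestamp: matched but not captured
    r"\[(?P<logtype>.*?)\]\[(?P<level>[A-Z]+)\](?P<data>.*)"
)
fields = pattern.match(line).groupdict()
print(fields)
```

The resulting fields are the same as for Pattern 2, but the expression keeps working when the embedded timestamp changes.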
Case 3: parsing a syslog line
Content: Apr 19 12:56:07 xg dbus-daemon[1537]: [session uid=1000 pid=1537] Successfully activated service 'org.freedesktop.Tracker1'
Pattern:
%{GREEDYDATA:timestamp} %{DATA:user} %{DATA:app}\[%{NUMBER:pid}]: %{GREEDYDATA:content}
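Note: here the [ and ] characters pin down where the fields sit relative to each other; work backwards from them one field at a time, and simply match the leading time with GREEDYDATA.
Result:
{
"app": "dbus-daemon",
"pid": "1537",
"user": "xg",
"content": "[session uid=1000 pid=1537] Successfully activated service 'org.freedesktop.Tracker1'",
"timestamp": "Apr 19 12:56:07"
}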
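The reason GREEDYDATA suits the leading time can be checked with an equivalent Python regex. This is a sketch under the assumption that \d+ is an adequate stand-in for NUMBER; the greedy group takes as much as it can while still leaving two space-separated tokens before [pid], whereas a lazy group stops at the first space.

```python
import re

line = ("Apr 19 12:56:07 xg dbus-daemon[1537]: [session uid=1000 pid=1537] "
        "Successfully activated service 'org.freedesktop.Tracker1'")

# Greedy timestamp, mirroring %{GREEDYDATA:timestamp}.
greedy = re.compile(
    r"(?P<timestamp>.*) (?P<user>.*?) (?P<app>.*?)\[(?P<pid>\d+)\]: (?P<content>.*)"
)
fields = greedy.match(line).groupdict()
print(fields)

# Lazy timestamp, mirroring %{DATA:timestamp}: the time is split incorrectly.
lazy = re.compile(
    r"(?P<timestamp>.*?) (?P<user>.*?) (?P<app>.*?)\[(?P<pid>\d+)\]: (?P<content>.*)"
)
wrong = lazy.match(line).groupdict()
print(wrong["timestamp"], "|", wrong["user"], "|", wrong["app"])
```

With the lazy variant the month alone lands in timestamp and the rest of the time spills into the following fields, which is why the time, with its internal spaces, is matched greedily.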
Case 4: parsing an nginx log line
Content: 120.123.123.123 - - [19/Apr/2020:10:40:59 +0800] "GET /hello HTTP/1.1" 404 200 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"
Pattern:
%{IP:server_name} %{DATA:holder1} %{DATA:remote_user} \[%{DATA:localtime}] "%{DATA:request}" %{NUMBER:req_status} %{NUMBER:upstream_status} "%{DATA:holder2}" %{GREEDYDATA:agent}
Result:
{
"localtime": "19/Apr/2020:10:40:59 +0800",
"server_name": "120.123.123.123",
"request": "GET /hello HTTP/1.1",
"agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36\"",
"req_status": "404",
"remote_user": "-",
"upstream_status": "200",
"holder2": "-",
"holder1": "-"
}
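The agent value above keeps its surrounding double quotes because the final GREEDYDATA captures them. The sketch below reproduces that with a Python regex and then shows a variant that places the quotes in the pattern, so the field comes out clean. The simplified IP and number matchers are assumptions of this sketch.

```python
import re

line = ('120.123.123.123 - - [19/Apr/2020:10:40:59 +0800] "GET /hello HTTP/1.1" '
        '404 200 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"')

head = (
    r'(?P<server_name>\d{1,3}(?:\.\d{1,3}){3}) (?P<holder1>.*?) '
    r'(?P<remote_user>.*?) \[(?P<localtime>.*?)\] "(?P<request>.*?)" '
    r'(?P<req_status>\d+) (?P<upstream_status>\d+) "(?P<holder2>.*?)" '
)

# Same shape as the pattern in the notes: the quotes end up inside agent.
as_written = re.match(head + r'(?P<agent>.*)', line).groupdict()
print(repr(as_written["agent"]))

# Variant: quotes are literal markers, so agent excludes them.
cleaned = re.match(head + r'"(?P<agent>.*?)"', line).groupdict()
print(repr(cleaned["agent"]))
```

If the log follows nginx's stock combined format, the two numbers after the request are $status and $body_bytes_sent and the quoted "-" is $http_referer, so the names upstream_status and holder2 may be worth revisiting; that depends on the log_format actually configured.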
Case 5
Content (the original line was not preserved; this one is reconstructed from the pattern and the result below): [2020-04-29 21:37:54][audio-mgr][production][ INFO][apiserver][search.py:29]{'keyword': '', 'pageNo': '1'}
Pattern:
\[%{DATA:ts}]\[%{DATA:ns}]\[%{DATA:env}]\[%{DATA:logstash_level}]\[%{DATA:service}]\[%{DATA:filename}:%{NUMBER:lineno}]%{GREEDYDATA:msg}
Result:
{
"msg": "{'keyword': '', 'pageNo': '1'}",
"filename": "search.py",
"lineno": "29",
"ns": "audio-mgr",
"service": "apiserver",
"env": "production",
"ts": "2020-04-29 21:37:54",
"logstash_level": " INFO"
}
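The leading space in logstash_level and the string-typed lineno usually call for a small clean-up step after matching. The sketch below does that in Python; the sample line is an assumption rebuilt from the field values in the result above, and the clean-up (strip plus integer conversion) is one possible choice, not something these notes prescribe.

```python
import re

# Assumed input, rebuilt from the field values shown in the result.
line = ("[2020-04-29 21:37:54][audio-mgr][production][ INFO][apiserver]"
        "[search.py:29]{'keyword': '', 'pageNo': '1'}")

pattern = re.compile(
    r"\[(?P<ts>.*?)\]\[(?P<ns>.*?)\]\[(?P<env>.*?)\]\[(?P<logstash_level>.*?)\]"
    r"\[(?P<service>.*?)\]\[(?P<filename>.*?):(?P<lineno>\d+)\](?P<msg>.*)"
)
raw = pattern.match(line).groupdict()

# Post-processing: trim padding and convert the line number to an integer.
fields = dict(raw)
fields["logstash_level"] = raw["logstash_level"].strip()
fields["lineno"] = int(raw["lineno"])
print(fields)
```

Note that ts itself contains colons; the filename and lineno split still works because the lazy filename group only stops at a colon that is followed by digits and a closing bracket.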
3 Notes
Reference:
grok-patterns