On a problem with Logstash/ELK reading logs

I recently ran into a case.

The game server processes generate a large volume of logs; each entry is a JSON line, and the files are rotated hourly. When Logstash collects them it keeps running into parse failures, with errors like this:

[2020-06-17T12:13:19,489][ERROR][logstash.codecs.json     ][main] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: incompatible json object type=java.lang.String , only hash map or arrays are supported>, :data=>"\"#distinct_id\":\"xxxxxxxxxx\",\"#type\":\"track\",\"#ip\":\"113.7.22.200\",\"#time\":\"2020-06-17 12:12:19\",\"#event_name\":\"xxxxxxxxx\",\"#account_no\":\"xxxxxxxx\",\"properties\":{\"role_uid\":xxxxxxxxx,\"current_server\":xxx,\"create_server\":xxxx,\"channel\":\"xxxxx\",\"role_name\":\"xx\",\"role_create_time\":\"2020-03-10 18:57:31\"}"}

Only after reading the official documentation did I learn that the Logstash file input has two ways of reading logs: one reads a file once, in full; the other keeps watching the end of the file for newly appended lines.

Tail mode

In this mode the plugin aims to track changing files and emit new content as it’s appended to each file. In this mode, files are seen as a never ending stream of content and EOF has no special significance. The plugin always assumes that there will be more content. When files are rotated, the smaller or zero size is detected, the current position is reset to zero and streaming continues. A delimiter must be seen before the accumulated characters can be emitted as a line.

Read mode

In this mode the plugin treats each file as if it is content complete, that is, a finite stream of lines and now EOF is significant. A last delimiter is not needed because EOF means that the accumulated characters can be emitted as a line. Further, EOF here means that the file can be closed and put in the "unwatched" state - this automatically frees up space in the active window. This mode also makes it possible to process compressed files as they are content complete. Read mode also allows for an action to take place after processing the file completely.

In the past attempts to simulate a Read mode while still assuming infinite streams was not ideal and a dedicated Read mode is an improvement.
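For reference, which of these two modes the file input uses is controlled by its mode option, which defaults to tail; a minimal sketch (the path is a placeholder):

input {
  file {
    path => "/data/xxxx/*"
    mode => "tail"   # the default; set to "read" for content-complete files
  }
}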

What I was using is Tail mode, i.e. continuously reading whatever gets appended.

Could the following be happening? The process is in the middle of writing a log entry, and before the whole JSON line has been written, Logstash already reads it. Logstash then picks up an incomplete JSON line and the parse fails. (The data in the error above does in fact start mid-object, without the opening {, which is consistent with this.) What can be done about it? The answer is to add one configuration setting: the stat_interval.

input {
  file {
    path => "/data/xxxx/*"
    # The files are rotated hourly, so I wait 3800s (an hour plus a few minutes)
    # before reading, i.e. until the file has stopped being written; adjust this
    # to your own rotation interval.
    stat_interval => 3800
    # Each line is a standalone JSON object.
    codec => json
    start_position => "beginning"
    sincedb_write_interval => 10
  }
}

So far this looks like a big improvement.

But then, since we are waiting 3800s anyway, i.e. only reading files that are no longer being updated, we could just as well switch to Read mode.
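
A minimal sketch of what that could look like, assuming the same file glob; mode, file_completed_action and file_completed_log_path are options of the file input plugin, and the completed-log path below is just a placeholder:

input {
  file {
    path => "/data/xxxx/*"
    # Read mode: treat each file as content-complete and stop at EOF.
    mode => "read"
    codec => json
    # Once a file has been read to the end, log it (instead of the default "delete").
    file_completed_action => "log"
    file_completed_log_path => "/data/logstash/completed.log"  # placeholder path
    sincedb_write_interval => 10
  }
}

One thing to watch out for: in Read mode the glob should only match files that have already been rotated (for example by excluding the file for the current hour), otherwise the same partial-line risk remains.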

 
