[Log Processing, Part 2] First steps in parsing DB2 logs with flume-ng

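1. Flume normally wraps each line into one event and then consumes, filters, and intercepts those events. In the DB2 log, however, a single record spans several lines and the number of lines varies, so it is best to write a custom source component. As a first pass, I made a small change to the source code of Flume's exec source so that multiple lines are packed into one event. A fully featured custom source component will have to be written later.
   The change is in lines 337-344 of flume-ng-core\src\main\java\org\apache\flume\source\ExecSource.java, followed by a recompile.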
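
The grouping idea can be sketched outside Flume as follows. This is not the actual ExecSource patch, which the post does not show. It is a minimal illustration that assumes every db2diag.log record begins with a timestamp line of the shape that the TIME regex in step 2 looks for. The class name `MultiLineGrouper`, the method `group`, and the `separator` parameter are hypothetical, and the sample lines are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class MultiLineGrouper {
    // Assumption: a new record starts with a timestamp such as
    // 2013-05-08-14.30.01.123456+480 followed by whitespace.
    private static final Pattern RECORD_START = Pattern.compile(
        "^[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}\\.[0-9]{6}[+-][0-9]{3}\\s");

    /** Joins consecutive lines into records; a timestamp line opens a new record. */
    static List<String> group(List<String> lines, String separator) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines between records carry no data
            }
            if (RECORD_START.matcher(line).find() && current.length() > 0) {
                records.add(current.toString()); // flush the previous record
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(separator);
            }
            current.append(line);
        }
        if (current.length() > 0) {
            records.add(current.toString()); // flush the last record
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from a real db2diag.log.
        List<String> lines = Arrays.asList(
            "2013-05-08-14.30.01.123456+480 I1234E567          LEVEL: Warning",
            "PID     : 12345                TID : 140737353234176 PROC : db2sysc 0",
            "",
            "2013-05-08-14.30.02.000001+480 I1801E400          LEVEL: Error",
            "PID     : 12345                TID : 140737353234176 PROC : db2sysc 0");
        for (String record : group(lines, "\n")) {
            System.out.println("--- one event body ---");
            System.out.println(record);
        }
    }
}
```

In a real source, each returned string would become the body of one event. Whatever separator is used when joining lines has to agree with what the interceptor regexes expect between fields.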


2. Flume supports chained interceptors that extract fields from the event message by regular-expression matching. The configuration file is as follows:
agent.sources=source1
agent.channels=channel1
agent.sinks=sink1


agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /db2fs/home/db2inst1/sqllib/db2dump/db2diag.log
agent.sources.source1.interceptors = i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12 i13 i14 i15 i16


agent.sources.source1.interceptors.i1.type=regex_extractor
agent.sources.source1.interceptors.i1.regex =([0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{2}.[0-9]{2}.[0-9]{2}.[0-9]{6}[+][0-9]{3})\\s+
agent.sources.source1.interceptors.i1.serializers=s1
agent.sources.source1.interceptors.i1.serializers.s1.name=TIME


agent.sources.source1.interceptors.i2.type=regex_extractor
agent.sources.source1.interceptors.i2.regex =\\s+([A-Z0-9]+)\\s+LEVEL
agent.sources.source1.interceptors.i2.serializers=s1
agent.sources.source1.interceptors.i2.serializers.s1.name=ID


agent.sources.source1.interceptors.i3.type=regex_extractor
agent.sources.source1.interceptors.i3.regex =LEVEL:\\s([A-Z][a-z]+)PID
agent.sources.source1.interceptors.i3.serializers=s1
agent.sources.source1.interceptors.i3.serializers.s1.name=LEVEL


agent.sources.source1.interceptors.i4.type=regex_extractor
agent.sources.source1.interceptors.i4.regex =PID\\s+:\\s+([0-9]{1,5})\\s+TID
agent.sources.source1.interceptors.i4.serializers=s1
agent.sources.source1.interceptors.i4.serializers.s1.name=PID


agent.sources.source1.interceptors.i5.type=regex_extractor
agent.sources.source1.interceptors.i5.regex =TID\\s+:\\s+([0-9]{15})\\s+PROC
agent.sources.source1.interceptors.i5.serializers=s1
agent.sources.source1.interceptors.i5.serializers.s1.name=TID


agent.sources.source1.interceptors.i6.type=regex_extractor
agent.sources.source1.interceptors.i6.regex =PROC\\s+:\\s+([a-z0-9]+\\s*[0-9]*)
agent.sources.source1.interceptors.i6.serializers=s1
agent.sources.source1.interceptors.i6.serializers.s1.name=PROC


agent.sources.source1.interceptors.i7.type=regex_extractor
agent.sources.source1.interceptors.i7.regex =INSTANCE:\\s+([a-z0-9]+)\\s+
agent.sources.source1.interceptors.i7.serializers=s1
agent.sources.source1.interceptors.i7.serializers.s1.name=INSTANCE


agent.sources.source1.interceptors.i8.type=regex_extractor
agent.sources.source1.interceptors.i8.regex =NODE\\s+:\\s([0-9]{3})
agent.sources.source1.interceptors.i8.serializers=s1
agent.sources.source1.interceptors.i8.serializers.s1.name=NODE


agent.sources.source1.interceptors.i9.type=regex_extractor
agent.sources.source1.interceptors.i9.regex =NODE\\s+:\\s[0-9]{3}\\s*DB\\s+:\\s+([A-Z]{4})
agent.sources.source1.interceptors.i9.serializers=s1
agent.sources.source1.interceptors.i9.serializers.s1.name=DB


agent.sources.source1.interceptors.i10.type=regex_extractor
agent.sources.source1.interceptors.i10.regex =APPHDL\\s+:\\s+([0-9]+-[0-9]+)\\s*
agent.sources.source1.interceptors.i10.serializers=s1
agent.sources.source1.interceptors.i10.serializers.s1.name=APPHDL


agent.sources.source1.interceptors.i11.type=regex_extractor
agent.sources.source1.interceptors.i11.regex =APPID:\\s+(.[A-Z]+\\.[0-9a-z]+\\.[0-9]+)\\s*
agent.sources.source1.interceptors.i11.serializers=s1
agent.sources.source1.interceptors.i11.serializers.s1.name=APPID


agent.sources.source1.interceptors.i12.type=regex_extractor
agent.sources.source1.interceptors.i12.regex =AUTHID\\s+:\\s+([0-9A-Z]{7}[0-9])\\s+
agent.sources.source1.interceptors.i12.serializers=s1
agent.sources.source1.interceptors.i12.serializers.s1.name=AUTHID


agent.sources.source1.interceptors.i13.type=regex_extractor
agent.sources.source1.interceptors.i13.regex =HOSTNAME:\\s+([a-zA-Z0-9.]+)[A-Z]+
agent.sources.source1.interceptors.i13.serializers=s1
agent.sources.source1.interceptors.i13.serializers.s1.name=HOSTNAME


agent.sources.source1.interceptors.i14.type=regex_extractor
agent.sources.source1.interceptors.i14.regex =EDUID\\s+:\\s+([0-9]+)
agent.sources.source1.interceptors.i14.serializers=s1
agent.sources.source1.interceptors.i14.serializers.s1.name=EDUID


agent.sources.source1.interceptors.i15.type=regex_extractor
agent.sources.source1.interceptors.i15.regex =EDUNAME:\\s+([a-z0-9]+\\s*[\(a-zA-Z\)]*\\s0)\\s*[A-Z]+
agent.sources.source1.interceptors.i15.serializers=s1
agent.sources.source1.interceptors.i15.serializers.s1.name=EDUNAME


agent.sources.source1.interceptors.i16.type=regex_extractor
agent.sources.source1.interceptors.i16.regex =FUNCTION:\\s+(DB2\\s+[A-Z]+,[ a-zA-Z]+,[ a-zA-Z_/:]+,\\sprobe:[0-9]+)
agent.sources.source1.interceptors.i16.serializers=s1
agent.sources.source1.interceptors.i16.serializers.s1.name=FUNCTION


agent.channels.channel1.type=memory
agent.channels.channel1.capacity=1000
agent.channels.channel1.transactionCapacity=100


agent.sources.source1.channels=channel1
agent.sinks.sink1.channel=channel1


#agent.sinks.sink1.type=logger
agent.sinks.sink1.type=org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.sink1.batchSize=100
agent.sinks.sink1.hostNames=X.X.X.X:9300
agent.sinks.sink1.indexName=flume-db2
agent.sinks.sink1.indexType=bar_type
agent.sinks.sink1.clusterName=elasticsearch
agent.sinks.sink1.serializer=org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
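
To make the effect of one regex_extractor entry easier to see, here is a small standalone sketch in plain Java. It only approximates what the interceptor does, under two assumptions: that the configuration is read like a standard Java properties file (so `\\s` in the file becomes `\s` in the regex, while a lone backslash before an ordinary character such as `\.` is dropped), and that the first capture group of a match is stored in a header under the serializer's name. The class `RegexExtractorSketch`, its methods, and the sample record line are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexExtractorSketch {
    /** Reads one value the way java.util.Properties unescapes a .properties file. */
    static String loadRegex(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    /** If the regex is found in the body, stores capture group 1 under headerName. */
    static void extract(String regex, String headerName, String body, Map<String, String> headers) {
        Matcher m = Pattern.compile(regex).matcher(body);
        if (m.find()) {
            headers.put(headerName, m.group(1));
        }
    }

    public static void main(String[] args) {
        // The Java literal "\\\\s" is the file text \\s, which loads as the regex \s.
        String fileText = "i4.regex = PID\\\\s+:\\\\s+([0-9]{1,5})\\\\s+TID\n";
        String regex = loadRegex(fileText, "i4.regex");

        // Hypothetical record fragment, not taken from a real db2diag.log.
        String body = "PID     : 12345                TID : 140737353234176 PROC : db2sysc 0";

        Map<String, String> headers = new HashMap<>();
        extract(regex, "PID", body, headers);
        System.out.println("regex after loading: " + regex);
        System.out.println("headers: " + headers);
    }
}
```

The same unescaping rule is why a literal dot needs to be written as `\\.` in the file: a single `\.` would load as a bare `.`, which matches any character.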


3. Flume also provides several other kinds of interceptors that can tidy up the message in an event. I will keep looking into them later.

