[Log Processing, Part 1] Integrating Elasticsearch, Kibana, Flume-NG and Kafka for real-time collection of Tomcat logs

All operations in this article are performed inside a single CentOS 6.5 virtual machine; once deployed, the setup is suitable for development and testing.


Software versions: apache-flume-1.7.0, apache-tomcat-7.0.27, elasticsearch-1.5.2, kafka_2.11-0.8.2.1, kibana-4.0.2, scala-2.11


Step 1. About Flume:
At the time of writing the current Apache Flume release is 1.5.2, which has no built-in Kafka support; Kafka support officially arrives with 1.7.0.
Download the 1.7 source code and build it. During the build, downloading ua-parser-1.3.0.pom failed; this was resolved by adding
<repository>
  <id>p2.jfrog.org</id>
  <url>http://p2.jfrog.org/libs-releases</url>
</repository>
to the pom, after which the build succeeded. For details see
http://blog.csdn.net/yydcj/article/details/38824823
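With that repository added, the build itself is the standard Maven packaging step. A minimal sketch, assuming it is run from the source root with tests skipped:
#mvn clean install -DskipTests
The binary distribution should then appear under flume-ng-dist/target and can be unpacked as the Flume installation used below.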
Then start the Flume agent; reference command:
bin/flume-ng agent -n agent -c conf -f conf/case1-elasticsearch.conf -Dflume.root.logger=INFO,console


Step 2. Start ZooKeeper
Kafka ships with a bundled ZooKeeper:
#bin/zookeeper-server-start.sh config/zookeeper.properties
Start the Kafka server:
#bin/kafka-server-start.sh config/server.properties
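The examples below publish to a topic named test; if it does not exist yet it can be created first (a sketch using Kafka 0.8.2's admin script, one broker, one partition):
#bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test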


Step 3. Start a Kafka console consumer for testing
#bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Data sent through the Flume agent's source now flows through Kafka and shows up in this consumer.
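Because case1-flume-kafka.conf (attached below) uses a netcat source listening on port 44444, a quick end-to-end test is to push a line into that port, for example:
#echo "hello kafka" | nc localhost 44444
The same line should then appear in the console consumer started above.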


Step 4. Start a Flume agent that subscribes to the messages in Kafka
The ZooKeeper jar (shipped with the Kafka installation) must first be copied into Flume's lib directory.
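A possible copy command, assuming FLUME_HOME and KAFKA_HOME point at the two installations (in kafka_2.11-0.8.2.1 the jar sits under libs):
#cp $KAFKA_HOME/libs/zookeeper-*.jar $FLUME_HOME/lib/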


Step 5. Start Elasticsearch:
#bin/elasticsearch
For Flume to be able to write into Elasticsearch, the elasticsearch jar and the lucene-core jar must also be copied into Flume's lib directory.
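Again assuming the installation paths are exported as environment variables, a sketch of the copy (in elasticsearch-1.5.2 both jars live under lib):
#cp $ES_HOME/lib/elasticsearch-1.5.2.jar $ES_HOME/lib/lucene-core-*.jar $FLUME_HOME/lib/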


Step 6. Configure host, port and elasticsearch_url in kibana.yml under config, then start Kibana:
#bin/kibana
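For reference, the three settings in config/kibana.yml might look like this (host and the Elasticsearch address are placeholders to replace with your own):
port: 5601
host: "0.0.0.0"
elasticsearch_url: "http://localhost:9200"
Kibana 4 then serves its UI on port 5601.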


Attached are the Flume configurations:
case1-flume-kafka.conf:
agent.sources=source1
agent.channels=channel1
agent.sinks=sink1


agent.sources.source1.type=netcat
agent.sources.source1.bind=localhost
agent.sources.source1.port=44444


agent.channels.channel1.type=memory
agent.channels.channel1.capacity=1000
agent.channels.channel1.transactionCapacity=100


agent.sources.source1.channels=channel1
agent.sinks.sink1.channel=channel1


agent.sinks.sink1.type=org.apache.flume.sink.kafka.KafkaSink
agent.sinks.sink1.topic=test
agent.sinks.sink1.brokerList=localhost:9092
agent.sinks.sink1.requiredAcks=1
agent.sinks.sink1.batchSize=20
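This agent can be started the same way as the reference command in Step 1, only pointing at this file:
#bin/flume-ng agent -n agent -c conf -f conf/case1-flume-kafka.conf -Dflume.root.logger=INFO,console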




case1-kafka-flume.conf:
agent.sources=source2
agent.channels=channel2
agent.sinks=sink2


agent.sources.source2.type=org.apache.flume.source.kafka.KafkaSource
agent.sources.source2.zookeeperConnect=localhost:2181
agent.sources.source2.topic=test
agent.sources.source2.groupId=flume
agent.sources.source2.kafka.consumer.timeout.ms=100


agent.channels.channel2.type=memory
agent.channels.channel2.capacity=1000
agent.channels.channel2.transactionCapacity=100


agent.sources.source2.channels=channel2
agent.sinks.sink2.channel=channel2


agent.sinks.sink2.type=org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.sink2.batchSize=100
agent.sinks.sink2.hostNames=172.168.0.10:9300
agent.sinks.sink2.indexName=flume
agent.sinks.sink2.indexType=bar_type
agent.sinks.sink2.clusterName=elasticsearch
agent.sinks.sink2.serializer=org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
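Once this second agent is running (started analogously, with -f conf/case1-kafka-flume.conf), you can confirm that events reach Elasticsearch. The ElasticSearchSink writes to a daily index derived from indexName, so a hedged check against the flume-* indices:
#curl 'http://172.168.0.10:9200/flume*/_search?pretty&size=1'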


Step 7. Use tail -F to monitor the log produced by Tomcat in real time; the Tomcat access-log format is left at its default.
Flume configuration file case-exec-elasticsearch.conf:
agent.sources=r1
agent.channels=channel2
agent.sinks=sink2


agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /db2fs/opt/log-process/apache-tomcat-7.0.27/logs/localhost_access_log.2015-05-25.txt
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type=regex_extractor
agent.sources.r1.interceptors.i1.regex = ([0][:0-9]*|[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s-\\s-\\s\\[(.*)\\]\\s"(.*)"\\s([0-9]{3}|-)\\s([0-9]+|-)
agent.sources.r1.interceptors.i1.serializers=s1 s2 s3 s4 s5
agent.sources.r1.interceptors.i1.serializers.s1.name=IP
agent.sources.r1.interceptors.i1.serializers.s2.name=TIME
agent.sources.r1.interceptors.i1.serializers.s3.name=PROTOCAL
agent.sources.r1.interceptors.i1.serializers.s4.name=STATUS_CODE
agent.sources.r1.interceptors.i1.serializers.s5.name=BYTE_COUNT


agent.channels.channel2.type=memory
agent.channels.channel2.capacity=1000
agent.channels.channel2.transactionCapacity=100


agent.sources.r1.channels=channel2
agent.sinks.sink2.channel=channel2


agent.sinks.sink2.type=org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.sink2.batchSize=100
agent.sinks.sink2.hostNames=9.115.42.108:9300
agent.sinks.sink2.indexName=tomcat
agent.sinks.sink2.indexType=bar_type
agent.sinks.sink2.clusterName=elasticsearch
agent.sinks.sink2.serializer=org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
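For orientation, a default-format Tomcat access-log line looks roughly like the hypothetical example below, and the interceptor's five capture groups map onto the header names defined above:
127.0.0.1 - - [25/May/2015:13:15:20 +0800] "GET /index.jsp HTTP/1.1" 200 11217
IP=127.0.0.1, TIME=25/May/2015:13:15:20 +0800, PROTOCAL=GET /index.jsp HTTP/1.1, STATUS_CODE=200, BYTE_COUNT=11217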


Step 8. With the configuration from Step 7 in place, simply keep monitoring the log file; new entries flow into Elasticsearch as they are written.
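To watch the data flow end to end, generate a few requests against Tomcat and then point Kibana at the resulting index. A minimal sketch, assuming Tomcat listens on its default port 8080:
#curl http://localhost:8080/
In Kibana, create an index pattern matching tomcat-* (the sink's indexName plus the daily date suffix); the extracted IP, TIME, STATUS_CODE and BYTE_COUNT fields should then be searchable there.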
