Collecting logs into Kafka with Flume

Original

ZHBXS

2020-02-20 16:01

1. Build the Flume agent

First go to the configuration folder under the Flume installation (mine is named myconf) and write the agent configuration file (named flume2kafka.conf).

flume2kafka.conf

# Name the components of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
 
# Describe and configure the source component: r1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/datas
a1.sources.r1.fileHeader = true
 
# Describe and configure the sink component: k1
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = jsonTopic
a1.sinks.k1.kafka.bootstrap.servers = 127.0.0.1:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
 
# Describe and configure the channel component; an in-memory buffer is used here
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
 
# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
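
A property whose component name is mistyped does not configure the component you meant, and the mistake is easy to miss by eye. The following is a minimal sketch of a check for that, not part of Flume itself: the function name check_agent_conf is made up for illustration, and it only looks at the sources, sinks and channels of one agent.

```shell
#!/bin/sh
# check_agent_conf FILE AGENT   (hypothetical helper, not a Flume tool)
# Prints every component name that has properties under AGENT.sources,
# AGENT.sinks or AGENT.channels but is missing from the matching
# "AGENT.<kind> = ..." declaration line. Returns 1 if any is found.
check_agent_conf() {
    awk -v agent="$2" '
        /^[ \t]*#/ || !/=/ { next }      # skip comments and non-property lines
        {
            key = $0; sub(/[ \t]*=.*/, "", key); gsub(/^[ \t]+/, "", key)
            val = $0; sub(/^[^=]*=[ \t]*/, "", val)
            n = split(key, part, ".")
            if (part[1] != agent) next
            kind = part[2]
            if (kind != "sources" && kind != "sinks" && kind != "channels") next
            if (n == 2) {
                # declaration line, e.g. "a1.sinks = k1"
                m = split(val, names, /[ \t]+/)
                for (i = 1; i <= m; i++) declared[kind "." names[i]] = 1
            } else {
                # property line, e.g. "a1.sinks.k1.type = ..."
                used[kind "." part[3]] = 1
            }
        }
        END {
            bad = 0
            for (u in used) {
                if (!(u in declared)) {
                    print "undeclared: " agent "." u
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1"
}
```

With the file above it would be called as: check_agent_conf myconf/flume2kafka.conf a1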

2. Start ZooKeeper

sh zkServer.sh start

3. Start the Kafka broker

bin/kafka-server-start.sh config/server.properties

4. Start the Flume agent

bin/flume-ng agent -c conf -f <config-dir>/<config-file> -n a1 -Dflume.root.logger=INFO,console

5. Start a Kafka consumer

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic jsonTopic --from-beginning

The topic must match the topic configured for the sink component in flume2kafka.conf.
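
One way to keep the consumer's topic in step with the sink configuration is to read it from the configuration file instead of typing it twice. This is only a sketch: sink_topic is a name I made up, and it assumes the simple "key = value" layout used in flume2kafka.conf.

```shell
#!/bin/sh
# sink_topic FILE AGENT SINK   (hypothetical helper)
# Prints the value of AGENT.sinks.SINK.kafka.topic from a Flume
# properties file, or nothing if the key is not present.
sink_topic() {
    awk -v key="$2.sinks.$3.kafka.topic" '
        /^[ \t]*#/ { next }              # skip comment lines
        {
            k = $0; sub(/[ \t]*=.*/, "", k); gsub(/^[ \t]+/, "", k)
            if (k == key && index($0, "=")) {
                v = $0
                sub(/^[^=]*=[ \t]*/, "", v)   # drop "key ="
                sub(/[ \t\r]+$/, "", v)       # drop trailing whitespace
                print v
                exit
            }
        }
    ' "$1"
}
```

The result can then be passed to the consumer, for example --topic "$(sink_topic myconf/flume2kafka.conf a1 k1)".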

Log collection is now running. When a file has been collected, Flume prints a message like the one in the figure below:

The data is written to Kafka. The exact path on disk is set by log.dirs in the Log Basics section of Kafka's server.properties, as shown in the figure below:

Inspect the files

The data is written to the files shown in the figure above.
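
Kafka keeps each partition of a topic in its own directory named <topic>-<partition> under log.dirs. The sketch below lists those directories for one topic. The function name topic_log_dirs is my own, and it assumes log.dirs is written without spaces around the equals sign, as in the stock server.properties.

```shell
#!/bin/sh
# topic_log_dirs SERVER_PROPERTIES TOPIC   (hypothetical helper)
# Prints the existing partition directories of TOPIC under every
# directory listed in log.dirs. Returns 1 if log.dirs is not set.
topic_log_dirs() {
    props=$1
    topic=$2
    # If the key is repeated, use the last line; the value may be
    # a comma-separated list of directories.
    dirs=$(grep '^log\.dirs=' "$props" | tail -n 1 | cut -d= -f2-)
    [ -n "$dirs" ] || return 1
    old_ifs=$IFS
    IFS=,
    for d in $dirs; do
        # Note: this glob would also match another topic whose name
        # starts with "<topic>-<digit>"; good enough for a sketch.
        for p in "$d/$topic"-[0-9]*; do
            [ -d "$p" ] && printf '%s\n' "$p"
        done
    done
    IFS=$old_ifs
}
```

For the setup in this post the call would be: topic_log_dirs config/server.properties jsonTopic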

6. Problems encountered

java.lang.IllegalStateException: File name has been re-used with different files. Spooling assumptions violated for /opt/data/hello.txt.COMPLETED

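Tracing the source code that throws the exception: SpoolDirectorySource starts a thread that polls the monitored directory for files. After it finishes reading a file (readEvents), it renames the file (rollCurrentFile). If the rename fails, an IllegalStateException is thrown; the message here indicates that the rename target, hello.txt.COMPLETED, already existed. SpoolDirectoryRunnable catches the exception and rethrows it as a RuntimeException, which terminates the current thread. Judging from the source, SpoolDirectoryRunnable runs on a single thread, so once that thread ends, the other files in the monitored directory are no longer processed. That is why Flume showed no reaction when I then created a new word.txt file.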
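
Because a reused file name is enough to stop the source, it can help to check for name clashes before handing files over. A minimal sketch, assuming files are prepared in a separate staging directory and that the source uses the default .COMPLETED suffix; find_spool_collisions is a name invented for this example.

```shell
#!/bin/sh
# find_spool_collisions STAGING_DIR SPOOL_DIR [SUFFIX]   (hypothetical helper)
# Prints the name of every regular file in STAGING_DIR whose name is
# already present in SPOOL_DIR, either as-is or with SUFFIX appended.
# Returns 1 if at least one clash was found, 0 otherwise.
find_spool_collisions() {
    staging=$1
    spool=$2
    suffix=${3:-.COMPLETED}
    found=0
    for f in "$staging"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ -e "$spool/$name" ] || [ -e "$spool/$name$suffix" ]; then
            printf '%s\n' "$name"
            found=1
        fi
    done
    return $found
}
```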

7. The correct approach

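Do not create a file directly in the flume_test folder and then write its content there. Create the file in another folder, finish writing it, and then mv it into flume_test.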

[hadoop@nbdo3 ~]$ cd testdata/
[hadoop@nbdo3 testdata]$ ll
total 4
-rw-rw-r--. 1 hadoop hadoop 71 Mar 10 20:19 hello.txt
[hadoop@nbdo3 testdata]$ cp hello.txt …/data/
[hadoop@nbdo3 testdata]$ echo "123456778" >> world.txt
[hadoop@nbdo3 testdata]$ cp world.txt …/data/
[hadoop@nbdo3 testdata]$
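
The "write elsewhere, then mv" rule can be wrapped in a small helper. Within one filesystem mv is a rename, so the file appears in the monitored directory complete, whereas cp writes it there piece by piece. The sketch below is one possible shape, under my own assumptions: the name deliver_to_spool, the sibling "<spool>.staging" directory, and the timestamp-plus-PID suffix used to avoid reusing a file name are all made up for illustration.

```shell
#!/bin/sh
# deliver_to_spool SRC_FILE SPOOL_DIR   (hypothetical helper)
# Copies SRC_FILE into a staging directory next to SPOOL_DIR, then
# renames the finished copy into SPOOL_DIR under a name that has not
# been used before. Prints the name used inside SPOOL_DIR.
deliver_to_spool() {
    src=$1
    spool=$2
    # A sibling directory is normally on the same filesystem as the
    # spool directory, which keeps the final mv a plain rename.
    staging="${spool%/}.staging"
    mkdir -p "$staging" || return 1
    # Suffix with a timestamp and the shell's PID. Two deliveries of
    # the same file within one second from the same shell would still
    # clash, so this is a sketch rather than a guarantee.
    target="$(basename "$src").$(date +%Y%m%d%H%M%S).$$"
    cp "$src" "$staging/$target" || return 1     # slow part, outside the spool directory
    mv "$staging/$target" "$spool/$target" || return 1
    printf '%s\n' "$target"
}
```

For example, deliver_to_spool world.txt /opt/datas would leave a file such as world.txt.<timestamp>.<pid> in the monitored directory and leave the original world.txt untouched.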
