1、Chaining Two Agents
Chained agents must pass data between them with an Avro Sink on the upstream agent and an Avro Source on the downstream one.
Example:
Agent structure: source -> channel -> sink -> source -> channel -> sink
Component choices: exec -> memory -> avro | avro -> memory -> logger
This walkthrough runs both agents on a single VM; with two or more hosts, make sure the Avro source's bind address (and the matching Avro sink hostname) point at the right machine.
exec-avro-agent.conf
### exec-avro-agent.conf file ###
exec-avro-agent.sources = exec-source
exec-avro-agent.channels = memory-channel
exec-avro-agent.sinks = avro-sink
exec-avro-agent.sources.exec-source.type = exec
exec-avro-agent.sources.exec-source.command = tail -F /home/hadoop/data/flume/multiple/chuanlian/input/avro_access.data
exec-avro-agent.channels.memory-channel.type = memory
exec-avro-agent.sinks.avro-sink.type = avro
exec-avro-agent.sinks.avro-sink.hostname = localhost
exec-avro-agent.sinks.avro-sink.port = 44444
exec-avro-agent.sources.exec-source.channels = memory-channel
exec-avro-agent.sinks.avro-sink.channel = memory-channel
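The memory channel above relies on Flume's defaults, which cap the channel at 100 events (and 100 events per transaction). For anything beyond a quick demo you would usually raise these; the values below are illustrative, not part of the original setup:

```properties
# Optional memory-channel tuning (illustrative values, not from the original config)
exec-avro-agent.channels.memory-channel.capacity = 10000
exec-avro-agent.channels.memory-channel.transactionCapacity = 1000
```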
avro-logger-agent.conf
### avro-logger-agent.conf file ###
avro-logger-agent.sources = avro-source
avro-logger-agent.channels = memory-channel
avro-logger-agent.sinks = logger-sink
avro-logger-agent.sources.avro-source.type = avro
avro-logger-agent.sources.avro-source.bind = localhost
avro-logger-agent.sources.avro-source.port = 44444
avro-logger-agent.channels.memory-channel.type = memory
avro-logger-agent.sinks.logger-sink.type = logger
avro-logger-agent.sources.avro-source.channels = memory-channel
avro-logger-agent.sinks.logger-sink.channel = memory-channel
Start the avro-logger agent first
### start the avro-logger agent first ###
flume-ng agent \
--name avro-logger-agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-logger-agent.conf \
-Dflume.root.logger=INFO,console
In a second terminal session, start the exec-avro agent
### then start the exec-avro agent ###
flume-ng agent \
--name exec-avro-agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-avro-agent.conf \
-Dflume.root.logger=INFO,console
Test
[hadoop@vm01 ~]$ mkdir -p /home/hadoop/data/flume/multiple/chuanlian/input/
[hadoop@vm01 ~]$ cd /home/hadoop/data/flume/multiple/chuanlian/input/
[hadoop@vm01 input]$ echo "hello hadoop" >> avro_access.data
The appended line should then show up on the avro-logger agent's console.
2、Single Source, Multiple Channels/Sinks
Multiplexing the flow:
- Multiplexing Channel Selector: routes each event to a specific channel according to a user-defined selector rule; for example, sinking different business lines from the same log into different HDFS directories.
- Replicating Channel Selector: copies every event to all channels, so each channel carries the same data; for example, delivering the same stream to HDFS for batch processing and to Kafka for stream processing.
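As a sketch of the multiplexing case (not used in this article's examples; the agent name a1, the header field state, and the mapping values are hypothetical), the selector rule might look like:

```properties
# Route events by the value of the "state" event header (hypothetical example)
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2
# Events whose header matches nothing above fall through to the default channel
a1.sources.r1.selector.default = c1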
Replicating Channel Selector
One source's data is replicated: one copy is written to HDFS, the other is printed to the console.
NetCat Source --> memory --> HDFS
              --> memory --> logger
Configuration file
replicating-channel-agent.conf
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.selector.type = replicating
a1.channels.c1.type = memory
a1.channels.c2.type = memory
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://vm01:9000/flume/multipleFlow/%Y%m%d%H%M
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.filePrefix = wsktest-
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 100000000
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k2.type = logger
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
Start
flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/replicating-channel-agent.conf \
-Dflume.root.logger=INFO,console
Test
In a second terminal session, telnet to the source:
[hadoop@vm01 conf]$ telnet localhost 44444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello hadoop
OK
Check HDFS:
[hadoop@vm01 input]$ hdfs dfs -text /flume/multipleFlow/201908081917/*
hello hadoop
[hadoop@vm01 input]$
3、Single Source to HDFS and Kafka
Configuration (Taildir-HdfsAndKafka-Agent.conf)
Taildir-HdfsAndKafka-Agent.sources = taildir-source
Taildir-HdfsAndKafka-Agent.channels = c1 c2
Taildir-HdfsAndKafka-Agent.sinks = hdfs-sink kafka-sink
Taildir-HdfsAndKafka-Agent.sources.taildir-source.type = TAILDIR
Taildir-HdfsAndKafka-Agent.sources.taildir-source.filegroups = f1
Taildir-HdfsAndKafka-Agent.sources.taildir-source.filegroups.f1 = /home/hadoop/data/flume/HdfsAndKafka/input/.*
Taildir-HdfsAndKafka-Agent.sources.taildir-source.positionFile = /home/hadoop/data/flume/HdfsAndKafka/taildir_position/taildir_position.json
Taildir-HdfsAndKafka-Agent.sources.taildir-source.selector.type = replicating
Taildir-HdfsAndKafka-Agent.channels.c1.type = memory
Taildir-HdfsAndKafka-Agent.channels.c2.type = memory
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.type = hdfs
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/HdfsAndKafka/%Y%m%d%H%M
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.filePrefix = wsktest-
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.rollInterval = 10
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.rollSize = 100000000
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.rollCount = 0
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.fileType = DataStream
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.hdfs.writeFormat = Text
Taildir-HdfsAndKafka-Agent.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
Taildir-HdfsAndKafka-Agent.sinks.kafka-sink.brokerList = localhost:9092
Taildir-HdfsAndKafka-Agent.sinks.kafka-sink.topic = wsk_test
Taildir-HdfsAndKafka-Agent.sources.taildir-source.channels = c1 c2
Taildir-HdfsAndKafka-Agent.sinks.hdfs-sink.channel = c1
Taildir-HdfsAndKafka-Agent.sinks.kafka-sink.channel = c2
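Note: brokerList and topic are the legacy Kafka sink property names from Flume 1.6 and earlier. On Flume 1.7+ they still work for backward compatibility but are deprecated in favor of the kafka.-prefixed names; the equivalent lines would be (agent name elided as <agent>):

```properties
# Flume 1.7+ style Kafka sink properties
<agent>.sinks.kafka-sink.kafka.bootstrap.servers = localhost:9092
<agent>.sinks.kafka-sink.kafka.topic = wsk_test
```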
Start
flume-ng agent \
--name Taildir-HdfsAndKafka-Agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/Taildir-HdfsAndKafka-Agent.conf \
-Dflume.root.logger=INFO,console
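To verify the Kafka leg, one option (assuming a broker is already running on localhost:9092) is Kafka's console consumer; on Kafka versions before 0.10.2 you would pass --zookeeper instead of --bootstrap-server:

```shell
# Watch events arriving on the wsk_test topic (assumes a local broker is running)
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic wsk_test --from-beginning
```

Appending a line to any file under /home/hadoop/data/flume/HdfsAndKafka/input/ should then surface it in both the HDFS path and the consumer output.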