Case 1: Replication (one source, multiple channels)
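1) Requirements
Flume-1 monitors file changes and passes the new content to Flume-2, which stores it in HDFS. At the same time, Flume-1 passes the same content to Flume-3, which writes it to the local file system.
2) Architecture diagram:
3) Steps:
1. Create the group1 folder and, inside it, the file flume-file-flume.conf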
a1.sources=s1
a1.channels=c1 c2
a1.sinks=k1 k2
a1.sources.s1.type=TAILDIR
a1.sources.s1.positionFile=/opt/flume/job/qiye/group1/posititionFile1.json
a1.sources.s1.filegroups=f1
a1.sources.s1.filegroups.f1=/opt/flume/job/qiye/group1/.*log
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=h1
a1.sinks.k1.port=10000
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=h1
a1.sinks.k2.port=10001
a1.sources.s1.channels=c1 c2
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
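The a1 configuration sets no channel selector on s1, so Flume uses its default replicating behaviour: every event the source reads is written to both c1 and c2. The toy model below uses plain Java collections rather than Flume classes and is only a sketch of that fan-out; `ReplicatingSketch`, `newChannels`, and `replicate` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of replication: every event is copied to every channel. */
final class ReplicatingSketch {

    /** Builds hypothetical in-memory "channels" keyed by name (for example c1, c2). */
    static Map<String, List<String>> newChannels(String... names) {
        Map<String, List<String>> channels = new LinkedHashMap<>();
        for (String name : names) {
            channels.put(name, new ArrayList<>());
        }
        return channels;
    }

    /** Appends the same event to all channels, which is what replication means here. */
    static void replicate(String event, Map<String, List<String>> channels) {
        for (List<String> queue : channels.values()) {
            queue.add(event);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> channels = newChannels("c1", "c2");
        replicate("line appended to a .log file", channels);
        System.out.println(channels);
    }
}
```

Each sink then drains its own channel independently, which is why k1 and k2 can forward the same data to two different downstream agents.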
2. Create flume-flume-hdfs.conf
a2.sources=s1
a2.sinks=k1
a2.channels=c1
a2.sources.s1.type=avro
a2.sources.s1.bind=h1
a2.sources.s1.port=10000
a2.sinks.k1.type=hdfs
a2.sinks.k1.hdfs.path=hdfs://h1:9000/flume/%Y%m%d/%H
a2.sinks.k1.hdfs.filePrefix=flume-
a2.sinks.k1.hdfs.useLocalTimeStamp = true
a2.sinks.k1.hdfs.batchSize = 100
a2.sinks.k1.hdfs.rollInterval = 1000
a2.channels.c1.type=memory
a2.sources.s1.channels=c1
a2.sinks.k1.channel=c1
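The `%Y%m%d/%H` escapes in `hdfs.path` expand to year, month, day, and 24-hour hour, so events are bucketed into one directory per hour. With `hdfs.useLocalTimeStamp = true` the sink takes that time from the local clock instead of a timestamp header on the event. The sketch below only mimics the expansion with `java.time`; it does not call Flume, and `HdfsPathSketch` and `bucketPath` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Mimics how the %Y%m%d/%H escapes turn a timestamp into a directory name. */
final class HdfsPathSketch {

    // yyyyMMdd/HH is the java.time equivalent of %Y%m%d/%H (4-digit year, 2-digit month/day/hour).
    private static final DateTimeFormatter BUCKET = DateTimeFormatter.ofPattern("yyyyMMdd/HH");

    /** Returns base + "/" + the hourly bucket for the given time. */
    static String bucketPath(String base, LocalDateTime time) {
        return base + "/" + BUCKET.format(time);
    }

    public static void main(String[] args) {
        // Prints the bucket that an event arriving right now would fall into.
        System.out.println(bucketPath("hdfs://h1:9000/flume", LocalDateTime.now()));
    }
}
```

For example, a timestamp of 2024-03-05 07:30 maps to `hdfs://h1:9000/flume/20240305/07`.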
3. Create flume-flume-dir.conf
a3.sources = r1
a3.sinks = k1
a3.channels = c2
# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = h1
a3.sources.r1.port = 10001
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/flume/job/qiye/group1
# Describe the channel
a3.channels.c2.type = memory
# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2
4. Start a2 and a3 first, then a1
cd /opt/flume
bin/flume-ng agent --name a2 --conf conf --conf-file /opt/flume/job/qiye/group1/flume-flume-hdfs.conf
bin/flume-ng agent --name a3 --conf conf --conf-file /opt/flume/job/qiye/group1/flume-flume-dir.conf
bin/flume-ng agent --name a1 --conf conf --conf-file /opt/flume/job/qiye/group1/flume-file-flume.conf
5. Create a tes.log file in the /opt/flume/job/qiye/group1 folder, then check HDFS
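Case 2: Load balancing and failover
1) Requirements
Flume1 monitors a port. The sinks in its sink group connect to Flume2 and Flume3 respectively, and a FailoverSinkProcessor provides failover.
2) Architecture diagram:
3) Steps:
1. Create flume-netcat-flume.conf
Configure one netcat source, one channel, and one sink group with two sinks, which deliver to flume-flume-console1 and flume-flume-console2 respectively.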
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinkgroups.g1.processor.type=failover
a1.sinkgroups.g1.processor.priority.k1=5
a1.sinkgroups.g1.processor.priority.k2=10
a1.sinkgroups.g1.processor.maxpenalty=10000
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
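With the failover processor, the sink with the larger priority value is tried first, so k2 (priority 10) is the active sink and k1 (priority 5) is the standby. A failed sink is set aside for a cooldown period that grows with repeated failures, capped by `maxpenalty` (in milliseconds). The sketch below models only the selection rule, not the cooldown timing, and does not use Flume classes; `FailoverSketch` and `pickActive` are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Toy model of failover selection: use the live sink with the highest priority. */
final class FailoverSketch {

    /**
     * Returns the name of the sink with the largest priority value among
     * those not currently marked as failed, or empty if every sink is down.
     */
    static Optional<String> pickActive(Map<String, Integer> priorities, Set<String> failed) {
        return priorities.entrySet().stream()
                .filter(entry -> !failed.contains(entry.getKey()))
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Priorities copied from the sink group configuration.
        Map<String, Integer> priorities = Map.of("k1", 5, "k2", 10);
        System.out.println(pickActive(priorities, Set.of()));      // all sinks healthy
        System.out.println(pickActive(priorities, Set.of("k2")));  // k2 has failed
    }
}
```

In the running example this means that stopping the agent behind k2 should make events show up on the agent behind k1 instead.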
2. Create flume-flume-console1.conf
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141
# Describe the sink
a2.sinks.k1.type = logger
# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
3. Create flume-flume-console2.conf for a3. It is nearly identical to the a2 file: rename the agent to a3 and bind the avro source to port 4142.
4. Start a2 and a3 first, then a1
bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console
bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console
bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group2/flume-netcat-flume.conf
5. Send messages to port 44444:
$ nc localhost 44444
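Case 3: Custom interceptor
1) Requirements
Flume collects local server logs, and logs of different types must be sent to different analysis systems.
2) Requirements analysis
In real-world development, a single server may produce many types of logs, and different types may need to go to different analysis systems. This calls for the Multiplexing structure in Flume's topology: based on the value of a given key in the event Header, Multiplexing sends events to different Channels.
In this case, data sent through kafka simulates the logs, with single digits and single letters standing in for different log types. A custom interceptor distinguishes digits from letters so that they go to different analysis systems (Channels).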
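The routing rule described above can be sketched in a few lines: look up one header key, then map its value to a channel name. This is a plain-Java model, not Flume's selector implementation; `MultiplexingSketch` and `route` are hypothetical names, and the `letter`/`number` values and `c1`/`c2` channels are taken from the configuration used in this case.

```java
import java.util.Map;
import java.util.Optional;

/** Toy model of multiplexing: pick a channel from the value of one header key. */
final class MultiplexingSketch {

    /**
     * Returns the channel mapped to the event's header value, or empty when
     * the header is missing or its value has no mapping.
     */
    static Optional<String> route(Map<String, String> headers,
                                  String headerKey,
                                  Map<String, String> mapping) {
        String value = headers.get(headerKey);
        if (value == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(mapping.get(value));
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of("letter", "c1", "number", "c2");
        System.out.println(route(Map.of("type", "letter"), "type", mapping));
        System.out.println(route(Map.of("type", "number"), "type", mapping));
        System.out.println(route(Map.of(), "type", mapping));
    }
}
```

An empty result models an event that matches no mapping. In a real agent, where such events go depends on whether the selector has a `default` channel configured.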
3) Architecture diagram:
4) Steps:
1. Create a Maven project and add the dependency:
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.8.0</version>
</dependency>
2. Implement the Interceptor interface
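package com.flume;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;

/** Tags each event with a "type" header based on the first byte of its body. */
public class CustomInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // No resources to set up.
    }

    @Override
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        // Leave events with an empty body untagged instead of failing on body[0].
        if (body == null || body.length == 0) {
            return event;
        }
        byte first = body[0];
        if (first >= 'a' && first <= 'z') {
            // Lowercase ASCII letter, bounds inclusive.
            event.getHeaders().put("type", "letter");
        } else if (first >= '0' && first <= '9') {
            // ASCII digit, bounds inclusive.
            event.getHeaders().put("type", "number");
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> list) {
        // Tag every event in the batch in place.
        for (Event event : list) {
            intercept(event);
        }
        return list;
    }

    @Override
    public void close() {
        // Nothing to release.
    }

    /** The agent config refers to this class as com.flume.CustomInterceptor$Builder. */
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new CustomInterceptor();
        }

        @Override
        public void configure(Context context) {
            // No configuration parameters.
        }
    }
}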
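The first-byte check is easy to get wrong at the boundaries, so it can help to exercise the rule on its own, without Flume on the classpath. The sketch below assumes the intent is that every lowercase letter and every digit gets a label, including the boundary characters `a`, `z`, `0`, and `9`, and that an empty body must not throw. `FirstByteClassifier`, `classify`, and the `"unknown"` label are hypothetical and not part of the interceptor.

```java
import java.nio.charset.StandardCharsets;

/** Standalone version of the first-byte rule, for checking edge cases. */
final class FirstByteClassifier {

    /** Returns "letter", "number", or the hypothetical label "unknown". */
    static String classify(byte[] body) {
        if (body == null || body.length == 0) {
            return "unknown"; // nothing to inspect
        }
        byte first = body[0];
        if (first >= 'a' && first <= 'z') {
            return "letter";
        }
        if (first >= '0' && first <= '9') {
            return "number";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Boundary characters, an uppercase letter, and an empty body.
        for (String sample : new String[] {"a", "z", "0", "9", "A", ""}) {
            byte[] body = sample.getBytes(StandardCharsets.UTF_8);
            System.out.println("'" + sample + "' -> " + classify(body));
        }
    }
}
```

Inputs that come back as `"unknown"` here correspond to events that would carry no `type` header, so it is worth deciding up front which channel, if any, those events should reach.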
3. Package the project and put the jar in Flume's lib folder
4. Create kafka-flume.conf
a1.sources = s1
a1.sinks = k1 k2
a1.channels = c1 c2
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
# Kafka broker address
a1.sources.s1.kafka.bootstrap.servers=h1:9092
a1.sources.s1.kafka.topics=topic_test
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=com.flume.CustomInterceptor$Builder
a1.sources.s1.selector.type=multiplexing
a1.sources.s1.selector.header=type
a1.sources.s1.selector.mapping.letter=c1
a1.sources.s1.selector.mapping.number=c2
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = h1
a1.sinks.k1.port = 10000
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = h1
a1.sinks.k2.port = 10001
a1.channels.c1.type = memory
a1.channels.c2.type = memory
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
a1.sources.s1.channels=c1 c2
5. Create logger1.conf
a2.sources = r1
a2.sinks = k1
a2.channels = c1
a2.sources.r1.type = avro
a2.sources.r1.bind = h1
a2.sources.r1.port = 10000
a2.sinks.k1.type = logger
a2.channels.c1.type = memory
a2.sinks.k1.channel = c1
a2.sources.r1.channels = c1
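6. Create logger2.conf. It is nearly the same as logger1.conf: name the agent a3 and have the avro source listen on port 10001.
7. Start a3 and a2 first, then a1, and finally start the kafka producer to send messages. The order matters: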
flume-ng agent --name a3 --conf conf --conf-file /opt/flume/job/qiye/group1/inter/logger2.conf -Dflume.root.logger=INFO,console
flume-ng agent --name a2 --conf conf --conf-file /opt/flume/job/qiye/group1/inter/logger1.conf -Dflume.root.logger=INFO,console
flume-ng agent --name a1 --conf conf --conf-file /opt/flume/job/qiye/group1/inter/kafka-flume.conf -Dflume.root.logger=INFO,console
# kafka
bin/kafka-console-producer.sh --broker-list h1:9092 --topic topic_test