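Case 1: Multiplexing
1) Requirements
Flume-1 monitors file changes and passes the new content to Flume-2, which stores it in HDFS.
Flume-1 also passes the same content to Flume-3, which writes it to the local file system.
2) Architecture diagram:
3) Steps:
1. Create the group1 folder, then create flume-file-flume.conf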
a1.sources=s1
a1.channels=c1 c2
a1.sinks=k1 k2
a1.sources.s1.type=TAILDIR
a1.sources.s1.positionFile=/opt/flume/job/qiye/group1/posititionFile1.json
a1.sources.s1.filegroups=f1
a1.sources.s1.filegroups.f1=/opt/flume/job/qiye/group1/.*log
a1.channels.c1.type=memory
a1.channels.c2.type=memory
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=h1
a1.sinks.k1.port=10000
a1.sinks.k2.type=avro
a1.sinks.k2.hostname=h1
a1.sinks.k2.port=10001
a1.sources.s1.channels=c1 c2
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
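Because a1 sets no selector, the source falls back to Flume's default replicating channel selector, so every event from s1 is copied to both c1 and c2. A minimal plain-Java sketch of that behaviour follows; it has no Flume dependency, the class and method names are hypothetical, and simple queues stand in for channels:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical model of a replicating fan-out; this is NOT Flume's API.
public class ReplicatingFanOut {
    // Channel name -> queue of pending events
    private final Map<String, Queue<String>> channels = new LinkedHashMap<>();

    public ReplicatingFanOut(List<String> channelNames) {
        for (String name : channelNames) {
            channels.put(name, new ArrayDeque<>());
        }
    }

    // Copy the event into every channel, as a replicating selector would.
    public void put(String event) {
        for (Queue<String> queue : channels.values()) {
            queue.add(event);
        }
    }

    // Take the next event from one channel (the sink's side); null if it is empty.
    public String take(String channelName) {
        return channels.get(channelName).poll();
    }

    public static void main(String[] args) {
        ReplicatingFanOut fanOut = new ReplicatingFanOut(Arrays.asList("c1", "c2"));
        fanOut.put("new log line");
        System.out.println("c1 -> " + fanOut.take("c1")); // c1 -> new log line
        System.out.println("c2 -> " + fanOut.take("c2")); // c2 -> new log line
    }
}
```

Each sink drains its own channel independently, which is why the HDFS agent and the local-file agent both receive the full stream.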
2. Create flume-flume-hdfs.conf
a2.sources=s1
a2.sinks=k1
a2.channels=c1
a2.sources.s1.type=avro
a2.sources.s1.bind=h1
a2.sources.s1.port=10000
a2.sinks.k1.type=hdfs
a2.sinks.k1.hdfs.path=hdfs://h1:9000/flume/%Y%m%d/%H
a2.sinks.k1.hdfs.filePrefix=flume-
a2.sinks.k1.hdfs.useLocalTimeStamp = true
a2.sinks.k1.hdfs.batchSize = 100
a2.sinks.k1.hdfs.rollInterval = 1000
a2.channels.c1.type=memory
a2.sources.s1.channels=c1
a2.sinks.k1.channel=c1
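The escape sequences in hdfs.path are filled in from a timestamp, and useLocalTimeStamp = true makes the sink use the local clock instead of a timestamp header on the event. As a rough preview of the directory this yields, assuming %Y%m%d/%H means year, month, day and 24-hour hour, the expansion can be mimicked in plain Java (the helper below is hypothetical and is not part of Flume):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that mimics how %Y%m%d/%H would expand; not Flume code.
public class HdfsPathPreview {
    // yyyyMMdd/HH corresponds to the %Y%m%d/%H escapes in the sink config
    private static final DateTimeFormatter DAY_AND_HOUR =
            DateTimeFormatter.ofPattern("yyyyMMdd/HH");

    // Append the day/hour directories for the given time to the base path.
    public static String expand(String basePath, LocalDateTime time) {
        return basePath + "/" + time.format(DAY_AND_HOUR);
    }

    public static void main(String[] args) {
        LocalDateTime sample = LocalDateTime.of(2024, 3, 5, 9, 30);
        // prints hdfs://h1:9000/flume/20240305/09
        System.out.println(expand("hdfs://h1:9000/flume", sample));
    }
}
```

Events written during the same hour land in the same directory, so one directory per hour appears under /flume.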
3. Create flume-flume-dir.conf
a3.sources = r1
a3.sinks = k1
a3.channels = c2
# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = h1
a3.sources.r1.port = 10001
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/flume/job/qiye/group1
# Describe the channel
a3.channels.c2.type = memory
# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2
4. Start a2 and a3 first, then a1
cd /opt/flume/bin
flume-ng agent --name a2 --conf conf --conf-file /opt/flume/job/qiye/group1/flume-flume-hdfs.conf
flume-ng agent --name a3 --conf conf --conf-file /opt/flume/job/qiye/group1/flume-flume-dir.conf
flume-ng agent --name a1 --conf conf --conf-file /opt/flume/job/qiye/group1/flume-file-flume.conf
5. Create a tes.log file under /opt/flume/job/qiye/group1, then check HDFS
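Case 2: Load balancing and failover
1) Requirements
Flume1 monitors a port, and the sinks in its sink group connect to Flume2 and Flume3 respectively; a FailoverSinkProcessor provides the failover.
2) Architecture diagram:
3) Steps:
1. Create flume-netcat-flume.conf
Configure one netcat source, one channel, and one sink group (two sinks) that deliver to flume-flume-console1 and flume-flume-console2 respectively.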
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinkgroups.g1.processor.type=failover
a1.sinkgroups.g1.processor.priority.k1=5
a1.sinkgroups.g1.processor.priority.k2=10
a1.sinkgroups.g1.processor.maxpenalty=10000
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
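With the failover processor, events go to the healthy sink with the highest priority: here k2 (priority 10) is preferred, and k1 (priority 5) takes over only while k2 is failing. A simplified sketch of that selection rule in plain Java follows. The class is hypothetical, has no Flume dependency, and leaves out the penalty timing that maxpenalty controls:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of priority-based failover; this is NOT Flume's implementation.
public class FailoverPicker {
    // priority -> sink name, kept sorted by priority
    private final TreeMap<Integer, String> sinksByPriority = new TreeMap<>();
    private final Set<String> failedSinks = new HashSet<>();

    public void addSink(String name, int priority) {
        sinksByPriority.put(priority, name);
    }

    public void markFailed(String name) {
        failedSinks.add(name);
    }

    public void markRecovered(String name) {
        failedSinks.remove(name);
    }

    // Return the healthy sink with the highest priority, or null if all have failed.
    public String activeSink() {
        for (String name : sinksByPriority.descendingMap().values()) {
            if (!failedSinks.contains(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FailoverPicker picker = new FailoverPicker();
        picker.addSink("k1", 5);
        picker.addSink("k2", 10);
        System.out.println(picker.activeSink()); // k2
        picker.markFailed("k2");
        System.out.println(picker.activeSink()); // k1
        picker.markRecovered("k2");
        System.out.println(picker.activeSink()); // k2
    }
}
```

This matches what the test in this case should show: stopping the agent behind k2 moves the traffic to the agent behind k1.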
2.創建 flume-flume-console1.conf
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141
# Describe the sink
a2.sinks.k1.type = logger
# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
3. Create flume-flume-console2.conf for a3; it is nearly the same as a2's, with the agent renamed to a3 and the source listening on port 4142
4. Start a2 and a3 first, then a1
bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console
bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console
bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group2/flume-netcat-flume.conf
5. Send messages to port 44444:
$ nc localhost 44444
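Case 3: Custom Interceptor
1) Requirements
Use Flume to collect local server logs and send each kind of log to a different analysis system according to its type.
2) Analysis
In real-world development, one server may produce many types of logs, and different types may need to go to different analysis systems. This calls for the Multiplexing structure among Flume's topologies. Multiplexing sends each event to a Channel chosen by the value of a particular key in the event's Header, so we need a custom Interceptor that gives that key a different value for each type of event.
In this case, data sent through Kafka simulates the logs, with a single digit or a single letter standing in for each log type. We write a custom interceptor that tells digits from letters and sends them to different analysis systems (Channels).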
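The routing rule described above can be sketched in plain Java: read one header key, look its value up in a mapping, and return the channel it maps to. The class name is hypothetical, and returning null for an unmapped value is an assumption made for illustration rather than Flume's behaviour:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of header-based routing; this is NOT Flume's selector class.
public class HeaderRouter {
    private final String headerKey;
    // header value -> channel name
    private final Map<String, String> mapping = new HashMap<>();

    public HeaderRouter(String headerKey) {
        this.headerKey = headerKey;
    }

    // Register which channel receives events carrying this header value.
    public HeaderRouter map(String headerValue, String channel) {
        mapping.put(headerValue, channel);
        return this;
    }

    // Return the channel for these headers, or null if nothing matches.
    public String route(Map<String, String> headers) {
        String value = headers.get(headerKey);
        return value == null ? null : mapping.get(value);
    }

    public static void main(String[] args) {
        HeaderRouter router = new HeaderRouter("type")
                .map("letter", "c1")
                .map("number", "c2");
        Map<String, String> headers = new HashMap<>();
        headers.put("type", "number");
        System.out.println(router.route(headers)); // c2
    }
}
```

The interceptor's only job is therefore to set the header; the selector does the actual routing.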
3) Architecture diagram:
4) Steps:
1. Create a Maven project and add the dependency:
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>1.8.0</version>
</dependency>
2. Implement the Interceptor interface
package com.flume;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;
import java.util.List;
public class CustomInterceptor implements Interceptor {
    @Override
    public void initialize() {
    }
    // Tag a single event by its first byte: lowercase letter or digit
    @Override
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        // Leave events with an empty body untagged instead of failing on body[0]
        if (body == null || body.length == 0) {
            return event;
        }
        // Inclusive bounds, so 'a', 'z', '0' and '9' are classified too
        if (body[0] >= 'a' && body[0] <= 'z') {
            event.getHeaders().put("type", "letter");
        } else if (body[0] >= '0' && body[0] <= '9') {
            event.getHeaders().put("type", "number");
        }
        return event;
    }
    // Tag every event in a batch
    @Override
    public List<Event> intercept(List<Event> list) {
        for (Event event : list) {
            intercept(event);
        }
        return list;
    }
    @Override
    public void close() {
    }
    // Flume creates the interceptor through this Builder
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new CustomInterceptor();
        }
        @Override
        public void configure(Context context) {
        }
    }
}
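To try the classification rule without a running agent, the first-byte check can be pulled out into a plain static method. This is a sketch with a hypothetical class name and no Flume dependency. It uses inclusive bounds so that 'a', 'z', '0' and '9' are all classified, and it returns null for anything else, including an empty body:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone version of the first-byte check, for quick experiments.
public class FirstByteClassifier {
    // Return "letter", "number", or null when the first byte is neither.
    public static String classify(byte[] body) {
        if (body == null || body.length == 0) {
            return null;
        }
        byte first = body[0];
        if (first >= 'a' && first <= 'z') {
            return "letter";
        }
        if (first >= '0' && first <= '9') {
            return "number";
        }
        return null;
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"a", "z", "0", "9", "A", ""}) {
            byte[] body = sample.getBytes(StandardCharsets.UTF_8);
            System.out.println("\"" + sample + "\" -> " + classify(body));
        }
    }
}
```

Uppercase letters fall through to null here because the rule only checks the lowercase range; whether such events should be dropped or sent to a default channel is a design choice for the selector configuration.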
3. Package the project and put the jar into Flume's lib folder
4. Create kafka-flume.conf
a1.sources = s1
a1.sinks = k1 k2
a1.channels = c1 c2
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
# Kafka broker address
a1.sources.s1.kafka.bootstrap.servers=h1:9092
a1.sources.s1.kafka.topics=topic_test
# Monitored directory (TAILDIR-style settings; KafkaSource does not use filegroups)
a1.sources.s1.filegroups= f1
a1.sources.s1.filegroups.f1= /opt/flume/job/qiye/group1/inter/.*log
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=com.flume.CustomInterceptor$Builder
a1.sources.s1.selector.type=multiplexing
a1.sources.s1.selector.header=type
a1.sources.s1.selector.mapping.letter=c1
a1.sources.s1.selector.mapping.number=c2
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = h1
a1.sinks.k1.port = 10000
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = h1
a1.sinks.k2.port = 10001
a1.channels.c1.type = memory
a1.channels.c2.type = memory
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
a1.sources.s1.channels=c1 c2
5. Create logger1.conf
a2.sources = r1
a2.sinks = k1
a2.channels = c1
a2.sources.r1.type = avro
a2.sources.r1.bind = h1
a2.sources.r1.port = 10000
a2.sinks.k1.type = logger
a2.channels.c1.type = memory
a2.sinks.k1.channel = c1
a2.sources.r1.channels = c1
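6. logger2.conf is nearly the same as logger1.conf, with the agent named a3 and the source listening on port 10001
7. Start a3 and a2 first, then a1, and finally start the Kafka producer to send messages; the order matters: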
flume-ng agent --name a3 --conf conf --conf-file /opt/flume/job/qiye/group1/inter/logger2.conf -Dflume.root.logger=INFO,console
flume-ng agent --name a2 --conf conf --conf-file /opt/flume/job/qiye/group1/inter/logger1.conf -Dflume.root.logger=INFO,console
flume-ng agent --name a1 --conf conf --conf-file /opt/flume/job/qiye/group1/inter/kafka-flume.conf -Dflume.root.logger=INFO,console
# Kafka producer
bin/kafka-console-producer.sh --broker-list h1:9092 --topic topic_test