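6. Flume Enterprise Development Cases and Architecture Design

Case 1: Multiplexing

1) Requirements
Flume-1 monitors files for changes and passes the new content to Flume-2, which stores it in HDFS. At the same time, Flume-1 passes the same content to Flume-3, which writes it to the local file system.
2) Architecture diagram:

Steps:
1. Create the group1 folder and create the flume-file-flume.conf file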

a1.sources=s1
a1.channels=c1 c2
a1.sinks=k1 k2

a1.sources.s1.type=TAILDIR
a1.sources.s1.positionFile=/opt/flume/job/qiye/group1/posititionFile1.json
a1.sources.s1.filegroups=f1
a1.sources.s1.filegroups.f1=/opt/flume/job/qiye/group1/.*log

a1.channels.c1.type=memory
a1.channels.c2.type=memory

a1.sinks.k1.type=avro
a1.sinks.k1.hostname=h1
a1.sinks.k1.port=10000

a1.sinks.k2.type=avro
a1.sinks.k2.hostname=h1
a1.sinks.k2.port=10001

a1.sources.s1.channels=c1 c2
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2

2. Create the flume-flume-hdfs.conf file

a2.sources=s1
a2.sinks=k1
a2.channels=c1



a2.sources.s1.type=avro
a2.sources.s1.bind=h1
a2.sources.s1.port=10000

a2.sinks.k1.type=hdfs
a2.sinks.k1.hdfs.path=hdfs://h1:9000/flume/%Y%m%d/%H
a2.sinks.k1.hdfs.filePrefix=flume-
a2.sinks.k1.hdfs.useLocalTimeStamp = true
a2.sinks.k1.hdfs.batchSize = 100
a2.sinks.k1.hdfs.rollInterval = 1000

a2.channels.c1.type=memory

a2.sources.s1.channels=c1
a2.sinks.k1.channel=c1

3. Create the flume-flume-dir.conf file

a3.sources = r1
a3.sinks = k1 
a3.channels = c2

# Describe/configure the source 
a3.sources.r1.type = avro 
a3.sources.r1.bind = h1
a3.sources.r1.port = 10001

a3.sinks.k1.type = file_roll 
a3.sinks.k1.sink.directory = /opt/flume/job/qiye/group1

# Describe the channel 
a3.channels.c2.type = memory 

# Bind the source and sink to the channel 
a3.sources.r1.channels = c2 
a3.sinks.k1.channel = c2

4. Start a2 and a3 first, then start a1

cd /opt/flume/bin


flume-ng agent   --name a2  --conf conf --conf-file /opt/flume/job/qiye/group1/flume-flume-hdfs.conf
flume-ng agent   --name a3  --conf conf --conf-file /opt/flume/job/qiye/group1/flume-flume-dir.conf
flume-ng agent   --name a1  --conf conf --conf-file /opt/flume/job/qiye/group1/flume-file-flume.conf

Create a tes.log file in the /opt/flume/job/qiye/group1 folder, then check HDFS
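The a1 configuration above does not set a channel selector, so Flume falls back to its default replicating selector: every event from the source is copied into both c1 and c2. The following is a minimal sketch of that fan-out in plain Java. It is not Flume's own code; the class ReplicatingSketch and the method replicate are hypothetical names used only for illustration, and the queues merely stand in for the two memory channels.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for the replicating fan-out; not part of Flume.
class ReplicatingSketch {

    /** Copies one event into every channel, as a replicating selector would. */
    static void replicate(String event, List<Queue<String>> channels) {
        for (Queue<String> channel : channels) {
            channel.add(event);
        }
    }

    public static void main(String[] args) {
        Queue<String> c1 = new ArrayDeque<>(); // stands in for a1.channels.c1 (read by the HDFS path)
        Queue<String> c2 = new ArrayDeque<>(); // stands in for a1.channels.c2 (read by the local-file path)
        replicate("new log line", List.of(c1, c2));
        // Both channels now hold their own copy of the event.
        System.out.println(c1.peek() + " | " + c2.peek());
    }
}
```

This is why the same line appended to tes.log should show up both in HDFS and in the file_roll output directory.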

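Case 2: Load balancing and failover

1) Requirements
Flume1 monitors a port. The sinks in its sink group connect to Flume2 and Flume3 respectively, and a FailoverSinkProcessor provides failover.
2) Architecture diagram:

Steps:
1. Create flume-netcat-flume.conf
Configure 1 netcat source, 1 channel, and 1 sink group (2 sinks), which deliver to flume-flume-console1 and flume-flume-console2 respectively.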

a1.sources = r1
a1.channels = c1 
a1.sinkgroups = g1 
a1.sinks = k1 k2

a1.sources.r1.type = netcat 
a1.sources.r1.bind = localhost 
a1.sources.r1.port = 44444

a1.sinkgroups.g1.processor.type=failover
a1.sinkgroups.g1.processor.priority.k1=5
a1.sinkgroups.g1.processor.priority.k2=10
a1.sinkgroups.g1.processor.maxpenalty=10000

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel 
a1.sources.r1.channels = c1 
a1.sinkgroups.g1.sinks = k1 k2 
a1.sinks.k1.channel = c1 
a1.sinks.k2.channel = c1

2. Create flume-flume-console1.conf


# Name the components on this agent
a2.sources = r1
a2.sinks = k1 
a2.channels = c1

# Describe/configure the source 
a2.sources.r1.type = avro 
a2.sources.r1.bind = hadoop102 
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = logger

# Describe the channel 
a2.channels.c1.type = memory 
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel 
a2.sources.r1.channels = c1 
a2.sinks.k1.channel = c1

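3. Create flume-flume-console2.conf
It is nearly identical to the a2 file. Going by the sink settings in flume-netcat-flume.conf and the start commands, it uses agent name a3 and avro source port 4142.

4. Start a2 and a3 first, then start a1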



bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console

bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console

bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group2/flume-netcat-flume.conf

5. Send messages to port 44444:

$ nc localhost 44444
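With the priorities configured in flume-netcat-flume.conf (k1 = 5, k2 = 10), the failover processor should prefer k2 while it is healthy and fall back to k1 when k2 fails. The following is a minimal sketch of that selection rule in plain Java. It is not Flume's FailoverSinkProcessor: the class FailoverSketch and the method pickSink are hypothetical, and the back-off penalty that maxpenalty caps is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of priority-based failover; not Flume's implementation.
class FailoverSketch {

    /** Returns the healthy sink with the highest priority, or null if none is healthy. */
    static String pickSink(Map<String, Integer> priorities, Set<String> failed) {
        String best = null;
        int bestPriority = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> entry : priorities.entrySet()) {
            if (!failed.contains(entry.getKey()) && entry.getValue() > bestPriority) {
                best = entry.getKey();
                bestPriority = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> priorities = new LinkedHashMap<>();
        priorities.put("k1", 5);   // a1.sinkgroups.g1.processor.priority.k1
        priorities.put("k2", 10);  // a1.sinkgroups.g1.processor.priority.k2

        System.out.println(pickSink(priorities, Set.of()));      // prints k2
        System.out.println(pickSink(priorities, Set.of("k2")));  // prints k1
    }
}
```

In the running setup this corresponds to messages typed into nc appearing on the a3 console first, and on the a2 console after a3 is stopped.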

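Case 3: Custom Interceptor

1) Requirements
Use Flume to collect local server logs and, depending on the log type, send different kinds of logs to different analysis systems.
2) Requirements analysis
In real development, one server may produce many types of logs, and different types may need to go to different analysis systems. This calls for the Multiplexing structure among Flume's topologies. Multiplexing works by looking at the value of a given key in the event Header and sending events to different Channels according to that value, so we need a custom Interceptor that assigns different values to that key for different types of events.

In this case, data sent through Kafka simulates the logs, and single digits and single letters simulate different log types. We need a custom interceptor that distinguishes digits from letters and sends them to different analysis systems (Channels).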
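The routing idea can be sketched in a few lines of plain Java: read one header, look its value up in a mapping, and get a channel name back. This is an illustration only, not Flume's multiplexing selector; the class MultiplexingSketch and the method selectChannel are hypothetical names, while the header name "type" and the mapping letter to c1, number to c2 match the configuration used later in this case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of header-based routing; not Flume's implementation.
class MultiplexingSketch {

    /** Returns the channel mapped to the value of the given header, or null if nothing matches. */
    static String selectChannel(Map<String, String> headers,
                                String headerName,
                                Map<String, String> mapping) {
        String value = headers.get(headerName);
        return value == null ? null : mapping.get(value);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("letter", "c1"); // selector.mapping.letter = c1
        mapping.put("number", "c2"); // selector.mapping.number = c2

        Map<String, String> headers = new HashMap<>();
        headers.put("type", "letter"); // value the interceptor would have set

        System.out.println(selectChannel(headers, "type", mapping)); // prints c1
    }
}
```

The interceptor's only job is therefore to fill in the header; the selector does the actual routing.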
3) Architecture diagram:

4) Steps:

1. Create a Maven project and add the dependency:

<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>1.8.0</version>
</dependency>

2. Implement the Interceptor interface

package com.flume;


import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;

public class CustomInterceptor implements Interceptor {

    @Override
    public void initialize() {

    }

    @Override
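    public Event intercept(Event event) {
        byte[] body = event.getBody();
        // Skip empty events: there is no first byte to classify.
        if (body == null || body.length == 0) {
            return event;
        }
        // Tag the event by its first byte so the multiplexing selector can route it.
        // The ranges are inclusive so that 'a', 'z', '0' and '9' are classified too.
        if (body[0] >= 'a' && body[0] <= 'z') {
            event.getHeaders().put("type", "letter");
        } else if (body[0] >= '0' && body[0] <= '9') {
            event.getHeaders().put("type", "number");
        }
        return event;
    }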

    @Override
    public List<Event> intercept(List<Event> list) {
        for(Event event:list){
            intercept(event);
        }
        return list;
    }

    @Override
    public void close() {

    }
    public static class Builder implements Interceptor.Builder{

        @Override
        public Interceptor build() {
            return new CustomInterceptor();
        }

        @Override
        public void configure(Context context) {

        }
    }
}
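The rule the interceptor is meant to apply (first byte is a lowercase letter or a digit, boundaries included) can be checked without a running agent. The sketch below repeats that rule in a standalone class so it runs without the Flume jar; ClassifySketch and classify are hypothetical names, and the real interceptor writes the result into the "type" header instead of returning it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical standalone copy of the classification rule; not part of the Flume project above.
class ClassifySketch {

    /** Returns "letter", "number", or null when the first byte is neither (or the body is empty). */
    static String classify(byte[] body) {
        if (body == null || body.length == 0) {
            return null;
        }
        byte first = body[0];
        if (first >= 'a' && first <= 'z') {
            return "letter";
        }
        if (first >= '0' && first <= '9') {
            return "number";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(classify("a".getBytes(StandardCharsets.UTF_8))); // prints letter
        System.out.println(classify("9".getBytes(StandardCharsets.UTF_8))); // prints number
        System.out.println(classify("?".getBytes(StandardCharsets.UTF_8))); // prints null
    }
}
```

Events that get no "type" header match neither mapping in the selector configuration, so with no default channel configured they would not reach either logger agent.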

3. Package the project and put the jar into Flume's lib folder
4. Create kafka-flume.conf

a1.sources = s1
a1.sinks = k1 k2 
a1.channels = c1 c2


a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
# Kafka broker address
a1.sources.s1.kafka.bootstrap.servers=h1:9092
# Topic to consume
a1.sources.s1.kafka.topics=topic_test


a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=com.flume.CustomInterceptor$Builder

a1.sources.s1.selector.type=multiplexing
a1.sources.s1.selector.header=type
a1.sources.s1.selector.mapping.letter=c1
a1.sources.s1.selector.mapping.number=c2

a1.sinks.k1.type = avro 
a1.sinks.k1.hostname = h1
a1.sinks.k1.port = 10000
 
a1.sinks.k2.type = avro 
a1.sinks.k2.hostname = h1
a1.sinks.k2.port = 10001

a1.channels.c1.type = memory 
a1.channels.c2.type = memory 


a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
a1.sources.s1.channels=c1 c2

5. Create logger1.conf

a2.sources = r1 
a2.sinks = k1 
a2.channels = c1

a2.sources.r1.type = avro 
a2.sources.r1.bind = h1
a2.sources.r1.port = 10000

a2.sinks.k1.type = logger

a2.channels.c1.type = memory 

a2.sinks.k1.channel = c1 
a2.sources.r1.channels = c1

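6. logger2.conf is nearly the same as logger1.conf. Going by the sink settings in kafka-flume.conf and the start commands, it uses agent name a3 and avro source port 10001.

7. Start a3 and a2 first, then a1, and finally start the Kafka producer to send messages. The order matters: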

flume-ng agent   --name a3  --conf conf --conf-file /opt/flume/job/qiye/group1/inter/logger2.conf  -Dflume.root.logger=INFO,console
flume-ng agent   --name a2  --conf conf --conf-file /opt/flume/job/qiye/group1/inter/logger1.conf  -Dflume.root.logger=INFO,console
flume-ng agent   --name a1  --conf conf --conf-file /opt/flume/job/qiye/group1/inter/kafka-flume.conf   -Dflume.root.logger=INFO,console


# Kafka console producer
bin/kafka-console-producer.sh --broker-list h1:9092 --topic topic_test
