6. Flume Enterprise Development Cases and Architecture Design

Case 1: Multiplexing

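1) Requirements
Flume-1 monitors a file for changes and passes the new content to Flume-2, which stores it in HDFS. At the same time, Flume-1 passes the same content to Flume-3, which writes it to the local file system.
2) Architecture diagram: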

Steps:
1. Create the group1 folder and, inside it, the flume-file-flume.conf file

a1.sources=s1
a1.channels=c1 c2
a1.sinks=k1 k2

a1.sources.s1.type=TAILDIR
a1.sources.s1.positionFile=/opt/flume/job/qiye/group1/posititionFile1.json
a1.sources.s1.filegroups=f1
a1.sources.s1.filegroups.f1=/opt/flume/job/qiye/group1/.*log

a1.channels.c1.type=memory
a1.channels.c2.type=memory

a1.sinks.k1.type=avro
a1.sinks.k1.hostname=h1
a1.sinks.k1.port=10000

a1.sinks.k2.type=avro
a1.sinks.k2.hostname=h1
a1.sinks.k2.port=10001

a1.sources.s1.channels=c1 c2
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
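
Source s1 is bound to both c1 and c2 and no selector is configured, so Flume uses its default replicating channel selector, which hands every event to every channel. The plain-Java sketch below only models that fan-out to make the data flow concrete. The class name ReplicatingSketch and its replicate method are hypothetical and are not part of the Flume API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of replicating fan-out; not part of the Flume API.
class ReplicatingSketch {

    // Copies the full list of events into every named channel, preserving order.
    static Map<String, List<String>> replicate(List<String> events, List<String> channels) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String channel : channels) {
            result.put(channel, new ArrayList<>(events));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> out =
                replicate(Arrays.asList("line 1", "line 2"), Arrays.asList("c1", "c2"));
        // c1 feeds the Avro sink on port 10000 and c2 the one on port 10001,
        // so both downstream agents receive the same two lines.
        System.out.println(out);
    }
}
```

Each channel gets its own copy of the list, which mirrors the fact that the HDFS agent and the local-file agent consume independently of each other.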

2. Create the flume-flume-hdfs.conf file

a2.sources=s1
a2.sinks=k1
a2.channels=c1



a2.sources.s1.type=avro
a2.sources.s1.bind=h1
a2.sources.s1.port=10000

a2.sinks.k1.type=hdfs
a2.sinks.k1.hdfs.path=hdfs://h1:9000/flume/%Y%m%d/%H
a2.sinks.k1.hdfs.filePrefix=flume-
a2.sinks.k1.hdfs.useLocalTimeStamp = true
a2.sinks.k1.hdfs.batchSize = 100
a2.sinks.k1.hdfs.rollInterval = 1000

a2.channels.c1.type=memory

a2.sources.s1.channels=c1
a2.sinks.k1.channel=c1
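
The hdfs.path value contains the escape sequences %Y%m%d/%H, which the sink expands from a timestamp into a per-day, per-hour directory. With hdfs.useLocalTimeStamp = true the local time is used rather than a timestamp header on the event. The sketch below approximates that expansion with java.time to show which directory a given moment maps to. HdfsPathSketch and resolve are hypothetical names and the code is not taken from Flume.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that mimics how %Y%m%d/%H expands; not Flume code.
class HdfsPathSketch {

    // yyyyMMdd corresponds to %Y%m%d and HH (00-23) corresponds to %H.
    private static final DateTimeFormatter DAY_AND_HOUR =
            DateTimeFormatter.ofPattern("yyyyMMdd/HH");

    // Builds the directory that an event written at the given local time would land in.
    static String resolve(String base, LocalDateTime time) {
        return base + "/" + DAY_AND_HOUR.format(time);
    }

    public static void main(String[] args) {
        LocalDateTime sample = LocalDateTime.of(2024, 3, 5, 9, 30);
        // prints hdfs://h1:9000/flume/20240305/09
        System.out.println(resolve("hdfs://h1:9000/flume", sample));
    }
}
```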

3. Create the flume-flume-dir.conf file

a3.sources = r1
a3.sinks = k1 
a3.channels = c2

# Describe/configure the source 
a3.sources.r1.type = avro 
a3.sources.r1.bind = h1
a3.sources.r1.port = 10001

a3.sinks.k1.type = file_roll 
a3.sinks.k1.sink.directory = /opt/flume/job/qiye/group1

# Describe the channel 
a3.channels.c2.type = memory 

# Bind the source and sink to the channel 
a3.sources.r1.channels = c2 
a3.sinks.k1.channel = c2

4. Start a2 and a3 first, then start a1

cd /opt/flume/bin


flume-ng agent   --name a2  --conf conf --conf-file /opt/flume/job/qiye/group1/flume-flume-hdfs.conf
flume-ng agent   --name a3  --conf conf --conf-file /opt/flume/job/qiye/group1/flume-flume-dir.conf
flume-ng agent   --name a1  --conf conf --conf-file /opt/flume/job/qiye/group1/flume-file-flume.conf

5. Create a tes.log file in the /opt/flume/job/qiye/group1 folder, then check HDFS

Case 2: Load Balancing and Failover

1) Requirements
Flume1 monitors a port. The sinks in its sink group connect to Flume2 and Flume3 respectively, and FailoverSinkProcessor provides failover.
2) Architecture diagram:

Steps:
1. Create flume-netcat-flume.conf
Configure one netcat source, one channel, and one sink group (two sinks) that deliver to flume-flume-console1 and flume-flume-console2 respectively.

a1.sources = r1
a1.channels = c1 
a1.sinkgroups = g1 
a1.sinks = k1 k2

a1.sources.r1.type = netcat 
a1.sources.r1.bind = localhost 
a1.sources.r1.port = 44444

a1.sinkgroups.g1.processor.type=failover
a1.sinkgroups.g1.processor.priority.k1=5
a1.sinkgroups.g1.processor.priority.k2=10
a1.sinkgroups.g1.processor.maxpenalty=10000

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel 
a1.sources.r1.channels = c1 
a1.sinkgroups.g1.sinks = k1 k2 
a1.sinks.k1.channel = c1 
a1.sinks.k2.channel = c1
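
With the failover processor, events go to the available sink with the highest priority, so k2 (priority 10) is used before k1 (priority 5), and k1 takes over only while k2 is failing. The sketch below models just that selection step in plain Java. It deliberately leaves out the back-off timing that processor.maxpenalty controls, and FailoverSketch and pickSink are hypothetical names rather than Flume classes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of failover selection; not Flume's FailoverSinkProcessor.
class FailoverSketch {

    // Returns the highest-priority sink that is not marked failed, or null if none is left.
    static String pickSink(Map<String, Integer> priorities, Set<String> failed) {
        String best = null;
        int bestPriority = 0;
        for (Map.Entry<String, Integer> entry : priorities.entrySet()) {
            if (failed.contains(entry.getKey())) {
                continue; // a failed sink is skipped until it recovers
            }
            if (best == null || entry.getValue() > bestPriority) {
                best = entry.getKey();
                bestPriority = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> priorities = new LinkedHashMap<>();
        priorities.put("k1", 5);   // mirrors processor.priority.k1
        priorities.put("k2", 10);  // mirrors processor.priority.k2

        System.out.println(pickSink(priorities, Collections.<String>emptySet())); // k2
        System.out.println(pickSink(priorities, Collections.singleton("k2")));    // k1
    }
}
```

In the running example this means the console of the agent behind k2 shows the messages first, and stopping that agent should move the output to the agent behind k1.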

2. Create flume-flume-console1.conf


# Name the components on this agent
a2.sources = r1
a2.sinks = k1 
a2.channels = c1

# Describe/configure the source 
a2.sources.r1.type = avro 
a2.sources.r1.bind = hadoop102 
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = logger

# Describe the channel 
a2.channels.c1.type = memory 
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel 
a2.sources.r1.channels = c1 
a2.sinks.k1.channel = c1

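3. Create flume-flume-console2.conf for a3. It is almost the same as a2's configuration, using the agent name a3 and port 4142.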

4) Start a2 and a3 first, then start a1



bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console

bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console

bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group2/flume-netcat-flume.conf

5) Send messages to port 44444:

$ nc localhost 44444

Case 3: Custom Interceptor

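1) Requirements
Flume collects local logs on a server and must send different kinds of logs to different analysis systems, according to log type.
2) Requirement analysis
In real development, a single server may produce many types of logs, and different types may need to go to different analysis systems. This calls for the Multiplexing structure among Flume's topologies. Multiplexing works by sending each event to a Channel chosen from the value of a particular key in the event's Header, so events of different types must carry different values for that key.

In this case, Kafka messages simulate the logs, with a single digit or a single letter standing in for different log types. A custom interceptor distinguishes digits from letters so that they can be sent to different analysis systems (Channels).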
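
The routing rule described above can be sketched in plain Java as a lookup from a header value to a channel name. This is only an illustration of the idea. MultiplexingSketch, map, and route are hypothetical names, and the header key "type" with the values "letter" and "number" is chosen to match the configuration used later in this case. Returning null for an unmatched event is a simplification of this sketch; in Flume the outcome for such events depends on whether selector.default is set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of header-based routing; not Flume's channel selector.
class MultiplexingSketch {

    private final String headerKey;
    private final Map<String, String> mapping = new HashMap<>();

    MultiplexingSketch(String headerKey) {
        this.headerKey = headerKey;
    }

    // Registers which channel receives events whose header has the given value.
    MultiplexingSketch map(String headerValue, String channel) {
        mapping.put(headerValue, channel);
        return this;
    }

    // Looks up the channel for an event's headers; null when nothing matches.
    String route(Map<String, String> headers) {
        return mapping.get(headers.get(headerKey));
    }

    public static void main(String[] args) {
        MultiplexingSketch selector = new MultiplexingSketch("type")
                .map("letter", "c1")
                .map("number", "c2");
        System.out.println(selector.route(Collections.singletonMap("type", "letter"))); // c1
        System.out.println(selector.route(Collections.singletonMap("type", "number"))); // c2
    }
}
```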
3) Architecture diagram:

4) Steps:

1. Create a Maven project and add the dependency:

<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>1.8.0</version>
</dependency>

2. Implement the Interceptor interface

package com.flume;


import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;

public class CustomInterceptor implements Interceptor {

    @Override
    public void initialize() {

    }

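    @Override
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        // Skip empty events; there is no first byte to inspect.
        if (body == null || body.length == 0) {
            return event;
        }
        // Tag the event by its first byte (inclusive bounds, lowercase letters only).
        if (body[0] >= 'a' && body[0] <= 'z') {
            event.getHeaders().put("type", "letter");
        } else if (body[0] >= '0' && body[0] <= '9') {
            event.getHeaders().put("type", "number");
        }
        return event;
    }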

    @Override
    public List<Event> intercept(List<Event> list) {
        for(Event event:list){
            intercept(event);
        }
        return list;
    }

    @Override
    public void close() {

    }
    public static class Builder implements Interceptor.Builder{

        @Override
        public Interceptor build() {
            return new CustomInterceptor();
        }

        @Override
        public void configure(Context context) {

        }
    }
}
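
It can help to check the classification rule on its own before packaging the jar and starting any agent. The sketch below pulls the first-byte test out into a plain static method with no Flume dependency. It uses inclusive bounds so that the boundary characters a, z, 0, and 9 are classified too, and it treats an empty body as unclassified. BodyClassifierSketch and classify are hypothetical names introduced only for this check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone version of the interceptor's rule; no Flume classes needed.
class BodyClassifierSketch {

    // Returns "letter" or "number" based on the first byte, or null if neither applies.
    static String classify(byte[] body) {
        if (body == null || body.length == 0) {
            return null; // nothing to inspect
        }
        byte first = body[0];
        if (first >= 'a' && first <= 'z') {
            return "letter";
        }
        if (first >= '0' && first <= '9') {
            return "number";
        }
        return null; // uppercase letters, punctuation, and so on stay untagged
    }

    public static void main(String[] args) {
        System.out.println(classify("a".getBytes(StandardCharsets.UTF_8))); // letter
        System.out.println(classify("9".getBytes(StandardCharsets.UTF_8))); // number
        System.out.println(classify("#".getBytes(StandardCharsets.UTF_8))); // null
    }
}
```

Events that come back as null here would carry no type header, so whether they reach any channel depends on the selector configuration.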

3. Package the project and place the jar in Flume's lib folder
4. Create kafka-flume.conf

a1.sources = s1
a1.sinks = k1 k2 
a1.channels = c1 c2


a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
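# Kafka broker address
a1.sources.s1.kafka.bootstrap.servers=h1:9092
a1.sources.s1.kafka.topics=topic_test
# The filegroups settings belong to the TAILDIR source, not KafkaSource, so they are disabled here
# a1.sources.s1.filegroups= f1
# a1.sources.s1.filegroups.f1= /opt/flume/job/qiye/group1/inter/.*log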


a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=com.flume.CustomInterceptor$Builder

a1.sources.s1.selector.type=multiplexing
a1.sources.s1.selector.header=type
a1.sources.s1.selector.mapping.letter=c1
a1.sources.s1.selector.mapping.number=c2

a1.sinks.k1.type = avro 
a1.sinks.k1.hostname = h1
a1.sinks.k1.port = 10000
 
a1.sinks.k2.type = avro 
a1.sinks.k2.hostname = h1
a1.sinks.k2.port = 10001

a1.channels.c1.type = memory 
a1.channels.c2.type = memory 


a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c2
a1.sources.s1.channels=c1 c2

5. Create logger1.conf

a2.sources = r1 
a2.sinks = k1 
a2.channels = c1

a2.sources.r1.type = avro 
a2.sources.r1.bind = h1
a2.sources.r1.port = 10000

a2.sinks.k1.type = logger

a2.channels.c1.type = memory 

a2.sinks.k1.channel = c1 
a2.sources.r1.channels = c1

6. Create logger2.conf. It is almost the same as logger1.conf, using the agent name a3 and port 10001.

7. Start a2 and a3 first, then a1, and finally start the Kafka producer to send messages. Mind the order:

flume-ng agent   --name a3  --conf conf --conf-file /opt/flume/job/qiye/group1/inter/logger2.conf  -Dflume.root.logger=INFO,console
flume-ng agent   --name a2  --conf conf --conf-file /opt/flume/job/qiye/group1/inter/logger1.conf  -Dflume.root.logger=INFO,console
flume-ng agent   --name a1  --conf conf --conf-file /opt/flume/job/qiye/group1/inter/kafka-flume.conf   -Dflume.root.logger=INFO,console


# Kafka console producer
bin/kafka-console-producer.sh --broker-list h1:9092 --topic topic_test
