Flume Custom Source, Sink, and Interceptor

1 Custom Source

Let's define a Source that repeatedly sends data to a Channel, then use a Logger Sink to print that data to the console.

1.1 Creating MySource

To write a custom source, extend AbstractSource and implement Configurable and PollableSource:

import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;

public class MySource
        extends AbstractSource implements Configurable, PollableSource {

    private String prefix;
    private String suffix;

    @Override
    public Status process() throws EventDeliveryException {
        Status status;
        try {
            for (int i = 0; i < 100; i++) {
                SimpleEvent event = new SimpleEvent();
                event.setBody((prefix + i + suffix).getBytes());
                // Send the event to the channel
                getChannelProcessor().processEvent(event);
            }
            status = Status.READY;
        } catch (Throwable t) {
            status = Status.BACKOFF;
        }

        // After sending 100 events, pause 5 seconds before the next batch
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return status;
    }

    // Required by the PollableSource interface since Flume 1.7;
    // remove these two overrides when building against older versions
    @Override
    public long getBackOffSleepIncrement() {
        return 1000;
    }

    @Override
    public long getMaxBackOffSleepInterval() {
        return 5000;
    }

    /**
     * Reads settings from the agent configuration file.
     */
    @Override
    public void configure(Context context) {
        // Optional property; defaults to "hadoop-"
        this.prefix = context.getString("prefix", "hadoop-");
        // Required property (null if absent)
        this.suffix = context.getString("suffix");
    }
}
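The body produced on each loop iteration is just prefix + counter + suffix. A minimal standalone sketch of that formatting (no Flume dependencies; the "hadoop-" and "-ruoze" values mirror the default prefix above and the suffix configured later):

```java
import java.nio.charset.StandardCharsets;

public class EventBodyDemo {
    // Mirrors MySource's event body: prefix + counter + suffix
    static byte[] buildBody(String prefix, int counter, String suffix) {
        return (prefix + counter + suffix).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(
                new String(buildBody("hadoop-", i, "-ruoze"), StandardCharsets.UTF_8));
        }
    }
}
```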

1.2 Package and upload

Package the class into a jar with Maven:

mvn clean package

and upload the jar to ${FLUME_HOME}/lib.
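Building assumes the project declares the Flume core dependency. A minimal fragment for the pom.xml (the version shown is an assumption; match the Flume version on your cluster, and provided scope keeps the jar slim since Flume already ships these classes):

```xml
<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.9.0</version>
    <scope>provided</scope>
</dependency>
```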

1.3 Configuration file

Next, create the custom-source.conf file:

# Define agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Define the source
# Fully-qualified class name of the custom source
a1.sources.r1.type = com.bigdata.flume.MySource
# The "suffix" property read in configure()
a1.sources.r1.suffix = -ruoze

# Define the channel
a1.channels.c1.type = memory

# Define the sink
a1.sinks.k1.type = logger

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

1.4 Start the agent

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/script/flume/custom-source/custom-source.conf \
-Dflume.root.logger=INFO,console

--name: the agent name
--conf: Flume's conf directory
--conf-file: the configuration file from step 1.3
-Dflume.root.logger: log to the console

1.5 Results

With the configuration above, the Logger Sink prints events such as hadoop-0-ruoze through hadoop-99-ruoze to the console every 5 seconds.

2 Custom Sink

2.1 Creating MySink

We now define a Sink that takes data from the channel, adds a prefix and a suffix, and logs the result.
MySink extends AbstractSink and implements Configurable:

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.apache.log4j.Logger;

public class MySink
        extends AbstractSink implements Configurable {

    private static final Logger logger = Logger.getLogger(MySink.class);

    private String prefix;
    private String suffix;

    @Override
    public Status process() throws EventDeliveryException {
        Status status;

        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        try {
            txn.begin();
            // Take one event from the channel (null if the channel is empty)
            Event event = channel.take();
            // If an event was taken, log it with the prefix and suffix
            if (event != null) {
                String body = new String(event.getBody());
                logger.info(prefix + body + suffix);
                status = Status.READY;
            } else {
                status = Status.BACKOFF;
            }

            txn.commit();
        } catch (Throwable e) {
            txn.rollback();
            status = Status.BACKOFF;

            if (e instanceof Error) {
                throw (Error) e;
            }
        } finally {
            txn.close();
        }

        return status;
    }

    /**
     * Reads settings from the agent configuration file.
     */
    @Override
    public void configure(Context context) {
        this.prefix = context.getString("prefix", "RUOZE-");
        this.suffix = context.getString("suffix");
    }
}
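The control flow of process() boils down to: when take() returns an event, format and commit with READY; when the channel is empty, report BACKOFF. A dependency-free sketch of that decision (the Status enum here is a stand-in for illustration, not Flume's own type):

```java
public class SinkDecisionDemo {
    enum Status { READY, BACKOFF }

    // Mirrors MySink.process(): a null take means BACKOFF,
    // otherwise the body is wrapped in prefix/suffix and we are READY
    static Status handle(String body, String prefix, String suffix, StringBuilder out) {
        if (body == null) {
            return Status.BACKOFF; // empty channel: tell the sink runner to back off
        }
        out.append(prefix).append(body).append(suffix);
        return Status.READY;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        Status s = handle("hello", "RUOZE-", "-hadoop", out);
        System.out.println(s + " " + out);
    }
}
```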

2.2 Package and upload

Package the class into a jar with Maven:

mvn clean package

and upload the jar to ${FLUME_HOME}/lib.

2.3 Configuration file

Create the custom-sink.conf file:

# Define agent a1
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Define the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop001
a1.sources.r1.port = 44444

# Define the channel
a1.channels.c1.type = memory

# Define the sink
# Fully-qualified class name of the custom sink
a1.sinks.k1.type = com.bigdata.flume.MySink
# The "suffix" property read in configure()
a1.sinks.k1.suffix = hadoop

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2.4 Start the agent

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/script/flume/custom-source/custom-sink.conf \
-Dflume.root.logger=INFO,console

--name: the agent name
--conf: Flume's conf directory
--conf-file: the configuration file from step 2.3
-Dflume.root.logger: log to the console

3 Custom Interceptor

Requirement: events whose body contains gifshow should go to one Channel, and all other events to another Channel; each Channel then forwards its events to a separate agent over Avro.

3.1 Creating MyInterceptor

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class MyInterceptor implements Interceptor {

    private List<Event> events;

    @Override
    public void initialize() {
        events = new ArrayList<>();
    }

    @Override
    public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        String body = new String(event.getBody());
        // Tagging the header lets a multiplexing channel selector in the
        // agent configuration route each event by its header value
        if (body.contains("gifshow")) {
            headers.put("type", "gifshow");
        } else {
            headers.put("type", "other");
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        // Clear the buffer first; otherwise events from previous batches
        // accumulate and get delivered again
        this.events.clear();
        for (Event event : events) {
            this.events.add(intercept(event));
        }
        return this.events;
    }

    @Override
    public void close() {

    }

    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            return new MyInterceptor();
        }

        @Override
        public void configure(Context context) {

        }
    }
}
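The core of intercept(Event) is a single body-content check that writes a header. A standalone sketch of that tagging logic, using a plain Map instead of a Flume Event:

```java
import java.util.HashMap;
import java.util.Map;

public class TagDemo {
    // Mirrors MyInterceptor.intercept(Event): tag the header map by body content
    static Map<String, String> tag(String body, Map<String, String> headers) {
        headers.put("type", body.contains("gifshow") ? "gifshow" : "other");
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(tag("a gifshow line", new HashMap<>()));
        System.out.println(tag("something else", new HashMap<>()));
    }
}
```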

3.2 Package and upload

Package the class into a jar with Maven:

mvn clean package

and upload the jar to ${FLUME_HOME}/lib.

3.3 Configuration files

agent1 configuration (agent1.conf):

# Define agent a1
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Define the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop001
a1.sources.r1.port = 44444

# Define the channels
a1.channels.c1.type = memory
a1.channels.c2.type = memory

# Define the interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.bigdata.flume.MyInterceptor$Builder

# Define the selector
a1.sources.r1.selector.type = multiplexing
# Header key to route on
a1.sources.r1.selector.header = type
# Events with type=gifshow go to c1
a1.sources.r1.selector.mapping.gifshow = c1
# Events with type=other go to c2
a1.sources.r1.selector.mapping.other = c2

# Define the sinks
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop001
a1.sinks.k1.port = 44445

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop001
a1.sinks.k2.port = 44446

# Bind the source and sinks to the channels
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
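The multiplexing selector in agent1 routes each event by its type header. A standalone sketch of that mapping logic (the channel names and the c2 fallback are assumptions mirroring the configuration above, not Flume internals):

```java
import java.util.HashMap;
import java.util.Map;

public class SelectorDemo {
    // Mirrors a1.sources.r1.selector.mapping.*: header value -> channel name
    static String route(String headerValue) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("gifshow", "c1");
        mapping.put("other", "c2");
        // Unmatched header values fall back to c2 (an assumption for this sketch)
        return mapping.getOrDefault(headerValue, "c2");
    }

    public static void main(String[] args) {
        System.out.println(route("gifshow")); // c1
        System.out.println(route("other"));   // c2
    }
}
```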

agent2 configuration (agent2.conf):

a2.sources = r1
a2.channels = c1
a2.sinks = k1

a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop001
a2.sources.r1.port = 44446

a2.channels.c1.type = memory

a2.sinks.k1.type = logger

a2.sinks.k1.channel = c1
a2.sources.r1.channels = c1

agent3 configuration (agent3.conf):

a3.sources = r1
a3.channels = c1
a3.sinks = k1

a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop001
a3.sources.r1.port = 44445

a3.channels.c1.type = memory

a3.sinks.k1.type = logger

a3.sinks.k1.channel = c1
a3.sources.r1.channels = c1

3.4 Start the agents

Start agent2 first, so its Avro source is listening before agent1's Avro sinks try to connect:

flume-ng agent \
--name a2 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/script/flume/custom-interceptor/agent2.conf \
-Dflume.root.logger=INFO,console

Start agent3:

flume-ng agent \
--name a3 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/script/flume/custom-interceptor/agent3.conf \
-Dflume.root.logger=INFO,console

Then start agent1:

flume-ng agent \
--name a1 \
--conf ${FLUME_HOME}/conf \
--conf-file /home/hadoop/script/flume/custom-interceptor/agent1.conf \
-Dflume.root.logger=INFO,console