Flume: Using the Load Balancing Sink Processor to Load-Balance Sinks


Preface

  • The Load balancing Sink Processor, as its name suggests, load-balances events across the sinks of a sink group. The default selection mechanism is round-robin (round_robin); a random mechanism (random) is also available, or you can implement your own selector by extending the AbstractSinkSelector class and configuring its fully qualified class name (FQCN) — a minimal sketch of such a custom selector follows below.
  • The processor also supports exponential backoff (backoff): when a sink fails, it is put on a blacklist and not retried for a while; the backoff time grows exponentially up to a configurable maximum, which defaults to 30000 ms.
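
For illustration, here is a minimal sketch of such a custom selector, assuming the Flume 1.x AbstractSinkSelector API (a protected sinkList field populated by setSinks() and an abstract createSinkIterator() method that the processor calls before each delivery attempt); the package and class names are hypothetical.

// Hypothetical custom selector: on every delivery attempt it tries the sinks
// of the group in a random order. It would be configured via its FQCN:
//   a1.sinkgroups.g1.processor.selector = com.example.flume.ShuffleSinkSelector
package com.example.flume;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

import org.apache.flume.Sink;
import org.apache.flume.sink.AbstractSinkSelector;

public class ShuffleSinkSelector extends AbstractSinkSelector {

    // Called by the Load balancing Sink Processor to obtain the order in which
    // the sinks should be tried for the next batch of events.
    @Override
    public Iterator<Sink> createSinkIterator() {
        List<Sink> candidates = new ArrayList<>(sinkList); // sinkList is populated by setSinks()
        Collections.shuffle(candidates);                   // naive random-order policy
        return candidates.iterator();
    }
}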

Usage Example

1) flume1.properties

# flume1: monitors a netcat port and forwards the appended data to flume2 and flume3
# The two sinks form one sink group whose sink processor is set to the load_balance type
# a1: Netcat Source -> Memory Channel -> Load balancing Sink Processor -> Avro Sink

# Agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2

# Sink groups
a1.sinkgroups = g1
# Sinks that belong to the sink group
a1.sinkgroups.g1.sinks = k1 k2
# Configure the Load balancing Sink Processor (a sink processor can only be used on a sink group)
a1.sinkgroups.g1.processor.type = load_balance
# Enable exponential backoff for failed sinks
a1.sinkgroups.g1.processor.backoff = true
# Set the processor's selector to round_robin
a1.sinkgroups.g1.processor.selector = round_robin
# Maximum backoff time in milliseconds
a1.sinkgroups.g1.processor.selector.maxTimeOut = 10000


# Sources
# Properties of a1.sources.r1: type / bind host / port
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop101
a1.sources.r1.port = 44444

# Channels
# Properties of a1.channels.c1: channel type / max events held in the channel / max events per transaction
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sinks
# sinks.k1
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
# sinks.k2
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop103
a1.sinks.k2.port = 4141

# Bind
# Note: a source can be bound to multiple channels, but a sink can only be bound to a single channel
# r1->c1->g1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
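
If random selection is preferred, only the selector line of the sink group changes; a custom policy is configured through the selector class's FQCN instead (the class name below refers to the hypothetical sketch in the preface):

# Alternative: pick the next sink at random instead of round-robin
a1.sinkgroups.g1.processor.selector = random
# Alternative: custom selector, configured by its FQCN (hypothetical class)
# a1.sinkgroups.g1.processor.selector = com.example.flume.ShuffleSinkSelector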

2) flume2.properties

# flume2: receives data from the specified Avro port and writes it to the console
# a2:Avro Source->Memory Channel->Logger Sink

# Agent
a2.sources = r1
a2.channels = c1
a2.sinks = k1

# Sources
# a2.sources.r1
a2.sources.r1.type = avro
# Listen on all local interfaces
a2.sources.r1.bind = 0.0.0.0
# Listening port
a2.sources.r1.port = 4141

# Channels
# a2.channels.c1
# Memory-backed channel / max events held in the channel / max events per transaction
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Sinks
# Start the agent with -Dflume.root.logger=INFO,console so events are printed to the console in real time
a2.sinks.k1.type = logger
# Maximum number of bytes of the event body to write to the log (default is 16)
a2.sinks.k1.maxBytesToLog = 256

# Bind
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

3) flume3.properties

# flume3: receives data from the specified Avro port and writes it to the console
# a3:Avro Source->Memory Channel->Logger Sink

# Agent
a3.sources = r1
a3.channels = c1
a3.sinks = k1

# Sources
# a3.sources.r1
a3.sources.r1.type = avro
# Listen on all local interfaces
a3.sources.r1.bind = 0.0.0.0
# Listening port
a3.sources.r1.port = 4141

# Channels
# a3.channels.c1
# Memory-backed channel / max events held in the channel / max events per transaction
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Sinks
# Start the agent with -Dflume.root.logger=INFO,console so events are printed to the console in real time
a3.sinks.k1.type = logger
# Maximum number of bytes of the event body to write to the log (default is 16)
a3.sinks.k1.maxBytesToLog = 256

# Bind
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

4) Startup commands

Flume agents a1, a2, and a3 run on hosts hadoop101, hadoop102, and hadoop103 respectively. It is best to start a2 and a3 before a1 so that a1's Avro sinks can connect as soon as events arrive.

./bin/flume-ng agent -n a1 -c conf -f flume1.properties
./bin/flume-ng agent -n a2 -c conf -f flume2.properties -Dflume.root.logger=INFO,console
./bin/flume-ng agent -n a3 -c conf -f flume3.properties -Dflume.root.logger=INFO,console

5) Resulting behavior

Agent a1 forwards the data received on the monitored port to a2 and a3 in round-robin fashion, and each of them prints the events to its own console.
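
To verify this, you can, for example, connect to the netcat source with nc and type a few lines; with the round_robin selector, successive batches are delivered to a2 and a3 in turn, so the lines show up spread across the two consoles:

# On any host that can reach hadoop101, send a few test events
nc hadoop101 44444
hello
world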


End~
