Hadoop series: Flume, a log collection tool

  1. Multiplexing: one source -----> multiple (channel----sink) pairs
  2. Failover: one (source---channel) ----> one sink group (multiple sinks)
  3. Common sink configurations: hdfs, hive, hbase

Multiplexing the flow

The official docs put it this way: the data from one source can be fanned out to multiple channels in two ways, replicating or multiplexing. (This fan out can be replicating or multiplexing. In case of replicating flow, each event is sent to all three channels. For the multiplexing case, an event is delivered to a subset of available channels when an event's attribute matches a preconfigured value.)

Example 1: (single-machine simulation -- multiplexing flow, replicating: source = syslogtcp, channel = memory, sink = logger, hdfs)

 

The configuration is as follows (mutiplex.conf):

#agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# source1 -- replicating: copy each event to every channel
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 8888
a1.sources.r1.host = master1
a1.sources.r1.selector.type = replicating

# source2 -- multiplexing: route events selectively by header
#a1.sources.r1.type = http
#a1.sources.r1.bind = 0.0.0.0
#a1.sources.r1.port = 8888
#a1.sources.r1.selector.type= multiplexing
#a1.sources.r1.selector.header= heads
#a1.sources.r1.selector.mapping.hdfshead= c1
#a1.sources.r1.selector.mapping.loghead= c2
#a1.sources.r1.selector.default= c1
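#  -> with this mapping: heads=hdfshead routes to c1 (hdfs sink),
#     heads=loghead routes to c2 (logger), anything else falls back to c1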


# sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.path = /flume/test-mutiplex-dfs
a1.sinks.k1.hdfs.filePrefix = events-test2

a1.sinks.k2.type=logger

# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# bind source--channel and sink--channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Window 1 -- start Flume:    flume-ng agent -f ./mutiplex.conf -n a1
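If the logger sink's output is not visible on the console, the commonly used long form of the command (standard flume-ng options) is:

flume-ng agent -c conf -f ./mutiplex.conf -n a1 -Dflume.root.logger=INFO,console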

Window 2 -- connect to port 8888 with nc and enter data: 1234
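For example, from a second shell (assuming master1 resolves and nc is installed):

nc master1 8888
1234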

(Window 1 -- the agent's console shows the received event)

(Window 3 -- check the data written to HDFS, e.g. from the hive shell)
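The files can also be inspected straight from HDFS; the path and file prefix come from the sink configuration above:

hdfs dfs -ls /flume/test-mutiplex-dfs
hdfs dfs -cat /flume/test-mutiplex-dfs/events-test2.*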


Example 2: (single-machine simulation -- multiplexing flow, multiplexing: source = http, channel = memory, sink = logger, hdfs)

Configuration: in mutiplex.conf above, comment out 'source1' and uncomment 'source2'

Window 1 -- start Flume:  flume-ng agent -f ./mutiplex.conf -n a1

Window 2 -- send data:

curl -X POST -d '[{"headers" :{"heads" : "loghead"},"body" :"111"}]' http://master1:8888

curl -X POST -d '[{"headers" :{"heads" : "hdfshead"},"body" :"222"}]' http://master1:8888

(Window 1 -- the logger sink prints the event with header heads=loghead and body 111, which was routed to c2)

(Window 3 -- the HDFS path holds the event with header heads=hdfshead and body 222, which was routed to c1)


Failover

A sink group manages multiple sinks, and a sink processor can provide load balancing or failover across the sinks in the group. (Sink groups allow users to group multiple sinks into one entity. Sink processors can be used to provide load balancing capabilities over all sinks inside the group or to achieve fail over from one sink to another in case of temporal failure.)

The sink processor defaults to default, which accepts only a single sink and provides neither load balancing nor failover. (Default sink processor accepts only a single sink.)

Preparation: 3 machines (producer, consumer 1, consumer 2); create the Flume configuration files.

Configuration file 1: the producer (forwards the source's data to consumer 1 and consumer 2)

# agent
a1.sources = r1
a1.sinks= k1 k2
a1.channels = c1

#  source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# sink
a1.sinkgroups= g1
a1.sinkgroups.g1.sinks=k1 k2

# sink group: failover configuration
a1.sinkgroups.g1.processor.type = failover

# sink group: load-balancing configuration (alternative to failover)
#a1.sinkgroups.g1.processor.type = load_balance
#a1.sinkgroups.g1.processor.backoff = true
#a1.sinkgroups.g1.processor.selector = round_robin   (default)
#a1.sinkgroups.g1.processor.selector = random
a1.sinkgroups.g1.processor.priority.k1=5
a1.sinkgroups.g1.processor.priority.k2=10
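# k2 (slave2) has the higher priority, so it is the active sink; k1 (slave1) takes over on failure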

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 50000

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 50000


# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 100

# bind: source-channel, sink-channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

Configuration file 2: consumer 1 (runs an avro source that receives the producer's data) ---> consumer 2 uses the same configuration

# agent 
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#  source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50000

# sink
a1.sinks.k1.type = logger

# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind: source-channel, sink-channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Test: verify failover ----> start the two consumers first, then start the producer.
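Assuming the two files are saved as consumer.conf and producer.conf (placeholder names, not from the original), the startup order looks like:

# on slave1 and slave2 first
flume-ng agent -f consumer.conf -n a1 -Dflume.root.logger=INFO,console

# then on the producer machine
flume-ng agent -f producer.conf -n a1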

Window 1: (producer) nc localhost 44444  [enter data: 123 aaa bbb]

Window 2: (slave2) shows the received data (k2 has the higher priority, 10 > 5, so slave2 is the active sink)

Stop the Flume service on slave2, then keep entering data in window 1 [xxxxxx xxx2222]

Window 3: (slave1) now shows the received data (Flume failed over to the lower-priority sink)

Restart the Flume service on slave2 and enter more data in window 1 [mmmm+++++]

Window 2: (slave2) the data is routed back to slave2, and slave1 no longer receives anything (the higher-priority sink resumed)
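This switch-back is the failover processor's documented behavior: a failed sink is moved to a cooldown pool with an exponentially increasing retry timeout, capped by processor.maxpenalty (default 30000 ms), which can be tuned on the sink group, e.g.:

a1.sinkgroups.g1.processor.maxpenalty = 10000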


Common sink configurations: hdfs, hive, hbase
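The hdfs sink was already shown in the multiplexing example above. Below are minimal sketches for the other two, based on the properties documented in the Flume user guide; the metastore URI, table names, column family, and field names are placeholders, not values from the original:

# hive sink -- the target table must be transactional, bucketed, and stored as ORC
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://master1:9083
a1.sinks.k1.hive.database = default
a1.sinks.k1.hive.table = flume_test
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ","
a1.sinks.k1.serializer.fieldnames = id,msg

# hbase sink -- writes each event body into one column family of an existing table
a1.sinks.k2.type = hbase
a1.sinks.k2.channel = c1
a1.sinks.k2.table = flume_test
a1.sinks.k2.columnFamily = cf
a1.sinks.k2.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer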

 
