Hadoop series: Flume, a log collection tool

  1. Multiplexing: one source -----> multiple (channel----sink) pairs
  2. Failover: one (source---channel) ----> one sink group (multiple sinks)
  3. Common sink configurations: hdfs, hive, hbase

Multiplexing the flow

The official docs put it this way: the data from one source can be fanned out to multiple channels in two ways, replicating or multiplexing. (This fan out can be replicating or multiplexing. In case of replicating flow, each event is sent to all three channels. For the multiplexing case, an event is delivered to a subset of available channels when an event's attribute matches a preconfigured value.)

Example 1: (single-machine simulation -- multiplexing flow, replicating: source = syslogtcp, channel = memory, sink = logger, hdfs)

 

The configuration is as follows (mutiplex.conf):

#agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# source1 -- replicating: copy each event to every channel
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 8888
a1.sources.r1.host = master1
a1.sources.r1.selector.type = replicating

# source2 -- multiplexing: route events selectively by header
#a1.sources.r1.type = http
#a1.sources.r1.bind = 0.0.0.0
#a1.sources.r1.port = 8888
#a1.sources.r1.selector.type= multiplexing
#a1.sources.r1.selector.header= heads
#a1.sources.r1.selector.mapping.hdfshead= c1
#a1.sources.r1.selector.mapping.loghead= c2
#a1.sources.r1.selector.default= c1
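#  -> with this mapping: heads=hdfshead routes to c1 (hdfs sink),
#     heads=loghead routes to c2 (logger), anything else falls back to c1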


# sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.path = /flume/test-mutiplex-dfs
a1.sinks.k1.hdfs.filePrefix = events-test2

a1.sinks.k2.type=logger

# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# bind source--channel and sink--channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Window 1 -- start Flume:    flume-ng agent -f ./mutiplex.conf -n a1
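If the logger sink's output is not visible on the console, the commonly used long form of the command (standard flume-ng options) is:

flume-ng agent -c conf -f ./mutiplex.conf -n a1 -Dflume.root.logger=INFO,console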

Window 2 -- connect to port 8888 with nc and enter data: 1234
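For example, from a second shell (assuming master1 resolves and nc is installed):

nc master1 8888
1234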

(Window 1 -- the agent's console shows the received event)

(Window 3 -- check the data written to HDFS, e.g. from the hive shell)
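The files can also be inspected straight from HDFS; the path and file prefix come from the sink configuration above:

hdfs dfs -ls /flume/test-mutiplex-dfs
hdfs dfs -cat /flume/test-mutiplex-dfs/events-test2.*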


Example 2: (single-machine simulation -- multiplexing flow, multiplexing: source = http, channel = memory, sink = logger, hdfs)

Configuration: in mutiplex.conf above, comment out 'source1' and uncomment 'source2'

Window 1 -- start Flume:  flume-ng agent -f ./mutiplex.conf -n a1

Window 2 -- send data:

curl -X POST -d '[{"headers" :{"heads" : "loghead"},"body" :"111"}]' http://master1:8888

curl -X POST -d '[{"headers" :{"heads" : "hdfshead"},"body" :"222"}]' http://master1:8888

(Window 1 -- the logger sink prints the event with header heads=loghead and body 111, which was routed to c2)

(Window 3 -- the HDFS path holds the event with header heads=hdfshead and body 222, which was routed to c1)


Failover

A sink group manages multiple sinks, and a sink processor can provide load balancing or failover across the sinks in the group. (Sink groups allow users to group multiple sinks into one entity. Sink processors can be used to provide load balancing capabilities over all sinks inside the group or to achieve fail over from one sink to another in case of temporal failure.)

The sink processor defaults to default, which accepts only a single sink and provides neither load balancing nor failover. (Default sink processor accepts only a single sink.)

Preparation: 3 machines (producer, consumer 1, consumer 2); create the Flume configuration files.

Configuration file 1: the producer (forwards the source's data to consumer 1 and consumer 2)

# agent
a1.sources = r1
a1.sinks= k1 k2
a1.channels = c1

#  source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# sink
a1.sinkgroups= g1
a1.sinkgroups.g1.sinks=k1 k2

# sink group: failover configuration
a1.sinkgroups.g1.processor.type = failover

# sink group: load-balancing configuration (alternative to failover)
#a1.sinkgroups.g1.processor.type = load_balance
#a1.sinkgroups.g1.processor.backoff = true
#a1.sinkgroups.g1.processor.selector = round_robin   (default)
#a1.sinkgroups.g1.processor.selector = random
a1.sinkgroups.g1.processor.priority.k1=5
a1.sinkgroups.g1.processor.priority.k2=10
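# k2 (slave2) has the higher priority, so it is the active sink; k1 (slave1) takes over on failure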

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 50000

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 50000


# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 100

# bind: source-channel, sink-channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1

Configuration file 2: consumer 1 (runs an avro source that receives the producer's data) ---> consumer 2 uses the same configuration

# agent 
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#  source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50000

# sink
a1.sinks.k1.type = logger

# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind: source-channel, sink-channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Test: verify failover ----> start the two consumers first, then start the producer.
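Assuming the two files are saved as consumer.conf and producer.conf (placeholder names, not from the original), the startup order looks like:

# on slave1 and slave2 first
flume-ng agent -f consumer.conf -n a1 -Dflume.root.logger=INFO,console

# then on the producer machine
flume-ng agent -f producer.conf -n a1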

Window 1: (producer) nc localhost 44444  [enter data: 123 aaa bbb]

Window 2: (slave2) shows the received data (k2 has the higher priority, 10 > 5, so slave2 is the active sink)

Stop the Flume service on slave2, then keep entering data in window 1 [xxxxxx xxx2222]

Window 3: (slave1) now shows the received data (Flume failed over to the lower-priority sink)

Restart the Flume service on slave2 and enter more data in window 1 [mmmm+++++]

Window 2: (slave2) the data is routed back to slave2, and slave1 no longer receives anything (the higher-priority sink resumed)
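This switch-back is the failover processor's documented behavior: a failed sink is moved to a cooldown pool with an exponentially increasing retry timeout, capped by processor.maxpenalty (default 30000 ms), which can be tuned on the sink group, e.g.:

a1.sinkgroups.g1.processor.maxpenalty = 10000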


Common sink configurations: hdfs, hive, hbase
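The hdfs sink was already shown in the multiplexing example above. Below are minimal sketches for the other two, based on the properties documented in the Flume user guide; the metastore URI, table names, column family, and field names are placeholders, not values from the original:

# hive sink -- the target table must be transactional, bucketed, and stored as ORC
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://master1:9083
a1.sinks.k1.hive.database = default
a1.sinks.k1.hive.table = flume_test
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ","
a1.sinks.k1.serializer.fieldnames = id,msg

# hbase sink -- writes each event body into one column family of an existing table
a1.sinks.k2.type = hbase
a1.sinks.k2.channel = c1
a1.sinks.k2.table = flume_test
a1.sinks.k2.columnFamily = cf
a1.sinks.k2.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer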

 
