Exercises

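The Flume exercises below all follow the step-by-step configuration approach explained in the previous post, which makes them straightforward to work through.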

Exercise 1

Requirement: use Flume to listen on a port, collect the data arriving on that port, and print it to the console.

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Step 2: source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Step 3: channel selector
a1.sources.r1.selector.type = replicating
# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Step 5: sink processor (the default sink processor is used, so nothing is configured here)
# Step 6: sink
# Describe the sink
a1.sinks.k1.type = logger
# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
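
To try this agent out, something has to write lines to port 44444. The usual tool is `nc localhost 44444`; the sketch below does the same from Python. `send_line` is a hypothetical helper name, and the expectation of an `OK` reply assumes the netcat source's default acknowledgement setting has not been changed.

```python
import socket


def send_line(host: str, port: int, line: str, timeout: float = 5.0) -> str:
    """Send one newline-terminated line and return the server's reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The netcat source treats each newline-terminated line as one event.
        conn.sendall(line.encode("utf-8") + b"\n")
        reply = conn.recv(1024)
    return reply.decode("utf-8").strip()
```

With the agent running, calling `send_line("localhost", 44444, "hello flume")` should make the logger sink print the event on the agent's console.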

Exercise 2

Requirement: monitor the Hive log in real time and upload it to HDFS.

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Step 2: source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Step 3: channel selector
a1.sources.r1.selector.type = replicating
# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Step 5: sink processor (the default sink processor is used, so nothing is configured here)
# Step 6: sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop102:9820/flume/upload2/%Y%m%d/%H
# Prefix for uploaded files
a1.sinks.k1.hdfs.filePrefix = upload-
# Whether to round the timestamp down, so that directories roll by time
a1.sinks.k1.hdfs.round = true
# How many time units pass before a new directory is created
a1.sinks.k1.hdfs.roundValue = 1
# The time unit used for rounding
a1.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 100
# File type; compressed output is also supported
a1.sinks.k1.hdfs.fileType = DataStream
# How often (in seconds) a new file is started
a1.sinks.k1.hdfs.rollInterval = 60
# Roll each file at roughly 128 MB (value in bytes)
a1.sinks.k1.hdfs.rollSize = 134217700
# 0 makes file rolling independent of the number of events
a1.sinks.k1.hdfs.rollCount = 0
# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
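
The time escapes in `hdfs.path` and the byte value of `hdfs.rollSize` are easier to see with concrete numbers. The sketch below only approximates the sink's behaviour: `hdfs_directory` is a hypothetical helper, the timestamp is made up, and the rounding rule (round the hour down to a multiple of `roundValue`) is an assumption about how `round`, `roundValue` and `roundUnit = hour` combine.

```python
from datetime import datetime


def hdfs_directory(ts: datetime, base: str, round_hours: int = 1) -> str:
    """Approximate how %Y%m%d/%H resolves once the timestamp is rounded down."""
    rounded = ts.replace(
        hour=ts.hour - ts.hour % round_hours, minute=0, second=0, microsecond=0
    )
    return f"{base}/{rounded:%Y%m%d}/{rounded:%H}"


# An event stamped 14:37:12 lands in the 14 o'clock directory.
print(hdfs_directory(datetime(2021, 3, 5, 14, 37, 12), "/flume/upload2"))
# → /flume/upload2/20210305/14

# rollSize is given in bytes; 134217700 sits just under 128 MiB.
print(128 * 1024 * 1024 - 134217700)
# → 28
```

With `roundValue = 1` and an hour-level path, rounding does not change the directory; it only starts to matter for larger values, for example `round_hours=4` would place the same event under `.../12`.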

Exercise 3

Requirement: use Flume to monitor the files in an entire directory and upload them to HDFS.

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Step 2: source
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/module/flume/upload
a1.sources.r1.fileSuffix = .COMPLETED
a1.sources.r1.fileHeader = true
# Ignore (do not upload) every file whose name ends in .tmp
a1.sources.r1.ignorePattern = ([^ ]*\.tmp)

# Step 3: channel selector
a1.sources.r1.selector.type = replicating
# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Step 5: sink processor (the default sink processor is used, so nothing is configured here)
# Step 6: sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop102:9820/flume/upload/%Y%m%d/%H
# Prefix for uploaded files
a1.sinks.k1.hdfs.filePrefix = upload-
# Whether to round the timestamp down, so that directories roll by time
a1.sinks.k1.hdfs.round = true
# How many time units pass before a new directory is created
a1.sinks.k1.hdfs.roundValue = 1
# The time unit used for rounding
a1.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 100
# File type; compressed output is also supported
a1.sinks.k1.hdfs.fileType = DataStream
# How often (in seconds) a new file is started
a1.sinks.k1.hdfs.rollInterval = 60
# Roll each file at roughly 128 MB (value in bytes)
a1.sinks.k1.hdfs.rollSize = 134217700
# 0 makes file rolling independent of the number of events
a1.sinks.k1.hdfs.rollCount = 0
# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
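
It can be worth checking which file names the `ignorePattern` above actually excludes. The sketch below tests the same regular expression in Python, assuming the source applies it as a full match against the bare file name; `is_ignored` and the sample file names are made up for illustration.

```python
import re

# Same expression as a1.sources.r1.ignorePattern.
IGNORE_PATTERN = re.compile(r"([^ ]*\.tmp)")


def is_ignored(file_name: str) -> bool:
    """Return True when the whole file name matches the ignore pattern."""
    return IGNORE_PATTERN.fullmatch(file_name) is not None


for name in ["data.tmp", "data.log", "my file.tmp"]:
    print(name, is_ignored(name))
# → data.tmp True
# → data.log False
# → my file.tmp False
```

Note the last case: because the pattern uses `[^ ]*`, a `.tmp` file whose name contains a space is not matched by it.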

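Exercise 4

Requirement: use Flume to monitor files in a directory that are being appended to in real time, and upload them to HDFS.

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Step 2: source
# Describe/configure the source
a1.sources.r1.type = TAILDIR
# Location of the position file, which records the offset reached after each upload (the key to resuming from where the agent left off)
a1.sources.r1.positionFile = /opt/module/flume/tail_dir.json
# The set of monitored file groups
a1.sources.r1.filegroups = f1 f2
# Monitored file group 1
a1.sources.r1.filegroups.f1 = /opt/module/flume/files/.*file.*
# Monitored file group 2
a1.sources.r1.filegroups.f2 = /opt/module/flume/files/.*log.*

# Step 3: channel selector
a1.sources.r1.selector.type = replicating
# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Step 5: sink processor (the default sink processor is used, so nothing is configured here)
# Step 6: sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop102:9820/flume/upload3/%Y%m%d/%H
# Prefix for uploaded files
a1.sinks.k1.hdfs.filePrefix = upload-
# Whether to round the timestamp down, so that directories roll by time
a1.sinks.k1.hdfs.round = true
# How many time units pass before a new directory is created
a1.sinks.k1.hdfs.roundValue = 1
# The time unit used for rounding
a1.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 100
# File type; compressed output is also supported
a1.sinks.k1.hdfs.fileType = DataStream
# How often (in seconds) a new file is started
a1.sinks.k1.hdfs.rollInterval = 60
# Roll each file at roughly 128 MB (value in bytes)
a1.sinks.k1.hdfs.rollSize = 134217700
# 0 makes file rolling independent of the number of events
a1.sinks.k1.hdfs.rollCount = 0
# Step 7: connect the source, channel, and sink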
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
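
The position file is what lets the TAILDIR source resume after a restart. The sketch below assumes it is a JSON array with one entry per tailed file, holding the inode, the byte offset already read, and the path. The sample content is made up for illustration and is not taken from a real run, and `load_offsets` is a hypothetical helper.

```python
import json

# Made-up sample in the shape the position file is assumed to have.
SAMPLE_POSITION_FILE = """
[
  {"inode": 1001, "pos": 42, "file": "/opt/module/flume/files/app.log"},
  {"inode": 1002, "pos": 0, "file": "/opt/module/flume/files/file1.txt"}
]
"""


def load_offsets(text: str) -> dict:
    """Map each tailed file path to the byte offset already consumed."""
    return {entry["file"]: entry["pos"] for entry in json.loads(text)}


offsets = load_offsets(SAMPLE_POSITION_FILE)
print(offsets["/opt/module/flume/files/app.log"])
# → 42
```

Inspecting the real file at `/opt/module/flume/tail_dir.json` in this way is a quick check that offsets advance as lines are appended.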

Exercise 5

Requirement: Flume-1 monitors a file for changes and passes the new content to Flume-2, which stores it in HDFS. At the same time, Flume-1 passes the same content to Flume-3, which writes it to the local file system.

flume1:

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Step 2: source
# Describe/configure the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/module/flume/tail_dir.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /opt/module/flume/files/.*log.*
# Step 3: channel selector
a1.sources.r1.selector.type = replicating
# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
# Step 5: sink processor (the default sink processor is used, so nothing is configured here)
# Step 6: sink
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop103
a1.sinks.k1.port = 6666

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop104
a1.sinks.k2.port = 8888

# Step 7: connect the source, channels, and sinks
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
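
The reason both downstream agents see every change is the replicating channel selector: the source hands each event to every channel it is bound to. The toy model below illustrates only that idea, with plain Python queues standing in for the memory channels; it is not Flume code, and `replicate` is a hypothetical name.

```python
from collections import deque


def replicate(event: bytes, channels: dict) -> None:
    """Append the same event to every channel bound to the source."""
    for queue in channels.values():
        queue.append(event)


channels = {"c1": deque(), "c2": deque()}
replicate(b"new log line", channels)
print([len(queue) for queue in channels.values()])
# → [1, 1]
```

Each sink then drains its own channel independently, which is why k1 (to Flume-2) and k2 (to Flume-3) each need a separate channel here.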

flume2:

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Step 2: source
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop103
a1.sources.r1.port = 6666
# Step 3: channel selector
a1.sources.r1.selector.type = replicating
# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Step 5: sink processor (the default sink processor is used, so nothing is configured here)
# Step 6: sink
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop102:9820/flume/upload4/%Y%m%d/%H
# Prefix for uploaded files
a1.sinks.k1.hdfs.filePrefix = upload-
# Whether to round the timestamp down, so that directories roll by time
a1.sinks.k1.hdfs.round = true
# How many time units pass before a new directory is created
a1.sinks.k1.hdfs.roundValue = 1
# The time unit used for rounding
a1.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a1.sinks.k1.hdfs.batchSize = 100
# File type; compressed output is also supported
a1.sinks.k1.hdfs.fileType = DataStream
# How often (in seconds) a new file is started
a1.sinks.k1.hdfs.rollInterval = 60
# Roll each file at roughly 128 MB (value in bytes)
a1.sinks.k1.hdfs.rollSize = 134217700
# 0 makes file rolling independent of the number of events
a1.sinks.k1.hdfs.rollCount = 0

# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

flume3:

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Step 2: source
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop104
a1.sources.r1.port = 8888

# Step 3: channel selector
a1.sources.r1.selector.type = replicating

# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Step 5: sink processor (the default sink processor is used, so nothing is configured here)

# Step 6: sink
# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /opt/module/flume/datas/flume3

# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Exercise 6

Requirement: Flume1 monitors a port, and the sinks in its sink group connect to Flume2 and Flume3 respectively. A FailoverSinkProcessor is used to provide failover.

flume1

# Step 1: name the agent's components
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2

# Step 2: source
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Step 3: channel selector
a1.sources.r1.selector.type = replicating

# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Step 5: sink processor
# Failover: the sink with the higher priority (k1) is used while it is available
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper bound, in milliseconds, on the back-off applied to a failed sink
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Step 6: sink
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop103
a1.sinks.k1.port = 1111

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop104
a1.sinks.k2.port = 2222

# Step 7: connect the source, channel, and sinks
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
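
The priorities above decide which sink carries the traffic: the highest-priority sink that is still available wins, and a lower-priority one takes over only when it fails. The toy model below shows that selection rule under this simplified assumption. It is not Flume's implementation, it leaves out the back-off governed by `maxpenalty`, and `pick_sink` is a hypothetical name.

```python
from typing import Optional


def pick_sink(priorities: dict, failed: set) -> Optional[str]:
    """Return the available sink with the highest priority, if any."""
    live = {name: prio for name, prio in priorities.items() if name not in failed}
    return max(live, key=live.get) if live else None


priorities = {"k1": 10, "k2": 5}
print(pick_sink(priorities, set()))
# → k1
print(pick_sink(priorities, {"k1"}))
# → k2
```

In this exercise that means events normally go to Flume2 on hadoop103, and stopping that agent should shift them to Flume3 on hadoop104.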

flume2

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Step 2: source
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop103
a1.sources.r1.port = 1111

# Step 3: channel selector
a1.sources.r1.selector.type = replicating

# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Step 5: sink processor (the default sink processor is used, so nothing is configured here)

# Step 6: sink
# Describe the sink
a1.sinks.k1.type = logger

# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

flume3

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Step 2: source
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop104
a1.sources.r1.port = 2222

# Step 3: channel selector
a1.sources.r1.selector.type = replicating

# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Step 5: sink processor (the default sink processor is used, so nothing is configured here)

# Step 6: sink
# Describe the sink
a1.sinks.k1.type = logger

# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Exercise 7

Requirement: Flume-1 on hadoop102 monitors the file /opt/module/group.log,

Flume-2 on hadoop103 monitors the data stream on a port,

and Flume-1 and Flume-2 both send their data to Flume-3 on hadoop104, which prints the final data to the console.

flume1

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Step 2: source
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/module/flume/tail_dir.json
a1.sources.r1.filegroups = f1
# The file named in the requirement
a1.sources.r1.filegroups.f1 = /opt/module/group.log

# Step 3: channel selector
a1.sources.r1.selector.type = replicating

# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Step 5: sink processor (the default sink processor is used, so nothing is configured here)

# Step 6: sink
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop104
a1.sinks.k1.port = 4141

# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

flume2

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Step 2: source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 3333

# Step 3: channel selector
a1.sources.r1.selector.type = replicating

# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Step 5: sink processor (the default sink processor is used, so nothing is configured here)

# Step 6: sink
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop104
a1.sinks.k1.port = 4141

# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

flume3

# Step 1: name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Step 2: source
a1.sources.r1.type = avro
a1.sources.r1.bind = hadoop104
a1.sources.r1.port = 4141

# Step 3: channel selector
a1.sources.r1.selector.type = replicating

# Step 4: channel
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Step 5: sink processor (the default sink processor is used, so nothing is configured here)

# Step 6: sink
# Describe the sink
a1.sinks.k1.type = logger

# Step 7: connect the source, channel, and sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
