Flume test cases

Console output test (single-node Flume test)

# Name the components of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe and configure the source component: r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe and configure the sink component: k1
a1.sinks.k1.type = logger
# Describe and configure the channel component; an in-memory buffer is used here
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Describe how the source, channel, and sink are connected
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
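
The wiring in the last two lines of the configuration is easy to get wrong: a source takes the plural key channels, a sink takes the singular key channel, and both must name a declared channel. The following is a minimal sketch in Python of a pre-flight check for such a file. It is not part of Flume; parse_properties and check_wiring are hypothetical helper names, and the parser only handles plain key = value lines (no line continuations or escapes).

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of wiring problems found for the given agent name."""
    problems = []
    channels = props.get(f"{agent}.channels", "").split()
    # Every source must list at least one declared channel (plural key).
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channels")
        for ch in used:
            if ch not in channels:
                problems.append(f"source {source} uses undeclared channel {ch}")
    # Every sink must name exactly one declared channel (singular key).
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink} has no channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    # A channel's transactionCapacity must not exceed its capacity.
    for ch in channels:
        cap = props.get(f"{agent}.channels.{ch}.capacity")
        tx = props.get(f"{agent}.channels.{ch}.transactionCapacity")
        if cap is not None and tx is not None and int(tx) > int(cap):
            problems.append(
                f"channel {ch}: transactionCapacity {tx} exceeds capacity {cap}"
            )
    return problems
```

For the configuration above, check_wiring(parse_properties(text), "a1") should return an empty list; a misspelled channel name shows up as one entry per affected source or sink.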

Flume listens on port 44444; start telnet to send data, and each event is printed to the foreground console.

Command to start Flume:

bin/flume-ng agent -c conf -f agentconf/netcat-logger.properties -n a1 -Dflume.root.logger=INFO,console

Install telnet: yum install -y telnet

Start telnet: telnet localhost 44444
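
When telnet is not available, the same test can be scripted. The sketch below is a small Python client, assuming the netcat source keeps its default behaviour of answering every received line with OK; send_lines is a hypothetical helper name. If acknowledgements are turned off on the source, the read will time out instead of returning.

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send each line to a netcat-style TCP source and collect one reply per line."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            # The netcat source treats every newline-terminated line as one event.
            conn.sendall((line + "\n").encode("utf-8"))
            replies.append(reader.readline().strip())
    return replies
```

With the agent from this section running, send_lines("localhost", 44444, ["hello flume"]) sends one event, which should then appear in the agent's console log.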

Collecting files from a directory to HDFS

# Name the three components
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Configure the source component
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /home/hadoop/logs/
agent1.sources.source1.fileHeader = false
# Configure the interceptor
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname
# Configure the sink component
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path=/test/flume_log/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
agent1.sinks.sink1.hdfs.batchSize= 100
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat =Text
agent1.sinks.sink1.hdfs.rollSize = 102400
agent1.sinks.sink1.hdfs.rollCount = 1000000
agent1.sinks.sink1.hdfs.rollInterval = 60
#agent1.sinks.sink1.hdfs.round = true
#agent1.sinks.sink1.hdfs.roundValue = 10
#agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

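Start Flume:

bin/flume-ng agent -c conf -f agentconf/spooldir-hdfs.properties -n agent1

Testing:
1. If the HDFS cluster is a high-availability (HA) cluster, core-site.xml and hdfs-site.xml must be copied into the $FLUME_HOME/conf directory.
2. Check whether the files in the monitored /home/hadoop/logs directory have been uploaded to HDFS correctly.
3. Create a file in that directory, or move a file into it from another directory, and verify that the new file is uploaded to HDFS automatically.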
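
One caveat when adding files to the monitored directory: the spooling directory source expects a file to be complete and unchanged once it appears, and it does not accept a file name that was already used. A minimal Python sketch of a safe way to drop a file in is shown here. drop_into_spool and the staging directory are assumptions for illustration; the .COMPLETED suffix is the source's default marker for processed files.

```python
import os
import shutil
import tempfile


def drop_into_spool(src_path, spool_dir, staging_dir):
    """Copy src_path into spool_dir so that it appears there fully written.

    staging_dir should be on the same filesystem as spool_dir so that the
    final rename is an atomic move rather than a slow copy.
    """
    name = os.path.basename(src_path)
    target = os.path.join(spool_dir, name)
    # Refuse to reuse a name that is pending or was already processed.
    if os.path.exists(target) or os.path.exists(target + ".COMPLETED"):
        raise FileExistsError(f"{name} was already placed in {spool_dir}")
    # Write the full content outside the spool directory first.
    fd, staged = tempfile.mkstemp(dir=staging_dir)
    os.close(fd)
    shutil.copyfile(src_path, staged)
    # Then make it visible to Flume in a single step.
    os.rename(staged, target)
    return target
```

Writing directly into the spool directory (for example with a slow copy) risks Flume picking up a half-written file, which is why the copy happens in a staging directory first.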

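Collecting data to HDFS

Requirement: a business system writes its logs with log4j, for example, and the log file keeps growing; the data appended to the log file must be collected into HDFS in real time.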

agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure tail -F source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /home/hadoop/logs/catalina.out
agent1.sources.source1.channels = channel1
#configure host for source
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname
# Describe sink1
agent1.sinks.sink1.type = hdfs
#a1.sinks.k1.channel = c1
agent1.sinks.sink1.hdfs.path =hdfs://myha01/weblog/flume-event/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = tomcat_
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
agent1.sinks.sink1.hdfs.batchSize= 100
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat =Text
agent1.sinks.sink1.hdfs.rollSize = 102400
agent1.sinks.sink1.hdfs.rollCount = 1000000
agent1.sinks.sink1.hdfs.rollInterval = 60
agent1.sinks.sink1.hdfs.round = true
agent1.sinks.sink1.hdfs.roundValue = 10
agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

Start Flume:

bin/flume-ng agent -c conf -f agentconf/tail-hdfs.properties -n agent1
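
The hdfs.path in this configuration contains time escapes, and hdfs.round, hdfs.roundValue and hdfs.roundUnit control how the timestamp is rounded down before those escapes are expanded. The Python sketch below imitates that computation to show which directory an event lands in. It is an illustration only, not Flume's code; bucket_path is a hypothetical name, and it relies on the escapes used here (%y, %m, %d, %H, %M) having the same meaning in strftime.

```python
from datetime import datetime


def bucket_path(template, ts, round_minutes=None):
    """Expand the time escapes in template for timestamp ts.

    If round_minutes is given, ts is first rounded down to a multiple of
    that many minutes within the hour, like hdfs.round = true with
    hdfs.roundUnit = minute.
    """
    if round_minutes:
        ts = ts.replace(minute=ts.minute - ts.minute % round_minutes,
                        second=0, microsecond=0)
    return ts.strftime(template)


template = "hdfs://myha01/weblog/flume-event/%y-%m-%d/%H-%M"
event_time = datetime(2024, 5, 21, 14, 37, 55)  # made-up example timestamp
print(bucket_path(template, event_time))        # no rounding: .../24-05-21/14-37
print(bucket_path(template, event_time, 10))    # roundValue = 10: .../24-05-21/14-30
```

Without rounding, a path containing %M produces a new directory every minute; with roundValue = 10 the events of each ten-minute window share one directory.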

Collecting data to Kafka

Flume monitors a file for appended data and sends it to Kafka.

Kafka operations

Common Kafka shell operations:

1. Start Kafka on every node
	nohup kafka-server-start.sh /home/hadoop/kafka_2.12-2.2.2/config/server.properties >/home/hadoop/logs/kafka_logs/out.log 2>&1 &
2. Create a topic
	kafka-topics.sh --create --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --replication-factor 3 --partitions 10 --topic kafka_test
3. List all existing Kafka topics
	kafka-topics.sh --list --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181
4. Show the details of a specific Kafka topic
	kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --describe --topic kafka_test
5. Start a console producer to generate test data
	kafka-console-producer.sh --broker-list hadoop01:9092,hadoop02:9092,hadoop03:9092 --topic kafka_test
6. Start a console consumer to read the data
	kafka-console-consumer.sh --bootstrap-server hadoop01:9092,hadoop02:9092,hadoop03:9092 --from-beginning --topic kafka_test
7. Show the offsets of a topic partition (--time -1 returns the latest offset, --time -2 the earliest; --partitions takes partition IDs)
	kafka-run-class.sh kafka.tools.GetOffsetShell --topic kafka_test --time -1 --broker-list hadoop01:9092,hadoop02:9092,hadoop03:9092 --partitions 1
8. Increase the number of partitions of a topic (the count can only grow)
	kafka-topics.sh --alter --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --topic kafka_test --partitions 20

	The replication factor cannot be changed with kafka-topics.sh --alter; changing it requires a partition reassignment with kafka-reassign-partitions.sh.
9. Delete a topic
	kafka-topics.sh --delete --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --topic kafka_test
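
GetOffsetShell prints one line per partition in the form topic:partition:offset. Running it once with --time -2 and once with --time -1 gives the earliest and latest offsets, and their difference is roughly the number of messages still retained in a non-compacted topic. The Python sketch below does that subtraction on captured output; parse_offsets and retained_messages are hypothetical helper names, and the sample numbers are made up.

```python
def parse_offsets(output):
    """Parse GetOffsetShell lines of the form topic:partition:offset."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so the topic name itself may contain colons.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def retained_messages(earliest_output, latest_output):
    """Return latest minus earliest offset for every partition in latest_output."""
    earliest = parse_offsets(earliest_output)
    latest = parse_offsets(latest_output)
    return {key: latest[key] - earliest.get(key, 0) for key in latest}


# Made-up sample output for partition 1 of kafka_test.
print(retained_messages("kafka_test:1:40\n", "kafka_test:1:100\n"))
```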

Flume configuration file: exec-kafka.conf

agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1

#define sources
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /home/hadoop/logs/flume.log

#define channels
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100

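#define sink
# Note: since Flume 1.7 the names brokerList, topic, batchSize and requiredAcks are deprecated
# in favor of kafka.bootstrap.servers, kafka.topic, flumeBatchSize and kafka.producer.acks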
agent1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.k1.brokerList = hadoop01:9092,hadoop02:9092,hadoop03:9092
agent1.sinks.k1.topic = flume-kafka
agent1.sinks.k1.batchSize = 4
agent1.sinks.k1.requiredAcks = 1

#bind sources and sink to channel 
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1

Command to start Flume:

/home/hadoop/flume-1.8.0/bin/flume-ng agent --conf /home/hadoop/flume-1.8.0/conf/ --name agent1 --conf-file /home/hadoop/flume-1.8.0/agentconf/exec-kafka.conf -Dflume.root.logger=DEBUG,console
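
To exercise this pipeline, something has to append lines to the file that tail -F is watching. A minimal Python sketch of such a generator follows; append_test_lines is a hypothetical helper and the line format is arbitrary.

```python
import time


def append_test_lines(path, count, delay=0.0):
    """Append count numbered lines to path, flushing after each one."""
    with open(path, "a", encoding="utf-8") as log:
        for i in range(count):
            log.write(f"test event {i}\n")
            # Flush so that tail -F sees the line immediately.
            log.flush()
            if delay:
                time.sleep(delay)
    return count
```

Calling append_test_lines("/home/hadoop/logs/flume.log", 10, delay=0.5) while the agent is running should make ten messages show up in a console consumer subscribed to the flume-kafka topic.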
