Kafka | Flume Sinks: Shipping Logs to Kafka & HDFS

This post records how log data produced by server-side AC devices is collected into Flume and then written by Flume sinks to both Kafka and HDFS at the same time. The Kafka copy is kept in a dedicated topic; a downstream Spark Streaming job later pulls the records from that topic using the Direct approach and processes them.
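For reference, here is a minimal sketch of that downstream consumer, assuming the spark-streaming-kafka-0-10 integration; the broker list and topic name come from the Flume configuration below, while the application name and consumer group are invented for illustration:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object AcOnlineUserDirect {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("ac-online-user-direct"), Seconds(10))

    // Brokers and topic match the Flume Kafka sink configuration below;
    // the consumer group name is made up for this example.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "10.10.10.1:9092,10.10.10.2:9092,10.10.10.3:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "ac_online_user_group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct approach: executors pull partitions straight from the brokers,
    // with no receiver and a 1:1 mapping of Kafka to RDD partitions.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("ac_online_user"), kafkaParams))

    // Placeholder processing: print a sample of each micro-batch.
    stream.map(_.value).foreachRDD(rdd => rdd.take(10).foreach(println))

    ssc.start()
    ssc.awaitTermination()
  }
}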

Flume tails the log file, and two sinks write the same events to Kafka and HDFS in parallel. The agent is named ac_online_user and is configured as follows:

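# One TAILDIR source fans out (replicating selector by default) to two
# channels, one feeding the Kafka sink and one feeding the HDFS sink.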
ac_online_user.sources = ac_source
ac_online_user.channels = ac_channel_kafka ac_channel_hdfs
ac_online_user.sinks = ac_sink_kafka ac_sink_hdfs

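# Source: tail the AC online-user log; read positions are persisted in
# positionFile so the tail survives agent restarts.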
ac_online_user.sources.ac_source.type = TAILDIR
ac_online_user.sources.ac_source.positionFile = /var/log/flume/position/accessaconlineuser.log
ac_online_user.sources.ac_source.recursiveDirectorySearch = true
ac_online_user.sources.ac_source.fileHeader = true
ac_online_user.sources.ac_source.fileHeaderKey = fileName
ac_online_user.sources.ac_source.filegroups = group_ac_online_user
ac_online_user.sources.ac_source.filegroups.group_ac_online_user = /var/log/accessaconlineuser.log
# Note: deserializer.* settings belong to the spooling directory source;
# the TAILDIR source reads whole lines and ignores this property.
# ac_online_user.sources.ac_source.deserializer.maxLineLength = 20480000

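# Channel buffering events for the Kafka sink. The checkpoint/dataDir
# settings below are file-channel properties, hence type = file.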
ac_online_user.channels.ac_channel_kafka.type = file
ac_online_user.channels.ac_channel_kafka.capacity = 30000
ac_online_user.channels.ac_channel_kafka.transactionCapacity = 10000
ac_online_user.channels.ac_channel_kafka.useDualCheckpoints = true
ac_online_user.channels.ac_channel_kafka.checkpointDir = /data4/flume/agent/kafka/ac_online_user/checkpoint
ac_online_user.channels.ac_channel_kafka.dataDir = /data4/flume/agent/kafka/ac_online_user/datadir/
ac_online_user.channels.ac_channel_kafka.backupCheckpointDir = /data4/flume/agent/kafka/ac_online_user/backup/
ac_online_user.channels.ac_channel_kafka.checkpointInterval = 600000
ac_online_user.channels.ac_channel_kafka.keep-alive = 600

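# Kafka sink: publish each event to the ac_online_user topic.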
ac_online_user.sinks.ac_sink_kafka.type = org.apache.flume.sink.kafka.KafkaSink
ac_online_user.sinks.ac_sink_kafka.kafka.bootstrap.servers = 10.10.10.1:9092,10.10.10.2:9092,10.10.10.3:9092
ac_online_user.sinks.ac_sink_kafka.kafka.topic = ac_online_user
ac_online_user.sinks.ac_sink_kafka.flumeBatchSize = 20
ac_online_user.sinks.ac_sink_kafka.kafka.producer.acks = 1

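# Channel buffering events for the HDFS sink, mirroring the Kafka channel.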
ac_online_user.channels.ac_channel_hdfs.type = file
ac_online_user.channels.ac_channel_hdfs.capacity = 30000
ac_online_user.channels.ac_channel_hdfs.transactionCapacity = 10000
ac_online_user.channels.ac_channel_hdfs.useDualCheckpoints = true
ac_online_user.channels.ac_channel_hdfs.checkpointDir = /data4/flume/agent/hdfs/ac_online_user/checkpoint
ac_online_user.channels.ac_channel_hdfs.dataDir = /data4/flume/agent/hdfs/ac_online_user/datadir/
ac_online_user.channels.ac_channel_hdfs.backupCheckpointDir = /data4/flume/agent/hdfs/ac_online_user/backup/
ac_online_user.channels.ac_channel_hdfs.checkpointInterval = 600000
ac_online_user.channels.ac_channel_hdfs.keep-alive = 600

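# HDFS sink: write events as uncompressed text (DataStream), bucketed
# into minute directories by the escape sequences in hdfs.path.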
ac_online_user.sinks.ac_sink_hdfs.type = hdfs
ac_online_user.sinks.ac_sink_hdfs.hdfs.path = hdfs://hadoop-master:9000/datalog/ac_online_user/%Y/%m/%d/%H/%M
ac_online_user.sinks.ac_sink_hdfs.hdfs.filePrefix = ac.online.10.254.32.203-
ac_online_user.sinks.ac_sink_hdfs.hdfs.fileType = DataStream
ac_online_user.sinks.ac_sink_hdfs.hdfs.useLocalTimeStamp = true
ac_online_user.sinks.ac_sink_hdfs.hdfs.callTimeout = 1000000
ac_online_user.sinks.ac_sink_hdfs.hdfs.batchSize = 1000
ac_online_user.sinks.ac_sink_hdfs.hdfs.closeTries = 0
ac_online_user.sinks.ac_sink_hdfs.hdfs.round = true
# Round timestamps down to the minute bucket used in hdfs.path.
ac_online_user.sinks.ac_sink_hdfs.hdfs.roundValue = 1
ac_online_user.sinks.ac_sink_hdfs.hdfs.roundUnit = minute
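# rollCount is disabled; rollInterval (10 s) will normally fire long
# before the 128 MB rollSize threshold, so expect many small files.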
ac_online_user.sinks.ac_sink_hdfs.hdfs.rollCount = 0
ac_online_user.sinks.ac_sink_hdfs.hdfs.rollSize = 134217728
ac_online_user.sinks.ac_sink_hdfs.hdfs.rollInterval = 10
ac_online_user.sinks.ac_sink_hdfs.hdfs.idleTimeout = 300

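# Bind the source and both sinks to their channels.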
ac_online_user.sources.ac_source.channels = ac_channel_kafka ac_channel_hdfs
ac_online_user.sinks.ac_sink_hdfs.channel = ac_channel_hdfs
ac_online_user.sinks.ac_sink_kafka.channel = ac_channel_kafka
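With the configuration saved as ac_online_user.conf (a hypothetical file name), the agent can be started with the stock flume-ng launcher, and delivery into the topic checked with Kafka's console consumer:

flume-ng agent --conf conf --conf-file ac_online_user.conf --name ac_online_user -Dflume.root.logger=INFO,console

kafka-console-consumer.sh --bootstrap-server 10.10.10.1:9092 --topic ac_online_user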
