Flume

Big Data: Flume

Environment
Flume NG 1.9.0
source: TAILDIR
channel: memory
sink: hdfs

1. Performance Testing

Flume was benchmarked against the following scenarios:
Scenario 1: one channel, LZO compression
Scenario 2: one channel, no compression
Scenario 3: two channels, LZO compression
Scenario 4: two channels, no compression

Configuration file


# Name the components on this agent
exec-hdfs-agent.sources = r1
exec-hdfs-agent.sinks = s1 s2
exec-hdfs-agent.channels = c1 c2

# Describe/configure the source
exec-hdfs-agent.sources.r1.selector.type = com.sjj.ParityChannelSelector 
exec-hdfs-agent.sources.r1.type = TAILDIR
exec-hdfs-agent.sources.r1.channels = c1 c2
exec-hdfs-agent.sources.r1.positionFile = /tmp/flume/logs/taildir_position.json
exec-hdfs-agent.sources.r1.filegroups = test
exec-hdfs-agent.sources.r1.filegroups.test= /tmp/flume/data/.*.test.log

# Describe the sink1
exec-hdfs-agent.sinks.s1.channel = c1
exec-hdfs-agent.sinks.s1.type = hdfs
exec-hdfs-agent.sinks.s1.hdfs.path = hdfs://namenode/test/%y-%m-%d
exec-hdfs-agent.sinks.s1.hdfs.fileType= DataStream
#exec-hdfs-agent.sinks.s1.hdfs.fileType= CompressedStream
#exec-hdfs-agent.sinks.s1.hdfs.codeC= com.hadoop.compression.lzo.LzopCodec
exec-hdfs-agent.sinks.s1.hdfs.writeFormat= Text
# hdfs.batchSize must not exceed the channel's transactionCapacity (10000 below)
exec-hdfs-agent.sinks.s1.hdfs.batchSize= 10000
exec-hdfs-agent.sinks.s1.hdfs.rollSize= 128000000
exec-hdfs-agent.sinks.s1.hdfs.rollCount= 0
exec-hdfs-agent.sinks.s1.hdfs.rollInterval=0
exec-hdfs-agent.sinks.s1.hdfs.minBlockReplicas=1
exec-hdfs-agent.sinks.s1.hdfs.callTimeout=20000
exec-hdfs-agent.sinks.s1.hdfs.useLocalTimeStamp=true
# the .lzo suffix only makes sense in the compressed scenarios; drop it when fileType = DataStream
exec-hdfs-agent.sinks.s1.hdfs.fileSuffix=.lzo
exec-hdfs-agent.sinks.s1.hdfs.filePrefix=c1

# Describe the sink2
exec-hdfs-agent.sinks.s2.channel = c2
exec-hdfs-agent.sinks.s2.type = hdfs 
exec-hdfs-agent.sinks.s2.hdfs.path = hdfs://namenode/test/%y-%m-%d
exec-hdfs-agent.sinks.s2.hdfs.fileType= DataStream
exec-hdfs-agent.sinks.s2.hdfs.writeFormat= Text
exec-hdfs-agent.sinks.s2.hdfs.batchSize= 10000
exec-hdfs-agent.sinks.s2.hdfs.rollSize= 128000000
exec-hdfs-agent.sinks.s2.hdfs.rollCount= 0
exec-hdfs-agent.sinks.s2.hdfs.rollInterval=0
exec-hdfs-agent.sinks.s2.hdfs.minBlockReplicas=1
exec-hdfs-agent.sinks.s2.hdfs.callTimeout=20000
exec-hdfs-agent.sinks.s2.hdfs.useLocalTimeStamp=true
exec-hdfs-agent.sinks.s2.hdfs.fileSuffix=.lzo
exec-hdfs-agent.sinks.s2.hdfs.filePrefix=c2

# Use a channel c1 which buffers events in memory
exec-hdfs-agent.channels.c1.type = memory
exec-hdfs-agent.channels.c1.capacity=50000
exec-hdfs-agent.channels.c1.transactionCapacity=10000

# Use a channel c2 which buffers events in memory
exec-hdfs-agent.channels.c2.type = memory
exec-hdfs-agent.channels.c2.capacity=50000
exec-hdfs-agent.channels.c2.transactionCapacity=10000
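Before launching the agent, a short script can parse the properties file and verify channel sizing invariants: `transactionCapacity` must not exceed `capacity`, and a sink's batch size must not exceed its channel's `transactionCapacity`. This is a sketch; the helper names are mine, not part of Flume:

```python
# Minimal sanity check for a Flume agent properties file (sketch).
# parse_props / channel_issues are hypothetical helpers, not Flume APIs.

def parse_props(text: str) -> dict[str, str]:
    """Parse java-properties-style 'key = value' lines, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def channel_issues(props: dict[str, str], agent: str, channel: str) -> list[str]:
    """Report a problem if transactionCapacity exceeds capacity for a channel."""
    cap = int(props[f"{agent}.channels.{channel}.capacity"])
    txn = int(props[f"{agent}.channels.{channel}.transactionCapacity"])
    if txn > cap:
        return [f"{channel}: transactionCapacity {txn} > capacity {cap}"]
    return []
```

The same kind of check can be extended to compare each sink's `hdfs.batchSize` against its channel's `transactionCapacity`, which is the mismatch that silently throttles or fails transactions.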

2. Writing to HDFS with Kerberos Enabled

(1) Add configuration
Add the following to the Flume configuration file, replacing [email protected] and /usr/local/flume/conf/test.keytab with your own principal and keytab path:

[email protected]
exec-hdfs-agent.sinks.s2.hdfs.kerberosKeytab=/usr/local/flume/conf/test.keytab

(2) Copy files

  1. Copy the keytab file into Flume's conf directory (see a separate post for creating the keytab and related steps).
  2. Copy core-site.xml and hdfs-site.xml from the Hadoop cluster into Flume's conf directory.

3. Writing to HDFS with LZO Compression

(1) Add configuration
Setting fileType = CompressedStream enables compression:

exec-hdfs-agent.sinks.s2.hdfs.fileType= CompressedStream
exec-hdfs-agent.sinks.s2.hdfs.codeC= com.hadoop.compression.lzo.LzopCodec

(2) Copy JAR packages

  1. If you built lzo and hadoop-lzo by hand, simply place the jars under plugins.d.
  2. If you installed hadoop-lzo as a Cloudera parcel, place both the jars and the links under native into plugins.d.
     For why this is, see the difference between hadoop-lzo.jar and hadoop-gpl-compression.jar: http://guoyunsky.iteye.com/blog/1289475

(3) Copy configuration files
Pull core-site.xml from the Hadoop cluster into flume/conf. The property that matters most is:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
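A quick way to confirm the copied core-site.xml actually lists the LZO codecs is to read the `io.compression.codecs` property back out of it. This is a sketch; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

def configured_codecs(core_site_xml: str) -> list[str]:
    """Extract the codec class names from core-site.xml's io.compression.codecs (sketch)."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "io.compression.codecs":
            return [c.strip() for c in prop.findtext("value").split(",")]
    return []
```

If `com.hadoop.compression.lzo.LzopCodec` is missing from the returned list, the CompressedStream sink above will fail to load the codec.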

4. HA

Because of the HDFS cluster's HA mechanism, when the NameNode state changes, Flume throws an exception on upload: "Operation category READ (WRITE) is not supported in state standby", since a standby NameNode does not serve requests. Flume then becomes unusable: you have to manually change the NameNode in the sink's hdfs.path and restart Flume. With many log-collecting servers this adds significant manual effort; worse, if upload monitoring is not in place, you may only notice the failure when the logs are actually needed.

The fix is simple: copy the cluster's hdfs-site.xml into Flume's conf directory. When the NameNode state switches, Flume still delivers logs to HDFS correctly, and the hdfs.path can also omit the host:

exec-hdfs-agent.sinks.s1.hdfs.path = /test/%y-%m-%d
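For reference, the entries in the copied hdfs-site.xml that make client-side failover work look roughly like the following. The nameservice ID `mycluster` and the hostnames are placeholders; use the values your cluster actually defines:

```xml
<!-- Hypothetical HA client configuration; substitute your own nameservice and hosts -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With these in place, the path can also be written against the nameservice (e.g. hdfs://mycluster/test/%y-%m-%d) rather than a specific NameNode host.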