Apache Flume 1.9.0
1. Download
wget https://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
2. Extract
tar -zxvf apache-flume-1.9.0-bin.tar.gz
3. Data flow: tail the log files under /data/log/tracy with a TAILDIR source, buffer events in a file channel, and deliver them through an HDFS sink into the Hive table's HDFS directory.
4. Create a target Hive table
create table action_log
(id string,
write_date string,
name string)
COMMENT 'click action log'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
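Since the table is a comma-delimited TEXTFILE, every data file line is just the column values joined by ",". A minimal sketch, using the test row from step 6 below:

```python
# One row for the action_log table (id, write_date, name),
# serialized per ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.
row = {"id": "1", "write_date": "2019-12-13 00:00:00", "name": "Raymond"}
line = ",".join([row["id"], row["write_date"], row["name"]])
print(line)  # 1,2019-12-13 00:00:00,Raymond
```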
5. Check the table's HDFS directory
hdfs://bigdata-dev1.nexttao:8020/warehouse/tablespace/managed/hive/flume.db/action_log
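Hive places managed tables under a predictable path, `<warehouse root>/<db name>.db/<table name>`, which is how the location above (and the sink path in the Flume config later) is derived. A sketch, assuming the warehouse root from this environment:

```python
# Hive managed-table location convention: <warehouse>/<db>.db/<table>
warehouse = "hdfs://bigdata-dev1.nexttao:8020/warehouse/tablespace/managed/hive"
db, table = "flume", "action_log"

table_location = f"{warehouse}/{db}.db/{table}"
print(table_location)
```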
6. Insert a test row
insert into action_log values ('1','2019-12-13 00:00:00','Raymond');
Then insert another row by writing a file directly to the table's HDFS directory.
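Loading via HDFS just means placing a correctly delimited text file in the table directory. A sketch of building such a file locally; the row values here are hypothetical, and the upload would then be something like `hdfs dfs -put data.txt /warehouse/tablespace/managed/hive/flume.db/action_log/`:

```python
# Build a local data file whose lines match the table's comma delimiter.
# "2,2019-12-13 00:00:01,Tracy" is a made-up example row.
with open("data.txt", "w") as f:
    f.write("2,2019-12-13 00:00:01,Tracy\n")
```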
7. Write a simple Flume configuration file (log2hive.properties)
# agent1 is the agent name
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1
# configure source1
agent1.sources.source1.type=TAILDIR
agent1.sources.source1.filegroups = f1
agent1.sources.source1.filegroups.f1 = /data/log/tracy/.*log.*
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader = false
# add an interceptor
agent1.sources.source1.interceptors = i1
# timestamp interceptor: stamps each event so the sink can expand %Y-%m-%d
agent1.sources.source1.interceptors.i1.type = timestamp
# configure channel1
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/data/flume/tracy/checkpointDir
agent1.channels.channel1.dataDirs=/data/flume/tracy/dataDirs
# configure sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://bigdata-dev1.nexttao:8020/warehouse/tablespace/managed/hive/flume.db/action_log
# DataStream writes plain text files, matching the TEXTFILE table
agent1.sinks.sink1.hdfs.fileType=DataStream
# write only the event body
agent1.sinks.sink1.hdfs.writeFormat=Text
# seconds before rolling to a new HDFS file; 0 disables time-based rolling
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
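In the config above, the file-name part of the TAILDIR filegroup value (`.*log.*`) is a regex, so the source picks up any file under /data/log/tracy whose name contains "log". A rough Python approximation of that matching (Flume itself uses Java regex, but the behavior is the same for this pattern):

```python
import re

# File-name regex from agent1.sources.source1.filegroups.f1
pattern = re.compile(r".*log.*")

def is_tailed(filename: str) -> bool:
    """Return True if the TAILDIR filegroup would match this file name."""
    return pattern.fullmatch(filename) is not None

print(is_tailed("1.log"))         # True
print(is_tailed("access.log.1"))  # True
print(is_tailed("data.txt"))      # False
```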
8. Start flume-ng
./flume-ng agent -n agent1 -c ../conf -f ../conf/log2hive.properties -Dflume.root.logger=DEBUG,console
9. Startup error
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
Flume 1.9.0 bundles guava-11.0.2.jar, which lacks the Preconditions.checkArgument(boolean, String, Object) overload that newer Hadoop 3.x client libraries call. Fix: replace lib/guava-11.0.2.jar under the Flume home with the newer Guava jar shipped with the Hadoop cluster.
10. Add a new log file under the monitored directory
11. Check the Flume log
It shows the data has been written to HDFS.
12. Check HDFS
13. Check the data in Hive
14. Append another row to 1.log
Check Hive:
Summary: this is a minimal end-to-end walkthrough, from downloading Flume to configuring log-to-Hive ingestion. It only proves the pipeline works; tuning and load testing still need to follow.