Distributed Big Data Platform Setup - Flume Deployment

Section I: File List

1. apache-flume-1.8.0-bin.tar.gz

Section II: Download Link

[Flume download link]: http://flume.apache.org/releases/index.html
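
Alternatively, the tarball can be fetched from the command line. A minimal sketch, assuming the 1.8.0 release is still hosted on the Apache archive at this path:

[root@BlogMaster ~]# wget https://archive.apache.org/dist/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz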

Section III: Deploying the Telnet Communication Tool and Flume

Overview of the cluster:

Node Role   Master            Slave1            Slave2
IP          192.168.137.128   192.168.137.129   192.168.137.130
HostName    BlogMaster        BlogSlave1        BlogSlave2
Hadoop      BlogMaster-YES    BlogSlave1-YES    BlogSlave2-YES
Telnet      BlogMaster-YES    BlogSlave1-YES    BlogSlave2-YES
Flume       BlogMaster-YES    BlogSlave1-NO     BlogSlave2-NO

Step 1: Install the telnet communication tool on every cluster node

Telnet must be installed on the BlogMaster, BlogSlave1, and BlogSlave2 nodes. The installation commands are as follows:
On the BlogMaster node:

[root@BlogMaster conf]# yum install telnet

On the BlogSlave1 node:

[root@BlogSlave1 ~]# yum install telnet

On the BlogSlave2 node:

[root@BlogSlave2 ~]# yum install telnet
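
To confirm the installation succeeded on each node, one quick check (assuming an RPM-based system, as the yum commands above imply) is:

[root@BlogMaster ~]# rpm -q telnet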

Step 2: Flume deployment

The following operations are performed only on the master node, BlogMaster.

  • Step 2.1: Extract the Flume installation package to the target directory

Specifically, the target directory is /opt/cluster, the root directory of the Hadoop cluster. The extraction command is:

[root@BlogMaster ~]# tar -zxvf apache-flume-1.8.0-bin.tar.gz -C /opt/cluster/
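
To verify the extraction, list the target directory; the new Flume folder should appear there. A quick check:

[root@BlogMaster ~]# ls /opt/cluster/ | grep flume
apache-flume-1.8.0-bin
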
  • Step 2.2: Configure the flume-env.sh environment variables (located in /opt/cluster/apache-flume-1.8.0-bin/conf)

Note that this directory normally contains only a flume-env.sh.template file. Copy it with the cp command and rename it flume-env.sh (a sketch of this step follows the snippet below). Then open the new file and set JAVA_HOME, as follows:

# Enviroment variables can be set here.
export JAVA_HOME=/opt/cluster/jdk1.8.0_181
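
The copy-and-rename step mentioned above, run from the conf directory, might look like:

[root@BlogMaster conf]# cp flume-env.sh.template flume-env.sh
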
  • Step 2.3: Configure the Flume log directory option in log4j.properties (located in /opt/cluster/apache-flume-1.8.0-bin/conf)

Open the file and set the option that controls where Flume's runtime logs are stored:

flume.log.dir=/opt/cluster/apache-flume-1.8.0-bin/logs

Afterwards, be sure to create a folder named "logs" in the Flume installation directory:

[root@BlogMaster apache-flume-1.8.0-bin]# mkdir logs
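
To double-check both changes, grep the log directory option and confirm the logs folder exists; a quick sketch:

[root@BlogMaster apache-flume-1.8.0-bin]# grep '^flume.log.dir' conf/log4j.properties
flume.log.dir=/opt/cluster/apache-flume-1.8.0-bin/logs
[root@BlogMaster apache-flume-1.8.0-bin]# ls -d logs
logs
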
  • Step 2.4: Configure the Hadoop jar files needed for HDFS interaction

To give Flume the ability to exchange the data it monitors with the Hadoop cluster's HDFS, the Hadoop jar files that Flume needs for HDFS interaction must be copied into the lib subdirectory of the Flume installation directory. The jar files in question are:

  1. commons-configuration-1.6.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
  2. hadoop-auth-2.8.4.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
  3. hadoop-common-2.8.4.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/common)
  4. hadoop-hdfs-2.8.4.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/hdfs)
  5. commons-io-2.4.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
  6. htrace-core4-4.0.1-incubating.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)

Proceed as follows:
First step: Change into the /opt/cluster/hadoop-2.8.4/share/hadoop/common directory and run:

[root@BlogMaster common]# cp hadoop-common-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib

Second step: Change into the /opt/cluster/hadoop-2.8.4/share/hadoop/hdfs directory and run:

[root@BlogMaster hdfs]# cp hadoop-hdfs-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib

Third step: Change into the /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib directory and run:

[root@BlogMaster lib]# cp commons-configuration-1.6.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp hadoop-auth-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp commons-io-2.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp htrace-core4-4.0.1-incubating.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
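
To confirm that all six jars landed in Flume's lib directory, a quick check might be:

[root@BlogMaster lib]# ls /opt/cluster/apache-flume-1.8.0-bin/lib | egrep 'hadoop-(auth|common|hdfs)|commons-configuration|commons-io-2.4|htrace'
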
Monitoring Tests

In the Flume installation directory, create a directory named "job" dedicated to Flume monitoring tasks, to keep them easy to manage (the startup commands below reference it as job/).
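
Creating the directory might look like:

[root@BlogMaster apache-flume-1.8.0-bin]# mkdir job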

Step 1: Monitoring data via the console

In the job directory, create a file named "netcat-flume-logger.conf" with the touch command, then open it and add the following content:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = BlogMaster
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Then carry out the following three steps:
First step: On the master node BlogMaster, run the command that starts the Flume agent

[root@BlogMaster apache-flume-1.8.0-bin]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/netcat-flume-logger.conf -Dflume.root.logger=INFO,console

Here --conf points at the configuration directory, --name selects the agent a1 defined in the file, and -Dflume.root.logger=INFO,console sends the logger sink's output to the console. If output like the following appears, Flume has entered monitoring mode:

Info: Sourcing environment configuration script /opt/cluster/apache-flume-1.8.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/cluster/hadoop-2.8.4/bin/hadoop) for HDFS access
Info: Including Hive libraries found via (/opt/cluster/apache-hive-1.2.2-bin) for Hive access
+ exec /opt/cluster/jdk1.8.0_181/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/opt/cluster/apache-flume-1.8.0-bin/conf:/opt/cluster/apache-flume-1.8.0-bin/lib/*:/opt/cluster/hadoop-2.8.4/etc/hadoop:/opt/cluster/hadoop-2.8.4/share/hadoop/common/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/common/*:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs/*:/opt/cluster/hadoop-2.8.4/share/hadoop/yarn/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/yarn/*:/opt/cluster/hadoop-2.8.4/share/hadoop/mapreduce/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/mapreduce/*:/opt/cluster/hadoop-2.8.4/contrib/capacity-scheduler/*.jar:/opt/cluster/apache-hive-1.2.2-bin/lib/*' -Djava.library.path=:/opt/cluster/hadoop-2.8.4/lib/native org.apache.flume.node.Application --name a1 --conf-file job/netcat-flume-logger.conf
SLF4J: Class path contains multiple SLF4J bindings.

Second step: On any node, run the "producer" command that generates data

[root@BlogSlave2 ~]# telnet BlogMaster 44444

If output like the following appears, the producer side is ready:

[root@BlogSlave2 ~]# telnet BlogMaster 44444
Trying 192.168.137.128...
Connected to BlogMaster.
Escape character is '^]'.

Third step: Compare the data typed on the producer side with the data echoed in BlogMaster's shell
Type data on the producer side and check whether the BlogMaster shell reports the same data:
Producer side:

[root@BlogSlave2 ~]# telnet BlogMaster 44444
Trying 192.168.137.128...
Connected to BlogMaster.
Escape character is '^]'.
sda
OK
 I love Xiaoxiong
OK

BlogMaster shell side:

2019-11-15 09:45:35,605 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 73 64 61 0D                                     sda. }
2019-11-15 09:46:21,682 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 20 49 20 6C 6F 76 65 20 58 69 61 6F 78 69 6F 6E  I love Xiaoxion }

The two match (the logger sink prints only the first 16 bytes of each event body by default, which is why the longer line appears truncated), showing that local monitoring and communication are configured correctly.

Step 2: Monitoring data via HDFS

Before performing the following operations, start the Hadoop cluster and the YARN service.
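
A minimal sketch of starting these services, assuming the standard Hadoop sbin scripts under the install path used throughout this setup:

[root@BlogMaster ~]# /opt/cluster/hadoop-2.8.4/sbin/start-dfs.sh
[root@BlogMaster ~]# /opt/cluster/hadoop-2.8.4/sbin/start-yarn.sh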

Next, in the job directory, create a file named "flume-file-hdfs.conf" with the touch command, then open it and add the following content:

## define agent
a2.sources = r2
a2.channels = c2
a2.sinks = k2

## define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/cluster/apache-hive-1.2.2-bin/logs/hive.log
a2.sources.r2.shell = /bin/bash -c
a2.sources.r2.batchSize = 800000

## define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000000
a2.channels.c2.transactionCapacity = 100000

## define sinks
a2.sinks.k2.type = hdfs
## put the collected log files under the corresponding directory
a2.sinks.k2.hdfs.path = hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/%Y%m%d/%S/
## file type
a2.sinks.k2.hdfs.fileType = DataStream
## file write format
a2.sinks.k2.hdfs.writeFormat = Text
a2.sinks.k2.hdfs.batchSize = 100000
a2.sinks.k2.hdfs.rollInterval = 0
## roll a new HDFS file once it reaches this size (bytes)
a2.sinks.k2.hdfs.rollSize = 102400000
a2.sinks.k2.hdfs.rollCount = 10000
## use the local timestamp for the %Y%m%d/%S escapes
a2.sinks.k2.hdfs.useLocalTimeStamp = true

## log file prefix
a2.sinks.k2.hdfs.filePrefix = events-
a2.sinks.k2.hdfs.round = true
a2.sinks.k2.hdfs.roundValue = 10
a2.sinks.k2.hdfs.roundUnit = second

## bind the sources and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

After that, carry out the following three steps:
First step: Start the Flume agent and begin monitoring

[root@BlogMaster apache-flume-1.8.0-bin]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf -Dflume.root.logger=INFO,console

Second step: Run a Hive operation to generate log entries

hive (flume_test)> select * from student;
OK
student.id	student.name
1	stu1

Third step: Check whether the data Flume monitors is stored in the specified HDFS directory
The shell in which the Flume job was started displays log messages like the following:

2019-11-15 10:29:57,724 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/20191115/50//events-.1573784997429.tmp
2019-11-15 10:30:00,276 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2019-11-15 10:30:00,330 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/20191115/00//events-.1573785000277.tmp
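
Besides the web UI mentioned next, the result can also be verified from the command line; a quick sketch:

[root@BlogMaster ~]# /opt/cluster/hadoop-2.8.4/bin/hdfs dfs -ls -R /flume/hive_hdfs_via_flume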

Open http://192.168.137.128:50070/explorer.html#/ in a browser to view the data stored under the flume folder on HDFS:
[Figure: HDFS web UI showing the contents of the /flume directory]
