Flume Study Notes --- Flume Setup and Configuration

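1. Setting up an agent
Flume agent configuration is stored in a local file. This is a text file that follows the Java properties file format. Configurations for one or more agents can be specified in the same file.
The configuration file lists the properties of each source, sink and channel in an agent and describes how they are wired together to form data flows.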

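2. Configuring individual components
Each component in a flow (source, sink or channel) has a name, a type, and a set of properties that are specific to that type and instance.
For example, an Avro source needs a hostname (or IP address) and a port number on which to receive data.
A memory channel can have a maximum queue size ("capacity").
An HDFS sink needs to know the file system URI, the path under which to create files, and the file rotation frequency ("hdfs.rollInterval").
All such component properties must be set in the properties file of the hosting Flume agent.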
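
The three components mentioned above could be configured roughly as follows. This is only a sketch: the agent and component names (agent1, avro-src, mem-ch, hdfs-sink) as well as the port, path and numeric values are placeholders chosen for illustration, not settings taken from a real deployment.

```properties
# Hypothetical names: agent1, avro-src, mem-ch, hdfs-sink.
# Host, port, path and sizes below are placeholder values.
agent1.sources = avro-src
agent1.channels = mem-ch
agent1.sinks = hdfs-sink

# Avro source: address and port to listen on
agent1.sources.avro-src.type = avro
agent1.sources.avro-src.bind = 0.0.0.0
agent1.sources.avro-src.port = 10000
agent1.sources.avro-src.channels = mem-ch

# Memory channel: maximum number of events held in the queue
agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 1000

# HDFS sink: target path and file roll interval in seconds
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/webdata
agent1.sinks.hdfs-sink.hdfs.rollInterval = 30
agent1.sinks.hdfs-sink.channel = mem-ch
```

Every property key starts with the agent name, followed by the component category (sources, channels or sinks) and the component name, so several components of the same type can coexist in one file.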

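3. Wiring the pieces together
The agent needs to know which components to load and how they are connected in order to constitute the flow. This is done by listing the names of each source, sink and channel in the agent, and then specifying the connecting channel(s) for each sink and source.
Here is a real-world example (tier1 is the agent name; the source and channel settings follow):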

tier1.sources = source1

tier1.channels = kafka-mobile-channel kafka-schedule-channel kafka-nginx-channel kafka-bindcarderr-channel kafka-weuser-channel kafka-rrdweb-channel kafka-p2pweb-channel

tier1.sources.source1.type = avro

tier1.sources.source1.bind = 0.0.0.0

tier1.sources.source1.port = 44444

tier1.sources.source1.channels = kafka-mobile-channel kafka-schedule-channel kafka-nginx-channel kafka-bindcarderr-channel kafka-weuser-channel kafka-rrdweb-channel kafka-p2pweb-channel

tier1.sources.source1.selector.type = multiplexing

tier1.sources.source1.selector.header = topic

tier1.sources.source1.selector.mapping.mobile = kafka-mobile-channel

tier1.sources.source1.selector.mapping.schedule = kafka-schedule-channel

tier1.sources.source1.selector.mapping.nginx = kafka-nginx-channel

tier1.sources.source1.selector.mapping.bindcarderr = kafka-bindcarderr-channel

tier1.sources.source1.selector.mapping.we-user = kafka-weuser-channel

tier1.sources.source1.selector.mapping.rrd-web = kafka-rrdweb-channel

tier1.sources.source1.selector.mapping.p2p-web = kafka-p2pweb-channel

tier1.channels.kafka-mobile-channel.type = org.apache.flume.channel.kafka.KafkaChannel

tier1.channels.kafka-mobile-channel.parseAsFlumeEvent = false

tier1.channels.kafka-mobile-channel.kafka.topic = tomcat-mobile

tier1.channels.kafka-mobile-channel.kafka.consumer.group.id = flume-tomcat-mobile

tier1.channels.kafka-mobile-channel.kafka.consumer.auto.offset.reset = earliest

tier1.channels.kafka-mobile-channel.kafka.bootstrap.servers = xxxx3.data.com:9092,xxx2.data.com:9092,xxx1.data.com:9092
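
The multiplexing selector in the configuration above routes each event by the value of its topic header. One thing that configuration does not show is a fallback for header values that match none of the mappings. A minimal sketch of how one might be added, assuming (purely for illustration) that kafka-nginx-channel should act as the fallback:

```properties
# Assumption: kafka-nginx-channel is picked as the fallback only for illustration.
tier1.sources.source1.selector.type = multiplexing
tier1.sources.source1.selector.header = topic
tier1.sources.source1.selector.mapping.nginx = kafka-nginx-channel
# Events whose "topic" header matches no mapping go to the default channel
tier1.sources.source1.selector.default = kafka-nginx-channel
```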

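4. Starting an agent
An agent is started with the flume-ng shell script, located in the bin directory of the Flume distribution. You need to specify the agent name, the configuration directory and the configuration file on the command line: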

$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

The agent then starts running the sources and sinks configured in the given properties file.

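5. A simple example
Here is an example configuration file describing a single-node Flume deployment. This configuration lets a user generate events and then logs them to the console.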

# example.conf: A single-node Flume configuration

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

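This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console.
The configuration file names the various components, then describes their types and configuration parameters. A given configuration file might define several named agents; when a Flume process is launched, a flag is passed telling it which named agent to run.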
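
To illustrate several named agents sharing one file, the sketch below places a hypothetical second agent, a2, next to a1. The agent a2 and its port 44445 are assumptions made for this example only.

```properties
# Agent a1: netcat source -> memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Agent a2 (hypothetical): same layout, listening on a different port.
# Component names are scoped by the agent prefix, so r1/c1/k1 can be reused.
a2.sources = r1
a2.channels = c1
a2.sinks = k1
a2.sources.r1.type = netcat
a2.sources.r1.bind = localhost
a2.sources.r1.port = 44445
a2.sources.r1.channels = c1
a2.channels.c1.type = memory
a2.sinks.k1.type = logger
a2.sinks.k1.channel = c1
```

A process launched with --name a1 would load only the a1.* properties; the a2.* lines are used only by a process started with --name a2.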

With the example.conf file above, Flume can be started as follows:

$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

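Note that in a full deployment we would typically include one more option: --conf=<conf-dir>. The <conf-dir> directory would contain a shell script flume-env.sh and a log4j properties file. In this example, we pass a Java option to force Flume to log to the console, and we do not define a custom environment script.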

From a separate terminal, we can then send Flume an event:

$ telnet localhost 44444

Trying 127.0.0.1...

Connected to localhost.localdomain (127.0.0.1).

Escape character is '^]'.

Hello world! <ENTER>

OK

The original Flume terminal will output the event in a log message:

12/06/19 15:32:19 INFO source.NetcatSource: Source starting

12/06/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]

12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D Hello world!. }

6. Using environment variables in configuration files

Flume can substitute environment variables in the configuration. For example:

a1.sources = r1

a1.sources.r1.type = netcat

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = ${NC_PORT}

a1.sources.r1.channels = c1

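Note: substitution currently works only for values, not for keys (that is, only on the right-hand side of the = sign).
This feature is enabled through a Java system property when the agent is launched, by setting

propertiesImplementation = org.apache.flume.node.EnvVarResolverProperties

For example: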

$ NC_PORT=44444 bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console -DpropertiesImplementation=org.apache.flume.node.EnvVarResolverProperties
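
As a further sketch of the values-only rule, the fragment below uses two variables on the right-hand side and shows, in a comment, the kind of key substitution that is not resolved. NC_BIND and SOME_KEY are hypothetical variable names introduced here for illustration.

```properties
# Sketch only: NC_BIND and SOME_KEY are hypothetical environment variables.
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
# Supported: variables on the right-hand side of "=" are replaced
a1.sources.r1.bind = ${NC_BIND}
a1.sources.r1.port = ${NC_PORT}
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
# Not supported: a variable inside a key is not substituted, so a line like
#   a1.sources.r1.${SOME_KEY} = value
# would not produce the intended property name.
```

Both variables would need to be present in the environment of the flume-ng process, in the same way NC_PORT is set on the command line above.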

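7. Logging raw data
In many production environments, logging the raw stream of data flowing through the channels is not desired behavior, because it may leak sensitive data or security-related configuration (such as secret keys) into the Flume log files. By default, Flume does not log such information.
On the other hand, if the data pipeline is broken, Flume will attempt to provide clues for debugging the problem.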

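One way to debug pipeline problems is to set up an additional memory channel connected to a Logger Sink, which outputs all event data to the Flume logs. In some situations, however, this approach is insufficient.
To enable logging of event- and configuration-related data, some Java system properties must be set in addition to the log4j properties.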

To enable configuration-related logging, set the Java system property -Dorg.apache.flume.log.printconfig=true. This can be passed on the command line or set in the JAVA_OPTS variable in flume-env.sh.

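To enable data logging, set the Java system property -Dorg.apache.flume.log.rawdata=true in the same way as described above. For most components, the log4j logging level must also be set to DEBUG or TRACE for event-specific logging to appear in the Flume logs.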

Here is an example that enables both configuration logging and raw data logging, while also setting the log4j log level to DEBUG for console output:

$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
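
The Logger Sink approach mentioned earlier in this section can be sketched as follows. The debug channel c2 and debug sink k2 are hypothetical names, and the file_roll sink k1 with its directory is only a stand-in for whatever sink the real pipeline uses.

```properties
# Sketch: c2 and k2 form a hypothetical debug branch; k1 stands in for the real sink.
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# The default (replicating) selector copies every event to both channels
a1.sources.r1.channels = c1 c2

a1.channels.c1.type = memory
a1.channels.c2.type = memory

# Main branch: placeholder sink writing events to local files
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /var/log/flume
a1.sinks.k1.channel = c1

# Debug branch: logger sink printing events to the Flume log
a1.sinks.k2.type = logger
# Raise the number of body bytes logged per event (the default is small)
a1.sinks.k2.maxBytesToLog = 64
a1.sinks.k2.channel = c2
```

Because the debug branch prints event bodies, it carries the same risk of leaking sensitive data described above and is probably best removed once the problem is diagnosed.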

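8. ZooKeeper-based configuration
Flume supports agent configuration via ZooKeeper. This is an experimental feature. The configuration file needs to be uploaded to ZooKeeper under a configurable prefix, and it is stored in the ZooKeeper node data. The following is how the ZooKeeper node tree looks for agents a1 and a2: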

- /flume

|- /a1 [Agent config file]

|- /a2 [Agent config file]

Once the configuration file has been uploaded, start the agent with the following options:

$ bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console

Argument Name	Default	Description
z	–	Zookeeper connection string. Comma separated list of hostname:port
p	/flume	Base Path in Zookeeper to store Agent configurations
