Flume Study Notes --- Flume Setup and Configuration

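1. Adding an Agent
A Flume agent's configuration is stored in a local file: a text file that follows the Java properties file format. A single configuration file can define one or more agents.
The file holds the properties of each source, sink, and channel in an agent, and describes how they are wired together to form data flows.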

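2. Configuring Individual Components
Each component in a flow (source, sink, or channel) has a name, a type, and a set of properties specific to that type and instance.
For example, an Avro source needs a hostname (or IP address) and a port number to receive data on.
A memory channel can have a maximum queue size ("capacity").
An HDFS sink needs to know the file system URI, the path under which to create files, and how often to roll files ("hdfs.rollInterval").
All such properties of a component must be set in the properties file of the hosting Flume agent.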
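
As a rough illustration of how these per-component properties are laid out, the Python sketch below parses a small properties snippet and groups the settings by component. The agent name a1, the component names r1/c1/k1, the port, and the host namenode.example.com are placeholders made up for this example, and the parser is a simplification, not Flume's own loader.

```python
from collections import defaultdict

# Hypothetical snippet: names, port, host, and path are placeholders.
SAMPLE = """
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com/flume/events
a1.sinks.k1.hdfs.rollInterval = 30
"""

def group_by_component(text):
    """Return {(kind, name): {property: value}} for component-level keys."""
    components = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        parts = key.split(".", 3)  # agent, kind, name, property
        if len(parts) == 4:
            _agent, kind, name, prop = parts
            components[(kind, name)][prop] = value
    return dict(components)

components = group_by_component(SAMPLE)
for (kind, name), props in components.items():
    print(kind, name, props)
```

Every key follows the pattern agent.kind.name.property, which is why splitting on the first three dots is enough to recover the component a setting belongs to.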

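3. Wiring the Components Together
The agent needs to know which components to load and how they are connected to form a flow. This is done by listing the names of every source, sink, and channel in the agent, and then specifying the connecting channel(s) for each source and sink.
Here is a real-world example (tier1 is the agent name, followed by the source and channel configuration; only the first channel's settings are shown):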

tier1.sources = source1

tier1.channels = kafka-mobile-channel kafka-schedule-channel kafka-nginx-channel kafka-bindcarderr-channel kafka-weuser-channel kafka-rrdweb-channel kafka-p2pweb-channel

tier1.sources.source1.type = avro

tier1.sources.source1.bind = 0.0.0.0

tier1.sources.source1.port = 44444

tier1.sources.source1.channels = kafka-mobile-channel kafka-schedule-channel kafka-nginx-channel kafka-bindcarderr-channel kafka-weuser-channel kafka-rrdweb-channel kafka-p2pweb-channel

tier1.sources.source1.selector.type = multiplexing

tier1.sources.source1.selector.header = topic

tier1.sources.source1.selector.mapping.mobile = kafka-mobile-channel

tier1.sources.source1.selector.mapping.schedule = kafka-schedule-channel

tier1.sources.source1.selector.mapping.nginx = kafka-nginx-channel

tier1.sources.source1.selector.mapping.bindcarderr = kafka-bindcarderr-channel

tier1.sources.source1.selector.mapping.we-user = kafka-weuser-channel

tier1.sources.source1.selector.mapping.rrd-web = kafka-rrdweb-channel

tier1.sources.source1.selector.mapping.p2p-web = kafka-p2pweb-channel

tier1.channels.kafka-mobile-channel.type = org.apache.flume.channel.kafka.KafkaChannel

tier1.channels.kafka-mobile-channel.parseAsFlumeEvent = false

tier1.channels.kafka-mobile-channel.kafka.topic = tomcat-mobile

tier1.channels.kafka-mobile-channel.kafka.consumer.group.id = flume-tomcat-mobile

tier1.channels.kafka-mobile-channel.kafka.consumer.auto.offset.reset = earliest

tier1.channels.kafka-mobile-channel.kafka.bootstrap.servers = xxxx3.data.com:9092,xxx2.data.com:9092,xxx1.data.com:9092
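
A mistyped channel name is an easy mistake in a long list like the one above. The following Python sketch shows one way to check the wiring before starting an agent: every channel referenced by a source or sink should appear in the agent's channel list. The function names and the kafka-typo-channel entry are made up for this example, and the check covers only the wiring rules described in this section.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props

def check_wiring(props, agent):
    """Return a list of problems found in the agent's channel wiring."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channels")
        for channel in used:
            if channel not in declared:
                problems.append(f"source {source} uses undeclared channel {channel}")
    for sink in props.get(f"{agent}.sinks", "").split():
        # A sink reads from exactly one channel (note the singular key).
        channel = props.get(f"{agent}.sinks.{sink}.channel", "")
        if channel not in declared:
            problems.append(f"sink {sink} uses undeclared channel {channel!r}")
    return problems

# Shortened version of the tier1 example, with one deliberate mistake.
CONFIG = """
tier1.sources = source1
tier1.channels = kafka-mobile-channel kafka-nginx-channel
tier1.sources.source1.channels = kafka-mobile-channel kafka-nginx-channel kafka-typo-channel
"""

problems = check_wiring(parse_properties(CONFIG), "tier1")
print(problems)
```

An empty list means every referenced channel was declared; here the deliberately misspelled channel is reported.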

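4. Starting an Agent
An agent is started with the flume-ng shell script, located in the bin directory of the Flume distribution. You need to specify the agent name, the config directory, and the config file on the command line: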

$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

The agent will then run the sources and sinks configured in the given properties file.

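5. A Simple Example
Here is an example configuration file describing a single-node Flume deployment. This configuration lets a user generate events, which are then logged to the console.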

# example.conf: A single-node Flume configuration

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

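This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console.
The configuration file names the various components, then describes their types and configuration parameters. A given configuration file may define several named agents; when a Flume process is launched, a flag is passed telling it which named agent to run.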

Given this configuration file, we can start Flume as follows:

$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

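Note that in a full deployment we would typically include one more option: --conf=<conf-dir>. The <conf-dir> directory would contain a shell script flume-env.sh and potentially a log4j properties file. In this example, we pass a Java option to force Flume to log to the console, and we go without a custom environment script.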

From a separate terminal, we can then telnet to port 44444 and send Flume an event:

$ telnet localhost 44444

Trying 127.0.0.1...

Connected to localhost.localdomain (127.0.0.1).

Escape character is '^]'.

Hello world! <ENTER>

OK

The original Flume terminal will output the event in a log message:

12/06/19 15:32:19 INFO source.NetcatSource: Source starting

12/06/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]

12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D Hello world!. }
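
The logger sink line shows the event body as hex bytes followed by a text rendering. Decoding the bytes from the sample line above suggests where the trailing dot comes from: the last byte, 0D, is a carriage return (telnet ends each line with CR LF), which the log appears to render as "." because it is not printable. A small Python check:

```python
# Hex bytes copied from the LoggerSink line above.
hex_body = "48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D"

body = bytes.fromhex(hex_body)
text = body.decode("ascii")

print(repr(text))          # 'Hello world!\r'
print(len(body), "bytes")  # 13 bytes
```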

6. Using Environment Variables in Configuration Files

Flume can substitute environment variables in the configuration. For example:

a1.sources = r1

a1.sources.r1.type = netcat

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = ${NC_PORT}

a1.sources.r1.channels = c1

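Note: substitution currently works for values only, not for keys.
It is enabled through a Java system property passed when the agent is invoked, by setting

propertiesImplementation = org.apache.flume.node.EnvVarResolverProperties

For example: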

$ NC_PORT=44444 bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console -DpropertiesImplementation=org.apache.flume.node.EnvVarResolverProperties
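
To make the "values only" rule concrete, here is a minimal Python sketch of this kind of substitution. It only illustrates the behaviour described above and is not Flume's implementation; the function name resolve_values and the handling of unset variables are assumptions of this example.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def resolve_values(props, env):
    """Replace ${NAME} in values using env; keys are left untouched."""
    def lookup(match):
        # Assumption of this sketch: an unset variable becomes an empty string.
        return env.get(match.group(1), "")
    return {key: PLACEHOLDER.sub(lookup, value) for key, value in props.items()}

props = {
    "a1.sources.r1.port": "${NC_PORT}",
    "a1.sources.${NAME}.bind": "0.0.0.0",  # placeholder in a key: not replaced
}
resolved = resolve_values(props, {"NC_PORT": "44444", "NAME": "r1"})
print(resolved)
```

In a real script, os.environ could be passed as env. The key containing ${NAME} comes back unchanged, which mirrors the note that keys are not substituted.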

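7. Logging Raw Data
In many production environments, logging the raw stream of data flowing through the ingest pipeline is not desired, because it may leak sensitive data or security-related configuration (such as secret keys) into the Flume log files. By default, Flume does not log such information.
On the other hand, if the data pipeline is broken, Flume will attempt to provide clues for debugging the problem.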

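One way to debug pipeline problems is to set up an additional memory channel connected to a Logger Sink, which outputs all event data to the Flume logs. In some situations, however, this approach is insufficient.
To enable logging of event- and configuration-related data, some Java system properties must be set in addition to the log4j properties.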

To enable configuration-related logging, set the Java system property -Dorg.apache.flume.log.printconfig=true. This can be passed on the command line or set in the JAVA_OPTS variable in flume-env.sh.

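To enable data logging, set the Java system property -Dorg.apache.flume.log.rawdata=true in the same way. For most components, the log4j logging level must also be set to DEBUG or TRACE for event-specific logging to appear in the Flume logs.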

Here is an example that enables both configuration logging and raw data logging, while also setting the Log4j log level to DEBUG for console output:

$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
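
Because these debug flags should stay off in production, it can help to build the command line from explicit switches rather than editing it by hand. The Python sketch below assembles the same argument list as the command above; the helper name flume_command and its parameters are made up for this example, and it uses only flags that appear in these notes.

```python
def flume_command(agent, conf_file, conf_dir="conf", level="INFO",
                  print_config=False, raw_data=False):
    """Build the flume-ng argument list using only the flags shown above."""
    args = [
        "bin/flume-ng", "agent",
        "--conf", conf_dir,
        "--conf-file", conf_file,
        "--name", agent,
        f"-Dflume.root.logger={level},console",
    ]
    if print_config:
        args.append("-Dorg.apache.flume.log.printconfig=true")
    if raw_data:
        args.append("-Dorg.apache.flume.log.rawdata=true")
    return args

cmd = flume_command("a1", "example.conf", level="DEBUG",
                    print_config=True, raw_data=True)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run from the Flume installation directory; with the defaults, neither debug property is added.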

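8. Zookeeper-Based Configuration
Flume supports agent configuration via Zookeeper. This is an experimental feature. The configuration file needs to be uploaded to Zookeeper under a configurable prefix, and it is stored in the Zookeeper node data. Here is how the Zookeeper node tree would look for agents a1 and a2: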

- /flume

|- /a1 [Agent config file]

|- /a2 [Agent config file]

Once the configuration file is uploaded, start the agent with the following options:

$ bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console

Argument Name	Default	Description
z	–	Zookeeper connection string. Comma separated list of hostname:port
p	/flume	Base Path in Zookeeper to store Agent configurations
