Flume Study Notes (1): Installing, Starting, and Testing Flume on CDH 5.14.2

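Note: this post is meant for readers who are using Flume on CDH for the first time and already have some familiarity with Flume. For details, see the official site:
Apache Flume™
Environment: CentOS 7.3 1708, CDH 5.14.2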

1. Adding the Flume service in CDH

See the figures:
Figures 1 to 5 [images]
Figure 6 [image]: start Flume at this step.
Figures 7 and 8 [images]

2. Testing Flume with the default configuration

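The default configuration file is a simple example: it uses netcat as the source (it listens on a TCP port and turns each received line of text into an event), memory as the channel, and logger as the sink, which writes each event to the agent's log file.
The configuration is shown below (if you are only testing the Flume installation, leave it unchanged):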

# Please paste flume.conf here. Example:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type     = netcat
tier1.sources.source1.bind     = 127.0.0.1
tier1.sources.source1.port     = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type   = memory
tier1.sinks.sink1.type         = logger
tier1.sinks.sink1.channel      = channel1

# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100

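The agent is named tier1.
Its source is source1.
Its channel is channel1.
Its sink is sink1.

The source type is netcat (text received over a network connection).
It binds to the local address 127.0.0.1.
It listens on port 9999.

source1 writes to channel1.
channel1 is a memory channel.
sink1 reads from channel1.
The type of sink1 is logger (events are written to the log).
The last line sets the capacity of channel1 to 100, which is the maximum number of events the channel can hold at once.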
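
The wiring described above can be checked mechanically. The following is a minimal sketch, not part of Flume or Cloudera Manager: `parse_properties` and `check_wiring` are hypothetical helper names, and `CONF` is a copy of the default configuration. It only verifies that every source and sink refers to a channel the agent declares.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="tier1"):
    """Return a list of problems in the source/channel/sink wiring."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    # A source may feed several channels, so its key is the plural 'channels'.
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channels")
        for ch in used:
            if ch not in channels:
                problems.append(f"source {source} uses undefined channel {ch}")
    # A sink drains exactly one channel, so its key is the singular 'channel'.
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch not in channels:
            problems.append(f"sink {sink} uses undefined channel {ch}")
    return problems


CONF = """
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
tier1.sources.source1.type = netcat
tier1.sources.source1.bind = 127.0.0.1
tier1.sources.source1.port = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type = memory
tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = channel1
tier1.channels.channel1.capacity = 100
"""

# An empty list means every source and sink points at a declared channel.
print(check_wiring(parse_properties(CONF)))
```

Note the asymmetry the check relies on: a source lists its targets under `channels` (plural), while a sink names a single `channel`.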

At this point everything is ready.

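3. Running the test

On the cdh04 machine (the one where Flume was installed and configured above), use telnet to connect to port 9999 on 127.0.0.1 (or localhost). This is the port the source is bound to in the configuration above. [If telnet is not installed, see the telnet installation section below.]
telnet localhost 9999
Once "Escape character is '^]'." appears, the connection is ready.
Send any text, for example:
HELLO------------------
and press Enter.
The session looks like this: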

telnet localhost 9999

Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HELLO------------------
OK
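
If telnet is not available, the same test can be sketched in a few lines of Python. This is only an illustration under the assumptions of the configuration above (a netcat source on 127.0.0.1:9999 that answers each line with OK, as in the transcript); `send_line` is a hypothetical helper name.

```python
import socket


def send_line(text, host="127.0.0.1", port=9999, timeout=5.0):
    """Send one line to a netcat-style listener and return its one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        reply = b""
        # Read until the listener finishes its reply line or closes the connection.
        while not reply.endswith(b"\n"):
            data = conn.recv(1024)
            if not data:
                break
            reply += data
    return reply.decode("utf-8").strip()


# Example call, to be run on the Flume host while the agent is running:
#     print(send_line("HELLO------------------"))
```

Unlike telnet, this sends a bare newline as the line terminator rather than a carriage return plus newline.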

4. Checking the events Flume collected in the log

Log location: [image]
In that directory, run tail -100 flume-cmf-flume-AGENT-cdh04.log
and look for output like the following [image]:

2018-08-16 14:21:05,100 INFO org.apache.flume.instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: channel1 started
2018-08-16 14:21:05,600 INFO org.apache.flume.node.Application: Starting Sink sink1
2018-08-16 14:21:05,600 INFO org.apache.flume.node.Application: Starting Source source1
2018-08-16 14:21:05,601 INFO org.apache.flume.source.NetcatSource: Source starting
2018-08-16 14:21:05,602 INFO org.apache.flume.source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:9999]
2018-08-16 14:21:05,603 INFO org.mortbay.log: jetty-6.1.26.cloudera.4
2018-08-16 14:21:05,604 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:41414
2018-08-16 16:03:25,948 INFO org.apache.flume.sink.LoggerSink: Event: { headers:{} body: 48 45 4C 4C 4F 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D HELLO----------- }
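
The LoggerSink line prints the event body as hex bytes followed by a printable rendering. The body shown is shorter than the line that was sent because the logger sink logs only the first 16 bytes of each body by default. As a small sketch (`decode_body` is a hypothetical helper, and the regular expression assumes the log format shown above), the hex dump can be decoded back into text:

```python
import re

LINE = (
    "2018-08-16 16:03:25,948 INFO org.apache.flume.sink.LoggerSink: "
    "Event: { headers:{} body: 48 45 4C 4C 4F 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D "
    "HELLO----------- }"
)


def decode_body(log_line):
    """Extract the hex dump after 'body:' and return it as bytes, or None."""
    # Each byte is a space, two uppercase hex digits, and a following space.
    match = re.search(r"body:((?: [0-9A-F]{2}(?= ))+)", log_line)
    if match is None:
        return None
    return bytes.fromhex(match.group(1).replace(" ", ""))


body = decode_body(LINE)
print(len(body), body.decode("ascii"))
```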

This confirms that Flume is installed correctly and ready to use.

5. Installing telnet

sudo yum -y install telnet-0.17-64.el7.x86_64

6. Collecting netcat data into HDFS with Flume

Change the Flume configuration file as follows:

# Please paste flume.conf here. Example:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources  = source1
#tier1.sources  = avro-source1
tier1.channels = channel1
tier1.sinks    = sink1

# For each source, channel, and sink, set
# standard properties.
tier1.sources.source1.type     = netcat
tier1.sources.source1.bind     = 127.0.0.1
tier1.sources.source1.port     = 9999
tier1.sources.source1.channels = channel1
tier1.channels.channel1.type   = memory



# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
#tier1.sources.avro-source1.channels = ch1
#tier1.sources.avro-source1.type = avro
#tier1.sources.avro-source1.bind = 0.0.0.0
#tier1.sources.avro-source1.port = 41414
#tier1.sources.avro-source1.threads = 5
 
#define source monitor a file
#tier1.sources.avro-source1.type = exec
#tier1.sources.avro-source1.shell = /bin/bash -c
#tier1.sources.avro-source1.command = tail -n +0 -F cdh03:/home/d2
#tier1.sources.avro-source1.channels = channel1
#tier1.sources.avro-source1.threads = 5
 



# tier1.sinks.sink1.type         = hdfs
tier1.sinks.sink1.channel      = channel1

# Describe the sink
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /flume/
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.filePrefix=test_flume
tier1.sinks.sink1.hdfs.rollCount=0
tier1.sinks.sink1.hdfs.rollInterval=0


# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.channel1.capacity = 100
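  • Tip: the line tier1.sinks.sink1.hdfs.path = /flume/ sets where the data is stored in HDFS. The path does not include the 'hdfs://' scheme because a Flume agent configured through CDH picks up the cluster's HDFS settings automatically. Adding the scheme explicitly also works.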
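
One way to picture the tip: a path with no scheme is resolved against the cluster's default filesystem (the fs.defaultFS setting in the Hadoop client configuration). The sketch below only models that idea for absolute paths; `qualify` is a hypothetical helper, and "hdfs://nameservice1" is a made-up default filesystem value, not one taken from this cluster.

```python
from urllib.parse import urlparse


def qualify(path, default_fs):
    """Return path unchanged if it has a scheme, else prefix it with default_fs.

    Simplified model: only absolute paths are handled.
    """
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# "hdfs://nameservice1" is an assumed fs.defaultFS value, for illustration only.
print(qualify("/flume/", "hdfs://nameservice1"))
print(qualify("hdfs://nameservice1/flume/", "hdfs://nameservice1"))
```

Both calls yield the same fully qualified location, which is why the path works with or without the scheme.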