Building and Testing a Local Environment for a Big Data Analytics Architecture

Purpose: collect and analyze application log files

Flow: 1. The application writes log files locally.

      2. Flume monitors the files and collects new log entries into Kafka.

      3. Spark Structured Streaming subscribes to Kafka, analyzes the resulting structured stream, and writes the results to a DB (the record layout is sketched after this list).

      4. A web page queries the DB and displays the results.
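For concreteness, here is a minimal Scala sketch of the record this pipeline carries; the field names and types are taken from the test JSON generated in step 5 of the test procedure below, and the class name is illustrative.

// The log record flowing through the pipeline, matching the test JSON:
// {"userId":"0003","userName":"testUser","userAge":43}
case class UserLog(userId: String, userName: String, userAge: Int)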

Environment setup: 1. Flume (apache-flume-1.9.0-bin)

                   (1) Download and extract the tarball.

                   (2) Edit the configuration file (using a spooldir source, a memory channel, and a Kafka sink):

# define agent
testAgent.sources = testSource
testAgent.channels = testChannel
testAgent.sinks = testSink

# define source
testAgent.sources.testSource.type = spooldir
testAgent.sources.testSource.spoolDir = /bigData/flumeTest
testAgent.sources.testSource.fileHeader = true

# Alternative source: TAILDIR tails specific files instead of spooling a directory
#testAgent.sources.testSource.type = TAILDIR
#testAgent.sources.testSource.positionFile = /bigData/flumeTest/taildir_position.json
#testAgent.sources.testSource.filegroups = f1
#testAgent.sources.testSource.filegroups.f1 = /bigData/flumeTest/hello.txt
#testAgent.sources.testSource.headers.f1.headerKey1 = value1
#testAgent.sources.testSource.fileHeader = true
#testAgent.sources.testSource.maxBatchCount = 1000

# define sink
# Alternative sinks for local debugging: logger or file_roll
#testAgent.sinks.testSink.type = logger
#testAgent.sinks.testSink.type = file_roll
#testAgent.sinks.testSink.sink.directory = /bigData/sinkTest

testAgent.sinks.testSink.type = org.apache.flume.sink.kafka.KafkaSink
testAgent.sinks.testSink.kafka.topic = test
testAgent.sinks.testSink.kafka.bootstrap.servers = 127.0.0.1:9092
testAgent.sinks.testSink.kafka.flumeBatchSize = 20
testAgent.sinks.testSink.kafka.producer.acks = 1
testAgent.sinks.testSink.kafka.producer.linger.ms = 1
testAgent.sinks.testSink.kafka.producer.compression.type = snappy


# define channel
testAgent.channels.testChannel.type = memory
testAgent.channels.testChannel.capacity = 1000
testAgent.channels.testChannel.transactionCapacity = 100

# bind the source and sink to the channel
testAgent.sources.testSource.channels = testChannel
testAgent.sinks.testSink.channel = testChannel

                2. Install ZooKeeper (zookeeper-3.4.5) and Kafka (kafka_2.12-2.2.0)

                   (1) Download the archives and set the environment variables. (A sketch for pre-creating the sink's "test" topic follows this list.)

                3. Install Hadoop (hadoop-2.7.7) and Spark (spark-2.4.3-bin-hadoop2.7)

                   (1) Download the archives and set the environment variables.
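Before starting the pipeline, the "test" topic that the Flume Kafka sink writes to should exist (with default broker settings Kafka may also auto-create it on first write). Below is a minimal Scala sketch using Kafka's AdminClient, assuming the broker address from the Flume config above; the single partition and replication factor of 1 are illustrative choices for a local test.

import java.util.{Collections, Properties}

import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

// Pre-create the "test" topic used by the Flume Kafka sink.
object CreateTestTopic {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "127.0.0.1:9092") // broker from the Flume config
    val admin = AdminClient.create(props)
    try {
      // one partition, replication factor 1 -- enough for a single local broker
      val topic = new NewTopic("test", 1, 1.toShort)
      admin.createTopics(Collections.singletonList(topic)).all().get()
    } finally {
      admin.close()
    }
  }
}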

Test procedure: 1. Start ZooKeeper:

                      zkServer

                2. Start Kafka:

                      .\bin\windows\kafka-server-start.bat .\config\server.properties

                3. Start the Spark listener program (a sketch follows this list).

                4. Start Flume:

                      bin\flume-ng.cmd agent -n testAgent -c conf -f conf\flume-conf.properties.template -property "flume.root.logger=INFO,console"

                5. Generate a file in the directory Flume is monitoring (the spooldir source expects files to be complete when they appear; after ingestion it renames them with a .COMPLETED suffix):

                      echo {"userId":"0003","userName":"testUser","userAge":43} >> test.json
