Building and Testing a Local Environment for a Big Data Analytics Architecture

Purpose: collect and analyze application log files

Flow: 1. The application writes log files locally.

      2. Flume monitors the files and collects new log entries into Kafka.

      3. Spark Structured Streaming subscribes to Kafka, analyzes the resulting structured stream, and writes the results to a DB (the record layout is sketched after this list).

      4. A web page queries the DB and displays the results.
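For concreteness, here is a minimal Scala sketch of the record this pipeline carries; the field names and types are taken from the test JSON generated in step 5 of the test procedure below, and the class name is illustrative.

// The log record flowing through the pipeline, matching the test JSON:
// {"userId":"0003","userName":"testUser","userAge":43}
case class UserLog(userId: String, userName: String, userAge: Int)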

Environment setup: 1. Flume (apache-flume-1.9.0-bin)

                   (1) Download and extract the tarball.

                   (2) Edit the configuration file (using a spooldir source, a memory channel, and a Kafka sink):

# define agent
testAgent.sources = testSource
testAgent.channels = testChannel
testAgent.sinks = testSink

# define source
testAgent.sources.testSource.type = spooldir
testAgent.sources.testSource.spoolDir = /bigData/flumeTest
testAgent.sources.testSource.fileHeader = true

# Alternative source: TAILDIR tails specific files instead of spooling a directory
#testAgent.sources.testSource.type = TAILDIR
#testAgent.sources.testSource.positionFile = /bigData/flumeTest/taildir_position.json
#testAgent.sources.testSource.filegroups = f1
#testAgent.sources.testSource.filegroups.f1 = /bigData/flumeTest/hello.txt
#testAgent.sources.testSource.headers.f1.headerKey1 = value1
#testAgent.sources.testSource.fileHeader = true
#testAgent.sources.testSource.maxBatchCount = 1000

# define sink
# Alternative sinks for local debugging: logger or file_roll
#testAgent.sinks.testSink.type = logger
#testAgent.sinks.testSink.type = file_roll
#testAgent.sinks.testSink.sink.directory = /bigData/sinkTest

testAgent.sinks.testSink.type = org.apache.flume.sink.kafka.KafkaSink
testAgent.sinks.testSink.kafka.topic = test
testAgent.sinks.testSink.kafka.bootstrap.servers = 127.0.0.1:9092
testAgent.sinks.testSink.kafka.flumeBatchSize = 20
testAgent.sinks.testSink.kafka.producer.acks = 1
testAgent.sinks.testSink.kafka.producer.linger.ms = 1
testAgent.sinks.testSink.kafka.producer.compression.type = snappy


# define channel
testAgent.channels.testChannel.type = memory
testAgent.channels.testChannel.capacity = 1000
testAgent.channels.testChannel.transactionCapacity = 100

# bind the source and sink to the channel
testAgent.sources.testSource.channels = testChannel
testAgent.sinks.testSink.channel = testChannel

                2. Install ZooKeeper (zookeeper-3.4.5) and Kafka (kafka_2.12-2.2.0)

                   (1) Download the archives and set the environment variables. (A sketch for pre-creating the sink's "test" topic follows this list.)

                3. Install Hadoop (hadoop-2.7.7) and Spark (spark-2.4.3-bin-hadoop2.7)

                   (1) Download the archives and set the environment variables.
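Before starting the pipeline, the "test" topic that the Flume Kafka sink writes to should exist (with default broker settings Kafka may also auto-create it on first write). Below is a minimal Scala sketch using Kafka's AdminClient, assuming the broker address from the Flume config above; the single partition and replication factor of 1 are illustrative choices for a local test.

import java.util.{Collections, Properties}

import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

// Pre-create the "test" topic used by the Flume Kafka sink.
object CreateTestTopic {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "127.0.0.1:9092") // broker from the Flume config
    val admin = AdminClient.create(props)
    try {
      // one partition, replication factor 1 -- enough for a single local broker
      val topic = new NewTopic("test", 1, 1.toShort)
      admin.createTopics(Collections.singletonList(topic)).all().get()
    } finally {
      admin.close()
    }
  }
}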

Test procedure: 1. Start ZooKeeper:

                      zkServer

                2. Start Kafka:

                      .\bin\windows\kafka-server-start.bat .\config\server.properties

                3. Start the Spark listener program (a sketch follows this list).

                4. Start Flume:

                      bin\flume-ng.cmd agent -n testAgent -c conf -f conf\flume-conf.properties.template -property "flume.root.logger=INFO,console"

                5. Generate a file in the directory Flume is monitoring (the spooldir source expects files to be complete when they appear; after ingestion it renames them with a .COMPLETED suffix):

                      echo {"userId":"0003","userName":"testUser","userAge":43} >> test.json
