Flow diagram
Flume collects the Nginx logs and writes them into a Kafka queue; Storm reads the log messages from Kafka, processes them, and stores the results in HBase and MySQL.
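The Storm side of the pipeline is not covered in this article, but as a rough orientation, a minimal topology wiring might look like the sketch below. It assumes the 0.9.x-era backtype.storm API together with the storm-kafka KafkaSpout; the PrintBolt placeholder stands in for the real parsing, HBase and MySQL bolts, and all class names are illustrative only.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class NginxLogTopology {

    // Placeholder bolt: just prints each log line. In the real project this is
    // where parsing and the HBase / MySQL writes would happen.
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println(input.getString(0));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: no downstream fields
        }
    }

    public static void main(String[] args) {
        // KafkaSpout reads the "nginxlog" topic created later in this article,
        // tracking its consumer offsets under /kafka-storm in ZooKeeper
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("bigdata01.com:2181"), "nginxlog", "/kafka-storm", "nginxlog-spout");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        builder.setBolt("print", new PrintBolt(), 1).shuffleGrouping("kafka-spout");

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("nginxlog-topology", new Config(), builder.createTopology());
    }
}

For a real deployment you would submit with StormSubmitter instead of LocalCluster and split the processing into dedicated bolts for parsing, the HBase write, and the MySQL write.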
Installing Kafka
Download the installation package from the official website and extract it to the installation directory.
Download page: http://kafka.apache.org/downloads. Version: kafka_2.10-0.8.1.1.tgz
$ tar -zxvf kafka_2.10-0.8.1.1.tgz -C /work/opt/modules/
Edit the configuration file
/work/opt/modules/kafka_2.10-0.8.2.1/config/server.properties

broker.id=0
port=9092
host.name=bigdata01.com
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/work/opt/modules/kafka_2.10-0.8.2.1/log-data
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=bigdata01.com:2181
zookeeper.connection.timeout.ms=6000
Start the broker
Before starting, make sure ZooKeeper is running. Start the broker with the following command:
$ nohup bin/kafka-server-start.sh config/server.properties > logs/server-start.log 2>&1 &
Check that the process is running:
$ ps -ef | grep kafka
Check that port 9092 is listening:
$ netstat -tlnup | grep 9092
Create a topic
Once Kafka is up and running, run the following command from the Kafka installation directory:
$ bin/kafka-topics.sh --create --topic nginxlog --partitions 1 --replication-factor 1 --zookeeper bigdata01.com:2181
View the topic details:
$ bin/kafka-topics.sh --describe --topic nginxlog --zookeeper bigdata01.com:2181
Start a console producer and send messages to the Kafka topic:
$ bin/kafka-console-producer.sh --broker-list bigdata01.com:9092 --topic nginxlog
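If you would rather send test messages from code instead of the console producer, a minimal sketch against the old 0.8.x producer API (kafka.javaapi.producer.Producer) could look like this; the class name NginxLogProducer is made up for illustration.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class NginxLogProducer {
    public static void main(String[] args) {
        // Same broker and topic as the console producer above
        Properties props = new Properties();
        props.put("metadata.broker.list", "bigdata01.com:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("nginxlog", "test message from the java producer"));
        producer.close();
    }
}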
Start a console consumer and read messages from the Kafka topic:
$ bin/kafka-console-consumer.sh --zookeeper bigdata01.com:2181 --topic nginxlog --from-beginning
Simulate generating the Nginx log file
Create a working directory on the server:
mkdir -p /home/beifeng/project_workspace
Upload the data-generate-1.0-SNAPSHOT-jar-with-dependencies.jar file to the working directory you just created.
Download address
Run the command:
java -jar data-generate-1.0-SNAPSHOT-jar-with-dependencies.jar 100 >> nginx.log
Watch the log being generated with tail -f nginx.log
To stop generating logs, first use jps to find the process PID, then kill it.
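The article does not spell out the format of the generated lines. Assuming they follow the standard Nginx "combined" log format, a small parsing sketch that the later Storm bolts could reuse might look like this (the sample line is invented):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NginxLogParser {
    // Assumes lines in the standard Nginx "combined" format:
    // ip - - [time] "method uri protocol" status bytes "referer" "user-agent"
    private static final Pattern LOG_PATTERN = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"");

    public static void main(String[] args) {
        // Invented sample line, only for demonstrating the regex
        String line = "192.168.1.10 - - [10/Oct/2015:13:55:36 +0800] "
                + "\"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0\"";
        Matcher m = LOG_PATTERN.matcher(line);
        if (m.find()) {
            System.out.println("ip="      + m.group(1));
            System.out.println("time="    + m.group(2));
            System.out.println("request=" + m.group(3));
            System.out.println("status="  + m.group(4));
            System.out.println("bytes="   + m.group(5));
        }
    }
}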
Configure Flume
Create the Flume agent configuration file flume-kafka-storm.properties
The content is as follows:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = kafka_sink

# define sources
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /home/beifeng/project_workspace/nginx.log

# define channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 100

# define kafka sink
a1.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafka_sink.topic = nginxlog
a1.sinks.kafka_sink.brokerList = bigdata01.com:9092
a1.sinks.kafka_sink.requiredAcks = 1
a1.sinks.kafka_sink.batchSize = 20

# Bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.kafka_sink.channel = c1
Start the Flume agent
$ bin/flume-ng agent -n a1 -c conf/ --conf-file conf/flume-kafka-storm.properties -Dflume.root.logger=INFO,console
Start the Kafka console consumer to check whether log messages are arriving.