Flow diagram
Flume collects the Nginx logs and writes them into a Kafka queue; Storm reads the log messages from Kafka, processes them, and stores the results in HBase and MySQL.
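The Storm side of the pipeline is not covered in this article, but as a rough orientation, a minimal topology wiring might look like the sketch below. It assumes the 0.9.x-era backtype.storm API together with the storm-kafka KafkaSpout; the PrintBolt placeholder stands in for the real parsing, HBase and MySQL bolts, and all class names are illustrative only.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class NginxLogTopology {

    // Placeholder bolt: just prints each log line. In the real project this is
    // where parsing and the HBase / MySQL writes would happen.
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println(input.getString(0));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: no downstream fields
        }
    }

    public static void main(String[] args) {
        // KafkaSpout reads the "nginxlog" topic created later in this article,
        // tracking its consumer offsets under /kafka-storm in ZooKeeper
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("bigdata01.com:2181"), "nginxlog", "/kafka-storm", "nginxlog-spout");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        builder.setBolt("print", new PrintBolt(), 1).shuffleGrouping("kafka-spout");

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("nginxlog-topology", new Config(), builder.createTopology());
    }
}

For a real deployment you would submit with StormSubmitter instead of LocalCluster and split the processing into dedicated bolts for parsing, the HBase write, and the MySQL write.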
Installing Kafka
Download the installation package from the official website and extract it to the installation directory.
Download page: http://kafka.apache.org/downloads. Version: kafka_2.10-0.8.1.1.tgz
$ tar -zxvf kafka_2.10-0.8.1.1.tgz -C /work/opt/modules/
Edit the configuration file
/work/opt/modules/kafka_2.10-0.8.2.1/config/server.properties

broker.id=0
port=9092
host.name=bigdata01.com
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/work/opt/modules/kafka_2.10-0.8.2.1/log-data
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=bigdata01.com:2181
zookeeper.connection.timeout.ms=6000
Start the broker
Before starting, make sure ZooKeeper is running. Start the broker with the following command:
$ nohup bin/kafka-server-start.sh config/server.properties > logs/server-start.log 2>&1 &
Check that the process is running:
$ ps -ef | grep kafka
Check that port 9092 is listening:
$ netstat -tlnup | grep 9092
Create a topic
Once Kafka is up and running, run the following command from the Kafka installation directory:
$ bin/kafka-topics.sh --create --topic nginxlog --partitions 1 --replication-factor 1 --zookeeper bigdata01.com:2181
View the topic details:
$ bin/kafka-topics.sh --describe --topic nginxlog --zookeeper bigdata01.com:2181
Start a console producer and send messages to the Kafka topic:
$ bin/kafka-console-producer.sh --broker-list bigdata01.com:9092 --topic nginxlog
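If you would rather send test messages from code instead of the console producer, a minimal sketch against the old 0.8.x producer API (kafka.javaapi.producer.Producer) could look like this; the class name NginxLogProducer is made up for illustration.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class NginxLogProducer {
    public static void main(String[] args) {
        // Same broker and topic as the console producer above
        Properties props = new Properties();
        props.put("metadata.broker.list", "bigdata01.com:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("nginxlog", "test message from the java producer"));
        producer.close();
    }
}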
Start a console consumer and read messages from the Kafka topic:
$ bin/kafka-console-consumer.sh --zookeeper bigdata01.com:2181 --topic nginxlog --from-beginning
Simulate generating the Nginx log file
Create a working directory on the server:
mkdir -p /home/beifeng/project_workspace
Upload the data-generate-1.0-SNAPSHOT-jar-with-dependencies.jar file to the working directory you just created.
Download address
Run the command:
java -jar data-generate-1.0-SNAPSHOT-jar-with-dependencies.jar 100 >> nginx.log
Watch the log being generated with tail -f nginx.log
To stop generating logs, first use jps to find the process PID, then kill it.
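The article does not spell out the format of the generated lines. Assuming they follow the standard Nginx "combined" log format, a small parsing sketch that the later Storm bolts could reuse might look like this (the sample line is invented):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NginxLogParser {
    // Assumes lines in the standard Nginx "combined" format:
    // ip - - [time] "method uri protocol" status bytes "referer" "user-agent"
    private static final Pattern LOG_PATTERN = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"");

    public static void main(String[] args) {
        // Invented sample line, only for demonstrating the regex
        String line = "192.168.1.10 - - [10/Oct/2015:13:55:36 +0800] "
                + "\"GET /index.html HTTP/1.1\" 200 612 \"-\" \"Mozilla/5.0\"";
        Matcher m = LOG_PATTERN.matcher(line);
        if (m.find()) {
            System.out.println("ip="      + m.group(1));
            System.out.println("time="    + m.group(2));
            System.out.println("request=" + m.group(3));
            System.out.println("status="  + m.group(4));
            System.out.println("bytes="   + m.group(5));
        }
    }
}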
Configure Flume
Create the Flume agent configuration file flume-kafka-storm.properties
The content is as follows:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = kafka_sink

# define sources
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /home/beifeng/project_workspace/nginx.log

# define channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 100

# define kafka sink
a1.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafka_sink.topic = nginxlog
a1.sinks.kafka_sink.brokerList = bigdata01.com:9092
a1.sinks.kafka_sink.requiredAcks = 1
a1.sinks.kafka_sink.batchSize = 20

# Bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.kafka_sink.channel = c1
Start the Flume agent
$ bin/flume-ng agent -n a1 -c conf/ --conf-file conf/flume-kafka-storm.properties -Dflume.root.logger=INFO,console
Start the Kafka console consumer to check whether log messages are arriving.