A project required a real-time data monitoring system that presents business data reported by our platform at one-minute granularity. I put together the following architecture to implement it.
Business data reported by the platform is sent to Kafka as messages in real time. For example, every time the platform performs an OTA upgrade on a vehicle, it emits an "OTA request" event, followed by an "OTA complete" or "OTA failure" event. These events are published to Kafka, where Spark Streaming processes them in real time and saves the results to a time-series database. The front-end web reporting page calls a REST API provided by the back end every minute to read the database and refresh the charts.
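Throughout the pipeline, every event is a plain-text message of the form `<ISO-8601 timestamp>,<event type>`. A small helper makes that convention explicit; the function itself is my own sketch, not part of the pipeline code:

```python
def parse_event(raw):
    """Split a raw OTA event message into (timestamp, event_type).

    Messages look like '2020-03-01T12:00:00.000000,OTA complete'.
    maxsplit=1 keeps any commas inside the event type intact.
    """
    ts, event_type = raw.split(',', 1)
    return ts, event_type
```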
Setting up the Kafka messaging system
First, set up Kafka; I downloaded the kafka_2.12 build (2.12 is the Scala version in the artifact name). Following the quickstart on the official site, start ZooKeeper first with bin/zookeeper-server-start.sh config/zookeeper.properties, then start Kafka with bin/kafka-server-start.sh config/server.properties.
Next, write a producer that simulates sending OTA request events followed by OTA complete or failure events:
from kafka import KafkaProducer
import datetime
import time
import random

producer = KafkaProducer(bootstrap_servers='localhost:9092')
for i in range(100):
    # Emit an OTA request event with an ISO-8601 timestamp
    ts = datetime.datetime.now().isoformat()
    msg = ts + ',' + 'OTA request'
    producer.send('OTA', msg.encode('utf8'))
    time.sleep(0.5)
    # Roughly 80% of requests complete successfully, the rest fail
    flag = random.randint(1, 10)
    ts = datetime.datetime.now().isoformat()
    if flag > 2:
        msg = ts + ',' + 'OTA complete'
    else:
        msg = ts + ',' + 'OTA failure'
    producer.send('OTA', msg.encode('utf8'))
producer.flush()  # make sure buffered messages are delivered before exit
Then write a consumer that subscribes to the OTA topic and receives the events:
from kafka import KafkaConsumer

consumer = KafkaConsumer('OTA')
for msg in consumer:
    print(msg.value.decode('utf8'))
Run the producer and the consumer; you should see events being sent and received normally.
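One thing worth knowing about the consumer above: without a group_id, kafka-python does not commit offsets, so a restarted consumer only sees new messages. A sketch of the configuration I would use for a monitoring pipeline (the group name is my own assumption):

```python
def consumer_config(group_id='ota-monitor'):
    """Keyword arguments for kafka-python's KafkaConsumer.

    group_id enables committed offsets, so a restarted consumer resumes
    where it left off; auto_offset_reset='earliest' covers the first run,
    when no committed offset exists yet.
    """
    return {
        'bootstrap_servers': 'localhost:9092',
        'group_id': group_id,  # assumed group name
        'auto_offset_reset': 'earliest',
        'enable_auto_commit': True,
    }

# Usage (requires a running broker):
# from kafka import KafkaConsumer
# consumer = KafkaConsumer('OTA', **consumer_config())
```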
Creating the time-series database
Since the data is reported in time order, a time-series database is the most effective way to store it and query it later. TimescaleDB is an open-source time-series database that runs as a PostgreSQL extension. See the official documentation at https://docs.timescale.com/ for details; it also has some good tutorials, including one that analyzes New York City taxi trips. Once TimescaleDB is installed, we can create a database in PostgreSQL. The following creates an ota database with an ota table containing two fields, a timestamp and a service type:
create database ota;
\c ota
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE TABLE "ota" (ts TIMESTAMPTZ, serviceType TEXT);
SELECT create_hypertable('ota', 'ts');  -- turn the table into a TimescaleDB hypertable
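time_bucket is the TimescaleDB function that the reporting queries later rely on. As a minimal sketch (assuming the ota table above), a helper that builds a per-interval count query for one event type:

```python
def bucket_count_query(interval):
    """Build a SQL query counting events of one type per time bucket.

    interval goes into TimescaleDB's time_bucket(); the event type is
    bound at execute time via the %s placeholder to avoid SQL injection.
    The ota table and its ts/servicetype columns are the ones created above.
    """
    return (
        "SELECT time_bucket('{}', ts) AS bucket, count(*) AS cnt "
        "FROM ota WHERE servicetype = %s "
        "GROUP BY bucket ORDER BY bucket"
    ).format(interval)

# Usage with psycopg2:
# cursor.execute(bucket_count_query('1 minute'), ('OTA complete',))
```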
Processing real-time data with Spark Streaming
The next step is a stream-processing job that subscribes to the Kafka topic and writes the data into the time-series database in real time. I used Spark Streaming. First, define a small module that provides a database connection pool, connection_pool.py:
import psycopg2
from psycopg2.pool import SimpleConnectionPool

conn_pool = SimpleConnectionPool(1, 10, "dbname=ota user=postgres password=XXXXXX")

def getConnection():
    return conn_pool.getconn()

def putConnection(conn):
    conn_pool.putconn(conn)

def closeConnection():
    conn_pool.closeall()
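A caller of this module has to remember to hand every connection back with putConnection. A context-manager wrapper, which is my own addition rather than part of connection_pool.py, makes leaks harder:

```python
from contextlib import contextmanager

@contextmanager
def pooled_connection(pool):
    """Borrow a connection from the pool and always return it.

    Works with any object exposing getconn()/putconn(), such as
    psycopg2.pool.SimpleConnectionPool.
    """
    conn = pool.getconn()
    try:
        yield conn
    finally:
        pool.putconn(conn)

# Usage:
# with pooled_connection(conn_pool) as conn:
#     conn.cursor().execute("select 1")
```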
Then the job that processes the Kafka stream and writes it to the database, sparkstream.py:
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition
from pyspark import SparkConf, SparkContext
import connection_pool

offsetRanges = []

def start():
    sconf = SparkConf()
    sconf.set('spark.cores.max', 3)
    sc = SparkContext(appName='OTAStream', conf=sconf)
    ssc = StreamingContext(sc, 5)
    brokers = "localhost:9092"
    topic = 'OTA'
    start_offset = 0
    partition = 0
    ota_data = KafkaUtils.createDirectStream(
        ssc,
        [topic],
        kafkaParams={"metadata.broker.list": brokers},
        fromOffsets={TopicAndPartition(topic, partition): start_offset}
    )
    ota_data.foreachRDD(offset)
    # Each Kafka message is "<iso timestamp>,<service type>"
    ota_fields = ota_data.map(lambda x: x[1].split(','))
    ota_fields.foreachRDD(lambda rdd: rdd.foreachPartition(echo))
    ssc.start()
    ssc.awaitTermination(15)
    connection_pool.closeConnection()

def offset(rdd):
    global offsetRanges
    offsetRanges = rdd.offsetRanges()

def echo(recordOfPartition):
    conn = connection_pool.getConnection()
    cursor = conn.cursor()
    for record in recordOfPartition:
        # Parameterized query instead of string formatting to avoid injection
        cursor.execute("insert into ota values (%s, %s)", (record[0], record[1]))
    conn.commit()
    connection_pool.putConnection(conn)

if __name__ == '__main__':
    start()
Submit the job from the Spark directory with bin/spark-submit --jars jars/spark-streaming-kafka-0-8-assembly_2.11-2.4.5.jar --py-files ~/projects/monitor/connection_pool.py ~/projects/monitor/sparkstream.py. Start the Kafka producer at the same time, then query the ota database; you should see rows being written.
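Note that the offset() callback only keeps the latest offsetRanges in memory, and the job always restarts from offset 0. If you wanted to persist progress between runs, a small serializer is enough of a starting point (a sketch; topic, partition, and untilOffset are the attributes exposed by Spark's OffsetRange):

```python
def offsets_to_dict(offset_ranges):
    """Map each (topic, partition) to the next offset to read from.

    untilOffset is exclusive, so it is exactly where the next run should
    resume; the result can be dumped to JSON or stored in a database.
    """
    return {(r.topic, r.partition): r.untilOffset for r in offset_ranges}
```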
Defining a back-end REST API to query the database
The back end needs to serve data to the front end. I used Flask to create a REST API (saved as flaskapi.py):
from flask import make_response, Flask
from flask_cors import CORS
import psycopg2
import json

conn = psycopg2.connect("dbname=ota user=postgres password=XXXXXX")
cursor = conn.cursor()
# Bucket events into 5-second intervals and compute the completion rate.
# A full join is needed because a bucket may contain only completions or
# only failures; coalesce picks whichever bucket timestamp is non-null.
sql = "select coalesce(a.five_sec, b.five_sec) as five_sec, a.cnt as complete, b.cnt as failure, " + \
      "cast(a.cnt as float)/(a.cnt+b.cnt) as percent from " + \
      "(SELECT time_bucket('5 second', ts) AS five_sec, count(*) as cnt FROM ota " + \
      "WHERE servicetype='OTA complete' GROUP BY five_sec ORDER BY five_sec) a " + \
      "full join " + \
      "(SELECT time_bucket('5 second', ts) AS five_sec, count(*) as cnt FROM ota " + \
      "WHERE servicetype='OTA failure' GROUP BY five_sec ORDER BY five_sec) b " + \
      "on a.five_sec=b.five_sec ORDER BY 1 DESC LIMIT 10;"
app = Flask(__name__)
CORS(app, supports_credentials=True)

@app.route('/ota')
def get_ota():
    cursor.execute(sql)
    timebucket = []
    complete_cnt = []
    failure_cnt = []
    complete_rate = []
    records = cursor.fetchall()
    for record in records:
        timebucket.append(record[0].strftime('%H:%M:%S'))
        complete_cnt.append(0 if record[1] is None else record[1])
        failure_cnt.append(0 if record[2] is None else record[2])
        if record[1] is None:
            rate = 0.
        elif record[2] is None:
            rate = 100.
        else:
            rate = round(record[3], 3) * 100
        complete_rate.append(rate)
    # The query returns the newest buckets first; reverse into chronological order
    timebucket = list(reversed(timebucket))
    complete_cnt = list(reversed(complete_cnt))
    failure_cnt = list(reversed(failure_cnt))
    complete_rate = list(reversed(complete_rate))
    result = {'category': timebucket, 'complete': complete_cnt, 'failure': failure_cnt, 'rate': complete_rate}
    return make_response(json.dumps(result))
As the code shows, a time-series database makes it easy to bucket data by time: the query above aggregates in 5-second intervals, counting the OTA completions and failures in each bucket and computing the completion rate. The result is returned as JSON.
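The None checks in the loop deserve a note: because of the full join, a bucket that saw only completions has a NULL failure count, and vice versa. The rate logic, extracted into a standalone function with the same behavior as the route code above:

```python
def completion_rate(complete, failure, ratio=None):
    """Completion percentage for one bucket, mirroring the Flask route.

    complete/failure are counts from the full join and may be None when a
    bucket contains only the other event type; ratio is the precomputed
    complete/(complete+failure) column from the query.
    """
    if complete is None:
        return 0.0
    if failure is None:
        return 100.0
    return round(ratio, 3) * 100
```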
Front-end monitoring dashboard
The last part is rendering the chart in the front end. I used ECharts, an open-source JavaScript charting library originally developed at Baidu (now an Apache project), and built the front end with Node.js tooling.
Run the following commands:
mkdir react-monitor
cd react-monitor
npm init -y
npm install -D webpack webpack-cli
npm install -D babel-core babel-loader@7 babel-preset-env babel-preset-react
Edit the webpack.config.js file:
var webpack = require('webpack');
var path = require('path');
const {CleanWebpackPlugin} = require("clean-webpack-plugin");
const HtmlWebpackPlugin = require("html-webpack-plugin");

var APP_DIR = path.resolve(__dirname, 'src');
var BUILD_DIR = path.resolve(__dirname, 'dist');

var config = {
    entry: APP_DIR + '/index.jsx',
    output: {
        path: BUILD_DIR,
        filename: 'bundle.js'
    },
    module: {
        rules: [
            {
                test: /\.(js|jsx)$/,
                exclude: /node_modules/,
                use: {
                    loader: "babel-loader"
                }
            },
            {
                test: /\.css$/,
                use: ['style-loader', 'css-loader']
            }
        ]
    },
    devServer: {
        port: 3000,
        contentBase: "./dist"
    },
    plugins: [
        new HtmlWebpackPlugin({
            template: "index.html",
            inject: true,
            sourceMap: true,
            chunksSortMode: "dependency"
        }),
        new CleanWebpackPlugin()
    ]
};

module.exports = config;
Create a .babelrc file with the following configuration:
{
  "presets": ["env", "react"],
  "plugins": [[
    "transform-runtime",
    {
      "helpers": false,
      "polyfill": false,
      "regenerator": true,
      "moduleName": "babel-runtime"
    }
  ]]
}
Install the remaining NPM packages (the babel runtime packages are needed by the .babelrc above, and webpack-dev-server serves the devServer config):
npm install react react-dom -S
npm install html-webpack-plugin clean-webpack-plugin -D
npm install webpack-dev-server -D
npm install babel-plugin-transform-runtime -D
npm install babel-runtime --save
npm install axios --save
npm install echarts --save
Create an index.html page to host the chart:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width,initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>hello world</title>
</head>
<body>
    <div id="chart" style="height:500px;width:1000px"></div>
</body>
</html>
Create an index.jsx file under the src directory that wraps ECharts in a React component:
import React from 'react';
import {render} from 'react-dom';
import echarts from 'echarts';
import axios from 'axios';

let option = {
    tooltip: {
        trigger: 'axis',
        axisPointer: {
            type: 'cross',
            crossStyle: {
                color: '#999'
            }
        }
    },
    toolbox: {
        feature: {
            dataView: {show: true, readOnly: false},
            magicType: {show: true, type: ['line', 'bar']},
            restore: {show: true},
            saveAsImage: {show: true}
        }
    },
    legend: {
        data: ['OTA Complete', 'OTA Failure', 'OTA Complete Rate']
    },
    xAxis: [
        {
            type: 'category',
            data: [],
            axisPointer: {
                type: 'shadow'
            },
            axisLabel: {
                interval: 0,
                rotate: 60
            }
        }
    ],
    yAxis: [
        {
            type: 'value',
            name: 'Count',
            min: 0,
            max: 20,
            interval: 5,
            axisLabel: {
                formatter: '{value}'
            }
        },
        {
            type: 'value',
            name: 'Percent',
            min: 0,
            max: 100,
            interval: 10,
            axisLabel: {
                formatter: '{value} %'
            }
        }
    ],
    series: [
        {
            name: 'OTA Complete',
            type: 'bar',
            data: []
        },
        {
            name: 'OTA Failure',
            type: 'bar',
            data: []
        },
        {
            name: 'OTA Complete Rate',
            type: 'line',
            yAxisIndex: 1,
            data: []
        }
    ]
};
class App extends React.Component {
    constructor(props) {
        super(props);
        this.timer = null;
    }

    async fetchData(myChart) {
        // Pull the latest aggregates from the Flask API and refresh the chart
        try {
            const response = await axios.get('http://localhost:5000/ota');
            option.xAxis[0].data = response.data.category;
            option.series[0].data = response.data.complete;
            option.series[1].data = response.data.failure;
            option.series[2].data = response.data.rate;
            myChart.setOption(option, true);
        } catch (error) {
            console.log(error);
        }
    }

    componentDidMount() {
        const myChart = echarts.init(document.getElementById('chart'));
        this.fetchData(myChart);
        // Refresh every 2 seconds; keep the timer on the instance so it can be cleared
        this.timer = setInterval(() => this.fetchData(myChart), 1000 * 2);
    }

    componentWillUnmount() {
        clearInterval(this.timer);
    }

    render() {
        // ECharts manages the #chart div directly, so render nothing here
        return null;
    }
}

render(<App/>, document.getElementById('chart'));
Running it
Now we can verify the whole thing end to end, following these steps:
- Start Kafka and publish test data to the OTA topic
- Submit the Spark job to process the data in real time and write it into TimescaleDB
- Run Flask: export FLASK_APP=flaskapi.py, then flask run
- In the React project, run npm run start (this assumes a start script in package.json that launches webpack-dev-server)
- Open localhost:3000 in a browser; the chart refreshes every 2 seconds
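To sanity-check the whole chain without a browser, you can validate the shape of the /ota response; the key names below are the ones the Flask API returns:

```python
def check_ota_payload(payload):
    """Return True if the payload has the shape the chart expects:
    the four keys from the Flask API, all lists of equal length."""
    keys = ('category', 'complete', 'failure', 'rate')
    if not all(k in payload for k in keys):
        return False
    lengths = {len(payload[k]) for k in keys}
    return len(lengths) == 1

# Usage (requires the Flask API running):
# import requests
# assert check_ota_payload(requests.get('http://localhost:5000/ota').json())
```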