A Full-Stack Implementation of Real-Time Data Visualization Monitoring (Kafka + Spark + TimescaleDB + Flask + Node.js)

A project required a real-time data monitoring system that presents the business data reported by our platform at a one-minute granularity. I built the following architecture to implement it.

The platform's business data is sent to Kafka as messages in real time. For example, every time the platform performs an OTA upgrade on a vehicle, it emits an OTA request event followed by an OTA complete or OTA failure event. These events go to Kafka, Spark Streaming processes them in real time and saves the results to a time-series database, and the front-end web reporting platform calls a REST API exposed by the backend every minute to read the database and refresh its reports.
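Throughout this post the events are plain comma-separated strings of the form "<ISO-8601 timestamp>,<service type>"; this is the convention used by the producer and the Spark job below. A minimal illustrative sketch of that contract (parse_event is a hypothetical helper, not part of the project code):

# Event payload convention assumed throughout this post:
#   "<ISO-8601 timestamp>,<service type>"
# e.g. "2020-05-01T12:00:00.123456,OTA complete"
def parse_event(raw: bytes):
    """Split a raw Kafka message into (timestamp, service_type)."""
    ts, service_type = raw.decode('utf8').split(',', 1)
    return ts, service_type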

Setting up the Kafka messaging system

First, set up Kafka. I downloaded a Kafka binary built for Scala 2.12 (the 2.12 in the package name is the Scala version). Following the official quick start, start ZooKeeper first: bin/zookeeper-server-start.sh config/zookeeper.properties, then start the Kafka broker: bin/kafka-server-start.sh config/server.properties
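If automatic topic creation is disabled on your broker, you may also need to create the OTA topic up front, for example bin/kafka-topics.sh --create --topic OTA --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092 (older Kafka releases take --zookeeper localhost:2181 instead of --bootstrap-server).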

Write a producer program that simulates OTA request events and the corresponding OTA complete or failure events, as in the following code:

from kafka import KafkaProducer
import datetime
import time
import random

producer = KafkaProducer(bootstrap_servers='localhost:9092')
for i in range(100):
    # emit an OTA request event with the current timestamp
    ts = datetime.datetime.now().isoformat()
    msg = ts + ',' + 'OTA request'
    producer.send('OTA', msg.encode('utf8'))
    time.sleep(0.5)
    # roughly 80% of requests complete, the rest fail
    flag = random.randint(1, 10)
    ts = datetime.datetime.now().isoformat()
    if flag > 2:
        msg = ts + ',' + 'OTA complete'
    else:
        msg = ts + ',' + 'OTA failure'
    producer.send('OTA', msg.encode('utf8'))
producer.flush()  # make sure buffered messages are delivered before exiting

Write a consumer program that subscribes to the OTA topic and receives the events:

from kafka import KafkaConsumer
 
consumer = KafkaConsumer('OTA')
for msg in consumer:
    print((msg.value).decode('utf8'))

Run the producer and the consumer, and you can see events being sent and received normally.

Creating the time-series database

Because the data is reported in time order, a time-series database is the most effective way to store it and query it later. TimescaleDB is an open-source time-series database that runs as a PostgreSQL extension. The documentation at https://docs.timescale.com/ explains how to use it and includes some good tutorials, such as one analyzing New York taxi trips. Once TimescaleDB is installed, we can create a database in PostgreSQL. The following creates an ota database with an ota table that has two fields, a timestamp and a service type:

create database ota;
\c ota
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
CREATE TABLE "ota" (ts TIMESTAMPTZ, serviceType TEXT);
-- turn the plain table into a TimescaleDB hypertable partitioned on the ts column
SELECT create_hypertable('ota', 'ts');

Processing real-time data with Spark Streaming

Next we need a stream-processing engine to subscribe to the Kafka data and write it into the time-series database in real time. I use Spark Streaming here. First, define a small module that provides a database connection pool, connection_pool.py:

import psycopg2
from psycopg2.pool import SimpleConnectionPool

conn_pool = SimpleConnectionPool(1,10,"dbname=ota user=postgres password=XXXXXX")

def getConnection():
    return conn_pool.getconn()

def putConnection(conn):
    conn_pool.putconn(conn)

def closeConnection():
    conn_pool.closeall()

Then sparkstream.py, which processes the Kafka data in real time and writes it into the database:

from kafka import KafkaProducer
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition
from pyspark import SparkConf, SparkContext
import connection_pool
offsetRanges = [] 

def start():  
    sconf=SparkConf()  
    sconf.set('spark.cores.max',3)  
    sc=SparkContext(appName='OTAStream',conf=sconf)  
    ssc=StreamingContext(sc,5)  
    brokers ="localhost:9092"  
    topic='OTA'  
    start = 0  
    partition = 0  
    ota_data = KafkaUtils.createDirectStream(
        ssc,
        [topic],
        kafkaParams={"metadata.broker.list":brokers},
        fromOffsets={TopicAndPartition(topic,partition):start}
    )  
    ota_data.foreachRDD(offset)
    ota_fields = ota_data.map(lambda x:x[1].split(','))
    ota_fields.foreachRDD(lambda rdd: rdd.foreachPartition(echo))
    ssc.start()
    # For this demo, wait up to 15 seconds before releasing the connection pool and exiting;
    # use ssc.awaitTermination() with no timeout to keep the job running indefinitely.
    ssc.awaitTermination(15)
    connection_pool.closeConnection()
 
def offset(rdd):  
    global offsetRanges  
    offsetRanges = rdd.offsetRanges()  

def echo(recordOfPartition):
    conn = connection_pool.getConnection()
    cursor = conn.cursor()
    for record in recordOfPartition:
        # use a parameterized query so the values are escaped by the driver
        cursor.execute("insert into ota values (%s, %s)", (record[0], record[1]))
    conn.commit()
    cursor.close()
    connection_pool.putConnection(conn)

if __name__ == '__main__':  
    start()

Run the following command from the Spark directory to submit the job: bin/spark-submit --jars jars/spark-streaming-kafka-0-8-assembly_2.11-2.4.5.jar --py-files ~/projects/monitor/connection_pool.py ~/projects/monitor/sparkstream.py. Start the Kafka producer program at the same time; if you then query the ota database, you will see the corresponding rows appearing.
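As a quick sanity check, you can also query the table directly with psycopg2 (a minimal sketch, assuming the same credentials as in connection_pool.py):

import psycopg2

# assumes the same dbname/user/password as connection_pool.py
conn = psycopg2.connect("dbname=ota user=postgres password=XXXXXX")
cur = conn.cursor()
# the unquoted serviceType column is stored by Postgres in lower case as servicetype
cur.execute("SELECT ts, servicetype FROM ota ORDER BY ts DESC LIMIT 5;")
for row in cur.fetchall():
    print(row)
cur.close()
conn.close()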

Defining a backend REST API to query the database

The backend needs to serve this data to the front end. I use Flask to build a REST API (saved as flaskapi.py), as follows:

from flask import make_response, Flask
from flask_cors import CORS
import psycopg2
import json
import math
conn = psycopg2.connect("dbname=ota user=postgres password=123456")
cursor = conn.cursor()
sql = "select a.five_sec, a.cnt as complete, b.cnt as failure, cast(a.cnt as float)/(a.cnt+b.cnt) as percent from " + \
    "(SELECT time_bucket('5 second', time) AS five_sec, count(*) as cnt FROM ota " + \
    "WHERE servicetype='OTA complete' GROUP BY five_sec ORDER BY five_sec) a " + \
    "full join " + \
    "(SELECT time_bucket('5 second', time) AS five_sec, count(*) as cnt FROM ota " + \
    "WHERE servicetype='OTA failure' GROUP BY five_sec ORDER BY five_sec) b " + \
    "on a.five_sec=b.five_sec ORDER BY a.five_sec DESC LIMIT 10;"
app = Flask(__name__)
CORS(app, supports_credentials=True)
@app.route('/ota')
def get_ota():
    cursor.execute(sql)
    timebucket = []
    complete_cnt = []
    failure_cnt = []
    complete_rate = []
    records = cursor.fetchall()
    for record in records:
        # record = (time_bucket, complete_count, failure_count, complete_ratio)
        timebucket.append(record[0].strftime('%H:%M:%S'))
        complete_cnt.append(0 if record[1] is None else record[1])
        failure_cnt.append(0 if record[2] is None else record[2])
        if record[1] is None:
            rate = 0.
        elif record[2] is None:
            rate = 100.
        else:
            rate = round(record[3], 3) * 100
        complete_rate.append(rate)
    timebucket = list(reversed(timebucket))
    complete_cnt = list(reversed(complete_cnt))
    failure_cnt = list(reversed(failure_cnt))
    complete_rate = list(reversed(complete_rate))
    result = {'category':timebucket,'complete':complete_cnt,'failure':failure_cnt,'rate':complete_rate}
    response = make_response(json.dumps(result))
    return response

As the code above shows, a time-series database makes it very easy to bucket the data by time in a query. Here I aggregate in 5-second intervals, counting the completed and failed OTA jobs in each bucket and computing the completion rate, and return the result as JSON.
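To see the shape of the response, you can call the endpoint with the requests package (a minimal sketch; the key names come from the result dict built above):

import requests

# each key maps to a list with one entry per 5-second bucket, oldest bucket first
data = requests.get('http://localhost:5000/ota').json()
print(data['category'])   # bucket labels such as '12:00:05'
print(data['complete'])   # completed OTA count per bucket
print(data['failure'])    # failed OTA count per bucket
print(data['rate'])       # completion rate per bucket, in percent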

Front-end monitoring dashboard

The last piece is the front-end dashboard. I use ECharts, an open-source JavaScript charting library from Baidu, and build the front end with Node.js tooling.

Enter the following commands on the command line:

mkdir react-monitor
cd react-monitor
npm init -y
npm i -D webpack webpack-cli
npm i -D babel-core babel-loader@7 babel-preset-env babel-preset-react

Edit the webpack.config.js file:

var webpack = require('webpack');
var path = require('path');
const {CleanWebpackPlugin} = require("clean-webpack-plugin");
var APP_DIR = path.resolve(__dirname, 'src');
var BUILD_DIR = path.resolve(__dirname, 'dist');
const HtmlWebpackPlugin = require("html-webpack-plugin");
var config = {
    entry:APP_DIR+'/index.jsx',
    output:{
        path:BUILD_DIR,
        filename:'bundle.js'
    },
    module:{
        rules:[
            {
                test:/\.(js|jsx)$/,
                exclude:/node_modules/,
                use:{
                    loader:"babel-loader"
                }
            },
            {
                test:/\.css$/,
                use:['style-loader','css-loader']
            }
        ]
    },
    devServer:{
        port:3000,
        contentBase:"./dist"
    },
    plugins:[
        new HtmlWebpackPlugin({
            template: "index.html",
            inject: true
        }),
        new CleanWebpackPlugin()
    ]
};
module.exports = config;

Create a .babelrc file with the following configuration:

{
    "presets": ["env","react"],
    "plugins": [[
        "transform-runtime",
        {
          "helpers": false,
          "polyfill": false,
          "regenerator": true,
          "moduleName": "babel-runtime"
        }
    ]]
}

Install the NPM packages by running the following commands:

npm install react react-dom -S
npm install html-webpack-plugin clean-webpack-plugin webpack-dev-server -D
npm install babel-plugin-transform-runtime -D
npm install babel-runtime --save
npm install axios --save
npm install echarts --save
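Two notes on the packages above: babel-plugin-transform-runtime and babel-runtime back the transform-runtime plugin configured in .babelrc, and webpack-dev-server backs the devServer block in webpack.config.js. For npm run start (used in the last section) to work, package.json also needs a start script, for example "start": "webpack-dev-server --mode development --open"; the exact script is my assumption, and anything that launches webpack-dev-server will do.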

Create an index.html page to hold the chart:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width,initial-scale=1.0">
        <meta http-equiv="X-UA-Compatible" content="ie=edge">
        <title>hello world</title>
    </head>
    <body>
        <div id="chart" style="height:500px;width:1000px"></div>
    </body>
</html>

Create an index.jsx file in the src directory, with a React component that wraps ECharts:

import React from 'react';
import {render} from 'react-dom';
import * as echarts from 'echarts';
import axios from 'axios';

let option = {
    tooltip: {
        trigger: 'axis',
        axisPointer: {
            type: 'cross',
            crossStyle: {
                color: '#999'
            }
        }
    },
    toolbox: {
        feature: {
            dataView: {show: true, readOnly: false},
            magicType: {show: true, type: ['line', 'bar']},
            restore: {show: true},
            saveAsImage: {show: true}
        }
    },
    legend: {
        data: ['OTA Complete', 'OTA Failure', 'OTA Complete Rate']
    },
    xAxis: [
        {
            type: 'category',
            data: [],
            axisPointer: {
                type: 'shadow'
            },
            axisLabel: {
                interval: 0,
                rotate: 60
            }
        }
    ],
    yAxis: [
        {
            type: 'value',
            name: 'Count',
            min: 0,
            max: 20,
            interval: 5,
            axisLabel: {
                formatter: '{value}'
            }
        },
        {
            type: 'value',
            name: 'Percent',
            min: 0,
            max: 100,
            interval: 10,
            axisLabel: {
                formatter: '{value} %'
            }
        }
    ],
    series: [
        {
            name: 'OTA Complete',
            type: 'bar',
            data: []
        },
        {
            name: 'OTA Failure',
            type: 'bar',
            data: []
        },
        {
            name: 'OTA Complete Rate',
            type: 'line',
            yAxisIndex: 1,
            data: []
        }
    ]
};

class App extends React.Component{
    constructor(props){
        super(props);
        this.state={
            category:[],
            series_data_1:[],
            series_data_2:[],
            series_data_3:[]
        };
    }
    async componentDidMount () {
        let myChart = echarts.init(document.getElementById('chart'));
        await axios.get('http://localhost:5000/ota')
        .then(function (response){
            option.xAxis[0].data = response.data.category;
            option.series[0].data = response.data.complete;
            option.series[1].data = response.data.failure;
            option.series[2].data = response.data.rate;
        })
        .catch(function (error) {
            console.log(error);
        });
        myChart.setOption(option,true)
        this.timer = setInterval(async ()=>{
            await axios.get('http://localhost:5000/ota')
            .then(function (response){
                option.xAxis[0].data = response.data.category;
                option.series[0].data = response.data.complete;
                option.series[1].data = response.data.failure;
                option.series[2].data = response.data.rate;
            })
            .catch(function (error) {
                console.log(error);
            });
            myChart.setOption(option,true);
        }, 1000*2)
    }
    async getData(){
        return await axios.get('http://localhost:5000/ota');
    }
    render(){
        return 'abc'
    }
    componentWillUnmount() {
        clearInterval(this.timer);
    }
}
render(<App/>, document.getElementById('chart'));

Results

Now we can see it all in action by following these steps:

  1. Start Kafka and publish test data to the OTA topic
  2. Submit the Spark job to process the data in real time and write it into TimescaleDB
  3. Run Flask: export FLASK_APP=flaskapi.py, then flask run
  4. Run npm run start in the React project
  5. Open localhost:3000 in a browser, and the report refreshes every 2 seconds, as in the screenshot below

(Screenshot: the live OTA monitoring dashboard)