Apache Druid: Loading Data from Kafka -- A Full Walkthrough

Table of Contents

I. Kafka: Create the Topic and a Producer

II. Producing Data into Kafka

III. Configuring the DataSource in Apache Druid

1) Start

2) Connect

3) Parse Data

4) Parse Time

5) Transform [can be skipped]

6) Filter [can be skipped]

7) Configure Schema [key configuration]

8) Partition

9) Tune

10) Publish

11) Edit JSON spec

IV. Query Examples


I. Kafka: Create the Topic and a Producer

1. Create the topic

kafka-topics.sh --create --zookeeper node-01:2181,node-02:2181,node-03:2181 --replication-factor 1 --partitions 1 --topic fast_sales

2. Create a producer

kafka-console-producer.sh --broker-list node-01:9092,node-02:9092,node-03:9092 --topic fast_sales

3. Create a consumer

kafka-console-consumer.sh --bootstrap-server node-01:9092,node-02:9092,node-03:9092 --topic fast_sales --group topic_test1_g1

II. Producing Data into Kafka

In the console producer started above, send the following JSON records, one per line:

{"timestamp":"2020-08-08T01:03.00z","category":"手機","areaName":"北京","monye":"1450"}
{"timestamp":"2020-08-08T01:03.00z","category":"手機","areaName":"北京","monye":"1450"}
{"timestamp":"2020-08-08T01:03.00z","category":"家電","areaName":"北京","monye":"1550"}

{"timestamp":"2020-08-08T01:03.00z","category":"手機","areaName":"深圳","monye":"1000"}
{"timestamp":"2020-08-08T01:03.01z","category":"手機","areaName":"深圳","monye":"2000"}
{"timestamp":"2020-08-08T01:04.01z","category":"手機","areaName":"深圳","monye":"2200"}

III. Configuring the DataSource in Apache Druid

In the Druid web console, the data loader walks through the following steps; the complete spec produced at step 11) is shown below.

1) Start

2) Connect

3) Parse Data

4) Parse Time

5) Transform [can be skipped]

6) Filter [can be skipped]

7) Configure Schema [key configuration]

8) Partition

9) Tune

10) Publish

Max parse exceptions: 2147483647 (the maximum value, so that unparseable records are skipped instead of failing the ingestion task)

11) Edit JSON spec

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "fast_sales",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "iso"
        },
        "dimensionsSpec": {
          "dimensions": [
            "areaName",
            "category"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "type": "count",
        "name": "count"
      },
      {
        "type": "longSum",
        "name": "sum_monye",
        "fieldName": "monye",
        "expression": null
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "MINUTE",
      "rollup": true,
      "intervals": null
    },
    "transformSpec": {
      "filter": null,
      "transforms": []
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 1000000,
    "maxBytesInMemory": 0,
    "maxRowsPerSegment": 5000000,
    "maxTotalRows": null,
    "intermediatePersistPeriod": "PT10M",
    "basePersistDirectory": "/usr/local/imply-3.0.4/var/tmp/1609509057384-0",
    "maxPendingPersists": 0,
    "indexSpec": {
      "bitmap": {
        "type": "concise"
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4",
      "longEncoding": "longs"
    },
    "buildV9Directly": true,
    "reportParseExceptions": false,
    "handoffConditionTimeout": 0,
    "resetOffsetAutomatically": true,
    "segmentWriteOutMediumFactory": null,
    "workerThreads": null,
    "chatThreads": null,
    "chatRetries": 8,
    "httpTimeout": "PT10S",
    "shutdownTimeout": "PT80S",
    "offsetFetchPeriod": "PT30S",
    "intermediateHandoffPeriod": "P2147483647D",
    "logParseExceptions": true,
    "maxParseExceptions": 2147483647,
    "maxSavedParseExceptions": 0,
    "skipSequenceNumberAvailabilityCheck": false
  },
  "ioConfig": {
    "topic": "fast_sales",
    "replicas": 1,
    "taskCount": 1,
    "taskDuration": "PT3600S",
    "consumerProperties": {
      "bootstrap.servers": "node-01:9092,node-02:9092,node-03:9092"
    },
    "pollTimeout": 100,
    "startDelay": "PT5S",
    "period": "PT30S",
    "useEarliestOffset": false,
    "completionTimeout": "PT1800S",
    "lateMessageRejectionPeriod": null,
    "earlyMessageRejectionPeriod": null,
    "stream": "fast_sales",
    "useEarliestSequenceNumber": false,
    "type": "kafka"
  },
  "context": null,
  "suspended": false
}

IV. Query Examples

1) The datasource: fast_sales

2) Recall the records that were produced into Kafka:

{"timestamp":"2020-08-08T01:03.00z","category":"手機","areaName":"北京","monye":"1450"}
{"timestamp":"2020-08-08T01:03.00z","category":"手機","areaName":"北京","monye":"1450"}
{"timestamp":"2020-08-08T01:03.00z","category":"家電","areaName":"北京","monye":"1550"}

{"timestamp":"2020-08-08T01:03.00z","category":"手機","areaName":"深圳","monye":"1000"}
{"timestamp":"2020-08-08T01:03.01z","category":"手機","areaName":"深圳","monye":"2000"}
{"timestamp":"2020-08-08T01:04.01z","category":"手機","areaName":"深圳","monye":"2200"}

-- Query all data
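
A minimal Druid SQL sketch against the fast_sales datasource configured above (assuming Druid SQL is enabled, e.g. via the console query view or the /druid/v2/sql endpoint):

SELECT *
FROM "fast_sales"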

-- Query data within a time range
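
A sketch that filters on __time; the interval below is only an example window chosen to cover the sample records:

SELECT *
FROM "fast_sales"
WHERE "__time" >= TIMESTAMP '2020-08-08 01:03:00'
  AND "__time" <  TIMESTAMP '2020-08-08 01:05:00'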

-- Count the total number of ingested records
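
Because rollup is enabled, the number of input records is the sum of the count metric rather than COUNT(*) over the stored rows; a sketch:

SELECT SUM("count") AS total_records
FROM "fast_sales"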

-- Total sales amount grouped by area and product category
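
A sketch grouping by both dimensions and summing the sum_monye metric defined in the spec:

SELECT "areaName", "category", SUM("sum_monye") AS total_money
FROM "fast_sales"
GROUP BY "areaName", "category"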

-- Total sales amount grouped by area
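
A sketch grouping by area only:

SELECT "areaName", SUM("sum_monye") AS total_money
FROM "fast_sales"
GROUP BY "areaName"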

-- Total sales amount grouped by product category
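
A sketch grouping by product category only:

SELECT "category", SUM("sum_monye") AS total_money
FROM "fast_sales"
GROUP BY "category"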

-- Filter by time range first, then group by area and product category to compute the total sales amount
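
A sketch combining the time-range filter with the grouping; again, the interval is only an example:

SELECT "areaName", "category", SUM("sum_monye") AS total_money
FROM "fast_sales"
WHERE "__time" >= TIMESTAMP '2020-08-08 01:03:00'
  AND "__time" <  TIMESTAMP '2020-08-08 01:05:00'
GROUP BY "areaName", "category"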

 

