{
  "type": "kafka", // ingestion type
  "dataSchema": {
    "dataSource": "ea-test", // data source name
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json", // data format
        "timestampSpec": {
          "column": "time",
          "format": "millis"
        },
        "dimensionsSpec": { // dimension columns
          "dimensions": ["eventId", "appId"],
          "dimensionExclusions": ["time"]
        }
      }
    },
    "metricsSpec": [{
      "name": "eventCount", // metric column
      "type": "count" // pre-aggregation (rollup) aggregator; "count" takes no fieldName
    }],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "MINUTE"
    },
    "transformSpec": {
      "filter": { // data filtering
        "type": "selector",
        "dimension": "appId",
        "value": "testAppId1"
      }
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 100000,
    "maxRowsPerSegment": 5000000,
    "workerThreads": 2,
    "reportParseExceptions": true
  },
  "ioConfig": {
    "topic": "ea_test_13",
    "consumerProperties": {
      "bootstrap.servers": "host1:port1,host2:port2,host3:port3",
      "group.id": "kafka_group1"
    },
    "useEarliestOffset": true, // consume the Kafka topic from the earliest offset
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H" // duration of each ingestion task
  }
}
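A Kafka message matching the parseSpec above would look like the following (the field values are hypothetical; `time` is a millisecond epoch, as declared by `"format": "millis"`):

```json
{"time": 1546300800000, "eventId": "login", "appId": "testAppId1"}
```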
A Druid ingestion task is registered by sending an HTTP request to the Overlord node.
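The registration step can be sketched as below. The Overlord host and port (`localhost:8090`) are assumptions; `/druid/indexer/v1/supervisor` is Druid's supervisor submission endpoint.

```python
# Minimal sketch of registering the Kafka supervisor spec with the Overlord.
# The host/port are assumptions; in practice load the full spec shown above.
import json
import urllib.request

spec = {"type": "kafka"}  # placeholder; use the full supervisor spec here

req = urllib.request.Request(
    "http://localhost:8090/druid/indexer/v1/supervisor",
    data=json.dumps(spec).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually submit the task
print(req.get_method(), req.full_url)
# → POST http://localhost:8090/druid/indexer/v1/supervisor
```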
The ingested data can be visualized with Superset. Note that the data stored in Druid is the pre-aggregated (rolled-up) data, not the raw events.
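To see what the pre-aggregation means, the following sketch (with hypothetical events) groups raw rows by minute and by the two dimensions, the way `"queryGranularity": "MINUTE"` and the `eventCount` metric do:

```python
# Illustration of Druid rollup: raw events are collapsed into one row per
# (minute bucket, eventId, appId), and eventCount counts the raw rows.
from collections import Counter

raw_events = [  # hypothetical raw Kafka messages
    {"time": 1546300800000, "eventId": "login", "appId": "testAppId1"},
    {"time": 1546300805000, "eventId": "login", "appId": "testAppId1"},  # same minute
    {"time": 1546300860000, "eventId": "login", "appId": "testAppId1"},  # next minute
]

rollup = Counter(
    (e["time"] // 60000 * 60000, e["eventId"], e["appId"]) for e in raw_events
)
for (minute_ms, event_id, app_id), event_count in sorted(rollup.items()):
    print(minute_ms, event_id, app_id, event_count)
# Only two rows survive: the first minute with eventCount=2, the next with eventCount=1.
```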
Druid uses the UTC (zero) time zone by default. To work in UTC+8, configure the +8 time zone in the middleManager configuration file, apply an 8-hour time zone offset on the Superset dataSource, and, when creating a query, set the origin of `time` to 16:00. Only then will the aggregated data line up correctly.
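The 16:00 origin can be checked directly: since Druid buckets in UTC, midnight in a UTC+8 zone (Asia/Shanghai is used here as an example) falls at 16:00 UTC of the previous day, so buckets must be anchored there to align with local calendar days.

```python
# Why the origin is 16:00: local midnight in UTC+8 corresponds to
# 16:00 UTC of the previous day, which is where Druid's UTC buckets
# must be anchored to match local days.
from datetime import datetime
from zoneinfo import ZoneInfo

local_midnight = datetime(2019, 1, 1, 0, 0, tzinfo=ZoneInfo("Asia/Shanghai"))
in_utc = local_midnight.astimezone(ZoneInfo("UTC"))
print(in_utc.isoformat())  # → 2018-12-31T16:00:00+00:00
```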