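Overview

Druid queries are sent as HTTP REST requests, with the query itself described in a JSON document. The Broker, Historical, and Realtime services can all handle query requests and expose the same query interface, but queries are normally sent to a Broker node, which forwards them to Historical or Realtime nodes depending on the data source being queried.

In addition, there are many open-source libraries for querying Druid from other languages. See: http://druid.io/docs/latest/development/libraries.html

The examples below use Druid's built-in JSON-over-HTTP query interface, with lxw1234 as the data source.

Run a query (the address given here is that of the Broker node):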

curl -X POST 'http://node2:8092/druid/v2/?pretty' -H 'content-type: application/json' -d @query.json
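
The same request can be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names build_request and run_query are illustrative and not part of Druid, and the Broker address is simply the one used in the curl command.

```python
import json
import urllib.request

BROKER_URL = "http://node2:8092/druid/v2/?pretty"  # Broker address from the curl example


def build_request(query, url=BROKER_URL):
    """Build a POST request carrying the JSON query (hypothetical helper)."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=BROKER_URL, timeout=30):
    """Send the query to the Broker and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling run_query with the contents of query.json loaded through json.load should be equivalent to the curl invocation, assuming the Broker is reachable at that address.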

The official Druid documentation on queries is at: http://druid.io/docs/latest/querying/querying.html

Query types:

There are three basic categories of query: Aggregation Queries, Metadata Queries, and Search Queries.

Aggregation Queries
- Timeseries
- TopN
- GroupBy
Metadata Queries
- TimeBoundary
- SegmentMetadata
- DatasourceMetadata
Search Queries
- Search

1. Aggregation Queries

An aggregation query aggregates metric data over one or more dimensions according to given rules.

There are three kinds:

Timeseries
TopN
GroupBy

1.1 Timeseries

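A Timeseries query aggregates over a specified time interval at a specified granularity. The query can also specify filter conditions, the metric columns to aggregate, and so on.

A timeseries query contains the following fields:

Field	Description	Required
queryType	Query type; must be timeseries here	Yes
dataSource	Data source to query	Yes
intervals	Time range to query, in ISO-8601 format by default	Yes
granularity	Time granularity (bucket size) used to aggregate the results	Yes
aggregations	Aggregation type, source field, and output name	Yes
postAggregations	Post-aggregations	No
filter	Filter conditions	No
descending	Whether to sort in descending order	No
context	Additional query parameters	No

A timeseries query outputs, for each time bucket, the statistics that match the given conditions: filter selects the rows, and aggregations and postAggregations define how they are aggregated. A timeseries query cannot output dimension values. granularity supports all, none, second, minute, hour, day, week, month, year, and other granularities.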
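
The fields listed above can also be assembled programmatically. The following is a small sketch, not an official client: the function timeseries_query is a hypothetical helper, and the selector filter on the ip dimension with the value 127.0.0.1 is an assumed example that does not come from the original data.

```python
import json


def timeseries_query(data_source, interval, granularity, aggregations, flt=None):
    """Assemble a timeseries query dict; `flt` is an optional Druid filter."""
    query = {
        "queryType": "timeseries",
        "dataSource": data_source,
        "intervals": [interval],
        "granularity": granularity,
        "aggregations": aggregations,
    }
    if flt is not None:
        # The filter field is optional, so it is only added when supplied
        query["filter"] = flt
    return query


query = timeseries_query(
    "lxw1234",
    "2015-11-15/2015-11-18",
    "day",
    [{"type": "longSum", "fieldName": "count", "name": "total_count"}],
    # Assumed filter value, for illustration only
    flt={"type": "selector", "dimension": "ip", "value": "127.0.0.1"},
)
print(json.dumps(query, indent=4))
```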

A simple Timeseries query file:

{
    "queryType": "timeseries",
    "dataSource": "lxw1234",
    "intervals": [ "2015-11-15/2015-11-18" ],
    "granularity": "day",
    "aggregations": [
        {"type": "longSum", "fieldName": "count", "name": "total_count"}
    ]
}

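Result:

Zero-filling:
Normally, when a Timeseries query aggregates by day and a given day has no data (for example, because every row was filtered out), the result still contains an entry for that day with a value of 0. Taking the query above, if no rows matched on 2015-11-15, the entry for that day would be: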

{
  "timestamp" : "2015-11-15T00:00:00.000Z",
  "result" : {
    "total_count" : 0
  }
}

If you do not want such entries in the result, you can drop them with the context option, which is used to pass query parameters:

"context" : {
    "skipEmptyBuckets": "true"
 }
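
To make the difference concrete, the following sketch imitates the two behaviours locally. It is not Druid code: daily_buckets is a hypothetical function, and the counts are made up for illustration.

```python
from datetime import date, timedelta


def daily_buckets(counts, start, end, skip_empty_buckets=False):
    """Return one result row per day in [start, end), mimicking day granularity.

    `counts` maps a date to its aggregated total_count; days absent from it
    had no matching rows.
    """
    rows = []
    day = start
    while day < end:
        if day in counts:
            rows.append({"timestamp": day.isoformat(),
                         "result": {"total_count": counts[day]}})
        elif not skip_empty_buckets:
            # Zero-filling: an empty bucket is still emitted, with a value of 0
            rows.append({"timestamp": day.isoformat(),
                         "result": {"total_count": 0}})
        day += timedelta(days=1)
    return rows


# Made-up counts: 2015-11-15 has no matching rows
counts = {date(2015, 11, 16): 120, date(2015, 11, 17): 95}
filled = daily_buckets(counts, date(2015, 11, 15), date(2015, 11, 18))
skipped = daily_buckets(counts, date(2015, 11, 15), date(2015, 11, 18),
                        skip_empty_buckets=True)
```

Here filled contains three rows, the first with total_count 0, while skipped contains only the two days that have data.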

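1.2 TopN (TopN Queries)

A TopN query groups by a single dimension, sorts by the aggregated metric, and returns the top N entries.

In Druid, a TopN query is faster than the equivalent GroupBy plus ordering.

Under the hood it is divide and conquer. For a Top 10, for example, each node computes its own Top 10 and sends it to the Broker, which then merges the per-node Top 10 lists into the final Top 10.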
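
The merge step can be sketched in a few lines. This is a simplified illustration of the idea, not Druid's actual implementation; the function names and the per-node counts are made up.

```python
from collections import Counter
import heapq


def local_top_n(counts, n):
    """Top-n (key, value) pairs on one node, largest value first."""
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])


def merge_top_n(per_node_tops, n):
    """Broker-side merge: sum the partial results, then take the top n again."""
    merged = Counter()
    for top in per_node_tops:
        for key, value in top:
            merged[key] += value
    return heapq.nlargest(n, merged.items(), key=lambda kv: kv[1])


# Made-up per-node page-view counts keyed by cookieid
node_a = {"c1": 50, "c2": 40, "c3": 5, "c4": 4}
node_b = {"c1": 10, "c3": 30, "c4": 29, "c2": 1}
tops = [local_top_n(node_a, 2), local_top_n(node_b, 2)]
result = merge_top_n(tops, 2)
print(result)  # [('c1', 50), ('c2', 40)]
```

In this sample the ranking is correct, but c1 is reported as 50 although its true total is 60, because node_b did not include c1 in its local Top 2. This is why TopN results are approximate.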

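A TopN query contains the following fields:

Field	Description	Required
queryType	Query type; must be topN here	Yes
dataSource	Data source to query	Yes
intervals	Time range to query, in ISO-8601 format by default	Yes
granularity	Time granularity (bucket size) used to aggregate the results	Yes
dimension	Dimension to run the TopN over; a TopN query takes exactly one dimension	Yes
threshold	The N in TopN	Yes
metric	Metric to aggregate and sort by	Yes
aggregations	Aggregation type, source field, and output name	Yes
postAggregations	Post-aggregations	No
filter	Filter conditions	No
context	Additional query parameters	No

A simple TopN query file: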

{
  "queryType": "topN",
  "dataSource": "lxw1234",
  "granularity": "day",
  "dimension": "cookieid",
  "metric": "total_count",
  "threshold" : 3,
  "aggregations": [
    {"type": "longSum", "fieldName": "count", "name": "total_count"}
  ],
  "intervals": ["2015-11-17/2015-11-18"]
}

This query returns the 3 cookieid values with the most page views (pv) for each day. Result:

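Note: metric is specific to TopN.
metric can be configured in the following ways:

"metric":"<metric_name>" // shorthand form; sorts by the metric in descending order by default
 
"metric" : {
    "type" : "numeric", // sort by the numeric metric, descending
    "metric" : "<metric_name>"
}
 
"metric" : {
    "type" : "inverted", // invert the numeric order, i.e. ascending
    "metric" : "<metric_name>"
}
 
"metric" : {
    "type" : "lexicographic", // sort in lexicographic (dictionary) order
    "metric" : "<metric_name>"
}
 
"metric" : {
    "type" : "alphaNumeric", // sort in alphanumeric order
    "metric" : "<metric_name>"
}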

1.3 GroupBy

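A GroupBy query aggregates metrics over multiple dimensions. The Druid documentation recommends using Timeseries and TopN queries instead of GroupBy wherever they can express the query, because GroupBy performs worse.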
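
As an illustration, a GroupBy query over two dimensions might look like the sketch below, written as a Python dict and serialised to JSON. The choice of cookieid and ip as dimensions and of the interval is an assumption made for this example; the field names follow Druid's native groupBy query.

```python
import json

# Assumed example: sum `count` per (cookieid, ip) pair for each day
group_by_query = {
    "queryType": "groupBy",
    "dataSource": "lxw1234",
    "granularity": "day",
    "dimensions": ["cookieid", "ip"],
    "aggregations": [
        {"type": "longSum", "fieldName": "count", "name": "total_count"}
    ],
    "intervals": ["2015-11-17/2015-11-18"],
}

print(json.dumps(group_by_query, indent=2))
```

Unlike TopN, which accepts a single dimension, dimensions here is a list.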

For details, see: http://lxw1234.com/archives/2015/11/561.htm

2. Metadata Queries

2.1 Time Boundary Queries

A time boundary query returns the earliest and latest timestamps of a data source.

{
    "queryType" : "timeBoundary",
    "dataSource": "lxw1234"
}

Result:

[ {
  "timestamp" : "2015-11-15T00:00:00.000+08:00",
  "result" : {
    "minTime" : "2015-11-15T00:00:00.000+08:00",
    "maxTime" : "2015-11-18T23:59:59.000+08:00"
  }
} ]
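
The returned timestamps are ISO-8601 strings, so they are easy to post-process. The sketch below, which assumes Python 3.7 or later for datetime.fromisoformat, parses the response shown above and computes the time span covered by the data source; time_span is a hypothetical helper name.

```python
from datetime import datetime

# Response copied from the timeBoundary result above
response = [{
    "timestamp": "2015-11-15T00:00:00.000+08:00",
    "result": {
        "minTime": "2015-11-15T00:00:00.000+08:00",
        "maxTime": "2015-11-18T23:59:59.000+08:00",
    },
}]


def time_span(resp):
    """Return (min_time, max_time, span) parsed from a timeBoundary response."""
    result = resp[0]["result"]
    min_time = datetime.fromisoformat(result["minTime"])
    max_time = datetime.fromisoformat(result["maxTime"])
    return min_time, max_time, max_time - min_time


min_time, max_time, span = time_span(response)
print(span)  # 3 days, 23:59:59
```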

There is also a bound option that specifies whether to return the maximum or the minimum timestamp. If it is omitted, both are returned:

{
    "queryType" : "timeBoundary",
    "dataSource": "lxw1234",
    "bound": "maxTime"
}

Now only the maximum timestamp is returned:

[ {
  "timestamp" : "2015-11-18T23:59:59.000+08:00",
  "result" : {
    "maxTime" : "2015-11-18T23:59:59.000+08:00"
  }
} ]

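2.2 Segment Metadata Queries

A segment metadata query returns the following information for each segment:

Column names
Cardinality of every column in the segment (null for non-STRING columns)
Estimated size of each column, in bytes
Time interval covered by the segment
Column types
Estimated total size of the segment
Segment ID

Query: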

{
  "queryType":"segmentMetadata",
  "dataSource":"lxw1234",
  "intervals":["2015-11-15/2015-11-19"]
}

Result (only one segment shown):

{
  "id" : "lxw1234_2015-11-17T00:00:00.000+08:00_2015-11-18T00:00:00.000+08:00_2015-11-18T16:53:02.158+08:00_1",
  "intervals" : [ "2015-11-17T00:00:00.000+08:00/2015-11-18T00:00:00.000+08:00" ],
  "columns" : {
    "__time" : {
      "type" : "LONG",
      "size" : 46837800,
      "cardinality" : null,
      "errorMessage" : null
    },
    "cookieid" : {
      "type" : "STRING",
      "size" : 106261532,
      "cardinality" : 1134359,
      "errorMessage" : null
    },
    "count" : {
      "type" : "LONG",
      "size" : 37470240,
      "cardinality" : null,
      "errorMessage" : null
    },
    "ip" : {
      "type" : "STRING",
      "size" : 63478131,
      "cardinality" : 735562,
      "errorMessage" : null
    }
  },
  "size" : 272782823
}
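
A result like this can be used to see which columns dominate a segment. The following sketch works on an abridged copy of the result above; the helper names columns_by_size and string_cardinalities are illustrative only.

```python
# Column stats abridged from the segmentMetadata result above
segment = {
    "columns": {
        "__time": {"type": "LONG", "size": 46837800, "cardinality": None},
        "cookieid": {"type": "STRING", "size": 106261532, "cardinality": 1134359},
        "count": {"type": "LONG", "size": 37470240, "cardinality": None},
        "ip": {"type": "STRING", "size": 63478131, "cardinality": 735562},
    },
    "size": 272782823,
}


def columns_by_size(seg):
    """List (column, share_of_segment) pairs, largest column first."""
    total = seg["size"]
    pairs = [(name, col["size"] / total) for name, col in seg["columns"].items()]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def string_cardinalities(seg):
    """Cardinality is only reported for STRING columns; skip the null ones."""
    return {
        name: col["cardinality"]
        for name, col in seg["columns"].items()
        if col["cardinality"] is not None
    }


for name, share in columns_by_size(segment):
    print(f"{name}: {share:.1%}")
```

In this segment the cookieid column is both the largest and the one with the highest cardinality.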

2.3 Data Source Metadata Queries

This query simply returns the timestamp of the most recent event ingested into the data source.

For example, the query file:

{
    "queryType" : "dataSourceMetadata",
    "dataSource": "lxw1234"
}

Result:

[ {
  "timestamp" : "2015-11-18T23:59:59.000+08:00",
  "result" : {
    "maxIngestedEventTime" : "2015-11-18T23:59:59.000+08:00"
  }
} ]

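3. Search Queries

A select query is similar to a SELECT in SQL. It is used to view the raw data stored in Druid, supports restricting the returned dimensions and metrics by filter and time interval, lets you set the sort order through the descending field, and supports paged retrieval. It does not support aggregations or postAggregations.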

A JSON example:

{
  "queryType": "select",
  "dataSource": "app_auto_prem_qd_pp3",
  "granularity": "all",
  "intervals": "1917-08-25T08:35:20+00:00/2017-08-25T08:35:20+00:00",
  "dimensions": [
    "status",
    "is_new_car"
  ],
  "pagingSpec": {
    "pagingIdentifiers": {},
    "threshold": 3
  },
  "context": {
    "skipEmptyBuckets": "true"
  }
}

This is equivalent to the SQL statement:

select status,is_new_car from app_auto_prem_qd_pp3 limit 3
