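Elasticsearch aggregations are used to run analytics and statistics over data. They fall into four families: metric aggregations, bucket aggregations, pipeline aggregations, and matrix aggregations. Metric and bucket aggregations are the most commonly used.
This article uses the official shakespeare sample dataset, which can be downloaded from the Elasticsearch website. The content follows the official documentation.
1 Metric Aggregations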
1.1 Max Aggregation
Max Aggregation returns the maximum value of a field. For example, to find the largest line_id in the shakespeare index:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"max_line_id": {
"max": {
"field": "line_id"
}
}
}
}
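max_line_id is the name of the result and can be any string. The key beneath max_line_id is the aggregation type: max selects the Max Aggregation, and field names the document field to aggregate on. Setting size to 0 suppresses the matching documents so that only the aggregation result is returned.
This is similar to select max(line_id) from shakespeare in MySQL.
The response is: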
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"max_line_id" : {
"value" : 111396.0
}
}
}
The result appears under aggregations: the maximum value is 111396.
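The request structure described above (result name, then aggregation type, then field) can be sketched in Python as a small helper that builds the request body as a dictionary. The helper name metric_agg_body is hypothetical and not part of any Elasticsearch client; the sketch only prints the body and does not contact a cluster.

```python
import json


def metric_agg_body(result_name, agg_type, field):
    """Build a search body containing one single-field metric aggregation."""
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {
            result_name: {           # arbitrary label for the result
                agg_type: {          # aggregation type, e.g. "max"
                    "field": field,  # document field to aggregate on
                }
            }
        },
    }


body = metric_agg_body("max_line_id", "max", "line_id")
print(json.dumps(body, indent=2))
```

Swapping "max" for "min", "avg", or "sum" gives the bodies used in the following sections.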
1.2 Min Aggregation
The opposite of Max Aggregation, Min Aggregation returns the minimum value of a field. For example, to find the smallest line_id in the shakespeare index:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"min_line_id": {
"min": {
"field": "line_id"
}
}
}
}
As before, the result appears under aggregations.
1.3 Avg Aggregation
Avg Aggregation computes the arithmetic mean. For example, to compute the average of the line_id field in the shakespeare index:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"avg_line_id": {
"avg": {
"field": "line_id"
}
}
}
}
The result again appears under aggregations.
1.4 Sum Aggregation
Sum Aggregation computes a total. For example, to compute the sum of the line_id field in the shakespeare index:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"sum_line_id": {
"sum": {
"field": "line_id"
}
}
}
}
1.5 Cardinality Aggregation
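Cardinality Aggregation counts distinct values: it works like a distinct de-duplication in SQL followed by a count of the resulting set. The count is approximate for fields with very many distinct values. For example, the following query counts the distinct plays, because it aggregates on play_name: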
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"player_sum": {
"cardinality": {
"field": "play_name.keyword"
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"player_sum" : {
"value" : 36
}
}
}
That is, there are 36 distinct plays.
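The de-duplicate-then-count idea can be sketched in plain Python. This is an exact count over a hypothetical in-memory sample, whereas Elasticsearch computes an approximate count on the cluster; the function name and the sample values are assumptions for illustration only.

```python
def exact_cardinality(values):
    """Exact distinct count: de-duplicate, then take the size of the set."""
    return len(set(values))


# Hypothetical sample standing in for the play_name.keyword values.
play_names = ["Hamlet", "Othello", "Hamlet", "King Lear", "Othello", "Hamlet"]
distinct_plays = exact_cardinality(play_names)
print(distinct_plays)  # 3
```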
1.6 Stats Aggregation
Stats Aggregation returns basic statistics in one request: count, max, min, avg, and sum. For example, for line_id:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"line_id_stats": {
"stats": {
"field": "line_id"
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"line_id_stats" : {
"count" : 110486,
"min" : 4.0,
"max" : 111396.0,
"avg" : 55715.89386890647,
"sum" : 6.15582625E9
}
}
}
1.7 Extended Stats Aggregation
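Extended Stats Aggregation returns four more fields than Stats Aggregation: the sum of squares, the variance, the standard deviation, and the interval of the mean plus or minus two standard deviations. For example: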
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"line_id_stats": {
"extended_stats": {
"field": "line_id"
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"line_id_stats" : {
"count" : 110486,
"min" : 4.0,
"max" : 111396.0,
"avg" : 55715.89386890647,
"sum" : 6.15582625E9,
"sum_of_squares" : 4.57201930511864E14,
"variance" : 1.0338374861198297E9,
"std_deviation" : 32153.34331169668,
"std_deviation_bounds" : {
"upper" : 120022.58049229984,
"lower" : -8590.792754486894
}
}
}
}
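How these fields relate to one another can be sketched in Python on a small hypothetical sample. The sketch assumes the variance is the population variance (dividing by count) and that the bounds are the mean plus or minus two standard deviations, which is consistent with the response above: 55715.89 + 2 × 32153.34 ≈ 120022.58 and 55715.89 - 2 × 32153.34 ≈ -8590.79. The function name and the sample are illustrative only.

```python
import math


def extended_stats(values, sigma=2.0):
    """Compute extended-stats style figures for a non-empty list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    # Population variance (assumption): mean squared deviation from the average.
    variance = sum((v - avg) ** 2 for v in values) / count
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


stats = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```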
1.8 Percentiles Aggregation
Percentiles Aggregation computes percentiles: the values of a field are sorted in ascending order, and the p-th percentile is the value below which p percent of the values fall. For example:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"line_id_percent": {
"percentiles": {
"field": "line_id",
"percents": [1, 5, 25, 50, 75, 95, 99]
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"line_id_percent" : {
"values" : {
"1.0" : 1115.3600000000001,
"5.0" : 5575.834045307443,
"25.0" : 27887.286615736997,
"50.0" : 55711.257765161325,
"75.0" : 83561.89545235902,
"95.0" : 105830.47105865781,
"99.0" : 110287.32171428572
}
}
}
}
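The definition of a percentile can be sketched in Python as an exact computation over a sorted list with linear interpolation between neighbouring values. This is only an illustration under that assumption: Elasticsearch estimates percentiles with an approximation algorithm, so its results on real data can differ slightly. The function name and the sample are hypothetical.

```python
def percentile(values, p):
    """Exact p-th percentile (0 <= p <= 100) using linear interpolation."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)  # ascending order
    rank = p * (len(ordered) - 1) / 100  # fractional index into the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


sample = list(range(1, 102))  # the integers 1..101
result = {p: percentile(sample, p) for p in [1, 5, 25, 50, 75, 95, 99]}
print(result)
```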
1.9 Value Count Aggregation
Value Count Aggregation counts the values of a field. For a single-valued field such as line_id, this is the number of documents that contain the field:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"line_id_count": {
"value_count": {
"field": "line_id"
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"line_id_count" : {
"value" : 110486
}
}
}
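2 Bucket Aggregations
Bucket aggregations are similar to GROUP BY in SQL: they scan the documents and place each one into a bucket according to its content.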
2.1 Terms Aggregation
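Terms Aggregation groups documents by field value. For example, grouping documents by the play_name field and counting the documents in each group is equivalent to select play_name, count(*) from shakespeare group by play_name. The query is: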
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"per_player": {
"terms": {
"field": "play_name.keyword",
"size": 10
}
}
}
}
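field corresponds to the column named after GROUP BY, and size limits the result to the 10 buckets with the highest document counts.
Response:
{
# other fields omitted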
"aggregations" : {
"per_player" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 72631,
"buckets" : [
{
"key" : "Hamlet",
"doc_count" : 4219
},
{
"key" : "Coriolanus",
"doc_count" : 3958
},
{
"key" : "Cymbeline",
"doc_count" : 3927
},
{
"key" : "Richard III",
"doc_count" : 3911
},
{
"key" : "Antony and Cleopatra",
"doc_count" : 3815
},
{
"key" : "Othello",
"doc_count" : 3742
},
{
"key" : "King Lear",
"doc_count" : 3735
},
{
"key" : "Troilus and Cressida",
"doc_count" : 3682
},
{
"key" : "A Winters Tale",
"doc_count" : 3469
},
{
"key" : "Henry VIII",
"doc_count" : 3397
}
]
}
}
}
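The grouping and the meaning of sum_other_doc_count can be sketched in Python with collections.Counter. In the response above the ten doc_count values add up to 37855, and 37855 + 72631 = 110486, the same count that the Stats Aggregation reported for line_id. The function name and the sample data below are hypothetical, and the sketch runs in memory rather than on a cluster.

```python
from collections import Counter


def terms_agg(values, size=10):
    """Group equal values, keep the `size` largest groups, and sum up the rest."""
    counts = Counter(values)
    top = counts.most_common(size)  # largest groups first
    buckets = [{"key": key, "doc_count": count} for key, count in top]
    # Documents that fall into groups outside the returned buckets.
    other = sum(counts.values()) - sum(count for _, count in top)
    return {"sum_other_doc_count": other, "buckets": buckets}


plays = ["Hamlet"] * 3 + ["Othello"] * 2 + ["King Lear"]
result = terms_agg(plays, size=2)
print(result)
```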
2.2 Filter Aggregation
Filter Aggregation collects the documents that match a filter into a single bucket, which sub-aggregations can then process further. For example:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"per_player": {
"filter": {
"term": {
"text_entry": "apple"
}
},
"aggs": {
"player": {
"terms": {
"field": "play_name.keyword",
"size": 10
}
}
}
}
}
}
The query above finds the documents whose text_entry contains the word apple and groups them by play_name.
Response:
{
# other fields omitted
"aggregations" : {
"per_player" : {
"doc_count" : 10,
"player" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Taming of the Shrew",
"doc_count" : 2
},
{
"key" : "Twelfth Night",
"doc_count" : 2
},
{
"key" : "A Midsummer nights dream",
"doc_count" : 1
},
{
"key" : "Henry IV",
"doc_count" : 1
},
{
"key" : "King Lear",
"doc_count" : 1
},
{
"key" : "Loves Labours Lost",
"doc_count" : 1
},
{
"key" : "Merchant of Venice",
"doc_count" : 1
},
{
"key" : "The Tempest",
"doc_count" : 1
}
]
}
}
}
}
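The two stages of this query, filtering first and grouping second, can be sketched in Python. The word matching here is a simplification (lowercasing and splitting on non-word characters) and not Elasticsearch's actual text analysis; the function names and the sample documents are made up for illustration.

```python
import re
from collections import Counter


def contains_word(text, word):
    """Simplified matching: lowercase the text and compare whole words."""
    return word in re.findall(r"\w+", text.lower())


def filter_then_terms(docs, word, size=10):
    """Keep documents whose text_entry has `word`, then group by play_name."""
    matching = [d for d in docs if contains_word(d["text_entry"], word)]
    counts = Counter(d["play_name"] for d in matching)
    buckets = [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]
    return {"doc_count": len(matching), "player": {"buckets": buckets}}


# Hypothetical sample documents, not taken from the real index.
docs = [
    {"play_name": "Twelfth Night", "text_entry": "An apple, cleft in two"},
    {"play_name": "Twelfth Night", "text_entry": "almost an apple"},
    {"play_name": "King Lear", "text_entry": "as a crab does to an apple"},
    {"play_name": "Hamlet", "text_entry": "To be, or not to be"},
]
result = filter_then_terms(docs, "apple")
print(result)
```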
2.3 Filters Aggregation
Unlike Filter Aggregation, Filters Aggregation accepts multiple filters, and each filter produces its own bucket. For example:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"per_player": {
"filters": {
"filters": [
{"match": { "text_entry": "apple" } }
]
},
"aggs": {
"player": {
"terms": {
"field": "play_name.keyword",
"size": 10
}
}
}
}
}
}
The filters array can hold multiple filters.
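The one-bucket-per-filter behaviour can be sketched in Python, with each filter modelled as a predicate function. A document that satisfies several filters is counted in each of the corresponding buckets. The names, the simplified word matching, and the sample documents are assumptions for illustration.

```python
import re


def has_word(word):
    """Return a predicate that checks whether text_entry contains `word`."""
    def predicate(doc):
        return word in re.findall(r"\w+", doc["text_entry"].lower())
    return predicate


def filters_agg(docs, filters):
    """Produce one bucket per filter, in the same order as `filters`."""
    return [
        {"doc_count": sum(1 for doc in docs if predicate(doc))}
        for predicate in filters
    ]


# Hypothetical sample documents.
docs = [
    {"text_entry": "An apple cleft in two"},
    {"text_entry": "To be or not to be"},
    {"text_entry": "An apple is to be eaten"},
]
buckets = filters_agg(docs, [has_word("apple"), has_word("be")])
print(buckets)
```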
2.4 Range Aggregation
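Range Aggregation buckets documents by value range and shows how the data is distributed. Each range includes its from value and excludes its to value. For example, to bucket line_id into 0 to 10000, 10000 to 50000, and 50000 and above: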
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"id_range": {
"range": {
"field": "line_id",
"ranges": [
{ "from": 0, "to": 10000 },
{ "from": 10000, "to": 50000},
{ "from": 50000 }
]
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"id_range" : {
"buckets" : [
{
"key" : "0.0-10000.0",
"from" : 0.0,
"to" : 10000.0,
"doc_count" : 9909
},
{
"key" : "10000.0-50000.0",
"from" : 10000.0,
"to" : 50000.0,
"doc_count" : 39664
},
{
"key" : "50000.0-*",
"from" : 50000.0,
"doc_count" : 60913
}
]
}
}
}
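The bucketing rule can be sketched in Python, assuming each range includes its from value and excludes its to value, and that a missing bound leaves that side open. In the response above the three counts add up to 9909 + 39664 + 60913 = 110486, so every document with a line_id lands in exactly one bucket. The function name and the sample values are hypothetical.

```python
def range_agg(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for spec in ranges:
        low = spec.get("from")   # None means no lower bound
        high = spec.get("to")    # None means no upper bound
        count = sum(
            1
            for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets


ranges = [
    {"from": 0, "to": 10000},
    {"from": 10000, "to": 50000},
    {"from": 50000},
]
# Hypothetical line_id values chosen to sit on the range boundaries.
line_ids = [4, 9999, 10000, 49999, 50000, 111396]
buckets = range_agg(line_ids, ranges)
print(buckets)
```

Note that 10000 is counted in the second bucket and 50000 in the third, because the upper bound of each range is exclusive.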