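ES provides many commonly used aggregation functions; see the official documentation for the full list. This article builds on the earlier post "ES 聚合查詢" (ES aggregation queries), and the test data used here also comes from that post.
1. range aggregation
1.1 Count food sales in each price range. The code is as follows: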
GET food/_search?size=0
{
  "aggs": {
    "price_range": {
      "range": {
        "field": "Price",
        "ranges": [
          { "from": 0, "to": 100 },
          { "from": 100, "to": 200 },
          { "from": 200, "to": 300 },
          { "from": 300, "to": 400 }
        ]
      }
    }
  }
}
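The bucket boundaries of a range aggregation are worth spelling out: each range includes its `from` value and excludes its `to` value, so a price of exactly 100 lands in the 100-200 bucket rather than in 0-100. The Python sketch below mimics that counting on a made-up list of prices. The helper name `range_buckets` and the sample prices are assumptions for illustration only, and nothing here talks to Elasticsearch.

```python
def range_buckets(values, ranges):
    """Count values per [from, to) range, mirroring range-aggregation boundaries."""
    buckets = []
    for lo, hi in ranges:
        # `from` is inclusive, `to` is exclusive
        count = sum(1 for v in values if lo <= v < hi)
        buckets.append({"from": lo, "to": hi, "doc_count": count})
    return buckets


# Hypothetical prices, not the article's test data
prices = [11.11, 99.99, 100, 120.11, 300.11]
ranges = [(0, 100), (100, 200), (200, 300), (300, 400)]
result = range_buckets(prices, ranges)

for bucket in result:
    print(bucket["from"], bucket["to"], bucket["doc_count"])
```

Because the boundaries are half-open, adjacent ranges such as 0-100 and 100-200 never count the same document twice.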
1.2 Count food sales for each month
GET food/_search?size=0
{
  "aggs": {
    "price_range": {
      "range": {
        "field": "CreateTime",
        "ranges": [
          { "from": "2022-05-01 00:00:00", "to": "2022-06-01 00:00:00" },
          { "from": "2022-06-01 00:00:00", "to": "2022-07-01 00:00:00" },
          { "from": "2022-07-01 00:00:00", "to": "2022-08-01 00:00:00" },
          { "from": "2022-08-01 00:00:00", "to": "2022-09-01 00:00:00" }
        ]
      }
    }
  }
}
2. Histogram aggregation (see the official documentation)
2.1 Count food sales in each price bracket with an interval of 100. The effect is similar to 1.1.
GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100
      }
    }
  }
}
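To see which bucket a value ends up in, it helps to write out the rounding rule from the official histogram documentation: bucket_key = floor((value - offset) / interval) * interval + offset. The following Python sketch applies that rule to a made-up list of prices; the function name `histogram` and the sample data are assumptions for illustration.

```python
import math
from collections import Counter


def histogram(values, interval, offset=0.0):
    """Group values into fixed-width buckets keyed by the bucket's lower bound."""
    counts = Counter()
    for v in values:
        key = math.floor((v - offset) / interval) * interval + offset
        counts[key] += 1
    return dict(sorted(counts.items()))


# Hypothetical prices, not the article's test data
prices = [11.11, 28.31, 89.91, 120.11, 300.11]
hist = histogram(prices, 100)
print(hist)
```

Only buckets that actually receive a value appear in this sketch; the empty 200 bucket is missing here, which is where `min_doc_count` comes in.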
2.2 Count food sales in each price bracket with an interval of 100, filtering out every bucket whose sales count is 0
GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100,
        "min_doc_count": 1
      }
    }
  }
}
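By default a histogram fills the gaps between the lowest and highest bucket with empty buckets, and `min_doc_count` then decides which buckets are kept. A minimal Python sketch of those two steps, using made-up counts and hypothetical helper names (`fill_gaps`, `apply_min_doc_count`):

```python
def fill_gaps(counts, interval):
    """Return every bucket between the smallest and largest key, zero-filled."""
    keys = sorted(counts)
    steps = int(round((keys[-1] - keys[0]) / interval))
    return {
        keys[0] + i * interval: counts.get(keys[0] + i * interval, 0)
        for i in range(steps + 1)
    }


def apply_min_doc_count(buckets, min_doc_count):
    """Keep only buckets whose count reaches min_doc_count."""
    return {key: count for key, count in buckets.items() if count >= min_doc_count}


# Hypothetical bucket counts with a gap at 200
counts = {0.0: 3, 100.0: 1, 300.0: 1}
dense = fill_gaps(counts, 100)
sparse = apply_min_doc_count(dense, 1)
print(dense)
print(sparse)
```

With `min_doc_count` set to 0 the zero-count 200 bucket stays in the output; with 1 it is dropped.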
2.3 Count food sales in each price bracket with an interval of 100; documents that have no price value are counted as if their price were 250
To demonstrate this, first add a document with an empty price:
PUT food/_doc/8
{
  "CreateTime": "2022-04-10 13:11:11",
  "Desc": "獼猴桃 對身體很有好處",
  "Level": "高級水果",
  "Name": "獼猴桃",
  "Price": "",
  "Tags": ["性價比", "水果", "保健"],
  "Type": "水果"
}
The query is as follows:
GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100,
        "missing": 250
      }
    }
  }
}
The search result is as follows:
{ "took" : 106, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "price_histogram" : { "buckets" : [ { "key" : 0.0, "doc_count" : 4 }, { "key" : 100.0, "doc_count" : 2 }, { "key" : 200.0, "doc_count" : 1 }, { "key" : 300.0, "doc_count" : 1 } ] } } }
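Note that the 200-300 bracket originally contained no data. After inserting a document with an empty price and setting `missing` to 250, ES treats every document without a price as if its price were 250, so the count for the 200-300 bracket becomes 1.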
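The effect of `missing` can be reproduced in a few lines of Python. This is only a sketch of the substitution logic: the function name `histogram_with_missing` and the prices are made up, with `None` standing in for a document that has no price.

```python
import math


def histogram_with_missing(values, interval, missing=None):
    """Bucket values; None entries are skipped unless a `missing` default is given."""
    counts = {}
    for v in values:
        if v is None:
            if missing is None:
                continue  # documents without a value are ignored by default
            v = missing
        key = math.floor(v / interval) * interval
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical prices; None represents the document with an empty price
prices = [11.11, 120.11, 300.11, None]
without_default = histogram_with_missing(prices, 100)
with_default = histogram_with_missing(prices, 100, missing=250)
print(without_default)
print(with_default)
```

The document without a price only shows up once a default is supplied, and it lands in whichever bucket the default value belongs to.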
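2.4 The keyed parameter
The `keyed` parameter only changes how the buckets are presented: they are returned in key/value form instead of as an array. It is not covered further here.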
3. Date histogram (see the official documentation)
3.1 Aggregate by date: count food sales for each month
GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", // one bucket per calendar month
        "format": "yyyy-MM",          // display bucket keys as year-month
        "min_doc_count": 1            // drop buckets whose count is 0
      }
    }
  }
}
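calendar_interval is used as the time interval here. Note that it supports the following units: minute (1m), hour (1h), day (1d), week (1w), month (1M), quarter (1q), year (1y). The smallest supported unit is the minute and the largest is the year.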
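A calendar interval follows the calendar rather than a fixed number of milliseconds, so a monthly bucket starts at the first instant of the month no matter how many days the month has. The Python sketch below shows how a timestamp maps to its monthly bucket key. It assumes UTC, and the helper name `month_bucket` is made up; the timestamp is the CreateTime of document 8 from section 2.3.

```python
from datetime import datetime, timezone


def month_bucket(ts):
    """Truncate a timestamp to the first instant of its calendar month."""
    start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    key_ms = int(start.timestamp() * 1000)  # bucket key in epoch milliseconds
    return start.strftime("%Y-%m"), key_ms


# CreateTime of document 8, interpreted as UTC (an assumption of this sketch)
created = datetime(2022, 4, 10, 13, 11, 11, tzinfo=timezone.utc)
label, key = month_bucket(created)
print(label, key)
```

The label corresponds to what "format": "yyyy-MM" produces as key_as_string, and the number is the epoch-millisecond key of the April 2022 bucket.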
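3.2 Aggregate by time: count food sales for each millisecond
GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "fixed_interval": "1ms", // one bucket per millisecond
        "min_doc_count": 1       // drop buckets whose count is 0
      }
    }
  }
}
fixed_interval is used as the time interval here. Note that it supports the following units: ms, s, m, h, d. The smallest supported unit is the millisecond and the largest is the day.
There is a serious problem here: bucketing by millisecond can make ES process a huge amount of data and hang, which severely affects writes. Use it with caution, and always restrict the time range with a query or filter before running it.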
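A quick back-of-the-envelope calculation shows why millisecond buckets get out of hand. The Python sketch below estimates an upper bound on the number of buckets needed to cover a time window at a given fixed interval. The helper names `parse_interval` and `max_buckets` are made up for illustration, and the real number of buckets ES returns also depends on the data and on `min_doc_count`.

```python
from datetime import datetime, timedelta

# Units accepted by fixed_interval, as listed above
UNITS = {
    "ms": timedelta(milliseconds=1),
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
}


def parse_interval(text):
    """Turn a string such as '1ms' or '30m' into a timedelta."""
    unit = "ms" if text.endswith("ms") else text[-1]
    return int(text[: -len(unit)]) * UNITS[unit]


def max_buckets(start, end, interval):
    """Upper bound on the number of buckets needed to cover [start, end)."""
    span = end - start
    step = parse_interval(interval)
    return -(-span // step)  # ceiling division; timedelta // timedelta is an int


day_start = datetime(2022, 7, 1)
day_end = datetime(2022, 7, 2)
for interval in ("1ms", "1s", "1h"):
    print(interval, max_buckets(day_start, day_end, interval))
```

A single day already spans 86,400,000 one-millisecond slots, compared with 24 one-hour slots, which is why narrowing the time range first matters so much.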
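3.3 Count food sales for each month of this year
Note that with the monthly aggregation above, if January has no data, ES does not return a bucket for January when bucketing, which clearly does not meet the requirement. January needs to be shown with a count of 0, as follows:
GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", // one bucket per calendar month
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        }
      }
    }
  }
}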
The search result is as follows:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "adate_histogram" : { "buckets" : [ { "key_as_string" : "2022-01-01 00:00:00", "key" : 1640995200000, "doc_count" : 0 }, { "key_as_string" : "2022-02-01 00:00:00", "key" : 1643673600000, "doc_count" : 0 }, { "key_as_string" : "2022-03-01 00:00:00", "key" : 1646092800000, "doc_count" : 0 }, { "key_as_string" : "2022-04-01 00:00:00", "key" : 1648771200000, "doc_count" : 1 }, { "key_as_string" : "2022-05-01 00:00:00", "key" : 1651363200000, "doc_count" : 0 }, { "key_as_string" : "2022-06-01 00:00:00", "key" : 1654041600000, "doc_count" : 3 }, { "key_as_string" : "2022-07-01 00:00:00", "key" : 1656633600000, "doc_count" : 4 }, { "key_as_string" : "2022-08-01 00:00:00", "key" : 1659312000000, "doc_count" : 0 }, { "key_as_string" : "2022-09-01 00:00:00", "key" : 1661990400000, "doc_count" : 0 }, { "key_as_string" : "2022-10-01 00:00:00", "key" : 1664582400000, "doc_count" : 0 }, { "key_as_string" : "2022-11-01 00:00:00", "key" : 1667260800000, "doc_count" : 0 }, { "key_as_string" : "2022-12-01 00:00:00", "key" : 1669852800000, "doc_count" : 0 } ] } } }
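The result counts the sales for every month from January through December.
Note how extended_bounds and min_doc_count are combined here. When extended_bounds is used to fill in empty intervals, min_doc_count must be 0. As mentioned above, min_doc_count exists to filter out buckets whose count is 0; if it were 1, the empty buckets produced by extended_bounds would be filtered out again, which is self-contradictory.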
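The interaction between the two parameters can be sketched in Python. The monthly counts below are taken from the search result above; the helper names `month_keys` and `date_histogram` are made up, and the sketch only models the gap filling and filtering, not the aggregation itself.

```python
def month_keys(first, last):
    """List 'YYYY-MM' labels from first to last inclusive; both are (year, month) tuples."""
    year, month = first
    keys = []
    while (year, month) <= last:
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def date_histogram(counts, bounds, min_doc_count):
    """Zero-fill every month inside bounds, then apply the min_doc_count filter."""
    keys = sorted(set(counts) | set(month_keys(*bounds)))
    buckets = {key: counts.get(key, 0) for key in keys}
    return {key: count for key, count in buckets.items() if count >= min_doc_count}


# Monthly counts from the search result above
counts = {"2022-04": 1, "2022-06": 3, "2022-07": 4}
bounds = ((2022, 1), (2022, 12))

full_year = date_histogram(counts, bounds, min_doc_count=0)
filtered = date_histogram(counts, bounds, min_doc_count=1)
print(len(full_year), list(filtered))
```

With min_doc_count set to 0 all twelve months survive; with 1 the filter removes exactly the buckets that the bounds were meant to add.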
3.4 Count food sales for each month of this year, sorted by each month's sales count
GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", // one bucket per calendar month
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        },
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}
3.5 Count food sales for each month of this year, and compute a running total month over month (the cumulative sum of item prices)
GET food/_search?size=0
{
  "aggs": {
    "date_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", // one bucket per calendar month
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        }
      },
      "aggs": {
        "sum_agg": {
          "sum": {
            "field": "Price"
          }
        },
        "cumulative_sum_agg": {
          "cumulative_sum": {
            "buckets_path": "sum_agg"
          }
        }
      }
    }
  }
}
The search result is as follows:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "date_histogram" : { "buckets" : [
  { "key_as_string" : "2022-01-01 00:00:00", "key" : 1640995200000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } },
  { "key_as_string" : "2022-02-01 00:00:00", "key" : 1643673600000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } },
  { "key_as_string" : "2022-03-01 00:00:00", "key" : 1646092800000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } },
  { "key_as_string" : "2022-04-01 00:00:00", "key" : 1648771200000, "doc_count" : 1, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } },
  { "key_as_string" : "2022-05-01 00:00:00", "key" : 1651363200000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 0.0 } },
  { "key_as_string" : "2022-06-01 00:00:00", "key" : 1654041600000, "doc_count" : 3, "sum_agg" : { "value" : 89.32999992370605 }, "cumulative_sum_agg" : { "value" : 89.32999992370605 } },
  { "key_as_string" : "2022-07-01 00:00:00", "key" : 1656633600000, "doc_count" : 4, "sum_agg" : { "value" : 511.43998622894287 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } },
  { "key_as_string" : "2022-08-01 00:00:00", "key" : 1659312000000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } },
  { "key_as_string" : "2022-09-01 00:00:00", "key" : 1661990400000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } },
  { "key_as_string" : "2022-10-01 00:00:00", "key" : 1664582400000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } },
  { "key_as_string" : "2022-11-01 00:00:00", "key" : 1667260800000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } },
  { "key_as_string" : "2022-12-01 00:00:00", "key" : 1669852800000, "doc_count" : 0, "sum_agg" : { "value" : 0.0 }, "cumulative_sum_agg" : { "value" : 600.7699861526489 } }
] } } }
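The result shows that, along with each month's sales count, the query also computes each month's sales amount, and cumulative_sum computes the running total of sales for the current month and all preceding months.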
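The running total is simply a prefix sum over the per-bucket values that `buckets_path` points at. The Python sketch below redoes the arithmetic on the `sum_agg` values copied from the result above, using `itertools.accumulate`; the variable names are made up.

```python
from itertools import accumulate

# sum_agg values for January through December, copied from the result above
monthly_sums = [
    0.0, 0.0, 0.0, 0.0, 0.0,
    89.32999992370605, 511.43998622894287,
    0.0, 0.0, 0.0, 0.0, 0.0,
]

# Each entry is the sum of the current month and every month before it
running_totals = list(accumulate(monthly_sums))

for month, total in enumerate(running_totals, start=1):
    print(f"2022-{month:02d}", total)
```

From August onward the monthly sum is 0, so the running total stays at the July value, just as in the ES response.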
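4. Auto-interval date histogram (see the official documentation)
The auto-interval date histogram computes the interval automatically from a target number of buckets, which is convenient in some scenarios. For example, to get the food sales for each month of this year, you can specify a bucket count of 12, as follows:
GET food/_search?size=0
{
  "aggs": {
    "auto_date_histogram_aggs": {
      "auto_date_histogram": {
        "field": "CreateTime",
        "buckets": 12,
        "format": "yyyy-MM-dd"
      }
    }
  }
}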
The result is as follows:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "auto_date_histogram_aggs" : { "buckets" : [ { "key_as_string" : "2022-04-01", "key" : 1648771200000, "doc_count" : 1 }, { "key_as_string" : "2022-05-01", "key" : 1651363200000, "doc_count" : 0 }, { "key_as_string" : "2022-06-01", "key" : 1654041600000, "doc_count" : 3 }, { "key_as_string" : "2022-07-01", "key" : 1656633600000, "doc_count" : 4 } ], "interval" : "1M" } } }
Note that the interval in the result is 1M, which ES inferred automatically from the requested number of buckets.
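The idea of picking an interval from a target bucket count can be illustrated with a deliberately simplified Python sketch: try candidate intervals from fine to coarse and take the first one that needs no more buckets than requested. This is not ES's actual algorithm. The candidate list, the helper names and the July end date are all assumptions made for illustration; only the April start time comes from the test data.

```python
from datetime import datetime


def buckets_needed(start, end, interval):
    """Rough number of buckets needed to cover start..end at the given interval."""
    days = (end.date() - start.date()).days
    if interval == "1d":
        return days + 1
    if interval == "7d":
        return days // 7 + 1
    if interval == "1M":
        return (end.year - start.year) * 12 + (end.month - start.month) + 1
    if interval == "1y":
        return end.year - start.year + 1
    raise ValueError(f"unsupported interval: {interval}")


def pick_interval(start, end, target_buckets):
    """Return the finest candidate interval that stays within target_buckets."""
    candidates = ("1d", "7d", "1M", "1y")  # simplified subset for this sketch
    for interval in candidates:
        if buckets_needed(start, end, interval) <= target_buckets:
            return interval
    return candidates[-1]


oldest = datetime(2022, 4, 10, 13, 11, 11)  # CreateTime of document 8
newest = datetime(2022, 7, 20)              # hypothetical newest document
chosen = pick_interval(oldest, newest, 12)
print(chosen)
```

With data running from April to July, daily and weekly buckets would exceed 12, so the monthly interval is the first candidate that fits, which matches the 1M reported by ES.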
5. Percentiles
Compute the values at the given percentages: the prices below which 20%, 40%, 60%, 80%, and 99% of the items fall
GET food/_search?size=0
{
  "aggs": {
    "percentiles_agg": {
      "percentiles": {
        "field": "Price",
        "percents": [ 20, 40, 60, 80, 99 ]
      }
    }
  }
}
The result is as follows:
{ "took" : 29, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "percentiles_agg" : { "values" : { "20.0" : 11.109999656677246, "40.0" : 28.309999942779555, "60.0" : 89.91000061035156, "80.0" : 120.10999908447278, "99.0" : 300.1099853515625 } } } }
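The result shows that 20% of the items are priced within about 11, 40% within about 28, and 99% within about 300.
This is commonly used to measure the reliability of an API. Suppose a response within 100ms counts as acceptable; then the value at the 99th percentile must be within 100 for the API to meet the target, and so on.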
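The latency check described above can be sketched in Python with an exact nearest-rank percentile. ES itself computes percentiles with an approximation algorithm (TDigest by default), so its numbers can differ slightly from an exact calculation. The response times and the helper name `percentile` are made up for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds
latencies_ms = [12, 15, 18, 22, 25, 31, 40, 55, 70, 180]

p80 = percentile(latencies_ms, 80)
p99 = percentile(latencies_ms, 99)
meets_target = p99 <= 100  # the 100ms threshold from the text
print(p80, p99, meets_target)
```

Here a single slow request is enough to push the 99th percentile over the threshold, even though 80% of requests finish well within it.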
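6. Percentile ranks
This is the inverse of Percentiles, and both are percentile statistics: it computes the percentage of values that fall within a given value, whereas Percentiles computes the value at a given percentage.
GET food/_search?size=0
{
  "aggs": {
    "percentile_ranks_agg": {
      "percentile_ranks": {
        "field": "Price",
        "values": [ 100, 200, 300, 400 ]
      }
    }
  }
}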
The result is as follows:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "percentile_ranks_agg" : { "values" : { "100.0" : 71.33613394088104, "200.0" : 85.69857243009302, "300.0" : 100.0, "400.0" : 100.0 } } } }
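This shows that prices within 100 account for about 71% of the items, and so on.
That is all for this article; for everything else, consult the official documentation.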
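As a cross-check on what a percentile rank means, the Python sketch below computes the exact share of values at or below each threshold. The prices and the helper name `percentile_rank` are made up, and because ES interpolates with an approximation algorithm, its percentages (such as the 71% above) will generally not equal an exact count like this one.

```python
def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to threshold."""
    return 100.0 * sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical prices, not the article's test data
prices = [11.11, 28.31, 89.91, 120.11, 300.11]
ranks = {t: percentile_rank(prices, t) for t in (100, 200, 300, 400)}
print(ranks)
```

Reading the output the other way round gives the Percentiles view: the value at which a given share of the data has been reached.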