Common ES Aggregation Functions

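ES provides many aggregation functions; see the official documentation for the full list. This article builds on the earlier article "ES Aggregation Queries", which also contains the test data used here.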

1. Range aggregation

1.1 Count food sales in each price range:

GET food/_search?size=0
{
  "aggs": {
    "price_range": {
      "range": {
        "field": "Price",
        "ranges": [
          {
            "from": 0,
            "to": 100
          },
          {
            "from": 100,
            "to": 200
          },  
          {
            "from": 200,
            "to": 300
          }, 
          {
            "from": 300,
            "to": 400
          }
        ]
      }
    }
  }
}
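
To make the boundary behaviour concrete, here is a minimal Python sketch of how a range aggregation assigns a value to a bucket: `from` is inclusive and `to` is exclusive, so a price of exactly 100 lands in 100-200. The helper names, the key format, and the sample prices are assumptions made for this illustration; they are not ES APIs and not the article's test data.

```python
# Same boundaries as the query above: each pair is (from, to).
RANGES = [(0, 100), (100, 200), (200, 300), (300, 400)]


def range_bucket(price, ranges=RANGES):
    """Return the key of the range holding price, or None if no range matches."""
    for lower, upper in ranges:
        # `from` is inclusive and `to` is exclusive.
        if lower <= price < upper:
            return f"{lower}-{upper}"
    return None


def count_ranges(prices, ranges=RANGES):
    """Count values per range, keeping empty ranges with a count of 0."""
    counts = {f"{lower}-{upper}": 0 for lower, upper in ranges}
    for price in prices:
        key = range_bucket(price, ranges)
        if key is not None:
            counts[key] += 1
    return counts


# Hypothetical prices, not the article's test data.
sample_prices = [11.11, 99.99, 100, 250.5, 400]
print(count_ranges(sample_prices))
```

A price of 400 falls outside every range here, so it is not counted in any bucket.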

 

1.2 Count food sales for each month

GET food/_search?size=0
{
  "aggs": {
    "price_range": {
      "range": {
        "field": "CreateTime",
        "ranges": [
          {
            "from": "2022-05-01 00:00:00",
            "to": "2022-06-01 00:00:00"
          },
          {
            "from": "2022-06-01 00:00:00",
            "to": "2022-07-01 00:00:00"
          },  
          {
            "from": "2022-07-01 00:00:00",
            "to": "2022-08-01 00:00:00"
          }, 
          {
            "from": "2022-08-01 00:00:00",
            "to": "2022-09-01 00:00:00"
          }
        ]
      }
    }
  }
}

 

2. Histogram: bar-chart statistics (see the official documentation)

2.1 Count food sales in each price bracket with an interval of 100; the effect is similar to 1.1

GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100
      }
    }
  }
}
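
The bucket a value falls into follows from rounding it down to a multiple of the interval. The Python sketch below illustrates that rule under stated assumptions: the function names and the sample prices are made up for the example, and unlike ES with its default settings it lists only non-empty buckets.

```python
import math
from collections import Counter


def histogram_key(value, interval, offset=0.0):
    """Bucket key: the value rounded down to a multiple of interval (plus offset)."""
    return math.floor((value - offset) / interval) * interval + offset


def histogram(values, interval):
    """Count values per bucket key, sorted by key (non-empty buckets only)."""
    counts = Counter(histogram_key(v, interval) for v in values)
    return dict(sorted(counts.items()))


# Hypothetical prices, not the article's test data.
sample_prices = [11.11, 28.31, 89.91, 120.11, 300.11]
print(histogram(sample_prices, 100))
```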

 

2.2 Count food sales in each price bracket with an interval of 100, filtering out every bucket whose sales count is 0

GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100,
        "min_doc_count": 1
      }
    }
  }
}

 

2.3 Count food sales in each price bracket with an interval of 100; if a document has no price, treat its price as 250

A document with an empty price has to be added first for the demonstration:

PUT food/_doc/8
{
  "CreateTime":"2022-04-10 13:11:11",
  "Desc":"獼猴桃 對身體很有好處",
  "Level":"高級水果",
  "Name":"獼猴桃",
  "Price":"",
  "Tags":["性價比","水果","保健"],
  "Type":"水果"
}

The query is as follows:

GET food/_search?size=0
{
  "aggs": {
    "price_histogram": {
      "histogram": {
        "field": "Price",
        "interval": 100,
        "missing": 250
      }
    }
  }
}

The search result is as follows:

{
  "took" : 106,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_histogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 4
        },
        {
          "key" : 100.0,
          "doc_count" : 2
        },
        {
          "key" : 200.0,
          "doc_count" : 1
        },
        {
          "key" : 300.0,
          "doc_count" : 1
        }
      ]
    }
  }
}

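Note that the 200-300 bucket was originally empty. After a document with an empty price was inserted and missing was set to 250, ES treats every document without a price as if its price were 250, so the 200-300 bucket now has a doc_count of 1.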
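
The substitution can be pictured as a step that runs before bucketing. In this Python sketch, `None` stands in for a document with no price; the function name and the sample prices are assumptions for illustration only.

```python
import math


def histogram_with_missing(prices, interval, missing):
    """Bucket prices by interval, substituting `missing` for absent values."""
    counts = {}
    for price in prices:
        # None stands for a document that has no value in the field.
        effective = missing if price is None else price
        key = math.floor(effective / interval) * interval
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical prices; None models the document with an empty Price.
sample_prices = [11.11, 120.11, 300.11, None]
print(histogram_with_missing(sample_prices, 100, missing=250))
```

The document without a price is counted in the bucket with key 200, because 250 rounds down to 200 at an interval of 100.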

 

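2.4 The keyed parameter

The keyed parameter only changes how the buckets are presented: they are returned as key/value pairs instead of an array. It is not covered further here.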

 

3. Date histogram (see the official documentation)

 

3.1 Aggregate by date: count the sales of all food for each month

GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", // one bucket per calendar month
        "format": "yyyy-MM", // display bucket keys as year-month
        "min_doc_count": 1 // drop buckets whose count is 0
      }
    }
  }
}

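calendar_interval is used as the time interval here. Note the units it supports: minute => 1m, hour => 1h, day => 1d, week => 1w, month => 1M, quarter => 1q, year => 1y. The smallest unit is a minute and the largest is a year. Only single units are accepted; a multiple such as 2d is rejected.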

 

3.2 Aggregate by time: count the sales of all food for each millisecond

GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "fixed_interval": "1ms", // one bucket per millisecond
        "min_doc_count": 1 // drop buckets whose count is 0
      }
    }
  }
}

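fixed_interval is used as the time interval here. Note the units it supports: ms, s, m, h, d. The smallest unit is a millisecond and the largest is a day.

There is a serious problem here: bucketing by millisecond can make ES produce an enormous number of buckets, which can stall the cluster and severely affect writes. Use it with caution, and always restrict the time range with a query or filter first.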
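
A quick back-of-the-envelope calculation shows why the interval matters. The Python sketch below computes an upper bound on the number of fixed-interval buckets in a time window; the one-day window and the function name are assumptions for illustration, and the real number of buckets also depends on how many distinct timestamps the data contains.

```python
from datetime import datetime, timedelta


def max_bucket_count(start, end, interval):
    """Upper bound on the number of fixed-interval buckets between start and end."""
    span = end - start
    # Ceiling division on timedeltas via negated floor division.
    return -(-span // interval)


# A hypothetical one-day window.
day_start = datetime(2022, 7, 1)
day_end = datetime(2022, 7, 2)
print(max_bucket_count(day_start, day_end, timedelta(milliseconds=1)))
print(max_bucket_count(day_start, day_end, timedelta(hours=1)))
```

One day can hold up to 86,400,000 one-millisecond buckets but only 24 one-hour buckets, which is why narrowing the time range before aggregating is so important.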

 

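3.3 Count food sales for each month of this year

Note that with the monthly aggregation above, if January has no data, ES does not return a bucket for January. That clearly does not meet the requirement, so January has to be shown with a count of 0:

GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", // one bucket per calendar month
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        }
      }
    }
  }
}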

The search result is as follows:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "adate_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01 00:00:00",
          "key" : 1640995200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-02-01 00:00:00",
          "key" : 1643673600000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-03-01 00:00:00",
          "key" : 1646092800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-04-01 00:00:00",
          "key" : 1648771200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2022-05-01 00:00:00",
          "key" : 1651363200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-06-01 00:00:00",
          "key" : 1654041600000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2022-07-01 00:00:00",
          "key" : 1656633600000,
          "doc_count" : 4
        },
        {
          "key_as_string" : "2022-08-01 00:00:00",
          "key" : 1659312000000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-09-01 00:00:00",
          "key" : 1661990400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-10-01 00:00:00",
          "key" : 1664582400000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-11-01 00:00:00",
          "key" : 1667260800000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-12-01 00:00:00",
          "key" : 1669852800000,
          "doc_count" : 0
        }
      ]
    }
  }
}

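The result counts the documents for each month from January through December.

Note how extended_bounds and min_doc_count work together. When extended_bounds is used to fill in empty intervals, min_doc_count must be 0. As mentioned above, min_doc_count filters out buckets with too few documents, so a value of 1 would remove the empty buckets that extended_bounds adds, which defeats the purpose.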
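
The interaction can be sketched in a few lines of Python: first every month inside the bounds gets a bucket, then min_doc_count is applied as a filter. The function names are assumptions for this illustration; the three non-empty bucket keys are the epoch-millisecond keys for April, June, and July 2022 from the result above.

```python
from datetime import datetime, timezone


def month_keys(year):
    """Epoch-millisecond keys for the first instant of each month (UTC)."""
    return [
        int(datetime(year, month, 1, tzinfo=timezone.utc).timestamp() * 1000)
        for month in range(1, 13)
    ]


def fill_months(counts, year, min_doc_count=0):
    """Add a zero bucket for every month without data, then apply min_doc_count."""
    filled = {key: counts.get(key, 0) for key in month_keys(year)}
    return {key: n for key, n in filled.items() if n >= min_doc_count}


# Non-empty buckets from the result above (April, June, July 2022).
counts = {1648771200000: 1, 1654041600000: 3, 1656633600000: 4}
print(len(fill_months(counts, 2022, min_doc_count=0)))
print(len(fill_months(counts, 2022, min_doc_count=1)))
```

With min_doc_count set to 0 all 12 months are kept; with 1, only the 3 months that have data survive, so the padding is lost.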

 

3.4 Count food sales for each month of this year, sorted by each month's sales count

GET food/_search?size=0
{
  "aggs": {
    "adate_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", // one bucket per calendar month
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        },
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}

 

3.5 Count food sales for each month of this year, and compute the running total up to each month (the sum of product prices)

GET food/_search?size=0
{
  "aggs": {
    "date_histogram": {
      "date_histogram": {
        "field": "CreateTime",
        "calendar_interval": "month", // one bucket per calendar month
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2022-01-01 00:00:00",
          "max": "2022-12-31 00:00:00"
        }
      },
      "aggs": {
        "sum_agg": {
          "sum": {
            "field": "Price"
          }
        },
        "cumulative_sum_agg":{
          "cumulative_sum": {
            "buckets_path": "sum_agg"
          }
        }
      }
    }
  }
}

The search result is as follows:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "date_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2022-01-01 00:00:00",
          "key" : 1640995200000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-02-01 00:00:00",
          "key" : 1643673600000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-03-01 00:00:00",
          "key" : 1646092800000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-04-01 00:00:00",
          "key" : 1648771200000,
          "doc_count" : 1,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-05-01 00:00:00",
          "key" : 1651363200000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 0.0
          }
        },
        {
          "key_as_string" : "2022-06-01 00:00:00",
          "key" : 1654041600000,
          "doc_count" : 3,
          "sum_agg" : {
            "value" : 89.32999992370605
          },
          "cumulative_sum_agg" : {
            "value" : 89.32999992370605
          }
        },
        {
          "key_as_string" : "2022-07-01 00:00:00",
          "key" : 1656633600000,
          "doc_count" : 4,
          "sum_agg" : {
            "value" : 511.43998622894287
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-08-01 00:00:00",
          "key" : 1659312000000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-09-01 00:00:00",
          "key" : 1661990400000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-10-01 00:00:00",
          "key" : 1664582400000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-11-01 00:00:00",
          "key" : 1667260800000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        },
        {
          "key_as_string" : "2022-12-01 00:00:00",
          "key" : 1669852800000,
          "doc_count" : 0,
          "sum_agg" : {
            "value" : 0.0
          },
          "cumulative_sum_agg" : {
            "value" : 600.7699861526489
          }
        }
      ]
    }
  }
}

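The result shows that, along with each month's sales count, the query computes each month's sales amount, and cumulative_sum computes the running total of sales for the current month and all earlier months.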
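
The running total is simply a prefix sum over the per-bucket sums. The Python sketch below reproduces it from the monthly sum_agg values in the result above; the function name is an assumption for illustration.

```python
from itertools import accumulate

# Monthly sum_agg values from the result above, January through December 2022.
monthly_sums = [0.0, 0.0, 0.0, 0.0, 0.0,
                89.32999992370605, 511.43998622894287,
                0.0, 0.0, 0.0, 0.0, 0.0]


def cumulative_sum(values):
    """Running total: element i is the sum of values[0] through values[i]."""
    return list(accumulate(values))


running = cumulative_sum(monthly_sums)
# June, July, and December running totals.
print(running[5], running[6], running[11])
```

Months without sales add 0, so the running total stays flat from August through December.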

 

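4. Auto-interval date histogram (see the official documentation)

The auto-interval date histogram computes the interval from a target number of buckets, which is convenient in some scenarios. For example, to look at food sales for each month of this year, set the number of buckets to 12:

GET food/_search?size=0
{
  "aggs": {
    "auto_date_histogram_aggs": {
      "auto_date_histogram": {
        "field": "CreateTime",
        "buckets": 12,
        "format": "yyyy-MM-dd"
      }
    }
  }
}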

The result is as follows:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "auto_date_histogram_aggs" : {
      "buckets" : [
        {
          "key_as_string" : "2022-04-01",
          "key" : 1648771200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2022-05-01",
          "key" : 1651363200000,
          "doc_count" : 0
        },
        {
          "key_as_string" : "2022-06-01",
          "key" : 1654041600000,
          "doc_count" : 3
        },
        {
          "key_as_string" : "2022-07-01",
          "key" : 1656633600000,
          "doc_count" : 4
        }
      ],
      "interval" : "1M"
    }
  }
}

Note that the interval in the result is 1M, which ES derived automatically from the number of buckets.
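
The idea of deriving an interval from a bucket target can be sketched as follows. This is a deliberately simplified Python illustration, not ES's actual algorithm: the candidate list, the 30-day month, the 365-day year, and the sample dates are all assumptions.

```python
from datetime import datetime, timedelta

# Candidate intervals from finest to coarsest. Month and year lengths are
# approximations chosen for this sketch only.
CANDIDATES = [
    ("1s", timedelta(seconds=1)),
    ("1m", timedelta(minutes=1)),
    ("1h", timedelta(hours=1)),
    ("1d", timedelta(days=1)),
    ("1M", timedelta(days=30)),
    ("1y", timedelta(days=365)),
]


def pick_interval(start, end, target_buckets):
    """Return the finest candidate that needs at most target_buckets buckets."""
    span = end - start
    for name, width in CANDIDATES:
        needed = span // width + 1
        if needed <= target_buckets:
            return name
    # Fall back to the coarsest candidate if nothing fits.
    return CANDIDATES[-1][0]


# Hypothetical data range of roughly 100 days with a target of 12 buckets.
print(pick_interval(datetime(2022, 4, 10), datetime(2022, 7, 20), 12))
```

Daily buckets would need about a hundred buckets for this range, so the sketch moves up to monthly buckets, which fit within the target.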

 

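5. Percentiles: percentage-based statistics

Given a list of percentages, compute the corresponding values: here, the prices at or below which 20%, 40%, 60%, 80%, and 99% of products fall.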

GET food/_search?size=0
{
  "aggs": {
    "percentiles_agg": {
      "percentiles": {
        "field": "Price",
        "percents": [
          20,
          40,
          60,
          80,
          99
        ]
      }
    }
  }
}

The result is as follows:

{
  "took" : 29,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "percentiles_agg" : {
      "values" : {
        "20.0" : 11.109999656677246,
        "40.0" : 28.309999942779555,
        "60.0" : 89.91000061035156,
        "80.0" : 120.10999908447278,
        "99.0" : 300.1099853515625
      }
    }
  }
}

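The result shows that 20% of products are priced at or below about 11.11, 40% at or below about 28.31, and 99% at or below about 300.11.

This is commonly used to check whether an API meets its latency target. If a response within 100 ms counts as acceptable, the 99th-percentile value must be at or below 100 for the API to pass, and so on.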
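
The latency check can be sketched with an exact percentile computed in Python. ES computes percentiles with an approximate algorithm, so its numbers can differ slightly from an exact calculation like this one; the function name, the linear-interpolation rule, and the sample latencies are assumptions for illustration.

```python
def percentile(values, percent):
    """Exact percentile using linear interpolation between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional position of the requested percentile in the sorted list.
    position = (len(ordered) - 1) * percent / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical response times in milliseconds.
latencies_ms = [12, 18, 25, 40, 95]
p99 = percentile(latencies_ms, 99)
# The API meets a 100 ms target when the 99th percentile is at or below 100.
print(p99, p99 <= 100)
```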

 

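6. Percentile ranks: percentage-based statistics

This is the inverse of Percentiles, and both are percentage-based statistics: percentile_ranks computes the percentage of values that fall at or below each given value, whereas percentiles computes the value at each given percentage.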

GET food/_search?size=0
{
  "aggs": {
    "percentile_ranks_agg": {
       "percentile_ranks": {
         "field": "Price",
         "values": [
           100,
           200,
           300,
           400
         ]
       }
    }
  }
}

The result is as follows:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "percentile_ranks_agg" : {
      "values" : {
        "100.0" : 71.33613394088104,
        "200.0" : 85.69857243009302,
        "300.0" : 100.0,
        "400.0" : 100.0
      }
    }
  }
}

This shows that about 71.3% of products are priced at or below 100, about 85.7% at or below 200, and so on.
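
In its simplest form, a percentile rank is the share of values at or below a threshold. The Python sketch below computes that share exactly; ES approximates and interpolates, which is why its result contains figures such as 71.3 rather than round fractions. The function name and the sample prices are assumptions for illustration.

```python
def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to threshold."""
    if not values:
        raise ValueError("values must not be empty")
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


# Hypothetical prices, not the article's test data.
sample_prices = [11.11, 28.31, 89.91, 120.11, 300.11]
for threshold in (100, 200, 300, 400):
    print(threshold, percentile_rank(sample_prices, threshold))
```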

 

 

That is all for this article; see the official documentation for the remaining aggregations.
