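Elasticsearch 7.x: Common Metric and Bucket Aggregation Searches via the RESTful API

Elasticsearch aggregations are used for complex analysis and statistics over data. They fall into four main families: metric aggregations, bucket aggregations, pipeline aggregations, and matrix aggregations. Metric and bucket aggregations are the most commonly used.

The test data in this article is the official shakespeare sample dataset, which can be downloaded from the Elasticsearch website. The content follows the official documentation.

1 Metric Aggregations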

1.1 Max Aggregation

Max Aggregation returns the maximum value of a field, for example the largest line_id in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "max_line_id": {
      "max": {
        "field": "line_id"
      }
    }
  }
}

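max_line_id is the name of the result and can be any string. The key beneath it is the aggregation type: here max selects Max Aggregation, and field names the document field to aggregate on. Setting size to 0 suppresses the hits so that only the aggregation result is returned.
This is similar to select max(line_id) from shakespeare in MySQL.
Query result: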

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_line_id" : {
      "value" : 111396.0
    }
  }
}

The result is under aggregations: the maximum value is 111396.
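
The request shape described above is shared by the single-field metric aggregations in this article, so it can be sketched as a small Python helper. This is only an illustration: metric_agg_body is a hypothetical name, not part of any Elasticsearch client, and the sketch builds the JSON body without sending it anywhere.

```python
import json


def metric_agg_body(name, agg_type, field):
    """Build a search body holding one single-field metric aggregation.

    Hypothetical helper for illustration only.
    """
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {name: {agg_type: {"field": field}}},
    }


# Same body as the Max Aggregation request shown above.
body = metric_agg_body("max_line_id", "max", "line_id")
print(json.dumps(body, indent=2))
```

Swapping "max" for "min", "avg", "sum", "stats", or "value_count" yields the other request bodies in this section.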

1.2 Min Aggregation

The opposite of Max Aggregation, Min Aggregation returns the minimum value of a field, for example the smallest line_id in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "min_line_id": {
      "min": {
        "field": "line_id"
      }
    }
  }
}

The result is likewise found under aggregations.

1.3 Avg Aggregation

Avg Aggregation computes the average of a field, for example the average line_id in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "avg_line_id": {
      "avg": {
        "field": "line_id"
      }
    }
  }
}

The result is likewise found under aggregations.

1.4 Sum Aggregation

Sum Aggregation computes the sum of a field, for example the sum of line_id in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "sum_line_id": {
      "sum": {
        "field": "line_id"
      }
    }
  }
}

1.5 Cardinality Aggregation

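Cardinality Aggregation counts distinct values: it works like a distinct deduplication in SQL followed by a count of the resulting set. The count is approximate for fields with many distinct values. For example, the following query counts the distinct values of play_name, that is, the number of plays: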

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "player_sum": {
      "cardinality": {
        "field": "play_name.keyword"
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "player_sum" : {
      "value" : 36
    }
  }
}

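This means there are 36 distinct plays. Despite the aggregation name player_sum, the field being counted is play_name, not the characters.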
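
To make the semantics concrete, here is a minimal Python sketch of an exact distinct count over a small in-memory sample. The sample documents and the cardinality function are made up for illustration; Elasticsearch itself computes the value with an approximate algorithm rather than by building a set.

```python
def cardinality(docs, field):
    """Exact distinct count of a field's values; docs lacking the field are skipped."""
    return len({doc[field] for doc in docs if field in doc})


# Made-up sample, not taken from the real index.
docs = [
    {"line_id": 1, "play_name": "Hamlet"},
    {"line_id": 2, "play_name": "Hamlet"},
    {"line_id": 3, "play_name": "Othello"},
    {"line_id": 4, "play_name": "King Lear"},
    {"line_id": 5},  # no play_name: contributes nothing
]

distinct_plays = cardinality(docs, "play_name")
```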

1.6 Stats Aggregation

Stats Aggregation returns basic statistics: count, max, min, avg, and sum. For example, for line_id:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_stats": {
      "stats": {
        "field": "line_id"
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_stats" : {
      "count" : 110486,
      "min" : 4.0,
      "max" : 111396.0,
      "avg" : 55715.89386890647,
      "sum" : 6.15582625E9
    }
  }
}

1.7 Extended Stats Aggregation

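Extended Stats Aggregation adds four fields to those of Stats Aggregation: the sum of squares, the variance, the standard deviation, and the interval of the mean plus or minus two standard deviations. For example: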

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_stats": {
      "extended_stats": {
        "field": "line_id"
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_stats" : {
      "count" : 110486,
      "min" : 4.0,
      "max" : 111396.0,
      "avg" : 55715.89386890647,
      "sum" : 6.15582625E9,
      "sum_of_squares" : 4.57201930511864E14,
      "variance" : 1.0338374861198297E9,
      "std_deviation" : 32153.34331169668,
      "std_deviation_bounds" : {
        "upper" : 120022.58049229984,
        "lower" : -8590.792754486894
      }
    }
  }
}
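
The relationship between these fields can be sketched in Python. This is an illustration under two assumptions: the variance is the population variance (which is consistent with the numbers above), and the bounds use two standard deviations. The extended_stats function and the sample values are hypothetical.

```python
import math


def extended_stats(values, sigma=2.0):
    """Compute the extended statistics of a non-empty list of numbers."""
    count = len(values)
    total = sum(values)
    avg = total / count
    # Population variance: mean of squared deviations from the average.
    variance = sum((v - avg) ** 2 for v in values) / count
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {
            "upper": avg + sigma * std,
            "lower": avg - sigma * std,
        },
    }


# Made-up sample values.
stats = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
```

In the real result above, avg plus or minus twice std_deviation reproduces the reported upper and lower bounds.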

1.8 Percentiles Aggregation

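Percentiles Aggregation computes percentiles. Conceptually, the values of a field are sorted in ascending order, and the value found at a given cumulative percentage is that percentile. Elasticsearch computes these values approximately. For example: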

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_percent": {
      "percentiles": {
        "field": "line_id",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_percent" : {
      "values" : {
        "1.0" : 1115.3600000000001,
        "5.0" : 5575.834045307443,
        "25.0" : 27887.286615736997,
        "50.0" : 55711.257765161325,
        "75.0" : 83561.89545235902,
        "95.0" : 105830.47105865781,
        "99.0" : 110287.32171428572
      }
    }
  }
}
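
As a rough illustration of what a percentile is, the sketch below sorts the values in ascending order and interpolates linearly between the two closest ranks. This is an exact textbook method, not the approximate algorithm Elasticsearch uses, so results on real data may differ slightly. The percentile function and sample data are hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)  # ascending order
    rank = (len(ordered) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


# Made-up sample: the integers 1 through 100.
values = list(range(1, 101))
median = percentile(values, 50)
```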

1.9 Value Count Aggregation

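Value Count Aggregation counts the values extracted from a field. For example, the following counts the line_id values, which here equals the number of documents that have a line_id field: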

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_count": {
      "value_count": {
        "field": "line_id"
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_count" : {
      "value" : 110486
    }
  }
}

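2 Bucket Aggregations

Bucket aggregations are similar to GROUP BY in SQL: they go through the documents and place each one into a bucket according to its content.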

2.1 Terms Aggregation

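Terms Aggregation groups documents by the value of a field. For example, grouping documents by play_name and counting the documents in each group is equivalent to select play_name, count(*) from shakespeare group by play_name: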

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "terms": {
        "field": "play_name.keyword",
        "size": 10
      }
    }
  }
}

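field corresponds to the column named after GROUP BY, and size limits the result to the 10 buckets with the most documents.
Query result:

{
  # other fields omitted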
  "aggregations" : {
    "per_player" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 72631,
      "buckets" : [
        {
          "key" : "Hamlet",
          "doc_count" : 4219
        },
        {
          "key" : "Coriolanus",
          "doc_count" : 3958
        },
        {
          "key" : "Cymbeline",
          "doc_count" : 3927
        },
        {
          "key" : "Richard III",
          "doc_count" : 3911
        },
        {
          "key" : "Antony and Cleopatra",
          "doc_count" : 3815
        },
        {
          "key" : "Othello",
          "doc_count" : 3742
        },
        {
          "key" : "King Lear",
          "doc_count" : 3735
        },
        {
          "key" : "Troilus and Cressida",
          "doc_count" : 3682
        },
        {
          "key" : "A Winters Tale",
          "doc_count" : 3469
        },
        {
          "key" : "Henry VIII",
          "doc_count" : 3397
        }
      ]
    }
  }
}
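
The grouping and the size cutoff can be sketched in Python with a counter. This is a single-process illustration with made-up sample documents and a hypothetical terms_agg function; on a multi-shard index the real counts can be approximate, which is what doc_count_error_upper_bound reports.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Group docs by a field and keep the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    top = counts.most_common(size)  # sorted by count, descending
    return {
        # Documents that fall into buckets beyond the first `size`.
        "sum_other_doc_count": sum(counts.values()) - sum(n for _, n in top),
        "buckets": [{"key": key, "doc_count": n} for key, n in top],
    }


# Made-up sample: 3 Hamlet lines, 2 Othello lines, 1 King Lear line.
docs = (
    [{"play_name": "Hamlet"}] * 3
    + [{"play_name": "Othello"}] * 2
    + [{"play_name": "King Lear"}]
)

result = terms_agg(docs, "play_name", size=2)
```

In the real result above, the ten doc_count values sum to 37855, and adding sum_other_doc_count (72631) gives 110486, the same count reported by Stats Aggregation.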

2.2 Filter Aggregation

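Filter Aggregation is a single-bucket aggregation: it collects the documents that match the filter into one bucket, on which sub-aggregations can then run. For example: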

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "filter": {
        "term": {
          "text_entry": "apple"
        }
      },
      "aggs": {
        "player": {
          "terms": {
            "field": "play_name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

The query above finds the documents whose text_entry contains the word apple and then groups them by play_name.
Query result:

{
  # other fields omitted
  "aggregations" : {
    "per_player" : {
      "doc_count" : 10,
      "player" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "Taming of the Shrew",
            "doc_count" : 2
          },
          {
            "key" : "Twelfth Night",
            "doc_count" : 2
          },
          {
            "key" : "A Midsummer nights dream",
            "doc_count" : 1
          },
          {
            "key" : "Henry IV",
            "doc_count" : 1
          },
          {
            "key" : "King Lear",
            "doc_count" : 1
          },
          {
            "key" : "Loves Labours Lost",
            "doc_count" : 1
          },
          {
            "key" : "Merchant of Venice",
            "doc_count" : 1
          },
          {
            "key" : "The Tempest",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}

2.3 Filters Aggregation

Unlike Filter Aggregation, Filters Aggregation accepts multiple filters, and each filter produces its own bucket. For example:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "filters": {
        "filters": [
          {"match": { "text_entry": "apple" } }
        ]
      }, 
      "aggs": {
        "player": {
          "terms": {
            "field": "play_name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

The filters array can hold multiple filters; the buckets are returned in the same order as the filters.
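
As a sketch of a request with more than one filter, the Python helper below builds a body that uses the keyed form of filters, where each filter has a name and its bucket is returned under that name. The helper filters_agg_body and the second search word pear are assumptions made for illustration; the sketch only builds the body and does not send it.

```python
import json


def filters_agg_body(agg_name, field, words):
    """Build a filters aggregation with one named match filter per word.

    Hypothetical helper for illustration only.
    """
    named_filters = {word: {"match": {field: word}} for word in words}
    return {
        "size": 0,  # return no hits, only the aggregation result
        "aggs": {agg_name: {"filters": {"filters": named_filters}}},
    }


# "pear" is an assumed second search word, not from the original query.
body = filters_agg_body("per_word", "text_entry", ["apple", "pear"])
print(json.dumps(body, indent=2))
```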

2.4 Range Aggregation

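Range Aggregation buckets documents by value range and shows how the data is distributed. Each range includes its from value and excludes its to value. For example, to aggregate line_id into the ranges 0 to 10000, 10000 to 50000, and 50000 and above: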

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "id_range": {
      "range": {
        "field": "line_id",
        "ranges": [
          { "from": 0, "to": 10000 },
          { "from": 10000, "to": 50000},
          { "from": 50000 }
        ]
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "id_range" : {
      "buckets" : [
        {
          "key" : "0.0-10000.0",
          "from" : 0.0,
          "to" : 10000.0,
          "doc_count" : 9909
        },
        {
          "key" : "10000.0-50000.0",
          "from" : 10000.0,
          "to" : 50000.0,
          "doc_count" : 39664
        },
        {
          "key" : "50000.0-*",
          "from" : 50000.0,
          "doc_count" : 60913
        }
      ]
    }
  }
}
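
The bucket boundaries can be sketched in Python: a value falls into a range when it is greater than or equal to from and strictly less than to, and a missing bound leaves that side open. The range_agg function and the sample values are made up for illustration.

```python
def range_agg(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1
            for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        buckets.append({"from": lo, "to": hi, "doc_count": count})
    return buckets


ranges = [
    {"from": 0, "to": 10000},
    {"from": 10000, "to": 50000},
    {"from": 50000},
]

# Made-up sample values chosen to sit on the range boundaries.
buckets = range_agg([0, 9999, 10000, 49999, 50000, 111396], ranges)
```

A value of exactly 10000 lands in the second bucket, not the first, so the buckets never double-count. In the real result above, the three doc_count values (9909, 39664, 60913) add up to 110486.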
