Elasticsearch 7.x: RESTful API for Common Metric and Bucket Aggregation Searches

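Elasticsearch aggregations are used to run more complex analysis and statistics over data. They fall into four main families: metric aggregations, bucket aggregations, pipeline aggregations, and matrix aggregations. Metric and bucket aggregations are the most commonly used.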

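The test data used in this article is the official shakespeare sample dataset, which can be downloaded from the Elasticsearch website. The content of this article follows the official documentation.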

1 Metric Aggregations

1.1 Max Aggregation

Max Aggregation finds the maximum value of a field, for example the largest line_id in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "max_line_id": {
      "max": {
        "field": "line_id"
      }
    }
  }
}

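max_line_id is the name of the result and can be any string. The key under max_line_id is the aggregation type: max selects Max Aggregation, and field specifies the document field to aggregate on. Setting "size": 0 suppresses the matching documents so that only the aggregation result is returned.
This is similar to select max(line_id) from shakespeare in MySQL.
Query result: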

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_line_id" : {
      "value" : 111396.0
    }
  }
}

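The result is in aggregations: the maximum value is 111396. The hits.total value of 10000 with relation gte only means that at least 10000 documents matched (7.x stops counting exact hits at 10000 by default); the aggregation itself runs over all matching documents.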
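For readers who issue these requests from code rather than from the Kibana console, the request body is ordinary JSON and the result is read from the aggregations key. The following is a minimal Python sketch, assuming the response has already been parsed into a dict. The helper names metric_agg_body and agg_value are made up for illustration, and no HTTP client is involved.

```python
import json


def metric_agg_body(name, kind, field):
    """Build the body of a single-value metric aggregation request.

    kind is the aggregation type, e.g. "max", "min", "avg" or "sum".
    """
    return {"size": 0, "aggs": {name: {kind: {"field": field}}}}


def agg_value(response, name):
    """Read the value of a single-value metric aggregation from a response."""
    return response["aggregations"][name]["value"]


# Same body as the Max Aggregation request in this section.
body = metric_agg_body("max_line_id", "max", "line_id")
print(json.dumps(body, indent=2))

# Trimmed copy of the response from this section.
response = {"aggregations": {"max_line_id": {"value": 111396.0}}}
print(agg_value(response, "max_line_id"))
```

The same two helpers cover min, avg, and sum, because those aggregations share the request and response shape and differ only in the aggregation type.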

1.2 Min Aggregation

The opposite of Max Aggregation, Min Aggregation finds the minimum value of a field, for example the smallest line_id in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "min_line_id": {
      "min": {
        "field": "line_id"
      }
    }
  }
}

As before, the result is returned in aggregations.

1.3 Avg Aggregation

Avg Aggregation computes the average, for example the average of the line_id field in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "avg_line_id": {
      "avg": {
        "field": "line_id"
      }
    }
  }
}

The result is again returned in aggregations.

1.4 Sum Aggregation

Sum Aggregation computes the sum, for example the sum of the line_id field in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "sum_line_id": {
      "sum": {
        "field": "line_id"
      }
    }
  }
}

1.5 Cardinality Aggregation

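Cardinality Aggregation counts distinct values: it works like a distinct de-duplication in SQL followed by counting the size of the resulting set. The count is approximate (Elasticsearch uses the HyperLogLog++ algorithm), although it is expected to be close to exact for low cardinalities such as this one. For example, the following query counts the number of distinct plays (the aggregation is named player_sum, but the field it aggregates on is play_name):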

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "player_sum": {
      "cardinality": {
        "field": "play_name.keyword"
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "player_sum" : {
      "value" : 36
    }
  }
}

That is, there are 36 distinct plays.
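The distinct-then-count semantics can be illustrated in a few lines of Python. This is only a sketch of the meaning of the result: it computes an exact count in memory with a set, whereas Elasticsearch uses an approximate algorithm. The sample documents and the helper name exact_cardinality are hypothetical.

```python
def exact_cardinality(docs, field):
    """Count the distinct values of field across docs (exact, in memory)."""
    return len({doc[field] for doc in docs if field in doc})


# Hypothetical sample documents, not taken from the real dataset.
docs = [
    {"play_name": "Hamlet", "line_id": 1},
    {"play_name": "Hamlet", "line_id": 2},
    {"play_name": "Othello", "line_id": 3},
    {"line_id": 4},  # no play_name: contributes nothing
]

distinct_plays = exact_cardinality(docs, "play_name")
print(distinct_plays)  # 2
```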

1.6 Stats Aggregation

Stats Aggregation computes basic statistics and returns count, max, min, avg, and sum in a single request, for example for line_id:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_stats": {
      "stats": {
        "field": "line_id"
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_stats" : {
      "count" : 110486,
      "min" : 4.0,
      "max" : 111396.0,
      "avg" : 55715.89386890647,
      "sum" : 6.15582625E9
    }
  }
}
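To make the five fields concrete, here is a small Python sketch that computes the same statistics over an in-memory list. It assumes a non-empty list of numbers; the function name basic_stats and the sample values are made up for illustration.

```python
def basic_stats(values):
    """Return count, min, max, avg and sum, like the stats aggregation.

    Assumes values is a non-empty list of numbers.
    """
    count = len(values)
    total = float(sum(values))
    return {
        "count": count,
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / count,  # avg is simply sum divided by count
        "sum": total,
    }


sample = [4, 8, 15, 16, 23, 42]  # hypothetical values
stats = basic_stats(sample)
print(stats)
```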

1.7 Extended Stats Aggregation

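Extended Stats Aggregation returns four more fields than Stats Aggregation: the sum of squares, the variance, the standard deviation, and the interval of the mean plus or minus two standard deviations, for example: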

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_stats": {
      "extended_stats": {
        "field": "line_id"
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_stats" : {
      "count" : 110486,
      "min" : 4.0,
      "max" : 111396.0,
      "avg" : 55715.89386890647,
      "sum" : 6.15582625E9,
      "sum_of_squares" : 4.57201930511864E14,
      "variance" : 1.0338374861198297E9,
      "std_deviation" : 32153.34331169668,
      "std_deviation_bounds" : {
        "upper" : 120022.58049229984,
        "lower" : -8590.792754486894
      }
    }
  }
}
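The extra fields are related by simple formulas. In the result for line_id, the variance equals sum_of_squares / count minus avg squared (a population variance), and the bounds are 55715.89 plus or minus 2 × 32153.34, which gives 120022.58 and -8590.79. The Python sketch below reproduces these relations on a small list. The function name extended_stats and the sample values are hypothetical, and the sigma parameter simply mirrors the "two standard deviations" default described in this section.

```python
import math


def extended_stats(values, sigma=2.0):
    """Compute the extra extended_stats fields using the population variance.

    Assumes values is a non-empty list of numbers.
    """
    count = len(values)
    avg = float(sum(values)) / count
    sum_of_squares = float(sum(v * v for v in values))
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std_deviation = math.sqrt(variance)
    return {
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


es = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical values, mean 5
print(es)
```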

1.8 Percentiles Aggregation

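Percentiles Aggregation computes percentiles: the values of a field are sorted in ascending order and the cumulative percentage is computed; the value below which a given percentage of the data falls is the percentile for that percentage (the 50th percentile is the median). For example: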

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_percent": {
      "percentiles": {
        "field": "line_id",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_percent" : {
      "values" : {
        "1.0" : 1115.3600000000001,
        "5.0" : 5575.834045307443,
        "25.0" : 27887.286615736997,
        "50.0" : 55711.257765161325,
        "75.0" : 83561.89545235902,
        "95.0" : 105830.47105865781,
        "99.0" : 110287.32171428572
      }
    }
  }
}
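The "sort ascending, then read off the value at a given percentage" idea can be sketched in Python as follows. This is an exact computation with linear interpolation between neighbouring values, which is only one of several percentile definitions. Elasticsearch computes percentiles approximately, so its numbers on real data may differ slightly. The function name percentile and the sample data are made up for illustration.

```python
def percentile(values, p):
    """Exact p-th percentile (0 <= p <= 100) with linear interpolation.

    Assumes values is a non-empty list of numbers.
    """
    ordered = sorted(values)  # ascending order
    rank = p * (len(ordered) - 1) / 100  # fractional index into ordered
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


values = list(range(101))  # hypothetical data: 0, 1, ..., 100
result = {p: percentile(values, p) for p in [1, 5, 25, 50, 75, 95, 99]}
print(result)
```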

1.9 Value Count Aggregation

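Value Count Aggregation counts the number of values of a field; for a field with a single value per document this equals the number of documents that contain the field. For example, the following counts the documents that contain line_id: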

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_count": {
      "value_count": {
        "field": "line_id"
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_count" : {
      "value" : 110486
    }
  }
}
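The difference between counting values and counting documents only shows up with missing or multi-valued fields. A minimal Python sketch of the counting rule, using hypothetical documents with a made-up tags field (the shakespeare data has no such field):

```python
def value_count(docs, field):
    """Count every value of field; a list-valued field counts once per element."""
    total = 0
    for doc in docs:
        if field not in doc:
            continue  # documents without the field contribute nothing
        value = doc[field]
        total += len(value) if isinstance(value, list) else 1
    return total


# Hypothetical documents for illustration only.
docs = [
    {"line_id": 1, "tags": ["a", "b"]},
    {"line_id": 2, "tags": ["a"]},
    {"tags": []},
]

print(value_count(docs, "line_id"))  # 2
print(value_count(docs, "tags"))     # 3
```

Unlike a cardinality count, duplicates are not removed: the value "a" is counted twice here.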

2 Bucket Aggregations

Bucket aggregations are similar to GROUP BY in SQL: they go through the documents and place each one into a bucket according to its content.

2.1 Terms Aggregation

Terms Aggregation groups documents by field value. For example, grouping documents by the play_name field and counting the documents in each group is equivalent to select count(*) from shakespeare group by play_name:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "terms": {
        "field": "play_name.keyword",
        "size": 10
      }
    }
  }
}

field corresponds to the column given after GROUP BY, and size limits the result to the 10 buckets with the most documents.
Query result:

{
  # other fields omitted
  "aggregations" : {
    "per_player" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 72631,
      "buckets" : [
        {
          "key" : "Hamlet",
          "doc_count" : 4219
        },
        {
          "key" : "Coriolanus",
          "doc_count" : 3958
        },
        {
          "key" : "Cymbeline",
          "doc_count" : 3927
        },
        {
          "key" : "Richard III",
          "doc_count" : 3911
        },
        {
          "key" : "Antony and Cleopatra",
          "doc_count" : 3815
        },
        {
          "key" : "Othello",
          "doc_count" : 3742
        },
        {
          "key" : "King Lear",
          "doc_count" : 3735
        },
        {
          "key" : "Troilus and Cressida",
          "doc_count" : 3682
        },
        {
          "key" : "A Winters Tale",
          "doc_count" : 3469
        },
        {
          "key" : "Henry VIII",
          "doc_count" : 3397
        }
      ]
    }
  }
}
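In the result for play_name, the ten bucket counts add up to 37855, and together with sum_other_doc_count (72631) they account for all 110486 grouped documents. The Python sketch below mimics this grouping on a single in-memory list. It is an illustration of the result shape only: it ignores sharding and the error bound that comes with it, and the function name terms_agg and the sample documents are hypothetical.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Group docs by field and keep the size largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Order by doc_count descending, then by key ascending for ties.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ordered[:size], ordered[size:]
    return {
        "sum_other_doc_count": sum(count for _, count in rest),
        "buckets": [{"key": key, "doc_count": count} for key, count in top],
    }


# Hypothetical sample documents.
docs = (
    [{"play_name": "Hamlet"}] * 3
    + [{"play_name": "Othello"}] * 2
    + [{"play_name": "King Lear"}]
)

result = terms_agg(docs, "play_name", size=2)
print(result)
```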

2.2 Filter Aggregation

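Filter Aggregation is a single-bucket aggregation: it puts all documents that match the filter condition into one bucket, on which sub-aggregations can then run. For example: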

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "filter": {
        "term": {
          "text_entry": "apple"
        }
      },
      "aggs": {
        "player": {
          "terms": {
            "field": "play_name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

The query above finds the documents whose text_entry contains the word apple and groups them by play_name.
Query result:

{
  # other fields omitted
  "aggregations" : {
    "per_player" : {
      "doc_count" : 10,
      "player" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "Taming of the Shrew",
            "doc_count" : 2
          },
          {
            "key" : "Twelfth Night",
            "doc_count" : 2
          },
          {
            "key" : "A Midsummer nights dream",
            "doc_count" : 1
          },
          {
            "key" : "Henry IV",
            "doc_count" : 1
          },
          {
            "key" : "King Lear",
            "doc_count" : 1
          },
          {
            "key" : "Loves Labours Lost",
            "doc_count" : 1
          },
          {
            "key" : "Merchant of Venice",
            "doc_count" : 1
          },
          {
            "key" : "The Tempest",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}
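The two-step shape of this result (one filtered bucket with its doc_count, then a terms grouping inside it) can be sketched in Python. The word matching here is a crude stand-in for the real query: it lowercases and splits on whitespace, so punctuation attached to a word would prevent a match. The helper names and the sample text are invented for illustration and are not quotations from the dataset.

```python
from collections import Counter


def contains_word(field, word):
    """Build a predicate: does doc[field] contain word as a whitespace token?"""
    def predicate(doc):
        return word in doc.get(field, "").lower().split()
    return predicate


def filter_agg(docs, predicate):
    """Single bucket: the documents for which predicate is true."""
    return [doc for doc in docs if predicate(doc)]


# Hypothetical documents with made-up text.
docs = [
    {"play_name": "Twelfth Night", "text_entry": "a red apple"},
    {"play_name": "Twelfth Night", "text_entry": "no fruit here"},
    {"play_name": "King Lear", "text_entry": "one more Apple"},
]

bucket = filter_agg(docs, contains_word("text_entry", "apple"))
per_play = Counter(doc["play_name"] for doc in bucket)  # sub-aggregation
print(len(bucket), dict(per_play))
```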

2.3 Filters Aggregation

Compared with Filter Aggregation, Filters Aggregation accepts multiple filters and returns one bucket per filter. For example:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "filters": {
        "filters": [
          {"match": { "text_entry": "apple" } }
        ]
      },
      "aggs": {
        "player": {
          "terms": {
            "field": "play_name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

Multiple filters can be defined in the filters array.
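Since the request in this section defines only one filter, the following Python sketch shows what a request body with two filters could look like, written as a dict, together with a small in-memory imitation of the "one bucket per filter" behaviour. The bucket labels apple and pear, the aggregation name per_word, the function filters_agg, and the sample documents are all assumptions made for this illustration.

```python
import json

# Request body with two named filters (labels chosen for illustration).
body = {
    "size": 0,
    "aggs": {
        "per_word": {
            "filters": {
                "filters": {
                    "apple": {"match": {"text_entry": "apple"}},
                    "pear": {"match": {"text_entry": "pear"}},
                }
            }
        }
    },
}
print(json.dumps(body, indent=2))


def filters_agg(docs, named_predicates):
    """One bucket per named predicate; a doc may land in several buckets."""
    return {
        name: sum(1 for doc in docs if predicate(doc))
        for name, predicate in named_predicates.items()
    }


# Hypothetical documents with made-up text.
docs = [
    {"text_entry": "an apple and a pear"},
    {"text_entry": "a sour apple"},
    {"text_entry": "nothing relevant"},
]
buckets = filters_agg(docs, {
    "apple": lambda doc: "apple" in doc["text_entry"].split(),
    "pear": lambda doc: "pear" in doc["text_entry"].split(),
})
print(buckets)
```

The first sample document is counted in both buckets, which is the main behavioural difference from a GROUP BY: the buckets of a filters aggregation need not be disjoint.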

2.4 Range Aggregation

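Range Aggregation buckets documents by value range and shows how the data is distributed. Each range includes its from value and excludes its to value. For example, to aggregate line_id into the ranges 0 to 10000, 10000 to 50000, and 50000 and above: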

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "id_range": {
      "range": {
        "field": "line_id",
        "ranges": [
          { "from": 0, "to": 10000 },
          { "from": 10000, "to": 50000},
          { "from": 50000 }
        ]
      }
    }
  }
}

Query result:

{
  # other fields omitted
  "aggregations" : {
    "id_range" : {
      "buckets" : [
        {
          "key" : "0.0-10000.0",
          "from" : 0.0,
          "to" : 10000.0,
          "doc_count" : 9909
        },
        {
          "key" : "10000.0-50000.0",
          "from" : 10000.0,
          "to" : 50000.0,
          "doc_count" : 39664
        },
        {
          "key" : "50000.0-*",
          "from" : 50000.0,
          "doc_count" : 60913
        }
      ]
    }
  }
}
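Because each range includes its from value and excludes its to value, a line_id of exactly 10000 falls into the second bucket, not the first. In the result for line_id, the three counts (9909 + 39664 + 60913) add up to all 110486 documents. A minimal Python sketch of this bucketing rule, with a hypothetical function name range_agg and made-up sample values:

```python
def range_agg(values, ranges):
    """Count values per range; from is inclusive, to is exclusive."""
    buckets = []
    for r in ranges:
        low = r.get("from")  # None means no lower bound
        high = r.get("to")   # None means no upper bound
        doc_count = sum(
            1
            for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
        buckets.append({"from": low, "to": high, "doc_count": doc_count})
    return buckets


# Same ranges as the request in this section.
ranges = [{"from": 0, "to": 10000}, {"from": 10000, "to": 50000}, {"from": 50000}]
values = [4, 9999, 10000, 49999, 50000, 111396]  # hypothetical line_id values

buckets = range_agg(values, ranges)
print([b["doc_count"] for b in buckets])  # [2, 2, 2]
```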