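(1) Overview
Earlier articles in this ES series covered the core concepts, common operations, the Java API, and a small hands-on demo. Real applications often need more advanced features, so this article covers one of the harder ones: aggregation queries. It is based on ElasticSearch 7.6 and shows both the RESTful query syntax and the Java API.
To follow along you need basic familiarity with ElasticSearch.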
(2) Preparing the data
The test index has five fields: name, age, classroom, gender, and grade.
PUT /test4
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "classroom": {
        "type": "keyword"
      },
      "gender": {
        "type": "keyword"
      },
      "grade": {
        "type": "integer"
      }
    }
  }
}
PUT /test4/_bulk
{"index": {"_id": 1}}
{"name":"張三","age":18,"classroom":"1","gender":"男","grade":80}
{"index": {"_id": 2}}
{"name":"李四","age":20,"classroom":"2","gender":"男","grade":60}
{"index": {"_id": 3}}
{"name":"王五","age":20,"classroom":"2","gender":"女","grade":70}
{"index": {"_id": 4}}
{"name":"趙六","age":19,"classroom":"1","gender":"女","grade":90}
{"index": {"_id": 5}}
{"name":"毛七","age":20,"classroom":"1","gender":"男","grade":90}
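(3) Aggregation queries
Aggregations in ES provide powerful grouping and statistical computation. They fall broadly into four categories:
1. Metrics Aggregation: numeric calculations such as Max, Min, and Avg
2. Bucket Aggregation: bucketing, which simply means grouping documents that share a value into one bucket
3. Pipeline Aggregation: a second-level aggregation over the output of other aggregations
4. Matrix Aggregation: operates on several fields at once and returns a matrix result
The official ES site documents every aggregation. This article walks through the REST syntax and the Java API of the most commonly used ones: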
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.6/java-rest-high-aggregation-builders.html#_metrics_aggregations
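(4) Metrics Aggregation
4.1 AVG
avg computes the average of a numeric value extracted from the aggregated documents. The RESTful query syntax is:
POST /test4/_search
{
  "aggs": {
    "avg_grade": {
      "avg": {
        "field": "grade"
      }
    }
  }
}
The response reports the average under aggregations.avg_grade.value, which is 78.0 for the sample data.
Next is the Java API. The core is the avg method of AggregationBuilders; the seventh line of the snippet is the counterpart of the request above.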
@Test
public void testAvg() throws Exception {
    // ElasticSearchClient is a helper that wraps the creation of the RestHighLevelClient
    RestHighLevelClient client = ElasticSearchClient.getClient();
    SearchRequest request = new SearchRequest("test4");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.aggregation(AggregationBuilders.avg("avg_grade").field("grade")).size(0);
    request.source(searchSourceBuilder);
    SearchResponse search = client.search(request, RequestOptions.DEFAULT);
    // Note: the returned Aggregation has to be treated as a ParsedAvg
    ParsedAvg aggregation = search.getAggregations().get("avg_grade");
    System.out.println(aggregation.getValue()); // prints 78.0
}
The remaining metrics follow the same pattern, so from here on only the code is shown.
4.2 Min
Returns the minimum value of the aggregated data:
POST /test4/_search
{
  "aggs": {
    "min_grade": {
      "min": {
        "field": "grade"
      }
    }
  }
}
@Test
public void testMin() throws Exception {
    // ElasticSearchClient is a helper that wraps the creation of the RestHighLevelClient
    RestHighLevelClient client = ElasticSearchClient.getClient();
    SearchRequest request = new SearchRequest("test4");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.aggregation(AggregationBuilders.min("min_grade").field("grade")).size(0);
    request.source(searchSourceBuilder);
    SearchResponse search = client.search(request, RequestOptions.DEFAULT);
    ParsedMin aggregation = search.getAggregations().get("min_grade");
    System.out.println(aggregation.getValue());
}
4.3 Max
Returns the maximum value of the aggregated data:
POST /test4/_search
{
  "aggs": {
    "max_grade": {
      "max": {
        "field": "grade"
      }
    }
  }
}
@Test
public void testMax() throws Exception {
    // ElasticSearchClient is a helper that wraps the creation of the RestHighLevelClient
    RestHighLevelClient client = ElasticSearchClient.getClient();
    SearchRequest request = new SearchRequest("test4");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.aggregation(AggregationBuilders.max("max_grade").field("grade")).size(0);
    request.source(searchSourceBuilder);
    SearchResponse search = client.search(request, RequestOptions.DEFAULT);
    ParsedMax aggregation = search.getAggregations().get("max_grade");
    System.out.println(aggregation.getValue());
}
4.4 Sum
Returns the sum of the aggregated data:
POST /test4/_search
{
  "aggs": {
    "sum_grade": {
      "sum": {
        "field": "grade"
      }
    }
  }
}
@Test
public void testSum() throws Exception {
    // ElasticSearchClient is a helper that wraps the creation of the RestHighLevelClient
    RestHighLevelClient client = ElasticSearchClient.getClient();
    SearchRequest request = new SearchRequest("test4");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.aggregation(AggregationBuilders.sum("sum_grade").field("grade")).size(0);
    request.source(searchSourceBuilder);
    SearchResponse search = client.search(request, RequestOptions.DEFAULT);
    ParsedSum aggregation = search.getAggregations().get("sum_grade");
    System.out.println(aggregation.getValue());
}
4.5 Stats
stats bundles all of the calculations above: a single request returns the count, min, max, avg, and sum.
POST /test4/_search
{
  "aggs": {
    "stats_grade": {
      "stats": {
        "field": "grade"
      }
    }
  }
}
@Test
public void testStats() throws Exception {
    // ElasticSearchClient is a helper that wraps the creation of the RestHighLevelClient
    RestHighLevelClient client = ElasticSearchClient.getClient();
    SearchRequest request = new SearchRequest("test4");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Use the same aggregation name as the REST request above
    searchSourceBuilder.aggregation(AggregationBuilders.stats("stats_grade").field("grade")).size(0);
    request.source(searchSourceBuilder);
    SearchResponse search = client.search(request, RequestOptions.DEFAULT);
    ParsedStats aggregation = search.getAggregations().get("stats_grade");
    System.out.println(aggregation.getMax());
    System.out.println(aggregation.getAvg());
    System.out.println(aggregation.getCount());
    System.out.println(aggregation.getMin());
    System.out.println(aggregation.getSum());
}
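To sanity-check what the metrics aggregations should return for the five sample documents, the same numbers can be computed locally. The sketch below is plain JDK code and does not talk to ES at all; the class name LocalStatsCheck and the hard-coded grade array are assumptions made for illustration.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

public class LocalStatsCheck {
    // Grades of the five sample documents indexed into test4 (hard-coded assumption).
    static final int[] GRADES = {80, 60, 70, 90, 90};

    // Computes count, min, max, average and sum in one pass,
    // i.e. the same five values that the stats aggregation reports.
    static IntSummaryStatistics gradeStats() {
        return IntStream.of(GRADES).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = gradeStats();
        System.out.println("count=" + s.getCount());   // count=5
        System.out.println("min=" + s.getMin());       // min=60
        System.out.println("max=" + s.getMax());       // max=90
        System.out.println("avg=" + s.getAverage());   // avg=78.0
        System.out.println("sum=" + s.getSum());       // sum=390
    }
}
```

If the values printed by the tests above differ from these, the index most likely contains different or extra documents.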
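(5) Bucket Aggregation
A bucket aggregation groups documents that share a value of some field into one bucket. The most commonly used bucket aggregation is terms.
5.1 terms
A terms aggregation is similar to GROUP BY: it returns each distinct value of the field together with the number of documents that have it. For example, a terms aggregation on the classroom field:
POST /test4/_search
{
  "aggs": {
    "classroom_term": {
      "terms": {
        "field": "classroom"
      }
    }
  }
}
The response lists each classroom value and its document count: three documents have classroom 1 and two have classroom 2.
"aggregations" : {
  "classroom_term" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "1",
        "doc_count" : 3
      },
      {
        "key" : "2",
        "doc_count" : 2
      }
    ]
  }
}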
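The GROUP BY analogy can be made concrete with a few lines of plain Java. This is only a local sketch of what a terms aggregation computes, not ES code; the class name LocalTermsCount and the hard-coded classroom list are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalTermsCount {
    // classroom values of the five sample documents, in _id order (hard-coded assumption).
    static final List<String> CLASSROOMS = List.of("1", "2", "2", "1", "1");

    // One map entry per distinct value: the key plays the role of the bucket "key",
    // the value plays the role of "doc_count".
    static Map<String, Long> countByClassroom() {
        Map<String, Long> buckets = new TreeMap<>();
        for (String classroom : CLASSROOMS) {
            buckets.merge(classroom, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(countByClassroom()); // {1=3, 2=2}
    }
}
```

One difference worth keeping in mind: the TreeMap here is sorted by key, whereas a terms aggregation orders its buckets by doc_count in descending order by default.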
terms aggregations can also be nested to group by several fields. For example, grouping by classroom and then by gender:
POST /test4/_search
{
  "aggs": {
    "classroom_term": {
      "terms": {
        "field": "classroom"
      },
      "aggs": {
        "gender": {
          "terms": {
            "field": "gender"
          }
        }
      }
    }
  }
}
The response contains the values and counts after grouping by classroom and gender:
classroom 1, gender 男: two documents;
classroom 1, gender 女: one document;
classroom 2, gender 男: one document;
classroom 2, gender 女: one document.
"aggregations" : {
  "classroom_term" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "1",
        "doc_count" : 3,
        "gender" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "男",
              "doc_count" : 2
            },
            {
              "key" : "女",
              "doc_count" : 1
            }
          ]
        }
      },
      {
        "key" : "2",
        "doc_count" : 2,
        "gender" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "女",
              "doc_count" : 1
            },
            {
              "key" : "男",
              "doc_count" : 1
            }
          ]
        }
      }
    ]
  }
}
The corresponding Java API usage:
@Test
public void testTerms() throws Exception {
    // ElasticSearchClient is a helper that wraps the creation of the RestHighLevelClient
    RestHighLevelClient client = ElasticSearchClient.getClient();
    SearchRequest request = new SearchRequest("test4");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.aggregation(AggregationBuilders.terms("classroom_term").field("classroom")
            .subAggregation(AggregationBuilders.terms("gender").field("gender")));
    request.source(searchSourceBuilder);
    SearchResponse search = client.search(request, RequestOptions.DEFAULT);
    // Read the classroom buckets first, then the gender buckets inside each of them
    Terms classroomTerm = search.getAggregations().get("classroom_term");
    for (Terms.Bucket classroomBucket : classroomTerm.getBuckets()) {
        Terms genderTerm = classroomBucket.getAggregations().get("gender");
        for (Terms.Bucket genderBucket : genderTerm.getBuckets()) {
            System.out.println("classroom: " + classroomBucket.getKeyAsString()
                    + ", gender: " + genderBucket.getKeyAsString()
                    + ", count: " + genderBucket.getDocCount());
        }
    }
}
The harder part to understand is how the results are read. An aggregation response is organised into buckets, and reading it means iterating over them. In the code above we first fetch the classroom buckets, then iterate over them to fetch the gender buckets inside each one, and finally read the key and document count from each gender bucket.
[Figure: classroom buckets, each containing gender buckets]
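The bucket-inside-bucket structure can be mimicked locally with a map of maps. The following is a minimal sketch in plain Java rather than ES client code: the class name LocalNestedTerms, the constants, and the hard-coded rows are assumptions made for illustration. The gender values are written as Unicode escapes only so that the source compiles regardless of file encoding.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalNestedTerms {
    static final String MALE = "\u7537";   // the gender value for male in the sample data
    static final String FEMALE = "\u5973"; // the gender value for female in the sample data

    // {classroom, gender} of the five sample documents, in _id order (hard-coded assumption).
    static final List<String[]> DOCS = List.of(
            new String[] {"1", MALE},
            new String[] {"2", MALE},
            new String[] {"2", FEMALE},
            new String[] {"1", FEMALE},
            new String[] {"1", MALE});

    // Outer map: one entry per classroom bucket.
    // Inner map: the gender buckets inside that classroom, with their counts.
    static Map<String, Map<String, Long>> countByClassroomThenGender() {
        Map<String, Map<String, Long>> buckets = new TreeMap<>();
        for (String[] doc : DOCS) {
            buckets.computeIfAbsent(doc[0], k -> new TreeMap<>())
                   .merge(doc[1], 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Same shape as reading the response: outer buckets first, then inner buckets.
        for (Map.Entry<String, Map<String, Long>> classroom : countByClassroomThenGender().entrySet()) {
            for (Map.Entry<String, Long> gender : classroom.getValue().entrySet()) {
                System.out.println("classroom: " + classroom.getKey()
                        + ", gender: " + gender.getKey()
                        + ", count: " + gender.getValue());
            }
        }
    }
}
```

The two nested loops in main have the same shape as the two loops over Terms.Bucket in testTerms: the outer loop walks the classroom buckets and the inner loop walks the gender buckets that belong to each of them.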
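5.2 range
A range aggregation counts the documents that fall into each interval. For example, to count the grades in *~70, 70~85, and 85~* (each range includes its from value and excludes its to value):
POST /test4/_search
{
  "aggs": {
    "grade_range": {
      "range": {
        "field": "grade",
        "ranges": [
          {"to": 70},
          {"from": 70, "to": 85},
          {"from": 85}
        ]
      }
    }
  }
}
The Java API version: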
@Test
public void testRange() throws Exception {
    // ElasticSearchClient is a helper that wraps the creation of the RestHighLevelClient
    RestHighLevelClient client = ElasticSearchClient.getClient();
    SearchRequest request = new SearchRequest("test4");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.aggregation(AggregationBuilders.range("grade_range").field("grade")
            .addUnboundedTo(70).addRange(70, 85).addUnboundedFrom(85));
    request.source(searchSourceBuilder);
    SearchResponse search = client.search(request, RequestOptions.DEFAULT);
    // Iterate over the grade range buckets and print each key with its document count
    Range gradeRange = search.getAggregations().get("grade_range");
    for (Range.Bucket gradeBucket : gradeRange.getBuckets()) {
        System.out.println("key: " + gradeBucket.getKey() + ", count: " + gradeBucket.getDocCount());
    }
}
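Because each range includes its from value and excludes its to value, a grade of exactly 70 lands in the 70~85 bucket and not in *~70. The sketch below reproduces that rule locally on the sample grades so the expected counts can be checked by hand. It is plain Java, not ES client code; the class name LocalRangeBuckets and the helper countInRange are assumptions made for illustration.

```java
public class LocalRangeBuckets {
    // Grades of the five sample documents (hard-coded assumption).
    static final int[] GRADES = {80, 60, 70, 90, 90};

    // Counts grades g with from <= g < to.
    // A null bound means the range is unbounded on that side.
    static long countInRange(Integer from, Integer to) {
        long count = 0;
        for (int grade : GRADES) {
            boolean atOrAboveFrom = from == null || grade >= from;
            boolean belowTo = to == null || grade < to;
            if (atOrAboveFrom && belowTo) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("*~70: " + countInRange(null, 70));   // *~70: 1
        System.out.println("70~85: " + countInRange(70, 85));    // 70~85: 2
        System.out.println("85~*: " + countInRange(85, null));   // 85~*: 2
    }
}
```

The three calls correspond to addUnboundedTo(70), addRange(70, 85), and addUnboundedFrom(85) in testRange, so the document counts printed by that test should be 1, 2, and 2 for the sample data.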
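(6) Summary
That covers the most commonly used ES aggregation queries. The remaining ones are described in detail in the official documentation. I'm 魚仔, see you next time!
Reposted from: https://javayz.blog.csdn.net/article/details/119855339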
ES6 reference material: https://segmentfault.com/a/1190000015220491