ELK Series: Re-grouping the Results of a terms Aggregation (Part 7)

Original

猛波波

2020-02-21 06:50

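Preface: I recently had a requirement to take the results of a grouping and group them again. I searched the documentation and racked my brain for a solution, and never found a really good one. In the end I did find an approach that is not elegant but solves the problem, so I am sharing it here.

Background: a fleet of devices plays advertisements, and es stores the playback records of those devices, one document per play.

The requirements are roughly as follows: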

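1. Count how many times each device played. (This one is easy: a terms aggregation on the device field does it. The catch is that a terms aggregation returns only the top 10 buckets by default, and even with a larger size a single response is capped by the search.max_buckets limit, 10,000 by default on the es version I was using, so it cannot list every device once there are more than that.)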

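2. Count how many devices fall under each play count, that is, how many devices played once, how many played twice, and so on. (I spent a long time trying to do this in a single query and never managed it.)

Solution to the first requirement: use a composite aggregation together with its after parameter (similar in spirit to scroll, which es uses for exporting raw documents).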

GET advertisement-2019.11.11/_search
{
  "aggs": {
    "groupcid": {
      "composite": {
        "size": 10, 
        "sources": [
          {
            "id": {
              "terms": {
                "field": "device_uuid"
              }
            }
          }
        ]
      }
    }
  },
  "size": 0
}

The first request does not need to specify after.

Result:

{
  "took" : 1989,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "groupcid" : {
      "after_key" : {
        "id" : "110100303038473932000c4f6b18a0f7"
      },
      "buckets" : [
        {
          "key" : {
            "id" : "002b0020504d530b363531340100267d"
          },
          "doc_count" : 7
        },
        {
          "key" : {
            "id" : "003d0020504d53111b3236370100267d"
          },
          "doc_count" : 20
        },
        {
          "key" : {
            "id" : "004800204d415717463437330100267d"
          },
          "doc_count" : 3
        },
        {
          "key" : {
            "id" : "010004393438470f803737370150267d"
          },
          "doc_count" : 18
        },
        {
          "key" : {
            "id" : "0a0004373138470b003932370100267d"
          },
          "doc_count" : 15
        },
        {
          "key" : {
            "id" : "0a0011373138470b003932370100267d"
          },
          "doc_count" : 22
        },
        {
          "key" : {
            "id" : "10000a393138470f003534360003267d"
          },
          "doc_count" : 2
        },
        {
          "key" : {
            "id" : "1101003030384739320007590717a0a7"
          },
          "doc_count" : 20
        },
        {
          "key" : {
            "id" : "110100303038473932000bcd53cf311d"
          },
          "doc_count" : 6
        },
        {
          "key" : {
            "id" : "110100303038473932000c4f6b18a0f7"
          },
          "doc_count" : 18
        }
      ]
    }
  }
}

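As you can see, es sorted the buckets by the grouping field and returned an after_key. That key is what the next request passes as after, so the next page resumes right after the last bucket of this one.

Second query: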

GET advertisement-2019.11.11/_search
{
  "aggs": {
    "groupcid": {
      "composite": {
        "size": 10, 
        "sources": [
          {
            "id": {
              "terms": {
                "field": "device_uuid"
              }
            }
          }
        ],
        "after": {
          "id" : "110100303038473932000c4f6b18a0f7"
        }
      }
    }
  },
  "size": 0
}

By now the idea should be clear. Pulling every page by hand is not realistic, though, so the paging has to be done in code.
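Before pointing a loop at a real cluster, the paging logic can be tried against an in-memory stand-in. The sketch below is only a simulation and makes no es calls: the class AfterKeyPagingSketch, the methods fetchPage and countAllBuckets, and the sample device ids are all hypothetical names made up for illustration. It assumes the behaviour shown in the two queries above, namely that buckets come back sorted by key and that a request carrying after returns only keys that sort strictly after it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class AfterKeyPagingSketch {

    // Stand-in for one composite request (hypothetical, not an es API):
    // returns up to `size` buckets whose key sorts strictly after `after`,
    // or the first `size` buckets when `after` is null.
    static List<Map.Entry<String, Long>> fetchPage(
            NavigableMap<String, Long> buckets, String after, int size) {
        NavigableMap<String, Long> rest =
                (after == null) ? buckets : buckets.tailMap(after, false);
        List<Map.Entry<String, Long>> page = new ArrayList<>();
        for (Map.Entry<String, Long> entry : rest.entrySet()) {
            if (page.size() == size) {
                break;
            }
            page.add(entry);
        }
        return page;
    }

    // Walks all pages, feeding the last key of each page back in as `after`,
    // and returns how many buckets were seen in total.
    static int countAllBuckets(NavigableMap<String, Long> buckets, int pageSize) {
        int seen = 0;
        String after = null;
        while (true) {
            List<Map.Entry<String, Long>> page = fetchPage(buckets, after, pageSize);
            if (page.isEmpty()) {
                break; // an empty page means there is nothing left to fetch
            }
            seen += page.size();
            after = page.get(page.size() - 1).getKey();
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up device ids and play counts, for illustration only
        NavigableMap<String, Long> buckets = new TreeMap<>();
        buckets.put("dev-a", 7L);
        buckets.put("dev-b", 20L);
        buckets.put("dev-c", 3L);
        buckets.put("dev-d", 18L);
        buckets.put("dev-e", 15L);
        System.out.println(countAllBuckets(buckets, 2));
    }
}
```

Because each page starts strictly after the previous page's last key, no bucket is visited twice and none is skipped, whatever the page size.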

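Solution to the second requirement: the requirement itself is clear and simple, it just means grouping the results of the first requirement again. But I went through a lot of the es API and did not find anything that does it in one step (if you know of one, please share). The only option left was to store the result set of the first requirement and group it again on the client side. That is the solution.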
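The re-grouping step itself is small. As a minimal sketch, assuming the per-device play counts have already been collected into a list, the class PlayCountHistogram and its histogram method below (both hypothetical names) turn them into a map from play count to number of devices. The input values are the doc_count values of the ten buckets in the sample response above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PlayCountHistogram {

    // Maps "play count" -> "number of devices with exactly that play count".
    // A TreeMap keeps the result sorted by play count.
    static Map<Long, Long> histogram(List<Long> playsPerDevice) {
        Map<Long, Long> devicesByPlayCount = new TreeMap<>();
        for (long plays : playsPerDevice) {
            // Start at 1 for a new play count, otherwise add 1 to the existing tally
            devicesByPlayCount.merge(plays, 1L, Long::sum);
        }
        return devicesByPlayCount;
    }

    public static void main(String[] args) {
        // doc_count of each bucket in the sample composite response
        List<Long> playsPerDevice =
                Arrays.asList(7L, 20L, 3L, 18L, 15L, 22L, 2L, 20L, 6L, 18L);
        histogram(playsPerDevice).forEach((plays, devices) ->
                System.out.println(plays + "-" + devices));
    }
}
```

For these ten buckets, the play counts 18 and 20 each map to two devices and every other play count maps to one.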

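Here is a Java implementation of the first requirement. It pulls the buckets page by page and tallies them into a map keyed by play count.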

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregation;
import org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSourceBuilder;
import org.elasticsearch.search.aggregations.bucket.composite.TermsValuesSourceBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

    public void test() throws Exception {
        long total = 0L;
        // play count -> number of devices with that play count, sorted by play count
        TreeMap<Long, Long> devicesByPlayCount = new TreeMap<>();
        Map<String, Object> afterKey = null; // the first request carries no after key

        // Keep fetching pages until the response no longer yields an after key
        do {
            CompositeAggregation page = deal(afterKey);
            System.out.println(page.getBuckets().size());
            for (CompositeAggregation.Bucket bucket : page.getBuckets()) {
                long docCount = bucket.getDocCount();
                total += docCount;
                devicesByPlayCount.merge(docCount, 1L, Long::sum);
            }
            afterKey = page.afterKey();
        } while (afterKey != null);

        System.out.println(devicesByPlayCount.size());
        System.out.println(total);
        for (Map.Entry<Long, Long> entry : devicesByPlayCount.entrySet()) {
            System.out.println(entry.getKey() + "-" + entry.getValue());
        }
    }

    private CompositeAggregation deal(Map<String, Object> afterKey) throws Exception {
        SearchRequest searchRequest = new SearchRequest("advertisement-2019.11.11");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // missingBucket(true) also produces a bucket for documents without a device_uuid
        List<CompositeValuesSourceBuilder<?>> sources = Collections.singletonList(
                new TermsValuesSourceBuilder("id").field("device_uuid").missingBucket(true).order("asc"));

        // Fetch 7000 buckets (devices) per request
        CompositeAggregationBuilder composite = AggregationBuilders.composite("composite", sources).size(7000);
        if (afterKey != null) {
            composite.aggregateAfter(afterKey);
        }

        searchSourceBuilder.aggregation(composite);
        searchSourceBuilder.size(0); // only the aggregation is needed, not the hits
        searchRequest.source(searchSourceBuilder);

        // esFactory is the project's own holder for the REST high-level client
        SearchResponse search = esFactory.getClient().search(searchRequest, RequestOptions.DEFAULT);
        return search.getAggregations().get("composite");
    }
