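Preface: I recently ran into a requirement to compute statistics on already-grouped results, that is, to group the output of an aggregation a second time. I searched extensively and racked my brain without finding a clean solution, but in the end I settled on an approach that, while not elegant, gets the job done. I'm sharing it here.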
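Background: we have a fleet of devices that play advertisements. Elasticsearch (ES) stores the devices' play logs, one document per play.
Roughly, the requirements are:
1. Count how many times each device played. (Easy: a terms aggregation on the device field does it. The catch is that a terms aggregation returns only the top 10 buckets by doc count by default, and raising its size is capped by the search.max_buckets cluster setting (10,000 by default in early 7.x releases), so with more devices than that you cannot get all of them in one request.)
2. Count how many devices fall under each play count, i.e. how many devices played exactly once, how many exactly twice, and so on. (I spent a long time trying to do this with a single query but never found a way.)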
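To make the two statistics concrete, here is a small in-memory sketch in plain Java with no Elasticsearch involved. It assumes each play record has been reduced to just its device ID; the class and method names are illustrative only. Requirement 1 is a group-by-and-count over the records, and requirement 2 is a second group-by-and-count over the values produced by the first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: each String is the device ID of one play record.
class PlayStats {

    // Requirement 1: number of plays per device (sorted by device ID).
    static Map<String, Long> playsPerDevice(List<String> playRecords) {
        return playRecords.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Requirement 2: number of devices per play count, i.e. group the result of requirement 1 again.
    static Map<Long, Long> devicesPerPlayCount(Map<String, Long> playsPerDevice) {
        return playsPerDevice.values().stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "a", "c", "a", "b", "d");
        Map<String, Long> perDevice = playsPerDevice(records);
        System.out.println(perDevice);                      // {a=3, b=2, c=1, d=1}
        System.out.println(devicesPerPlayCount(perDevice)); // {1=2, 2=1, 3=1}
    }
}
```

Reading the second map: two devices played once, one device played twice, one played three times. The difficulty in ES is only that the first map may have far more entries than a single terms aggregation will return.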
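Solution for requirement 1: use a composite aggregation with after to page through the buckets (similar in spirit to using scroll to export raw documents from ES).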
GET advertisement-2019.11.11/_search
{
"aggs": {
"groupcid": {
"composite": {
"size": 10,
"sources": [
{
"id": {
"terms": {
"field": "device_uuid"
}
}
}
]
}
}
},
"size": 0
}
The first request does not specify after.
Result:
{
"took" : 1989,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"groupcid" : {
"after_key" : {
"id" : "110100303038473932000c4f6b18a0f7"
},
"buckets" : [
{
"key" : {
"id" : "002b0020504d530b363531340100267d"
},
"doc_count" : 7
},
{
"key" : {
"id" : "003d0020504d53111b3236370100267d"
},
"doc_count" : 20
},
{
"key" : {
"id" : "004800204d415717463437330100267d"
},
"doc_count" : 3
},
{
"key" : {
"id" : "010004393438470f803737370150267d"
},
"doc_count" : 18
},
{
"key" : {
"id" : "0a0004373138470b003932370100267d"
},
"doc_count" : 15
},
{
"key" : {
"id" : "0a0011373138470b003932370100267d"
},
"doc_count" : 22
},
{
"key" : {
"id" : "10000a393138470f003534360003267d"
},
"doc_count" : 2
},
{
"key" : {
"id" : "1101003030384739320007590717a0a7"
},
"doc_count" : 20
},
{
"key" : {
"id" : "110100303038473932000bcd53cf311d"
},
"doc_count" : 6
},
{
"key" : {
"id" : "110100303038473932000c4f6b18a0f7"
},
"doc_count" : 18
}
]
}
}
}
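As you can see, ES returns the buckets sorted by the grouping field and includes an after_key. This is the key the next request must pass as after so that paging resumes right after the last bucket of this page.
Second request: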
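The paging contract can be sketched without a cluster. The class below is a hypothetical in-memory stand-in for a composite terms aggregation, not the ES client API: buckets are kept sorted by key, each page returns up to size buckets strictly after the supplied key, and the client keeps passing the previous page's key back until a page comes back empty. In this sketch the after key is simply the last key of the page; with real ES it is safer to always send back the after_key the response returns rather than deriving it yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a composite terms aggregation (not the ES client API).
class CompositePagingSketch {

    // One page of results: the bucket keys plus the key to resume after.
    static final class Page {
        final List<String> keys;
        final String afterKey; // null when the page is empty

        Page(List<String> keys, String afterKey) {
            this.keys = keys;
            this.afterKey = afterKey;
        }
    }

    // Returns up to `size` keys, ascending, that come strictly after `afterKey`
    // (from the beginning when afterKey is null, like a first request without "after").
    // `buckets` maps bucket key -> doc_count.
    static Page fetchPage(NavigableMap<String, Long> buckets, String afterKey, int size) {
        NavigableMap<String, Long> rest = (afterKey == null) ? buckets : buckets.tailMap(afterKey, false);
        List<String> keys = new ArrayList<>();
        for (String key : rest.keySet()) {
            if (keys.size() == size) {
                break;
            }
            keys.add(key);
        }
        String next = keys.isEmpty() ? null : keys.get(keys.size() - 1);
        return new Page(keys, next);
    }

    // Client loop: feed each page's after key into the next request until a page is empty.
    static List<String> fetchAll(NavigableMap<String, Long> buckets, int size) {
        List<String> all = new ArrayList<>();
        String afterKey = null;
        while (true) {
            Page page = fetchPage(buckets, afterKey, size);
            if (page.keys.isEmpty()) {
                break;
            }
            all.addAll(page.keys);
            afterKey = page.afterKey;
        }
        return all;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> buckets = new TreeMap<>();
        buckets.put("dev-c", 1L);
        buckets.put("dev-a", 3L);
        buckets.put("dev-b", 2L);
        System.out.println(fetchAll(buckets, 2)); // [dev-a, dev-b, dev-c]
    }
}
```

With size 2 the loop makes three requests: [dev-a, dev-b], then [dev-c], then an empty page that ends the loop. Stopping on an empty page is the conservative choice, since it does not depend on exactly when the server stops returning an after key.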
GET advertisement-2019.11.11/_search
{
"aggs": {
"groupcid": {
"composite": {
"size": 10,
"sources": [
{
"id": {
"terms": {
"field": "device_uuid"
}
}
}
],
"after": {
"id" : "110100303038473932000c4f6b18a0f7"
}
}
}
},
"size": 0
}
That should make the mechanism clear. Paging through everything by hand is impractical, though, so it has to be done in code.
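Solution for requirement 2: the requirement itself is clear and simple: just group the result of requirement 1 again. But after going through a lot of the ES API I found no way to do it in one step (if anyone knows of one, please share). So the only option was to keep the result set of requirement 1 and group it again on the client side; that is the solution.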
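On the client side, the second grouping does not strictly require keeping every device's count: it is enough to fold each page's doc_count values into a histogram as the pages arrive, so memory grows with the number of distinct play counts rather than the number of devices. Below is a minimal sketch of such an accumulator, assuming each page has already been reduced to a list of doc_count values (a hypothetical input shape; the class name is illustrative, and the sample numbers are taken from the response above).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative accumulator: play count -> number of devices, built page by page.
class PlayCountHistogram {
    private final TreeMap<Long, Long> devicesPerCount = new TreeMap<>();
    private long totalPlays = 0L;

    // docCounts: the doc_count of every bucket (device) in one composite page.
    void addPage(List<Long> docCounts) {
        for (long docCount : docCounts) {
            totalPlays += docCount;
            devicesPerCount.merge(docCount, 1L, Long::sum); // one more device with this play count
        }
    }

    Map<Long, Long> histogram() {
        return Collections.unmodifiableMap(devicesPerCount);
    }

    long totalPlays() {
        return totalPlays;
    }

    // Number of devices seen = sum of the histogram's values (not its size).
    long deviceCount() {
        long devices = 0L;
        for (long n : devicesPerCount.values()) {
            devices += n;
        }
        return devices;
    }

    public static void main(String[] args) {
        PlayCountHistogram h = new PlayCountHistogram();
        h.addPage(Arrays.asList(7L, 20L, 3L));
        h.addPage(Arrays.asList(20L, 3L));
        System.out.println(h.histogram());   // {3=2, 7=1, 20=2}
        System.out.println(h.totalPlays());  // 53
        System.out.println(h.deviceCount()); // 5
    }
}
```

The histogram's size is the number of distinct play counts, while the sum of its values is the number of devices; it is easy to confuse the two when printing results.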
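Here is the Java implementation of requirement 1: it pulls all buckets page by page and, as each page arrives, counts how many devices have each play count in a map, which also answers requirement 2.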
// Elasticsearch 7.x Java High Level REST Client
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregation;
import org.elasticsearch.search.aggregations.bucket.composite.CompositeAggregationBuilder;
import org.elasticsearch.search.aggregations.bucket.composite.CompositeValuesSourceBuilder;
import org.elasticsearch.search.aggregations.bucket.composite.TermsValuesSourceBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Methods of the author's class; esFactory.getClient() is the author's own helper returning a RestHighLevelClient.
public void test() throws Exception {
    long total = 0L;                                        // total number of play records
    TreeMap<Long, Long> devicesPerCount = new TreeMap<>();  // play count -> number of devices, sorted by play count
    Map<String, Object> afterKey = null;                    // null for the first request (no "after")
    // Keep requesting pages until there is no more data
    while (true) {
        CompositeAggregation page = deal(afterKey);
        if (page.getBuckets().isEmpty()) {
            break;                                          // no more devices
        }
        for (CompositeAggregation.Bucket bucket : page.getBuckets()) {
            long docCount = bucket.getDocCount();           // plays for one device
            total += docCount;
            devicesPerCount.merge(docCount, 1L, Long::sum);
        }
        afterKey = page.afterKey();                         // where the next page starts
        if (afterKey == null) {
            break;
        }
    }
    System.out.println(devicesPerCount.size());            // number of distinct play counts (not devices)
    System.out.println(total);                              // total plays
    for (Map.Entry<Long, Long> entry : devicesPerCount.entrySet()) {
        System.out.println(entry.getKey() + "-" + entry.getValue()); // "<plays>-<devices>"
    }
}

private CompositeAggregation deal(Map<String, Object> afterKey) throws Exception {
    SearchRequest searchRequest = new SearchRequest("advertisement-2019.11.11");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().size(0); // buckets only, no hits
    List<CompositeValuesSourceBuilder<?>> sources = Collections.singletonList(
            new TermsValuesSourceBuilder("id").field("device_uuid").missingBucket(true).order("asc"));
    // Process 7000 buckets (devices) per request
    CompositeAggregationBuilder composite = AggregationBuilders.composite("composite", sources).size(7000);
    if (afterKey != null) {
        composite.aggregateAfter(afterKey);                 // resume after the previous page's after_key
    }
    searchSourceBuilder.aggregation(composite);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = esFactory.getClient().search(searchRequest, RequestOptions.DEFAULT);
    return search.getAggregations().get("composite");
}