Note: this article was written against Elasticsearch 6.1.2.
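I recently ran into the following requirement: use Elasticsearch to sort documents by field A in descending order, take the value of field B from the top document, and then run a Pipeline Aggregation on that value of B.
I first tried the Max Aggregation, but it can only return the maximum value of field A itself, not the value of B from the document that holds that maximum.
I then tried the Top Hits Aggregation, but its result cannot be consumed by a Pipeline Aggregation.
In the end the Scripted Metric Aggregation did the job. An example follows.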
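Suppose we have a set of stock price data and want, for each day, the difference (delta) between that day's closing price and the previous day's closing price. First, import some sample data. The date field is the timestamp and the price field is the price at that moment: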
POST /_bulk
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-01T10:00:00","price":10}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-01T10:30:00","price":15}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-02T10:00:00","price":20}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-02T10:30:00","price":19}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-03T10:00:00","price":30}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-03T10:30:00","price":35}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-04T10:00:00","price":40}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-04T10:30:00","price":20}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-05T10:00:00","price":10}
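Let's first break down how this query can be built:
- Split the stock data into one bucket per day, using the Date Histogram Aggregation
- Get the last price recorded in each bucket, using the Scripted Metric Aggregation
- Finally, compute the difference between each bucket and the one before it, using the Serial Differencing Aggregation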
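The three steps above can be sketched in plain Python to make the intended result concrete before it is expressed as an Elasticsearch query. This is only an illustration of the logic, not how Elasticsearch executes it. The function names `closing_prices` and `serial_diff` are made up for this sketch, and the sample data is copied from the bulk request above.

```python
from datetime import datetime

# Sample documents copied from the bulk request: (date, price).
DOCS = [
    ("2018-01-01T10:00:00", 10), ("2018-01-01T10:30:00", 15),
    ("2018-01-02T10:00:00", 20), ("2018-01-02T10:30:00", 19),
    ("2018-01-03T10:00:00", 30), ("2018-01-03T10:30:00", 35),
    ("2018-01-04T10:00:00", 40), ("2018-01-04T10:30:00", 20),
    ("2018-01-05T10:00:00", 10),
]


def closing_prices(docs):
    """Steps 1 and 2: bucket by calendar day, keep the price with the latest timestamp."""
    latest = {}  # day -> (timestamp, price) of the newest document seen so far
    for ts, price in docs:
        when = datetime.fromisoformat(ts)
        day = when.date()
        if day not in latest or when > latest[day][0]:
            latest[day] = (when, price)
    return {day.isoformat(): latest[day][1] for day in sorted(latest)}


def serial_diff(values, lag=1):
    """Step 3: value[i] - value[i - lag]; the first `lag` buckets have no delta."""
    head = [None] * min(lag, len(values))
    return head + [values[i] - values[i - lag] for i in range(lag, len(values))]


closes = closing_prices(DOCS)
deltas = serial_diff(list(closes.values()))
for (day, close), delta in zip(closes.items(), deltas):
    print(day, close, delta)
```

The first day has no previous bucket to compare against, so it has no delta; a serial differencing step with a lag of 1 behaves the same way.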
Here is the query:
GET /stock-price/_search
{
  "size": 0,
  "aggs": {
    "minute_histo": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      },
      "aggs": {
        "latest_price": {
          "scripted_metric": {
            "init_script": "params._agg.tmp_rs = ['latest_date': 0, 'latest_price' : -1];",
            "map_script": "def tmp_rs = params._agg.tmp_rs; boolean newer = doc['date'].value.millis > tmp_rs['latest_date']; if (newer) { tmp_rs['latest_date'] = doc['date'].value.millis; tmp_rs['latest_price'] = doc.price.value; }",
            "combine_script": "return params._agg.tmp_rs;",
            "reduce_script": "long rs_date = 0; long rs_price = -1; for (a in params._aggs) { if (a == null) { continue; } boolean newer = a['latest_date'] > rs_date; if (newer) { rs_date = a['latest_date']; rs_price = a['latest_price']; } } return rs_price;"
          }
        },
        "delta_price": {
          "serial_diff": {
            "buckets_path": "latest_price.value",
            "lag": 1
          }
        }
      }
    }
  }
}
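The four scripts in the scripted metric are easier to follow when laid out as separate phases. The sketch below mimics them in Python for a single day bucket. It is an emulation for reading purposes only, not Elasticsearch code. The split of documents across three shards is hypothetical, and the function names are invented for this sketch. Timestamps are epoch milliseconds for 2018-01-01 10:00 and 10:30 UTC.

```python
def init_state():
    # Mirrors init_script: sentinel values meaning "no document seen yet".
    return {"latest_date": 0, "latest_price": -1}


def map_doc(state, date_millis, price):
    # Mirrors map_script: runs once per document on a shard and
    # keeps only the newest document seen so far.
    if date_millis > state["latest_date"]:
        state["latest_date"] = date_millis
        state["latest_price"] = price


def combine(state):
    # Mirrors combine_script: hands the per-shard state to the coordinating node.
    return state


def reduce_states(states):
    # Mirrors reduce_script: picks the newest entry across all shard states.
    rs_date, rs_price = 0, -1
    for state in states:
        if state is None:
            continue
        if state["latest_date"] > rs_date:
            rs_date = state["latest_date"]
            rs_price = state["latest_price"]
    return rs_price


def run_shard(docs):
    # One shard's work: init, then map over its documents, then combine.
    state = init_state()
    for date_millis, price in docs:
        map_doc(state, date_millis, price)
    return combine(state)


# Hypothetical shard layout for the 2018-01-01 bucket.
shard_a = run_shard([(1514802600000, 15)])  # 10:30, price 15
shard_b = run_shard([(1514800800000, 10)])  # 10:00, price 10
shard_c = run_shard([])                     # no matching documents

latest = reduce_states([shard_a, shard_b, shard_c])
print(latest)
```

A shard with no matching documents keeps the initial sentinel state, whose `latest_date` of 0 never beats a real timestamp, so it cannot affect the final value.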
The query returns the following result:
{
  ...
  "aggregations": {
    "minute_histo": {
      "buckets": [
        {
          "key_as_string": "2018-01-01T00:00:00.000Z",
          "key": 1514764800000,
          "doc_count": 2,
          "latest_price": { "value": 15 }
        },
        {
          "key_as_string": "2018-01-02T00:00:00.000Z",
          "key": 1514851200000,
          "doc_count": 2,
          "latest_price": { "value": 19 },
          "delta_price": { "value": 4.0 }
        },
        {
          "key_as_string": "2018-01-03T00:00:00.000Z",
          "key": 1514937600000,
          "doc_count": 2,
          "latest_price": { "value": 35 },
          "delta_price": { "value": 16.0 }
        },
        {
          "key_as_string": "2018-01-04T00:00:00.000Z",
          "key": 1515024000000,
          "doc_count": 2,
          "latest_price": { "value": 20 },
          "delta_price": { "value": -15.0 }
        },
        {
          "key_as_string": "2018-01-05T00:00:00.000Z",
          "key": 1515110400000,
          "doc_count": 1,
          "latest_price": { "value": 10 },
          "delta_price": { "value": -10.0 }
        }
      ]
    }
  }
}