How to sort Docs by field A in Elasticsearch and get field B of the first Doc

Note: this article is based on Elasticsearch 6.1.2.

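I recently ran into this requirement: use Elasticsearch to sort Docs by field A in descending order, take the value of field B from the first Doc, and then feed that B value into a Pipeline Aggregation.

I first tried the Max Aggregation, but it can only return the maximum value of field A itself, not field B of the Doc that holds that maximum.

I then tried the Top Hits Aggregation, but its result cannot be consumed by a Pipeline Aggregation.

Finally, the Scripted Metric Aggregation worked. An example follows.

Suppose we have a set of stock price data and need the difference (delta) between each day's closing price and the previous day's closing price. First, import some stock data, where the date field is the timestamp and the price field is the price at that moment: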

POST /_bulk
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-01T10:00:00","price":10}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-01T10:30:00","price":15}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-02T10:00:00","price":20}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-02T10:30:00","price":19}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-03T10:00:00","price":30}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-03T10:30:00","price":35}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-04T10:00:00","price":40}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-04T10:30:00","price":20}
{"index":{"_index":"stock-price","_type":"data"}}
{"date":"2018-01-05T10:00:00","price":10}

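Let's first break down how this query can be built:

  1. Split the stock data into one bucket per day, using the Date Histogram Aggregation
  2. Get the last price in each bucket, using the Scripted Metric Aggregation
  3. Compute the difference between each bucket and the previous one, using the Serial Differencing Aggregation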
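
The three steps can also be sketched outside Elasticsearch in plain Python, as a rough model of what each aggregation contributes. This is only an illustration, not how Elasticsearch executes the query: the function names `closing_prices` and `serial_diff` are made up here, and only the sample data comes from the bulk request above.

```python
from datetime import datetime

# Sample data copied from the bulk request: (timestamp, price).
DOCS = [
    ("2018-01-01T10:00:00", 10),
    ("2018-01-01T10:30:00", 15),
    ("2018-01-02T10:00:00", 20),
    ("2018-01-02T10:30:00", 19),
    ("2018-01-03T10:00:00", 30),
    ("2018-01-03T10:30:00", 35),
    ("2018-01-04T10:00:00", 40),
    ("2018-01-04T10:30:00", 20),
    ("2018-01-05T10:00:00", 10),
]


def closing_prices(docs):
    """Steps 1 and 2: bucket by calendar day, keep the price of the latest doc."""
    latest = {}  # day -> (timestamp, price) of the newest doc seen so far
    for ts, price in docs:
        when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")
        day = when.date().isoformat()
        if day not in latest or when > latest[day][0]:
            latest[day] = (when, price)
    return [(day, latest[day][1]) for day in sorted(latest)]


def serial_diff(values, lag=1):
    """Step 3: each value minus the value `lag` buckets earlier (None if absent)."""
    return [None if i < lag else values[i] - values[i - lag]
            for i in range(len(values))]


closes = closing_prices(DOCS)
deltas = serial_diff([price for _, price in closes])
for (day, price), delta in zip(closes, deltas):
    print(day, price, delta)
```

The first day has no previous bucket, so its delta is `None`; the remaining days give 4, 16, -15 and -10.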

Here is the query:

GET /stock-price/_search
{
  "size": 0,
  "aggs": {
    "minute_histo": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      },
      "aggs": {
        "latest_price": {
          "scripted_metric": {
            "init_script": "params._agg.tmp_rs = ['latest_date': 0, 'latest_price' : -1];",
            "map_script": "def tmp_rs = params._agg.tmp_rs; boolean newer = doc['date'].value.millis > tmp_rs['latest_date']; if (newer) { tmp_rs['latest_date'] = doc['date'].value.millis; tmp_rs['latest_price'] = doc.price.value; }",
            "combine_script": "return params._agg.tmp_rs;",
            "reduce_script": "long rs_date = 0; long rs_price = -1; for (a in params._aggs) {  if (a == null) { continue; } boolean newer = a['latest_date'] > rs_date;   if (newer) {     rs_date = a['latest_date'];     rs_price = a['latest_price'];   } } return rs_price;"
          }
        },
        "delta_price": {
          "serial_diff": {
            "buckets_path": "latest_price.value",
            "lag": 1
          }
        }
      }
    }
  }
}
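
To make the role of the four scripts easier to follow, here is a simplified Python model of the scripted metric lifecycle, assuming each shard builds its own state and the coordinating node merges the shard states. The Python functions mirror the Painless scripts in the query; `run_scripted_metric` and the two-shard split of the 2018-01-04 bucket are hypothetical and exist only for this sketch.

```python
def init_script():
    # One fresh state per shard, mirroring params._agg.tmp_rs.
    return {"latest_date": 0, "latest_price": -1}


def map_script(state, doc):
    # Runs once per matching document: remember the newest doc seen so far.
    if doc["date_millis"] > state["latest_date"]:
        state["latest_date"] = doc["date_millis"]
        state["latest_price"] = doc["price"]


def combine_script(state):
    # Runs once per shard after all docs are mapped: hand the state back.
    return state


def reduce_script(shard_states):
    # Runs on the coordinating node: pick the newest entry across shards.
    rs_date, rs_price = 0, -1
    for a in shard_states:
        if a is None:
            continue
        if a["latest_date"] > rs_date:
            rs_date, rs_price = a["latest_date"], a["latest_price"]
    return rs_price


def run_scripted_metric(shards):
    # Hypothetical driver: `shards` is a list of per-shard document lists.
    states = []
    for docs in shards:
        state = init_script()
        for doc in docs:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states)


# Assumed split of the 2018-01-04 docs across two shards (epoch millis, UTC).
shard_a = [{"date_millis": 1515060000000, "price": 40}]  # 10:00
shard_b = [{"date_millis": 1515061800000, "price": 20}]  # 10:30

print(run_scripted_metric([shard_a, shard_b]))
```

Because the reduce step compares timestamps rather than relying on arrival order, the result is the 10:30 price regardless of which shard reports first. A shard with no matching docs keeps `latest_date` at 0 and is therefore never selected.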

The result is:

{
  ...
  "aggregations": {
    "minute_histo": {
      "buckets": [
        {
          "key_as_string": "2018-01-01T00:00:00.000Z",
          "key": 1514764800000,
          "doc_count": 2,
          "latest_price": {
            "value": 15
          }
        },
        {
          "key_as_string": "2018-01-02T00:00:00.000Z",
          "key": 1514851200000,
          "doc_count": 2,
          "latest_price": {
            "value": 19
          },
          "delta_price": {
            "value": 4.0
          }
        },
        {
          "key_as_string": "2018-01-03T00:00:00.000Z",
          "key": 1514937600000,
          "doc_count": 2,
          "latest_price": {
            "value": 35
          },
          "delta_price": {
            "value": 16.0
          }
        },
        {
          "key_as_string": "2018-01-04T00:00:00.000Z",
          "key": 1515024000000,
          "doc_count": 2,
          "latest_price": {
            "value": 20
          },
          "delta_price": {
            "value": -15.0
          }
        },
        {
          "key_as_string": "2018-01-05T00:00:00.000Z",
          "key": 1515110400000,
          "doc_count": 1,
          "latest_price": {
            "value": 10
          },
          "delta_price": {
            "value": -10.0
          }
        }
      ]
    }
  }
}
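
On the client side, the deltas can be read from the buckets with a few lines of Python. This is a minimal sketch that assumes the response has already been parsed into a dict; `daily_deltas` is a hypothetical helper, and the `response` literal is a trimmed copy of the result above.

```python
# Trimmed copy of the response shown above (only the fields used here).
response = {"aggregations": {"minute_histo": {"buckets": [
    {"key_as_string": "2018-01-01T00:00:00.000Z",
     "latest_price": {"value": 15}},
    {"key_as_string": "2018-01-02T00:00:00.000Z",
     "latest_price": {"value": 19}, "delta_price": {"value": 4.0}},
    {"key_as_string": "2018-01-03T00:00:00.000Z",
     "latest_price": {"value": 35}, "delta_price": {"value": 16.0}},
    {"key_as_string": "2018-01-04T00:00:00.000Z",
     "latest_price": {"value": 20}, "delta_price": {"value": -15.0}},
    {"key_as_string": "2018-01-05T00:00:00.000Z",
     "latest_price": {"value": 10}, "delta_price": {"value": -10.0}},
]}}}


def daily_deltas(resp, histo="minute_histo"):
    """Return (day, closing price, delta) per bucket; delta is None if missing."""
    rows = []
    for bucket in resp["aggregations"][histo]["buckets"]:
        # The first bucket has no delta_price key, so fall back to None.
        delta = bucket.get("delta_price", {}).get("value")
        rows.append((bucket["key_as_string"][:10],
                     bucket["latest_price"]["value"],
                     delta))
    return rows


for row in daily_deltas(response):
    print(row)
```

Note that the first bucket carries no `delta_price` at all, since there is no earlier bucket to subtract from, so the lookup has to tolerate the missing key.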