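Aggregations in Elasticsearch can be thought of as the equivalent of GROUP BY in a relational database: documents that share the same value are grouped together, and each group is then analyzed on its own.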
high-level concepts
GET /cars/transactions/_search?search_type=count
{
  "aggs" : {                  // this is an aggregation request
    "colors" : {              // name of the aggregation (you choose it)
      "terms" : {
        "field" : "color"     // bucketing criterion: group by color
      }
    }
  }
}
You’ll notice that we used the count search_type. Elasticsearch executes a search in two phases, a query phase and a fetch phase. Because we care only about the aggregation totals and not about the search results, the count search_type will be faster: it omits the fetch phase.
{
...
  "hits": {
    "hits": []                // empty because search_type=count skips the fetch phase
  },
  "aggregations": {
    "colors": {               // the aggregation name you defined
      "buckets": [
        {
          "key": "red",       // the bucket for red
          "doc_count": 4      // number of documents in this bucket
        },
        {
          "key": "blue",
          "doc_count": 2
        },
        {
          "key": "green",
          "doc_count": 2
        }
      ]
    }
  }
}
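To make the GROUP BY analogy concrete, here is a minimal Python sketch of what a terms bucket computes. The transactions list and the terms_agg helper are hypothetical illustrations, not part of Elasticsearch, and the sample data is made up so that the counts match the response above.

```python
from collections import Counter

# Hypothetical sample documents; only the "color" field matters here.
transactions = [
    {"color": "red"}, {"color": "red"}, {"color": "red"}, {"color": "red"},
    {"color": "blue"}, {"color": "blue"},
    {"color": "green"}, {"color": "green"},
]


def terms_agg(docs, field):
    """Group docs by the value of `field` and count each group."""
    counts = Counter(doc[field] for doc in docs)
    # most_common() orders the groups by descending count
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


buckets = terms_agg(transactions, "color")
for bucket in buckets:
    print(bucket)
```

Each dictionary in buckets plays the role of one entry in the "buckets" array of the response.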
adding a metric to the mix
GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {                 // a nested aggs level wraps the metrics we want per bucket
        "avg_price": {          // name of the metric
          "avg": {
            "field": "price"    // compute the average price within each color bucket
          }
        }
      }
    }
  }
}
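As a rough illustration of how a metric nests inside a bucket, the following Python sketch groups some hypothetical documents by color and then computes the average price of each group. The data and the terms_with_avg helper are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical sample documents
transactions = [
    {"color": "red", "price": 10000},
    {"color": "red", "price": 20000},
    {"color": "blue", "price": 15000},
    {"color": "green", "price": 30000},
    {"color": "green", "price": 12000},
]


def terms_with_avg(docs, bucket_field, metric_field):
    """Bucket docs by `bucket_field`, then average `metric_field` per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg_price": sum(values) / len(values)}
        for key, values in groups.items()
    }


result = terms_with_avg(transactions, "color", "price")
print(result["green"])
```

The metric is always evaluated inside the bucket that encloses it, which is why avg_price appears once per color.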
buckets inside buckets
Buckets can be nested. The equivalent of GROUP BY color, make groups first by color and then by make.
GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {          // note where this sits: the average is computed over the enclosing colors bucket
          "avg": {
            "field": "price"
          }
        },
        "make": {
          "terms": {
            "field": "make"
          }
        }
      }
    }
  }
}
one final modification
GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "make" : {
          "terms" : {
            "field" : "make"
          },
          "aggs" : {            // a second level of metrics, computed over the data grouped by color and then by make
            "min_price" : { "min": { "field": "price"} },    // lowest price
            "max_price" : { "max": { "field": "price"} }     // highest price
          }
        }
      }
    }
  }
}
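The nesting of color, then make, then min/max can be sketched in Python as a two-level grouping. This is only an illustration under made-up data; color_make_stats is a hypothetical helper, not an Elasticsearch call.

```python
from collections import defaultdict

# Hypothetical sample documents
transactions = [
    {"color": "red", "make": "honda", "price": 10000},
    {"color": "red", "make": "honda", "price": 20000},
    {"color": "red", "make": "bmw", "price": 80000},
    {"color": "blue", "make": "ford", "price": 25000},
]


def color_make_stats(docs):
    """Group by color, then by make, and compute min/max price per leaf bucket."""
    nested = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        nested[doc["color"]][doc["make"]].append(doc["price"])
    return {
        color: {
            make: {"min_price": min(prices), "max_price": max(prices)}
            for make, prices in makes.items()
        }
        for color, makes in nested.items()
    }


stats = color_make_stats(transactions)
print(stats["red"]["honda"])
```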
building bar charts
GET /cars/transactions/_search?search_type=count
{
  "aggs":{
    "price":{
      "histogram":{
        "field": "price",
        "interval": 20000       // bucket width of 20000, giving buckets such as [0-19999, 20000-39999, 40000-59999, 60000-79999]
      },
      "aggs":{
        "revenue": {
          "sum": {
            "field" : "price"
          }
        }
      }
    }
  }
}
As you can see, our query is built around the price aggregation, which contains a histogram bucket. This bucket requires a numeric field to calculate buckets on, and an interval size. The interval defines how "wide" each bucket is. An interval of 20000 means we will have the ranges [0-19999, 20000-39999, ...].
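A small Python sketch shows how a value maps to its histogram bucket: the key is the value rounded down to the nearest multiple of the interval. The prices list is hypothetical, and bucket_key is an illustrative helper rather than anything exposed by Elasticsearch.

```python
import math
from collections import defaultdict


def bucket_key(value, interval):
    """Lower bound of the histogram bucket that `value` falls into."""
    return math.floor(value / interval) * interval


# Hypothetical prices
prices = [10000, 12000, 20000, 25000, 30000, 80000]

# Equivalent of the "revenue" sum metric nested inside each histogram bucket
revenue = defaultdict(int)
for price in prices:
    revenue[bucket_key(price, 20000)] += price

for key in sorted(revenue):
    print(key, revenue[key])
```

With this data no price falls into the 40000 or 60000 buckets, so those keys are simply absent.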
If search is the most popular activity in Elasticsearch, building date histograms must be the second most popular. Why would you want to use a date histogram?
GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "sold",
        "interval": "month",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
returning empty buckets
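In the response to the previous query, a few months are missing. By default, the date_histogram (and histogram too) returns only buckets that have a nonzero document count. Months without any sales are therefore left out, but more often than not we want them displayed even though they contain no data.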
GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "sold",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count" : 0,        // return buckets even when they contain no documents
        "extended_bounds" : {       // this parameter forces the entire year to be returned
          "min" : "2014-01-01",
          "max" : "2014-12-31"
        }
      }
    }
  }
}
Why are both parameters needed? min_doc_count forces empty buckets to be returned, but by default Elasticsearch will return only buckets that are between the minimum and maximum value in your data. extended_bounds widens that range so that every month of the year is shown.
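The difference between the default behavior, min_doc_count, and extended_bounds can be sketched in Python. The sale dates below are hypothetical, and month_keys is an illustrative helper; the sketch only mimics the bucketing logic, it does not call Elasticsearch.

```python
from collections import Counter
from datetime import date


def month_keys(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


# Hypothetical sale dates
sold = [date(2014, 2, 12), date(2014, 5, 18), date(2014, 5, 20), date(2014, 11, 5)]

# One bucket per month, keyed by the first day of that month
counts = Counter(d.replace(day=1) for d in sold)

# Default: only buckets with a nonzero document count
nonempty = [(m, counts[m]) for m in sorted(counts)]

# min_doc_count = 0: empty buckets too, but only between the first and last data point
filled = [(m, counts.get(m, 0)) for m in month_keys(min(counts), max(counts))]

# extended_bounds: empty buckets across the whole requested range
full_year = [(m, counts.get(m, 0)) for m in month_keys(date(2014, 1, 1), date(2014, 12, 31))]

print(len(nonempty), len(filled), len(full_year))
```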
extended example
GET /cars/transactions/_search?search_type=count
{
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "sold",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count" : 0,
        "extended_bounds" : {
          "min" : "2014-01-01",
          "max" : "2014-12-31"
        }
      },
      "aggs": {
        "per_make_sum": {
          "terms": {
            "field": "make"
          },
          "aggs": {
            "sum_price": {
              "sum": { "field": "price" }
            }
          }
        },
        "total_sum": {
          "sum": { "field": "price" }
        }
      }
    }
  }
}
scoping aggregations
The query and aggs elements sit at the same level of the request body:
GET /cars/transactions/_search
{
  "query" : {
    "match" : {
      "make" : "ford"
    }
  },
  "aggs" : {
    "colors" : {
      "terms" : {
        "field" : "color"
      }
    }
  }
}
global bucket
GET /cars/transactions/_search?search_type=count
{
  "query" : {
    "match" : {
      "make" : "ford"
    }
  },
  "aggs" : {
    "single_avg_price": {
      "avg" : { "field" : "price" }       // averaged over the documents that match ford
    },
    "all": {
      "global" : {},                      // the global bucket has no parameters
      "aggs" : {
        "avg_price": {
          "avg" : { "field" : "price" }   // averaged over all documents, not only those matching ford
        }
      }
    }
  }
}
filtered query
GET /cars/transactions/_search?search_type=count
{
  "query" : {
    "filtered": {
      "filter": {
        "range": {
          "price": {
            "gte": 10000
          }
        }
      }
    }
  },
  "aggs" : {
    "single_avg_price": {
      "avg" : { "field" : "price" }
    }
  }
}
filter bucket
GET /cars/transactions/_search?search_type=count
{
  "query":{
    "match": {
      "make": "ford"
    }
  },
  "aggs":{
    "recent_sales": {
      "filter": {               // a filter used inside aggs
        "range": {
          "sold": {
            "from": "now-1M"
          }
        }
      },
      "aggs": {
        "average_price":{
          "avg": {
            "field": "price"    // average price of documents that satisfy both the match query and the filter
          }
        }
      }
    }
  }
}
post filter
You may be thinking to yourself, "hmm…is there a way to filter just the search results but not the aggregation?" The answer is to use a post_filter. This filter applies only to the search hits and has no effect on the aggregations.
GET /cars/transactions/_search?search_type=count
{
  "query": {
    "match": {
      "make": "ford"
    }
  },
  "post_filter": {
    "term" : {
      "color" : "green"
    }
  },
  "aggs" : {
    "all_colors": {
      "terms" : { "field" : "color" }
    }
  }
}
recap
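Key points:
- A filter inside a filtered query affects both the search results and the aggregations.
- A filter bucket inside aggs affects only the aggregations.
- A top-level post_filter affects only the search results.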
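The three scopes in the recap can be compared side by side with a small Python simulation. Everything here is hypothetical (the cars data, the search function and its parameter names); it only models which documents reach the hits and which reach the aggregation.

```python
from collections import Counter

# Hypothetical sample documents
cars = [
    {"make": "ford", "color": "green"},
    {"make": "ford", "color": "blue"},
    {"make": "honda", "color": "red"},
    {"make": "honda", "color": "green"},
]


def is_ford(doc):
    return doc["make"] == "ford"


def is_green(doc):
    return doc["color"] == "green"


def search(docs, query, agg_filter=None, post_filter=None):
    """Return the hits and a per-color count under the given scoping rules."""
    matched = [d for d in docs if query(d)]
    # A filter bucket narrows only what the aggregation sees
    agg_scope = [d for d in matched if agg_filter is None or agg_filter(d)]
    # A post_filter narrows only the hits, after aggregations are computed
    hits = [d for d in matched if post_filter is None or post_filter(d)]
    return {"hits": hits, "colors": dict(Counter(d["color"] for d in agg_scope))}


# Filter placed in the query: narrows both hits and aggregation
in_query = search(cars, query=lambda d: is_ford(d) and is_green(d))
# Filter bucket: narrows the aggregation only
in_bucket = search(cars, query=is_ford, agg_filter=is_green)
# post_filter: narrows the hits only
in_post = search(cars, query=is_ford, post_filter=is_green)

for name, res in [("query", in_query), ("bucket", in_bucket), ("post", in_post)]:
    print(name, len(res["hits"]), res["colors"])
```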
sorting multivalue buckets
The buckets of an aggregation can be sorted. By default, they are ordered by doc_count, descending.
intrinsic sorts
GET /cars/transactions/_search?search_type=count
{
  "aggs" : {
    "colors" : {
      "terms" : {
        "field" : "color",
        "order": {
          "_count" : "asc"      // sort by doc_count, ascending
        }
      }
    }
  }
}
We introduce an order object into the aggregation, which allows us to sort on one of several values:
- _count: Sort by document count. Works with terms, histogram, date_histogram.
- _term: Sort by the string value of a term alphabetically. Works only with terms.
- _key: Sort by the numeric value of each bucket’s key (conceptually similar to _term). Works only with histogram and date_histogram.
sorting by a metric
GET /cars/transactions/_search?search_type=count
{
  "aggs" : {
    "colors" : {
      "terms" : {
        "field" : "color",
        "order": {
          "avg_price" : "asc"
        }
      },
      "aggs": {
        "avg_price": {
          "avg": {"field": "price"}
        }
      }
    }
  }
}
This lets you override the sort order with any metric, simply by referencing the name of the metric. Some metrics, however, emit multiple values. The extended_stats metric is a good example: it provides half a dozen individual metrics. To sort on one of them, use the dot-path to the value of interest, as in stats.variance below.
GET /cars/transactions/_search?search_type=count
{
  "aggs" : {
    "colors" : {
      "terms" : {
        "field" : "color",
        "order": {
          "stats.variance" : "asc"
        }
      },
      "aggs": {
        "stats": {
          "extended_stats": {"field": "price"}
        }
      }
    }
  }
}
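Sorting buckets by a metric amounts to computing that metric per bucket and using it as the sort key. The Python sketch below shows this for the average and for the variance; the prices are hypothetical, sort_buckets is an illustrative helper, and the use of population variance is an assumption about what extended_stats reports.

```python
from statistics import mean, pvariance

# Hypothetical prices per color bucket
prices_by_color = {
    "red": [30000, 30000],
    "blue": [10000, 20000],
    "green": [12000, 30000],
}


def sort_buckets(buckets, metric):
    """Return bucket keys ordered by `metric` of their values, ascending."""
    return sorted(buckets, key=lambda color: metric(buckets[color]))


# Like "order": {"avg_price": "asc"}
by_avg = sort_buckets(prices_by_color, mean)
# Like "order": {"stats.variance": "asc"}, assuming population variance
by_variance = sort_buckets(prices_by_color, pvariance)

print(by_avg)
print(by_variance)
```

The two orderings differ because red has the highest average but no spread at all in this made-up data.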
sorting based on "deep" metrics
finding distinct counts
GET /cars/transactions/_search?search_type=count
{
  "aggs" : {
    "distinct_colors" : {
      "cardinality" : {
        "field" : "color"
      }
    }
  }
}
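Conceptually, cardinality answers the same question as counting the distinct values of a field. The Python sketch below does this exactly with a set over hypothetical documents; note that Elasticsearch's cardinality metric is an approximation based on the HyperLogLog++ algorithm, so on large data its result may differ slightly from an exact count like this one.

```python
# Hypothetical sample documents
cars = [
    {"color": "red"}, {"color": "red"},
    {"color": "blue"},
    {"color": "green"}, {"color": "green"},
]

# Exact distinct count of the "color" field
distinct_colors = len({doc["color"] for doc in cars})
print(distinct_colors)
```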