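Metric aggregation
- Single-value analysis: outputs exactly one result
  - max, min, avg, sum
  - cardinality: an approximate distinct count, similar to COUNT(DISTINCT ...) in SQL
- Multi-value analysis: outputs several results
  - stats, extended_stats
  - percentiles, percentile_ranks
  - top_hits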
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": {
        "field": "salary"
      }
    },
    "min_salary": {
      "min": {
        "field": "salary"
      }
    },
    "avg_salary": {
      "avg": {
        "field": "salary"
      }
    }
  }
}
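The multi-value metrics listed above can be sketched in one request. This is only an illustration against the same employees index: the aggregation names, the percentile points, and the salary thresholds are assumptions chosen for the example, and the age field is assumed to be numeric.

```
# Multi-value metrics: each aggregation returns several numbers (or documents)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_stats": {
      "stats": {
        "field": "salary"
      }
    },
    "salary_percentiles": {
      "percentiles": {
        "field": "salary",
        "percents": [50, 95, 99]
      }
    },
    "salary_ranks": {
      "percentile_ranks": {
        "field": "salary",
        "values": [10000, 20000]
      }
    },
    "oldest_employees": {
      "top_hits": {
        "size": 3,
        "sort": [
          { "age": { "order": "desc" } }
        ]
      }
    }
  }
}
```

stats returns count, min, max, avg, and sum together; replacing it with extended_stats adds variance and standard deviation figures.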
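Bucket aggregation
Documents are assigned to buckets according to a rule, which classifies them. Common bucket aggregations provided by ES:
- Terms
- Numeric types
  - Range / Date Range
  - Histogram / Date Histogram
Bucket aggregations can be nested: a bucket can itself be split into further buckets.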
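A range aggregation from the list above might look like the following sketch. The boundaries and the custom key are assumptions picked for illustration; in each range, from is inclusive and to is exclusive.

```
# Bucket salaries into three hand-picked ranges
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field": "salary",
        "ranges": [
          { "to": 10000 },
          { "from": 10000, "to": 20000 },
          { "key": ">=20000", "from": 20000 }
        ]
      }
    }
  }
}
```

Unlike histogram, the ranges do not need to be the same width, and a range may be left open on one side.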
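Terms aggregation
- The field must have doc_values or fielddata available before a terms aggregation can run on it
  - keyword fields support doc_values by default
  - text fields must enable fielddata in the mapping; buckets are then built from the analyzed tokens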
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job"
      }
    }
  }
}
# Enable fielddata on the text field; only then does it support a terms aggregation
PUT /employees/_mapping
{
  "properties": {
    "job": {
      "type": "text",
      "fielddata": true
    }
  }
}
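Terms aggregations on job.keyword and job do not produce the same number of buckets. A cardinality aggregation on each field confirms this:
# Distinct values of the keyword sub-field
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": {
        "field": "job.keyword"
      }
    }
  }
}
# Distinct tokens of the analyzed text field
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": {
        "field": "job"
      }
    }
  }
}
The two requests above return different counts because job is analyzed: on job the cardinality aggregation counts distinct tokens, while on job.keyword it counts distinct whole values.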
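To improve terms aggregation performance, enable eager_global_ordinals. Global ordinals are then built at refresh time instead of on the first aggregation after a refresh.
Use case: aggregations run frequently, performance requirements are high, and new documents are added continuously.
PUT index
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "eager_global_ordinals": true
      }
    }
  }
}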
Histogram bucketing
# Salaries from 0 to 20000, one bucket per interval of 5000
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "salary_histogram": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 20000
        }
      }
    }
  }
}
Nesting
# Nested aggregation 1: bucket by job title, then compute salary statistics per bucket
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job_salary_stats": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "salary": {
          "stats": {
            "field": "salary"
          }
        }
      }
    }
  }
}
# Nested aggregation 2: bucket by job title, then by gender, then compute salary statistics per gender
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job_salary_stats": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "gender_stats": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "salary_stats": {
              "stats": {
                "field": "salary"
              }
            }
          }
        }
      }
    }
  }
}
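Pipeline
- min_bucket: finds the minimum among the results of a preceding aggregation; the buckets_path keyword gives the path to those results
# Among the job types, find the one with the lowest average salary
# min_bucket is a sibling of the jobs aggregation, so it sits next to it, not inside it
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "min_salary_by_job": {
      "min_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}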
- avg_bucket: the average of the preceding results
- stats_bucket: summary statistics of the preceding results
- percentiles_bucket: percentiles of the preceding results
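The other sibling pipeline aggregations follow the same pattern. A minimal sketch, assuming the same employees index; the aggregation names and the percentile points are illustrative only:

```
# Summarize the per-job average salaries with stats_bucket and percentiles_bucket
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "stats_salary_by_job": {
      "stats_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    },
    "percentiles_salary_by_job": {
      "percentiles_bucket": {
        "buckets_path": "jobs>avg_salary",
        "percents": [25, 50, 75]
      }
    }
  }
}
```

Both pipeline aggregations sit at the same level as jobs and read its buckets through buckets_path.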
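Parent pipeline
- derivative: the change between consecutive buckets
- cumulative_sum: a running total across buckets
- moving_fn: a function applied over a sliding window of buckets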
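Parent pipeline aggregations are placed inside a histogram (or date_histogram) and add a value to each of its buckets. The following is a sketch only: it assumes age is a numeric field, and the interval, window size, and aggregation names are made up for the example.

```
# Bucket employees by age in steps of 5, then derive values from bucket to bucket
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "age_histogram": {
      "histogram": {
        "field": "age",
        "interval": 5
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "avg_salary_change": {
          "derivative": {
            "buckets_path": "avg_salary"
          }
        },
        "running_headcount": {
          "cumulative_sum": {
            "buckets_path": "_count"
          }
        },
        "smoothed_avg_salary": {
          "moving_fn": {
            "buckets_path": "avg_salary",
            "window": 3,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    }
  }
}
```

Here buckets_path is relative to the enclosing histogram, so it names a sibling metric directly (or _count for the document count) rather than using the jobs>avg_salary form.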
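Aggregation scope
By default, an ES aggregation runs over the result set of the query.
ES also supports three other scopes: filter (narrows the documents seen by one aggregation), post_filter (filters the returned hits after the aggregations have been computed), and global (ignores the query and aggregates over all documents).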
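The three scopes can be compared in a single request. This is an illustrative sketch: the age thresholds and aggregation names are assumptions, and "Dev Manager" is a hypothetical job value that may not exist in the index.

```
# jobs           : default scope, sees only documents matching the query
# older_staff    : filter scope, narrows the query results further for this aggregation only
# all_employees  : global scope, ignores the query entirely
# post_filter    : trims the returned hits but leaves every aggregation result unchanged
POST employees/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20
      }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword"
      }
    },
    "older_staff": {
      "filter": {
        "range": {
          "age": {
            "gte": 35
          }
        }
      },
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword"
          }
        }
      }
    },
    "all_employees": {
      "global": {},
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  },
  "post_filter": {
    "term": {
      "job.keyword": "Dev Manager"
    }
  }
}
```

size is left at its default here so that the effect of post_filter on the hits is visible.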
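Sorting in aggregations
Add the order keyword to the aggregation. Without it, terms buckets are sorted by document count in descending order.
# Sorting with order
# By _count ascending, then by _key descending
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": {
        "gte": 20
      }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": [
          { "_count": "asc" },
          { "_key": "desc" }
        ]
      }
    }
  }
}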
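Besides _count and _key, order can also point at a sub-aggregation. A minimal sketch, assuming the same index; avg_salary is just a name chosen for the example:

```
# Sort job buckets by their average salary, highest first
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": [
          { "avg_salary": "desc" }
        ]
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  }
}
```

For a multi-value sub-aggregation such as stats, the order key names one of its outputs, for example "salary_stats.min" for a stats aggregation named salary_stats.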