I wanted to aggregate data in Elasticsearch to get the amount of data for each news source. The relevant part of the mapping:
"news_source": {
  "type": "text"
},
"related_freq": {
  "type": "integer"
},
The query:
GET /event_news/_search
{
  "size": 0,
  "aggs": {
    "news_source_info": {
      "terms": {
        "field": "news_source"
      },
      "aggs": {
        "total_sum": {
          "sum": {"field": "related_freq"}
        }
      }
    }
  }
}
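For reference, the result this query is meant to produce can be sketched in plain Python: group documents by news_source, count them, and sum related_freq per group. The sample documents and the helper name aggregate_by_source are hypothetical and only illustrate the intent; this is not how Elasticsearch computes the aggregation internally.

```python
from collections import defaultdict

# Hypothetical sample documents; the values are made up for illustration.
docs = [
    {"news_source": "新華網", "related_freq": 3},
    {"news_source": "新華網", "related_freq": 5},
    {"news_source": "長城網", "related_freq": 1},
]


def aggregate_by_source(docs):
    """Mimic a terms aggregation on news_source with a sum sub-aggregation."""
    buckets = defaultdict(lambda: {"doc_count": 0, "total_sum": 0})
    for doc in docs:
        bucket = buckets[doc["news_source"]]
        bucket["doc_count"] += 1                     # one document per bucket hit
        bucket["total_sum"] += doc["related_freq"]   # sum sub-aggregation
    return dict(buckets)


result = aggregate_by_source(docs)
for source, bucket in result.items():
    print(source, bucket)
```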
The request fails with an error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "event_news",
        "node": "9nWATfvgTJiLmbYg-RpPQw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}
So the field needs fielddata set to true before it can be aggregated on. Update the mapping:
PUT /event_news/_mapping
{"properties":{"news_source":{"type":"text","fielddata":true}}}
The mapping is now:
"news_source": {
  "type": "text",
  "fielddata": true
},
"related_freq": {
  "type": "integer"
},
Running the query again returns:
{
  "aggregations" : {
    "news_source_info" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 24907577,
      "buckets" : [
        {
          "key" : "財",
          "doc_count" : 8965316,
          "total_sum" : {
            "value" : 1.34136E7
          }
        },
        {
          "key" : "經",
          "doc_count" : 7768020,
          "total_sum" : {
            "value" : 1.037061E7
          }
        },
        {
          "key" : "新",
          "doc_count" : 7578602,
          "total_sum" : {
            "value" : 1.0302178E7
          }
        },
        {
          "key" : "浪",
          "doc_count" : 6764223,
          "total_sum" : {
            "value" : 8648774.0
          }
        }
        ....
}
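Why has every source name been split into single characters? It turns out that a text field is stored as the tokens produced by the analyzer, so the terms aggregation buckets on individual tokens instead of on the whole value. The fix is to add a keyword sub-field: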
PUT /event_news/_mapping
{"properties":{"news_source":{"type":"text","fields": {"raw": {"type": "keyword"}},"fielddata":true}}}
The mapping:
"news_source": {
  "type": "text",
  "fields": {
    "raw": {
      "type": "keyword"
    }
  },
  "fielddata": true
},
"related_freq": {
  "type": "integer"
}
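The difference between bucketing on the text field and on its keyword sub-field can be illustrated with a small Python sketch. It assumes, as a rough approximation, that the default analyzer emits one token per Chinese character, which matches the single-character keys seen earlier. The sample values and the function names analyze_text and terms_buckets are hypothetical, not part of Elasticsearch.

```python
from collections import Counter

# Hypothetical sample values for news_source.
sources = ["新浪財經", "新華網", "新華網", "長城網"]


def analyze_text(value):
    """Rough stand-in for analyzing CJK text: one token per character.

    Assumption: real analysis does more than this (lowercasing, punctuation
    handling, and so on); this only models the per-character split.
    """
    return [ch for ch in value if not ch.isspace()]


def terms_buckets(values, tokenize):
    """Count, for each term, how many values (documents) contain it."""
    counts = Counter()
    for value in values:
        # A document is counted once per distinct term it contains.
        for term in set(tokenize(value)):
            counts[term] += 1
    return counts


# Like aggregating on "news_source" (text with fielddata enabled).
text_buckets = terms_buckets(sources, analyze_text)

# Like aggregating on "news_source.raw" (keyword): the whole value is one term.
keyword_buckets = terms_buckets(sources, lambda value: [value])

print(text_buckets.most_common(3))
print(keyword_buckets.most_common(3))
```

In the text case a single document lands in several buckets at once, which also explains why the per-character doc_count values add up to more than the number of documents.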
The query, now targeting the keyword sub-field:
GET /event_news/_search
{
  "size": 0,
  "aggs": {
    "news_source_info": {
      "terms": {
        "field": "news_source.raw"
      },
      "aggs": {
        "total_sum": {
          "sum": {"field": "related_freq"}
        }
      }
    }
  }
}
The result:
"news_source_info" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 576,
  "buckets" : [
    {
      "key" : "長城網",
      "doc_count" : 47,
      "total_sum" : {
        "value" : 51.0
      }
    },
    {
      "key" : "新華網",
      "doc_count" : 45,
      "total_sum" : {
        "value" : 1305.0
      }
    },
    {
      "key" : "中工網",
      "doc_count" : 38,
      "total_sum" : {
        "value" : 303.0
      }
    }
  ]
}
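That does it: each bucket is now a complete source name. Note that a newly added sub-field is only populated for documents indexed after the mapping change, so existing documents have to be reindexed (for example with _update_by_query) before they appear in these buckets.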