Problems encountered when aggregating a text field in Elasticsearch

I wanted to aggregate data in Elasticsearch to get the number of documents from each source. The mapping:

  "news_source": {
    "type": "text"
  },
  "related_freq": {
    "type": "integer"
  },

Query:

GET /event_news/_search
{
  "size": 0, 
  "aggs": {
    "news_source_info":{
      "terms": {
        "field": "news_source"
      },
      "aggs": {
        "total_sum": {
          "sum": {"field": "related_freq"}
        }
      }
    }
  }
}
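The Python sketch below is a rough mental model of what this request computes. It runs in memory and is not how Elasticsearch actually executes the query: it ignores shards, fielddata and doc values.

It groups documents by their news_source value and counts each group as doc_count. It sums related_freq within each group as total_sum and keeps the largest groups (10 by default, matching the terms aggregation's default size). Documents in the remaining groups are reported as sum_other_doc_count. The function name terms_with_sum and the sample documents are hypothetical.

```python
from collections import defaultdict


def terms_with_sum(docs, field, sum_field, size=10):
    """Hypothetical sketch: group docs by docs[field], count and sum per group."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for doc in docs:
        if field not in doc:
            continue  # documents without the field fall into no bucket
        key = doc[field]
        counts[key] += 1
        sums[key] += doc.get(sum_field, 0)
    # Largest buckets first; ties broken by key here so the output is deterministic.
    ordered = sorted(counts, key=lambda k: (-counts[k], k))
    buckets = [
        {"key": k, "doc_count": counts[k], "total_sum": {"value": sums[k]}}
        for k in ordered[:size]
    ]
    # Document counts of the buckets that did not make it into the response.
    other = sum(counts[k] for k in ordered[size:])
    return {"sum_other_doc_count": other, "buckets": buckets}


if __name__ == "__main__":
    sample = [
        {"news_source": "新華網", "related_freq": 30},
        {"news_source": "新華網", "related_freq": 12},
        {"news_source": "長城網", "related_freq": 1},
    ]
    print(terms_with_sum(sample, "news_source", "related_freq"))
```

In this model, grouping is by the stored value exactly as it appears. That assumption turns out not to hold for a text field.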

The request fails with an error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_abstract] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "event_news",
        "node": "9nWATfvgTJiLmbYg-RpPQw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

Right: fielddata has to be set to true on the field. Update the mapping:

PUT /event_news/_mapping
{"properties":{"news_source":{"type":"text","fielddata":true}}}
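The error message says fielddata is loaded "by uninverting the inverted index". Conceptually, the inverted index answers "which documents contain this term?", which is what search needs. An aggregation needs the opposite question: "which terms does this document have?"

The toy Python sketch below rebuilds that second view from the first. The dictionaries are hypothetical stand-ins for Lucene's real structures, and uninvert is an illustrative name. Elasticsearch keeps the real fielddata for every term of every document in heap memory, which is where the warning about significant memory use comes from.

```python
from collections import defaultdict


def uninvert(inverted_index):
    """Toy model of building fielddata.

    inverted_index: {term: [doc_id, ...]}, i.e. "which documents contain this term?"
    returns:        {doc_id: [term, ...]}, i.e. "which terms does this document have?"
    """
    per_doc = defaultdict(list)
    # Walk the terms in sorted order, the way a term dictionary is laid out.
    for term in sorted(inverted_index):
        for doc_id in inverted_index[term]:
            per_doc[doc_id].append(term)
    return dict(per_doc)


if __name__ == "__main__":
    # Hypothetical postings for a text field whose values were split into characters.
    postings = {"新": [1, 2], "浪": [1], "華": [2]}
    print(uninvert(postings))
```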

The mapping is now:

  "news_source": {
    "type": "text",
    "fielddata": true
  },
  "related_freq": {
    "type": "integer"
  },

Running the query again gives:

{
  "aggregations" : {
    "news_source_info" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 24907577,
      "buckets" : [
        {
          "key" : "財",
          "doc_count" : 8965316,
          "total_sum" : {
            "value" : 1.34136E7
          }
        },
        {
          "key" : "經",
          "doc_count" : 7768020,
          "total_sum" : {
            "value" : 1.037061E7
          }
        },
        {
          "key" : "新",
          "doc_count" : 7578602,
          "total_sum" : {
            "value" : 1.0302178E7
          }
        },
        {
          "key" : "浪",
          "doc_count" : 6764223,
          "total_sum" : {
            "value" : 8648774.0
          }
        },
        ....
      ]
    }
  }
}

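Why has every value been split into individual characters? It turns out that a text field is analyzed. What gets indexed, and therefore aggregated, is the list of tokens produced by the analyzer, not the original string, and the default analyzer splits Chinese text into single characters. The fix is to add a keyword sub-field as well: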
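Why did the buckets come out as single characters such as 財, 經, 新 and 浪? The Python sketch below approximates what the standard analyzer does to Chinese text: each CJK ideograph becomes its own token. The regular expression is an assumption standing in for the real Unicode word-segmentation rules. It matches ideographs in U+4E00 to U+9FFF plus runs of ASCII letters and digits, lowercased.

A terms aggregation over the analyzed field then counts a document once for every distinct token it contains. So a value such as 新浪財經 (a hypothetical example) lands in four buckets, and the bucket counts add up to more than the number of documents. The function names are illustrative only.

```python
import re
from collections import Counter

# Rough approximation (an assumption, not Lucene's real tokenizer):
# every CJK ideograph is its own token; runs of ASCII letters/digits form one token.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def approx_standard_tokens(text):
    """Split text roughly the way the standard analyzer treats CJK, then lowercase."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def term_doc_counts(values):
    """doc_count per token: a document counts once for each distinct token it has."""
    counts = Counter()
    for value in values:
        counts.update(set(approx_standard_tokens(value)))
    return counts


if __name__ == "__main__":
    print(term_doc_counts(["新浪財經", "新華網", "新浪財經"]).most_common())
```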

PUT /event_news/_mapping
{"properties":{"news_source":{"type":"text","fields":  {"raw": {"type": "keyword"}},"fielddata":true}}}

mapping:

  "news_source": {
    "type": "text",
    "fields": {
      "raw": {
        "type": "keyword"
      }
    },
    "fielddata": true
  },
  "related_freq": {
    "type": "integer"
  }
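With this mapping, every value is indexed twice. The toy Python model below contrasts the two sub-fields:

- news_source stores analyzed tokens, crudely modelled here as one token per non-space character.
- news_source.raw, being a keyword field, stores the entire string as one unanalyzed term.

Grouping by the keyword sub-field therefore yields one bucket per source. The helper names index_news_source and doc_counts are hypothetical.

```python
from collections import Counter


def index_news_source(value):
    """Toy model of the multi-field mapping: the same value is indexed twice.

    'news_source' (text): analyzed tokens; as a crude stand-in for the analyzer,
    every non-space character becomes a token.
    'news_source.raw' (keyword): the whole, unmodified string as a single term.
    """
    return {
        "news_source": sorted({ch for ch in value if not ch.isspace()}),
        "news_source.raw": [value],
    }


def doc_counts(values, field):
    """Count, per indexed term of `field`, how many documents contain it."""
    counts = Counter()
    for value in values:
        counts.update(index_news_source(value)[field])
    return counts


if __name__ == "__main__":
    sources = ["長城網", "新華網", "新華網"]
    print(doc_counts(sources, "news_source"))      # split into characters
    print(doc_counts(sources, "news_source.raw"))  # one bucket per source
```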

Query:

GET /event_news/_search
{
  "size": 0, 
  "aggs": {
    "news_source_info":{
      "terms": {
        "field": "news_source.raw"
      },
      "aggs": {
        "total_sum": {
          "sum": {"field": "related_freq"}
        }
      }
    }
  }
}
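If you build this request from code, a small Python helper like the one below produces the same body; the name source_agg_body is hypothetical. It also sets the terms aggregation's size explicitly. Elasticsearch returns only the top 10 buckets by default and folds the rest into sum_other_doc_count, so a larger size is needed to list more sources in one response. Sending the JSON to /event_news/_search is left to whatever HTTP or Elasticsearch client you already use.

```python
import json


def source_agg_body(field="news_source.raw", sum_field="related_freq", size=10):
    """Build the search body: terms on `field`, with a sum of `sum_field` per bucket.

    `size` is the number of buckets the terms aggregation returns
    (10 is Elasticsearch's default).
    """
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "news_source_info": {
                "terms": {"field": field, "size": size},
                "aggs": {"total_sum": {"sum": {"field": sum_field}}},
            }
        },
    }


if __name__ == "__main__":
    # Serialized body, ready to POST to /event_news/_search with any client.
    print(json.dumps(source_agg_body(size=50), ensure_ascii=False, indent=2))
```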

Result:

    "news_source_info" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 576,
      "buckets" : [
        {
          "key" : "長城網",
          "doc_count" : 47,
          "total_sum" : {
            "value" : 51.0
          }
        },
        {
          "key" : "新華網",
          "doc_count" : 45,
          "total_sum" : {
            "value" : 1305.0
          }
        },
        {
          "key" : "中工網",
          "doc_count" : 38,
          "total_sum" : {
            "value" : 303.0
          }
        }
      ]
    }

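This works. One caveat: a sub-field added to an existing mapping is only populated for documents indexed after the change. Documents already in the index have no news_source.raw value, and so fall into no bucket, until they are reindexed, for example with POST /event_news/_update_by_query.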
