Problems encountered when aggregating on a text field in Elasticsearch

Original post

2020-06-01 13:13

I wanted to run an aggregation in Elasticsearch to get the number of documents for each news source. The relevant part of the mapping:

  "news_source": {
     "type": "text"
   },
   "related_freq": {
     "type": "integer"
   },

Query:

GET /event_news/_search
{
  "size": 0, 
  "aggs": {
    "news_source_info":{
      "terms": {
        "field": "news_source"
      },
      "aggs": {
        "total_sum": {
          "sum": {"field": "related_freq"}
        }
      }
    }
  }
}

The request failed with an error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "event_news",
        "node": "9nWATfvgTJiLmbYg-RpPQw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [news_source] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

So fielddata has to be set to true on the field before it can be aggregated. I updated the mapping:

PUT /event_news/_mapping
{"properties":{"news_source":{"type":"text","fielddata":true}}}

The mapping now:

  "news_source": {
     "type": "text",
     "fielddata": true
   },
   "related_freq": {
     "type": "integer"
   },

Running the query again gives:

{
  "aggregations" : {
    "news_source_info" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 24907577,
      "buckets" : [
        {
          "key" : "財",
          "doc_count" : 8965316,
          "total_sum" : {
            "value" : 1.34136E7
          }
        },
        {
          "key" : "經",
          "doc_count" : 7768020,
          "total_sum" : {
            "value" : 1.037061E7
          }
        },
        {
          "key" : "新",
          "doc_count" : 7578602,
          "total_sum" : {
            "value" : 1.0302178E7
          }
        },
        {
          "key" : "浪",
          "doc_count" : 6764223,
          "total_sum" : {
            "value" : 8648774.0
          }
        }
        ....
}

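Why was every character split into its own bucket? It turns out that a text field is stored as the tokens produced by its analyzer, so the terms aggregation groups by token (here, single characters) rather than by the whole source name. The way around this is to add a keyword-type sub-field.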

PUT /event_news/_mapping
{"properties":{"news_source":{"type":"text","fields":  {"raw": {"type": "keyword"}},"fielddata":true}}}

Mapping:

  "news_source": {
     "type": "text",
     "fields": {
       "raw": {
         "type": "keyword"
       }
     },
     "fielddata": true
   },
   "related_freq": {
     "type": "integer"
   }
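
To illustrate why the analyzed `news_source` field produced one bucket per character while a keyword sub-field does not, here is a small Python simulation. It is only a rough sketch: `analyze_like_standard` is a hypothetical stand-in that imitates how the default standard analyzer treats Chinese text (each ideograph becomes its own token), not Elasticsearch's real tokenizer, and the sample source names other than 新華網 are made up for the example.

```python
from collections import Counter


def is_cjk(ch):
    """Rough check for a CJK unified ideograph (basic block only)."""
    return "\u4e00" <= ch <= "\u9fff"


def analyze_like_standard(text):
    """Hypothetical, simplified imitation of the standard analyzer:
    every CJK character is a separate token, and runs of other
    letters/digits form one lowercased token."""
    tokens = []
    word = ""
    for ch in text:
        if is_cjk(ch):
            if word:
                tokens.append(word.lower())
                word = ""
            tokens.append(ch)
        elif ch.isalnum():
            word += ch
        elif word:
            tokens.append(word.lower())
            word = ""
    if word:
        tokens.append(word.lower())
    return tokens


def terms_on_text(values):
    """Bucket like a terms aggregation on an analyzed text field:
    a document is counted once for every distinct token it contains."""
    counts = Counter()
    for value in values:
        for token in set(analyze_like_standard(value)):
            counts[token] += 1
    return counts


def terms_on_keyword(values):
    """Bucket like a terms aggregation on a keyword field:
    the whole value is the bucket key."""
    return Counter(values)


# Assumed sample data, one source name per document.
sources = ["新浪財經", "財經網", "新華網", "新華網"]

text_buckets = terms_on_text(sources)
keyword_buckets = terms_on_keyword(sources)

print(text_buckets.most_common())
print(keyword_buckets.most_common())
```

In the simulated text-field buckets the keys are single characters and one document contributes to several buckets, which matches the shape of the result above; the keyword buckets keep each source name whole.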

Query:

GET /event_news/_search
{
  "size": 0, 
  "aggs": {
    "news_source_info":{
      "terms": {
        "field": "news_source.raw"
      },
      "aggs": {
        "total_sum": {
          "sum": {"field": "related_freq"}
        }
      }
    }

  }
}
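
The two requests in this post differ only in the field passed to the terms aggregation. As a minimal sketch, the request body can be built in Python with the field name as a parameter; `build_terms_sum_query` is a hypothetical helper written for this example, and it only constructs the JSON body without sending anything to a cluster.

```python
import json


def build_terms_sum_query(group_field, sum_field,
                          agg_name="news_source_info",
                          sum_name="total_sum"):
    """Build a terms aggregation on group_field with a nested sum
    on sum_field, mirroring the request body used in this post."""
    return {
        "size": 0,  # aggregation results only, no hits
        "aggs": {
            agg_name: {
                "terms": {"field": group_field},
                "aggs": {
                    sum_name: {"sum": {"field": sum_field}},
                },
            },
        },
    }


# Aggregating on the analyzed text field (buckets per token).
analyzed_query = build_terms_sum_query("news_source", "related_freq")

# Aggregating on the keyword sub-field (buckets per whole value).
keyword_query = build_terms_sum_query("news_source.raw", "related_freq")

print(json.dumps(keyword_query, indent=2, ensure_ascii=False))
```

The printed body can be pasted after `GET /event_news/_search` in the console, or sent with whatever HTTP client is at hand.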

Result:

    "news_source_info" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 576,
      "buckets" : [
        {
          "key" : "長城網",
          "doc_count" : 47,
          "total_sum" : {
            "value" : 51.0
          }
        },
        {
          "key" : "新華網",
          "doc_count" : 45,
          "total_sum" : {
            "value" : 1305.0
          }
        },
        {
          "key" : "中工網",
          "doc_count" : 38,
          "total_sum" : {
            "value" : 303.0
          }
        }
      ]
    }
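
For reading such a response in code, the buckets can be flattened into simple rows. This is a sketch only: `bucket_rows` is a hypothetical helper, and `sample_aggregations` is the fragment above typed in as a Python dict (in a full response it would sit under the `aggregations` key).

```python
def bucket_rows(aggregations, agg_name="news_source_info", sum_name="total_sum"):
    """Return (key, doc_count, sum value) tuples for each bucket of a
    terms aggregation that carries a nested sum aggregation."""
    buckets = aggregations[agg_name]["buckets"]
    return [(b["key"], b["doc_count"], b[sum_name]["value"]) for b in buckets]


# The result fragment from this post, as a Python dict.
sample_aggregations = {
    "news_source_info": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 576,
        "buckets": [
            {"key": "長城網", "doc_count": 47, "total_sum": {"value": 51.0}},
            {"key": "新華網", "doc_count": 45, "total_sum": {"value": 1305.0}},
            {"key": "中工網", "doc_count": 38, "total_sum": {"value": 303.0}},
        ],
    }
}

rows = bucket_rows(sample_aggregations)
for key, doc_count, total in rows:
    print(f"{key}\t{doc_count}\t{total}")
```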

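That does the trick: the buckets are now whole source names. One thing to keep in mind is that a sub-field added to an existing mapping is only populated for documents indexed after the change, which is probably why the counts here are far smaller than in the previous result. Running `_update_by_query` on the index (or reindexing) fills in `news_source.raw` for the documents that already exist.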
