Handling empty values in elasticsearch

This article is based on es 7.1.

The tests use the following kinds of empty values: null, "null", "", and [ ].

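The test code is long, so here are the conclusions first. A document whose field holds null, "" or [ ] is accepted at index time, but the tests below (run against the long field views) show that these values cannot be searched for with a term query. For some data types, indexing the string "null" fails because it cannot be converted to the field's type. For fields that accept strings, such as keyword and text, "null" is treated as an ordinary string and can be both indexed and searched. On those string fields "" is also an ordinary value rather than a missing one.

The reason empty values cannot be searched directly is summed up by a line from the es Definitive Guide: "If a field has no values, how is it stored in an inverted index?" The reality is that a field with no value has no entry in the inverted index: "it isn't stored at all".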

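Note that on es 2.x you could use the exists and missing queries to find non-null and null values respectively, the equivalents of IS NOT NULL and IS NULL in a relational database. The missing query is no longer available in 7.1; using it fails with "no [query] registered for [missing]". The exists query still works, and wrapping it in the must_not clause of a bool query gives the IS NULL behaviour.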
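
As a minimal sketch of the two checks on 7.1, assuming the ac_blog1 index and its views field that are created further down in this article, the requests could look like this:

```console
# IS NOT NULL: documents in which views has at least one indexed value
GET ac_blog1/_search
{
  "query": {
    "exists": { "field": "views" }
  }
}

# IS NULL: documents in which views has no indexed value
# (replacement for the removed missing query)
GET ac_blog1/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "views" }
      }
    }
  }
}
```

The must_not clause only filters and does not contribute to scoring, which is usually what an IS NULL style check needs.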

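When designing an application, you can use the null_value mapping parameter to give null a default value. It resembles DEFAULT in a relational database but is not the same thing (see point 3 below). Keep the following three points in mind: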

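1. In es, null_value only takes effect when the field is explicitly set to null. An empty array such as [ ] or an empty string such as "" does not trigger it.
2. The null_value must match the data type of the field. For example, a long field cannot have a string as its null_value.
3. null_value only causes the field to be stored in the inverted index under the null_value, so that the document can be found by searching for it. It does not replace the actual value in the JSON document held in _source.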
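
The three points above can be sketched as follows. This is only an illustration: ac_blog2 is a hypothetical index name, and the placeholder string "NULL" is an arbitrary choice for the keyword field author, not something taken from the tests in this article.

```console
# ac_blog2 is a hypothetical index used only for this sketch
PUT ac_blog2
{
  "mappings": {
    "properties": {
      "author": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

# Explicit null: the field is indexed as the term "NULL"
PUT ac_blog2/_doc/1
{
  "author": null
}

# Empty array: there is no explicit null, so null_value does not apply
PUT ac_blog2/_doc/2
{
  "author": []
}

# Searching for the placeholder should find document 1 but not document 2;
# the _source of document 1 still contains "author": null
GET ac_blog2/_search
{
  "query": {
    "term": { "author": "NULL" }
  }
}
```

Pick a placeholder that cannot collide with a real value, otherwise genuine authors named "NULL" become indistinguishable from missing ones.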

Create the test index:

PUT ac_blog1
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text"
      },
      "body":{
        "type": "text"
      },
      "author":{
        "type": "keyword"
      },
      "views":{
        "type": "long"
      }
    }
  }
}

Index the test documents:

POST ac_blog1/_doc
{
  "views":null
}
POST ac_blog1/_doc
{
  "views":[]
}
POST ac_blog1/_doc
{
  "views":""
}

As a check, fetch all documents:

GET ac_blog1/_search
{
  "query": {
    "match_all": {}
  },
  "size":100
}

Response:

{
  "took" : 355,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "ac_blog1",
        "_type" : "_doc",
        "_id" : "HFBiSW0Bf1cVbYphJHEo",
        "_score" : 1.0,
        "_source" : {
          "views" : null
        }
      },
      {
        "_index" : "ac_blog1",
        "_type" : "_doc",
        "_id" : "HVBiSW0Bf1cVbYphPHEa",
        "_score" : 1.0,
        "_source" : {
          "views" : [ ]
        }
      },
      {
        "_index" : "ac_blog1",
        "_type" : "_doc",
        "_id" : "HlBiSW0Bf1cVbYphRXGX",
        "_score" : 1.0,
        "_source" : {
          "views" : ""
        }
      }
    ]
  }
}

As you can see, all three documents were indexed. Now try to query them.

The null case:

GET ac_blog1/_search
{
  "query": {
    "term": {
      "views":null
    }
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "field name is null or empty"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "field name is null or empty"
  },
  "status": 400
}

The [ ] case:

GET ac_blog1/_search
{
  "query": {
    "term": {
      "views":[]
    }
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[term] query does not support array of values",
        "line": 4,
        "col": 15
      }
    ],
    "type": "parsing_exception",
    "reason": "[term] query does not support array of values",
    "line": 4,
    "col": 15
  },
  "status": 400
}

The "" case:

GET ac_blog1/_search
{
  "query": {
    "term": {
      "views":""
    }
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n  \"term\" : {\n    \"views\" : {\n      \"value\" : \"\",\n      \"boost\" : 1.0\n    }\n  }\n}",
        "index_uuid": "f_2YYPS6RAaew5bXcQwlzQ",
        "index": "ac_blog1"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "ac_blog1",
        "node": "oJRDxfVrQlGOJ9eqCGozDg",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"term\" : {\n    \"views\" : {\n      \"value\" : \"\",\n      \"boost\" : 1.0\n    }\n  }\n}",
          "index_uuid": "f_2YYPS6RAaew5bXcQwlzQ",
          "index": "ac_blog1",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "empty String"
          }
        }
      }
    ]
  },
  "status": 400
}

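Because views is of type long, the "null" case cannot be tested on it: indexing fails with an error saying that the string "null" cannot be converted to long. This check is evidently done by es itself rather than by the underlying lucene. Switch to the keyword field author for this test: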

POST ac_blog1/_doc
{
  "author":"null"
}
GET ac_blog1/_search
{
  "query": {
    "term": {
      "author":"null"
    }
  }
}

Response:

{
  "took" : 416,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "ac_blog1",
        "_type" : "_doc",
        "_id" : "H1BoSW0Bf1cVbYphtHF9",
        "_score" : 0.2876821,
        "_source" : {
          "author" : "null"
        }
      }
    ]
  }
}

That is all.
