Elasticsearch data types: nested

Original post

2020-06-25 21:43

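The previous post covered the object data type; this one covers nested. According to the official documentation, nested is a specialized version of object. It fixes object's main shortcoming: the objects in an array of objects cannot be queried independently, each as a whole. To make that possible, Elasticsearch stores every nested object internally as a separate hidden document. The details follow.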
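To make the shortcoming concrete, here is a toy Python model. It is a simplified illustration under stated assumptions, not Elasticsearch's real indexing code: `flatten_as_object` mimics how the object type merges each inner field's values from all array elements into one list, and `split_as_nested` mimics how nested keeps each element as its own hidden document. All function names are hypothetical, and the sample values are borrowed from the manager.name example later in this post.

```python
# Toy model only: shows which values end up grouped together,
# not how Elasticsearch actually builds its index.

def flatten_as_object(doc, path):
    """Mimic the object type: merge each inner field's values from all
    array elements into one list, losing which values belonged together."""
    flat = {}
    for item in doc[path]:
        for field, value in item.items():
            flat.setdefault(f"{path}.{field}", []).append(value)
    return flat


def split_as_nested(doc, path):
    """Mimic the nested type: keep each array element as its own
    hidden document, so its fields stay associated."""
    return [{f"{path}.{k}": v for k, v in item.items()} for item in doc[path]]


def object_match(flat, conditions):
    # Each condition is checked independently against the merged lists.
    return all(value in flat.get(field, []) for field, value in conditions.items())


def nested_match(hidden_docs, conditions):
    # All conditions must hold inside one and the same hidden document.
    return any(all(d.get(f) == v for f, v in conditions.items()) for d in hidden_docs)


doc = {"name": [{"first": "alice", "last": "smith"},
                {"first": "alice1", "last": "smith1"}]}
query = {"name.first": "alice", "name.last": "smith1"}

print(object_match(flatten_as_object(doc, "name"), query))  # True (a false positive)
print(nested_match(split_as_nested(doc, "name"), query))    # False
```

With object semantics, "alice" and "smith1" are each present somewhere in the merged lists, so the document matches even though no single name has both values; the nested model rejects it.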

First, here is how to create an index that contains nested fields:

PUT test_nested
{
  "settings": {
    "mapping.nested_fields.limit":4,
    "mapping.nested_objects.limit":2
  },
  "mappings": {
    "properties": {
      "region":{
        "type": "keyword"
      },
      "addr":{
        "type": "nested",
        "properties": {
          "nation":{
            "type":"keyword"
          },
          "citys":{
            "type":"nested",
            "properties":{
              "city":{
                "type":"keyword"
              }
            }
          },
          "street":{
            "type":"text"
          },
          "postc":{
            "type":"keyword"
          }
        }
      },
      "manager":{
        "dynamic":false,
        "enabled":true,
        "type": "nested", 
        "properties": {
          "age":{
            "type":"integer"
          },
          "name":{
            "dynamic":false,
            "enabled":true,
            "type":"nested",
            "properties":{
              "first":{
                "type":"keyword"
              },
              "last":{
                "type":"keyword"
              }
            }
          }
        }
      }
    }
  }
}

Some notes on the settings and parameters used in the request above:

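mapping.nested_fields.limit: limits the number of distinct nested fields in a single index (the default is 50). Nested fields are relatively expensive in both performance and memory, so this limit helps ensure that nested is used only where it is actually needed, that is, when the objects in an array must be queried independently. The enabled parameter does not affect this limit. The test_nested index above has four nested fields, addr, addr.citys, manager and manager.name, which exactly meets the limit set by mapping.nested_fields.limit=4. This is a limit on the index's mappings.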
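The count can be reproduced by walking the mapping. The Python sketch below is an assumption-laden illustration: `nested_paths` is a hypothetical helper, and the `mapping` dict is a trimmed copy of the `properties` section of test_nested (the `dynamic` and `enabled` parameters are left out because, as noted, they do not change this count).

```python
def nested_paths(properties, prefix=""):
    """Return the full dotted paths of every field whose type is "nested"."""
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.append(path)
        # Recurse into sub-properties of object and nested fields alike.
        if "properties" in spec:
            paths.extend(nested_paths(spec["properties"], path + "."))
    return paths


# Trimmed copy of the test_nested mappings (types only).
mapping = {
    "region": {"type": "keyword"},
    "addr": {"type": "nested", "properties": {
        "nation": {"type": "keyword"},
        "citys": {"type": "nested", "properties": {"city": {"type": "keyword"}}},
        "street": {"type": "text"},
        "postc": {"type": "keyword"},
    }},
    "manager": {"type": "nested", "properties": {
        "age": {"type": "integer"},
        "name": {"type": "nested", "properties": {
            "first": {"type": "keyword"},
            "last": {"type": "keyword"},
        }},
    }},
}

NESTED_FIELDS_LIMIT = 4  # mapping.nested_fields.limit in test_nested

paths = nested_paths(mapping)
print(paths)                              # ['addr', 'addr.citys', 'manager', 'manager.name']
print(len(paths) <= NESTED_FIELDS_LIMIT)  # True
```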

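mapping.nested_objects.limit: limits the number of nested objects that a single document may contain, counted across all of its nested fields (the default is 10000). Because nested objects are expensive in performance and memory, this limit is needed to guard against out-of-memory errors. Note that when enabled is set to false, the contents of the object array are not indexed, so nested objects with enabled set to false do not count toward this limit. This is a limit on each document.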

"type": "nested": marks the field as nested. If a field with properties has no type, it defaults to object.

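"dynamic":false: fields that are not in the mapping are not added to it dynamically. Such fields are still kept in _source, but they are not indexed and therefore cannot be searched.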

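"enabled":true: the object's contents are indexed and can be used as query conditions. With false, the contents are kept only in _source, are not indexed, and cannot be used as query conditions.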

Keep in mind that there are four nested fields here: addr, addr.citys, manager and manager.name. When mapping.nested_objects.limit is checked, every element of the array under any of these nested fields counts as one object.

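The difference between mapping.nested_fields.limit and mapping.nested_objects.limit is that the former limits the index's mappings, while the latter limits individual documents. If you think of an index as a type definition and a document as an instance of it, the former constrains the type definition and the latter constrains each object instance.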

Now index some data:

POST test_nested/_doc
{
  "region":"hot",
  "manager":[
    {"age":23,
      "name":[
        {"first":"alice","last":"smith"}
      ]
    }
  ]
}


POST test_nested/_doc
{
  "region":"hot",
  "manager":[
    {"age":23,
      "name":[
        {"first":"alice","last":"smith"},
        {"first":"alice1","last":"smith1"}
      ]
    },
    {"age":24,
      "name":[
        {"first":"alice2","last":"smith2"},
        {"first":"alice21","last":"smith21"}
      ]
    }
  ]
}

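Of the two requests above, the first succeeds and the second fails because it violates mapping.nested_objects.limit=2. The first document has one element under manager and one under manager.name, two in total, which is exactly the limit. The second has two manager elements and four manager.name elements, six in total, which exceeds the limit.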
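The same arithmetic as a small Python sketch. This is an approximation written for this post, not Elasticsearch's own check: `count_nested_objects` is a hypothetical helper, the nested paths are hard-coded from the test_nested mapping, and it ignores the enabled=false exclusion because neither document relies on it.

```python
# Nested paths of test_nested, copied from the mapping above.
NESTED = {"addr", "addr.citys", "manager", "manager.name"}
NESTED_OBJECTS_LIMIT = 2  # mapping.nested_objects.limit in test_nested


def count_nested_objects(value, path=""):
    """Count every object stored under a nested path, at any depth."""
    if not isinstance(value, dict):
        return 0
    total = 0
    for key, child in value.items():
        child_path = f"{path}.{key}" if path else key
        # A single object is treated like a one-element array.
        items = child if isinstance(child, list) else [child]
        if child_path in NESTED:
            total += sum(1 for item in items if isinstance(item, dict))
        for item in items:
            total += count_nested_objects(item, child_path)
    return total


doc1 = {"region": "hot",
        "manager": [{"age": 23, "name": [{"first": "alice", "last": "smith"}]}]}
doc2 = {"region": "hot",
        "manager": [
            {"age": 23, "name": [{"first": "alice", "last": "smith"},
                                 {"first": "alice1", "last": "smith1"}]},
            {"age": 24, "name": [{"first": "alice2", "last": "smith2"},
                                 {"first": "alice21", "last": "smith21"}]},
        ]}

for doc in (doc1, doc2):
    n = count_nested_objects(doc)
    print(n, "accepted" if n <= NESTED_OBJECTS_LIMIT else "rejected")
# 2 accepted
# 6 rejected
```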

Next, try searching for a name whose first is alice and whose last is smith1:

GET test_nested/_search
{
  "query": {
    "nested": {
      "path": "manager.name",
      "query": {
        "bool": {
          "must": [
            {"match":{"manager.name.first":"alice"}},
            {"match":{"manager.name.last":"smith1"}}
          ]
        }
      }
    }
  }
}

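This search returns nothing, because no single manager.name object has first set to alice and last set to smith1 (only the first document was indexed, and its only name is alice smith). What about first alice and last smith? See the following: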

GET test_nested/_search
{
  "query": {
    "nested": {
      "path": "manager.name",
      "query": {
        "bool": {
          "must": [
            {"match":{"manager.name.first":"alice"}},
            {"match":{"manager.name.last":"smith"}}
          ]
        }
      }
    }
  }
}

This one does return a hit.
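When such queries are generated by an application, it can help to build the body in code so that every clause is guaranteed to be prefixed with the nested path. A minimal Python sketch, assuming the body is then sent as-is to GET test_nested/_search by whatever client you use; `nested_must_query` is a hypothetical helper, not part of any Elasticsearch library.

```python
import json


def nested_must_query(path, terms):
    """Hypothetical helper: build a nested query whose bool/must clauses
    must all match inside the same nested object under `path`."""
    must = [{"match": {f"{path}.{field}": value}} for field, value in terms.items()]
    return {"query": {"nested": {"path": path, "query": {"bool": {"must": must}}}}}


body = nested_must_query("manager.name", {"first": "alice", "last": "smith"})
print(json.dumps(body, indent=2))
```

The printed body has the same structure as the second search request above.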

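Nested objects come with their own restrictions. Because they are indexed as separate hidden documents, they can only be: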

queried with the nested query.
analyzed with the nested and reverse_nested aggregations.
sorted with nested sorting.
retrieved and highlighted with nested inner hits.
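The four access paths listed above can be combined in one search body. The Python dict below is a sketch against the test_nested mappings; the aggregation names `managers`, `avg_age` and `root_docs` are arbitrary labels chosen for this example, and the body would be sent to GET test_nested/_search.

```python
import json

body = {
    # Nested query with inner_hits: also return the matching manager.name objects.
    "query": {
        "nested": {
            "path": "manager.name",
            "query": {"match": {"manager.name.first": "alice"}},
            "inner_hits": {},
        }
    },
    # Nested sorting: order root documents by their youngest manager's age.
    "sort": [
        {"manager.age": {"order": "asc", "mode": "min", "nested": {"path": "manager"}}}
    ],
    "aggs": {
        # Nested aggregation: step into the hidden manager documents.
        "managers": {
            "nested": {"path": "manager"},
            "aggs": {
                "avg_age": {"avg": {"field": "manager.age"}},
                # reverse_nested: step back out to the root documents.
                "root_docs": {"reverse_nested": {}},
            },
        }
    },
}
print(json.dumps(body, indent=2))
```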

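Nested, together with the join data type covered earlier, is the usual Elasticsearch answer to modeling the relationships of a relational database. This approach is an index-time join, and it performs somewhat worse than plain documents. The other way to model relationships is a query-time join: each entity lives in its own index, one of the indices stores the relationship between them, and the application queries the two indices separately, which takes two searches.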
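The query-time join can be sketched with two in-memory lists standing in for two indices. Everything here is hypothetical: the `users` and `posts` data, and `search`, which only stands in for one search request against one index. In Elasticsearch, the second step would typically be a terms query on the relationship field.

```python
# Hypothetical stand-ins for two indices; posts stores the relationship (user_id).
users = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
posts = [
    {"id": 10, "user_id": 1, "title": "post A"},
    {"id": 11, "user_id": 2, "title": "post B"},
    {"id": 12, "user_id": 1, "title": "post C"},
]


def search(index, predicate):
    """Stand-in for a single search request against one index."""
    return [doc for doc in index if predicate(doc)]


def posts_by_user_name(name):
    # First search: find the matching users and collect their ids.
    user_ids = {u["id"] for u in search(users, lambda u: u["name"] == name)}
    # Second search: find the posts that reference those ids.
    return search(posts, lambda p: p["user_id"] in user_ids)


print([p["title"] for p in posts_by_user_name("alice")])  # ['post A', 'post C']
```

The cost of this design is the extra round trip per join, in exchange for keeping each entity independently updatable.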
