Elasticsearch Parent-Child Documents: Quick Notes

I. Introduction to ES parent-child documents

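ES provides something similar to a JOIN in a relational database: a field of type join maintains the parent-child relationship between documents in the same index, and parent and child documents can be indexed and updated independently of each other.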

II. Creating the index and inserting parent-child documents

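Creating parent-child documents in ES takes three steps:

  • Create the index mapping, declaring a field of type join along with the parent and child relation names
  • Insert the parent documents
  • Insert the child documents

Each step is demonstrated below.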

1. Create the index

Suppose we have a blog system where each blog post has several comments. The blog and its comments then form a parent-child relationship.

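To define the parent-child relationship in the mapping:

  • Set the field type to join
  • Declare the parent-child relationship under relations

For example: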

# blog is the parent relation, comment is the child relation
PUT blog_index
{
  "mappings": {
    "properties": {
      "blog_comment_join": {
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      }
    }
  }
}
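The request body is plain JSON, so it can just as well be assembled in application code. Below is a minimal Python sketch that builds the same mapping as a dict; `build_join_mapping` is a hypothetical helper, not part of any Elasticsearch client, and the printed JSON is what you would send as the body of `PUT blog_index` with whatever HTTP client you use.

```python
import json


def build_join_mapping(field_name, relations):
    """Build a mapping body containing a single join field.

    `relations` maps each parent relation name to one child name or a list
    of child names, mirroring the "relations" object of the join type.
    """
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "join",
                    "relations": relations,
                }
            }
        }
    }


# Same mapping as the PUT blog_index request: blog is the parent of comment.
body = build_join_mapping("blog_comment_join", {"blog": "comment"})
print(json.dumps(body, indent=2))
```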

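2. Insert the parent documents

A parent document sets the join field to the parent relation name. Document 1 below uses the object form, and document 2 uses the equivalent shorthand string: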

PUT blog_index/_doc/1
{
  "title": "First Blog",
  "author": "Ahri",
  "content": "This is my first blog",
  "blog_comment_join": {
    "name": "blog"
  }
}


PUT blog_index/_doc/2
{
  "title": "Second Blog",
  "author": "EZ",
  "content": "This is my second blog",
  "blog_comment_join": "blog"
}
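Because a join value can be either a string or an object, code that reads these documents back has to handle both shapes. A small Python sketch follows; `join_name` and `join_parent` are hypothetical helpers, and converting the parent id to a string is an assumption based on Elasticsearch document IDs being strings.

```python
def join_name(join_value):
    """Relation name of a join value: "blog" or {"name": "blog", ...}."""
    if isinstance(join_value, str):
        return join_value
    return join_value["name"]


def join_parent(join_value):
    """Parent id as a string, or None if the value names no parent."""
    if isinstance(join_value, str) or "parent" not in join_value:
        return None
    return str(join_value["parent"])


doc1 = {"title": "First Blog", "blog_comment_join": {"name": "blog"}}
doc2 = {"title": "Second Blog", "blog_comment_join": "blog"}
child = {"user": "Tom", "blog_comment_join": {"name": "comment", "parent": 1}}

for doc in (doc1, doc2, child):
    value = doc["blog_comment_join"]
    print(join_name(value), join_parent(value))
```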

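3. Insert the child documents

One thing to watch when inserting child documents:

  • Routing: a child document must be stored on the same shard as its parent, so set its routing to the parent document's ID (or to whatever routing value the parent was indexed with). Elasticsearch requires the routing parameter when indexing a child document.

Example: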

PUT blog_index/_doc/comment-1?routing=1&refresh
{
  "user": "Tom",
  "content": "Good blog",
  "comment_date": "2020-01-01 10:00:00",
  "blog_comment_join": {
    "name": "comment",
    "parent": 1
  }
}

PUT blog_index/_doc/comment-2?routing=1&refresh
{
  "user": "Jhon",
  "content": "Good Job",
  "comment_date": "2020-02-01 10:00:00",
  "blog_comment_join": {
    "name": "comment",
    "parent": 1
  }
}

PUT blog_index/_doc/comment-3?routing=2&refresh
{
  "user": "Jack",
  "content": "Great job",
  "comment_date": "2020-01-01 10:00:00",
  "blog_comment_join": {
    "name": "comment",
    "parent": 2
  }
}
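Why the routing rule matters can be shown with a simplified model of shard selection: a document goes to shard `hash(_routing) % number_of_primary_shards`, and `_routing` defaults to the document's own `_id`. This sketch uses `zlib.crc32` as a stand-in hash (Elasticsearch actually uses Murmur3 together with the index's `routing_num_shards` setting), so the shard numbers it prints are not what a real cluster would choose. The only point is that equal routing values always land on the same shard. `shard_for` and the 5-shard count are assumptions for illustration.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Simplified shard selection: hash(routing) % number of primary shards.

    crc32 is only a stand-in for Elasticsearch's real routing hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # hypothetical number of primary shards

# The parent blog 1 was indexed without ?routing, so its _id "1" is the routing value.
parent_shard = shard_for("1", NUM_SHARDS)

# If a child were routed by its own _id (ES actually rejects this for child docs),
# it could land on a different shard from its parent.
default_child_shard = shard_for("comment-1", NUM_SHARDS)

# With ?routing=1 the child uses the parent's id, so it shares the parent's shard.
routed_child_shard = shard_for("1", NUM_SHARDS)

print("blog 1                -> shard", parent_shard)
print("comment-1, own _id    -> shard", default_child_shard)
print("comment-1, routing=1  -> shard", routed_child_shard)
assert routed_child_shard == parent_shard
```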

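4. Other options

Besides the single parent, single child setup above, the ES join field also supports multiple child relations and multi-level parent-child relations:

Multiple child relations

With the join type, one parent relation can have several child relations: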

PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"]  
        }
      }
    }
  }
}
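With several child relations, each child document still names its own relation and is routed to its parent. Below is a Python sketch that builds the corresponding index requests as (method, path, body) tuples without sending them; `child_request`, the question id `q1` and the `text` field are hypothetical and are not part of the original example.

```python
import json


def child_request(index, doc_id, join_field, relation, parent_id, source):
    """Build (method, path, body) for indexing one child document.

    The routing query parameter is set to the parent's id so the child is
    stored on the same shard as its parent.
    """
    body = dict(source)
    body[join_field] = {"name": relation, "parent": str(parent_id)}
    path = f"{index}/_doc/{doc_id}?routing={parent_id}"
    return "PUT", path, body


# A hypothetical question "q1" with one answer and one comment under it.
requests_to_send = [
    child_request("my_index", "a1", "my_join_field", "answer", "q1", {"text": "An answer"}),
    child_request("my_index", "c1", "my_join_field", "comment", "q1", {"text": "A comment"}),
]

for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body))
```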

Multi-level parent-child relations

PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"],  
          "answer": "vote" 
        }
      }
    }
  }
}

The hierarchy created above is:

question
├── answer
│   └── vote
└── comment
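In a join mapping a relation can have several children but only one parent. The Python sketch below turns a `relations` object into the same indented hierarchy and rejects a relation that is listed under two parents. `normalize`, `parent_map` and `hierarchy_lines` are hypothetical helpers; they check only the single-parent rule, not every validation Elasticsearch performs.

```python
def normalize(relations):
    """Turn each relations value into a list of child names."""
    return {p: [c] if isinstance(c, str) else list(c) for p, c in relations.items()}


def parent_map(relations):
    """Map each child relation to its single parent; reject a second parent."""
    parents = {}
    for parent, children in normalize(relations).items():
        for child in children:
            if child in parents:
                raise ValueError(f"{child!r} has more than one parent")
            parents[child] = parent
    return parents


def hierarchy_lines(relations):
    """Indented lines describing the relation tree, one level per two spaces."""
    children = normalize(relations)
    parents = parent_map(relations)
    roots = [p for p in children if p not in parents]
    lines = []

    def walk(name, depth):
        lines.append("  " * depth + name)
        for child in children.get(name, []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


RELATIONS = {"question": ["answer", "comment"], "answer": "vote"}
print("\n".join(hierarchy_lines(RELATIONS)))
```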

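III. Querying parent-child documents

There are three main queries for parent-child documents:

  • parent_id: find all child documents of the parent with a given ID
  • has_parent: find the child documents whose parent matches a query
  • has_child: find the parent documents that have at least one child matching a query

Examples of each follow: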

【1】parent_id query
# Find all child documents of the parent document with ID 1
GET blog_index/_search
{
  "query": {
    "parent_id": {
      "type": "comment",
      "id": 1
    }
  }
}

# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.44183275,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-1",
        "_score" : 0.44183275,
        "_routing" : "1",
        "_source" : {
          "user" : "Tom",
          "content" : "Good blog",
          "comment_date" : "2020-01-01 10:00:00",
          "blog_comment_join" : {
            "name" : "comment",
            "parent" : 1
          }
        }
      },
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-2",
        "_score" : 0.44183275,
        "_routing" : "1",
        "_source" : {
          "user" : "Jhon",
          "content" : "Good Job",
          "comment_date" : "2020-02-01 10:00:00",
          "blog_comment_join" : {
            "name" : "comment",
            "parent" : 1
          }
        }
      }
    ]
  }
}
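To pin down what parent_id returns, here is a toy in-memory model over the five sample documents (only the fields the query needs are kept). It mirrors which documents come back, not how Elasticsearch executes or scores the query; `DOCS`, `relation`, `parent` and `parent_id_query` are hypothetical names.

```python
# Toy in-memory copy of the sample documents indexed earlier.
DOCS = {
    "1": {"title": "First Blog", "blog_comment_join": {"name": "blog"}},
    "2": {"title": "Second Blog", "blog_comment_join": "blog"},
    "comment-1": {"user": "Tom", "blog_comment_join": {"name": "comment", "parent": 1}},
    "comment-2": {"user": "Jhon", "blog_comment_join": {"name": "comment", "parent": 1}},
    "comment-3": {"user": "Jack", "blog_comment_join": {"name": "comment", "parent": 2}},
}
JOIN = "blog_comment_join"


def relation(src):
    """Relation name of a document (handles the "blog" shorthand)."""
    value = src[JOIN]
    return value if isinstance(value, str) else value["name"]


def parent(src):
    """Parent id as a string, or None for parent documents."""
    value = src[JOIN]
    if isinstance(value, str) or "parent" not in value:
        return None
    return str(value["parent"])


def parent_id_query(child_type, parent_id):
    """Children of relation `child_type` whose parent is `parent_id`."""
    return [doc_id for doc_id, src in DOCS.items()
            if relation(src) == child_type and parent(src) == str(parent_id)]


print(parent_id_query("comment", 1))  # ['comment-1', 'comment-2']
```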

【2】has_parent query
# Find all child documents whose parent's title matches "first"
GET blog_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "blog",
      "query": {
        "match": {
          "title": "first"
        }
      }
    }
  }
}
# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-1",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "user" : "Tom",
          "content" : "Good blog",
          "comment_date" : "2020-01-01 10:00:00",
          "blog_comment_join" : {
            "name" : "comment",
            "parent" : 1
          }
        }
      },
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-2",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "user" : "Jhon",
          "content" : "Good Job",
          "comment_date" : "2020-02-01 10:00:00",
          "blog_comment_join" : {
            "name" : "comment",
            "parent" : 1
          }
        }
      }
    ]
  }
}
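The same toy model can mirror has_parent: first find the parents that match, then return their children. `matches` is a crude stand-in for a match query with the standard analyzer (a lowercase whole-word check), and all names are hypothetical; relevance scoring is not modeled.

```python
# Toy in-memory copy of the sample documents indexed earlier.
DOCS = {
    "1": {"title": "First Blog", "blog_comment_join": {"name": "blog"}},
    "2": {"title": "Second Blog", "blog_comment_join": "blog"},
    "comment-1": {"user": "Tom", "blog_comment_join": {"name": "comment", "parent": 1}},
    "comment-2": {"user": "Jhon", "blog_comment_join": {"name": "comment", "parent": 1}},
    "comment-3": {"user": "Jack", "blog_comment_join": {"name": "comment", "parent": 2}},
}
JOIN = "blog_comment_join"


def relation(src):
    value = src[JOIN]
    return value if isinstance(value, str) else value["name"]


def parent(src):
    value = src[JOIN]
    if isinstance(value, str) or "parent" not in value:
        return None
    return str(value["parent"])


def matches(text, term):
    """Rough stand-in for a match query: case-insensitive whole-word check."""
    return term.lower() in text.lower().split()


def has_parent_query(parent_type, field, term):
    """Children whose parent (of `parent_type`) has `term` in `field`."""
    matching_parents = {doc_id for doc_id, src in DOCS.items()
                        if relation(src) == parent_type and matches(src.get(field, ""), term)}
    return [doc_id for doc_id, src in DOCS.items() if parent(src) in matching_parents]


print(has_parent_query("blog", "title", "first"))  # ['comment-1', 'comment-2']
```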

【3】has_child query
# Find the parents of child documents whose user matches "Jack"
GET blog_index/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "user": "Jack"
        }
      }
    }
  }
}
# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "Second Blog",
          "author" : "EZ",
          "content" : "This is my second blog",
          "blog_comment_join" : "blog"
        }
      }
    ]
  }
}
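has_child works in the opposite direction: collect the parent ids of the matching children, then return those parents. As before, this is a hypothetical toy model with a crude `matches` stand-in for the match query, and it ignores scoring and options such as min_children.

```python
# Toy in-memory copy of the sample documents indexed earlier.
DOCS = {
    "1": {"title": "First Blog", "blog_comment_join": {"name": "blog"}},
    "2": {"title": "Second Blog", "blog_comment_join": "blog"},
    "comment-1": {"user": "Tom", "blog_comment_join": {"name": "comment", "parent": 1}},
    "comment-2": {"user": "Jhon", "blog_comment_join": {"name": "comment", "parent": 1}},
    "comment-3": {"user": "Jack", "blog_comment_join": {"name": "comment", "parent": 2}},
}
JOIN = "blog_comment_join"


def relation(src):
    value = src[JOIN]
    return value if isinstance(value, str) else value["name"]


def parent(src):
    value = src[JOIN]
    if isinstance(value, str) or "parent" not in value:
        return None
    return str(value["parent"])


def matches(text, term):
    """Rough stand-in for a match query: case-insensitive whole-word check."""
    return term.lower() in text.lower().split()


def has_child_query(child_type, field, term):
    """Parents having at least one `child_type` child with `term` in `field`."""
    parents_with_match = {parent(src) for src in DOCS.values()
                          if relation(src) == child_type and matches(src.get(field, ""), term)}
    return [doc_id for doc_id in DOCS if doc_id in parents_with_match]


print(has_child_query("comment", "user", "Jack"))  # ['2']
```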


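IV. Nested objects vs. parent-child documents

The Geek Time course 《Elasticsearch核心技術與實戰》 (Elasticsearch: Core Techniques and Practice) compares the two approaches side by side. The trade-off comes down to:

  • Nested objects: stored together with the parent document, so reads are fast; but updating any nested object reindexes the whole document.
  • Parent-child documents: parents and children are separate documents and can be updated independently, which suits frequently updated children; but maintaining the join costs extra memory, and queries are slower.

Since most data is read far more often than it is written, nested objects are usually the better default.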



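I'm AhriJ鄒同學, a full-stack engineer who works on front end, back end, mini-programs, and DevOps. The blog is updated regularly. If you found this useful, a like, comment, and follow are appreciated; if you spot mistakes, corrections are welcome too. Let's learn from each other and improve together.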
