Handling Data Relationships in Elasticsearch

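Normalization in relational databases: the main goal of normalized design is to avoid unnecessary updates, but a fully normalized schema often suffers from slow queries, because the more normalized the database is, the more tables each query has to join.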

Denormalization: the data is flattened. Instead of using relationships, each document stores redundant copies of the related data.

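Pros: no join operations are needed, so reads are fast (Elasticsearch compresses the _source field to reduce disk overhead)
Cons: not suitable when the data is modified frequently, because every redundant copy has to be updated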

Relational databases generally normalize data; in Elasticsearch you usually denormalize it instead (denormalization gives fast reads, needs no table joins, and needs no row locks).
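The trade-off can be sketched in a few lines of Python. This is a toy model with plain dicts, not Elasticsearch code, and the articles/authors data and function names are hypothetical: the normalized form needs a join on every read, while the denormalized form reads in one step but must rewrite every copy when an author changes.

```python
# Hypothetical sample data; plain Python dicts, not Elasticsearch documents.
authors = {1001: {"userid": 1001, "username": "liu"}}

# Normalized: articles reference the author by id only.
articles_normalized = [
    {"id": 1, "content": "Elasticsearch Helloworld", "author_id": 1001},
    {"id": 2, "content": "Nested objects", "author_id": 1001},
]

# Denormalized: every article carries its own copy of the author.
articles_denormalized = [
    {"id": 1, "content": "Elasticsearch Helloworld", "author": {"userid": 1001, "username": "liu"}},
    {"id": 2, "content": "Nested objects", "author": {"userid": 1001, "username": "liu"}},
]


def read_normalized(article_id):
    """Read path needs a join: fetch the article, then look up its author."""
    article = next(a for a in articles_normalized if a["id"] == article_id)
    return {**article, "author": authors[article["author_id"]]}


def read_denormalized(article_id):
    """Read path is a single lookup: the author is already inside the document."""
    return next(a for a in articles_denormalized if a["id"] == article_id)


def rename_author_normalized(userid, new_name):
    """Exactly one write: the author record itself."""
    authors[userid]["username"] = new_name
    return 1


def rename_author_denormalized(userid, new_name):
    """One write per document that holds a copy of that author."""
    updated = 0
    for doc in articles_denormalized:
        if doc["author"]["userid"] == userid:
            doc["author"]["username"] = new_name
            updated += 1
    return updated


print(read_normalized(1)["author"]["username"])    # liu
print(read_denormalized(1)["author"]["username"])  # liu
print(rename_author_normalized(1001, "liujia"))    # 1
print(rename_author_denormalized(1001, "liujia"))  # 2
```

With many articles per author, the denormalized rename grows with the number of copies, which is the cost of frequent modification mentioned above.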

Elasticsearch is not good at handling relationships; they are generally handled in one of the following four ways:

Object type
Nested object
Parent/child relationship (Parent/Child)
Application-side join

Comparison

	Nested Object	Parent/Child
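Pros	Documents are stored together, so read performance is high	Parent and child documents can be updated independently
Cons	Updating a nested sub-document requires reindexing the whole document	Extra memory is needed to maintain the relationship; read performance is relatively poorer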

Object type

Case 1: article and author information (1:1 relationship)

DELETE articles
# Set the mappings for articles
PUT /articles  
{  
  "mappings": {  
    "properties": {  
      "content": {  
        "type": "text"  
      },  
      "time": {  
        "type": "date"  
      },  
      "author": {  
        "properties": {
          "userid": {  
            "type": "long"  
          },  
          "username": {  
            "type": "keyword"  
          }  
        }  
      }  
    }  
  }  
} 
# Index a test document
PUT articles/_doc/1  
{  
  "content":"Elasticsearch Helloworld！",  
  "time":"2020-01-01T00:00:00",  
  "author":{  
    "userid":1001,  
    "username":"liu"
  }  
} 
# Query
POST articles/_search  
{  
  "query": {  
    "bool": {  
      "must": [  
        {"match": {  
          "content": "Elasticsearch"  
        }},  
        {"match": {  
          "author.username": "liu"  
        }}  
      ]  
    }  
  }  
}

Case 2: article and author information (1:n relationship) (this has a problem!)

DELETE articles
# Set the mappings for articles
PUT /articles  
{  
  "mappings": {  
    "properties": {  
      "content": {  
        "type": "text"  
      },  
      "time": {  
        "type": "date"  
      },  
      "author": {  
        "properties": {
          "userid": {  
            "type": "long"  
          },  
          "username": {  
            "type": "keyword"  
          }  
        }  
      }  
    }  
  }  
} 
POST articles/_search
# Index a test document
PUT articles/_doc/1  
{  
  "content":"Elasticsearch Helloworld！",  
  "time":"2020-01-01T00:00:00",  
  "author":[{  
    "userid":1001,  
    "username":"liu"
  },{
    "userid":1002,  
    "username":"jia"
  }]
} 
# Query (this also matches! Why does that happen?)
POST articles/_search  
{  
  "query": {  
    "bool": {  
      "must": [  
        {"match": {  
          "author.userid": "1001"  
        }},  
        {"match": {  
          "author.username": "jia"  
        }}  
      ]  
    }  
  }  
}

When an object field holds an array of objects, the query returns results we did not want. Why?

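At index time, the boundaries between the inner objects are not preserved: the JSON is flattened into simple key-value pairs, and the values from every object in the array are merged under one field name. A query on several fields can therefore match values that come from different objects, which produces the unexpected hit above. The document is stored roughly as:

"content":"Elasticsearch Helloworld！"
"time":"2020-01-01T00:00:00"
"author.userid":[1001,1002]
"author.username":["liu","jia"]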

Using nested objects (Nested Object) solves this problem.
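The flattening described above can be reproduced with a small Python model. It is a simplification, assuming each leaf value is stored under its dotted path and that bool/must only asks whether some value exists at each path; the helper names flatten and must_match are hypothetical, not Elasticsearch APIs. It shows why the query on author.userid 1001 and author.username jia matches even though no single author has both values:

```python
def flatten(doc):
    """Simplified model of how an object field is indexed: leaf values are collected
    under their dotted path, so an array of objects loses its per-object boundaries."""
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)
        else:
            fields.setdefault(path, []).append(value)

    walk(doc, "")
    return fields


def must_match(fields, conditions):
    """bool/must over flattened fields: each condition only needs SOME value at its path."""
    return all(value in fields.get(path, []) for path, value in conditions.items())


doc = {
    "content": "Elasticsearch Helloworld!",
    "time": "2020-01-01T00:00:00",
    "author": [
        {"userid": 1001, "username": "liu"},
        {"userid": 1002, "username": "jia"},
    ],
}

flat = flatten(doc)
print(flat["author.userid"])    # [1001, 1002]
print(flat["author.username"])  # ['liu', 'jia']
# userid 1001 belongs to "liu", yet the pair (1001, "jia") still matches:
print(must_match(flat, {"author.userid": 1001, "author.username": "jia"}))  # True
```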

Nested objects

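Nested objects allow each object in an array to be indexed independently. Declaring author with "type": "nested" (together with its properties) indexes every author as a separate hidden document. Internally, the article with two authors below is stored as two nested Lucene documents plus the root document, and the join between them is performed at query time.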
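The same data under a simplified nested model, again in Python with hypothetical helper names (nested_docs, nested_must_match): each author becomes its own hidden document, and all conditions inside a nested query have to be satisfied by one of those hidden documents. This sketches the matching semantics only, not how Lucene actually stores the documents.

```python
def nested_docs(doc, path):
    """Simplified model of a nested field: each object in doc[path] becomes its own
    hidden document, and the root document keeps the remaining fields."""
    hidden = [{f"{path}.{k}": v for k, v in obj.items()} for obj in doc.get(path, [])]
    root = {k: v for k, v in doc.items() if k != path}
    return root, hidden


def nested_must_match(doc, path, conditions):
    """Like a nested query wrapping bool/must: every condition must hold within ONE
    hidden document; if any hidden document qualifies, the root document matches."""
    _, hidden = nested_docs(doc, path)
    return any(all(h.get(f) == v for f, v in conditions.items()) for h in hidden)


doc = {
    "content": "Elasticsearch Helloworld!",
    "author": [
        {"userid": 1001, "username": "liu"},
        {"userid": 1002, "username": "jia"},
    ],
}

root, hidden = nested_docs(doc, "author")
print(1 + len(hidden))  # 3 (one root document + two hidden nested documents)
print(nested_must_match(doc, "author", {"author.userid": 1001, "author.username": "jia"}))  # False
print(nested_must_match(doc, "author", {"author.userid": 1001, "author.username": "liu"}))  # True
```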

Case: article and author information (1:n relationship)

DELETE articles
# Set the mappings for articles
PUT /articles  
{  
  "mappings": {  
    "properties": {  
      "content": {  
        "type": "text"  
      },  
      "time": {  
        "type": "date"  
      },  
      "author": {  
        "type": "nested", 
        "properties": {
          "userid": {  
            "type": "long"  
          },  
          "username": {  
            "type": "keyword"  
          }  
        }  
      }  
    }  
  }  
} 
POST articles/_search
# Index a test document
PUT articles/_doc/1  
{  
  "content":"Elasticsearch Helloworld！",  
  "time":"2020-01-01T00:00:00",  
  "author":[{  
    "userid":1001,  
    "username":"liu"
  },{
    "userid":1002,  
    "username":"jia"
  }]
} 
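# Query (this no longer matches: userid 1001 and username jia belong to different author objects, and the nested query requires both conditions to hold within one object)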
POST articles/_search  
{  
  "query": {  
    "bool": {  
      "must": [  
        {"nested": {
          "path": "author",
          "query": {  
            "bool": {  
              "must": [  
                {"match": {  
                  "author.userid": "1001"  
                }},  
                {"match": {  
                  "author.username": "jia"  
                }}  
              ]  
            }  
          }
        }}
      ]  
    }  
  }  
}

Parent/child relationships

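Both object and nested types have a limitation: every update requires reindexing the whole object. Elasticsearch offers a join feature similar to the one in relational databases: by maintaining a parent/child relationship, two objects can be kept separate. Parent and child documents are independent documents, so updating a parent does not require reindexing its children, and adding, updating, or deleting a child does not affect the parent or the other children.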

Case: article and author information (1:n relationship)

DELETE articles
# Set the mappings for articles
PUT /articles  
{  
  "mappings": {  
    "properties": {  
      "article_author_relation": {  
        "type": "join",  
        "relations": {  
          "article": "author"  
        }
      },
      "content": {  
        "type": "text"  
      },  
      "time": {  
        "type": "date"  
      }
    }  
  }  
} 
# Index the parent document
PUT articles/_doc/article1
{  
  "article_author_relation":{
    "name":"article"
  },
  "content":"Elasticsearch Helloworld！",  
  "time":"2020-01-01T00:00:00"
} 
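# Index the child documents (routing is required so that each child is stored on the same shard as its parent)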
PUT articles/_doc/author1?routing=article1
{  
  "article_author_relation":{
    "name":"author",
    "parent":"article1"
  },
  "userid":"1001",  
  "username":"jia"
} 
PUT articles/_doc/author2?routing=article1
{  
  "article_author_relation":{
    "name":"author",
    "parent":"article1"
  },
  "userid":"1002",  
  "username":"liu"
} 
GET articles/_doc/article1
POST articles/_search
# parent_id: find the child documents of a given parent document ID
POST articles/_search  
{  
  "query": {
    "parent_id":{
      "type":"author",
      "id":"article1"
    }
  }  
}
# has_child: return parent documents that have a matching child
POST articles/_search  
{  
  "query": {
    "has_child":{
      "type":"author",
      "query": {
        "match": {
          "username": "liu"
        }
      }
    }
  }  
}
# has_parent: return child documents whose parent matches
POST articles/_search  
{  
  "query": {
    "has_parent":{
      "parent_type":"article",
      "query": {
        "match": {
          "content": "elasticsearch"
        }
      }
    }
  }  
}
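The three queries above can be modelled in memory to make their semantics explicit. This Python sketch is a simplification under stated assumptions: the index has a hypothetical 3 primary shards, the shard is picked with a simplified hash(routing) % shards formula where crc32 stands in for the murmur3 hash Elasticsearch actually uses, and parent_id, has_child and has_parent are plain functions over dicts rather than the real query implementations. It also illustrates why routing=article1 matters: it places the children on the parent's shard.

```python
import zlib
from collections import defaultdict

NUM_PRIMARY_SHARDS = 3            # hypothetical index setting
REL = "article_author_relation"   # the join field from the mapping above
shards = defaultdict(dict)        # shard number -> {doc_id: source}


def shard_for(routing):
    # Simplified routing: hash(_routing) % primary shards. Elasticsearch uses
    # murmur3 (and routing_num_shards); crc32 is only a dependency-free stand-in.
    return zlib.crc32(routing.encode()) % NUM_PRIMARY_SHARDS


def index(doc_id, source, routing=None):
    """Store a document on the shard chosen by its routing value (default: its id)."""
    shards[shard_for(routing or doc_id)][doc_id] = source


def all_docs():
    for shard in shards.values():
        yield from shard.items()


def parent_id(child_type, pid):
    """Children of the given type whose parent is pid."""
    return sorted(i for i, d in all_docs()
                  if d[REL]["name"] == child_type and d[REL].get("parent") == pid)


def has_child(child_type, predicate):
    """Parents that have at least one child of child_type matching the predicate."""
    return sorted({d[REL]["parent"] for _, d in all_docs()
                   if d[REL]["name"] == child_type and predicate(d)})


def has_parent(parent_type, predicate):
    """Children whose parent (of parent_type) matches the predicate."""
    parents = {i for i, d in all_docs()
               if d[REL]["name"] == parent_type and predicate(d)}
    return sorted(i for i, d in all_docs() if d[REL].get("parent") in parents)


index("article1", {REL: {"name": "article"}, "content": "Elasticsearch Helloworld!"})
index("author1", {REL: {"name": "author", "parent": "article1"}, "username": "jia"}, routing="article1")
index("author2", {REL: {"name": "author", "parent": "article1"}, "username": "liu"}, routing="article1")

print(sorted(shards[shard_for("article1")]))  # ['article1', 'author1', 'author2']
print(parent_id("author", "article1"))         # ['author1', 'author2']
print(has_child("author", lambda d: d.get("username") == "liu"))  # ['article1']
print(has_parent("article", lambda d: "elasticsearch" in d.get("content", "").lower().split()))
# ['author1', 'author2']
```

Because every child is routed by its parent's ID, a parent and all its children always land on one shard, so the relationship can be resolved shard-locally.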

liujiazhong_pro
