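Normalized design in relational databases: the main goal of normalization is to reduce redundancy and, with it, unnecessary updates. A fully normalized schema, however, often suffers from slow queries, because the more normalized the database is, the more tables a query has to join.
Denormalization: the data is flattened. Instead of using relations, redundant copies of the data are stored inside each document.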
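- Advantages: no Join is needed, so read performance is good (Elasticsearch compresses the _source field, which reduces the disk overhead of the redundant copies)
- Disadvantages: not a good fit for data that is modified frequently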
Relational databases generally normalize data, whereas in Elasticsearch you usually denormalize it (the benefits of denormalizing: fast reads, no table joins, no row locks).
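The trade-off can be illustrated with a minimal sketch in plain Python. It is not Elasticsearch code: the dictionaries, the second article, and all function names are hypothetical and exist only to show that a normalized read needs a lookup across two structures, while a denormalized rename has to touch every copy.

```python
# Normalized: authors live in their own "table"; articles reference them by id.
authors = {1001: {"username": "liu"}}
articles_normalized = [
    {"id": 1, "content": "Elasticsearch Helloworld!", "author_id": 1001},
    {"id": 2, "content": "Denormalization notes", "author_id": 1001},
]

# Denormalized: every article carries its own copy of the author data.
articles_denormalized = [
    {"id": 1, "content": "Elasticsearch Helloworld!",
     "author": {"userid": 1001, "username": "liu"}},
    {"id": 2, "content": "Denormalization notes",
     "author": {"userid": 1001, "username": "liu"}},
]


def read_normalized(article_id):
    """Reading needs a join: find the article, then look up its author."""
    article = next(a for a in articles_normalized if a["id"] == article_id)
    author = authors[article["author_id"]]
    return {"content": article["content"], "username": author["username"]}


def read_denormalized(article_id):
    """Reading needs no join: the author is already inside the article."""
    article = next(a for a in articles_denormalized if a["id"] == article_id)
    return {"content": article["content"], "username": article["author"]["username"]}


def rename_author_normalized(userid, new_name):
    """One record is written, however many articles the author has."""
    authors[userid]["username"] = new_name
    return 1


def rename_author_denormalized(userid, new_name):
    """Every article holding a copy of the author must be rewritten."""
    touched = 0
    for article in articles_denormalized:
        if article["author"]["userid"] == userid:
            article["author"]["username"] = new_name
            touched += 1
    return touched
```

Both read functions return the same result, but renaming the author writes one record in the normalized layout and two in the denormalized one, which is why denormalization suits read-heavy data that rarely changes.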
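Elasticsearch is not good at handling relationships. They are generally handled in one of the following four ways:
- Object type
- Nested object (Nested Object)
- Parent/child relationship (Parent/Child)
- Application-side join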
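Comparison of Nested Object and Parent/Child
| | Nested Object | Parent/Child |
|---|---|---|
| Advantages | Documents are stored together, so read performance is high | Parent and child documents can be updated independently |
| Disadvantages | Updating a nested child requires updating the whole document | Extra memory is needed to maintain the relation, and read performance is comparatively worse |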
Object type
Case 1: an article and its author (1:1 relationship)
DELETE articles
# Define the mappings for articles
PUT /articles
{
"mappings": {
"properties": {
"content": {
"type": "text"
},
"time": {
"type": "date"
},
"author": {
"properties": {
"userid": {
"type": "long"
},
"username": {
"type": "keyword"
}
}
}
}
}
}
# Index one test document
PUT articles/_doc/1
{
"content":"Elasticsearch Helloworld!",
"time":"2020-01-01T00:00:00",
"author":{
"userid":1001,
"username":"liu"
}
}
# Query
POST articles/_search
{
"query": {
"bool": {
"must": [
{"match": {
"content": "Elasticsearch"
}},
{"match": {
"author.username": "liu"
}}
]
}
}
}
Case 2: an article and its authors (1:n relationship) (problematic!)
DELETE articles
# Define the mappings for articles
PUT /articles
{
"mappings": {
"properties": {
"content": {
"type": "text"
},
"time": {
"type": "date"
},
"author": {
"properties": {
"userid": {
"type": "long"
},
"username": {
"type": "keyword"
}
}
}
}
}
}
POST articles/_search
# Index one test document
PUT articles/_doc/1
{
"content":"Elasticsearch Helloworld!",
"time":"2020-01-01T00:00:00",
"author":[{
"userid":1001,
"username":"liu"
},{
"userid":1002,
"username":"jia"
}]
}
# Query (this matches, even though no single author has userid 1001 and username jia. Why?)
POST articles/_search
{
"query": {
"bool": {
"must": [
{"match": {
"author.userid": "1001"
}},
{"match": {
"author.username": "jia"
}}
]
}
}
}
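When an array of objects is stored with the object type, the query returns results we do not want. Why?
At index time the boundaries between the inner objects are not preserved. The JSON is flattened into key-value pairs, so a query on several fields can be satisfied by values that come from different objects, which produces the unexpected hit. The document above is effectively indexed as: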
"content":"Elasticsearch Helloworld!"
"time":"2020-01-01T00:00:00"
"author.userid":["1001","1002"]
"author.username":["liu","jia"]
Nested objects (Nested Object) solve this problem.
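To see why the plain object mapping over-matches, the flattening described above can be simulated in a few lines of Python. This is only a sketch of the idea, not how Lucene stores fields: `flatten` and `matches_all` are hypothetical helpers, and exact value equality stands in for a real `match` query.

```python
def flatten(doc, prefix=""):
    """Collapse inner objects into dotted keys, merging arrays of objects per field."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            value = [value]  # treat a single object like a one-element array
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            for item in value:
                for sub_key, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_key, []).extend(sub_values)
        else:
            flat[path] = value if isinstance(value, list) else [value]
    return flat


def matches_all(flat_doc, clauses):
    """A bool/must over flattened fields: each clause is checked on its own."""
    return all(value in flat_doc.get(field, []) for field, value in clauses.items())


article = {
    "content": "Elasticsearch Helloworld!",
    "time": "2020-01-01T00:00:00",
    "author": [
        {"userid": 1001, "username": "liu"},
        {"userid": 1002, "username": "jia"},
    ],
}

flat = flatten(article)
# The object boundaries are gone: userid and username are two unrelated lists.
print(flat["author.userid"], flat["author.username"])

# 1001 belongs to liu and jia belongs to 1002, yet both clauses are satisfied.
print(matches_all(flat, {"author.userid": 1001, "author.username": "jia"}))  # True
```

Because each clause only asks whether its value appears anywhere in the merged list, the pairing between `userid` and `username` is lost.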
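Nested objects
The nested type lets each object in an array of objects be indexed independently. Declaring author with the nested type, together with properties, indexes every author as a separate document: internally, each nested object is stored as its own hidden Lucene document next to the root document, and they are joined at query time.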
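The effect of indexing each object separately can be sketched in plain Python. This is a simplified model under stated assumptions, not Elasticsearch's implementation: `to_lucene_docs` and `nested_match` are hypothetical names, and exact equality stands in for a real query clause.

```python
def to_lucene_docs(doc, nested_path):
    """Split one source document into hidden per-object docs plus the root doc."""
    root = {k: v for k, v in doc.items() if k != nested_path}
    hidden = [
        {f"{nested_path}.{k}": v for k, v in obj.items()}
        for obj in doc.get(nested_path, [])
    ]
    return hidden, root


def nested_match(doc, nested_path, clauses):
    """All clauses must hold inside the SAME hidden document for a hit."""
    hidden, _root = to_lucene_docs(doc, nested_path)
    return any(
        all(h.get(field) == value for field, value in clauses.items())
        for h in hidden
    )


article = {
    "content": "Elasticsearch Helloworld!",
    "time": "2020-01-01T00:00:00",
    "author": [
        {"userid": 1001, "username": "liu"},
        {"userid": 1002, "username": "jia"},
    ],
}

hidden_docs, root_doc = to_lucene_docs(article, "author")
print(len(hidden_docs))  # 2 hidden documents, one per author

# Mixed values from two different authors no longer match.
print(nested_match(article, "author", {"author.userid": 1001, "author.username": "jia"}))  # False
# Values taken from the same author still match.
print(nested_match(article, "author", {"author.userid": 1001, "author.username": "liu"}))  # True
```

Keeping each author in its own unit is what restores the pairing between `userid` and `username`, at the cost of a join between the hidden documents and the root document at query time.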
Case 1: an article and its authors (1:n relationship)
DELETE articles
# Define the mappings for articles
PUT /articles
{
"mappings": {
"properties": {
"content": {
"type": "text"
},
"time": {
"type": "date"
},
"author": {
"type": "nested",
"properties": {
"userid": {
"type": "long"
},
"username": {
"type": "keyword"
}
}
}
}
}
}
POST articles/_search
# Index one test document
PUT articles/_doc/1
{
"content":"Elasticsearch Helloworld!",
"time":"2020-01-01T00:00:00",
"author":[{
"userid":1001,
"username":"liu"
},{
"userid":1002,
"username":"jia"
}]
}
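# Query (with the nested mapping, userid 1001 plus username jia no longer matches, because the two values belong to different author objects)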
POST articles/_search
{
"query": {
"bool": {
"must": [
{"nested": {
"path": "author",
"query": {
"bool": {
"must": [
{"match": {
"author.userid": "1001"
}},
{"match": {
"author.username": "jia"
}}
]
}
}
}}
]
}
}
}
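Parent/child relationship
Both the object type and nested objects share a limitation: every update requires reindexing the whole document. Elasticsearch provides an implementation similar to a Join in relational databases: by maintaining a Parent/Child relation, the two objects are separated. Parent and child documents are independent documents, so updating a parent does not require reindexing its children, and adding, updating, or deleting a child does not affect the parent or the other children.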
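The independence of parent and child documents can be pictured with a small in-memory model. This is a sketch under stated assumptions: the `index` dictionary, the `put` helper, and the per-document version counter are hypothetical stand-ins, not Elasticsearch APIs.

```python
# Hypothetical in-memory index: document id -> {"version": int, "source": dict}.
index = {}


def put(doc_id, source):
    """Index or reindex a single document and bump only its own version."""
    previous = index.get(doc_id)
    version = previous["version"] + 1 if previous else 1
    index[doc_id] = {"version": version, "source": source}
    return version


put("article1", {
    "article_author_relation": {"name": "article"},
    "content": "Elasticsearch Helloworld!",
})
put("author1", {
    "article_author_relation": {"name": "author", "parent": "article1"},
    "username": "jia",
})
put("author2", {
    "article_author_relation": {"name": "author", "parent": "article1"},
    "username": "liu",
})

# Update one child: only that document is rewritten.
put("author1", {
    "article_author_relation": {"name": "author", "parent": "article1"},
    "username": "jia_renamed",
})

versions = {doc_id: entry["version"] for doc_id, entry in index.items()}
print(versions)  # {'article1': 1, 'author1': 2, 'author2': 1}
```

Only `author1` moves to version 2. With a nested mapping the same change would mean resubmitting the whole article, including every author.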
Case: an article and its authors (1:n relationship)
DELETE articles
# Define the mappings for articles
PUT /articles
{
"mappings": {
"properties": {
"article_author_relation": {
"type": "join",
"relations": {
"article": "author"
}
},
"content": {
"type": "text"
},
"time": {
"type": "date"
}
}
}
}
# Index the parent document
PUT articles/_doc/article1
{
"article_author_relation":{
"name":"article"
},
"content":"Elasticsearch Helloworld!",
"time":"2020-01-01T00:00:00"
}
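# Index the child documents (routing is set to the parent id so that parent and children are stored on the same shard)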
PUT articles/_doc/author1?routing=article1
{
"article_author_relation":{
"name":"author",
"parent":"article1"
},
"userid":"1001",
"username":"jia"
}
PUT articles/_doc/author2?routing=article1
{
"article_author_relation":{
"name":"author",
"parent":"article1"
},
"userid":"1002",
"username":"liu"
}
GET articles/_doc/article1
POST articles/_search
# Use parent_id to find child documents by the id of their parent
POST articles/_search
{
"query": {
"parent_id":{
"type":"author",
"id":"article1"
}
}
}
# has_child returns parent documents
POST articles/_search
{
"query": {
"has_child":{
"type":"author",
"query": {
"match": {
"username": "liu"
}
}
}
}
}
# has_parent returns child documents
POST articles/_search
{
"query": {
"has_parent":{
"parent_type":"article",
"query": {
"match": {
"content": "elasticsearch"
}
}
}
}
}
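What the three queries above return can be summarized with a minimal Python sketch. It models only the semantics, not how Elasticsearch executes a join: the `docs` dictionary and the three functions are hypothetical, and a Python predicate stands in for the inner `query`.

```python
JOIN = "article_author_relation"

# Hypothetical in-memory copy of the three documents indexed above.
docs = {
    "article1": {JOIN: {"name": "article"}, "content": "Elasticsearch Helloworld!"},
    "author1": {JOIN: {"name": "author", "parent": "article1"}, "username": "jia"},
    "author2": {JOIN: {"name": "author", "parent": "article1"}, "username": "liu"},
}


def parent_id(child_type, pid):
    """Children of the given type whose parent has the given id."""
    return sorted(
        doc_id for doc_id, d in docs.items()
        if d[JOIN]["name"] == child_type and d[JOIN].get("parent") == pid
    )


def has_child(child_type, predicate):
    """Parents that have at least one child of the given type matching the predicate."""
    return sorted({
        d[JOIN]["parent"] for d in docs.values()
        if d[JOIN]["name"] == child_type and predicate(d)
    })


def has_parent(parent_type, predicate):
    """Children whose parent is of the given type and matches the predicate."""
    parents = {
        doc_id for doc_id, d in docs.items()
        if d[JOIN]["name"] == parent_type and predicate(d)
    }
    return sorted(
        doc_id for doc_id, d in docs.items()
        if d[JOIN].get("parent") in parents
    )


print(parent_id("author", "article1"))
print(has_child("author", lambda d: d.get("username") == "liu"))
print(has_parent("article", lambda d: "elasticsearch" in d["content"].lower()))
```

The condition is always evaluated on one side of the relation and the hits come from the other side: `has_child` filters on children and returns parents, `has_parent` filters on parents and returns children.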