Mapping in Elasticsearch

WeChat official account: 碼農充電站pro
Homepage: https://codeshellme.github.io

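1. Mapping in ES

A Mapping in ES is comparable to a table definition in a traditional database. It serves the following purposes:

Defines the names of the fields in an index.
Defines the types of the fields in an index, such as string or number.
Defines whether an inverted index is built for each field.

A Mapping is defined for a Type within an index:

Documents in ES are stored in a Type of an index.
Before ES 6.0, an index could contain multiple Types, so an index could have multiple Mappings.
From ES 6.0 on, a newly created index can contain only one Type, and from ES 7.0 on Types are deprecated, so an index corresponds to exactly one Mapping.

The Mapping of an index can be retrieved with the following request: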

GET index_name/_mapping

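2. Mapping parameters for ES fields

Many parameters can be set in a field's mapping:

analyzer: specifies the analyzer; only supported for text fields.
enabled: if set to false, the data is only stored and cannot be searched or aggregated (it is still kept in _source).
- Default: true.
index: whether an inverted index is built for the field.
- If set to false, no inverted index is built (saving space) and the field cannot be searched, but it can still be aggregated and still appears in _source.
- Default: true.
norms: whether the field supports relevance scoring.
- If the field is only used for filtering and aggregation and never needs to be scored, set it to false to save space.
- Default: true for text fields.
doc_values: set it to false to save disk space if you are sure you will never sort or aggregate on the field or access its value from a script.
- Default: true.
fielddata: set it to true to sort or aggregate on text fields.
- Default: false.
store: whether the field value is stored separately from _source. The default is false, and the data is kept in _source.
- By default, field values are indexed to make them searchable, but they are not stored individually. The field can be queried, but its original value cannot be retrieved on its own.
- In some cases it makes sense to store a field. For example, given a document with a title, a date, and a very large content field, you may want to retrieve only the title and date without extracting them from a large source field.
boost: boosts the relevance score of the field.
coerce: whether automatic type conversion is enabled, for example string to number.
- Enabled by default.
dynamic: controls whether the mapping is updated automatically; the values are true, false, and strict.
eager_global_ordinals
fields: multi-fields.
- Gives a field several sub-fields of different types, so that the same value can be indexed in several different ways.
copy_to
format
ignore_above
ignore_malformed
index_options
index_phrases
index_prefixes
meta
normalizer
null_value: the value that is indexed in place of an explicit null.
position_increment_gap
properties
search_analyzer
similarity
term_vector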

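2.1. The fields parameter

The fields parameter gives a field several sub-fields of different types, so that the same value can be indexed in several different ways.

Example 1:

PUT index_name
{
  "mappings": {         # set the mappings
    "properties": {     # properties, fixed syntax
      "city": {         # field name
        "type": "text", # the city field is of type text
        "fields": {     # multi-fields, fixed syntax
          "raw": {      # sub-field name
            "type":  "keyword"  # sub-field type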
          }
        }
      }
    }
  }
}

Example 2:

PUT index_name
{
  "mappings": {
    "properties": {
      "title": {               # field name
        "type": "text",        # field type
        "analyzer": "english", # analyzer for the field
        "fields": {            # multi-fields, fixed syntax
          "std": {             # sub-field name
            "type": "text",    # sub-field type
            "analyzer": "standard"  # analyzer for the sub-field
          }
        }
      }
    }
  }
}

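3. ES field data types

ES fields can have the following data types:

Simple types
- Numeric
- Boolean
- Date
- Text
- Keyword
- Binary
- and others
Complex types
- Object
- Arrays
- Nested: an object data type.
- Join: defines a parent/child relationship between documents in the same index.
Special types

text and keyword types

String data can be mapped as either text or keyword. text values are analyzed (split into terms), whereas keyword values are not analyzed and are indexed as a single term.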
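To make the difference concrete, here is a minimal Python sketch. It is not Elasticsearch code: `analyze_text` is a hypothetical and much simplified stand-in for an analyzer (lowercase, then split on non-alphanumeric characters), and `analyze_keyword` simply keeps the whole value as one term.

```python
import re

def analyze_text(value):
    """Simplified stand-in for a text analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]

def analyze_keyword(value):
    """A keyword value is not analyzed: the whole string is a single term."""
    return [value]

city = "New York"
text_terms = analyze_text(city)        # ['new', 'york']
keyword_terms = analyze_keyword(city)  # ['New York']

# Looking up the single term "york" hits the text terms but not the keyword term.
print("york" in text_terms, "york" in keyword_terms)
```

This is why a text field suits full-text search, while a keyword field suits exact matching, sorting, and aggregation.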

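Array type

ES has no dedicated array type. Instead, any field can hold multiple values of the same type, for example:

["one", "two"] # an array of strings
[1, 2]         # an array of integers
[1, [ 2, 3 ]]  # equivalent to [ 1, 2, 3 ]
[{ "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }] # an array of objects

When you look at such a field in the Mapping, its type is the type of the array's elements, not an array type.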

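3.1. Nested type

Nested is an object type that preserves the relationship between the sub-fields of each object.

1. Why the Nested type is needed

Suppose we have data with the following structure:

POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[ # actors is an array whose elements are objects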
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },
    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }
  ]
}

After inserting the data into ES, run the following query:

# Search for movie information
POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"actors.first_name": "Keanu"}},
        {"match": {"actors.last_name": "Hopper"}}
      ]
    }
  }
}

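With this query we are looking for data where first_name=Keanu and last_name=Hopper, so the document with id 1 that we just inserted should not match.

However, when the query is run in ES, document 1 is returned. Why?

The reason is that, for data structured like the actors field, ES does not take the boundaries between objects into account.

Internally, ES stores document 1 like this:

"title":"Speed"
"actors.first_name":["Keanu","Dennis"]
"actors.last_name":["Reeves","Hopper"]

So the data is not stored the way we imagined.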
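The flattening can be reproduced with a small Python sketch. This is only an illustration of the idea, not what ES runs internally: `flatten` and `matches` are hypothetical helpers, and `matches` compares exact values instead of analyzed terms.

```python
def flatten(doc, prefix=""):
    """Flatten objects and arrays of objects into dotted field names with value lists."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat

def matches(flat_doc, conditions):
    """Every condition only needs to match somewhere in the field's value list."""
    return all(value in flat_doc.get(field, []) for field, value in conditions.items())

movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

flat = flatten(movie)
print(flat)
# Keanu and Hopper belong to different actors, yet both conditions match:
print(matches(flat, {"actors.first_name": "Keanu", "actors.last_name": "Hopper"}))  # True
```

Once the values of all actors are merged into one list per field, the information about which first name belongs to which last name is gone.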

If we look at the mappings that ES generates by default for data with the structure above (document 1), we see:

{
  "my_movies" : {
    "mappings" : {
      "properties" : {
        "actors" : {           # actors contains its own properties
          "properties" : {
            "first_name" : {   # type of first_name
              "type" : "text",
              "fields" : {
                "keyword" : {"type" : "keyword", "ignore_above" : 256}
              }
            },
            "last_name" : {    # type of last_name
              "type" : "text",
              "fields" : {
                "keyword" : {"type" : "keyword", "ignore_above" : 256}
              }
            }
          }
        }, # end actors
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {"type" : "keyword", "ignore_above" : 256}
          }
        }
      }
    }
  }
}

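So how can objects really be represented as objects? This is where the Nested type comes in.

2. Using the Nested type

The Nested type allows each object in an array of objects to be indexed independently (as a unit).

We define the following mappings for the my_movies index:

DELETE my_movies
PUT my_movies
{
  "mappings" : {
    "properties" : {
      "actors" : {
        "type": "nested",  # make actors a nested field
        "properties" : {   # each object in the actors array is now treated as a unit
          "first_name" : {"type" : "keyword"},
          "last_name" : {"type" : "keyword"}
        }},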
      "title" : {
        "type" : "text",
        "fields" : {"keyword":{"type":"keyword","ignore_above":256}}
      }
    }
  }
}

After writing the data again, the following search no longer returns anything:

# Search for movie information
POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"actors.first_name": "Keanu"}},
        {"match": {"actors.last_name": "Hopper"}}
      ]
    }
  }
}

However, the following query returns nothing either:

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"actors.first_name": "Keanu"}},
        {"match": {"actors.last_name": "Reeves"}}
      ]
    }
  }
}

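3. Searching Nested fields

This is because Nested data must be queried with a nested query, as shown below. The conditions inside a nested query are evaluated against each actor object separately, so the combination Keanu/Hopper in this example returns no hits, while replacing Hopper with Reeves returns document 1.

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {          # nested query
            "path": "actors",  # path of the nested field (actors)
            "query": {         # the query to run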
              "bool": {
                "must": [
                  {"match": {"actors.first_name": "Keanu"}},
                  {"match": {"actors.last_name": "Hopper"}}
                ]
              }
            }
          } # end nested
        }
      ] # end must
    } # end bool
  }
}
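The semantics of the nested query can be sketched in Python as follows. `nested_match` is a hypothetical helper, not an ES API, and it compares exact values: a document matches only if a single object under the path satisfies every condition by itself.

```python
def nested_match(doc, path, conditions):
    """True if at least one object under `path` satisfies all conditions on its own."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in doc.get(path, [])
    )

movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

# No single actor is called Keanu Hopper, so there is no match.
print(nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Hopper"}))  # False
# Keanu Reeves is one actor object, so the document matches.
print(nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Reeves"}))  # True
```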

4. Aggregating Nested fields

An example of aggregating on Nested data:

# Nested Aggregation
POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {            # user-defined aggregation name
      "nested": {          # nested aggregation
        "path": "actors"   # path of the nested field
      },
      "aggs": {            # sub-aggregation
        "actor_name": {    # user-defined sub-aggregation name
          "terms": {       # terms aggregation
            "field": "actors.first_name",  # sub-field name
            "size": 10
          }
        }
      }
    }
  }
}

An ordinary aggregation does not work on nested fields:

POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {     # user-defined aggregation name
      "terms": {    # terms aggregation
        "field": "actors.first_name",
        "size": 10
      }
    }
  }
}

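3.2. Join type

With the Nested type, nested objects are stored together with their root document, so every update requires reindexing the whole document (the root object and all nested objects).

The Join data type (similar to a Join in a relational database) defines a parent/child relationship between documents in the same index.

Because the Join type maintains a parent/child relationship, it keeps the two kinds of objects separate. Its advantage is:

Parent and child documents are completely independent documents, so updating a parent does not affect its children, and updating a child does not affect its parent.

Comparison of Nested and Join (Parent/Child):

Nested: nested objects are part of the same document, which makes reads fast, but any update reindexes the whole document. It suits data that is mostly queried and seldom updated.
Join (Parent/Child): parent and child documents can be updated independently, but the relationship has to be resolved at query time, which makes reads slower. It suits child documents that are updated frequently.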

1. Defining a Join field

The syntax for defining a Join field is:

DELETE my_blogs

# Set up the Parent/Child mapping
PUT my_blogs
{
  "mappings": {
    "properties": {
      "blog_comments_relation": {  # field name
        "type": "join",            # declare a join field
        "relations": {             # define the parent/child relation
          "blog": "comment"        # blog is the parent, comment is the child
        }
      },
      "content": {
        "type": "text"
      },
      "title": {
        "type": "keyword"
      }
    }
  }
}

2. Inserting Join data

First insert two parent documents:

# Insert blog1
PUT my_blogs/_doc/blog1
{
  "title":"Learning Elasticsearch",
  "content":"learning ELK @ geektime",
  "blog_comments_relation":{
    "name":"blog"  # name = blog marks this as a parent document
  }
}

# Insert blog2
PUT my_blogs/_doc/blog2
{
  "title":"Learning Hadoop",
  "content":"learning Hadoop",
  "blog_comments_relation":{
    "name":"blog" # name = blog marks this as a parent document
  }
}

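Then insert the child documents.

Note that the value of routing is the id of the parent document.
This ensures that parent and child documents are indexed on the same shard, which the join field requires and which keeps join queries efficient.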
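Why routing by the parent id puts parent and child on the same shard can be sketched in Python. The sketch rests on assumptions: the shard is chosen as a hash of the routing value modulo the number of primary shards (a simplified form of the documented formula), `zlib.crc32` is only a stand-in for the Murmur3-based hash ES uses, and `NUM_SHARDS` is a made-up value.

```python
import zlib

NUM_SHARDS = 5  # hypothetical number of primary shards

def shard_for(routing, number_of_shards=NUM_SHARDS):
    """Pick a shard from the routing value (stand-in hash, not the one ES uses)."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards

# A parent is routed by its own id when no routing is given.
parent_shard = shard_for("blog2")
# A child indexed with ?routing=blog2 is routed by the parent id.
child_shard = shard_for("blog2")
# Without routing, the child would be placed according to its own id instead.
unrouted_child_shard = shard_for("comment3")

print(parent_shard == child_shard)  # True: same routing value, same shard
print(unrouted_child_shard)         # may or may not equal parent_shard
```

The same reasoning explains why a child document has to be read and updated with the same routing value it was indexed with.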

# Insert comment1
PUT my_blogs/_doc/comment1?routing=blog1 # routing is the id of the parent document,
{                                        # so parent and child land on the same shard
  "comment":"I am learning ELK",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment",  # name = comment marks this as a child document
    "parent":"blog1"   # id of the parent document this child belongs to
  }
}

# Insert comment2
PUT my_blogs/_doc/comment2?routing=blog2 # routing is the id of the parent document,
{                                        # so parent and child land on the same shard
  "comment":"I like Hadoop!!!!!",
  "username":"Jack",
  "blog_comments_relation":{
    "name":"comment", # name = comment marks this as a child document
    "parent":"blog2"  # id of the parent document this child belongs to
  }
}

# Insert comment3
PUT my_blogs/_doc/comment3?routing=blog2 # routing is the id of the parent document,
{                                        # so parent and child land on the same shard
  "comment":"Hello Hadoop",
  "username":"Bob",
  "blog_comments_relation":{
    "name":"comment", # name = comment marks this as a child document
    "parent":"blog2"  # id of the parent document this child belongs to
  }
}

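3. parent_id query

Fetching a parent document by its id with an ordinary request does not return any information about its children:

GET my_blogs/_doc/blog2

To retrieve the child documents, use a parent_id query:

POST my_blogs/_search
{
  "query": {
    "parent_id": {        # parent_id query
      "type": "comment",  # the child relation name, i.e. we want comment documents
      "id": "blog2"       # id of the parent document
    }                     # returns all comments of blog2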
  }
}

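4. has_child query

A has_child query returns parent documents based on conditions on their children.

POST my_blogs/_search
{
  "query": {
    "has_child": {       # has_child query
      "type": "comment", # child type; the query below is matched against comment documents
      "query" : {
          "match": {"username" : "Jack"}
      }                  # matched against children; the parents of all matching children are returned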
    }
  }
}

5. has_parent query

A has_parent query returns child documents based on conditions on their parent.

POST my_blogs/_search
{
  "query": {
    "has_parent": {          # has_parent query
      "parent_type": "blog", # parent type; the query below is matched against blog documents
      "query" : {
          "match": {"title" : "Learning Hadoop"}
      }                      # matched against parents; the children of all matching parents are returned
    }
  }
}

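6. Fetching a child document by its id

An ordinary GET may fail to find the child, because without routing the request goes to the shard derived from the child's own id:

GET my_blogs/_doc/comment3

The routing parameter must be supplied with the id of the parent document: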

GET my_blogs/_doc/comment3?routing=blog2

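7. Updating a child document

Updating a child document does not affect its parent.

Example:

# The child document id goes in the URI; the routing parameter carries the parent document id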
PUT my_blogs/_doc/comment3?routing=blog2
{
    "comment": "Hello Hadoop??",
    "blog_comments_relation": {
      "name": "comment",
      "parent": "blog2"
    }
}

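4. Dynamic Mapping in ES

Dynamic Mapping in ES means the following:

When a new document is written and the index does not exist, ES creates the index automatically.
With dynamic Mapping we do not have to define a Mapping; ES infers the field types from the document.
The inference is sometimes wrong or not what we expect, for example for geo-location data.

ES detects types automatically according to the following rules:

null: no field is added.
true or false: boolean.
Floating-point number: float.
Integer: long.
Object: object.
Array: determined by the first non-null value in the array.
String: date if it matches a date format, otherwise text with a keyword sub-field. (Detecting numbers inside strings is disabled by default.)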
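As a rough illustration of the default detection rules, here is a Python sketch. `infer_type` is a hypothetical helper, not ES code, and it simplifies several points: date detection is reduced to a small regular expression, numeric detection in strings is left off (as it is by default in ES), and for an array of objects only the first element is inspected.

```python
import re

# Simplified date check; real date detection uses configurable date formats.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?$|^\d{4}/\d{2}/\d{2}$")

def infer_type(value):
    """Return the mapping a dynamically added field would roughly get, or None."""
    if value is None:
        return None                      # null: no field is added
    if isinstance(value, bool):          # bool must be checked before int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_PATTERN.match(value):
            return {"type": "date"}
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, list):
        for item in value:
            if item is not None:
                return infer_type(item)  # first non-null element decides
        return None
    if isinstance(value, dict):
        properties = {}
        for key, sub_value in value.items():
            sub_type = infer_type(sub_value)
            if sub_type is not None:
                properties[key] = sub_type
        return {"properties": properties}
    raise TypeError(f"unsupported JSON value: {value!r}")

doc = {"content": "I like Elasticsearch", "time": "2019-01-01T00:00:00",
       "user": {"userid": 1, "city": "Shanghai"}}
print(infer_type(doc))
```

Note that city comes out as text with a keyword sub-field, which is the kind of result one may want to override with an explicit Mapping.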

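5. Changing field types

Whether a field type can be changed depends on two cases:

For new fields:
- If mappings._doc.dynamic is true, the Mappings are updated automatically when a new field is written.
- If mappings._doc.dynamic is false, the Mappings are not updated when a new field is written; no inverted index is built for the new field, but its data still appears in _source.
- If mappings._doc.dynamic is strict, writing a document that contains a new field fails.
For existing fields:
- The field type can no longer be changed, because changing it would make the data that is already indexed unsearchable.
- To change a field type, the index has to be rebuilt with Reindex.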
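The three behaviours for new fields can be modelled with a few lines of Python. This is only a sketch under assumptions: `index_document` is a hypothetical function, the inferred type is replaced by a placeholder, and the error raised for strict is a plain `ValueError`, not the exception ES returns.

```python
def index_document(properties, doc, dynamic):
    """Return (mapping properties after the write, stored _source)."""
    updated = dict(properties)
    for field in doc:
        if field in updated:
            continue  # existing field: its type is never changed
        if dynamic == "strict":
            raise ValueError(f"new field [{field}] rejected: dynamic is strict")
        if dynamic is True:
            updated[field] = {"type": "inferred"}  # placeholder for the inferred type
        # dynamic is False: the field is neither mapped nor indexed, but stays in _source
    return updated, dict(doc)

mapping = {"title": {"type": "text"}}
doc = {"title": "Speed", "year": 1994}

print(index_document(mapping, doc, True)[0])   # year is added to the mapping
print(index_document(mapping, doc, False))     # mapping unchanged, year only in _source
try:
    index_document(mapping, doc, "strict")
except ValueError as err:
    print(err)                                 # the write is rejected
```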

dynamic can take three values. Its value can be changed with the following API:

PUT index_name/_mapping
{
  "dynamic": false  # one of true, false, or "strict"
}

The Mapping of an index can be retrieved with:

GET index_name/_mapping

6. Custom Mapping

The syntax for defining a custom Mapping is:

PUT index_name
{
  "mappings" : {
    # definitions go here
  }
}

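A tip for writing a custom Mapping:

Create a temporary index and write some test data into it.
Fetch the Mapping of that index, modify it, and use it to create the new index.
Delete the temporary index.

Many parameters can be set in Mappings; see the mapping parameters section of the official Elasticsearch documentation.

6.1. Mappings for an embedded object

Suppose we want to insert data with the following structure into ES:

PUT blog/_doc/1
{
  "content":"I like Elasticsearch",
  "time":"2019-01-01T00:00:00",
  "user": { # an object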
    "userid":1,
    "username":"Jack",
    "city":"Shanghai"
  }
}

Here the user field is an object.

The mappings for data with this structure should be defined like this:

PUT /blog
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "time": {
        "type": "date"
      },
      "user": {  # user contains its own properties
        "properties": {
          "city": {
            "type": "text"
          },
          "userid": {
            "type": "long"
          },
          "username": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

6.2. Mappings for an array of objects

Suppose we want to insert data with the following structure into ES:

POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[ # actors is an array whose elements are objects
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },
    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }
  ]
}

Here the actors field is an array whose elements are objects.

The mappings for data with this structure should be defined like this:

PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {          # the actors field
        "properties": {    # contains its own properties
          "first_name": {"type": "keyword"},
          "last_name": {"type": "keyword"}
        }
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

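7. Controlling whether a field is indexed

The index parameter of a field controls whether the field can be searched.

index takes two values, true or false; the default is true.

When index is false for a field, ES does not build an inverted index for it (saving space), and the field cannot be searched (searching it returns an error).

The syntax is:

PUT index_name
{
    "mappings" : {          # fixed syntax
      "properties" : {      # fixed syntax
        "firstName" : {     # field name
          "type" : "text"
        },
        "lastName" : {      # field name
          "type" : "text"
        },
        "mobile" : {        # field name
          "type" : "text",
          "index": false    # do not index this field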
        }
      }
    }
}

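8. Controlling what is stored in the inverted index

The index_options parameter controls what is recorded in each inverted index entry. It has four values:

docs: only the document id
freqs: the document id and the term frequency
positions: the document id, the term frequency, and the term position
offsets: the document id, the term frequency, the term position, and the character offsets

For Text fields the default value of index_options is positions; for other types it is docs.

Note: the default value of index_options may differ between ES versions, so check the documentation for the version you use.

The more an inverted index entry records, the more space it takes, and the more work ES has to do when analyzing the field.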
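The four levels can be illustrated with a toy inverted index in Python. This is a sketch, not how Lucene stores postings: `build_postings` is a hypothetical function and the tokenizer is a plain regular expression on lowercased text.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]

def build_postings(docs, index_options):
    """Build term -> {doc_id: recorded info}; higher levels record more per entry."""
    level = LEVELS.index(index_options)
    postings = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text.lower())):
            entry = postings.setdefault(match.group(), {}).setdefault(doc_id, {})
            if level >= 1:
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:
                entry.setdefault("positions", []).append(position)
            if level >= 3:
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings

docs = {1: "quick brown fox", 2: "quick quick dog"}

print(build_postings(docs, "docs")["quick"])
# {1: {}, 2: {}}
print(build_postings(docs, "offsets")["quick"][2])
# {'freq': 2, 'positions': [0, 1], 'offsets': [(0, 5), (6, 11)]}
```

Positions are what phrase queries rely on, and offsets are useful for highlighting, which is why those features need the higher levels.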

The syntax is:

PUT index_name
{
  "mappings": {                      # fixed syntax
    "properties": {                  # fixed syntax
      "text": {                      # field name
        "type": "text",              # data type of the field
        "index_options": "offsets"   # value of index_options
      }
    }
  }
}

9. Making null values searchable

By default, null and the empty array [] cannot be searched. Consider these two documents:

PUT my_index/_doc/1
{
  "status_code": null
}

PUT my_index/_doc/2
{
  "status_code": [] 
}

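To make an explicit null searchable, set the null_value parameter as shown below. Note that this only affects document 1: the empty array in document 2 contains no explicit null, so it is not replaced by null_value.

PUT my_index
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "keyword",    # null_value must have the same type as the field; text fields do not support it
        "null_value": "NULL"  # an explicit null is indexed as NULL and can be searched as NULL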
      }
    }
  }
}

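Note that null_value must be of the same data type as the field, and that text fields do not support it. With null_value set to NULL, documents containing an explicit null can be found by searching for NULL:

GET my_index/_search?q=status_code:NULL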
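What null_value does at index time can be sketched in Python. `indexed_terms` is a hypothetical helper that models only the replacement step: an explicit null is replaced by the configured value, while an empty array contains no explicit null and therefore contributes no term.

```python
def indexed_terms(value, null_value=None):
    """Return the terms that would be indexed for a field value."""
    values = value if isinstance(value, list) else [value]
    terms = []
    for item in values:
        if item is None:
            if null_value is not None:
                terms.append(null_value)  # explicit null replaced by null_value
        else:
            terms.append(item)
    return terms

print(indexed_terms(None))                       # []       no null_value: nothing to search
print(indexed_terms(None, null_value="NULL"))    # ['NULL'] document 1 becomes searchable
print(indexed_terms([], null_value="NULL"))      # []       empty array: still nothing indexed
print(indexed_terms(["200", None], null_value="NULL"))  # ['200', 'NULL']
```

The replacement only affects what is indexed; the _source of the document still contains the original null.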

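10. Index templates

An Index Template defines rules by which the Mappings and Settings of new indices are generated automatically.

Index templates have the following properties:

A template only takes effect when an index is created; changing a template does not affect indices that already exist.
Multiple index templates can be defined, and their settings are merged.
The order value controls the merge process.

When several templates apply, they are merged as follows at index creation:

Start with the default mappings and settings of ES.
Apply the templates with lower order values.
Apply the templates with higher order values; they override the ones with lower values.
Apply the mappings and settings given by the user in the create request; these have the highest priority and override everything before them.

When the same setting appears more than once, the later value overrides the earlier one; settings that appear only once are all carried over into the result.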
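The merge order can be sketched in Python. This is an illustration under assumptions, not the ES implementation: `resolve` and `deep_merge` are hypothetical helpers, index patterns are matched with `fnmatchcase`, and the `defaults` value is made up. The two templates mirror the multi-template example in this section.

```python
import copy
from fnmatch import fnmatchcase

def deep_merge(base, override):
    """Merge two dicts; on conflicts the value from `override` wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

def resolve(index_name, templates, request_body, defaults):
    """Defaults, then matching templates by ascending order, then the user's request."""
    result = copy.deepcopy(defaults)
    matching = [t for t in templates
                if any(fnmatchcase(index_name, p) for p in t["index_patterns"])]
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        body = {k: template[k] for k in ("settings", "mappings") if k in template}
        result = deep_merge(result, body)
    return deep_merge(result, request_body)

templates = [
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1}, "mappings": {"_source": {"enabled": False}}},
    {"index_patterns": ["te*"], "order": 1,
     "settings": {"number_of_shards": 1}, "mappings": {"_source": {"enabled": True}}},
]
defaults = {"settings": {"number_of_replicas": 1}}  # made-up default for illustration

print(resolve("test", templates, {}, defaults))   # _source enabled: template_2 wins
print(resolve("logs", templates, {}, defaults))   # _source disabled: only template_1 matches
print(resolve("test", templates, {"settings": {"number_of_shards": 3}}, defaults))
```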

Index template example:

PUT _template/template_1  # template_1 is the user-defined template name
{
  "index_patterns": ["te*", "bar*"], # index name patterns this template applies to
  "settings": {                      # settings
    "number_of_shards": 1
  },
  "mappings": {                      # mappings
    "_source": {
      "enabled": false
    },
    "properties": {
      "host_name": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "EEE MMM dd HH:mm:ss Z yyyy"
      }
    }
  }
}

Multiple index templates:

PUT /_template/template_1
{
    "index_patterns" : ["*"],
    "order" : 0,
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_source" : { "enabled" : false }
    }
}

PUT /_template/template_2
{
    "index_patterns" : ["te*"],
    "order" : 1,
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_source" : { "enabled" : true }
    }
}

11. Dynamic templates

A Dynamic Template defines rules for the data types that dynamically added fields receive in a given index.

(End of this section.)
