ES Index Field Type Parameters_7_4_2

Configurable field mapping parameters

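No.	Parameter	Description
1	analyzer	Analyzer used for the field (common built-in ones include standard, simple, whitespace and english)
2	boost	Factor applied to the field when computing the document relevance score
3	coerce	Whether ES coerces a value of a mismatched type into the field's type
4	copy_to	Copies the field value into other fields
5	doc_values	Whether the field is also stored in a column-oriented structure on disk
6	dynamic	Whether dynamic mapping is enabled
7	eager_global_ordinals	Whether global ordinals are built eagerly at refresh time
8	enabled	Whether the field is parsed and indexed
9	fielddata	In-memory fielddata setting for text fields
10	fields	Multi-fields
11	format	Date format
12	ignore_above	Length threshold above which a string is not indexed
13	ignore_malformed	Whether malformed values are ignored instead of rejecting the document
14	index_options	What information is stored in the inverted index
15	index_phrases	Indexes combinations of adjacent terms to speed up phrase queries
16	index_prefixes	Indexes term prefixes to speed up prefix queries
17	index	Whether the field is indexed
18	meta	Additional metadata attached to the field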
19	normalizer
20	norms
21	null_value
22	position_increment_gap
23	properties
24	search_analyzer
25	similarity
26	store
27	term_vector

analyzer parameter

Only text fields support the analyzer parameter.

// Custom analyzer
PUT custom_analyzer_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_folded":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":["lowercase","asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "analyzer_text":{
        "type": "text",
        "analyzer": "std_folded"
      }
    }
  }
}

GET custom_analyzer_index/_analyze
{
  "analyzer": "std_folded",
  "text": "Is this deja vu?"
}

GET custom_analyzer_index/_analyze
{
  "field": "analyzer_text",
  "text": "Is this deja vu?"
}
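
To make the effect of std_folded concrete, here is a minimal Python sketch that approximates it. It is not Elasticsearch code: the regular expression stands in for the standard tokenizer, and stripping combining marks only roughly imitates asciifolding for Latin text. The function name std_folded is reused purely for illustration.

```python
import re
import unicodedata


def std_folded(text):
    """Approximate the std_folded analyzer: word tokens, lowercased, accents folded."""
    folded = []
    for token in re.findall(r"\w+", text):       # rough stand-in for the standard tokenizer
        token = token.lower()                     # lowercase filter
        decomposed = unicodedata.normalize("NFKD", token)
        # Dropping combining marks roughly imitates asciifolding for Latin text.
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return folded


print(std_folded("Is this deja vu?"))             # ['is', 'this', 'deja', 'vu']
print(std_folded("Is this d\u00e9j\u00e0 vu?"))   # ['is', 'this', 'deja', 'vu']
```

Both calls yield the same tokens, which is why an accented and an unaccented spelling can match each other on a field analyzed this way.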

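The search_quote_analyzer setting specifies a dedicated analyzer for phrase queries, which is particularly useful for handling stop words.
To keep stop words in phrase queries while removing them elsewhere, the field needs three analyzer settings:
1) analyzer: used at index time for all terms, including stop words;
2) search_analyzer: used for non-phrase queries (removes stop words);
3) search_quote_analyzer: used for phrase queries (does not remove stop words).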

PUT /custom_analyzer_index_search
{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer_index_1":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":["lowercase"]
        },
        "analyzer_index_2":{
          "type":"custom",
          "tokenizer":"standard",
          "filter":["lowercase","english_stop"]
        }
      },
      "filter": {
        "english_stop":{
          "type":"stop",
          "stopwords":"_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "analyzer_index_1",
        "search_analyzer": "analyzer_index_2",
        "search_quote_analyzer": "analyzer_index_1"
      }
    }
  }
}

PUT custom_analyzer_index_search/_doc/1
{
  "title":"The Quick Brown Fox"
}

PUT custom_analyzer_index_search/_doc/2
{
  "title":"A Quick Brown Fox"
}

GET custom_analyzer_index_search/_search

GET custom_analyzer_index_search/_search
{
  "query": {
    "query_string": {
      "query": "\"the quick brown fox\""
    }
  }
}
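
The difference between the two search-time analyzers can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions: STOP_WORDS is a hand-picked subset rather than the full _english_ list, the helper names are hypothetical, and real phrase matching in ES also tracks term positions, which is ignored here.

```python
import re

# Assumption: a small subset of English stop words, enough for this example.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def analyze(text, remove_stop_words):
    """Lowercase word tokens; optionally drop stop words."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def phrase_matches(doc_tokens, query_tokens):
    """True when query_tokens occur in doc_tokens as a contiguous run."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))


docs = {1: "The Quick Brown Fox", 2: "A Quick Brown Fox"}

# analyzer (index time): stop words are kept
indexed = {doc_id: analyze(title, False) for doc_id, title in docs.items()}

# search_quote_analyzer (phrase query): stop words are kept, so "the" must match
phrase = analyze("the quick brown fox", False)
phrase_hits = [d for d, tokens in indexed.items() if phrase_matches(tokens, phrase)]

# search_analyzer (non-phrase query): stop words are removed before matching
terms = analyze("the quick brown fox", True)
term_hits = [d for d, tokens in indexed.items() if any(t in tokens for t in terms)]

print(phrase_hits)  # [1]
print(term_hits)    # [1, 2]
```

In this model the quoted query finds only document 1, while the same words without quotes find both documents.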

boost parameter

A boost factor set in the mapping is applied to the field automatically when the document relevance score is computed.

PUT param_boost_index
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "boost": 2
      },
      "content":{
        "type": "text"
      }
    }
  }
}

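When the relevance score is computed, a match on title counts twice as much as a match on content, which uses the default boost of 1.0. The boost applies only to term queries; prefix, range and fuzzy queries are not boosted.

Setting boost at index time is not recommended, for the following reasons:
1) An index-time boost value cannot be changed without reindexing;
2) Setting boost at query time achieves the same effect, with the difference that the value can be adjusted as needed;
3) An index-time boost is stored as part of the norm, which is only one byte. This reduces the resolution of the field length normalization factor and can lower the quality of the computed relevance score.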
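
As a sketch of the query-time alternative, the Python snippet below only builds the request body for a match query with a boost and prints it; nothing is sent to a cluster. The helper name boosted_match is hypothetical, and the field name and query text are placeholders.

```python
import json


def boosted_match(field, text, boost=1.0):
    """Build a match query body with a query-time boost (no request is sent)."""
    return {"query": {"match": {field: {"query": text, "boost": boost}}}}


body = boosted_match("title", "quick brown fox", boost=2)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a _search request, and the boost value can be changed per query without touching the mapping.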

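coerce parameter

Data sent to ES is not always clean. For example, a field may be expected to be numeric while the value arrives as a string. The coerce parameter controls whether ES converts such values to the field's type and accepts them.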

PUT param_coerce_index_1
{
  "mappings": {
    "properties": {
      "number_one":{
        "type": "integer"
      },
      "number_two":{
        "type": "integer",
        "coerce":false
      }
    }
  }
}
// Succeeds: "10" is coerced to 10
PUT param_coerce_index_1/_doc/1
{
  "number_one":"10"
}
// Fails: coercion is disabled for number_two
PUT param_coerce_index_1/_doc/2
{
  "number_two":"10"
}
The coerce setting can still be changed on an existing field through the mapping API:
PUT /param_coerce_index_1/_mapping
{
  "properties":{
    "number_two":{
      "type":"integer",
      "coerce":true
    }
  }
}

Index-level coerce setting

// The index-level setting index.mapping.coerce sets the default behaviour for all fields; a field-level coerce overrides it
PUT /param_coerce_index_2
{
  "settings": {
    "index.mapping.coerce":false
  }, 
  "mappings": {
    "properties": {
      "number_one":{
        "type": "integer",
        "coerce":true
      },
      "number_two":{
        "type": "integer"
      }
    }
  }
}
// Succeeds: number_one overrides the index-level setting
PUT param_coerce_index_2/_doc/1
{
  "number_one":"10"
}
// Fails: number_two inherits index.mapping.coerce = false
PUT param_coerce_index_2/_doc/2
{
  "number_two":"20"
}
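
The precedence between the field-level and index-level settings can be modelled in a few lines of Python. This is a simplified sketch, not ES behaviour in full: it only covers integers and strings, and the function names effective_coerce and index_integer are hypothetical.

```python
def effective_coerce(field_setting=None, index_setting=None):
    """Field-level coerce wins over index.mapping.coerce; the default is True."""
    if field_setting is not None:
        return field_setting
    if index_setting is not None:
        return index_setting
    return True


def index_integer(value, coerce=True):
    """Return the integer to index, or raise ValueError when the value is rejected."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value                      # already the right type
    if isinstance(value, str):
        if not coerce:
            raise ValueError(f"coerce is disabled, rejecting string {value!r}")
        return int(value)                 # "10" -> 10
    raise ValueError(f"unsupported value {value!r}")


# Index-level coerce is false; number_one overrides it with coerce = true.
print(index_integer("10", effective_coerce(field_setting=True, index_setting=False)))  # 10

# number_two has no field-level setting, so it inherits false and is rejected.
try:
    index_integer("20", effective_coerce(index_setting=False))
except ValueError as err:
    print("rejected:", err)
```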

copy_to parameter

The copy_to parameter copies the values of several fields into one combined field, which can then be queried as a single field.

PUT param_copy_to_index
{
  "mappings": {
    "properties": {
      "first_name":{
        "type": "text",
        "copy_to": "full_name"
      },
      "last_name":{
        "type": "text",
        "copy_to": "full_name"
      },
      "full_name":{
        "type": "text"
      }
    }
  }
}

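// Note the order of first_name and last_name here. A match query with operator "and" only requires every term to be present, so the order matters only for order-sensitive queries such as phrase queries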
PUT param_copy_to_index/_doc/1
{
  "last_name":"Smith",
  "first_name":"John"
}

// Query the combined field
GET param_copy_to_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "Smith John",
        "operator": "and"
      }
    }
  }
}

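Some points to be clear about:
1) copy_to copies the field value, not the terms produced by analysis;
2) The original _source field is not modified to show the copied values;
3) The same value can be copied to several fields, as in "copy_to": ["field1","field2"];
4) Values cannot be copied recursively.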
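
The first two points can be illustrated with a small Python model. It is only a sketch: analyze is a rough stand-in for the standard analyzer, and index_document is a hypothetical helper, not an ES API.

```python
import re


def analyze(value):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return [t.lower() for t in re.findall(r"\w+", value)]


def index_document(source, copy_to):
    """Return (stored _source, terms per field) for one document."""
    terms = {}
    for field, value in source.items():
        terms.setdefault(field, []).extend(analyze(value))
        for target in copy_to.get(field, []):
            # The raw value is copied; the target field analyzes it itself.
            terms.setdefault(target, []).extend(analyze(value))
    return dict(source), terms


copy_to = {"first_name": ["full_name"], "last_name": ["full_name"]}
stored, terms = index_document({"last_name": "Smith", "first_name": "John"}, copy_to)

print(stored)               # {'last_name': 'Smith', 'first_name': 'John'}
print(terms["full_name"])   # ['smith', 'john']

# match query with operator "and": every query term must be present, in any order
print(all(t in terms["full_name"] for t in analyze("John Smith")))   # True
```

The stored source never gains a full_name key, yet full_name is searchable because it received its own terms.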

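doc_values parameter

By default most fields are indexed, which makes them searchable. The inverted index lets a query look up a term in a sorted list of terms and get back the documents that contain it.
Sorting, aggregations and access to field values in scripts need a different data access pattern: instead of looking up a term and finding documents, they look up a document and find the terms it holds in a field.
Doc values are an on-disk data structure built at index time. They store the same values as _source, but in a column-oriented layout, which makes sorting and aggregations efficient. doc_values is supported on all field types except text and annotated_text.

// If a field will never be used for sorting or aggregations, doc_values can be set to false; here it is disabled for session_id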
PUT param_doc_value_index
{
  "mappings": {
    "properties": {
      "status_code":{
        "type": "keyword"
      },
      "session_id":{
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}
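
The two access patterns can be pictured with plain Python dictionaries. This is a conceptual sketch only; the sample documents are made up, and the real structures are compressed on-disk formats rather than dictionaries.

```python
from collections import Counter, defaultdict

# Made-up sample documents for illustration.
docs = {
    1: {"status_code": "200"},
    2: {"status_code": "404"},
    3: {"status_code": "200"},
}

# Inverted index: term -> doc IDs (answers "which documents contain 200?")
inverted = defaultdict(list)
for doc_id, doc in docs.items():
    inverted[doc["status_code"]].append(doc_id)

# Doc values: doc ID -> value, one column per field (answers "what does doc 3 hold?")
doc_values = {doc_id: doc["status_code"] for doc_id, doc in docs.items()}

print(inverted["200"])      # [1, 3]

# A terms aggregation walks the matching doc IDs and reads the column.
matching = [1, 2, 3]
counts = Counter(doc_values[d] for d in matching)
print(counts.most_common())   # [('200', 2), ('404', 1)]
```

Search goes from term to documents through the first structure; the aggregation goes from documents to values through the second, which is the lookup that doc_values: false gives up.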

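dynamic parameter

Accepted values for dynamic

Value	Description
true	New fields are detected automatically and added to the mapping
false	New fields are ignored. They are not indexed and therefore not searchable, but they still appear in the _source field. They are not added to the mapping.
strict	An exception is thrown when a new field is detected, and the document is not indexed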
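
The three modes can be summarised in a short Python model. It is a sketch under simplifying assumptions: field types are ignored, only top-level fields are considered, and index_with_dynamic is a hypothetical helper.

```python
def index_with_dynamic(mapped_fields, doc, dynamic="true"):
    """Return (mapped fields after indexing, fields that were indexed).

    The whole document is always kept in _source unless it is rejected.
    """
    mapped = set(mapped_fields)
    new_fields = [f for f in doc if f not in mapped]
    if new_fields and dynamic == "strict":
        raise ValueError(f"strict mapping, unknown fields: {new_fields}")
    if dynamic == "true":
        mapped.update(new_fields)        # new fields join the mapping
    indexed = [f for f in doc if f in mapped]
    return mapped, indexed


doc = {"user_id": "kimchy", "nickname": "k"}

print(index_with_dynamic({"user_id"}, doc, "true"))    # nickname is mapped and indexed
print(index_with_dynamic({"user_id"}, doc, "false"))   # nickname is ignored
try:
    index_with_dynamic({"user_id"}, doc, "strict")
except ValueError as err:
    print("rejected:", err)
```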

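enabled parameter

By default ES tries to index every field, but sometimes you only want to store the data without indexing it.
The enabled parameter can be set on the top-level mapping definition and on object fields. It makes ES skip parsing the field contents entirely. The JSON can still be retrieved from the _source field, but it is not searchable on its own and is not stored in any other form.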

PUT params_enabled_index
{
  "mappings": {
    "properties": {
      "user_id":{
        "type": "keyword"
      },
      "last_updated":{
        "type": "date"
      },
      "session_data":{
        "type": "object",
        "enabled":false
      }
    }
  }
}

PUT params_enabled_index/_doc/session_1
{
  "user_id":"kimchy",
  "session_data":{
    "arbitrary_object":{
      "some_array":["foo","bar",{"clazz":2}]
    }
  },
  "last_updated":"2020-06-01T10:00:00"
}

GET params_enabled_index/_mapping

GET params_enabled_index/_doc/session_1

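The enabled setting on a field or on the top-level mapping cannot be changed afterwards. When enabled is set to false, ES no longer parses the field contents, so non-object data can be added to a field declared as object.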

PUT params_enabled_all_index
{
  "mappings": {
    "enabled":false
  }
}
// Updating the field mapping has no effect, because the mapping as a whole is disabled
PUT params_enabled_all_index/_mapping
{
  "properties":{
    "username":{
      "type":"text",
      "fields":{
        "keyword":{
          "type":"keyword"
        }
      }
    }
  }
}

PUT params_enabled_all_index/_doc/session_1
{
  "user_id":"kimchy",
  "session_data":{
    "arbitrary_object":{
      "some_array":["foo","bar",{"clazz":2}]
    }
  },
  "last_updated":"2020-06-01T10:00:00"
}
// Add a document with a username field
PUT params_enabled_all_index/_doc/session_3
{
  "user_id":"kimchy",
  "session_data":{
    "arbitrary_object":{
      "some_array":["foo","bar",{"clazz":2}]
    }
  },
  "last_updated":"2020-06-01T10:00:00",
  "username":"bbbb"
}

GET params_enabled_all_index/_mapping
// The document and its full content can be fetched by ID
GET params_enabled_all_index/_doc/session_3
// The search returns no hits
GET params_enabled_all_index/_search
{
  "query": {
    "match": {
      "username.keyword": "bbbb"
    }
  }
}

// The field is declared as object, but because enabled is false a plain string can be inserted
PUT params_enabled_field_parse_ignore_index
{
  "mappings": {
    "properties": {
      "session_data":{
        "type": "object",
        "enabled":false
      }
    }
  }
}

PUT params_enabled_field_parse_ignore_index/_doc/1
{
  "session_data":"foo bar"
}
