What exactly is an Elasticsearch mapping?

Author: 編程界的小學生

I. An example

Let's use an example to show what a mapping actually is.

1. Preparing the data

PUT /blog/_doc/1
{
  "create_time": "2020-05-01",
  "title": "first article",
  "content": "xxxxxxx",
  "author_id": 123
}

PUT /blog/_doc/2
{
  "create_time": "2020-05-02",
  "title": "second article",
  "content": "xxxxxxxxxx",
  "author_id": 123
}

PUT /blog/_doc/3
{
  "create_time": "2020-05-03",
  "title": "third article",
  "content": "xxxxxxxxxxxxxxxx",
  "author_id": 123
}

2. Searching

GET /blog/_search?q=2020
Returns 0 hits.

GET /blog/_search?q=2020-05-01
Returns 1 hit: the document with _id=1.

GET /blog/_search?q=create_time:2020-05-01
Returns 1 hit: the document with _id=1.

GET /blog/_search?q=create_time:2020
Returns 0 hits.

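3. Analysis

Why do the searches behave this way? Ideally, shouldn't the first search return all three documents? The mapping is what's behind it. Every key has a data type (create_time, for example, is a date), and each data type is analyzed and searched differently in ES. All of this is recorded in the mapping. Read on to see what a mapping looks like, how to create one, and how to view it.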

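II. Mapping

1. What it is

A mapping is the type metadata for the fields of your ES data. When a document is indexed and one of its fields has no mapping yet, dynamic mapping assigns one automatically. The mapping records each field's type, how it is searched (exact match or full-text search), its analyzer, and so on.

2. How to view it

The syntax for viewing a mapping is simple:

GET /index/_mapping

For example:

GET /blog/_mapping

The response looks like this: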

{
  "blog" : {
    "mappings" : {
      "properties" : {
        "author_id" : {
          "type" : "long"
        },
        "content" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "create_time" : {
          "type" : "date"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

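What the fields in the response mean:

Key	Explanation
blog	The index name
properties	The fields of the documents in the index
type	The data type of the field's value
keyword	Very useful: a sub-field that ES generates for us automatically. Use field.keyword when you need an exact match
ignore_above	Strings longer than this limit are not indexed in the keyword sub-field. With ignore_above set to 256, a content value longer than 256 characters is skipped for content.keyword, although it is still analyzed as text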
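To make the keyword and ignore_above rows more concrete, here is a small Python sketch. It is a toy model, not Elasticsearch code: the function name keyword_term is made up for illustration, and it only mimics the rule that a keyword sub-field keeps the whole string as one term and skips strings longer than ignore_above.

```python
def keyword_term(value, ignore_above=256):
    """Return the single term a keyword sub-field would hold, or None if skipped.

    Toy model (hypothetical helper): the whole string is one term, and strings
    longer than ignore_above characters are not indexed at all.
    """
    return value if len(value) <= ignore_above else None


title = "first article"

print(keyword_term(title))               # the whole string is kept as one term
print(keyword_term("x" * 300))           # too long for ignore_above=256, so None
print(keyword_term(title) == "first")    # a partial value is not an exact match
print(keyword_term(title) == "first article")
```

The last two lines show why field.keyword is the exact-match variant: only the complete original value equals the stored term.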

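So why does GET /blog/_search?q=2020 return nothing? By default ES only analyzes (tokenizes) text fields; other types are not tokenized. The date 2020-05-01 is therefore indexed as one whole value and is never split into 2020, 05 and 01, so a search for 2020 alone has nothing to match.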
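The behaviour can be imitated with a few lines of Python. This is only a simplified sketch of the idea above, with made-up names (FIELD_TYPES, index_terms, search); in particular, real Elasticsearch stores dates as numeric values rather than strings, so treat it as an analogy for "text is split into terms, other types are kept whole".

```python
import re

# Toy model only: field types for the blog index used in this article.
FIELD_TYPES = {"title": "text", "create_time": "date"}


def index_terms(field, value):
    """Return the set of searchable terms produced for one field value."""
    if FIELD_TYPES[field] == "text":
        # text fields are analyzed: split on non-alphanumerics and lowercase
        return {token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)}
    # every other type is kept as one whole value
    return {value}


def search(docs, field, query):
    """Return the ids of the documents whose field contains query as a term."""
    return [doc_id for doc_id, doc in docs.items()
            if query in index_terms(field, doc[field])]


docs = {
    1: {"title": "first article", "create_time": "2020-05-01"},
    2: {"title": "second article", "create_time": "2020-05-02"},
    3: {"title": "third article", "create_time": "2020-05-03"},
}

print(search(docs, "create_time", "2020"))        # []
print(search(docs, "create_time", "2020-05-01"))  # [1]
print(search(docs, "title", "article"))           # [1, 2, 3]
```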

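3. Creating a mapping

3.1 Syntax

PUT /index
{
  "mappings": {
    "properties": {
      "field": {
        "mapping_parameter": "parameter_value"
      }
    }
  }
}

3.2 Demo

Make sure the index does not exist before creating it, that is, delete it first with DELETE /blog. A PUT /blog request with a mappings body only works while the index does not exist yet, and the type of a field cannot be changed once it has been mapped.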

PUT /blog
{
  "mappings": {
    "properties": {
      "author_id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "content": {
        "type": "text", 
        "analyzer": "standard"
      },
      "create_date": {
        "type": "date"
      }
    }
  }
}

Now view the mapping again with GET /blog/_mapping. The content field uses the standard analyzer and the title field uses the english analyzer.

{
  "blog" : {
    "mappings" : {
      "properties" : {
        "author_id" : {
          "type" : "long"
        },
        "content" : {
          "type" : "text",
          "analyzer" : "standard"
        },
        "create_date" : {
          "type" : "date"
        },
        "title" : {
          "type" : "text",
          "analyzer" : "english"
        }
      }
    }
  }
}

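3.3 Analyzer-related settings

Value	Explanation
no	The field cannot be found by searching
not_analyzed	The whole field is stored as a single term without further tokenizing; commonly used for phrases, idioms, email addresses and similar values
a specific analyzer	For example english or standard

Note that no and not_analyzed are values of the index setting of the old string type (ES versions before 5.0), not values of analyzer. In current versions the equivalents are "index": false and the keyword type.

Only text fields are analyzed by default, using the standard analyzer. All other data types are not analyzed.

3.4 Testing the mapping

We just set the english analyzer on title, so let's test it: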

GET /blog/_analyze
{
  "field": "title",
  "text": "Hello-WorlD"
}

Result:

{
  "tokens" : [
    {
      "token" : "hello",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "world",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

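The text was tokenized as expected. Better still, the tokens were lowercased and the - character was dropped automatically; the analyzer does all of this for us. Analyzers are not the focus of this article, so we'll leave it there.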
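For readers who want to see where the offsets and positions come from, here is a rough Python imitation of the tokenizing step. It is a sketch under simplifying assumptions, not the english analyzer itself: the hypothetical analyze function only splits on non-alphanumeric characters and lowercases, and leaves out the stemming and stop-word removal a real language analyzer performs.

```python
import re


def analyze(text):
    """Split text on non-alphanumeric characters and lowercase each token.

    Toy analyzer (hypothetical helper): no stemming, no stop words.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


for token in analyze("Hello-WorlD"):
    print(token)
```

For the input Hello-WorlD this yields hello at offsets 0 to 5 and world at offsets 6 to 11, the same tokens and offsets as in the _analyze response above.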

4. Modifying a mapping

Let's try to change author_id and create_date to a string type.

PUT /blog
{
  "mappings": {
    "properties": {
      "author_id": {
        "type": "text"
      },
      "create_date": {
        "type": "text"
      }
    }
  }
}

Result:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "resource_already_exists_exception",
        "reason" : "index [blog/cZEVhKIbS8GfHHmDS9rGYg] already exists",
        "index_uuid" : "cZEVhKIbS8GfHHmDS9rGYg",
        "index" : "blog"
      }
    ],
    "type" : "resource_already_exists_exception",
    "reason" : "index [blog/cZEVhKIbS8GfHHmDS9rGYg] already exists",
    "index_uuid" : "cZEVhKIbS8GfHHmDS9rGYg",
    "index" : "blog"
  },
  "status" : 400
}

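ES reports that the index already exists and refuses the request. Strictly speaking, this particular error appears because PUT /blog is the create-index request; the request for updating a mapping is PUT /blog/_mapping. That request can add new fields, but it also rejects an attempt to change the type of an existing field such as author_id. The reason is simple: once you have millions of documents, changing a field's type or analyzer would mean processing all of them again, so ES does not allow it. To change a field, create a new index with the mapping you want and copy the data over, for example with the _reindex API.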
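The rule can be pictured with a short Python sketch. Everything here is illustrative: merge_mapping and blog_properties are made-up names, the error text is not Elasticsearch's, and the model assumes two things, namely that an existing field may not change its type and that fields which do not exist yet may still be added to an existing index.

```python
def merge_mapping(existing, update):
    """Return existing properties plus new fields; reject a changed field type.

    Toy model (hypothetical helper) of updating the mapping of a live index.
    """
    merged = dict(existing)
    for field, definition in update.items():
        if field in existing and existing[field]["type"] != definition["type"]:
            raise ValueError(
                f"field [{field}] cannot change from "
                f"[{existing[field]['type']}] to [{definition['type']}]"
            )
        merged[field] = definition
    return merged


blog_properties = {
    "author_id": {"type": "long"},
    "create_date": {"type": "date"},
}

# Adding a field that does not exist yet is fine.
print(merge_mapping(blog_properties, {"tags": {"type": "keyword"}}))

# Changing the type of an existing field is refused.
try:
    merge_mapping(blog_properties, {"author_id": {"type": "text"}})
except ValueError as err:
    print(err)
```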

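5. Mapping parameters

Parameter	Explanation
analyzer	Specifies the analyzer
coerce	Whether type coercion is allowed. true: inserting "1" into a long field succeeds. false: inserting "1" into a long field fails with a type mismatch
doc_values	Speeds up sorting and aggregations; true by default. If you are sure you will never sort or aggregate on a field or read its value from a script, you can disable doc values to save disk space (not supported on text and annotated_text fields)
eager_global_ordinals	Used on fields that are aggregated on, to optimize aggregation performance
ignore_above	Strings longer than this limit are not indexed
fields	Creates multi-fields, so the same value can be indexed in different ways for different purposes (full-text search, or aggregations and sorting)
norms	Whether to store the length-normalization factors used for scoring; can be disabled as an optimization on fields used only for filtering or aggregations
search_analyzer	Sets a separate analyzer to use at search time
similarity	Sets the relevance scoring algorithm for the field. BM25 is the default; classic (TF/IDF) was the default in older versions and has been removed in recent ones

For more, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-params.html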

III. Customizing dynamic mapping

The key is the dynamic parameter. Here is the simplest possible example:

PUT /my_blog
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text"
      },
      "tags": {
        "type": "object",
        "dynamic": "true"
      }
    }
  }
}

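This creates an index whose documents may only contain the title and tags fields at the top level, because of dynamic: strict. Strict mode does not allow any other fields. It is like MySQL: once the table structure is fixed, inserting into a column that does not exist fails with an unknown-column error. The tags field, on the other hand, is configured with dynamic: true, so any extra fields are allowed inside it.

Test: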

PUT /my_blog/_doc/1
{
  "title" : "first article",
  "content" : "xxxxxxx",
  "tags" : {
    "language" : "java c++ python"
  }
}

Result:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [content] within [_doc] is not allowed"
      }
    ],
    "type" : "strict_dynamic_mapping_exception",
    "reason" : "mapping set to strict, dynamic introduction of [content] within [_doc] is not allowed"
  },
  "status" : 400
}

As expected, the request fails: in strict mode, extra fields cannot be added dynamically.

Testing tags:

PUT /my_blog/_doc/1
{
  "title" : "first article",
  "tags" : {
    "language" : "java c++ python",
    "stars": 10000
  }
}

Result:

{
  "_index" : "my_blog",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

It succeeds, because tags has "dynamic": "true", which allows extra fields to be added. This is what dynamic mapping means.
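The two tests can be replayed with a small Python model of the check. This is a sketch, not Elasticsearch's implementation: check_dynamic is a made-up function, the error text is simplified, and it assumes an object without its own dynamic setting inherits the setting of its parent.

```python
def check_dynamic(doc, mapping, inherited="true", path=""):
    """Raise ValueError if doc adds a field that a strict level does not declare.

    Toy model (hypothetical helper). A level without its own "dynamic"
    setting is assumed to inherit the setting of the level above it.
    """
    dynamic = mapping.get("dynamic", inherited)
    properties = mapping.get("properties", {})
    for field, value in doc.items():
        full_name = f"{path}{field}"
        if field not in properties:
            if dynamic == "strict":
                raise ValueError(
                    f"dynamic introduction of [{full_name}] is not allowed")
            continue  # dynamic is enabled at this level: the new field is accepted
        if isinstance(value, dict):
            check_dynamic(value, properties[field], dynamic, f"{full_name}.")


my_blog = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        "tags": {"type": "object", "dynamic": "true"},
    },
}

try:
    check_dynamic({"title": "first article", "content": "xxxxxxx"}, my_blog)
except ValueError as err:
    print(err)  # content is not declared at the strict top level

check_dynamic(
    {"title": "first article",
     "tags": {"language": "java c++ python", "stars": 10000}},
    my_blog,
)
print("accepted")  # new fields inside tags are allowed
```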

For more, see the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-mapping.html

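IV. Data types

4.1 List

Data type	Example
long	123456
double	123.111
boolean	true, false
date	2020-05-28
text/keyword	Strings (the old string type was split into these two)
integer/short/byte/float	Basic numeric types
binary	Binary data
range	Range types, such as integer_range, float_range, long_range, double_range and date_range
object	A single JSON object
nested	An array of JSON objects

Why is 123456 mapped to long rather than integer? Dynamic mapping infers the type from the parsed JSON value, and a JSON number carries no width: nothing says whether a whole number was meant as an integer or a long. Dynamic mapping therefore picks the wider type, long, so that larger values indexed later still fit. (A JSON floating-point number is mapped to float by default.)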
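A rough Python version of this type guessing is shown below. It is a simplified sketch, not the real detection logic: infer_type and DATE_PATTERN are made-up names, only dates written like 2020-05-28 are recognized, and arrays and null values are left out.

```python
import re

# Assumption: only ISO-style dates such as 2020-05-28 are detected here;
# real date detection supports more formats.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_type(value):
    """Guess a field type from a parsed JSON value (toy dynamic mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"            # JSON cannot say "integer", so the wide type wins
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "date" if DATE_PATTERN.match(value) else "text"
    raise TypeError(f"unsupported value: {value!r}")


doc = {"create_time": "2020-05-01", "title": "first article", "author_id": 123}
mapping = {field: infer_type(value) for field, value in doc.items()}
print(mapping)
```

For this document the guess is date for create_time, text for title and long for author_id, which is what the generated mapping of the blog index showed earlier.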

4.2 The object type

PUT /blog/_doc/4
{
  "tags" : {
    "language" : "java c++ python",
    "city" : "beijing",
    "stars": 10000
  },
  "create_time": "2020-05-03",
  "title": "third article",
  "content": "xxxxxxxxxxxxxxxx",
  "author_id": 123
}

tags is an object field containing three sub-fields: language, city and stars.
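Sub-fields of an object are addressed by their dotted path, such as tags.city. The Python sketch below shows that flattening idea with a made-up helper named flatten. It is only an illustration of how the nested JSON corresponds to flat field names, not Elasticsearch code.

```python
def flatten(doc, prefix=""):
    """Turn nested objects into a flat dict keyed by dotted field paths."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # descend into the object and prefix its keys with the parent path
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


doc = {
    "tags": {"language": "java c++ python", "city": "beijing", "stars": 10000},
    "title": "third article",
    "author_id": 123,
}

for field, value in flatten(doc).items():
    print(field, "=", value)
```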

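V. Summary

When documents are indexed without an explicit mapping, ES creates the mapping it considers appropriate
A mapping can be roughly thought of as a "table schema": it defines the data types and so on
Different data types are analyzed differently: text fields use the standard analyzer by default, and other types are not analyzed
You can create an index's mapping by hand in advance and set the data type, analyzer and so on for each field
The keyword sub-field is extremely handy and is generated for us automatically by ES: text fields are analyzed by default, but querying field.keyword gives an exact match
Learn from the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html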

