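Elasticsearch Full-Text Search Enterprise Development Notes (Part 3): Mapping Configuration

Understanding Mapping

What is a mapping

An ES mapping is much like a data type in a statically typed language: once a variable is declared as int, it can only ever hold int values. Likewise, a field mapped as a numeric type (long, double, and so on) can only store numeric data.

Compared with a language's data types, a mapping carries extra meaning. It tells ES not only what type of value a field holds, but also how to index that data and whether the data can be searched at all.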

When a query does not return the data you expect, the mapping is a likely culprit. When in doubt, check your mapping first.
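
One way to check a mapping is to flatten it into a list of field paths with their types and analyzers. The sketch below assumes you already have the `properties` part of a `GET <index>/_mapping` response as a Python dict; `iter_fields` and `sample_properties` are hypothetical names used only for illustration.

```python
from typing import Dict, Iterator, Tuple


def iter_fields(properties: Dict, prefix: str = "") -> Iterator[Tuple[str, str, str]]:
    """Yield (field path, type, analyzer) for every field and multi-field."""
    for name, spec in properties.items():
        path = prefix + name
        # Object fields carry no "type" key but do carry nested "properties".
        yield path, spec.get("type", "object"), spec.get("analyzer", "")
        # Multi-fields live under "fields"; nested objects under "properties".
        yield from iter_fields(spec.get("fields", {}), path + ".")
        yield from iter_fields(spec.get("properties", {}), path + ".")


# Hypothetical sample shaped like a fragment of a mapping response.
sample_properties = {
    "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {"pinyin": {"type": "text", "analyzer": "pinyin_analyzer"}},
    },
    "star": {"type": "long"},
}

for path, field_type, analyzer in iter_fields(sample_properties):
    print(f"{path:<12} {field_type:<6} {analyzer or '-'}")
```

A field that is missing from this list, or that shows an unexpected type or analyzer, is usually the first thing to look at.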
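
Anatomy of a mapping

A mapping refers to one or more analyzers, and an analyzer is in turn built from a chain of filters (strictly speaking: optional character filters, exactly one tokenizer, and optional token filters). When ES indexes a document, it hands the content of each field to the corresponding analyzer, which passes it through its filters.

What a filter does is easy to grasp: a filter is a function that transforms data. It takes a string and returns another string. A function that lowercases a string is a good example of a filter.

An analyzer is an ordered list of filters. Analysis calls them one after another in that order, and ES stores and indexes the final result.

In short, a mapping is a set of instructions that turns input data into searchable index terms.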
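
The filter chain can be illustrated with a toy model. This is only a sketch of the idea, not how ES implements analysis: the whitespace split stands in for a tokenizer, and `lowercase`, `remove_stopwords` and `analyze` are hypothetical helpers.

```python
from typing import Callable, List

TokenFilter = Callable[[List[str]], List[str]]


def lowercase(tokens: List[str]) -> List[str]:
    """Filter: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Filter: drop a few (illustrative) stop words, case-sensitively."""
    stopwords = {"the", "a", "of"}
    return [t for t in tokens if t not in stopwords]


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    tokens = text.split()  # stand-in tokenizer: split on whitespace
    for apply_filter in filters:  # filters run strictly in the listed order
        tokens = apply_filter(tokens)
    return tokens


print(analyze("The Anatomy of a Mapping", [lowercase, remove_stopwords]))
# -> ['anatomy', 'mapping']
print(analyze("The Anatomy of a Mapping", [remove_stopwords, lowercase]))
# -> ['the', 'anatomy', 'mapping']
```

The two calls differ only in filter order, which is why the order of filters in an analyzer definition matters.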

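Installing and Configuring the IK + pinyin Analyzers

ES is a powerful full-text search tool, and for Chinese content, Chinese and English tokenization is close to a must-have. The installation steps are outlined briefly below (detailed walkthroughs abound online; this one follows the author nextbang):

Download the Chinese and pinyin analyzers
IK Chinese analyzer: https://github.com/medcl/elasticsearch-analysis-ik
Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
Install
On the releases page, find the zip that matches your ES version, or take the source and build it yourself with mvn package; you can also download the latest master code.
Go to the plugins directory under the elasticsearch installation directory; mkdir pinyin; cd pinyin;
cp the zip you just built into the pinyin directory; unzip it. Repeat the same steps for IK in its own directory.
After deploying, remember to restart the ES node.
Configure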

Settings configuration

number_of_shards is fixed when an index is created, and analysis settings cannot be changed on an open index, so these settings are supplied at index creation:

PUT my_index
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "ik_max_word"
          },
          "pinyin_analyzer": {
            "tokenizer": "my_pinyin"
          }
        },
        "tokenizer": {
          "my_pinyin": {
            "type": "pinyin",
            "keep_separate_first_letter": "false",
            "lowercase": "true",
            "limit_first_letter_length": "16",
            "keep_original": "true",
            "keep_full_pinyin": "true"
          }
        }
      }
    }
  }
}
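
A typo in a tokenizer name only surfaces when ES rejects the request, so it can help to sanity-check the settings locally first. The sketch below holds the same settings as a Python dict and reports tokenizers that an analyzer references but that are neither defined under `tokenizer` nor on a list of plugin-provided names. `undefined_tokenizers` is a hypothetical helper, and the plugin-provided list is an assumption you supply yourself.

```python
from typing import Dict, Iterable, List

# The index settings from above, as a Python dict.
index_settings: Dict = {
    "index": {
        "number_of_shards": "3",
        "number_of_replicas": "1",
        "analysis": {
            "analyzer": {
                "default": {"tokenizer": "ik_max_word"},
                "pinyin_analyzer": {"tokenizer": "my_pinyin"},
            },
            "tokenizer": {
                "my_pinyin": {
                    "type": "pinyin",
                    "keep_separate_first_letter": "false",
                    "lowercase": "true",
                    "limit_first_letter_length": "16",
                    "keep_original": "true",
                    "keep_full_pinyin": "true",
                }
            },
        },
    }
}


def undefined_tokenizers(settings: Dict, plugin_provided: Iterable[str]) -> List[str]:
    """Return tokenizer names used by analyzers but not defined anywhere."""
    analysis = settings["index"]["analysis"]
    custom = set(analysis.get("tokenizer", {}))
    used = {spec["tokenizer"] for spec in analysis["analyzer"].values()}
    return sorted(used - custom - set(plugin_provided))


# Assumption: the IK plugin registers tokenizers named ik_max_word and ik_smart.
print(undefined_tokenizers(index_settings, {"ik_max_word", "ik_smart"}))  # -> []
```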

Mapping configuration

The type name in the request body must match the type in the URL. Note that _all and include_in_all apply to ES 5.x; both were deprecated in 6.0.

PUT my_index/index_type/_mapping
{
  "index_type": {
    "_all": {
      "analyzer": "ik_max_word"
    },
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "include_in_all": true,
        "fields": {
          "pinyin": {
            "type": "text",
            "term_vector": "with_positions_offsets",
            "analyzer": "pinyin_analyzer",
            "boost": 10.0
          }
        }
      }
    }
  }
}

Testing

Use _analyze to check that the analyzer works:

GET my_index/_analyze
{
    "text": ["劉德華"],
    "analyzer": "pinyin_analyzer"
}
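
The same check can be scripted. The sketch below builds the `_analyze` request with the Python standard library and extracts the token strings from a parsed response. The base URL `http://localhost:9200` is an assumption (a local, unsecured node), and `build_analyze_request` and `tokens_from_response` are hypothetical helpers.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node without authentication


def build_analyze_request(index: str, analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST <index>/_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": [text]}, ensure_ascii=False)
    return urllib.request.Request(
        f"{ES_URL}/{index}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(payload: dict) -> list:
    """Extract the token strings from a parsed _analyze response body."""
    return [entry["token"] for entry in payload.get("tokens", [])]


request = build_analyze_request("my_index", "pinyin_analyzer", "劉德華")

# To run it against a live cluster:
# with urllib.request.urlopen(request) as response:
#     print(tokens_from_response(json.load(response)))
```

An empty token list, or an error naming the analyzer, usually means the plugin is not installed on the node or the index was created without the analysis settings.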

Index a document containing Chinese text:

POST my_index/index_type
{
    "name": "劉德華"
}
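
Once a document is indexed, a search can target both the `name` field and its `name.pinyin` sub-field. The sketch below only builds the request body for `POST my_index/_search`. `pinyin_search_body` is a hypothetical helper, and whether a given input such as `liudehua` actually matches depends on the pinyin tokenizer options configured earlier.

```python
import json


def pinyin_search_body(user_input: str, size: int = 10) -> dict:
    """Search body that matches Chinese text or its pinyin form."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": user_input,
                # name uses ik_max_word; name.pinyin uses pinyin_analyzer
                "fields": ["name", "name.pinyin"],
            }
        },
    }


print(json.dumps(pinyin_search_body("liudehua"), ensure_ascii=False, indent=2))
```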

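Mapping Design

Design the mapping of each data table around what the full-text search business needs. These are the principles used in this project:

For every field that appears on a business display page, confirm its type, whether it needs aggregation, whether it should be indexed, and whether it needs tokenization.
Treat each display unit as one mapping and map the main table to its associated tables one-to-many, so that aggregation queries return aggregated results per display unit.

Mapping design for the hotel table: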

{
  "hotel": {
    "mappings": {
      "data": {
        "properties": {
          "address": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "areaCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "areaName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "autotrophy": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "cityCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "cityName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "coordinate": {
            "type": "geo_point"
          },
          "createTime": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
          },
          "deleted": {
            "type": "boolean"
          },
          "dictGuestQualificationName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "dictTypeByPositionCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "dictTypeByPositionName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "dictTypeByServiceCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "dictTypeByServiceName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "featurePicPath": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "forumStatus": {
            "type": "long"
          },
          "geohash": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "grade": {
            "type": "double"
          },
          "hotelExtendPicPath1": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "hotelExtendPicPath2": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "hotelExtendPicPath3": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "id": {
            "type": "long"
          },
          "initGrade": {
            "type": "double"
          },
          "level": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "lobbyPicPath": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "phone": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "provinceCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "provinceName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "reason": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "roomPicPath": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "scale": {
            "type": "long"
          },
          "score": {
            "type": "double"
          },
          "serviceScope": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "soundphone": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "star": {
            "type": "long"
          },
          "statusCode": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "statusName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "streetName": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              },
              "pinyin": {
                "type": "text",
                "analyzer": "pinyin"
              }
            },
            "analyzer": "ik_max_word"
          },
          "updateTime": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
          },
          "vipEnable": {
            "type": "long"
          }
        }
      }
    }
  }
}
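
The hotel mapping repeats one pattern: every string field is text with a keyword sub-field, and Chinese display fields additionally get the ik_max_word analyzer and a pinyin sub-field. As a sketch of how the design principles could be turned into code, the per-field decisions can be kept in a small table and the mapping generated from it. `string_field` and `requirements` are hypothetical names, not part of the project.

```python
from typing import Dict


def string_field(chinese_text: bool = False, pinyin: bool = False) -> Dict:
    """Build a text field with a keyword sub-field, as in the hotel mapping."""
    field: Dict = {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
    if pinyin:
        # Extra sub-field so the value can also be found by its pinyin form.
        field["fields"]["pinyin"] = {"type": "text", "analyzer": "pinyin"}
    if chinese_text:
        # Tokenize Chinese text with the IK analyzer at index and search time.
        field["analyzer"] = "ik_max_word"
    return field


# Hypothetical requirements table: field name -> (chinese_text, pinyin)
requirements = {
    "name": (True, True),
    "cityName": (True, True),
    "cityCode": (False, False),
}

properties = {name: string_field(*flags) for name, flags in requirements.items()}
# Non-string fields are declared directly.
properties["star"] = {"type": "long"}
properties["coordinate"] = {"type": "geo_point"}
```

Generating the mapping this way keeps the decisions (search, aggregate, pinyin) in one place instead of scattered across hundreds of lines of JSON.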

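Mapping Breakdown

The mapping above uses the following parameters:

type: the data type of the field
fields: indexes one field in several ways; for example, a string field can be mapped as text for full-text search and also as keyword for sorting or aggregation
analyzer: the analyzer to use for the field
ignore_above: strings longer than this many characters are skipped and not indexed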
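
The parameters above determine which field path a query should use. The sketch below builds a search body for the hotel index that does full-text matching on `name`, aggregates on `cityName.keyword`, and sorts on the numeric `grade` field. It also includes a toy approximation of the ignore_above rule. `hotel_search_body` and `keyword_is_indexed` are hypothetical helpers.

```python
def hotel_search_body(search_text: str) -> dict:
    """Full-text search on name, faceted by city, best grade first."""
    return {
        "query": {"match": {"name": search_text}},  # analyzed text field
        "aggs": {
            # Aggregations need the un-analyzed keyword sub-field.
            "by_city": {"terms": {"field": "cityName.keyword"}}
        },
        "sort": [{"grade": {"order": "desc"}}],  # numeric field sorts directly
    }


def keyword_is_indexed(value: str, ignore_above: int = 256) -> bool:
    """Approximate ignore_above: longer strings are skipped by the keyword field."""
    return len(value) <= ignore_above


body = hotel_search_body("酒店")
print(body["aggs"]["by_city"]["terms"]["field"])  # -> cityName.keyword
print(keyword_is_indexed("a" * 257))  # -> False
```

A value skipped because of ignore_above is not indexed in the keyword sub-field, so it will not appear in terms aggregations on that sub-field even though the text field remains searchable.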

…

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html#_multi_fields_2
