Elasticsearch Pitfall Notes

Keeping records like knots on a rope: note things down, reflect, and grow~

Environment: Elasticsearch 7.x

1. A term query on an analyzed text field, used as a fuzzy search, returns no results.

For example, I wanted to find the document 我是中國人 ("I am Chinese") by searching for 中國 ("China"), but got no hits.

"query": {
	"term": {
		"title": "中國"
	}
}

Cause: no analyzer was specified when the mapping was created. A text field is analyzed before it is stored in ES and the inverted index is built from the resulting tokens, but if the field is given nothing beyond type: text, the default (standard) analyzer splits the value into 我、是、中、國、人, one token per character (compare the output in part 2). A term query does not analyze its input, so the whole string 中國 can never equal any of those single-character tokens, and nothing matches. The fix is to specify a proper Chinese analyzer.
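
Incidentally, the choice of query matters too: a match query analyzes its input with the field's search analyzer, so even against the default single-character index it would have found the document (中國 is split into 中 and 國, and both tokens match). A minimal sketch:

"query": {
	"match": {
		"title": "中國"
	}
}

term compares against raw tokens; match is the analyzed, full-text counterpart.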

# 1 Create the mapping. Specify an analyzer for "title"!!!
#   Otherwise the text is split into individual characters.
PUT diary
{
  "settings": {
    "number_of_shards": "4",
    "number_of_replicas": "1"
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

# 2 Index a document
POST diary/_doc/111
{
  "title": "我是中國人"
}
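# (Side note: search in ES is near-real-time; if the query below
#  comes up empty immediately after indexing, force a refresh first)
POST diary/_refresh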
# 3 term query
POST diary/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "title": "中國"
        }
      }
    }
  }
}
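
Related: when an exact, unanalyzed match is what you actually want, the usual pattern is a keyword sub-field rather than a term query against the analyzed text. A minimal sketch (the diary2 index and the raw sub-field name are only for illustration):

PUT diary2
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}

# term on the keyword sub-field matches the whole original string
POST diary2/_search
{
  "query": {
    "term": {
      "title.raw": "我是中國人"
    }
  }
}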

2. How to inspect analysis results

# 2.1 Without an analyzer specified
POST diary/_analyze
{
  "text": "我是中國人"
}
# Analysis output
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "中",
      "start_offset": 2,
      "end_offset": 3,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "國",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    }
  ]
}
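
When neither an analyzer nor a field is named, the index falls back to the standard analyzer, which is what produced the single-character tokens above. The same thing can be reproduced without any index; a minimal sketch:

POST _analyze
{
  "analyzer": "standard",
  "text": "我是中國人"
}
# returns the same five single-character <IDEOGRAPHIC> tokens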

# 2.2 With the analyzer specified
POST diary/_analyze
{
  "text": "我是中國人",
  "analyzer": "ik_max_word"
}
# Output
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中國人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "中國",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "國人",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    }
  ]
}
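
The _analyze API can also be pointed at a mapped field, which tests whatever analyzer that field actually ends up with; a minimal sketch:

POST diary/_analyze
{
  "field": "title",
  "text": "我是中國人"
}
# uses the analyzer configured for title (ik_max_word here)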

3. Difference between ik_smart and ik_max_word

ik_smart: coarse-grained segmentation; when one term contains another, only the longest term is kept;

POST diary/_analyze
{
  "text": "我是中國人",
  "analyzer": "ik_smart"
}
# Output
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中國人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    }
  ]
}

ik_max_word: fine-grained segmentation; every term is emitted, regardless of whether one contains another (compare the output in 2.2 above).
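
A common pairing that follows from this difference: segment finely at index time so the inverted index is rich, and coarsely at search time so queries stay precise. A minimal sketch (diary3 is only an illustrative index name):

PUT diary3
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}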
