Preface

This article is based on Elasticsearch 7.3.0.

Overview

edge_ngram and ngram are both built into Elasticsearch, and each is available as a tokenizer and as a token filter.

Example

Steps

Define two custom analyzers, edge_ngram_analyzer and ngram_analyzer
Run an analysis test against each of them

Create the test index

PUT analyzer_test
{
  "settings": {
    "refresh_interval": "1s",
    "index": {
      "max_ngram_diff": 10
    },
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "type": "custom",
          "char_filter": [],
          "tokenizer": "keyword",
          "filter": [
            "edge_ngram_filter"
          ]
        },
        "ngram_analyzer": {
          "type": "custom",
          "char_filter": [],
          "tokenizer": "keyword",
          "filter": [
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "edge_ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 11
        },
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 5
        }
      }
    }
  }
}
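
The index above sets index.max_ngram_diff to 10. In Elasticsearch 7.x this setting caps max_gram minus min_gram for the ngram tokenizer and token filter, and its default is understood to be 1, so ngram_filter (2 to 5, a difference of 3) would be rejected without it. The Python sketch below only illustrates that arithmetic; the function name ngram_diff_ok and the default value of 1 are assumptions for illustration, not Elasticsearch code.

```python
def ngram_diff_ok(min_gram: int, max_gram: int, max_ngram_diff: int = 1) -> bool:
    """Return True when the gram range fits within the allowed difference.

    max_ngram_diff=1 mirrors the assumed Elasticsearch 7.x default.
    """
    return max_gram - min_gram <= max_ngram_diff


# ngram_filter from the index settings: min_gram=2, max_gram=5
print(ngram_diff_ok(2, 5))                     # checked against the assumed default of 1
print(ngram_diff_ok(2, 5, max_ngram_diff=10))  # checked against the value set in this index
```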

Test the edge_ngram_analyzer analyzer

POST /analyzer_test/_analyze
{
  "text": "虹橋機場",
  "analyzer": "edge_ngram_analyzer"
}

{
  "tokens" : [
    {
      "token" : "虹",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "虹橋",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "虹橋機",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "虹橋機場",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    }
  ]
}
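
The response above can be reproduced with a few lines of Python. This is a minimal sketch of the idea, not the Lucene implementation: it assumes the keyword tokenizer has already passed the whole input through as a single token, and the function name edge_ngrams is made up for illustration.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 11) -> List[str]:
    """Return the prefixes of token whose lengths run from min_gram to max_gram.

    Defaults match edge_ngram_filter in the test index.
    """
    longest = min(max_gram, len(token))  # a gram cannot be longer than the token
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("虹橋機場"))
# ['虹', '虹橋', '虹橋機', '虹橋機場']
```

Every gram starts at the first character, which is why all four tokens share the same start offset and position in the response.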

Test the ngram_analyzer analyzer

POST /analyzer_test/_analyze
{
  "text": "虹橋機場",
  "analyzer": "ngram_analyzer"
}

{
  "tokens" : [
    {
      "token" : "虹橋",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "虹橋機",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "虹橋機場",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "橋機",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "橋機場",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "機場",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    }
  ]
}
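
The six tokens above, in the same order, fall out of a simple double loop over start positions and gram lengths. The following Python is a hedged sketch of that idea rather than the actual filter code; the function name ngrams is hypothetical, and the input is assumed to arrive as one token from the keyword tokenizer.

```python
from typing import List


def ngrams(token: str, min_gram: int = 2, max_gram: int = 5) -> List[str]:
    """Return every substring of token with length min_gram..max_gram.

    Grams are ordered by start position, then by length.
    Defaults match ngram_filter in the test index.
    """
    grams = []
    for start in range(len(token)):
        for n in range(min_gram, max_gram + 1):
            if start + n > len(token):
                break  # longer grams from this start would run past the end
            grams.append(token[start:start + n])
    return grams


print(ngrams("虹橋機場"))
# ['虹橋', '虹橋機', '虹橋機場', '橋機', '橋機場', '機場']
```

No single-character tokens appear because min_gram is 2.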

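Differences

edge_ngram generates n-grams anchored at the first character only, with lengths running from min_gram to max_gram. It suits prefix matching, such as looking up order numbers, phone numbers, or postal codes.
ngram generates n-grams starting from every character position, with lengths running from min_gram to max_gram. It suits both prefix and infix (mid-string) search.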
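
The practical effect on search can be sketched in Python by treating the indexed terms as a set and checking whether a query string is one of them. This assumes the query is matched as a single unanalyzed term (for example via a term query or a keyword search analyzer); the helper names edge_grams and all_grams are hypothetical, and the gram sizes are taken from the test index.

```python
def edge_grams(text, min_gram, max_gram):
    """Prefixes of text with lengths min_gram..max_gram."""
    longest = min(max_gram, len(text))
    return {text[:n] for n in range(min_gram, longest + 1)}


def all_grams(text, min_gram, max_gram):
    """Substrings of text with lengths min_gram..max_gram, from every start."""
    return {
        text[i:i + n]
        for i in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(text)
    }


indexed = "虹橋機場"
edge_terms = edge_grams(indexed, 1, 11)  # sizes from edge_ngram_filter
ngram_terms = all_grams(indexed, 2, 5)   # sizes from ngram_filter

# Assumption: each query is looked up as one exact term, with no query-time analysis.
for query in ("虹橋", "機場", "虹"):
    print(query, query in edge_terms, query in ngram_terms)
# 虹橋 True True
# 機場 False True
# 虹 True False
```

The prefix 虹橋 is found either way, the mid-string 機場 only with ngram, and the single character 虹 only with edge_ngram because ngram_filter starts at two characters.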
