Elasticsearch 5.5.1: Chinese and Pinyin Tokenization, Tested and Working

Original

leon_0204

2018-09-04 14:46

"Every blog post that doesn't state which Elastic version it uses is just messing with its readers." - some programmer

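The version is the one in the title. The full test procedure for combining pinyin and Chinese word segmentation is as follows: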

First, delete the index if it already exists

DELETE /index_name/

Create an index named index_name

PUT /index_name/
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_max_word",
                        "filter": ["my_pinyin", "word_delimiter"]
                    }
                },
                "filter": {
                    "my_pinyin": {
                        "type": "pinyin",
                        "first_letter": "prefix",
                        "padding_char": " "
                    }
                }
            }
        }
    }
}

Update the mapping of the type

PUT /index_name/app/_mapping
{
    "app": {
        "properties": {
            "ProductCName": {
                "type": "keyword",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "store": false,
                        "term_vector": "with_positions_offsets",
                        "analyzer": "ik_pinyin_analyzer",
                        "boost": 10
                    }
                }
            },
            "ProductEName": {
                "type": "text",
                "analyzer": "ik_max_word"
            },
            "Description": {
                "type": "text",
                "analyzer": "ik_max_word"
            }
        }
    }
}
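
To confirm that the mapping was accepted, it can be read back from the index. This is a small sketch that uses only the index and type names from the steps above; the response lists the fields as Elasticsearch stored them, including the pinyin sub-field under ProductCName.

```console
# Read back the mapping of type "app" in index_name
GET /index_name/_mapping/app
```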

Create test data

PUT /index_name/app/1
{
  "ProductCName":"口紅世家",
  "ProductEName":"Red History",
  "Description":"口紅真是很棒的東西呢"
}
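
With a document indexed, the pinyin sub-field can be queried. The following is a minimal sketch and was not part of the original test: the query string "kou hong" is an assumed example, and whether it matches depends on the tokens that ik_pinyin_analyzer actually emits for 口紅世家, which the _analyze API can show. The explicit refresh is only there so that the freshly indexed document is visible to the search right away.

```console
# Make the new document searchable immediately
POST /index_name/_refresh

# Assumption: "kou hong" is an illustrative pinyin query, not taken from the original post
GET /index_name/app/_search
{
  "query": {
    "match": {
      "ProductCName.pinyin": "kou hong"
    }
  }
}
```

Because ProductCName itself is mapped as keyword, exact-match queries go against ProductCName, while pinyin matching goes through ProductCName.pinyin.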

Test the pinyin tokenization

POST /index_name/_analyze?pretty
{
  "analyzer": "pinyin",
  "text":"王者榮耀"
}

Response

{
  "tokens": [
    {
      "token": "wang",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "wzry",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "zhe",
      "start_offset": 1,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "rong",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "yao",
      "start_offset": 3,
      "end_offset": 4,
      "type": "word",
      "position": 3
    }
  ]
}
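
The request above goes through the plugin's built-in pinyin analyzer. To look at the custom ik_pinyin_analyzer defined in the index settings, the same API can be pointed at it by name. This is a sketch: the sample text is reused from the test document, and no token list is shown here because it depends on the IK dictionary and the pinyin filter options in use.

```console
# Run the custom analyzer defined in index_name against a sample string
POST /index_name/_analyze?pretty
{
  "analyzer": "ik_pinyin_analyzer",
  "text": "口紅世家"
}
```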