Installing the IK Analyzer for Elasticsearch

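I. Overview

1. The default es analyzer handles Chinese poorly: it splits the text into individual characters. The IK analyzer supports Chinese better and has two main modes: ik_smart and ik_max_word.
2. Environment
Operating system: CentOS
es version: 6.0.0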
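
To make item 1 concrete, the sketch below imitates only the per-character splitting described there, in plain Java. It is not Elasticsearch code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PerCharacterSplit {
    // Emits one token per Unicode code point. This only illustrates what
    // "split into individual characters" means for Chinese text; it does
    // not reproduce the real analyzer's handling of punctuation or Latin text.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        // Nine single-character tokens for the nine-character phrase.
        System.out.println(split("中華人民共和國國歌"));
    }
}
```

With tokens like these, a search for a whole word only matches character by character, which is the problem a dictionary-based analyzer such as IK is meant to address.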

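II. Install the plugin

1. Plugin repository: https://github.com/medcl/elasticsearch-analysis-ik
2. Run the following command from the es root directory. The plugin version must match the es version, here 6.0.0:

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip

When the command finishes, a new analysis-ik directory appears under both the plugins and config folders in the es root.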

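III. Restart es

1. Find the es process

ps -ef | grep elastic

2. Stop the process
In this example the output shows that the es process ID is 12776; yours will differ.
Run:

kill 12776

3. Start es in the background

./bin/elasticsearch -d

Reminder: restarting an es node triggers shard reallocation and recovery, so take care in a production environment.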

IV. Testing

1. Analyze with ik_max_word

GET _analyze
{
   "analyzer":"ik_max_word",
   "text":"中華人民共和國國歌"
}

Result:

{
   "tokens": [
     {
       "token": "中華人民共和國",
       "start_offset": 0,
       "end_offset": 7,
       "type": "CN_WORD",
       "position": 0
     },
     {
       "token": "中華人民",
       "start_offset": 0,
       "end_offset": 4,
       "type": "CN_WORD",
       "position": 1
     },
     {
       "token": "中華",
       "start_offset": 0,
       "end_offset": 2,
       "type": "CN_WORD",
       "position": 2
     },
     {
       "token": "華人",
       "start_offset": 1,
       "end_offset": 3,
       "type": "CN_WORD",
       "position": 3
     },
     {
       "token": "人民共和國",
       "start_offset": 2,
       "end_offset": 7,
       "type": "CN_WORD",
       "position": 4
     },
     {
       "token": "人民",
       "start_offset": 2,
       "end_offset": 4,
       "type": "CN_WORD",
       "position": 5
     },
     {
       "token": "共和國",
       "start_offset": 4,
       "end_offset": 7,
       "type": "CN_WORD",
       "position": 6
     },
     {
       "token": "共和",
       "start_offset": 4,
       "end_offset": 6,
       "type": "CN_WORD",
       "position": 7
     },
     {
       "token": "國",
       "start_offset": 6,
       "end_offset": 7,
       "type": "CN_CHAR",
       "position": 8
     },
     {
       "token": "國歌",
       "start_offset": 7,
       "end_offset": 9,
       "type": "CN_WORD",
       "position": 9
     }
   ]
}
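
The start_offset and end_offset values in the response are character positions in the original text, with end_offset exclusive. A minimal sketch that checks this against three tokens copied from the result above; the class and method names are hypothetical.

```java
final class OffsetCheck {
    // Returns the part of the original text that a token's offsets point at.
    // end_offset is exclusive, which matches String.substring.
    static String slice(String text, int startOffset, int endOffset) {
        return text.substring(startOffset, endOffset);
    }

    public static void main(String[] args) {
        String text = "中華人民共和國國歌";
        System.out.println(slice(text, 1, 3)); // offsets of the token "華人"
        System.out.println(slice(text, 2, 7)); // offsets of the token "人民共和國"
        System.out.println(slice(text, 7, 9)); // offsets of the token "國歌"
    }
}
```

Overlapping offsets such as 0-7, 0-4 and 0-2 show that ik_max_word emits every dictionary word it finds, including words nested inside longer ones.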

2. Analyze with ik_smart

GET _analyze
{
   "analyzer":"ik_smart",
   "text":"中華人民共和國國歌"
}

Result:

{
   "tokens": [
     {
       "token": "中華人民共和國",
       "start_offset": 0,
       "end_offset": 7,
       "type": "CN_WORD",
       "position": 0
     },
     {
       "token": "國歌",
       "start_offset": 7,
       "end_offset": 9,
       "type": "CN_WORD",
       "position": 1
     }
   ]
}
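
The two requests in this section appear to be written in console shorthand. As a rough sketch, the same call can be sent as a plain HTTP POST from Java 11 or later. The base URL (for example http://localhost:9200), the class name, and the method names are assumptions for illustration, and the escaping helper covers only backslashes and double quotes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class AnalyzeOverHttp {
    // Escapes backslashes and double quotes only; control characters are not handled.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body used by the _analyze requests in this section.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // Posts the body to <baseUrl>/_analyze and returns the raw JSON response.
    // baseUrl is an assumption, e.g. http://localhost:9200 for a local node.
    static String analyze(String baseUrl, String analyzer, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Prints only the request body, so this runs without a live node.
        System.out.println(analyzeBody("ik_smart", "中華人民共和國國歌"));
    }
}
```

Calling analyze("http://localhost:9200", "ik_smart", "中華人民共和國國歌") against a node with the plugin installed should return JSON of the same shape as the results shown in this section.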

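V. Testing analysis through the Java API

1. Calling the ik_max_word analyzer

import java.util.List;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeRequest;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.junit.Test;

@Test
public void analyzer_ik_max_word() throws Exception {
    String text = "提前祝大家春節快樂！";

    // EsClient is a project helper not shown here; it is assumed to return a connected TransportClient.
    TransportClient client = EsClient.get();
    AnalyzeRequest request = new AnalyzeRequest().analyzer("ik_max_word").text(text);
    List<AnalyzeResponse.AnalyzeToken> tokens =
            client.admin().indices().analyze(request).actionGet().getTokens();
    System.out.println(tokens.size()); // 6
    for (AnalyzeResponse.AnalyzeToken token : tokens) {
        System.out.println(token.getTerm() + " ");
    }
}

Result: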

6
提前 
祝 
大家 
春節快樂 
春節 
快樂

2. Calling the ik_smart analyzer

@Test
public void analyzer_ik_smart() throws Exception {
    String text = "提前祝大家春節快樂！";

    // EsClient is a project helper not shown here; it is assumed to return a connected TransportClient.
    TransportClient client = EsClient.get();
    AnalyzeRequest request = new AnalyzeRequest().analyzer("ik_smart").text(text);
    List<AnalyzeResponse.AnalyzeToken> tokens =
            client.admin().indices().analyze(request).actionGet().getTokens();
    System.out.println(tokens.size()); // 4
    for (AnalyzeResponse.AnalyzeToken token : tokens) {
        System.out.println(token.getTerm() + " ");
    }
}

Result:

4
提前 
祝 
大家 
春節快樂
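
The difference between the two modes can be read directly from the two results in this section. The sketch below compares the token lists copied from those results using plain Java collections; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ModeDiff {
    // Returns the tokens that appear in the first list but not in the second.
    static List<String> onlyInFirst(List<String> first, List<String> second) {
        List<String> extra = new ArrayList<>(first);
        extra.removeAll(second);
        return extra;
    }

    public static void main(String[] args) {
        // Token lists copied from the two results in this section.
        List<String> maxWord = Arrays.asList("提前", "祝", "大家", "春節快樂", "春節", "快樂");
        List<String> smart = Arrays.asList("提前", "祝", "大家", "春節快樂");
        // The extra tokens are the finer-grained words nested inside 春節快樂.
        System.out.println(onlyInFirst(maxWord, smart));
    }
}
```

For this sentence, ik_max_word adds the nested words 春節 and 快樂 on top of the ik_smart output.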
