[Elasticsearch] Installing and Using the ik Chinese Analyzer

Original

2020-02-23 15:21

Preface

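The analyzer that Elasticsearch provides by default splits Chinese text into individual characters instead of into the keywords we actually want. For example: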

curl -XPOST  "http://localhost:9200/test/_analyze?analyzer=standard&pretty=true&text=我是中國人"

We get a result like this:

{  
    tokens: [  
        {  
            token: text  
            start_offset: 2  
            end_offset: 6  
            type: <ALPHANUM>  
            position: 1  
        },
        {  
            token: 我  
            start_offset: 9  
            end_offset: 10  
            type: <IDEOGRAPHIC>  
            position: 2  
        },
        {  
            token: 是  
            start_offset: 10  
            end_offset: 11  
            type: <IDEOGRAPHIC>  
            position: 3  
        },
        {  
            token: 中  
            start_offset: 11  
            end_offset: 12  
            type: <IDEOGRAPHIC>  
            position: 4  
        },
        {  
            token: 國  
            start_offset: 12  
            end_offset: 13  
            type: <IDEOGRAPHIC>  
            position: 5  
        },
        {  
            token: 人  
            start_offset: 13  
            end_offset: 14  
            type: <IDEOGRAPHIC>  
            position: 6  
        }  
    ]  
}

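Normally this is not the result we want. We would prefer tokens such as "中國人", "中國", and "我". To get them we need to install a Chinese analysis plugin, and ik is one that provides this.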
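To make the difference concrete, here is a toy sketch in Python that contrasts per-character splitting with a simple dictionary lookup. It is only an illustration, not ik's actual algorithm: the function names, the two-word dictionary, and the assumption that "是" is dropped as a stop word are all hypothetical.

```python
def split_per_character(text):
    """Mimic the behaviour shown above: one token per character."""
    return [(ch, i, i + 1) for i, ch in enumerate(text)]


def split_with_dictionary(text, dictionary, stop_words=frozenset()):
    """Toy dictionary lookup (NOT ik's real algorithm).

    Emits every dictionary word found in the text, plus any single
    character that no word covers and that is not a stop word.
    """
    tokens = []
    covered = set()
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in dictionary:
                tokens.append((text[start:end], start, end))
                covered.update(range(start, end))
    for i, ch in enumerate(text):
        if i not in covered and ch not in stop_words:
            tokens.append((ch, i, i + 1))
    # Order by start offset; at the same offset, longer words come first.
    tokens.sort(key=lambda t: (t[1], -(t[2] - t[1])))
    return tokens


text = "我是中國人"
# Hypothetical dictionary and stop-word set, chosen only for this example.
print([t[0] for t in split_per_character(text)])
print([t[0] for t in split_with_dictionary(text, {"中國人", "中國"}, {"是"})])
```

The second call keeps overlapping words ("中國人" and "中國") on purpose, since both are useful search terms for the same span of text.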

Installation

elasticsearch-analysis-ik is a Chinese analysis plugin that supports custom dictionaries.
Installation steps:

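1. Download the source code from GitHub: https://github.com/medcl/elasticsearch-analysis-ik
   master is the latest version; to get an already released version, pick a tag.
   Click the "Download ZIP" button at the lower right to download the source as elasticsearch-analysis-ik-master.zip.
2. Change to the download directory and unzip elasticsearch-analysis-ik-master.zip:
unzip elasticsearch-analysis-ik-master.zip
3. Copy the config/ik folder from the unzipped directory into the config folder of the ES installation directory.
4. Because this is source code, it has to be packaged with Maven. Go into the unzipped folder and run:
mvn clean package
5. Copy the packaged jar file elasticsearch-analysis-ik-1.2.8-sources.jar into the lib directory of the ES installation directory.
6. Add the ik configuration to the end of the ES configuration file config/elasticsearch.yml: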

index:  
  analysis:                     
    analyzer:        
      ik:  
          alias: [ik_analyzer]  
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider  
      ik_max_word:  
          type: ik  
          use_smart: false  
      ik_smart:  
          type: ik  
          use_smart: true

or

index.analysis.analyzer.ik.type : "ik"
7. Restart the elasticsearch service to complete the configuration, then enter the command:
curl -XPOST "http://localhost:9200/test/_analyze?analyzer=ik&pretty=true&text=我是中國人"
The test result is as follows:

{  
    tokens: [  
        {  
            token: text  
            start_offset: 2  
            end_offset: 6  
            type: ENGLISH  
            position: 1  
        },
        {  
            token: 我  
            start_offset: 9  
            end_offset: 10  
            type: CN_CHAR  
            position: 2  
        },
        {  
            token: 中國人  
            start_offset: 11  
            end_offset: 14  
            type: CN_WORD  
            position: 3  
        },
        {  
            token: 中國  
            start_offset: 11  
            end_offset: 13  
            type: CN_WORD  
            position: 4  
        },
        {  
            token: 國人  
            start_offset: 12  
            end_offset: 14  
            type: CN_WORD  
            position: 5  
        }  
    ]  
}
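Both curl commands above put the Chinese text directly into the query string. Depending on the shell and client, non-ASCII characters may need to be percent-encoded first, so here is a minimal Python sketch that builds the same _analyze URL with the text encoded as UTF-8. The function name build_analyze_url is hypothetical; the host, index, and query parameters are the ones used in this post.

```python
from urllib.parse import urlencode


def build_analyze_url(base_url, index, analyzer, text):
    """Build the _analyze URL used above, percent-encoding the query values."""
    query = urlencode({"analyzer": analyzer, "pretty": "true", "text": text})
    return f"{base_url}/{index}/_analyze?{query}"


# Same request as the curl test above, with the text percent-encoded.
print(build_analyze_url("http://localhost:9200", "test", "ik", "我是中國人"))
```

Passing "standard" instead of "ik" gives the URL for the first request in the preface, which makes it easy to compare the two analyzers side by side.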

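Notes:

ES plugins are normally installed with the plugin command, but installing ik that way kept failing on my machine, so I built and installed it from source instead.
For how to set up a custom dictionary, see https://github.com/medcl/elasticsearch-analysis-ik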

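Note:

target is the output directory for the jar, and the release directory is the output directory for the ik jar together with its dependency jars. If the ik dependency jars are not included, the following error occurs:
nested: NoClassDefFoundError[org/apache/http/client/ClientProtocolException]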
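One way to catch this before restarting ES is to compare the jars in the release directory with the jars already in the ES lib directory. The following is a minimal sketch, assuming both are flat directories of .jar files; the function name missing_jars and the jar names in the demo are placeholders, not real artifact names.

```python
import tempfile
from pathlib import Path


def missing_jars(release_dir, lib_dir):
    """Return the names of jars in release_dir that are absent from lib_dir."""
    release = {p.name for p in Path(release_dir).glob("*.jar")}
    present = {p.name for p in Path(lib_dir).glob("*.jar")}
    return sorted(release - present)


# Demo with temporary directories and placeholder jar names.
with tempfile.TemporaryDirectory() as release, tempfile.TemporaryDirectory() as lib:
    for name in ("ik-plugin.jar", "http-dependency.jar"):
        (Path(release) / name).touch()
    (Path(lib) / "ik-plugin.jar").touch()  # only the plugin jar was copied
    print(missing_jars(release, lib))
```

In real use, the two arguments would be the release directory of the ik build and the lib directory of the ES installation. Any name the function returns is a jar that still has to be copied.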
