Part 2: ElasticSearch 6, Installing the Chinese Analyzer (IK Analysis)

After the installation in the previous post, ElasticSearch 6.2.4 is up and running. Next we install the IK analyzer.

I. Installation

    The version compatibility table below comes from the IK project's GitHub page:

IK version    ES version
master        6.x -> master
6.2.4         6.2.4
6.1.3         6.1.3
5.6.8         5.6.8
5.5.3         5.5.3
5.4.3         5.4.3
5.3.3         5.3.3
5.2.2         5.2.2
5.1.2         5.1.2
1.10.6        2.4.6
1.9.5         2.3.5
1.8.1         2.2.1
1.7.0         2.1.1
1.5.0         2.0.0
1.2.6         1.0.0
1.2.5         0.90.x
1.1.3         0.20.x
1.0.0         0.16.2 -> 0.19.0

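  1. Offline installation:

   (1) Download the release package from the address below (check that its version number matches your ES version):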

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.4/elasticsearch-analysis-ik-6.2.4.zip

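   (2) Unzip it into the plugins directory under the ES installation directory (the transcript assumes the zip has already been placed there):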

[payment@localhost elasticsearch-6.2.4]$ cd plugins/
[payment@localhost plugins]$ pwd
/home/payment/elasticSearch/elasticsearch-6.2.4/plugins
[payment@localhost plugins]$ unzip elasticsearch-analysis-ik-6.2.4.zip

   2. Online installation (recommended):

[payment@gameServer elasticsearch-6.2.4]$ pwd
/home/payment/elasticSearch/elasticsearch-6.2.4
[payment@gameServer elasticsearch-6.2.4]$ 
[payment@gameServer elasticsearch-6.2.4]$ ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.4/elasticsearch-analysis-ik-6.2.4.zip
-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.4/elasticsearch-analysis-ik-6.2.4.zip
[=================================================] 100%   
-> Installed analysis-ik
[payment@gameServer elasticsearch-6.2.4]$ 
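
Both routes fetch the same release archive, and the plugin version has to match the ES version. As a minimal sketch, the helper below builds the download URL from a version string. The function name `ik_release_url` is hypothetical, and the URL pattern is only assumed from the single v6.2.4 link above; other releases may be named differently, so check the releases page before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build the IK release URL for a given ES version.
# Assumption: other releases follow the same naming as the v6.2.4 link above.
ik_release_url() {
  version="$1"
  printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' \
    "$version" "$version"
}

# Example (run from the ES home directory):
#   ./bin/elasticsearch-plugin install "$(ik_release_url 6.2.4)"
```

Keeping the version in one place makes it harder to download a package that does not match the running ES version.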

 II. Restart the ElasticSearch service

    1. Stop the service:

[payment@gameServer elasticsearch-6.2.4]$ ps -ef|grep elasticsearch
payment  27352     1  0 10:50 pts/0    00:00:39 /usr/local/java/jdk1.8.0_161//bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch.oFTj99LA -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:logs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=64m -Des.path.home=/home/payment/elasticSearch/elasticsearch-6.2.4 -Des.path.conf=/home/payment/elasticSearch/elasticsearch-6.2.4/config -cp /home/payment/elasticSearch/elasticsearch-6.2.4/lib/* org.elasticsearch.bootstrap.Elasticsearch -d
payment  29017 26594  0 13:10 pts/0    00:00:00 grep elasticsearch
[payment@gameServer elasticsearch-6.2.4]$ 
[payment@gameServer elasticsearch-6.2.4]$ 
[payment@gameServer elasticsearch-6.2.4]$ kill -9 27352
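
`kill -9` sends SIGKILL, which gives ElasticSearch no chance to shut down cleanly; a plain `kill` (SIGTERM) is usually the gentler choice. The sketch below is one way to pick the PID out of `ps -ef` output without matching the `grep` line. `es_pids` is a hypothetical helper name; it assumes the PID is in the second column and that the Java command line contains the main class `org.elasticsearch.bootstrap.Elasticsearch`, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# (column 2) of every line that runs the ElasticSearch main class.
es_pids() {
  awk '/org\.elasticsearch\.bootstrap\.Elasticsearch/ { print $2 }'
}

# Example: ask the node to stop with SIGTERM instead of SIGKILL.
#   pid="$(ps -ef | es_pids)"
#   [ -n "$pid" ] && kill $pid
```

Because the pattern is written with escaped dots, the awk process's own command line does not match it, so only the Java process is reported.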

    2. Start ElasticSearch:

[payment@gameServer elasticsearch-6.2.4]$ pwd
/home/payment/elasticSearch/elasticsearch-6.2.4
[payment@gameServer elasticsearch-6.2.4]$ ./bin/elasticsearch -d && tail -f logs/elasticsearch.log
[2018-06-06T13:12:28,029][INFO ][o.e.d.DiscoveryModule    ] [SdEluaQ] using discovery type [zen]
[2018-06-06T13:12:28,536][INFO ][o.e.n.Node               ] initialized
[2018-06-06T13:12:28,536][INFO ][o.e.n.Node               ] [SdEluaQ] starting ...
[2018-06-06T13:12:28,711][INFO ][o.e.t.TransportService   ] [SdEluaQ] publish_address {172.17.63.15:9300}, bound_addresses {172.17.63.15:9300}
[2018-06-06T13:12:28,721][INFO ][o.e.b.BootstrapChecks    ] [SdEluaQ] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-06-06T13:12:31,765][INFO ][o.e.c.s.MasterService    ] [SdEluaQ] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {SdEluaQ}{SdEluaQkTfi1p-yRtlxHSA}{IYnq99tLTjKcjGSXoxTS5w}{172.17.63.15}{172.17.63.15:9300}
[2018-06-06T13:12:31,769][INFO ][o.e.c.s.ClusterApplierService] [SdEluaQ] new_master {SdEluaQ}{SdEluaQkTfi1p-yRtlxHSA}{IYnq99tLTjKcjGSXoxTS5w}{172.17.63.15}{172.17.63.15:9300}, reason: apply cluster state (from master [master {SdEluaQ}{SdEluaQkTfi1p-yRtlxHSA}{IYnq99tLTjKcjGSXoxTS5w}{172.17.63.15}{172.17.63.15:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-06-06T13:12:31,782][INFO ][o.e.h.n.Netty4HttpServerTransport] [SdEluaQ] publish_address {172.17.63.15:9200}, bound_addresses {172.17.63.15:9200}
[2018-06-06T13:12:31,782][INFO ][o.e.n.Node               ] [SdEluaQ] started
[2018-06-06T13:12:31,921][INFO ][o.e.g.GatewayService     ] [SdEluaQ] recovered [0] indices into cluster_state
[2018-06-06T13:13:42,980][INFO ][o.e.n.Node               ] [] initializing ...
[2018-06-06T13:13:43,141][INFO ][o.e.e.NodeEnvironment    ] [SdEluaQ] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [402.8gb], net total_space [442.7gb], types [rootfs]
[2018-06-06T13:13:43,141][INFO ][o.e.e.NodeEnvironment    ] [SdEluaQ] heap size [990.7mb], compressed ordinary object pointers [true]
[2018-06-06T13:13:43,143][INFO ][o.e.n.Node               ] node name [SdEluaQ] derived from node ID [SdEluaQkTfi1p-yRtlxHSA]; set [node.name] to override
[2018-06-06T13:13:43,143][INFO ][o.e.n.Node               ] version[6.2.4], pid[29196], build[ccec39f/2018-04-12T20:37:28.497551Z], OS[Linux/2.6.32-696.28.1.el6.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_161/25.161-b12]
[2018-06-06T13:13:43,143][INFO ][o.e.n.Node               ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.vXQsyXAG, -XX:+HeapDumpOnOutOfMemoryError, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.path.home=/home/payment/elasticSearch/elasticsearch-6.2.4, -Des.path.conf=/home/payment/elasticSearch/elasticsearch-6.2.4/config]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [aggs-matrix-stats]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [analysis-common]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [ingest-common]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [lang-expression]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [lang-mustache]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [lang-painless]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [mapper-extras]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [parent-join]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [percolator]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [rank-eval]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [reindex]
[2018-06-06T13:13:43,782][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [repository-url]
[2018-06-06T13:13:43,783][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [transport-netty4]
[2018-06-06T13:13:43,783][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded module [tribe]
[2018-06-06T13:13:43,783][INFO ][o.e.p.PluginsService     ] [SdEluaQ] loaded plugin [analysis-ik]
[2018-06-06T13:13:46,137][INFO ][o.e.d.DiscoveryModule    ] [SdEluaQ] using discovery type [zen]
[2018-06-06T13:13:46,605][INFO ][o.e.n.Node               ] initialized
[2018-06-06T13:13:46,605][INFO ][o.e.n.Node               ] [SdEluaQ] starting ...
[2018-06-06T13:13:46,770][INFO ][o.e.t.TransportService   ] [SdEluaQ] publish_address {172.17.63.15:9300}, bound_addresses {172.17.63.15:9300}
[2018-06-06T13:13:46,778][INFO ][o.e.b.BootstrapChecks    ] [SdEluaQ] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2018-06-06T13:13:49,828][INFO ][o.e.c.s.MasterService    ] [SdEluaQ] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {SdEluaQ}{SdEluaQkTfi1p-yRtlxHSA}{OJnGIoaBRDaK0mBJRTarMQ}{172.17.63.15}{172.17.63.15:9300}
[2018-06-06T13:13:49,835][INFO ][o.e.c.s.ClusterApplierService] [SdEluaQ] new_master {SdEluaQ}{SdEluaQkTfi1p-yRtlxHSA}{OJnGIoaBRDaK0mBJRTarMQ}{172.17.63.15}{172.17.63.15:9300}, reason: apply cluster state (from master [master {SdEluaQ}{SdEluaQkTfi1p-yRtlxHSA}{OJnGIoaBRDaK0mBJRTarMQ}{172.17.63.15}{172.17.63.15:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-06-06T13:13:49,853][INFO ][o.e.h.n.Netty4HttpServerTransport] [SdEluaQ] publish_address {172.17.63.15:9200}, bound_addresses {172.17.63.15:9200}
[2018-06-06T13:13:49,861][INFO ][o.e.n.Node               ] [SdEluaQ] started
[2018-06-06T13:13:49,973][INFO ][o.e.g.GatewayService     ] [SdEluaQ] recovered [0] indices into cluster_state
The command above starts ElasticSearch and follows the startup log.
   The log shows that the analyzer plugin was loaded:
loaded plugin [analysis-ik]
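
Scanning the log by eye works, but the same check can be scripted. The sketch below searches a log file for the `loaded plugin [...]` line; `plugin_loaded` is a hypothetical helper name, and the log path in the example is the one used in this post. Note that the log can still contain entries from earlier startups, so a match only shows that the plugin was loaded at some point.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the log file contains the
# "loaded plugin [<name>]" line for the given plugin.
plugin_loaded() {
  name="$1"
  logfile="$2"
  # -F treats the brackets literally, -q suppresses output
  grep -qF "loaded plugin [$name]" "$logfile"
}

# Example (run from the ES home directory):
#   plugin_loaded analysis-ik logs/elasticsearch.log && echo "IK is loaded"
```

Running `./bin/elasticsearch-plugin list` is another way to see which plugins are installed.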

III. Check the analyzer

   Test the tokenization:

[root@gameServer ~]# curl -XGET 'http://172.17.63.15:9200/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_smart",
  "text": "聽說看這篇博客的哥們最帥、姑娘最美"
}'
{
  "tokens" : [
    {
      "token" : "聽說",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "看",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "這篇",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "博客",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "的",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "CN_CHAR",
      "position" : 4
    },
    {
      "token" : "哥們",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "最",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "CN_CHAR",
      "position" : 6
    },
    {
      "token" : "帥",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "CN_CHAR",
      "position" : 7
    },
    {
      "token" : "姑娘",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "最美",
      "start_offset" : 15,
      "end_offset" : 17,
      "type" : "CN_WORD",
      "position" : 9
    }
  ]
}
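
The full response is verbose when only the tokens matter. As a rough sketch, the filter below prints one token per line from a pretty-printed `_analyze` response. `tokens_from_analyze` is a hypothetical helper name, and the filter assumes the `?pretty` layout shown above, where every token sits on its own line; it is not a general JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: read a pretty-printed _analyze response on stdin
# and print one token per line.
# Assumption: each token sits on its own line as  "token" : "...",
tokens_from_analyze() {
  sed -n 's/^[[:space:]]*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1/p'
}

# Example: pipe the curl request above into the filter.
#   curl -s -XGET 'http://172.17.63.15:9200/_analyze?pretty' \
#     -H 'Content-Type: application/json' \
#     -d '{"analyzer": "ik_smart", "text": "..."}' | tokens_from_analyze
```

For anything beyond a quick check, a real JSON tool is the safer choice.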
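Explanation (source: GitHub):

What is the difference between ik_max_word and ik_smart?
ik_max_word: performs the finest-grained segmentation. For example, it splits “中華人民共和國國歌” into “中華人民共和國,中華人民,中華,華人,人民共和國,人民,人,民,共和國,共和,和,國國,國歌”, exhausting all possible combinations.
ik_smart: performs the coarsest-grained segmentation. For example, it splits “中華人民共和國國歌” into “中華人民共和國,國歌”.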
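
The difference is easiest to see by sending the same text through both analyzers. The sketch below builds the request body for a given analyzer name. `analyze_body` is a hypothetical helper name, and it does no JSON escaping, so it assumes the text contains no double quotes or backslashes. The host and port in the example are the ones used earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: build an _analyze request body.
# Assumption: the text contains no double quotes or backslashes,
# because no JSON escaping is done here.
analyze_body() {
  analyzer="$1"
  text="$2"
  printf '{"analyzer": "%s", "text": "%s"}\n' "$analyzer" "$text"
}

# Example: run the same text through both analyzers and compare the tokens.
#   for a in ik_max_word ik_smart; do
#     curl -XGET 'http://172.17.63.15:9200/_analyze?pretty' \
#       -H 'Content-Type: application/json' \
#       -d "$(analyze_body "$a" '中華人民共和國國歌')"
#   done
```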
