Installing and using the IK analysis plugin for Elasticsearch 6.6

Test environment: CentOS 7.6, elasticsearch-6.6.0

analysis-ik plugin project: https://github.com/medcl/elasticsearch-analysis-ik

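Note that the Elasticsearch version and the IK plugin version must match exactly. When I built the IK 6.6.0 plugin with Maven, the zip generated under target/releases was labeled version 6.5.0, which is probably a bug in the new release. My workaround was to change the elasticsearch.version property in pom.xml to 6.6.0; the zip built after that change worked normally.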
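
The pom.xml workaround can be scripted. This is a minimal sketch: the function name set_ik_es_version is hypothetical, and it assumes GNU sed (the CentOS default) and that pom.xml declares the version as a single <elasticsearch.version>X</elasticsearch.version> property. Run it inside the extracted source directory, before building.

```shell
# Hypothetical helper: rewrite the <elasticsearch.version> property of an
# elasticsearch-analysis-ik pom.xml in place (assumes GNU sed for -i).
set_ik_es_version() {
  local pom="$1" version="$2"
  # Refuse to touch a file that does not declare the property at all.
  if ! grep -q '<elasticsearch.version>' "$pom"; then
    echo "no <elasticsearch.version> property in $pom" >&2
    return 1
  fi
  # Replace whatever value sits between the tags with the requested version.
  sed -i "s#<elasticsearch.version>[^<]*</elasticsearch.version>#<elasticsearch.version>${version}</elasticsearch.version>#" "$pom"
  # Print the resulting line so the change can be checked by eye.
  grep '<elasticsearch.version>' "$pom"
}
```

Usage: set_ik_es_version pom.xml 6.6.0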

Downloading and installing the IK plugin

1. Download and extract

Downloads for other versions: https://github.com/medcl/elasticsearch-analysis-ik/releases

cd /usr/local/src
wget https://github.com/medcl/elasticsearch-analysis-ik/archive/v6.6.0.tar.gz
tar -xvf v6.6.0.tar.gz
cd elasticsearch-analysis-ik-6.6.0
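
Because the plugin version must equal the Elasticsearch version, it can help to parameterize the download. This is a sketch with hypothetical function names; it assumes every release tag follows the same archive URL pattern and extracts to elasticsearch-analysis-ik-<version>, as 6.6.0 does above.

```shell
# Hypothetical helper: source-archive URL for a given IK version,
# following the same pattern as the v6.6.0 URL above.
ik_source_url() {
  echo "https://github.com/medcl/elasticsearch-analysis-ik/archive/v$1.tar.gz"
}

# Hypothetical helper: download and extract the source, then cd into it.
# The version passed in must equal the Elasticsearch version, e.g. 6.6.0.
ik_fetch_source() {
  local version="$1"
  cd /usr/local/src || return 1
  wget "$(ik_source_url "$version")" || return 1
  tar -xvf "v${version}.tar.gz" || return 1
  cd "elasticsearch-analysis-ik-${version}"
}
```

Usage: ik_fetch_source 6.6.0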

2. Build from source

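Since the download is source code, it has to be packaged with Maven. If mvn is reported as an unknown command, install Maven first (download page: http://maven.apache.org/download.cgi). A step has succeeded only when it ends with a green BUILD SUCCESS.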

mvn clean
mvn compile
mvn package

3. After a successful build, a new zip file appears under target/releases/

[root@localhost elasticsearch-analysis-ik-6.6.0]# cd target/releases/
[root@localhost releases]# ls
elasticsearch-analysis-ik-6.6.0.zip
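
Since the zip name follows the version set in pom.xml, checking that it matches the Elasticsearch version catches the 6.5.0 problem described at the beginning before anything is installed. A minimal sketch; check_ik_zip is a hypothetical name.

```shell
# Hypothetical check: succeed (and print the path) only if the releases
# directory contains the zip for exactly the given Elasticsearch version.
check_ik_zip() {
  local releases_dir="$1" es_version="$2"
  local expected="$releases_dir/elasticsearch-analysis-ik-${es_version}.zip"
  if [ -f "$expected" ]; then
    echo "$expected"
    return 0
  fi
  # Show what was actually built, e.g. a zip labeled 6.5.0.
  echo "expected $expected, found:" >&2
  ls "$releases_dir" >&2
  return 1
}
```

Usage, from the source directory: check_ik_zip target/releases 6.6.0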

Next, create a folder named ik under the plugins directory of the Elasticsearch installation

mkdir /usr/local/src/elasticsearch-6.6.0/plugins/ik

Extract elasticsearch-analysis-ik-6.6.0.zip into the ik directory

unzip elasticsearch-analysis-ik-6.6.0.zip -d /usr/local/src/elasticsearch-6.6.0/plugins/ik/

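List the directory to confirm; the plugin is now installed. Note that the plugins directory must contain nothing but plugin directories; any extra file there will prevent Elasticsearch from starting.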

[root@localhost releases]# ll /usr/local/src/elasticsearch-6.6.0/plugins/ik/
total 1428
-rw-r--r-- 1 root root 263965 Feb 27 15:33 commons-codec-1.9.jar
-rw-r--r-- 1 root root  61829 Feb 27 15:33 commons-logging-1.2.jar
drwxr-xr-x 2 root root    299 Nov 20 21:36 config
-rw-r--r-- 1 root root  54692 Feb 27 17:38 elasticsearch-analysis-ik-6.6.0.jar
-rw-r--r-- 1 root root 736658 Feb 27 15:33 httpclient-4.5.2.jar
-rw-r--r-- 1 root root 326724 Feb 27 15:33 httpcore-4.4.4.jar
-rw-r--r-- 1 root root   1805 Feb 27 17:38 plugin-descriptor.properties
-rw-r--r-- 1 root root    125 Feb 27 17:38 plugin-security.policy
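
Two quick checks can be run before starting Elasticsearch: the elasticsearch.version recorded in plugin-descriptor.properties (Elasticsearch refuses to load a plugin built for a different version), and any stray non-directory entries under plugins. A sketch with hypothetical function names, assuming the descriptor stores the value as a plain elasticsearch.version=X line.

```shell
# Hypothetical check: print the elasticsearch.version recorded in a
# plugin's descriptor file; it must equal the running Elasticsearch version.
plugin_es_version() {
  sed -n 's/^elasticsearch\.version=//p' "$1/plugin-descriptor.properties"
}

# Hypothetical check: list every entry directly under the plugins
# directory (including hidden ones) that is not itself a directory.
stray_plugin_files() {
  local entry
  for entry in "$1"/* "$1"/.[!.]*; do
    [ -e "$entry" ] || continue     # skip glob patterns that matched nothing
    [ -d "$entry" ] || echo "$entry"
  done
}
```

Usage: plugin_es_version /usr/local/src/elasticsearch-6.6.0/plugins/ik should print 6.6.0, and stray_plugin_files /usr/local/src/elasticsearch-6.6.0/plugins should print nothing.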

4. Start Elasticsearch and check the plugin status. If the output includes the following line, the plugin loaded successfully.

[elastic@localhost elasticsearch-6.6.0]$ ./bin/elasticsearch
...
[INFO ][o.e.p.PluginsService     ] [gP0R6oA] loaded plugin [analysis-ik]
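
When Elasticsearch runs in the background (./bin/elasticsearch -d), the same line goes to the log file instead, by default logs/elasticsearch.log assuming cluster.name was not changed; once the node is up, curl 'http://127.0.0.1:9200/_cat/plugins?v' also lists installed plugins. A minimal sketch of the log check; ik_loaded_in_log is a hypothetical name.

```shell
# Hypothetical check: succeed if a log file (or captured console output)
# contains the "loaded plugin [analysis-ik]" line shown above.
ik_loaded_in_log() {
  grep -qF 'loaded plugin [analysis-ik]' "$1"
}
```

Usage: ik_loaded_in_log logs/elasticsearch.log && echo "IK loaded"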

Testing IK tokenization

Here is a quick demo. The curl commands below were generated by Postman; you can also send the same requests from Postman directly.

1. Request (ik_smart)

curl -X POST \
  'http://127.0.0.1:9200/_analyze?pretty=true' \
  -H 'content-type: application/json' \
  -d '{
    "analyzer": "ik_smart",
    "text": "中华人民共和国国歌"
}'

Response

{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}

2. Request (ik_max_word)

curl -X POST \
  'http://127.0.0.1:9200/_analyze?pretty=true' \
  -H 'content-type: application/json' \
  -d '{
    "analyzer": "ik_max_word",
    "text": "中华人民共和国国歌"
}'

Response

{
  "tokens" : [
    {
      "token" : "中华人民共和国",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "中华人民",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "中华",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "华人",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "人民共和国",
      "start_offset" : 2,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "人民",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "共和国",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "共和",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "国",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "CN_CHAR",
      "position" : 8
    },
    {
      "token" : "国歌",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 9
    }
  ]
}
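
The two requests above differ only in the analyzer name, so they can be wrapped in a small helper. A sketch with hypothetical function names; it assumes Elasticsearch listens on 127.0.0.1:9200 and that the text contains no double quotes or backslashes, since no JSON escaping is done.

```shell
# Hypothetical helper: build the _analyze request body (no JSON escaping,
# so the text must not contain double quotes or backslashes).
ik_analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# Hypothetical helper: send the body to the local node's _analyze endpoint.
ik_analyze() {
  curl -s -X POST 'http://127.0.0.1:9200/_analyze?pretty=true' \
    -H 'content-type: application/json' \
    -d "$(ik_analyze_body "$1" "$2")"
}
```

Usage: ik_analyze ik_max_word "中华人民共和国国歌"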

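The two analyzers differ in granularity: one splits text finely, the other coarsely. In detail:

ik_max_word: splits text at the finest granularity and exhausts the possible word combinations. In the test above, for example, "中华人民共和国国歌" was split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 共和国, 共和, 国, 国歌";

ik_smart: splits text at the coarsest granularity, e.g. "中华人民共和国国歌" is split into "中华人民共和国, 国歌".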
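
To compare the two analyzers at a glance, the token list can be pulled out of the pretty-printed response. A sketch; ik_tokens is a hypothetical name, and it relies on the '"token" : "..."' line layout that ?pretty=true produced above. A JSON tool such as jq would be more robust if it is installed.

```shell
# Hypothetical helper: read a pretty-printed _analyze response on stdin
# and print the tokens on one line, joined by commas.
ik_tokens() {
  # Keep only the value of each "token" line, then join the lines with ",".
  sed -n 's/^[[:space:]]*"token" : "\(.*\)",$/\1/p' | paste -sd, -
}
```

For example, appending | ik_tokens to the ik_smart curl command above should print 中华人民共和国,国歌.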
