1. Simulating how a string is stored
localhost:9200/yigo-redist.1/_analyze?analyzer=default&text=全能片(前)---TRW-GDB7891AT剎車片自帶報警線,無單獨報警線號碼,卡仕歐,卡仕歐,乘用車,剎車片
- The index is `yigo-redist.1`
- The analyzer is `default`, as defined in the index `yigo-redist.1`
- The string to analyze (`text`) is "全能片(前)---TRW-GDB7891AT剎車片自帶報警線,無單獨報警線號碼,卡仕歐,卡仕歐,乘用車,剎車片"
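The request above puts raw Chinese characters and commas directly into the URL. The following is a minimal Python sketch, using only the standard `urllib.parse`, that builds the same request with the text percent-encoded. `build_analyze_url` and `build_analyze_body` are hypothetical helper names. The body form rests on an assumption: newer Elasticsearch versions (6.x and later) expect `analyzer` and `text` in a JSON request body rather than as URL parameters. Check this against the version you run.

```python
from urllib.parse import quote, urlencode


def build_analyze_url(host: str, index: str, analyzer: str, text: str) -> str:
    """Hypothetical helper: query-string form of the _analyze request."""
    # quote() escapes the index name as a single path segment;
    # urlencode() percent-encodes the text as UTF-8 (commas become %2C).
    path = "/" + quote(index, safe="") + "/_analyze"
    query = urlencode({"analyzer": analyzer, "text": text})
    return "http://" + host + path + "?" + query


def build_analyze_body(analyzer: str, text: str) -> dict:
    """Hypothetical helper: JSON-body form, assumed for Elasticsearch 6.x and later."""
    return {"analyzer": analyzer, "text": text}


if __name__ == "__main__":
    text = "全能片(前)---TRW-GDB7891AT剎車片自帶報警線,無單獨報警線號碼,卡仕歐,卡仕歐,乘用車,剎車片"
    print(build_analyze_url("localhost:9200", "yigo-redist.1", "default", text))
    print(build_analyze_body("default", text))
```

The encoded URL contains only ASCII characters, so no HTTP client or proxy can mangle the Chinese text or split the value at the commas.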
The response is:
```json
{
  "tokens" : [ {
    "token" : "全能",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "片",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "CN_CHAR",
    "position" : 2
  }, {
    "token" : "前",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "CN_CHAR",
    "position" : 3
  }, {
    "token" : "trw-gdb7891at",
    "start_offset" : 9,
    "end_offset" : 22,
    "type" : "LETTER",
    "position" : 4
  }, {
    "token" : "剎車片",
    "start_offset" : 22,
    "end_offset" : 25,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "自帶",
    "start_offset" : 25,
    "end_offset" : 27,
    "type" : "CN_WORD",
    "position" : 6
  }, {
    "token" : "報警",
    "start_offset" : 27,
    "end_offset" : 29,
    "type" : "CN_WORD",
    "position" : 7
  }, {
    "token" : "線",
    "start_offset" : 29,
    "end_offset" : 30,
    "type" : "CN_CHAR",
    "position" : 8
  }, {
    "token" : "無",
    "start_offset" : 31,
    "end_offset" : 32,
    "type" : "CN_WORD",
    "position" : 9
  }, {
    "token" : "單獨",
    "start_offset" : 32,
    "end_offset" : 34,
    "type" : "CN_WORD",
    "position" : 10
  }, {
    "token" : "報警",
    "start_offset" : 34,
    "end_offset" : 36,
    "type" : "CN_WORD",
    "position" : 11
  }, {
    "token" : "線",
    "start_offset" : 36,
    "end_offset" : 37,
    "type" : "CN_CHAR",
    "position" : 12
  }, {
    "token" : "號碼",
    "start_offset" : 37,
    "end_offset" : 39,
    "type" : "CN_WORD",
    "position" : 13
  }, {
    "token" : "卡",
    "start_offset" : 40,
    "end_offset" : 41,
    "type" : "CN_CHAR",
    "position" : 14
  }, {
    "token" : "仕",
    "start_offset" : 41,
    "end_offset" : 42,
    "type" : "CN_WORD",
    "position" : 15
  }, {
    "token" : "歐",
    "start_offset" : 42,
    "end_offset" : 43,
    "type" : "CN_WORD",
    "position" : 16
  }, {
    "token" : "卡",
    "start_offset" : 44,
    "end_offset" : 45,
    "type" : "CN_CHAR",
    "position" : 17
  }, {
    "token" : "仕",
    "start_offset" : 45,
    "end_offset" : 46,
    "type" : "CN_WORD",
    "position" : 18
  }, {
    "token" : "歐",
    "start_offset" : 46,
    "end_offset" : 47,
    "type" : "CN_WORD",
    "position" : 19
  }, {
    "token" : "乘用車",
    "start_offset" : 48,
    "end_offset" : 51,
    "type" : "CN_WORD",
    "position" : 20
  }, {
    "token" : "剎車片",
    "start_offset" : 52,
    "end_offset" : 55,
    "type" : "CN_WORD",
    "position" : 21
  } ]
}
```
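Reading token lists like this one by hand gets tedious. The following is a small Python sketch, standard library only, that pulls the token strings out of an `_analyze` response in text order. `extract_tokens` is a hypothetical helper. `SAMPLE` is a trimmed copy of three tokens from the response above, not a full response.

```python
import json
from typing import List


def extract_tokens(response_text: str) -> List[str]:
    """Return the token strings from an _analyze response, ordered by position."""
    data = json.loads(response_text)
    # Sort by position, then by start offset, so the list follows the text order.
    ordered = sorted(data["tokens"], key=lambda t: (t["position"], t["start_offset"]))
    return [t["token"] for t in ordered]


# A trimmed excerpt of the step 1 response (three of its 21 tokens).
SAMPLE = """{
  "tokens" : [ {
    "token" : "全能", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 1
  }, {
    "token" : "trw-gdb7891at", "start_offset" : 9, "end_offset" : 22, "type" : "LETTER", "position" : 4
  }, {
    "token" : "剎車片", "start_offset" : 22, "end_offset" : 25, "type" : "CN_WORD", "position" : 5
  } ]
}"""

if __name__ == "__main__":
    print(extract_tokens(SAMPLE))
```

With the token strings in a plain list, you can check directly whether a keyword exists as an indexed term, which is the question the following steps investigate.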
2. Analyzing the search keyword
localhost:9200/yigo-redist.1/_analyze?analyzer=default_search&text=gdb7891
- The index is `yigo-redist.1`
- The analyzer is `default_search`, as defined in the index `yigo-redist.1`
- The string to analyze (`text`) is "gdb7891"
```json
{
  "tokens" : [ {
    "token" : "gdb7891",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "LETTER",
    "position" : 1
  } ]
}
```
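Comparing the single search token from this step with the terms indexed in step 1 shows why a plain match query finds nothing. The following is a simplified Python model. It assumes each document is just a set of indexed terms and ignores scoring: a query token contributes a match only when it equals an indexed term exactly. `STEP1_TERMS` copies the distinct tokens from the step 1 response. `matched_terms` is a hypothetical helper.

```python
from typing import Iterable, List, Set

# Distinct terms the `default` analyzer produced for the step 1 text (copied from its response).
STEP1_TERMS: Set[str] = {
    "全能", "片", "前", "trw-gdb7891at", "剎車片", "自帶", "報警", "線",
    "無", "單獨", "號碼", "卡", "仕", "歐", "乘用車",
}

# Tokens that `default_search` produced for the keyword "gdb7891" in step 2.
STEP2_QUERY_TOKENS: List[str] = ["gdb7891"]


def matched_terms(query_tokens: Iterable[str], doc_terms: Set[str]) -> List[str]:
    """Return the query tokens that exist verbatim as terms of the document."""
    # An inverted-index lookup is an exact comparison: no substring or prefix matching.
    return [t for t in query_tokens if t in doc_terms]


if __name__ == "__main__":
    print(matched_terms(STEP2_QUERY_TOKENS, STEP1_TERMS))  # prints [] (no overlap)
    print(matched_terms(["trw-gdb7891at"], STEP1_TERMS))   # prints ['trw-gdb7891at']
```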
3. Analyzing the keyword with the storage analyzer
localhost:9200/yigo-redist.1/_analyze?analyzer=default&text=gdb7891
- The index is `yigo-redist.1`
- The analyzer is `default`, as defined in the index `yigo-redist.1`
- The string to analyze (`text`) is "gdb7891"
```json
{
  "tokens" : [ {
    "token" : "gdb7891",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "LETTER",
    "position" : 1
  }, {
    "token" : "",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "LETTER",
    "position" : 1
  }, {
    "token" : "gdb7891",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "LETTER",
    "position" : 1
  }, {
    "token" : "",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "ENGLISH",
    "position" : 2
  }, {
    "token" : "gdb",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "ENGLISH",
    "position" : 2
  }, {
    "token" : "gdb",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "ENGLISH",
    "position" : 2
  }, {
    "token" : "7891",
    "start_offset" : 3,
    "end_offset" : 7,
    "type" : "ARABIC",
    "position" : 3
  }, {
    "token" : "7891",
    "start_offset" : 3,
    "end_offset" : 7,
    "type" : "ARABIC",
    "position" : 3
  }, {
    "token" : "",
    "start_offset" : 3,
    "end_offset" : 7,
    "type" : "ARABIC",
    "position" : 3
  } ]
}
```
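The extra matches produced by splitting the keyword can be modelled the same way. A `match` query combines its tokens with `or` by default, so every additional token widens the set of candidate documents. The Python sketch below first deduplicates the tokens from the response above and drops its empty tokens. It then runs both token lists against a few hypothetical documents. The document IDs and their terms are invented purely for illustration, and scoring is ignored.

```python
from typing import Dict, Iterable, List, Set

# Raw tokens from the step 3 response, in order, including its empty tokens.
STEP3_RAW = ["gdb7891", "", "gdb7891", "", "gdb", "gdb", "7891", "7891", ""]

# Hypothetical documents, each reduced to its set of indexed terms.
DOCS: Dict[str, Set[str]] = {
    "doc-a": {"gdb", "剎車片"},
    "doc-b": {"7891", "乘用車"},
    "doc-c": {"gdb7891"},
    "doc-d": {"報警", "線"},
}


def distinct_tokens(raw: Iterable[str]) -> List[str]:
    """Drop empty tokens and duplicates while keeping the first-seen order."""
    return list(dict.fromkeys(t for t in raw if t))


def match_or(query_tokens: Iterable[str], docs: Dict[str, Set[str]]) -> List[str]:
    """IDs of documents containing at least one query token (OR semantics)."""
    tokens = list(query_tokens)
    return sorted(d for d, terms in docs.items() if any(t in terms for t in tokens))


if __name__ == "__main__":
    print(match_or(["gdb7891"], DOCS))                 # tokens from default_search
    print(match_or(distinct_tokens(STEP3_RAW), DOCS))  # tokens from default
```

With the single `default_search` token only `doc-c` matches. With the three `default` tokens, `doc-a` and `doc-b` are returned as well, even though each contains only part of the keyword.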
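Summary
- Step 1 shows that the stored data "全能片(前)---TRW-GDB7891AT剎車片自帶報警線,無單獨報警線號碼,卡仕歐,卡仕歐,乘用車,剎車片" is split into many token fragments, and these fragments are what gets stored in the index.
- Step 2 shows that the search analyzer (`default_search`) does not split the keyword "gdb7891": the only token available for lookup is "gdb7891" itself. None of the fragments produced in step 1 is "gdb7891"; the closest one is "trw-gdb7891at". Because a query token has to equal an indexed term exactly, a plain `match` query cannot find the data indexed in step 1.
- Step 3 shows that if the storage analyzer (`default`) is used, "gdb7891" is split into "gdb", "7891" and so on, and either of these fragments can be used to find the data indexed in step 1. However, because the keyword is split, the query also returns more matches than intended: documents matching "gdb", documents matching "7891", and documents matching "gdb7891".
- To retrieve the step 1 data while searching with `default_search`, use a `wildcard` query with the pattern `*gdb7891*`. A wildcard query is not analyzed. Its pattern is matched directly against the indexed terms, so it matches the term "trw-gdb7891at":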
```json
{
  "query": {
    "wildcard" : { "description" : "*gdb7891*" }
  }
}
```
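You can check locally which indexed terms the wildcard pattern selects. This Python sketch uses `fnmatch.fnmatchcase` as a stand-in for Elasticsearch's wildcard matching: `*` matches any run of characters and `?` matches one character. Unlike Elasticsearch, `fnmatch` also understands `[...]` classes, which this pattern does not use. `wildcard_body` and `wildcard_hits` are hypothetical helpers. The `description` field name is taken from the query above. The sketch assumes the pattern is compared case-sensitively, with no normalization. Whether Elasticsearch normalizes wildcard patterns can depend on the version and the field mapping.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Distinct terms indexed for the step 1 document (copied from its _analyze response).
STEP1_TERMS = [
    "全能", "片", "前", "trw-gdb7891at", "剎車片", "自帶", "報警", "線",
    "無", "單獨", "號碼", "卡", "仕", "歐", "乘用車",
]


def wildcard_body(field: str, pattern: str) -> Dict:
    """Build the same request body as the wildcard query above."""
    return {"query": {"wildcard": {field: pattern}}}


def wildcard_hits(pattern: str, terms: Iterable[str]) -> List[str]:
    """Terms matched by the pattern; assumed case-sensitive, pattern not analyzed."""
    return sorted(t for t in terms if fnmatchcase(t, pattern))


if __name__ == "__main__":
    print(wildcard_body("description", "*gdb7891*"))
    print(wildcard_hits("*gdb7891*", STEP1_TERMS))  # the lowercased LETTER token
    print(wildcard_hits("*GDB7891*", STEP1_TERMS))  # no hit under the case-sensitive assumption
```

The pattern begins with `*`, so Elasticsearch has to examine many of the field's terms to evaluate it. On a large index this query can therefore be noticeably slower than a `match` query.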