When designing the mapping for a large text field, add the term_vector parameter, as follows:
"description": {
"similarity": "customize_bm25",
"type": "text",
"store": true,
"analyzer": "my_jieba_index_analyzer",
"search_analyzer": "my_jieba_search_analyzer",
"term_vector" : "with_positions_offsets"
}
With this parameter configured, highlighting becomes noticeably faster: the positions and offsets stored at index time let the fast vector highlighter do its work without re-analyzing the whole field at query time.
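To see it in action, here is a minimal highlight query sketch against this field. The index name my_index and the query text are placeholder assumptions; "type": "fvh" simply makes the fast vector highlighter explicit, since Elasticsearch selects it automatically once positions and offsets are stored:

GET /my_index/_search
{
    "query": {
        "match": {
            "description": "搜索引擎"
        }
    },
    "highlight": {
        "fields": {
            "description": {
                "type": "fvh"
            }
        }
    }
}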
However, certain query terms trigger an error. The error is caused by Lucene's parsing of the spaces in the field. The solution is to strip the whitespace terms with a token filter.
However, when adding the whitespace filter, a problem surfaced: with the jieba tokenizer, even the following stop filter fails to remove the whitespace terms:
"my_stop_filter": {
"ignore_case": "true",
"type": "stop",
"stopwords": [
" ",
"的",
"得",
"地"
]
},
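A handy way to see what is happening is the _analyze API, which lists the tokens an analyzer actually emits; with the jieba analyzer, the whitespace token still shows up in the output. A sketch, assuming an index named my_index that carries these analysis settings:

GET /my_index/_analyze
{
    "analyzer": "my_jieba_index_analyzer",
    "text": "中文 分詞"
}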
With the ik tokenizer it does work, so I switched to ik and defined two analyzers, as follows:
"my_ik_index_analyzer": {
"filter": [
"my_stop_filter"
],
"type": "custom",
"tokenizer": "ik_max_word"
},
"my_ik_search_analyzer": {
"filter": [
"my_stop_filter"
],
"type": "custom",
"tokenizer": "ik_smart"
}
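For context, here is how the filter and the two analyzers fit together under settings.analysis at index-creation time. This is a sketch: the index name my_index is a placeholder, and the customize_bm25 similarity referenced in the mapping is assumed to be defined separately under settings (its definition is not shown in this post):

PUT /my_index
{
    "settings": {
        "analysis": {
            "filter": {
                "my_stop_filter": {
                    "type": "stop",
                    "ignore_case": true,
                    "stopwords": [" ", "的", "得", "地"]
                }
            },
            "analyzer": {
                "my_ik_index_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_max_word",
                    "filter": ["my_stop_filter"]
                },
                "my_ik_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["my_stop_filter"]
                }
            }
        }
    }
}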
The mapping for the large field is then defined as follows:
"description": {
"similarity": "customize_bm25",
"type": "text",
"store": true,
"analyzer": "my_ik_index_analyzer",
"search_analyzer": "my_ik_search_analyzer",
"term_vector" : "with_positions_offsets"
}
With this in place, the error above disappears.
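As a final sanity check, running the same _analyze call against the ik analyzer should show the whitespace token gone from the token list (again assuming the index name my_index):

GET /my_index/_analyze
{
    "analyzer": "my_ik_index_analyzer",
    "text": "中文 分詞"
}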
Done.