1) Environment Setup
Start Elasticsearch: https://blog.csdn.net/qq_36918149/article/details/104221934
Start Kibana: https://blog.csdn.net/qq_36918149/article/details/104224625
2) Character Filter
Demo 1:
# strip HTML tags with the html_strip char filter
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<b>hello world</b>"
}
Result: the HTML tags have been removed.
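For reference, the response should look roughly like the following (abridged: the real response also carries start_offset and end_offset fields, which map back to positions in the original markup). Because the keyword tokenizer does not split its input, the stripped text comes back as a single token:

{
  "tokens": [
    { "token": "hello world", "type": "word", "position": 0 }
  ]
}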
Demo 2:
# replace characters with a mapping char filter
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [ "- => _" ]
    }
  ],
  "text": "123-456, I-test! test-990 650-555-1234"
}
Result: every "-" has been replaced with "_" before tokenization.
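The mapping runs before tokenization, so the standard tokenizer sees "123_456, I_test! test_990 650_555_1234"; since "_" does not break words under Unicode text segmentation, the output terms should be roughly:

[ 123_456, I_test, test_990, 650_555_1234 ]

Without the char filter, the same input would be split on every "-", producing fragments such as 123, 456, 650, 555, and 1234.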
3) Tokenizer
Demo:
# tokenize a file-system path with path_hierarchy
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/user/ymruan/a/b/c/d/e"
}
Result: the path is split into one token per level of the hierarchy.
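The path_hierarchy tokenizer emits one token per prefix of the path, which is useful for matching a document against any of its ancestor directories. The expected terms are roughly:

[ /user, /user/ymruan, /user/ymruan/a, /user/ymruan/a/b, /user/ymruan/a/b/c, /user/ymruan/a/b/c/d, /user/ymruan/a/b/c/d/e ]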
4) Token Filter
Demo:
# whitespace tokenizer with stop and snowball token filters
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop", "snowball"],
  "text": ["The rain in Spain falls mainly on the plain."]
}
Result: the stop words are removed and the remaining tokens are stemmed by the Snowball filter.
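Assuming the default English stopword list (the list is configurable), the output should be roughly:

[ The, rain, Spain, fall, main, plain. ]

Two details are worth noting: the capitalized "The" survives because the stop filter is case-sensitive by default, and "plain." keeps its trailing period because the whitespace tokenizer splits only on whitespace and never strips punctuation.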
5) Summary
- The stages always apply in order: char filter -> tokenizer -> token filter (a combined example follows this list).
- For real-world use, you will still need to go through the API documentation carefully.
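As a minimal sketch of how the three stages chain together in practice, the request below registers a custom analyzer in the index settings. The index name my_index and the names dash_to_underscore and my_analyzer are hypothetical, chosen for illustration only:

# hypothetical index and analyzer names, for illustration only:
# a custom analyzer chaining a mapping char filter (stage 1),
# the standard tokenizer (stage 2), and two token filters (stage 3)
PUT my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "dash_to_underscore": {
          "type": "mapping",
          "mappings": [ "- => _" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "dash_to_underscore" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop" ]
        }
      }
    }
  }
}

You can then verify its behavior with GET my_index/_analyze, passing "analyzer": "my_analyzer" and a "text" field in the request body.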