Html strip character filter
默認替換掉所有的標記文本
GET /_analyze
{
"tokenizer": "keyword",
"char_filter": [
"html_strip"
],
"text": "<p>I'm so <b>happy</b>!</p>"
}
自定義
忽略b標籤,刪除其他標籤
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "keyword",
"char_filter": [
"my_custom_html_strip_char_filter"
]
}
},
"char_filter": {
"my_custom_html_strip_char_filter": {
"type": "html_strip",
"escaped_tags": [
"b"
]
}
}
}
}
}
Mapping character filter
GET /_analyze
{
"tokenizer": "keyword",
"char_filter": [
{
"type": "mapping",
"mappings": [
"٠ => 0",
"١ => 1",
"٢ => 2",
"٣ => 3",
"٤ => 4",
"٥ => 5",
"٦ => 6",
"٧ => 7",
"٨ => 8",
"٩ => 9"
]
}
],
"text": "My license plate is ٢٥٠١٥"
}
/*輸出結果*/
[ My license plate is 25015 ]
自定義
:)
替換成 _happy_
, :(
替換成_sad_
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_mappings_char_filter"
]
}
},
"char_filter": {
"my_mappings_char_filter": {
"type": "mapping",
"mappings": [
":) => _happy_",
":( => _sad_"
]
}
}
}
}
}
正則表達式字符過濾器
正則表達式字符過濾器使用一個正則表達式匹配應該與指定替換字符串替換字符。替換字符串可以引用正則表達式中的捕獲組。
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"lowercase"
]
}
},
"char_filter": {
"my_char_filter": {
"type": "pattern_replace",
"pattern": "(?<=\\p{Lower})(?=\\p{Upper})",
"replacement": " "
}
}
}
},
"mappings": {
"properties": {
"text": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
POST my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "The fooBarBaz method"
}
/*輸出*/
[ the, foo, bar, baz, method ]