Elasticsearch: Highlighted Search

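Overview

What is highlighting

Highlighting lets one or more fields be highlighted in the search results, for example by rendering the matched text in bold or in a color different from the surrounding text.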

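Highlighting requires the actual content of the field. If the field is stored (store set to true in the mapping), that stored content is used; if it is not stored, Elasticsearch automatically loads _source and extracts the relevant field from it.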
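
If you do want the field content stored separately from _source, a field-level mapping fragment along the following lines could be used. This is only a sketch: it is meant to sit under properties in the index mapping, and the type text is an assumption that matches Elasticsearch 5.x and later (2.x used string).

```json
{
  "address": {
    "type": "text",
    "store": true
  }
}
```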

Three highlighter types

ES provides three highlighter types: the Lucene-based plain highlighter, the fast vector highlighter (fvh), and the postings highlighter.

Plain Highlighter

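The plain highlighter is the default choice and is implemented with the Lucene Highlighter. It tries hard to reflect the query's matching logic.

If you want to highlight many fields with complex queries, this highlighter is not fast. To reflect the query logic accurately, it builds a tiny in-memory index and re-runs the original query against it through Lucene's query execution planner to obtain low-level match information for the current document. This is repeated for every field and every document that needs highlighting, so it can become a performance problem. In that case, consider switching to a different highlighter type.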
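
A highlighter can also be selected explicitly for a field through the type setting inside highlight. The body below is a minimal sketch for GET /bank/_search; it assumes the address field was mapped with term_vector set to with_positions_offsets, which the fvh type requires (without it the request is expected to fail).

```json
{
  "query": {
    "match": { "address": "mill" }
  },
  "highlight": {
    "fields": {
      "address": { "type": "fvh" }
    }
  }
}
```

Setting type to plain instead forces the plain highlighter even when term vectors are available.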

Fast Vector Highlighter

If term_vector is set to with_positions_offsets for a field in the mapping, the fast vector highlighter replaces the plain highlighter as the default highlighter for that field.
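
As an illustration, the field-level mapping fragment for this might look as follows. It is a sketch that belongs under properties in the index mapping; the type text is an assumption for Elasticsearch 5.x and later (2.x used string).

```json
{
  "address": {
    "type": "text",
    "term_vector": "with_positions_offsets"
  }
}
```

This generally has to be set when the index is created, because the term_vector setting of an existing field cannot be changed in place; existing data would need to be reindexed.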

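Its main characteristics:

  1. It uses more disk space, because term vectors increase the size of the index
  2. Fragment boundaries can be customized with boundary_chars and boundary_max_scan
  3. It is faster than the plain highlighter, especially for large fields (larger than 1 MB), because the text does not have to be re-analyzed

Postings Highlighter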

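If index_options is set to offsets for a field in the mapping, the postings highlighter replaces the plain highlighter for that field.

It needs less disk space than the term vectors required by the fast vector highlighter, splits the text into sentences and highlights those, and is faster than the plain highlighter because the text does not have to be re-analyzed; the larger the document, the bigger the gain.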
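
A field-level mapping fragment that enables offsets in the postings list might look like the sketch below. It belongs under properties in the index mapping, applies only to Elasticsearch versions that include this highlighter, and assumes the type text (5.x and later; 2.x used string).

```json
{
  "address": {
    "type": "text",
    "index_options": "offsets"
  }
}
```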

Example

Find records whose address contains mill or Court, and highlight the matches.

The query is as follows:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  }, 
  "highlight": {
    "fields": {
      "address": {}
    }
  }
}

The results are as follows:

{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "472",
    "_score" : 5.4032025,
    "_source" : {
        "account_number" : 472,
        "balance" : 25571,
        "firstname" : "Lee",
        "lastname" : "Long",
        "age" : 32,
        "gender" : "F",
        "address" : "288 Mill Street",
        "employer" : "Comverges",
        "email" : "[email protected]",
        "city" : "Movico",
        "state" : "MT"
    },
    "highlight" : {
        "address" : [
            "288 <em>Mill</em> Street"
        ]
    }
},
{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "18",
    "_score" : 2.1248586,
    "_source" : {
        "account_number" : 18,
        "balance" : 4180,
        "firstname" : "Dale",
        "lastname" : "Adams",
        "age" : 33,
        "gender" : "M",
        "address" : "467 Hutchinson Court",
        "employer" : "Boink",
        "email" : "[email protected]",
        "city" : "Orick",
        "state" : "MD"
    },
    "highlight" : {
        "address" : [
            "467 Hutchinson <em>Court</em>"
        ]
    }
}

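Notice that the matched terms in the field are automatically wrapped in <em> </em> tags.

Custom highlight tags

The syntax is as follows:

"pre_tags": ["<tag1>"],
"post_tags": ["</tag1>"],

The query is as follows: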

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  }, 
  "highlight": {
    "pre_tags": ["<a>"],
    "post_tags": ["</a>"], 
    "fields": {
      "address": {}
    }
  }
}

The results are as follows:

{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "472",
    "_score" : 5.4032025,
    "_source" : {
        "account_number" : 472,
        "balance" : 25571,
        "firstname" : "Lee",
        "lastname" : "Long",
        "age" : 32,
        "gender" : "F",
        "address" : "288 Mill Street",
        "employer" : "Comverges",
        "email" : "[email protected]",
        "city" : "Movico",
        "state" : "MT"
    },
    "highlight" : {
        "address" : [
            "288 <a>Mill</a> Street"
        ]
    }
},
{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "18",
    "_score" : 2.1248586,
    "_source" : {
        "account_number" : 18,
        "balance" : 4180,
        "firstname" : "Dale",
        "lastname" : "Adams",
        "age" : 33,
        "gender" : "M",
        "address" : "467 Hutchinson Court",
        "employer" : "Boink",
        "email" : "[email protected]",
        "city" : "Orick",
        "state" : "MD"
    },
    "highlight" : {
        "address" : [
            "467 Hutchinson <a>Court</a>"
        ]
    }
}

Notice that the highlight tags have been replaced with the custom ones.
