Elasticsearch: Highlighting Search Results

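Overview

What is highlighting

Highlighting lets you mark the matched text of one or more fields in the search results, for example by rendering it in bold or in a color that differs from the surrounding text.

To highlight a field, Elasticsearch needs the field's actual content. If the field is stored (store is set to true in the mapping), the stored value is used; otherwise Elasticsearch loads _source and extracts the relevant field from it.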

Three highlighter types

Elasticsearch provides three highlighters, all built on Lucene: the plain highlighter, the fast vector highlighter (fvh), and the postings highlighter.

Plain Highlighter

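The plain highlighter is the default choice and is implemented with the Lucene Highlighter. Its main goal is to reflect the query's matching logic.

It is not fast when you highlight many fields with a complex query. To reflect the query logic accurately, it builds a tiny in-memory index and re-runs the original query against it through Lucene's query execution planner, which yields low-level match information for the current document. This is repeated for every field of every document that needs highlighting, so it can become a performance problem. When that happens, switch to one of the other highlighters.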
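
Besides being picked from the mapping, a highlighter can be forced per field through the type option of the highlight section. The following is a minimal sketch in Python that only builds the request body; the helper name highlight_request is hypothetical, and the field and terms are borrowed from the example further down in this article.

```python
import json

def highlight_request(field, terms, highlighter_type=None):
    """Build a search body that highlights `field` for any of `terms`."""
    field_options = {}
    if highlighter_type is not None:
        # "type" forces a specific highlighter for this field,
        # for example "plain" or "fvh".
        field_options["type"] = highlighter_type
    return {
        "query": {
            "bool": {"should": [{"match": {field: term}} for term in terms]}
        },
        "highlight": {"fields": {field: field_options}},
    }

body = highlight_request("address", ["mill", "Court"], highlighter_type="plain")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /bank/_search. Leaving highlighter_type unset produces the same empty field options used in the example below, so Elasticsearch chooses the highlighter itself.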

Fast Vector Highlighter

If a field's mapping sets the term_vector parameter to with_positions_offsets, the fast vector highlighter is used for that field instead of the plain highlighter.

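Its main characteristics:

  1. It is faster than the plain highlighter, especially on large fields (over 1 MB), because it reads positions and offsets from the term vectors instead of re-analyzing the text.
  2. Storing term vectors increases the size of the index on disk.
  3. Fragment boundaries can be customized with the boundary_chars and boundary_max_scan options.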
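
As a rough sketch, the mapping fragment that enables term vectors for a field could be generated like this. The helper name fvh_field_mapping is hypothetical, and the field type text assumes Elasticsearch 5.x or later (older releases use string instead).

```python
import json

def fvh_field_mapping(field_name, field_type="text"):
    """Return a mapping fragment that stores term vectors for one field."""
    return {
        "properties": {
            field_name: {
                "type": field_type,
                # Positions and offsets are what the fast vector
                # highlighter reads at highlight time.
                "term_vector": "with_positions_offsets",
            }
        }
    }

mapping = fvh_field_mapping("address")
print(json.dumps(mapping, indent=2))
```

The setting has to be in place when the documents are indexed, so an existing index generally needs to be reindexed after the mapping changes.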

Postings Highlighter

If index_options is set to offsets in a field's mapping, the postings highlighter is used for that field instead of the plain highlighter.

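It is faster because it does not have to re-analyze the text being highlighted, and the gain grows with document size. It also needs less disk space than the term vectors required by the fast vector highlighter, and it breaks the text into sentences and highlights those, which works well for natural-language text.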
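
The corresponding mapping fragment can be sketched the same way. Again, the helper name postings_field_mapping is hypothetical and the field type text is an assumption that fits Elasticsearch 5.x (older releases use string).

```python
import json

def postings_field_mapping(field_name, field_type="text"):
    """Return a mapping fragment that indexes offsets in the postings list."""
    return {
        "properties": {
            field_name: {
                "type": field_type,
                # Offsets stored in the postings list let the highlighter
                # locate matches without re-analyzing the text.
                "index_options": "offsets",
            }
        }
    }

mapping = postings_field_mapping("address")
print(json.dumps(mapping, indent=2))
```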

Example

Find the records whose address contains mill or Court, and highlight the matches.

The query is:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  }, 
  "highlight": {
    "fields": {
      "address": {}
    }
  }
}

The result is:

{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "472",
    "_score" : 5.4032025,
    "_source" : {
        "account_number" : 472,
        "balance" : 25571,
        "firstname" : "Lee",
        "lastname" : "Long",
        "age" : 32,
        "gender" : "F",
        "address" : "288 Mill Street",
        "employer" : "Comverges",
        "email" : "[email protected]",
        "city" : "Movico",
        "state" : "MT"
    },
    "highlight" : {
        "address" : [
            "288 <em>Mill</em> Street"
        ]
    }
},
{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "18",
    "_score" : 2.1248586,
    "_source" : {
        "account_number" : 18,
        "balance" : 4180,
        "firstname" : "Dale",
        "lastname" : "Adams",
        "age" : 33,
        "gender" : "M",
        "address" : "467 Hutchinson Court",
        "employer" : "Boink",
        "email" : "[email protected]",
        "city" : "Orick",
        "state" : "MD"
    },
    "highlight" : {
        "address" : [
            "467 Hutchinson <em>Court</em>"
        ]
    }
}

Notice that Elasticsearch automatically wraps each matched term in <em> </em> tags.
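
On the client side, the fragments can be read straight from each hit's highlight object. The sketch below works on a trimmed copy of the two hits shown above; the helper name highlighted_fragments is hypothetical. In a full response these hit objects sit under hits.hits.

```python
# Trimmed copies of the two hits from the result above.
hits = [
    {
        "_id": "472",
        "_source": {"address": "288 Mill Street"},
        "highlight": {"address": ["288 <em>Mill</em> Street"]},
    },
    {
        "_id": "18",
        "_source": {"address": "467 Hutchinson Court"},
        "highlight": {"address": ["467 Hutchinson <em>Court</em>"]},
    },
]

def highlighted_fragments(hits, field):
    """Map each document id to its highlighted fragments for `field`.

    Hits without a highlight entry for the field get an empty list.
    """
    return {
        hit["_id"]: hit.get("highlight", {}).get(field, [])
        for hit in hits
    }

fragments = highlighted_fragments(hits, "address")
for doc_id, frags in fragments.items():
    print(doc_id, frags)
```

Using get with a default keeps the helper from failing on hits that matched on some other field and therefore carry no highlight for address.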

Custom highlight tags

The syntax is:

"pre_tags": ["<tag1>"],
"post_tags": ["</tag1>"],

The query is:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  }, 
  "highlight": {
    "pre_tags": ["<a>"],
    "post_tags": ["</a>"], 
    "fields": {
      "address": {}
    }
  }
}

The result is:

{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "472",
    "_score" : 5.4032025,
    "_source" : {
        "account_number" : 472,
        "balance" : 25571,
        "firstname" : "Lee",
        "lastname" : "Long",
        "age" : 32,
        "gender" : "F",
        "address" : "288 Mill Street",
        "employer" : "Comverges",
        "email" : "[email protected]",
        "city" : "Movico",
        "state" : "MT"
    },
    "highlight" : {
        "address" : [
            "288 <a>Mill</a> Street"
        ]
    }
},
{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "18",
    "_score" : 2.1248586,
    "_source" : {
        "account_number" : 18,
        "balance" : 4180,
        "firstname" : "Dale",
        "lastname" : "Adams",
        "age" : 33,
        "gender" : "M",
        "address" : "467 Hutchinson Court",
        "employer" : "Boink",
        "email" : "[email protected]",
        "city" : "Orick",
        "state" : "MD"
    },
    "highlight" : {
        "address" : [
            "467 Hutchinson <a>Court</a>"
        ]
    }
}

Notice that the default <em> tags have been replaced by the custom <a> tags.
