ES Study Notes 7: Multi-Field Search

multifield search

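OK, let's start by reviewing filters. One important filter is term, and its plural form, terms, can match several values at once. Note that searching for multiple values is not the same thing as searching multiple fields.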

{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "title": ["1", "2", "3"]
        }
      }
    }
  }
}

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":  "War and Peace" }},
        { "match": { "author": "Leo Tolstoy"   }},
        { "bool":  {
          "should": [
            { "match": { "translator": "Constance Garnett" }},
            { "match": { "translator": "Louise Maude"      }}
          ]
        }}
      ]
    }
  }
}
What changes if the translator clauses are not wrapped in their own bool query and sit directly in the outer should instead?

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":  "War and Peace" }},
        { "match": { "author": "Leo Tolstoy"   }},
        { "match": { "translator": "Constance Garnett" }},
        { "match": { "translator": "Louise Maude"      }}
      ]
    }
  }
}

The answer lies in how the score is calculated. The bool query runs each match query, adds their scores together, then multiplies by the number of matching clauses, and divides by the total number of clauses. Each clause at the same level has the same weight. In the first query, the bool query containing the translator clauses counts for one-third of the total score. If we put the translator clauses at the same level as title and author, as in the second query, they reduce the contribution of the title and author clauses to one-quarter each.
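The arithmetic in the paragraph above can be checked with a small sketch. It only follows the simplified formula given in the text (sum of scores, times matching clauses, over total clauses); `bool_should_score` and `clause_weights` are hypothetical helpers written for these notes, not Elasticsearch APIs, and the scores are made-up numbers.

```python
def bool_should_score(clause_scores):
    """Simplified bool/should score as described in the text.

    clause_scores holds one score per should clause; 0.0 means "did not match".
    """
    if not clause_scores:
        return 0.0
    matching = [s for s in clause_scores if s > 0]
    return sum(matching) * len(matching) / len(clause_scores)


def clause_weights(labels):
    """Every top-level should clause gets an equal share; sum the shares per label."""
    weights = {}
    for label in labels:
        weights[label] = weights.get(label, 0.0) + 1 / len(labels)
    return weights


# Translator clauses wrapped in their own bool: three top-level clauses.
nested = clause_weights(["title", "author", "translator"])
# Translator clauses at the top level: four top-level clauses.
flat = clause_weights(["title", "author", "translator", "translator"])

# Two of three clauses match with made-up scores 1.5 and 0.5.
example_score = bool_should_score([1.5, 0.0, 0.5])

print(nested)
print(flat)
print(example_score)
```

In the flat layout the two translator clauses together take half of the weight, which is why wrapping them in their own bool keeps title and author at one-third each.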

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": {
          "title": {
            "query": "War and Peace",
            "boost": 2
        }}},
        { "match": {
          "author": {
            "query": "Leo Tolstoy",
            "boost": 2
        }}},
        { "bool": {
          "should": [
            { "match": { "translator": "Constance Garnett" }},
            { "match": { "translator": "Louise Maude"      }}
          ]
        }}
      ]
    }
  }
}

single query string

best fields

dis_max query

Instead of the bool query, we can use the dis_max or Disjunction Max Query. Disjunction means or (while conjunction means and), so the Disjunction Max Query simply means: return documents that match any of these queries, and return the score of the best matching query:

{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body":  "Brown fox" }}
      ]
    }
  }
}
A regular bool query adds up the scores of all matching clauses and then scales the sum by how many clauses matched, whereas dis_max returns only the score of the single best-matching clause.

A simple dis_max query like the one above chooses the single best matching field and ignores the other.

tie_breaker

It is possible, however, to also take the _score from the other matching clauses into account, by specifying the tie_breaker parameter:

{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body":  "Quick pets" }}
      ],
      "tie_breaker": 0.3
    }
  }
}

The tie_breaker parameter makes the dis_max query behave more like a halfway house between dis_max and bool. It changes the score calculation as follows:

  1. Take the _score of the best-matching clause.
  2. Multiply the score of each of the other matching clauses by the tie_breaker.
  3. Add them all together and normalize.
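In other words, the best-matching clause still contributes the most; the scores of the other clauses are multiplied by tie_breaker and added in, and the result is then normalized (by averaging, perhaps? I am not sure about this step).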

With the tie_breaker, all matching clauses count, but the best-matching clause counts most.

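It is best to keep tie_breaker between 0.1 and 0.4, so that the best-matching clause of the dis_max query keeps its dominant role.

Put differently, with tie_breaker set to 0 the query is a plain dis_max; with tie_breaker set to 1 every matching clause counts in full, so it behaves much like a bool query.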
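The steps listed above can be sketched in a few lines. This is a minimal illustration, assuming the score is simply the best clause plus tie_breaker times the rest; the final "normalize" step is left out because the notes do not pin it down. `dis_max_score` is a hypothetical helper, and the clause scores are made-up numbers.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching scores.

    clause_scores holds one score per query in "queries"; 0.0 means "did not match".
    Normalization is deliberately omitted in this sketch.
    """
    matching = [s for s in clause_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    others = sum(matching) - best
    return best + tie_breaker * others


scores = [2.0, 1.0]  # e.g. made-up title and body scores

plain = dis_max_score(scores)             # only the best clause counts
blended = dis_max_score(scores, 0.3)      # the other clause counts at 30%
all_equal = dis_max_score(scores, 1.0)    # every clause counts in full

print(plain, blended, all_equal)
```

With tie_breaker at 0 the weaker clause is ignored entirely, and at 1 the result is the plain sum of all matching clauses, which is the bool-like end of the range.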

multi_match query

By default, this query runs as type best_fields, which means that it generates a match query for each field and wraps them in a dis_max query. For example:

{
  "multi_match": {
    "query":                "Quick brown fox",
    "type":                 "best_fields",
    "fields":               [ "title", "body" ],
    "tie_breaker":          0.3,
    "minimum_should_match": "30%"
  }
}
The score is taken from whichever of title and body scores highest, plus the score of the other field multiplied by tie_breaker.

This is equivalent to the following query:

{
  "dis_max": {
    "queries": [
      {
        "match": {
          "title": {
            "query": "Quick brown fox",
            "minimum_should_match": "30%"
          }
        }
      },
      {
        "match": {
          "body": {
            "query": "Quick brown fox",
            "minimum_should_match": "30%"
          }
        }
      }
    ],
    "tie_breaker": 0.3
  }
}
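The equivalence between the two query bodies can be made concrete with a small rewrite function. This is only a sketch of the idea, assuming plain field names (no wildcards or boosts) and only the parameters used in these notes; `expand_best_fields` is a hypothetical helper, not something Elasticsearch exposes.

```python
def expand_best_fields(multi_match):
    """Rewrite a best_fields multi_match body as the equivalent dis_max body."""
    per_field = {"query": multi_match["query"]}
    # minimum_should_match is passed down to every generated match query.
    if "minimum_should_match" in multi_match:
        per_field["minimum_should_match"] = multi_match["minimum_should_match"]

    dis_max = {
        "queries": [
            {"match": {field: dict(per_field)}} for field in multi_match["fields"]
        ]
    }
    # tie_breaker stays on the wrapping dis_max query.
    if "tie_breaker" in multi_match:
        dis_max["tie_breaker"] = multi_match["tie_breaker"]
    return {"dis_max": dis_max}


expanded = expand_best_fields({
    "query": "Quick brown fox",
    "type": "best_fields",
    "fields": ["title", "body"],
    "tie_breaker": 0.3,
    "minimum_should_match": "30%",
})
print(expanded)
```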

using wildcards in field names

Field names can be specified with wildcards:

{
  "multi_match": {
    "query":  "Quick brown fox",
    "fields": "*_title"
  }
}

boosting individual fields

An individual field can be given its own score weight with the ^boost suffix:

{
  "multi_match": {
    "query":  "Quick brown fox",
    "fields": [ "*_title", "chapter_title^2" ]
  }
}
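To make the wildcard and ^boost syntax easier to picture, here is a rough sketch of how such a fields list could be resolved against the fields of an index. It is an illustration only: `resolve_fields` is a hypothetical helper, the field names other than chapter_title are invented, and keeping the larger boost when a field is matched twice is an assumption of this sketch rather than documented Elasticsearch behavior.

```python
from fnmatch import fnmatchcase


def resolve_fields(field_specs, known_fields):
    """Map each matching field name to its boost (default 1.0)."""
    boosts = {}
    for spec in field_specs:
        # "chapter_title^2" -> pattern "chapter_title", boost 2.0
        pattern, _, boost_text = spec.partition("^")
        boost = float(boost_text) if boost_text else 1.0
        for name in known_fields:
            if fnmatchcase(name, pattern):
                # Assumption: when several specs match, keep the larger boost.
                boosts[name] = max(boost, boosts.get(name, 0.0))
    return boosts


resolved = resolve_fields(
    ["*_title", "chapter_title^2"],
    ["book_title", "chapter_title", "section_title", "body"],  # invented field list
)
print(resolved)
```

Here body is left out because it does not match *_title, and chapter_title ends up with a boost of 2 while the other title fields keep the default of 1.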

most fields

We can achieve this by indexing the same text in other fields to provide more-precise matching. One field may contain the unstemmed version, another the original word with diacritics, and a third might use shingles to provide information about word proximity. These other fields act as signals that increase the relevance score of each matching document. The more fields that match, the better.

A document is included in the results list if it matches the broad-matching main field. If it also matches the signal fields, it gets extra points and is pushed up the results list.

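So what is most fields? Different analyzers extract different tokens from the same text; the whitespace analyzer, for example, only splits words on whitespace. jump, jumped and jumping all share the root jump, but depending on the analyzer they may be indexed as three distinct tokens. The more signal fields a document matches, the more extra score it gets and the higher it appears in the results list.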
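These notes do not include a most_fields request, so here is a minimal sketch of what one could look like, built as a Python dict. The multi_match type most_fields is real; the field names title and title.std (an assumed sub-field analyzed without stemming) and the helper `most_fields_query` are made up for illustration.

```python
import json


def most_fields_query(text, main_field, signal_fields):
    """Build a multi_match body that scores across the main field and its signal fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": [main_field, *signal_fields],
            }
        }
    }


# "title" is the broad-matching main field, "title.std" an assumed signal field.
body = most_fields_query("jumping rabbits", "title", ["title.std"])
print(json.dumps(body, indent=2))
```

A document only needs to match the main field to be returned; matching the signal field as well adds to its score.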

cross-fields entity search

GET /books/_search
{
  "query": {
    "multi_match": {
      "query":  "peter smith",
      "type":   "cross_fields",
      "fields": [ "title^2", "description" ]
    }
  }
}

field-centric queries

While this would work, we don’t like having to store redundant data. Instead, Elasticsearch offers us two solutions—one at index time and one at search time—which we discuss next.

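There are two ways to deal with the cross-fields problem. The first is to combine the content of all the relevant fields into a single field at index time, which the mapping below does with copy_to: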

PUT /my_index
{
  "mappings": {
    "person": {
      "properties": {
        "first_name": {
          "type":    "string",
          "copy_to": "full_name"
        },
        "last_name": {
          "type":    "string",
          "copy_to": "full_name"
        },
        "full_name": {
          "type":    "string"
        }
      }
    }
  }
}
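Conceptually, the mapping above means that whatever is indexed into first_name and last_name is also indexed into full_name, which can then be searched as one field. The sketch below imitates that in plain Python; `apply_copy_to` is a hypothetical helper that only mimics the idea and does not talk to Elasticsearch, and the sample document is invented.

```python
def apply_copy_to(doc, copy_rules):
    """Return the values indexed per field, given {source_field: target_field} rules."""
    indexed = {field: [value] for field, value in doc.items()}
    for source, target in copy_rules.items():
        if source in doc:
            # The source value is indexed a second time under the target field.
            indexed.setdefault(target, []).append(doc[source])
    return indexed


indexed = apply_copy_to(
    {"first_name": "Peter", "last_name": "Smith"},  # invented sample document
    {"first_name": "full_name", "last_name": "full_name"},
)
print(indexed["full_name"])

# A single-field query against the combined field, requiring both terms.
full_name_query = {
    "query": {
        "match": {"full_name": {"query": "peter smith", "operator": "and"}}
    }
}
```

Because both names now live in one field, an ordinary match query is enough and no multi-field scoring tricks are needed.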

exact-value fields


Avoid using not_analyzed fields in multi_match queries.

