multifield search
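First, a quick review of filters. An important one is term, and its sibling terms can match several values at once. Note that searching for multiple values is not the same as searching multiple fields.
{
"query": {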
"filtered":{
"filter":{
"terms":{
"title":["1", "2", "3"]
}
}
}
}
}
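What difference does it make whether the translator should clauses are wrapped in their own bool query? Compare the next two queries:
GET /_search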
{
"query": {
"bool": {
"should": [
{ "match": { "title": "War and Peace" }},
{ "match": { "author": "Leo Tolstoy" }},
{ "bool": {
"should": [
{ "match": { "translator": "Constance Garnett" }},
{ "match": { "translator": "Louise Maude" }}
]
}}
]
}
}
}
GET /_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "War and Peace" }},
{ "match": { "author": "Leo Tolstoy" }},
{ "match": { "translator": "Constance Garnett" }},
{ "match": { "translator": "Louise Maude" }}
]
}
}
}
The answer lies in how the score is calculated. The bool query runs each match query, adds their scores together, then multiplies by the number of matching clauses, and divides by the total number of clauses. Each clause at the same level has the same weight. In the first query above, the nested bool query containing the translator clauses counts for one-third of the total score. In the second, where the translator clauses sit at the same level as title and author, they reduce the contribution of the title and author clauses to one-quarter each.
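The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only a toy model of the description given here, not Lucene's actual scoring code; the function names and the clause labels are made up for illustration.

```python
from fractions import Fraction


def bool_should_score(clause_scores):
    """Sum the matching scores, multiply by the number of matching
    clauses, and divide by the total number of clauses.
    A score of None means the clause did not match."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / len(clause_scores)


def clause_weights(clauses, share=Fraction(1)):
    """Split `share` equally among sibling clauses; a nested list
    stands for a nested bool query and is split again."""
    weights = {}
    per_clause = share / len(clauses)
    for clause in clauses:
        if isinstance(clause, list):
            weights.update(clause_weights(clause, per_clause))
        else:
            weights[clause] = per_clause
    return weights


# Translator clauses wrapped in their own bool query.
nested = clause_weights(["title", "author", ["garnett", "maude"]])
# All four clauses at the same level.
flat = clause_weights(["title", "author", "garnett", "maude"])

print(nested)  # title and author get 1/3 each, each translator 1/6
print(flat)    # every clause gets 1/4
```

In the nested layout the two translator clauses share a single one-third slot, so title and author keep one-third each; flattened, all four clauses drop to one-quarter.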
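The title and author clauses can be given more weight than the translator clauses by adding a boost to each:
GET /_search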
{
"query": {
"bool": {
"should": [
{ "match": {
"title": {
"query": "War and Peace",
"boost": 2
}}},
{ "match": {
"author": {
"query": "Leo Tolstoy",
"boost": 2
}}},
{ "bool": {
"should": [
{ "match": { "translator": "Constance Garnett" }},
{ "match": { "translator": "Louise Maude" }}
]
}}
]
}
}
}
single query string
best fields
dis_max query
Instead of the bool query, we can use the dis_max or Disjunction Max Query. Disjunction means or (while conjunction means and), so the Disjunction Max Query simply means: return documents that match any of these queries, and return the score of the best-matching query. A plain bool query adds up the scores of all matching clauses and averages them, whereas dis_max keeps only the highest score among the matching queries:
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Brown fox" }},
{ "match": { "body": "Brown fox" }}
]
}
}
}
A simple dis_max query like the one above chooses the single best-matching field and ignores the other.
tie_breaker
It is possible, however, to also take the _score from the other matching clauses into account, by specifying the tie_breaker parameter:
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Quick pets" }},
{ "match": { "body": "Quick pets" }}
],
"tie_breaker": 0.3
}
}
}
The tie_breaker parameter makes the dis_max query behave like a halfway house, a compromise between how dis_max and bool compute scores. It changes the score calculation as follows:
- Take the _score of the best-matching clause.
- Multiply the score of each of the other matching clauses by the tie_breaker.
- Add them all together and normalize.
With the tie_breaker, all matching clauses count, but the best-matching clause counts most. In other words, a tie_breaker of 0 gives plain dis_max behavior, while a value of 1 makes every matching clause count in full, much like a bool query.
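The three steps can be written down as a small Python function. This is a sketch under assumptions: the name dis_max_score is made up, the scores are invented numbers, and the final normalization step is left out because these notes do not say how it is done.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times every other matching
    clause score. None means the clause did not match.
    Normalization is deliberately omitted (not specified in the notes)."""
    matched = sorted((s for s in clause_scores if s is not None), reverse=True)
    if not matched:
        return 0.0
    best, others = matched[0], matched[1:]
    return best + tie_breaker * sum(others)


title_score, body_score = 2.0, 1.0  # invented example scores

print(dis_max_score([title_score, body_score]))                   # 2.0
print(dis_max_score([title_score, body_score], tie_breaker=0.5))  # 2.5
print(dis_max_score([title_score, body_score], tie_breaker=1.0))  # 3.0
```

At 0 only the best field counts; at 1 the result is the plain sum of all matching clauses; values in between let the other fields break ties without dominating.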
multi_match query
By default, this query runs as type best_fields, which means that it generates a match query for each field and wraps them in a dis_max query. The score is that of the best-scoring field among title and body, plus the score of the other field multiplied by tie_breaker:
{
"multi_match": {
"query": "Quick brown fox",
"type": "best_fields",
"fields": [ "title", "body" ],
"tie_breaker": 0.3,
"minimum_should_match": "30%"
}
}
This is equivalent to the following query:
{
"dis_max": {
"queries": [
{
"match": {
"title": {
"query": "Quick brown fox",
"minimum_should_match": "30%"
}
}
},
{
"match": {
"body": {
"query": "Quick brown fox",
"minimum_should_match": "30%"
}
}
}
],
"tie_breaker": 0.3
}
}
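The rewrite from multi_match to dis_max is mechanical, which a short Python helper can make concrete. expand_best_fields is a hypothetical name, not part of any Elasticsearch client, and the sketch only handles the parameters used in the example above (no wildcards, boosts, or other types).

```python
def expand_best_fields(body):
    """Rewrite a best_fields multi_match body into the dis_max query
    it stands for: one match clause per field, tie_breaker kept on
    the dis_max, minimum_should_match pushed into each match."""
    params = body["multi_match"]
    per_field = {"query": params["query"]}
    if "minimum_should_match" in params:
        per_field["minimum_should_match"] = params["minimum_should_match"]

    dis_max = {
        "queries": [
            {"match": {field: dict(per_field)}} for field in params["fields"]
        ]
    }
    if "tie_breaker" in params:
        dis_max["tie_breaker"] = params["tie_breaker"]
    return {"dis_max": dis_max}


multi_match = {
    "multi_match": {
        "query": "Quick brown fox",
        "type": "best_fields",
        "fields": ["title", "body"],
        "tie_breaker": 0.3,
        "minimum_should_match": "30%",
    }
}

print(expand_best_fields(multi_match))
```

The printed structure has the same shape as the dis_max query shown above: the query string and minimum_should_match are repeated per field, while tie_breaker appears once.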
using wildcards in field names
Field names can be specified with wildcards:
{
"multi_match": {
"query": "Quick brown fox",
"fields": "*_title"
}
}
boosting individual fields
An individual field can be given its own boost with the caret (^) syntax:
{
"multi_match": {
"query": "Quick brown fox",
"fields": [ "*_title", "chapter_title^2" ]
}
}
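To make the fields syntax easier to read, here is a rough Python sketch that splits each entry into a pattern and a boost and lists the fields each pattern would cover. The helper names and the list of known fields (book_title, chapter_title, body) are assumptions for illustration. How Elasticsearch itself treats a field matched by two entries is not modeled.

```python
from fnmatch import fnmatchcase


def parse_field_spec(spec):
    """Split 'name^boost' into (name, boost); the boost defaults to 1.0."""
    name, caret, boost = spec.partition("^")
    return name, float(boost) if caret else 1.0


def resolve_fields(specs, known_fields):
    """Return (field, boost) pairs for every known field matched by each entry."""
    resolved = []
    for spec in specs:
        pattern, boost = parse_field_spec(spec)
        for field in known_fields:
            if fnmatchcase(field, pattern):
                resolved.append((field, boost))
    return resolved


known = ["book_title", "chapter_title", "body"]  # hypothetical mapping
print(resolve_fields(["*_title", "chapter_title^2"], known))
```

The wildcard entry covers both title fields with the default boost, and the explicit chapter_title^2 entry carries a boost of 2.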
most fields
We can rank more exact matches higher by indexing the same text in other fields to provide more-precise matching. One field may contain the unstemmed version, another the original word with diacritics, and a third might use shingles to provide information about word proximity. These other fields act as signals that increase the relevance score of each matching document. The more fields that match, the better.
A document is included in the results list if it matches the broad-matching main field. If it also matches the signal fields, it gets extra points and is pushed up the results list.
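So what is most_fields? Different analyzers extract different tokens from the same text. For example, the whitespace analyzer only splits on whitespace, so jump, jumped, and jumping stay as three distinct tokens, whereas a stemming analyzer reduces all three to the root jump. The more signal fields a document matches, the more extra score it receives and the higher it appears in the results list.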
cross-fields entity search
GET /books/_search
{
"query": {
"multi_match": {
"query": "peter smith",
"type": "cross_fields",
"fields": [ "title^2", "description" ]
}
}
}
field-centric queries
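While this would work, we don't like having to store redundant data. Instead, Elasticsearch offers us two solutions, one at index time and one at search time, which we discuss next. The index-time solution to the cross-fields problem is to copy the contents of all the relevant fields into a single field when the document is indexed, using copy_to: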
PUT /my_index
{
"mappings": {
"person": {
"properties": {
"first_name": {
"type": "string",
"copy_to": "full_name"
},
"last_name": {
"type": "string",
"copy_to": "full_name"
},
"full_name": {
"type": "string"
}
}
}
}
}
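The effect of this mapping can be imitated in plain Python to see which values end up searchable under full_name. This is only a simulation of the idea, assuming copied values are collected as separate values of the target field; apply_copy_to is a made-up helper, not an Elasticsearch API.

```python
def apply_copy_to(doc, properties):
    """Collect, per copy_to target, the values of the source fields
    present in the document."""
    copied = {}
    for field, spec in properties.items():
        target = spec.get("copy_to")
        if target and field in doc:
            copied.setdefault(target, []).append(doc[field])
    return copied


# Same properties as the "person" mapping above.
properties = {
    "first_name": {"type": "string", "copy_to": "full_name"},
    "last_name": {"type": "string", "copy_to": "full_name"},
    "full_name": {"type": "string"},
}

doc = {"first_name": "Peter", "last_name": "Smith"}
print(apply_copy_to(doc, properties))  # {'full_name': ['Peter', 'Smith']}
```

A single match query on full_name can then find both words without the document having to store the combined name itself.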
exact-value fields
Avoid using not_analyzed fields in multi_match queries.