By default, results are returned sorted by relevance, with the most relevant documents first.
First, a look at how sorting is specified:
{
"query": {
},
"from": 0,
"size": 10,
"sort": "field" | "sort": ["field1", "field2"] | "sort": {"field": "desc"}
}
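For a field that holds several values, "mode" chooses which value the document is sorted on, for example min or max:
"sort": {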
"dates": {
"order": "asc",
"mode": "min"
}
}
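To make the effect of "mode": "min" concrete, here is a minimal Python sketch. It is not Elasticsearch code; the sample documents and the names docs and sort_key are hypothetical and exist only for illustration. Each document is represented by its smallest date, and the documents are then ordered ascending by that value.

```python
# Hypothetical documents; each "dates" field holds several values.
docs = [
    {"id": 1, "dates": ["2014-09-24", "2014-09-12"]},
    {"id": 2, "dates": ["2014-09-15"]},
    {"id": 3, "dates": ["2014-09-10", "2014-09-30"]},
]


def sort_key(doc):
    # "mode": "min" -> represent the document by its smallest value.
    # ISO-8601 date strings compare correctly as plain strings.
    return min(doc["dates"])


# "order": "asc" -> the smallest representative value comes first.
ordered = sorted(docs, key=sort_key)
ordered_ids = [doc["id"] for doc in ordered]
print(ordered_ids)
```

Swapping min for max in sort_key would mimic "mode": "max" in the same way.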
string sorting and multifields
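When a string such as "fine old art" is analyzed, it results in three terms. We probably want to sort alphabetically on the first term, then the second term, and so forth, but Elasticsearch doesn’t have this information at its disposal at sort time. The solution is to define a mapping that indexes the field twice: once analyzed for searching, and once not_analyzed for sorting: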
"tweet": {
"type": "string",
"analyzer": "english",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
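The difference between sorting on analyzed terms and sorting on the raw string can be sketched in plain Python. This is a simplified illustration, not how Elasticsearch is implemented: the analyze function is a hypothetical stand-in for a real analyzer (it only lowercases and splits on whitespace), and representing a document by its smallest term is an assumption about how a multi-term field is reduced to one sort value.

```python
def analyze(text):
    # Stand-in analyzer (assumption): lowercase and split on whitespace.
    return text.lower().split()


titles = ["fine old art", "bold new art", "crisp apple pie"]

# Analyzed field: only individual terms are available at sort time, so each
# document is represented by a single term (here, its smallest one).
by_min_term = sorted(titles, key=lambda title: min(analyze(title)))

# not_analyzed "raw" subfield: the whole string is one term, so ordering
# follows the full original value.
by_raw = sorted(titles)

print(by_min_term)
print(by_raw)
```

The two orderings differ because "fine old art" and "bold new art" both collapse to the term "art" in the first case, while the raw values keep their leading words.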
GET /_search
{
"query": {
"match": {
"tweet": "elasticsearch"
}
},
"sort": "tweet.raw"
}
search result relevance
The standard similarity algorithm used in Elasticsearch is known as term frequency/inverse document frequency, or TF/IDF, which takes the following factors into account:
- Term frequency: How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
- Inverse document frequency: How often does each term appear in the index? The more often, the less relevant. Terms that appear in many documents have a lower weight than more-uncommon terms.
- Field-length norm: How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.
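The three factors can be written down as small functions. The sketch below uses the formulas commonly given for Lucene's classic TF/IDF similarity (square root of the frequency, 1 + ln(numDocs / (docFreq + 1)), and 1 / sqrt(numTerms)); treat them as an assumption about the exact variant in use. The function names and the simple_weight helper are hypothetical, and real scoring also involves factors not shown here, such as query normalization and boosts.

```python
import math


def term_frequency(freq):
    # More occurrences in the field -> higher value, with diminishing returns.
    return math.sqrt(freq)


def inverse_document_frequency(num_docs, doc_freq):
    # Terms found in many documents get a lower weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    # Longer fields dilute the importance of any single term.
    return 1.0 / math.sqrt(num_terms)


def simple_weight(freq, num_docs, doc_freq, num_terms):
    # Simplified weight of one term in one field: product of the three factors.
    return (
        term_frequency(freq)
        * inverse_document_frequency(num_docs, doc_freq)
        * field_length_norm(num_terms)
    )


# Same term, same index statistics: a 4-term title versus a 400-term body.
in_title = simple_weight(freq=1, num_docs=1000, doc_freq=9, num_terms=4)
in_body = simple_weight(freq=1, num_docs=1000, doc_freq=9, num_terms=400)
print(in_title > in_body)
```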
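Relevance scores are computed per shard, not across the whole index, so term statistics such as document frequency can differ from one shard to another.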
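A short Python sketch shows why per-shard statistics matter. It assumes the classic inverse document frequency formula 1 + ln(numDocs / (docFreq + 1)); the shard names and counts are made up for illustration and do not come from a real cluster.

```python
import math


def idf(num_docs, doc_freq):
    # Assumed classic formula: rarer terms get a higher value.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the same term on two shards of one index.
shards = {
    "shard_0": {"num_docs": 100, "doc_freq": 4},
    "shard_1": {"num_docs": 100, "doc_freq": 49},
}

# Each shard scores with its own local statistics.
shard_idf = {
    name: idf(stats["num_docs"], stats["doc_freq"])
    for name, stats in shards.items()
}

# What the value would be if statistics were pooled across the index.
total_docs = sum(stats["num_docs"] for stats in shards.values())
total_freq = sum(stats["doc_freq"] for stats in shards.values())
global_idf = idf(total_docs, total_freq)

print(shard_idf, global_idf)
```

With uneven data, the same term weighs more on the shard where it is rare, so two otherwise identical documents can receive different scores depending on where they live. The effect tends to shrink as shards hold more, evenly distributed documents.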
Remember to use explain only for debugging. Turn it off in production, because its performance overhead is high.
GET /_search?explain
{
"query" : { "match" : { "tweet" : "honeymoon" }}
}