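1. constant_score query
Wraps a filter query, so no relevance score is computed for the matches. Every matching document is returned with a relevance score equal to the boost parameter of the query.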
// Request
GET bank/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "city.keyword": "Fredericktown"
        }
      },
      "boost": 1.3
    }
  }
}
// Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.3,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "97",
"_score" : 1.3,
"_source" : {
"account_number" : 97,
"balance" : 49671,
"firstname" : "Karen",
"lastname" : "Trujillo",
"age" : 40,
"gender" : "F",
"address" : "512 Cumberland Walk",
"employer" : "Tsunamia",
"email" : "[email protected]",
"city" : "Fredericktown",
"state" : "MO"
}
}
]
}
}
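Top-level parameters of the constant_score query
No. | Parameter | Description |
---|---|---|
1 | filter | Required, query object. The filter query to run; every returned document must match it. Filter queries do not compute relevance scores, and to improve performance Elasticsearch caches the results of frequently used filter queries |
2 | boost | Optional, float. Used as the constant relevance score of every document that matches the filter; defaults to 1.0 |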
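2. dis_max query
Disjunction max.
Returns documents that match at least one of the wrapped query clauses.
When a document matches several clauses, dis_max uses the highest score among the matching clauses as the base score and, if tie_breaker is greater than 0, adds a fraction of the scores of the other matching clauses. This also breaks ties between documents that would otherwise get the same score from their best clause.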
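// Find accounts whose city is Brogan or whose gender is M; a document that matches either clause is returned
// tie_breaker is the factor applied to the scores of the matching clauses other than the best one
// Each clause can carry its own boost; here city and gender use different boosts, so they contribute differently to the score
// Request
GET bank/_search
{
  "size": 5,
  "query": {
    "dis_max": {
      "tie_breaker": 0.7,
      "queries": [
        {
          "term": {
            "city.keyword": {
              "boost": 2,
              "value": "Brogan"
            }
          }
        },
        {
          "term": {
            "gender.keyword": {
              "boost": 1.0,
              "value": "M"
            }
          }
        }
      ]
    }
  }
}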
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 507,
"relation" : "eq"
},
"max_score" : 13.48206,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : 13.48206,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "[email protected]",
"city" : "Brogan",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.679258,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "[email protected]",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "18",
"_score" : 0.679258,
"_source" : {
"account_number" : 18,
"balance" : 4180,
"firstname" : "Dale",
"lastname" : "Adams",
"age" : 33,
"gender" : "M",
"address" : "467 Hutchinson Court",
"employer" : "Boink",
"email" : "[email protected]",
"city" : "Orick",
"state" : "MD"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "20",
"_score" : 0.679258,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "[email protected]",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "37",
"_score" : 0.679258,
"_source" : {
"account_number" : 37,
"balance" : 18612,
"firstname" : "Mcgee",
"lastname" : "Mooney",
"age" : 39,
"gender" : "M",
"address" : "826 Fillmore Place",
"employer" : "Reversus",
"email" : "[email protected]",
"city" : "Tooleville",
"state" : "OK"
}
}
]
}
}
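Top-level parameters of the dis_max query
No. | Parameter | Description |
---|---|---|
1 | queries | Required, array of query objects. Contains one or more query clauses; returned documents must match at least one of them. If a document matches several clauses, the highest relevance score among them is used |
2 | tie_breaker | Optional, float between 0 and 1.0. Used to raise the score of documents that match more than one clause; defaults to 0.0 |
If a document matches several clauses, dis_max computes its score as follows:
1) Take the score of the highest-scoring matching clause;
2) Multiply the score of every other matching clause by tie_breaker;
3) Add the products to the highest score.
If tie_breaker is greater than 0.0, every matching clause contributes to the score, but the highest-scoring clause contributes the most.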
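The three scoring steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Elasticsearch code; the function name `dis_max_score` and the clause scores used in the example are hypothetical.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine the scores of the clauses a document matched (illustrative only)."""
    if not clause_scores:
        raise ValueError("a returned document matches at least one clause")
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1.0")
    best = max(clause_scores)            # step 1: highest matching clause
    others = sum(clause_scores) - best   # scores of all other matching clauses
    return best + tie_breaker * others   # steps 2 and 3


# Hypothetical scores for a document that matches two clauses.
print(dis_max_score([2.0, 1.0], tie_breaker=0.7))
# With the default tie_breaker of 0.0 only the best clause counts.
print(dis_max_score([2.0, 1.0]))
```

A document that matches a single clause keeps that clause's score unchanged, whatever the tie_breaker.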
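3. function_score query
function_score modifies the scores of the documents returned by a query. It is useful, for example, when a scoring function is expensive to compute and it is enough to compute it on a filtered set of documents.
To use function_score, define a query and one or more functions that compute a new score for each document the query returns.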
// Assign a random score to every matching document; the scores differ on almost every run
// function_score provides several scoring functions; random_score is used here
// Request
GET bank/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "boost": 5,
      "random_score": {},
      "boost_mode": "multiply"
    }
  }
}
// Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : 4.9830976,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "958",
"_score" : 4.9830976,
"_source" : {
"account_number" : 958,
"balance" : 32849,
"firstname" : "Brown",
"lastname" : "Wilkins",
"age" : 40,
"gender" : "M",
"address" : "686 Delmonico Place",
"employer" : "Medesign",
"email" : "[email protected]",
"city" : "Shelby",
"state" : "WY"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "642",
"_score" : 4.9650164,
"_source" : {
"account_number" : 642,
"balance" : 32852,
"firstname" : "Reyna",
"lastname" : "Harris",
"age" : 35,
"gender" : "M",
"address" : "305 Powell Street",
"employer" : "Bedlam",
"email" : "[email protected]",
"city" : "Florence",
"state" : "KS"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "695",
"_score" : 4.9611883,
"_source" : {
"account_number" : 695,
"balance" : 36800,
"firstname" : "Gonzales",
"lastname" : "Mcfarland",
"age" : 26,
"gender" : "F",
"address" : "647 Louisa Street",
"employer" : "Songbird",
"email" : "[email protected]",
"city" : "Crisman",
"state" : "ID"
}
}
]
}
}
Several functions can also be combined. In that case each function can optionally be given a filter, so that it is applied only to documents that match that filter.
// Request
GET bank/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "filter": {
            "match": {
              "city": "Shelby"
            }
          },
          "random_score": {},
          "weight": 23
        },
        {
          "filter": {
            "match": {
              "city": "Brogan"
            }
          },
          "weight": 42
        }
      ],
      "max_boost": 42,
      "min_score": 42,
      "score_mode": "max",
      "boost_mode": "multiply"
    }
  }
}
// Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 42.0,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : 42.0,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "[email protected]",
"city" : "Brogan",
"state" : "IL"
}
}
]
}
}
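The scores produced by each function's filter query do not matter. A function without a filter behaves as if its filter were "match_all": {}.
Each document is first scored by the functions that apply to it; the score_mode parameter specifies how those function scores are combined:
No. | Value | Description |
---|---|---|
1 | multiply | Scores are multiplied (default) |
2 | sum | Scores are summed |
3 | avg | Scores are averaged |
4 | first | The score of the first function whose filter matches is used |
5 | max | The maximum score is used |
6 | min | The minimum score is used |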
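Because function scores can be on different scales, and because the functions are sometimes meant to influence the score to different degrees, the result of each function can be adjusted with a user-defined weight. The weight is set per function in the functions array and is multiplied with the score that function computes. If weight is given without any other scoring function, weight itself acts as the function and simply returns the weight.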
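When score_mode is avg, the individual scores are combined as a weighted average. For example, if two functions return 3 and 4 with weights 5 and 6, the combined score is (3*5+4*6)/(5+6), not (3*5+4*6)/2.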
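To make the effect of weight and score_mode concrete, here is a minimal Python sketch of the combination step. It only mirrors the arithmetic described in the text; the function name `combine_function_scores` is hypothetical, and the sketch assumes at least one function applies to the document.

```python
import math


def combine_function_scores(scores, weights=None, score_mode="multiply"):
    """Combine per-function scores for one document (illustrative only).

    scores  -- raw results of the functions that apply, in declaration order
    weights -- weight of each function (1.0 where none was configured)
    """
    if not scores:
        raise ValueError("this sketch assumes at least one function applies")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight per score is required")

    # Each function result is first multiplied by its weight.
    weighted = [s * w for s, w in zip(scores, weights)]

    if score_mode == "multiply":
        return math.prod(weighted)
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        # Weighted average: divide by the sum of the weights, not the count.
        return sum(weighted) / sum(weights)
    if score_mode == "first":
        return weighted[0]
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


# The example from the text: results 3 and 4 with weights 5 and 6.
print(combine_function_scores([3, 4], [5, 6], score_mode="avg"))  # (3*5+4*6)/(5+6)
```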
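The max_boost parameter caps the score computed by the functions so that it cannot exceed the given limit; the default value of max_boost is FLT_MAX.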
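The boost_mode parameter defines how the function score is combined with the query score:
No. | Value | Description |
---|---|---|
1 | multiply | Query score and function score are multiplied (default) |
2 | replace | Only the function score is used; the query score is ignored |
3 | sum | Query score and function score are added |
4 | avg | Average of query score and function score |
5 | max | Maximum of query score and function score |
6 | min | Minimum of query score and function score |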
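The boost_mode step can be sketched the same way. The helper below is a hypothetical illustration, not Elasticsearch code, and it assumes that max_boost is applied to the function score before that score is combined with the query score.

```python
def apply_boost_mode(query_score, function_score,
                     boost_mode="multiply", max_boost=float("inf")):
    """Combine query score and function score (illustrative only)."""
    # Assumption: the cap applies to the function score, before combining.
    capped = min(function_score, max_boost)
    if boost_mode == "multiply":
        return query_score * capped
    if boost_mode == "replace":
        return capped
    if boost_mode == "sum":
        return query_score + capped
    if boost_mode == "avg":
        return (query_score + capped) / 2
    if boost_mode == "max":
        return max(query_score, capped)
    if boost_mode == "min":
        return min(query_score, capped)
    raise ValueError(f"unknown boost_mode: {boost_mode}")


# The combined-functions request earlier in this section: match_all gives a
# query score of 1.0 and the Brogan function contributes its weight of 42.
print(apply_boost_mode(1.0, 42.0, boost_mode="multiply", max_boost=42.0))
```

With these inputs the sketch yields 42.0, which agrees with the _score of the single hit in that response.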
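By default, modifying the score does not change which documents match. To exclude documents that do not reach a score threshold, set the min_score parameter.
When min_score is set, every document returned by the query is scored and checked one by one, and documents scoring below min_score are dropped.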
The function_score query provides the following score functions:
No. | Function |
---|---|
1 | script_score |
2 | weight |
3 | random_score |
4 | field_value_factor |
5 | Decay functions (gauss, linear, exp) |
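3.1 script_score function
The script_score function wraps another query and optionally customizes its scoring with a script expression that derives a value from other numeric fields of the document.
Elasticsearch represents document scores as positive 32-bit floating point numbers. If the script produces a value with greater precision, it is converted to the nearest 32-bit float. The score must not be negative, otherwise Elasticsearch returns an error.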
// Request
GET bank/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "city": "Dante"
        }
      },
      "script_score": {
        "script": "Math.log(2 + doc['age'].value)"
      }
    }
  }
}
// Response
{
"took" : 386,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 23.665949,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "6",
"_score" : 23.665949,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "[email protected]",
"city" : "Dante",
"state" : "TN"
}
}
]
}
}
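The script can take parameters for the score computation. Scripts are compiled and then cached for faster execution, so when a script depends on values that vary, it is preferable to keep the same script source and pass those values in params.
// Pass parameters a and b to the score computation
GET bank/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "city": "Orick"
        }
      },
      "script_score": {
        "script": {
          "params": {
            "a": 4,
            "b": 1.5
          },
          "source": "params.a + Math.pow(params.b,doc['age'].value)"
        }
      }
    }
  }
}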
// Response
{
"took" : 73,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 4210414.5,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "18",
"_score" : 4210414.5,
"_source" : {
"account_number" : 18,
"balance" : 4180,
"firstname" : "Dale",
"lastname" : "Adams",
"age" : 33,
"gender" : "M",
"address" : "467 Hutchinson Court",
"employer" : "Boink",
"email" : "[email protected]",
"city" : "Orick",
"state" : "MD"
}
}
]
}
}
With script_score, boost_mode still defaults to multiply, so the query score is multiplied by the script result. To use only the script result, set boost_mode to replace.
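The two scripts used in this subsection can be reproduced in Python to see what the function part of the score looks like before it meets the query score. The helper names and the query score of 2.0 are hypothetical; the real query score depends on the index and is not computed here.

```python
import math


def script_log(age):
    """Mirrors the script "Math.log(2 + doc['age'].value)"."""
    return math.log(2 + age)


def script_pow(age, a=4, b=1.5):
    """Mirrors "params.a + Math.pow(params.b, doc['age'].value)"."""
    return a + math.pow(b, age)


def final_score(query_score, script_result, boost_mode="multiply"):
    """Only the two modes discussed for script_score are sketched."""
    if boost_mode == "multiply":
        return query_score * script_result
    if boost_mode == "replace":
        return script_result
    raise ValueError(f"mode not covered by this sketch: {boost_mode}")


# age 36 is the Dante document, age 33 the Orick document.
print(script_log(36))
print(script_pow(33))

# Hypothetical query score of 2.0, combined both ways.
print(final_score(2.0, script_log(36)))             # query score * script result
print(final_score(2.0, script_log(36), "replace"))  # script result only
```

The large value returned by the second script explains why the Orick response shows a score in the millions: the exponent grows quickly with age, and the default multiply mode scales it further by the query score.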
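3.2 weight function
The weight function multiplies the score by the configured weight. Because a function's filter applies to only some of the documents, its weight affects only the documents that match that filter; a function with a weight and no filter applies to every document, including those that match no other filter, and can therefore serve as a default weight.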
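3.3 random_score function
The random_score function generates scores that are uniformly distributed from 0 up to, but not including, 1. By default it uses the internal Lucene document ids as the source of randomness. This is efficient, but the scores are not reproducible, because documents can be renumbered by segment merges.
To make the scores reproducible, set the seed and field parameters. The final score is then computed from the seed, the minimum value of field for the document, and a salt derived from the index name and shard id. Note that documents in the same shard with the same value for field get the same score, so it is usually best to use a field whose value is unique for every document. A good default choice is the _seq_no field; its only drawback is that _seq_no changes whenever the document is updated, so the computed score changes as well.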
// Request
GET bank/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "_id": "288"
        }
      },
      "random_score": {
        "seed": 10,
        "field": "_seq_no"
      }
    }
  }
}
// Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.8462738,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "288",
"_score" : 0.8462738,
"_source" : {
"account_number" : 288,
"balance" : 27243,
"firstname" : "Wong",
"lastname" : "Stone",
"age" : 39,
"gender" : "F",
"address" : "440 Willoughby Street",
"employer" : "Zentix",
"email" : "[email protected]",
"city" : "Wheatfields",
"state" : "DC"
}
}
]
}
}
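// Update the document so that its _seq_no changes (PUT replaces the whole document, so afterwards _source contains only age)
// Request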
PUT bank/_doc/288
{
  "age": 38
}
// Request
GET bank/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "_id": "288"
        }
      },
      "random_score": {
        "seed": 10,
        "field": "_seq_no"
      }
    }
  }
}
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.45209903,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "288",
"_score" : 0.45209903,
"_source" : {
"age" : 38
}
}
]
}
}
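The score of document 288 changed from 0.8462738 to 0.45209903 although the seed stayed at 10, because the update changed its _seq_no. The Python sketch below illustrates why a score derived from (seed, field value, salt) behaves this way. It is a conceptual stand-in only: SHA-256 is used here for convenience and is not the hash Elasticsearch uses, and the function name, the salt format and the _seq_no values are hypothetical.

```python
import hashlib


def seeded_random_score(seed, field_value, salt):
    """Deterministic pseudo-random score in [0, 1) (conceptual stand-in only)."""
    data = f"{seed}:{field_value}:{salt}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # Keep 53 bits so the division below always yields a float below 1.0.
    bits = int.from_bytes(digest[:8], "big") >> 11
    return bits / 2**53


salt = "bank:0"  # hypothetical salt built from index name and shard id

before = seeded_random_score(10, 288, salt)    # hypothetical _seq_no before the update
again = seeded_random_score(10, 288, salt)     # same inputs, same score
after = seeded_random_score(10, 1000, salt)    # hypothetical _seq_no after the update

print(before == again)   # reproducible while seed, value and salt are unchanged
print(before != after)   # a new _seq_no gives a different score
```

If stable scores across updates are required, the same reasoning suggests using a field whose value is unique per document and does not change on update.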
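3.4 field_value_factor function
The field_value_factor function uses a field of the document to influence the final score. It is similar to script_score but simpler, since no script has to be written. If it is used on a multi-valued field, only the first value of the field is used in the calculation.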
// The configuration below yields the function score sqrt(1.2 * doc['balance'].value)
// Request
GET bank/_search
{
  "size": 2,
  "query": {
    "function_score": {
      "query": {
        "match": {
          "age": "40"
        }
      },
      "field_value_factor": {
        "field": "balance",
        "factor": 1.2,
        "modifier": "sqrt",
        "missing": 1
      }
    }
  }
}
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 45,
"relation" : "eq"
},
"max_score" : 244.14177,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "97",
"_score" : 244.14177,
"_source" : {
"account_number" : 97,
"balance" : 49671,
"firstname" : "Karen",
"lastname" : "Trujillo",
"age" : 40,
"gender" : "F",
"address" : "512 Cumberland Walk",
"employer" : "Tsunamia",
"email" : "[email protected]",
"city" : "Fredericktown",
"state" : "MO"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "878",
"_score" : 242.88022,
"_source" : {
"account_number" : 878,
"balance" : 49159,
"firstname" : "Battle",
"lastname" : "Blackburn",
"age" : 40,
"gender" : "F",
"address" : "234 Hendrix Street",
"employer" : "Zilphur",
"email" : "[email protected]",
"city" : "Wanamie",
"state" : "PA"
}
}
]
}
}
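Parameters of the field_value_factor function
No. | Parameter | Description |
---|---|---|
1 | field | The document field whose value is used |
2 | factor | Optional factor the field value is multiplied by; defaults to 1 |
3 | modifier | Function applied to the field value multiplied by factor, one of none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt or reciprocal; defaults to none |
4 | missing | Value used when the document has no value for the field; factor and modifier are still applied to it |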
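Available modifier values (here "value" means the field value multiplied by factor)
No. | Modifier | Description |
---|---|---|
1 | none | No function is applied to the value |
2 | log | Common logarithm (base 10) of the value. Because the result is negative for values between 0 and 1, which causes an error, log1p is generally recommended instead |
3 | log1p | Add 1 to the value and take the common logarithm |
4 | log2p | Add 2 to the value and take the common logarithm |
5 | ln | Natural logarithm (base e) of the value. Because the result is negative for values between 0 and 1, which causes an error, ln1p is generally recommended instead |
6 | ln1p | Add 1 to the value and take the natural logarithm |
7 | ln2p | Add 2 to the value and take the natural logarithm |
8 | square | Square the value |
9 | sqrt | Take the square root of the value |
10 | reciprocal | Take the reciprocal of the value |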
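Tip: the score produced by field_value_factor must not be negative, otherwise an exception is thrown. When using log or ln, make sure the value is not between 0 and 1 (for example by restricting the field with a range filter), or use log1p or ln1p instead.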
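The modifiers and the non-negative rule can be checked with a small Python sketch. It reproduces the formula modifier(factor * value) described above; the function name `field_value_factor` and the `MODIFIERS` table are hypothetical helpers, not an Elasticsearch API.

```python
import math

# One entry per modifier; each takes the field value already multiplied by factor.
MODIFIERS = {
    "none": lambda v: v,
    "log": lambda v: math.log10(v),
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "ln": lambda v: math.log(v),
    "ln1p": lambda v: math.log(v + 1),
    "ln2p": lambda v: math.log(v + 2),
    "square": lambda v: v * v,
    "sqrt": lambda v: math.sqrt(v),
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Compute modifier(factor * value) for one document (illustrative only)."""
    if value is None:
        if missing is None:
            raise ValueError("field has no value and no 'missing' was given")
        value = missing  # factor and modifier still apply to the fallback
    result = MODIFIERS[modifier](value * factor)
    if result < 0:
        raise ValueError("field_value_factor must not produce a negative score")
    return result


# The two hits from the response above: balances 49671 and 49159.
print(field_value_factor(49671, factor=1.2, modifier="sqrt"))
print(field_value_factor(49159, factor=1.2, modifier="sqrt"))
```

Both results agree with the _score values in the response (about 244.14 and 242.88), which is consistent with the query itself contributing a score of 1.0 under the default multiply mode. Calling the sketch with modifier "log" and a value of 0.5 raises the error described in the tip, while "log1p" does not.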