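1. constant_score query
Wraps a filter query, so no relevance score is computed. Every matching document receives a relevance score equal to the value of the boost parameter.
// Request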
GET bank/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"city.keyword": "Fredericktown"
}
},
"boost": 1.3
}
}
}
// Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.3,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "97",
"_score" : 1.3,
"_source" : {
"account_number" : 97,
"balance" : 49671,
"firstname" : "Karen",
"lastname" : "Trujillo",
"age" : 40,
"gender" : "F",
"address" : "512 Cumberland Walk",
"employer" : "Tsunamia",
"email" : "[email protected]",
"city" : "Fredericktown",
"state" : "MO"
}
}
]
}
}
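Top-level parameters for constant_score
No. | Parameter | Description |
---|---|---|
1 | filter | Required, query object. The filter query to run; every returned document must match it. Filter queries do not compute relevance scores, and to improve performance ES caches the results of frequently used filter queries |
2 | boost | Optional, float. Used as the constant relevance score of every document that matches the filter. Defaults to 1.0 |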
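2. dis_max query
Disjunction max.
Returns documents that match at least one of the wrapped query clauses.
If a document matches several clauses, dis_max scores it by the highest score among the matching clauses, plus a tie-breaking increment for each additional matching clause. This breaks ties between documents that would otherwise receive the same score.
// Find accounts whose city is Brogan or whose gender is M; a document matching either clause is returned
// tie_breaker is the factor applied to the scores of the matching clauses other than the best one
// Each clause can carry its own boost; here city and gender use different boosts, so their scores differ as well
// Request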
GET bank/_search
{
"size": 5,
"query": {
"dis_max": {
"tie_breaker": 0.7,
"queries": [
{
"term": {
"city.keyword": {
"boost": 2,
"value": "Brogan"
}
}
},
{
"term": {
"gender.keyword": {
"boost": 1.0,
"value": "M"
}
}
}
]
}
}
}
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 507,
"relation" : "eq"
},
"max_score" : 13.48206,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : 13.48206,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "[email protected]",
"city" : "Brogan",
"state" : "IL"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.679258,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "[email protected]",
"city" : "Dante",
"state" : "TN"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "18",
"_score" : 0.679258,
"_source" : {
"account_number" : 18,
"balance" : 4180,
"firstname" : "Dale",
"lastname" : "Adams",
"age" : 33,
"gender" : "M",
"address" : "467 Hutchinson Court",
"employer" : "Boink",
"email" : "[email protected]",
"city" : "Orick",
"state" : "MD"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "20",
"_score" : 0.679258,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "[email protected]",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "37",
"_score" : 0.679258,
"_source" : {
"account_number" : 37,
"balance" : 18612,
"firstname" : "Mcgee",
"lastname" : "Mooney",
"age" : 39,
"gender" : "M",
"address" : "826 Fillmore Place",
"employer" : "Reversus",
"email" : "[email protected]",
"city" : "Tooleville",
"state" : "OK"
}
}
]
}
}
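Top-level parameters for dis_max
No. | Parameter | Description |
---|---|---|
1 | queries | Required, array of query objects. Contains one or more query clauses; returned documents must match at least one of them. If a document matches several clauses, the highest clause score is used |
2 | tie_breaker | Optional, float between 0 and 1.0. Used to increase the score of documents that match multiple clauses. Defaults to 0.0 |
If a document matches multiple query clauses, dis_max computes its score as follows:
1) Take the score of the matching clause with the highest score;
2) Multiply the score of every other matching clause by tie_breaker;
3) Add the highest score to the multiplied scores.
If tie_breaker is greater than 0.0, every matching clause contributes to the score, but the highest-scoring clause contributes the most.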
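The three steps can be sketched as a small Python function. This is only an illustration of the arithmetic described above, not ES code; the names dis_max_score and clause_scores are hypothetical, and the input scores are made-up values.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine the scores of the matching clauses of one document.

    clause_scores: scores of the clauses that matched (hypothetical input).
    tie_breaker:   factor between 0 and 1.0 applied to the non-best clauses.
    """
    if not clause_scores:
        raise ValueError("a returned document matches at least one clause")
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1.0")
    best = max(clause_scores)            # step 1: highest clause score
    others = sum(clause_scores) - best   # scores of all other matching clauses
    return best + tie_breaker * others   # steps 2 and 3


# A document matching two clauses, with and without a tie breaker.
print(dis_max_score([2.0, 1.0], tie_breaker=0.7))
print(dis_max_score([2.0, 1.0]))
```

With tie_breaker left at 0.0 the second clause adds nothing, which is why matching more clauses only raises the score when a tie breaker is set.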
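3. function_score query
function_score modifies the scores of the documents a query returns. It is useful, for example, when a score function is expensive to compute and it is enough to compute it on a filtered set of documents.
To use function_score, define a query and one or more functions that compute a new score for each document the query returns.
// Assign random scores to the matched documents; the result generally differs on every run
// function_score ships with several score functions; random_score is used here
// Request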
GET bank/_search
{
"query": {
"function_score": {
"query": {"match_all": {}},
"boost": 5,
"random_score": {},
"boost_mode": "multiply"
}
}
}
// Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : 4.9830976,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "958",
"_score" : 4.9830976,
"_source" : {
"account_number" : 958,
"balance" : 32849,
"firstname" : "Brown",
"lastname" : "Wilkins",
"age" : 40,
"gender" : "M",
"address" : "686 Delmonico Place",
"employer" : "Medesign",
"email" : "[email protected]",
"city" : "Shelby",
"state" : "WY"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "642",
"_score" : 4.9650164,
"_source" : {
"account_number" : 642,
"balance" : 32852,
"firstname" : "Reyna",
"lastname" : "Harris",
"age" : 35,
"gender" : "M",
"address" : "305 Powell Street",
"employer" : "Bedlam",
"email" : "[email protected]",
"city" : "Florence",
"state" : "KS"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "695",
"_score" : 4.9611883,
"_source" : {
"account_number" : 695,
"balance" : 36800,
"firstname" : "Gonzales",
"lastname" : "Mcfarland",
"age" : 26,
"gender" : "F",
"address" : "647 Louisa Street",
"employer" : "Songbird",
"email" : "[email protected]",
"city" : "Crisman",
"state" : "ID"
}
}
]
}
}
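Several functions can also be combined. In that case, each function can optionally be given a filter so that it is applied only to documents matching that filter.
// Request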
GET bank/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"match": {
"city": "Shelby"
}
},
"random_score": {},
"weight": 23
},
{
"filter": {
"match": {
"city": "Brogan"
}
},
"weight": 42
}
],
"max_boost": 42,
"min_score": 42,
"score_mode": "max",
"boost_mode": "multiply"
}
}
}
// Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 42.0,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "1",
"_score" : 42.0,
"_source" : {
"account_number" : 1,
"balance" : 39225,
"firstname" : "Amber",
"lastname" : "Duke",
"age" : 32,
"gender" : "M",
"address" : "880 Holmes Lane",
"employer" : "Pyrami",
"email" : "[email protected]",
"city" : "Brogan",
"state" : "IL"
}
}
]
}
}
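The scores produced by each function's filter query do not matter. A function declared without a filter behaves as if it specified "match_all": {}.
First, every document is scored by the defined functions. The score_mode parameter specifies how the computed scores are combined:
No. | Value | Description |
---|---|---|
1 | multiply | Scores are multiplied (default) |
2 | sum | Scores are summed |
3 | avg | Scores are averaged |
4 | first | The score of the first function whose filter matches the document is used |
5 | max | The maximum score is used |
6 | min | The minimum score is used |
Because function scores can be on different scales, and because functions may need to influence the score to different degrees, each function's result can be adjusted with a user-defined weight.
The weight is set per function in the functions array and is multiplied by the score that function computes. If weight is given without any other function declaration, weight itself acts as the function and simply returns the weight.
When score_mode is avg, the individual scores are combined by a weighted average. For example, if two functions return scores 3 and 4 with weights 5 and 6, the combined score is (3*5+4*6)/(5+6), not (3*5+4*6)/2.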
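A minimal Python sketch of how the score_mode values could combine weighted function scores, including the weighted average from the example. The function name and the (score, weight) input shape are hypothetical, weights are assumed to be positive, and returning 1.0 when no function applies is an assumption of this sketch.

```python
import math


def combine_function_scores(scored, score_mode="multiply"):
    """Combine per-function results for one document.

    scored: list of (function_score, weight) pairs, in declaration order,
            for the functions whose filter matched (hypothetical input).
    """
    if not scored:
        return 1.0  # assumption: neutral score when no function applies
    weighted = [score * weight for score, weight in scored]
    if score_mode == "multiply":
        return math.prod(weighted)
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        # weighted average: divide by the sum of weights, not by the count
        return sum(weighted) / sum(weight for _, weight in scored)
    if score_mode == "first":
        return weighted[0]
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


# The example from the text: scores 3 and 4 with weights 5 and 6.
print(combine_function_scores([(3, 5), (4, 6)], "avg"))
```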
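The max_boost parameter caps the score computed by the functions; it defaults to FLT_MAX.
The boost_mode parameter defines how the function score is combined with the query score:
No. | Value | Description |
---|---|---|
1 | multiply | Query score and function score are multiplied (default) |
2 | replace | Only the function score is used; the query score is ignored |
3 | sum | Query score and function score are added |
4 | avg | Average of query score and function score |
5 | max | Maximum of query score and function score |
6 | min | Minimum of query score and function score |
By default, modifying the score does not change which documents match. To exclude documents that do not reach a given score, set the min_score parameter.
When min_score is set, every document returned by the query is scored and compared against it, and those scoring below min_score are excluded.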
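How boost_mode, max_boost and min_score interact can be sketched in Python as follows. The helper names are hypothetical, and applying the max_boost cap to the function score before it is combined with the query score is an assumption of this sketch.

```python
def final_score(query_score, function_score, boost_mode="multiply",
                max_boost=float("inf")):
    """Combine the query score with the (capped) function score."""
    # assumption: max_boost caps the function score before combining
    capped = min(function_score, max_boost)
    combiners = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    return combiners[boost_mode](query_score, capped)


def apply_min_score(scored_docs, min_score):
    """Keep only (doc_id, score) pairs whose score reaches min_score."""
    return [(doc_id, score) for doc_id, score in scored_docs
            if score >= min_score]


# match_all gives a query score of 1.0; a function score of 42 survives
# min_score 42, while a lower (illustrative) score is dropped.
docs = [("1", final_score(1.0, 42.0, max_boost=42)),
        ("958", final_score(1.0, 10.0, max_boost=42))]
print(apply_min_score(docs, 42))
```

This mirrors the combined-functions request above, where only the document scoring 42.0 is returned.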
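function_score provides the following score functions:
No. | Function |
---|---|
1 | script_score |
2 | weight |
3 | random_score |
4 | field_value_factor |
5 | Decay functions (gauss, linear, exp) |
3.1 script_score function
The script_score function wraps another query and optionally customizes its scoring with a script expression that computes a value from numeric fields of the document.
In ES, document scores are positive 32-bit floating point numbers. If a script computes a score with greater precision, it is converted to the nearest 32-bit float. Scores must also be non-negative; otherwise ES returns an error.
// Request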
GET bank/_search
{
"query": {
"function_score": {
"query": {
"match": {
"city": "Dante"
}
},
"script_score": {
"script": "Math.log(2 + doc['age'].value)"
}
}
}
}
// Response
{
"took" : 386,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 23.665949,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "6",
"_score" : 23.665949,
"_source" : {
"account_number" : 6,
"balance" : 5686,
"firstname" : "Hattie",
"lastname" : "Bond",
"age" : 36,
"gender" : "M",
"address" : "671 Bristol Street",
"employer" : "Netagy",
"email" : "[email protected]",
"city" : "Dante",
"state" : "TN"
}
}
]
}
}
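The script can take parameters for the score computation. Scripts are compiled and cached for faster execution, so when values vary between requests it is preferable to pass them through params and reuse the same script source.
// Pass parameters a and b for use in the score computation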
GET bank/_search
{
"query": {
"function_score": {
"query": {
"match": {
"city": "Orick"
}
},
"script_score": {
"script": {
"params": {
"a": 4,
"b": 1.5
},
"source": "params.a + Math.pow(params.b,doc['age'].value)"
}
}
}
}
}
// Response
{
"took" : 73,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 4210414.5,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "18",
"_score" : 4210414.5,
"_source" : {
"account_number" : 18,
"balance" : 4180,
"firstname" : "Dale",
"lastname" : "Adams",
"age" : 33,
"gender" : "M",
"address" : "467 Hutchinson Court",
"employer" : "Boink",
"email" : "[email protected]",
"city" : "Orick",
"state" : "MD"
}
}
]
}
}
Note that boost_mode defaults to multiply, so the query score is multiplied by the script result. To use only the script result, set boost_mode to replace.
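To see the multiplication at work, the two script expressions above can be recomputed in Python. The function names are hypothetical; Math.log in the script is the natural logarithm. Dividing each reported _score by the script value gives the query score implied under boost_mode multiply, which is an inference from the responses rather than a value ES reports.

```python
import math


def log_age_script(age):
    """Mirrors the script Math.log(2 + doc['age'].value)."""
    return math.log(2 + age)


def pow_age_script(age, a, b):
    """Mirrors the script params.a + Math.pow(params.b, doc['age'].value)."""
    return a + math.pow(b, age)


first = log_age_script(36)               # document 6, age 36
second = pow_age_script(33, a=4, b=1.5)  # document 18, age 33

# Script value, then the query score implied by the reported _score.
print(first, 23.665949 / first)
print(second, 4210414.5 / second)
```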
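3.2 weight function
The weight function multiplies the score by the configured weight. A function's filter restricts it to the documents that match the filter, so a function declared without a filter applies to every document and can serve as a default weight.
3.3 random_score function
random_score generates scores uniformly distributed from 0 up to, but not including, 1. By default it uses Lucene's internal doc ids as the source of randomness, but the resulting scores are not reproducible, because documents may be renumbered by merges.
To make scores reproducible, provide the seed and field parameters. The final score is then computed from the seed, the minimum value of field for the document, and a salt derived from the index name and shard id. Note that documents in the same shard with the same field value receive the same score, so it is usually desirable to use a field whose value is unique across all documents. A good default choice is the _seq_no field; its only drawback is that _seq_no changes when the document is updated, and the score changes with it.
// Request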
GET bank/_search
{
"query": {
"function_score": {
"query": {
"match": {
"_id": "288"
}
},
"random_score": {
"seed": 10,
"field": "_seq_no"
}
}
}
}
// Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.8462738,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "288",
"_score" : 0.8462738,
"_source" : {
"account_number" : 288,
"balance" : 27243,
"firstname" : "Wong",
"lastname" : "Stone",
"age" : 39,
"gender" : "F",
"address" : "440 Willoughby Street",
"employer" : "Zentix",
"email" : "[email protected]",
"city" : "Wheatfields",
"state" : "DC"
}
}
]
}
}
// Re-index the document so that its _seq_no changes
// Request
PUT bank/_doc/288
{
"age":39
}
// Request
GET bank/_search
{
"query": {
"function_score": {
"query": {
"match": {
"_id": "288"
}
},
"random_score": {
"seed": 10,
"field": "_seq_no"
}
}
}
}
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.45209903,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "288",
"_score" : 0.45209903,
"_source" : {
"age" : 38
}
}
]
}
}
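3.4 field_value_factor function
The field_value_factor function uses a field of the document to influence the score. It is similar to script_score but simpler to write than a script. If the field is multi-valued, only its first value is used in the calculation.
// The configuration below yields the formula: sqrt(1.2 * doc['balance'].value)
// Request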
GET bank/_search
{
"size": 2,
"query": {
"function_score": {
"query": {
"match": {
"age": "40"
}
},
"field_value_factor": {
"field": "balance",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 45,
"relation" : "eq"
},
"max_score" : 244.14177,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "97",
"_score" : 244.14177,
"_source" : {
"account_number" : 97,
"balance" : 49671,
"firstname" : "Karen",
"lastname" : "Trujillo",
"age" : 40,
"gender" : "F",
"address" : "512 Cumberland Walk",
"employer" : "Tsunamia",
"email" : "[email protected]",
"city" : "Fredericktown",
"state" : "MO"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "878",
"_score" : 242.88022,
"_source" : {
"account_number" : 878,
"balance" : 49159,
"firstname" : "Battle",
"lastname" : "Blackburn",
"age" : 40,
"gender" : "F",
"address" : "234 Hendrix Street",
"employer" : "Zilphur",
"email" : "[email protected]",
"city" : "Wanamie",
"state" : "PA"
}
}
]
}
}
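Parameters of the field_value_factor function
No. | Parameter | Description |
---|---|---|
1 | field | The document field to extract the value from |
2 | factor | Factor the field value is multiplied by. Defaults to 1 |
3 | modifier | Function applied to the field value after it is multiplied by factor. One of none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt or reciprocal. Defaults to none |
4 | missing | Value used when the document has no value for field |
Values of modifier
No. | Modifier | Description |
---|---|---|
1 | none | Apply no function to the value |
2 | log | Take the common logarithm (base 10) of the value. Because it returns a negative result for values between 0 and 1, which causes an error, log1p is generally recommended instead |
3 | log1p | Add 1 to the value, then take the common logarithm |
4 | log2p | Add 2 to the value, then take the common logarithm |
5 | ln | Take the natural logarithm (base e) of the value. Because it returns a negative result for values between 0 and 1, which causes an error, ln1p is generally recommended instead |
6 | ln1p | Add 1 to the value, then take the natural logarithm |
7 | ln2p | Add 2 to the value, then take the natural logarithm |
8 | square | Square the value |
9 | sqrt | Take the square root of the value |
10 | reciprocal | Take the reciprocal of the value |
Tip: a score produced by field_value_factor must not be negative, otherwise an exception is thrown. When using log or ln, make sure the value is not between 0 and 1, or use log1p or ln1p instead.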
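The parameters and modifiers can be tied together with a small Python sketch. It only mirrors the tables above and is not ES code; the names MODIFIERS and field_value_factor (as a Python function) are hypothetical. The comparison with the response above assumes the match query on the numeric age field contributes a query score of 1.0.

```python
import math

# One entry per modifier listed in the table above.
MODIFIERS = {
    "none": lambda v: v,
    "log": math.log10,
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "ln": math.log,
    "ln1p": lambda v: math.log(v + 1),
    "ln2p": lambda v: math.log(v + 2),
    "square": lambda v: v * v,
    "sqrt": math.sqrt,
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Compute modifier(factor * value); value=None stands for a missing field."""
    if value is None:
        if missing is None:
            raise ValueError("field has no value and no missing default is set")
        value = missing
    score = MODIFIERS[modifier](factor * value)
    if score < 0:
        raise ValueError("field_value_factor must not produce a negative score")
    return score


# The two balances from the response above, with factor 1.2 and modifier sqrt.
print(field_value_factor(49671, factor=1.2, modifier="sqrt"))
print(field_value_factor(49159, factor=1.2, modifier="sqrt"))
```

Calling field_value_factor(0.5, modifier="log") raises the error in this sketch, which is the case the tip warns about.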