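1. bool query
The bool query is the default query for combining multiple leaf or compound query clauses as must, should, must_not, or filter clauses. The scores of the must and should clauses are added together to produce the final score, while must_not and filter clauses are executed in filter context and do not contribute to it.
Under the hood, the bool query maps to Lucene's BooleanQuery class. It is built from one or more boolean clauses, each with a typed occurrence.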
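1.1. Occurrence types in bool query clauses
No. | Type | Description |
---|---|---|
1 | must | The clause must appear in matching documents and contributes to the relevance score |
2 | filter | The clause must appear in matching documents but is executed in filter context; unlike must, scoring is ignored and the clause is considered for caching |
3 | should | The clause should appear in matching documents |
4 | must_not | The clause must not appear in matching documents. It is executed in filter context, so scoring is ignored (a score of 0 is returned) and the clause is considered for caching |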
The more must and should clauses a document matches, the higher its score: the final _score is the sum of the scores of every matching must and should clause.
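The way the four occurrence types combine can be sketched with a small toy model in Python. This is not how Elasticsearch or Lucene is implemented: `bool_score`, `term`, and the clause representation are hypothetical names introduced for illustration, and each clause is modelled as a function that returns a score for a matching document or `None` otherwise.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical clause representation: returns a score when the document
# matches and None when it does not.
Clause = Callable[[Dict], Optional[float]]


def term(field: str, value, score: float = 1.0) -> Clause:
    """Toy term clause: exact match on one field, with a fixed score."""
    return lambda doc: score if doc.get(field) == value else None


def bool_score(
    doc: Dict,
    must: Sequence[Clause] = (),
    filters: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    must_not: Sequence[Clause] = (),
    minimum_should_match: int = 0,
) -> Optional[float]:
    """Return the toy score of doc, or None if doc does not match."""
    # must_not: a single matching clause excludes the document.
    if any(clause(doc) is not None for clause in must_not):
        return None
    # filter: every clause has to match, but its score is discarded.
    if any(clause(doc) is None for clause in filters):
        return None
    # must: every clause has to match, and its score is kept.
    must_scores = [clause(doc) for clause in must]
    if any(score is None for score in must_scores):
        return None
    # should: optional, unless minimum_should_match demands some matches.
    should_scores = [s for s in (c(doc) for c in should) if s is not None]
    if len(should_scores) < minimum_should_match:
        return None
    # Final score: sum of the matching must and should scores.
    return sum(must_scores) + sum(should_scores)


doc = {"gender": "M", "state": "MO", "age": 40}
print(bool_score(
    doc,
    must=[term("gender", "M", 0.5)],
    filters=[term("state", "MO")],
    should=[term("age", 40, 2.0), term("age", 41, 3.0)],
    must_not=[term("state", "WA")],
))  # 2.5
```

The fixed per-clause scores are placeholders; in Elasticsearch the actual values come from the similarity model, so only the combination logic carries over.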
// Request
POST bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "gender.keyword": "M"
          }
        }
      ],
      "filter": {
        "term": {
          "state.keyword": "MO"
        }
      },
      "must_not": [
        {
          "range": {
            "age": {
              "gte": 20,
              "lte": 30
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "email": "comcubine.com"
          }
        },
        {
          "match": {
            "address": "Avenue"
          }
        }
      ],
      "minimum_should_match": 1,
      "boost": 1
    }
  }
}
// Response
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 7.1838775,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "58",
"_score" : 7.1838775,
"_source" : {
"account_number" : 58,
"balance" : 31697,
"firstname" : "Marva",
"lastname" : "Cannon",
"age" : 40,
"gender" : "M",
"address" : "993 Highland Place",
"employer" : "Comcubine",
"email" : "[email protected]",
"city" : "Orviston",
"state" : "MO"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "286",
"_score" : 2.2192826,
"_source" : {
"account_number" : 286,
"balance" : 39063,
"firstname" : "Rosetta",
"lastname" : "Turner",
"age" : 35,
"gender" : "M",
"address" : "169 Jefferson Avenue",
"employer" : "Spacewax",
"email" : "[email protected]",
"city" : "Stewart",
"state" : "MO"
}
}
]
}
}
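The minimum_should_match parameter
Use minimum_should_match to specify the number or percentage of should clauses that a returned document must match. If the bool query contains at least one should clause and no must or filter clause, the default value is 1; otherwise the default is 0.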
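The default rule can be written down as a small Python function. This is only a sketch of the rule as stated here; `default_minimum_should_match` is a hypothetical helper, not part of any Elasticsearch client, and it takes the body of the `bool` object as a plain dict.

```python
def default_minimum_should_match(bool_query: dict) -> int:
    """Default minimum_should_match for the given bool query body."""
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    # 1 only when should clauses are the sole way for a document to match.
    return 1 if has_should and not has_must_or_filter else 0


should_only = {"should": [{"match": {"address": "Avenue"}}]}
with_filter = {
    "should": [{"match": {"address": "Avenue"}}],
    "filter": {"term": {"state.keyword": "MO"}},
}
print(default_minimum_should_match(should_only))  # 1
print(default_minimum_should_match(with_filter))  # 0
```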
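1.2. Scoring with bool.filter
Queries placed under the filter clause have no effect on scoring. If the request contains no scoring query at all, every document is returned with a _score of 0.
The following three examples all return the documents whose state field has the value WA.
1) Every document in this example gets a score of 0, because no scoring query is specified.
// Request
GET bank/_search
{
  "size": 2,
  "query": {
    "bool": {
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      }
    }
  }
}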
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 19,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "20",
"_score" : 0.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "[email protected]",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "284",
"_score" : 0.0,
"_source" : {
"account_number" : 284,
"balance" : 22806,
"firstname" : "Randolph",
"lastname" : "Banks",
"age" : 29,
"gender" : "M",
"address" : "875 Hamilton Avenue",
"employer" : "Caxt",
"email" : "[email protected]",
"city" : "Crawfordsville",
"state" : "WA"
}
}
]
}
}
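2) Every document in this example gets a score of 1.0, because the match_all query in the must clause assigns 1.0 to all documents.
// Request
GET bank/_search
{
  "size": 2,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      }
    }
  }
}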
// Response: every score is 1.0
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 19,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "20",
"_score" : 1.0,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "[email protected]",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "284",
"_score" : 1.0,
"_source" : {
"account_number" : 284,
"balance" : 22806,
"firstname" : "Randolph",
"lastname" : "Banks",
"age" : 29,
"gender" : "M",
"address" : "875 Hamilton Avenue",
"employer" : "Caxt",
"email" : "[email protected]",
"city" : "Crawfordsville",
"state" : "WA"
}
}
]
}
}
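3) This constant_score query behaves like example 2: every document matched by the filter gets the same constant score. That score equals boost, which defaults to 1.0 and is set to 1.2 here.
// Request, with boost set to 1.2
GET bank/_search
{
  "size": 2,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      },
      "boost": 1.2
    }
  }
}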
// Response: every score is 1.2
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 19,
"relation" : "eq"
},
"max_score" : 1.2,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "20",
"_score" : 1.2,
"_source" : {
"account_number" : 20,
"balance" : 16418,
"firstname" : "Elinor",
"lastname" : "Ratliff",
"age" : 36,
"gender" : "M",
"address" : "282 Kings Place",
"employer" : "Scentric",
"email" : "[email protected]",
"city" : "Ribera",
"state" : "WA"
}
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "284",
"_score" : 1.2,
"_source" : {
"account_number" : 284,
"balance" : 22806,
"firstname" : "Randolph",
"lastname" : "Banks",
"age" : 29,
"gender" : "M",
"address" : "875 Hamilton Avenue",
"employer" : "Caxt",
"email" : "[email protected]",
"city" : "Crawfordsville",
"state" : "WA"
}
}
]
}
}
1.3. Named queries
Naming queries shows which query clauses a returned document actually matched.
Every filter and query accepts a _name parameter in its top-level definition.
// Request: each query clause is given a name
GET bank/_search
{
  "size": 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "email": {
              "query": "comcubine.com",
              "_name": "q_n1"
            }
          }
        },
        {
          "match": {
            "address": {
              "query": "Avenue",
              "_name": "q_n2"
            }
          }
        }
      ],
      "filter": {
        "terms": {
          "age": [
            40,
            38
          ],
          "_name": "q_a"
        }
      }
    }
  }
}
// Response: each hit lists the names of the clauses it matched
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 85,
"relation" : "eq"
},
"max_score" : 6.5046196,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "58",
"_score" : 6.5046196,
"_source" : {
"account_number" : 58,
"balance" : 31697,
"firstname" : "Marva",
"lastname" : "Cannon",
"age" : 40,
"gender" : "M",
"address" : "993 Highland Place",
"employer" : "Comcubine",
"email" : "[email protected]",
"city" : "Orviston",
"state" : "MO"
},
"matched_queries" : [
"q_a",
"q_n1"
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "664",
"_score" : 1.5400248,
"_source" : {
"account_number" : 664,
"balance" : 16163,
"firstname" : "Hart",
"lastname" : "Mccormick",
"age" : 40,
"gender" : "M",
"address" : "144 Guider Avenue",
"employer" : "Dyno",
"email" : "[email protected]",
"city" : "Carbonville",
"state" : "ID"
},
"matched_queries" : [
"q_a",
"q_n2"
]
},
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "791",
"_score" : 1.5400248,
"_source" : {
"account_number" : 791,
"balance" : 48249,
"firstname" : "Janine",
"lastname" : "Huber",
"age" : 38,
"gender" : "F",
"address" : "348 Porter Avenue",
"employer" : "Viocular",
"email" : "[email protected]",
"city" : "Fivepointville",
"state" : "MA"
},
"matched_queries" : [
"q_a",
"q_n2"
]
}
]
}
}
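Each hit in the response carries a matched_queries array listing the named clauses it matched. Tagging queries and filters this way only makes sense within a bool query.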
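On the client side, the matched_queries arrays can be inverted to see which documents each named clause matched. The following Python sketch assumes the response has already been parsed into a dict; `hits_by_matched_query` is a hypothetical helper, and the sample data is a trimmed copy of the response shown above, keeping only `_id` and `matched_queries`.

```python
from collections import defaultdict


def hits_by_matched_query(response: dict) -> dict:
    """Map each query name to the _id values of the hits that matched it."""
    grouped = defaultdict(list)
    for hit in response["hits"]["hits"]:
        # Hits without any named clause match have no matched_queries key.
        for name in hit.get("matched_queries", []):
            grouped[name].append(hit["_id"])
    return dict(grouped)


# Trimmed copy of the response above (only the fields used here).
response = {
    "hits": {
        "hits": [
            {"_id": "58", "matched_queries": ["q_a", "q_n1"]},
            {"_id": "664", "matched_queries": ["q_a", "q_n2"]},
            {"_id": "791", "matched_queries": ["q_a", "q_n2"]},
        ]
    }
}
print(hits_by_matched_query(response))
# {'q_a': ['58', '664', '791'], 'q_n1': ['58'], 'q_n2': ['664', '791']}
```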
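2. boosting query
Returns documents that match the positive query while lowering the relevance score of those that also match the negative query.
This lets you demote certain documents without excluding them: they still appear in the search results, just with a lower score than they would otherwise get.
GET bank/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "state.keyword": {
            "value": "DC"
          }
        }
      },
      "negative": {
        "term": {
          "age": {
            "value": 23
          }
        }
      },
      "negative_boost": 0.2
    }
  }
}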
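2.1. Top-level parameters of the boosting query
No. | Parameter | Description |
---|---|---|
1 | positive | Required, query object. The query you want to run; every returned document must match it |
2 | negative | Required, query object. Documents that match this query have their relevance score decreased |
3 | negative_boost | Required, float. A number between 0 and 1.0 by which the relevance score of documents matching the negative query is multiplied |
If a returned document matches both the positive and the negative query, the boosting query computes its final relevance score as follows:
1) Take the original relevance score from the positive query.
2) Multiply that score by negative_boost to get the final score.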
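The two steps amount to one multiplication, sketched below in Python. `boosting_score` is a hypothetical helper for illustration, and the positive score of 2.0 is a made-up input rather than a value returned by the bank index.

```python
def boosting_score(positive_score: float, matches_negative: bool,
                   negative_boost: float) -> float:
    """Final score of a document that matches the positive query."""
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    if matches_negative:
        # Step 2: scale the original positive score down.
        return positive_score * negative_boost
    # Documents that miss the negative query keep their positive score.
    return positive_score


# With negative_boost 0.2, as in the request above:
demoted = boosting_score(2.0, matches_negative=True, negative_boost=0.2)
kept = boosting_score(2.0, matches_negative=False, negative_boost=0.2)
print(demoted < kept)  # True
```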