ES Compound Queries: bool/boosting Queries_9_1_1

1. bool query

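The bool query is the default way to combine leaf or compound query clauses, using must, should, must_not or filter clauses. The must and should clauses contribute to scoring: a document's final score is the sum of the scores of the must and should clauses it matches. The must_not and filter clauses are executed in filter context.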

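Under the hood, the bool query maps to Lucene's BooleanQuery class. It is built from one or more boolean clauses, and each clause is declared with a specific occurrence type.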

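1.1. Clause types in a bool query

| No. | Type | Description |
|---|---|---|
| 1 | must | The clause (query) must appear in matching documents and contributes to the relevance score. |
| 2 | filter | The clause must appear in matching documents, but unlike must it is executed in filter context: its score is ignored and the clause is considered for caching. |
| 3 | should | The clause should appear in matching documents. |
| 4 | must_not | The clause must not appear in matching documents. It is executed in filter context, so scoring is ignored (it contributes a score of 0) and the clause is considered for caching. |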

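The more must and should clauses a document matches, the higher its score: the final relevance score _score is the sum of the scores of every matching must and should clause.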
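
The scoring rule above can be sketched in Python. This is a simplified model, not how Elasticsearch or Lucene compute scores: `Clause`, `BoolQuery` and `bool_score` are hypothetical names, and each clause carries a fixed, made-up score in place of a real BM25 score. The example query mirrors the shape of the request below (gender, state, age and address fields of the bank index).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Doc = Dict[str, object]


@dataclass
class Clause:
    """Hypothetical stand-in for a leaf query: a predicate plus a fixed score."""
    matches: Callable[[Doc], bool]
    score: float = 1.0  # assumed constant, in place of a real BM25 score


@dataclass
class BoolQuery:
    must: List[Clause] = field(default_factory=list)
    filter: List[Clause] = field(default_factory=list)
    should: List[Clause] = field(default_factory=list)
    must_not: List[Clause] = field(default_factory=list)
    minimum_should_match: int = 0


def bool_score(q: BoolQuery, doc: Doc) -> Optional[float]:
    """Return the document's score, or None when the document does not match."""
    # must and filter: every clause is required
    if not all(c.matches(doc) for c in q.must + q.filter):
        return None
    # must_not: any matching clause excludes the document
    if any(c.matches(doc) for c in q.must_not):
        return None
    # should: at least minimum_should_match clauses must match
    matched_should = [c for c in q.should if c.matches(doc)]
    if len(matched_should) < q.minimum_should_match:
        return None
    # only must and should add to the score; filter and must_not add nothing
    return sum(c.score for c in q.must) + sum(c.score for c in matched_should)


# Shaped like the request below; the clause scores are made-up numbers.
q = BoolQuery(
    must=[Clause(lambda d: d["gender"] == "M", score=0.5)],
    filter=[Clause(lambda d: d["state"] == "MO")],
    must_not=[Clause(lambda d: 20 <= d["age"] <= 30)],
    should=[Clause(lambda d: "Avenue" in str(d["address"]).split(), score=1.5)],
    minimum_should_match=1,
)
doc = {"gender": "M", "state": "MO", "age": 35, "address": "169 Jefferson Avenue"}
print(bool_score(q, doc))  # 2.0
```

The sample document resembles document 286 in the response: it passes the filter and must_not checks, and its score is just the must score plus the one matching should score.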

//Request
POST bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "gender.keyword": "M"
          }
        }
      ],
      "filter": {
        "term": {
          "state.keyword": "MO"
        }
      },
      "must_not": [
        {
          "range": {
            "age": {
              "gte": 20,
              "lte": 30
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "email": "comcubine.com"
          }
        },
        {
          "match": {
            "address": "Avenue"
          }
        }
      ],
      "minimum_should_match": 1,
      "boost": 1
    }
  }
}

//Response
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 7.1838775,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "58",
        "_score" : 7.1838775,
        "_source" : {
          "account_number" : 58,
          "balance" : 31697,
          "firstname" : "Marva",
          "lastname" : "Cannon",
          "age" : 40,
          "gender" : "M",
          "address" : "993 Highland Place",
          "employer" : "Comcubine",
          "email" : "[email protected]",
          "city" : "Orviston",
          "state" : "MO"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "286",
        "_score" : 2.2192826,
        "_source" : {
          "account_number" : 286,
          "balance" : 39063,
          "firstname" : "Rosetta",
          "lastname" : "Turner",
          "age" : 35,
          "gender" : "M",
          "address" : "169 Jefferson Avenue",
          "employer" : "Spacewax",
          "email" : "[email protected]",
          "city" : "Stewart",
          "state" : "MO"
        }
      }
    ]
  }
}

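The minimum_should_match parameter
You can use the minimum_should_match parameter to specify the number or percentage of should clauses that returned documents must match. If the bool query includes at least one should clause and no must or filter clauses, the default value is 1; otherwise, the default value is 0.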
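
A minimal sketch of how the effective value can be resolved, assuming the integer and percentage forms described in the Elasticsearch minimum_should_match documentation (a negative value means that many clauses may be missing, and percentages are rounded down). The function name is hypothetical, and combination forms such as `3<90%` are not handled.

```python
from typing import Optional, Union


def resolve_minimum_should_match(
    spec: Optional[Union[int, str]], num_should: int, has_must_or_filter: bool
) -> int:
    """Simplified sketch: how many should clauses a document has to match."""
    if spec is None:
        # default: 1 if there are should clauses but no must/filter clause, else 0
        return 1 if num_should > 0 and not has_must_or_filter else 0
    if isinstance(spec, str) and spec.strip().endswith("%"):
        pct = float(spec.strip()[:-1])
        count = int(num_should * abs(pct) / 100)  # rounded down
        negative = pct < 0
    else:
        n = int(spec)
        count, negative = abs(n), n < 0
    # a negative value means "this many clauses may be missing"
    required = num_should - count if negative else count
    return max(required, 0)


print(resolve_minimum_should_match(None, 2, False))   # 1
print(resolve_minimum_should_match("75%", 4, True))   # 3
print(resolve_minimum_should_match("-25%", 4, True))  # 3
```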

1.2. Scoring with bool.filter

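Queries specified under the filter element have no effect on scoring; a bool query that contains only filter clauses returns a _score of 0 for every document. Scores come only from the scoring queries you specify.
All three examples below return documents whose state field is WA.
1) Every document scores 0, because no scoring query is specified.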

//Request
GET bank/_search
{
  "size": 2, 
  "query": {
    "bool": {
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      }
    }
  }
}

//Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "20",
        "_score" : 0.0,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "[email protected]",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "284",
        "_score" : 0.0,
        "_source" : {
          "account_number" : 284,
          "balance" : 22806,
          "firstname" : "Randolph",
          "lastname" : "Banks",
          "age" : 29,
          "gender" : "M",
          "address" : "875 Hamilton Avenue",
          "employer" : "Caxt",
          "email" : "[email protected]",
          "city" : "Crawfordsville",
          "state" : "WA"
        }
      }
    ]
  }
}

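2) Every document scores 1.0, because the must clause is a match_all query, which matches all documents and gives each one a score of 1.0.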

//Request
GET bank/_search
{
  "size": 2, 
  "query": {
    "bool": {
      "must": {
        "match_all":{}
      },
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      }
    }
  }
}

//Response: every score is 1.0
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "20",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "[email protected]",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "284",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 284,
          "balance" : 22806,
          "firstname" : "Randolph",
          "lastname" : "Banks",
          "age" : 29,
          "gender" : "M",
          "address" : "875 Hamilton Avenue",
          "employer" : "Caxt",
          "email" : "[email protected]",
          "city" : "Crawfordsville",
          "state" : "WA"
        }
      }
    ]
  }
}

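3) The constant_score query has the same effect as example 2: it wraps the filter and gives every matching document the same score, equal to its boost (1.0 by default). Because boost is set to 1.2 here, every score is 1.2.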

//Request, with boost set to 1.2
GET bank/_search
{
  "size": 2, 
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "state.keyword": "WA"
        }
      },
      "boost": 1.2
    }
  }
}

//Response: every score is 1.2
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 19,
      "relation" : "eq"
    },
    "max_score" : 1.2,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "20",
        "_score" : 1.2,
        "_source" : {
          "account_number" : 20,
          "balance" : 16418,
          "firstname" : "Elinor",
          "lastname" : "Ratliff",
          "age" : 36,
          "gender" : "M",
          "address" : "282 Kings Place",
          "employer" : "Scentric",
          "email" : "[email protected]",
          "city" : "Ribera",
          "state" : "WA"
        }
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "284",
        "_score" : 1.2,
        "_source" : {
          "account_number" : 284,
          "balance" : 22806,
          "firstname" : "Randolph",
          "lastname" : "Banks",
          "age" : 29,
          "gender" : "M",
          "address" : "875 Hamilton Avenue",
          "employer" : "Caxt",
          "email" : "[email protected]",
          "city" : "Crawfordsville",
          "state" : "WA"
        }
      }
    ]
  }
}
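
The three scoring outcomes above can be summarized in a small Python model. This is only an illustration of the documented behavior, not Elasticsearch code: `filtered_hits` and the `shape` labels are hypothetical, and the filter is modeled as a plain predicate that decides membership without ever touching the score.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]


def filtered_hits(
    docs: List[Doc], keep: Callable[[Doc], bool], shape: str, boost: float = 1.0
) -> List[Tuple[int, float]]:
    """Return (position, score) for each document that passes the filter.

    shape is a hypothetical label for the three requests above:
      "filter_only" -> bool with only a filter clause, score 0.0
      "match_all"   -> bool.must match_all plus the filter, score 1.0
      "constant"    -> constant_score wrapping the filter, score = boost
    """
    scores = {"filter_only": 0.0, "match_all": 1.0, "constant": boost}
    score = scores[shape]  # KeyError for an unknown shape
    # the filter decides which documents match; it never changes the score
    return [(i, score) for i, d in enumerate(docs) if keep(d)]


def is_wa(d: Doc) -> bool:
    return d["state"] == "WA"


docs = [{"state": "WA"}, {"state": "MO"}, {"state": "WA"}]
print(filtered_hits(docs, is_wa, "constant", boost=1.2))  # [(0, 1.2), (2, 1.2)]
```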

1.3. Named queries

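Naming queries lets you see which clauses actually matched each returned document.
Every query or filter clause accepts a _name parameter in its definition.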

//Request, assigning a name (_name) to each clause
GET bank/_search
{
  "size": 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "email": {
              "query": "comcubine.com",
              "_name": "q_n1"
            }
          }
        },
        {
          "match": {
            "address": {
              "query": "Avenue",
              "_name": "q_n2"
            }
          }
        }
      ],
      "filter": {
        "terms": {
          "age": [
            40,
            38
          ],
          "_name": "q_a"
        }
      }
    }
  }
}


//Response, where each hit also lists the queries it matched
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 85,
      "relation" : "eq"
    },
    "max_score" : 6.5046196,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "58",
        "_score" : 6.5046196,
        "_source" : {
          "account_number" : 58,
          "balance" : 31697,
          "firstname" : "Marva",
          "lastname" : "Cannon",
          "age" : 40,
          "gender" : "M",
          "address" : "993 Highland Place",
          "employer" : "Comcubine",
          "email" : "[email protected]",
          "city" : "Orviston",
          "state" : "MO"
        },
        "matched_queries" : [
          "q_a",
          "q_n1"
        ]
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "664",
        "_score" : 1.5400248,
        "_source" : {
          "account_number" : 664,
          "balance" : 16163,
          "firstname" : "Hart",
          "lastname" : "Mccormick",
          "age" : 40,
          "gender" : "M",
          "address" : "144 Guider Avenue",
          "employer" : "Dyno",
          "email" : "[email protected]",
          "city" : "Carbonville",
          "state" : "ID"
        },
        "matched_queries" : [
          "q_a",
          "q_n2"
        ]
      },
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "791",
        "_score" : 1.5400248,
        "_source" : {
          "account_number" : 791,
          "balance" : 48249,
          "firstname" : "Janine",
          "lastname" : "Huber",
          "age" : 38,
          "gender" : "F",
          "address" : "348 Porter Avenue",
          "employer" : "Viocular",
          "email" : "[email protected]",
          "city" : "Fivepointville",
          "state" : "MA"
        },
        "matched_queries" : [
          "q_a",
          "q_n2"
        ]
      }
    ]
  }
}

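Each hit in the response includes matched_queries, the names of the clauses it matched. Tagging queries and filters this way only makes sense within a bool query.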
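
To make the idea concrete, here is a hypothetical Python model of how matched_queries relates to the named clauses. The predicates are crude stand-ins (a real match query is analyzed, not a substring or token check), `matched_queries` is a made-up helper, and the names q_n1, q_n2 and q_a come from the request above.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def matched_queries(doc: Doc, named: Dict[str, Callable[[Doc], bool]]) -> List[str]:
    """Names of the named clauses that match doc (sorted for stable output)."""
    return sorted(name for name, matches in named.items() if matches(doc))


# Crude stand-ins for the three named clauses in the request above.
named = {
    "q_n1": lambda d: "comcubine.com" in str(d["email"]),
    "q_n2": lambda d: "Avenue" in str(d["address"]).split(),
    "q_a": lambda d: d["age"] in (40, 38),
}
# Field values from document 664 above; the email address is a made-up placeholder.
doc = {"age": 40, "address": "144 Guider Avenue", "email": "someone@example.com"}
print(matched_queries(doc, named))  # ['q_a', 'q_n2']
```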

2. boosting query

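The boosting query returns documents that match a positive query while reducing the relevance score of documents that also match a negative query.
This lets you demote certain documents without excluding them: they still appear in the search results, just with a lower score than a normal match.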

GET bank/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "state.keyword": {
            "value": "DC"
          }
        }
      },
      "negative": {
        "term": {
          "age": {
            "value": 23
          }
        }
      },
      "negative_boost": 0.2
    }
  }
}
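
2.1. Top-level parameters of the boosting query

| No. | Parameter | Description |
|---|---|---|
| 1 | positive | (Required, query object) The query you wish to run. Every returned document must match this query. |
| 2 | negative | (Required, query object) The query used to decrease the relevance score of matching documents. |
| 3 | negative_boost | (Required, float) A floating-point number between 0 and 1.0 used to decrease the relevance score of documents matching the negative query. |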

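If a returned document matches both the positive and the negative query, the boosting query calculates its relevance score as follows:
1) Take the original relevance score from the positive query.
2) Multiply that score by the negative_boost value to get the final score.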
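
The two steps can be written as a small Python function. This is a sketch under assumptions: `boosting_score` is a hypothetical name, the positive query's score is passed in rather than computed, and the range check follows the documented 0 to 1.0 range rather than Elasticsearch's actual server-side validation.

```python
def boosting_score(positive_score: float, matches_negative: bool, negative_boost: float) -> float:
    """Sketch of the two-step boosting score described above."""
    # documented range for negative_boost: between 0 and 1.0
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    # step 1: start from the score of the positive query
    score = positive_score
    # step 2: demote, but keep, documents that also match the negative query
    if matches_negative:
        score *= negative_boost
    return score


print(boosting_score(2.0, False, 0.2))  # 2.0
print(boosting_score(2.0, True, 0.2))   # 0.4
```

With the request above, a DC account holder aged 23 would keep its place in the results but with one fifth of its positive score.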
