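1. The intervals query
The intervals query uses matching rules, which are applied to the terms of a specified field.
These rules produce minimal intervals that span the text; the intervals can then be combined or filtered by a parent rule.
Example intervals query
// Request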
GET software/_search
{
"query": {
"intervals": {
"desc": {
"all_of": {
"ordered": true,
"intervals": [
{
"match": {
"query": "distributed search",
"max_gaps": 0,
"ordered": true
}
},
{
"any_of": {
"intervals": [
{
"match": {
"query": "analytics engine"
}
},
{
"match": {
"query": "Elastic Stack"
}
}
]
}
}
]
}
}
}
}
}
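Top-level parameters of the intervals query
No. | Parameter | Description |
---|---|---|
1 | <field> | (Required) The document field to search. Its value is a rule object that matches documents based on terms, their order, and the distance between them. |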
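2. Rule keywords of the intervals query
The valid rule keywords are:
No. | Keyword | Description |
---|---|---|
1 | match | Matches analyzed text |
2 | prefix | Matches terms that start with a given string |
3 | wildcard | Matches terms using a wildcard pattern |
4 | fuzzy | Matches terms similar to a given term, within an edit distance |
5 | all_of | Returns matches that span a combination of other rules |
6 | any_of | Returns the intervals produced by any of its sub-rules |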
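2.1 match rule parameters
The match rule matches analyzed text.
Parameters
No. | Parameter | Description |
---|---|---|
1 | query | (Required, string) The text to search for |
2 | max_gaps | (Optional, integer) Maximum number of positions between the matching terms. Defaults to -1. If unspecified or set to -1, the gap is unrestricted; if set to 0, the terms must appear directly next to each other |
3 | ordered | (Optional, boolean) If true, the matching terms must appear in the specified order. Defaults to false |
4 | analyzer | (Optional, string) Analyzer used to analyze the query text. Defaults to the analyzer of the field being searched |
5 | filter | (Optional, rule object) An interval filter, described in the filter rule section below |
6 | use_field | (Optional, string) If specified, intervals are matched from this field instead of the top-level <field>, and the query text is analyzed with this field's search analyzer |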
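To make the effect of max_gaps and ordered more concrete, here is a small toy model in Python. It is only a sketch under stated assumptions: lowercasing plus whitespace splitting stands in for the field's analyzer, the helper names `tokenize` and `ordered_match` are made up for this illustration, and the code does not reproduce how Elasticsearch or Lucene actually evaluates the query.

```python
def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    return text.lower().split()


def ordered_match(tokens, terms, max_gaps=-1):
    """Return (start, end, gaps) tuples for ordered matches of `terms`."""
    candidates = []
    for start, token in enumerate(tokens):
        if token != terms[0]:
            continue
        pos = start
        for term in terms[1:]:
            # Greedily take the next occurrence of each following term.
            pos = next((i for i in range(pos + 1, len(tokens))
                        if tokens[i] == term), None)
            if pos is None:
                break
        if pos is not None:
            # Gaps = positions inside the interval not occupied by a query term.
            gaps = (pos - start + 1) - len(terms)
            candidates.append((start, pos, gaps))
    # Keep only minimal intervals: drop any interval that contains another one.
    minimal = [a for a in candidates
               if not any(b != a and a[0] <= b[0] and b[1] <= a[1]
                          for b in candidates)]
    # max_gaps is applied after minimization; -1 means "no limit".
    return [iv for iv in minimal if max_gaps < 0 or iv[2] <= max_gaps]


doc = tokenize("Elasticsearch is the distributed search and analytics engine "
               "at the heart of the Elastic Stack")
print(ordered_match(doc, ["distributed", "search"], max_gaps=0))
print(ordered_match(doc, ["distributed", "engine"], max_gaps=3))
print(ordered_match(doc, ["distributed", "engine"], max_gaps=2))
```

In this model "distributed search" with max_gaps 0 yields one interval covering positions 3 to 4, "distributed engine" needs a max_gaps of at least 3, and tightening it to 2 leaves no match.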
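2.2 prefix rule parameters
The prefix rule matches terms that start with the specified string. If the prefix expands to more than 128 terms, Elasticsearch returns an error;
this limit can be lifted by enabling the index_prefixes option in the field mapping.
Parameters
No. | Parameter | Description |
---|---|---|
1 | prefix | (Required, string) The string that matching terms must start with |
2 | analyzer | (Optional, string) Analyzer used to normalize the prefix. Defaults to the analyzer of the top-level <field> |
3 | use_field | (Optional, string) If specified, intervals are matched from this field instead of the top-level <field> |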
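The expansion limit can be pictured with a few lines of Python. This is a hypothetical sketch: `expand_prefix` and `MAX_EXPANSIONS` are names invented here, the vocabulary is just a Python list, and raising `ValueError` only stands in for the error Elasticsearch reports.

```python
MAX_EXPANSIONS = 128  # the limit quoted in the text above


def expand_prefix(vocabulary, prefix, limit=MAX_EXPANSIONS):
    """Return the distinct terms starting with `prefix`, or fail past the limit."""
    matches = sorted(t for t in set(vocabulary) if t.startswith(prefix))
    if len(matches) > limit:
        # Stand-in for the error returned when a prefix expands too far.
        raise ValueError(
            f"prefix '{prefix}' expands to {len(matches)} terms (limit {limit})")
    return matches


print(expand_prefix(["search", "searching", "stack", "store"], "sea"))
```

A very short prefix over a large vocabulary is the typical way to run into the limit, which is why the text points to the field mapping option as a way around it.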
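2.3 wildcard rule parameters
The wildcard rule matches terms using a wildcard pattern. If the pattern expands to more than 128 terms, Elasticsearch returns an error.
Parameters
No. | Parameter | Description |
---|---|---|
1 | pattern | (Required, string) The wildcard pattern. Two wildcard operators are supported: ? matches any single character; * matches zero or more characters, including the empty string |
2 | analyzer | (Optional, string) Analyzer used to normalize the pattern. Defaults to the analyzer of the top-level <field> |
3 | use_field | (Optional, string) If specified, intervals are matched from this field instead of the top-level <field> |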
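The meaning of the two operators can be tried out locally with Python's standard `fnmatch` module, which gives `?` and `*` the same meaning. This is only an approximation for illustration: `wildcard_terms` is a made-up helper, and `fnmatch` additionally treats `[...]` as a character class, which the table above does not list for the wildcard rule.

```python
from fnmatch import fnmatchcase


def wildcard_terms(terms, pattern):
    """Return the terms matching `pattern` (? = one char, * = zero or more)."""
    return [t for t in terms if fnmatchcase(t, pattern)]


terms = ["search", "sear", "starch", "analytics"]
print(wildcard_terms(terms, "sear*"))   # * may also match nothing at all
print(wildcard_terms(terms, "s?arch"))  # ? must match exactly one character
```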
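2.4 fuzzy rule parameters
The fuzzy rule matches terms that are similar to the given term, that is, terms within the allowed edit distance. If the fuzzy expansion matches more than 128 terms, Elasticsearch returns an error.
Parameters
No. | Parameter | Description |
---|---|---|
1 | term | (Required, string) The term to match |
2 | prefix_length | (Optional, integer) Number of leading characters left unchanged when creating expansions. Defaults to 0 |
3 | transpositions | (Optional, boolean) Whether edits include transpositions of two adjacent characters (ab -> ba). Defaults to true |
4 | fuzziness | (Optional, string) Maximum edit distance allowed for a match. Defaults to auto |
5 | analyzer | (Optional, string) Analyzer used to normalize the term. Defaults to the analyzer of the top-level <field> |
6 | use_field | (Optional, string) If specified, intervals are matched from this field instead of the top-level <field> |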
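How prefix_length, transpositions and fuzziness interact can be sketched with a small edit-distance function. The code below is a toy model, not Lucene's implementation: `edit_distance`, `auto_fuzziness` and `fuzzy_match` are hypothetical helpers, and `auto_fuzziness` assumes the usual AUTO thresholds (terms of 1 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two).

```python
def edit_distance(a, b, transpositions=True):
    """Levenshtein distance; optionally counts an adjacent swap as one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # ab -> ba
    return d[-1][-1]


def auto_fuzziness(term):
    # Assumed AUTO thresholds: length < 3 -> 0, length 3..5 -> 1, else 2.
    return 0 if len(term) < 3 else 1 if len(term) < 6 else 2


def fuzzy_match(term, candidate, prefix_length=0, transpositions=True):
    """True if `candidate` is within the AUTO edit distance of `term`."""
    if candidate[:prefix_length] != term[:prefix_length]:
        return False  # the first prefix_length characters must stay unchanged
    return edit_distance(term, candidate, transpositions) <= auto_fuzziness(term)


print(fuzzy_match("stakc", "stack"))
print(fuzzy_match("stakc", "stack", transpositions=False))
print(fuzzy_match("ztack", "stack", prefix_length=1))
```

With transpositions enabled, `stakc` is one edit away from `stack`; with it disabled the swap costs two edits, which exceeds the single edit assumed for a five-character term.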
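2.5 all_of rule parameters
The all_of rule returns matches that span a combination of other rules.
Parameters
No. | Parameter | Description |
---|---|---|
1 | intervals | (Required, array of rule objects) The rules to combine. Every rule must produce a match in a document for the document to match overall |
2 | max_gaps | (Optional, integer) Maximum number of positions between the matching terms. Defaults to -1. If unspecified or set to -1, the gap is unrestricted; if set to 0, the matches must appear directly next to each other |
3 | ordered | (Optional, boolean) If true, the matches must appear in the specified order. Defaults to false |
4 | filter | (Optional, rule object) An interval filter, described in the filter rule section below |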
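2.6 any_of rule parameters
The any_of rule returns the intervals produced by any of its sub-rules.
Parameters
No. | Parameter | Description |
---|---|---|
1 | intervals | (Required, array of rule objects) The rules of which at least one must match |
2 | filter | (Optional, rule object) An interval filter, described in the filter rule section below |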
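2.7 filter rule parameters
The filter rule returns intervals based on a query.
Parameters
No. | Parameter | Description |
---|---|---|
1 | after | (Optional, query object) Keeps intervals of the query that come after an interval of the filter |
2 | before | (Optional, query object) Keeps intervals of the query that come before an interval of the filter |
3 | contained_by | (Optional, query object) Keeps intervals of the query that are contained by an interval of the filter |
4 | containing | (Optional, query object) Keeps intervals of the query that contain an interval of the filter |
5 | not_contained_by | (Optional, query object) Keeps intervals of the query that are not contained by an interval of the filter |
6 | not_containing | (Optional, query object) Keeps intervals of the query that do not contain an interval of the filter |
7 | not_overlapping | (Optional, query object) Keeps intervals of the query that do not overlap with an interval of the filter |
8 | overlapping | (Optional, query object) Keeps intervals of the query that overlap with an interval of the filter |
9 | script | (Optional, script object) Script used to decide which matches are returned |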
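The positional relations in the table can be written down as simple predicates on (start, end) pairs. The following Python sketch is an illustration only: the function names mirror the filter keywords but are defined here, `keep` is a hypothetical helper, and positions come from splitting on whitespace rather than from a real analyzer.

```python
def before(q, f):
    return q[1] < f[0]      # query interval ends before the filter interval starts


def after(q, f):
    return q[0] > f[1]      # query interval starts after the filter interval ends


def containing(q, f):
    return q[0] <= f[0] and f[1] <= q[1]


def contained_by(q, f):
    return f[0] <= q[0] and q[1] <= f[1]


def overlapping(q, f):
    return q[0] <= f[1] and f[0] <= q[1]


def keep(q, relation, filter_intervals, negate=False):
    """Keep q if some filter interval satisfies `relation` (none, if negated)."""
    hit = any(relation(q, f) for f in filter_intervals)
    return not hit if negate else hit


tokens = "distributed search redis analytics engine redis".split()
query_interval = (tokens.index("distributed"), tokens.index("engine"))
redis = [(i, i) for i, t in enumerate(tokens) if t == "redis"]

print(keep(query_interval, containing, redis, negate=True))  # not_containing
print(keep(query_interval, before, redis))                   # before
```

For this text the interval from `distributed` to `engine` contains the first `redis`, so a not_containing filter rejects it, while a before filter accepts it because a second `redis` follows the interval.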
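// The query below uses a filter rule and imposes two conditions:
// 1. On the desc field, the two terms of query may be at most 3 positions apart (max_gaps)
// 2. The interval matching 'distributed engine' must not contain the term 'redis'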
POST software/_search
{
"query": {
"intervals":{
"desc":{
"match":{
"query":"distributed engine",
"max_gaps": 3,
"filter":{
"not_containing":{
"match":{
"query": "redis"
}
}
}
}
}
}
}
}
// Response; vary the conditions to test the different cases
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.19999999,
"hits" : [
{
"_index" : "software",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.19999999,
"_source" : {
"title" : "elasticsearch",
"desc" : "Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack"
}
},
{
"_index" : "software",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.19999999,
"_source" : {
"title" : "elasticsearch",
"desc" : "distributed search and analytics engine at the heart of the Elastic Stack"
}
}
]
}
}
// The interval matching 'distributed engine' must come before 'redis'
GET software/_search
{
"query": {
"intervals":{
"desc":{
"match":{
"query":"distributed engine",
"max_gaps": 3,
"filter":{
"before":{
"match":{
"query": "redis"
}
}
}
}
}
}
}
}
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.19999999,
"hits" : [
{
"_index" : "software",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.19999999,
"_source" : {
"title" : "elasticsearch",
"desc" : "distributed search redis analytics engine redis"
}
}
]
}
}
// Filter the matches with a script on the interval's start, end and gaps
GET software/_search
{
"query": {
"intervals":{
"desc":{
"match":{
"query":"distributed engine",
"filter":{
"script":{
"source":"interval.start > 1 && interval.end < 10 && interval.gaps == 3"
}
}
}
}
}
}
}
// Response
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.19999999,
"hits" : [
{
"_index" : "software",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.19999999,
"_source" : {
"title" : "elasticsearch",
"desc" : "Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack"
}
}
]
}
}
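Why only document 1 passes the script filter can be checked by hand. The Python sketch below recomputes start, end and gaps for the three documents that contain both terms. It rests on assumptions: whitespace splitting replaces the real analyzer, positions are counted from 0, and `interval_stats` and `script_passes` are hypothetical helpers that merely mirror the script condition used in the request.

```python
docs = {
    1: "Elasticsearch is the distributed search and analytics engine "
       "at the heart of the Elastic Stack",
    4: "distributed search and analytics engine at the heart of the Elastic Stack",
    5: "distributed search redis analytics engine redis",
}


def interval_stats(text, first, last):
    """Return (start, end, gaps) of the interval from `first` to `last`."""
    tokens = text.lower().split()
    start = tokens.index(first)
    end = tokens.index(last, start + 1)
    gaps = (end - start + 1) - 2  # positions not occupied by the two terms
    return start, end, gaps


def script_passes(start, end, gaps):
    # Same condition as the script in the request above.
    return start > 1 and end < 10 and gaps == 3


for doc_id, text in docs.items():
    stats = interval_stats(text, "distributed", "engine")
    print(doc_id, stats, script_passes(*stats))
```

In documents 4 and 5 the interval starts at position 0, so the `interval.start > 1` part of the condition fails; in document 1 the interval runs from position 3 to 7 with 3 gaps.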
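Minimization
The intervals query always minimizes intervals so that queries can run in linear time. This sometimes leads to surprising results, especially when max_gaps or a filter is used. For example, the following query searches for library API contained within an interval that matches code: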
GET software/_search
{
"query": {
"intervals":{
"desc":{
"match":{
"query":"library API",
"filter":{
"contained_by":{
"match":{
"query":"code"
}
}
}
}
}
}
}
}
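The query above does not match the text but rather a code library and API that can easily be used: the interval produced by the match rule for code covers only that single term, so it cannot contain the wider interval that spans library and API. Because code appears before that interval, changing contained_by to after does produce a match.
Another restriction concerns any_of rules whose sub-rules overlap: when a shorter phrase matches, a longer phrase that contains it can never match, which gives surprising results in combination with max_gaps. Consider the following query: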
GET software/_search
{
"query": {
"intervals": {
"desc": {
"all_of": {
"intervals": [
{
"match": {
"query": "add"
}
},
{
"any_of": {
"intervals": [
{
"match": {
"query": "search"
}
},
{
"match": {
"query": "search capabilities"
}
}
]
}
},
{
"match": {
"query": "to"
}
}
],
"max_gaps": 0,
"ordered": true
}
}
}
}
}
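By the minimization rule, the query above can never match add search capabilities to, because the any_of rule only produces an interval for search. In that case the query has to be rewritten, as follows: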
GET software/_search
{
"query": {
"intervals": {
"desc": {
"any_of": {
"intervals": [
{
"match": {
"query": "add search capabilities to",
"max_gaps": 0,
"ordered": true
}
},
{
"match": {
"query": "add search to",
"max_gaps": 0,
"ordered": true
}
}
]
}
}
}
}
}
// In this test, both of the queries above returned the same result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.3333333,
"hits" : [
{
"_index" : "software",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.3333333,
"_source" : {
"title" : "lucene",
"desc" : "Lucene is not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications"
}
}
]
}
}
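The any_of restriction described above follows from minimization and can be modelled in a few lines of Python. This is a toy model of the rule as stated in the text, not of how any particular Elasticsearch version evaluates the query; `minimize`, `phrase` and `adjacent_chain` are names invented for the sketch.

```python
def minimize(intervals):
    """Drop every interval that contains another interval."""
    unique = set(intervals)
    return sorted(a for a in unique
                  if not any(b != a and a[0] <= b[0] and b[1] <= a[1]
                             for b in unique))


def phrase(tokens, words):
    """Intervals where `words` occur consecutively (ordered, max_gaps 0)."""
    n = len(words)
    return [(i, i + n - 1) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == words]


def adjacent_chain(*sources):
    """Ordered all_of with max_gaps 0: each part starts right after the last."""
    chains = list(sources[0])
    for source in sources[1:]:
        chains = [(s, e2) for (s, e) in chains
                  for (s2, e2) in source if s2 == e + 1]
    return chains


tokens = "add search capabilities to".split()

# any_of: the longer phrase contains the shorter one, so minimization drops it.
any_of = minimize(phrase(tokens, ["search"]) +
                  phrase(tokens, ["search", "capabilities"]))
print(any_of)

# all_of(add, any_of, to) with max_gaps 0 then finds nothing after 'search'.
print(adjacent_chain(phrase(tokens, ["add"]), any_of, phrase(tokens, ["to"])))

# The rewritten query matches the whole phrase directly.
print(phrase(tokens, ["add", "search", "capabilities", "to"]))
```

Moving the alternatives to the top level, as the rewritten query does, avoids the problem in this model because each complete phrase is matched as a unit before any minimization across alternatives takes place.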
The documents indexed for the queries above:
PUT software/_doc/1
{
"title":"elasticsearch",
"desc":"Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack"
}
PUT software/_doc/2
{
"title":"redis",
"desc":"Redis is an open source, in-memory data structure store, used as a database, cache and message broker"
}
PUT software/_doc/3
{
"title":"Lucene",
"desc":"Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities"
}
PUT software/_doc/4
{
"title":"elasticsearch",
"desc":"distributed search and analytics engine at the heart of the Elastic Stack"
}
PUT software/_doc/5
{
"title":"elasticsearch",
"desc":"distributed search redis analytics engine redis"
}
PUT software/_doc/6
{
"title":"lucene",
"desc":"Lucene is not a complete application, but rather a code library and API that can easily be used to add search capabilities to applications"
}