1. Introduction to the Query DSL
1.1 DSL
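As more and more parameters are appended to a query string and the search conditions grow more complex, the query-string form can no longer meet the need. For example: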
GET /book/_search?q=name:java&size=10&from=0&sort=price:desc
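DSL: Domain Specific Language, a language designed for one particular domain.
It is Elasticsearch's own search language. Search conditions are carried in the request body, which makes it far more expressive than the query string.
Query all documents (query-string form: GET /book/_search):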
GET /book/_search
{
"query": { "match_all": {} }
}
Sorting (query-string form: GET /book/_search?sort=price:desc):
GET /book/_search
{
"query" : {
"match" : {
"name" : "java"
}
},
"sort": [
{ "price": "desc" }
]
}
Pagination (query-string form: GET /book/_search?size=10&from=0):
GET /book/_search
{
"query": { "match_all": {} },
"from": 0,
"size": 10
}
Selecting the returned fields (query-string form: GET /book/_search?_source=name,studymodel):
GET /book/_search
{
"query": { "match_all": {} },
"_source": ["name", "studymodel"]
}
Complex searches are built by combining the kinds of query shown above.
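As a rough illustration of how these options combine in a single request body, here is a minimal Python sketch. The helper name build_search_body and its parameters are invented for this example; it only assembles the dictionary that would be sent as JSON to GET /book/_search and does not talk to a cluster.

```python
import json


def build_search_body(query=None, from_=0, size=10, sort=None, source=None):
    """Assemble a request body for GET /<index>/_search (hypothetical helper)."""
    # Fall back to match_all when no query is given.
    body = {"query": query if query is not None else {"match_all": {}}}
    body["from"] = from_
    body["size"] = size
    if sort:
        # [("price", "desc")] becomes [{"price": "desc"}]
        body["sort"] = [{field: order} for field, order in sort]
    if source:
        body["_source"] = list(source)
    return body


body = build_search_body(
    query={"match": {"name": "java"}},
    from_=0,
    size=10,
    sort=[("price", "desc")],
    source=["name", "studymodel"],
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON has the same shape as the request bodies above, with query, from, size, sort and _source side by side at the top level.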
1.2 Query DSL syntax
General form:
{
QUERY_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
Form used when the query targets a specific field:
{
QUERY_NAME: {
FIELD_NAME: {
ARGUMENT: VALUE,
ARGUMENT: VALUE,...
}
}
}
Example:
GET /test_index/_search
{
"query": {
"match": {
"test_field": "test"
}
}
}
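1.3 Combining multiple search conditions
Search requirement: title must contain elasticsearch, content may or may not contain elasticsearch, and author_id must not be 111.
This corresponds to the SQL WHERE operators AND, OR and !=.
Initial data: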
POST /website/_doc/1
{
"title": "my hadoop article",
"content": "hadoop is very bad",
"author_id": 111
}
POST /website/_doc/2
{
"title": "my elasticsearch article",
"content": "es is very bad",
"author_id": 112
}
POST /website/_doc/3
{
"title": "my elasticsearch article",
"content": "es is very goods",
"author_id": 111
}
Search:
GET /website/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "elasticsearch"
}
}
],
"should": [
{
"match": {
"content": "elasticsearch"
}
}
],
"must_not": [
{
"match": {
"author_id": 111
}
}
]
}
}
}
Response:
{
"took" : 488,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47000363,
"hits" : [
{
"_index" : "website",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.47000363,
"_source" : {
"title" : "my elasticsearch article",
"content" : "es is very bad",
"author_id" : 112
}
}
]
}
}
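To see why only document 2 comes back, the bool logic can be modelled locally. The Python sketch below is a deliberately crude stand-in: matches() treats a match clause as whole-word containment for strings and equality for numbers, which is an assumption made for illustration and not how Elasticsearch analyzes or scores text.

```python
# The three sample documents indexed above.
docs = {
    1: {"title": "my hadoop article", "content": "hadoop is very bad", "author_id": 111},
    2: {"title": "my elasticsearch article", "content": "es is very bad", "author_id": 112},
    3: {"title": "my elasticsearch article", "content": "es is very goods", "author_id": 111},
}


def matches(doc, field, value):
    """Crude stand-in for a match clause (assumption, see lead-in)."""
    actual = doc[field]
    if isinstance(actual, str):
        return str(value).lower() in actual.lower().split()
    return actual == value


def bool_eval(doc, must=(), should=(), must_not=()):
    """Return (is_hit, number_of_should_clauses_matched)."""
    if not all(matches(doc, f, v) for f, v in must):
        return False, 0
    if any(matches(doc, f, v) for f, v in must_not):
        return False, 0
    # With a must clause present, should clauses are optional:
    # they can raise the score but never exclude a document.
    return True, sum(matches(doc, f, v) for f, v in should)


hits = [
    doc_id
    for doc_id, doc in docs.items()
    if bool_eval(
        doc,
        must=[("title", "elasticsearch")],
        should=[("content", "elasticsearch")],
        must_not=[("author_id", 111)],
    )[0]
]
print(hits)  # [2]
```

Document 1 fails the must clause, document 3 is removed by must_not, and document 2 is kept even though its content does not contain elasticsearch.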
A more complex search requirement:
select * from test_index where name='tom' and (hired = true or (personality = 'good' and rude != true))
GET /test_index/_search
{
"query": {
"bool": {
"must": { "match":{ "name": "tom" }},
"should": [
{ "match":{ "hired": true }},
{ "bool": {
"must":{ "match": { "personality": "good" }},
"must_not": { "match": { "rude": true }}
}}
],
"minimum_should_match": 1
}
}
}
2. Full-text search
2.1 Full-text search
Recreate the book index:
PUT /book/
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"description":{
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"studymodel":{
"type": "keyword"
},
"price":{
"type": "double"
},
"timestamp": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"pic":{
"type":"text",
"index":false
}
}
}
}
Insert data:
PUT /book/_doc/1
{
"name": "Bootstrap開發",
"description": "Bootstrap是由Twitter推出的一個前臺頁面開發css框架,是一個非常流行的開發框架,此框架集成了多種頁面效果。此開發框架包含了大量的CSS、JS程序代碼,可以幫助開發者(尤其是不擅長css頁面開發的程序人員)輕鬆的實現一個css,不受瀏覽器限制的精美界面css效果。",
"studymodel": "201002",
"price":38.6,
"timestamp":"2019-08-25 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "bootstrap", "dev"]
}
PUT /book/_doc/2
{
"name": "java編程思想",
"description": "java語言是世界第一編程語言,在軟件開發領域使用人數最多。",
"studymodel": "201001",
"price":68.6,
"timestamp":"2019-08-25 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "java", "dev"]
}
PUT /book/_doc/3
{
"name": "spring開發基礎",
"description": "spring 在java領域非常流行,java程序員都在用。",
"studymodel": "201001",
"price":88.6,
"timestamp":"2019-08-24 19:11:35",
"pic":"group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags": [ "spring", "java"]
}
Search:
GET /book/_search
{
"query" : {
"match" : {
"description" : "java程序員"
}
}
}
2.2 A first look at _score
The search above returns:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.137549,
"hits" : [
{
"_index" : "book",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.137549,
"_source" : {
"name" : "spring開發基礎",
"description" : "spring 在java領域非常流行,java程序員都在用。",
"studymodel" : "201001",
"price" : 88.6,
"timestamp" : "2019-08-24 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"spring",
"java"
]
}
},
{
"_index" : "book",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.57961315,
"_source" : {
"name" : "java編程思想",
"description" : "java語言是世界第一編程語言,在軟件開發領域使用人數最多。",
"studymodel" : "201001",
"price" : 68.6,
"timestamp" : "2019-08-25 19:11:35",
"pic" : "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
"tags" : [
"java",
"dev"
]
}
}
]
}
}
Analysis of the result:
1. At index time, an inverted index of terms is built for the description field:
java 2,3
程序員 3
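2. At search time, the query text is split into the terms java and 程序員, and the inverted index is consulted directly. Documents 2 and 3 contain java. Document 3 contains the term java twice and 程序員 once, so it scores higher and ranks first. Document 2 contains java only once and ranks second.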
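The lookup described above can be sketched with a toy inverted index in Python. The token lists are hand-picked stand-ins for analyzer output (an assumption, not real ik_max_word output), and the sketch only counts term occurrences. The actual _score is computed by Elasticsearch's similarity function (BM25 by default), which also weighs term rarity and field length.

```python
from collections import defaultdict

# Hand-picked tokens per document id (assumption, not real analyzer output).
analyzed = {
    2: ["java", "語言", "世界", "第一", "編程", "語言", "軟件", "開發", "領域"],
    3: ["spring", "java", "領域", "流行", "java", "程序員"],
}


def build_inverted_index(analyzed_docs):
    """Map each term to {doc_id: term_frequency}."""
    index = defaultdict(dict)
    for doc_id, tokens in analyzed_docs.items():
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


def candidates(index, query_terms):
    """Return {doc_id: {term: term_frequency}} for docs containing any query term."""
    found = defaultdict(dict)
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            found[doc_id][term] = tf
    return dict(found)


index = build_inverted_index(analyzed)
print(sorted(index["java"]))    # [2, 3]
print(sorted(index["程序員"]))  # [3]
print(candidates(index, ["java", "程序員"]))
```

Document 3 matches both query terms and contains java twice, which is consistent with it ranking above document 2 in the response.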
3. DSL syntax exercises
3.1 match_all
Matches every document.
GET /book/_search
{
"query": {
"match_all": {}
}
}
3.2 match
The match query is analyzer-aware: it first runs the query text through the field's search analyzer, then searches for the resulting terms.
GET /book/_search
{
"query": {
"match": {
"description": "java程序員"
}
}
}
3.3 multi_match
Runs the same match-style search against several fields at once.
GET /book/_search
{
"query": {
"multi_match": {
"query": "java程序員",
"fields": ["name", "description"]
}
}
}
3.4 range query
Matches documents whose field value falls within a range.
GET /book/_search
{
"query": {
"range": {
"price": {
"gte": 80,
"lte": 90
}
}
}
}
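3.5 term query
The term query does not analyze the search text; it looks for that exact term in the index. It fits keyword fields, which are analyzed neither when stored nor when searched. Run against an analyzed text field such as description, as in the request below, it will most likely find nothing, because the index holds the separate terms java and 程序員 rather than the whole string.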
GET /book/_search
{
"query": {
"term": {
"description": "java程序員"
}
}
}
3.6 terms query
Matches documents whose field contains any one of several exact terms.
GET /book/_search
{
"query": { "terms": { "tags": [ "search", "full_text", "dev" ] }}
}
3.7 exists query
Matches documents that have an indexed value for the given field.
GET /_search
{
"query": {
"exists": {
"field": "join_date"
}
}
}
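3.8 fuzzy query
Returns documents that contain terms similar to the search term, where similarity is measured by Levenshtein edit distance.
An edit is one of the following:
- Changing a character (box → fox)
- Removing a character (sick → sic)
- Inserting a character (aple → apple)
- Transposing two adjacent characters (act → cat)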
GET /book/_search
{
"query": {
"fuzzy": {
"description": {
"value": "jave"
}
}
}
}
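The edit distance behind the fuzzy query can be computed with a small dynamic-programming routine. The Python sketch below implements the optimal string alignment variant, in which a swap of two adjacent characters counts as a single edit. It is meant as an illustration of the measure, not as Elasticsearch's internal implementation.

```python
def edit_distance(a, b):
    """Optimal string alignment distance.

    Insertion, deletion, substitution and transposition of two
    adjacent characters each cost one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


for left, right in [("box", "fox"), ("sick", "sic"), ("aple", "apple"), ("act", "cat"), ("jave", "java")]:
    print(left, right, edit_distance(left, right))
```

Every pair in the loop is one edit apart, including jave and java, which is why the fuzzy search for jave can still reach documents containing java.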
3.9 ids
GET /book/_search
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
}
}
3.10 prefix query
GET /book/_search
{
"query": {
"prefix": {
"description": {
"value": "spring"
}
}
}
}
3.11 regexp query
GET /book/_search
{
"query": {
"regexp": {
"description": {
"value": "j.*a",
"flags" : "ALL",
"max_determinized_states": 10000,
"rewrite": "constant_score"
}
}
}
}
4. Filter
4.1 filter and query example
Requirement: find documents whose description matches "java程序員" and whose price is between 80 and 90, inclusive.
GET /book/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"description": "java程序員"
}
},
{
"range": {
"price": {
"gte": 80,
"lte": 90
}
}
}
]
}
}
}
Using filter:
GET /book/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"description": "java程序員"
}
}
],
"filter": {
"range": {
"price": {
"gte": 80,
"lte": 90
}
}
}
}
}
}
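4.2 filter versus query
filter: only selects the documents that satisfy the condition. It computes no relevance score and has no effect on relevance.
query: computes how relevant each document is to the search condition and sorts the results by that relevance.
When to use which:
In general, if you are searching and want the best-matching documents returned first, use query.
If you only need to select a subset of the data by some condition and do not care about its ranking, use filter.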
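4.3 filter versus query performance
filter: needs no relevance scoring and no sorting by score, and Elasticsearch automatically caches the results of frequently used filters.
query: by contrast, must compute relevance scores and sort by them, and does not benefit from that filter cache.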
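The rewrite from section 4.1 can be expressed as a small transformation on the request body. The Python sketch below moves clauses that do not need scoring from must to filter. The helper names and the NON_SCORING set are choices made for this illustration, not an Elasticsearch feature. The set of hits stays the same after the move; only the scores change, because filter clauses contribute nothing to _score.

```python
# Query types treated as yes/no conditions in this sketch (an assumption).
NON_SCORING = {"range", "term", "terms", "exists", "ids"}


def as_list(value):
    """Normalize a clause slot that may hold None, one clause, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def move_to_filter(bool_query, non_scoring=NON_SCORING):
    """Return a copy of a bool query with non-scoring must clauses moved to filter."""
    clauses = bool_query["bool"]
    must = []
    filters = list(as_list(clauses.get("filter")))
    for clause in as_list(clauses.get("must")):
        query_type = next(iter(clause))  # e.g. "match" or "range"
        (filters if query_type in non_scoring else must).append(clause)
    result = {k: v for k, v in clauses.items() if k not in ("must", "filter")}
    if must:
        result["must"] = must
    if filters:
        result["filter"] = filters
    return {"bool": result}


original = {
    "bool": {
        "must": [
            {"match": {"description": "java程序員"}},
            {"range": {"price": {"gte": 80, "lte": 90}}},
        ]
    }
}
rewritten = move_to_filter(original)
print(rewritten)
```

Applied to the first request of section 4.1, this yields a body equivalent to the second one: the match clause stays in must and the price range lands in filter.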
5. Locating syntax errors
Validating an invalid query:
GET /book/_validate/query?explain
{
"query": {
"mach": {
"description": "java程序員"
}
}
}
Response:
{
"valid" : false,
"error" : "org.elasticsearch.common.ParsingException: no [query] registered for [mach]"
}
A valid query:
GET /book/_validate/query?explain
{
"query": {
"match": {
"description": "java程序員"
}
}
}
Response:
{
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"valid" : true,
"explanations" : [
{
"index" : "book",
"valid" : true,
"explanation" : "description:java description:程序員"
}
]
}
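This is typically used for especially large and complex searches. If you have just written a search that runs to hundreds of lines, you can first send it to the validate API to check whether it is valid.
Once the query is valid, the explain output plays a role similar to a MySQL execution plan: it shows what the search actually targets, for example which terms are looked up in which field.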
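When the validate API is called from a script, the two response shapes shown above can be reduced to a simple verdict. The function below is a hypothetical helper that works on already-parsed JSON. It assumes only the fields visible in those two responses (valid, error, explanations) and makes no HTTP call.

```python
def summarize_validation(response):
    """Return (is_valid, messages) for a parsed _validate/query response."""
    if not response.get("valid", False):
        # Invalid queries carry a single top-level error string.
        return False, [response.get("error", "unknown error")]
    # Valid queries list one explanation per index.
    messages = [
        entry["explanation"]
        for entry in response.get("explanations", [])
        if "explanation" in entry
    ]
    return True, messages


bad = {
    "valid": False,
    "error": "org.elasticsearch.common.ParsingException: no [query] registered for [mach]",
}
good = {
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "valid": True,
    "explanations": [
        {"index": "book", "valid": True, "explanation": "description:java description:程序員"}
    ],
}
print(summarize_validation(bad))
print(summarize_validation(good))
```

A caller could refuse to run the real search when the first element is False and log the messages otherwise.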
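6. Custom sorting
6.1 Default sort order
By default, results are sorted by _score in descending order.
In some cases, however, _score is not used at all, for example with a filter: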
GET /book/_search
{
"query": {
"bool": {
"filter": [
{
"match": {
"description": "java程序員"
}
}
]
}
}
}
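A constant_score query can be used as well.
6.2 Custom sort order
This is the equivalent of ORDER BY in SQL. In query-string form it is written as, for example, ?sort=price:desc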
GET /book/_search
{
"query": {
"constant_score": {
"filter" : {
"term" : {
"studymodel" : "201001"
}
}
}
},
"sort": [
{
"price": {
"order": "asc"
}
}
]
}
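7. Sorting on text fields
Sorting on a text field often gives inaccurate results, because after analysis the field holds several separate terms, and sorting on those terms is not what we want.
The usual solution is to index the text field twice: once analyzed, for searching, and once not analyzed, for sorting.
An alternative is to set fielddata: true on the text field so that it can be sorted on directly, at the cost of extra heap memory and with the sort still based on individual terms.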
PUT /website
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"content": {
"type": "text"
},
"post_date": {
"type": "date"
},
"author_id": {
"type": "long"
}
}
}
}
Insert data:
PUT /website/_doc/1
{
"title": "first article",
"content": "this is my first article",
"post_date": "2019-01-01",
"author_id": 110
}
PUT /website/_doc/2
{
"title": "second article",
"content": "this is my second article",
"post_date": "2019-01-01",
"author_id": 110
}
PUT /website/_doc/3
{
"title": "third article",
"content": "this is my third article",
"post_date": "2019-01-02",
"author_id": 110
}
Search:
GET /website/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"title.keyword": {
"order": "desc"
}
}
]
}
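A rough Python model shows why the keyword sub-field is the safer sort key. It assumes that an analyzed text field is sorted on individual terms and that an ascending sort uses the smallest term of each document. Splitting on whitespace stands in for the real analyzer. Both points are simplifications made for this sketch.

```python
titles = {1: "first article", 2: "second article", 3: "third article"}


def analyzed_terms(text):
    """Whitespace tokenizer standing in for the real analyzer (assumption)."""
    return text.lower().split()


def sort_value(terms, order):
    """Pick one term per document: smallest for asc, largest for desc."""
    return min(terms) if order == "asc" else max(terms)


# Sorting on the untouched string, as title.keyword does.
by_keyword = sorted(titles, key=lambda doc_id: titles[doc_id])

# Sort keys a term-based ascending sort would see.
analyzed_keys = {
    doc_id: sort_value(analyzed_terms(title), "asc")
    for doc_id, title in titles.items()
}

print(by_keyword)     # [1, 2, 3]
print(analyzed_keys)  # {1: 'article', 2: 'article', 3: 'article'}
```

With the whole string as the key the three titles order cleanly, while the term-based keys are all article, so the ascending order among the documents is no longer determined by their titles.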
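8. Scroll: retrieving results in batches
Scenario: export 100 million documents from an index to a file or a database.
Fetching everything in one request would exhaust system memory, so use scroll search to retrieve the data batch by batch.
On the first request, a scroll search saves a snapshot of the index as it is at that moment. All later batches are served from that snapshot, so changes made to the data in the meantime are not visible to the user.
Every scroll request must also carry a scroll parameter that specifies a time window. It tells Elasticsearch how long to keep the search context alive, so it only needs to be long enough to process one batch before the next request arrives.
Search: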
GET /book/_search?scroll=1m
{
"query": {
"match_all": {}
},
"size": 3
}
Response:
{
"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAMOkWTURBNDUtcjZTVUdKMFp5cXloVElOQQ==",
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
]
}
}
The response contains a _scroll_id. The next scroll request must carry this _scroll_id:
GET /_search/scroll
{
"scroll": "1m",
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAMOkWTURBNDUtcjZTVUdKMFp5cXloVElOQQ=="
}
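Difference from pagination:
Pagination is for results displayed to users, and it runs into the deep paging problem when from becomes large.
scroll is for internal system operations, such as bulk export, data migration, and changing an index mapping with zero downtime.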
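The request sequence above amounts to a simple loop: run the initial search, then keep sending the latest _scroll_id until a batch comes back empty. The Python sketch below captures that loop. The two callables first_page and next_page are hypothetical stand-ins for the HTTP calls to GET /book/_search?scroll=1m and GET /_search/scroll, and make_fake_backend is an in-memory substitute so that the example runs without a cluster.

```python
def scroll_all(first_page, next_page, keep_alive="1m"):
    """Yield every hit, one batch at a time.

    first_page(keep_alive) and next_page(scroll_id, keep_alive) are
    hypothetical callables that return parsed response dictionaries.
    """
    response = first_page(keep_alive)
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always use the scroll id from the most recent response.
        response = next_page(response["_scroll_id"], keep_alive)


def make_fake_backend(docs, size):
    """In-memory stand-in for a cluster, for demonstration only."""
    state = {"pos": 0}

    def page():
        batch = docs[state["pos"]: state["pos"] + size]
        state["pos"] += len(batch)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": batch}}

    return (lambda keep_alive: page()), (lambda scroll_id, keep_alive: page())


first, nxt = make_fake_backend([{"_id": str(i)} for i in range(1, 8)], size=3)
ids = [hit["_id"] for hit in scroll_all(first, nxt)]
print(ids)  # ['1', '2', '3', '4', '5', '6', '7']
```

Because scroll_all is a generator, the caller can write each hit to a file or database as it arrives and never holds more than one batch in memory.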