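Elasticsearch Study Notes
1. Installing Elasticsearch
Install the IK analyzer plugin: the plugin version must match the Elasticsearch version, otherwise an error is reported.
A Java JDK must be installed and configured.
2. Basic CRUD in Elasticsearch
1> Create: index ------>> type ------>> document
Set the type of a field: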
PUT /schools/_mapping/school
{
"properties":{
"TimeFormat":{
"type":"date",
"format":"yyyy-MM-dd HH:mm:ss"
}
}
}
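A value for TimeFormat has to fit the pattern declared in the mapping. The sketch below checks that on the client side before indexing. The name fits_time_format and the strptime pattern are assumptions of this sketch: the pattern is only an approximate Python counterpart of "yyyy-MM-dd HH:mm:ss", not something Elasticsearch itself uses.

```python
from datetime import datetime

# Assumed Python counterpart of the mapping's "yyyy-MM-dd HH:mm:ss" pattern.
PY_FORMAT = "%Y-%m-%d %H:%M:%S"


def fits_time_format(value: str) -> bool:
    """Return True if value parses with the assumed pattern."""
    try:
        datetime.strptime(value, PY_FORMAT)
        return True
    except ValueError:
        return False


print(fits_time_format("2016-09-08 15:20:30"))
print(fits_time_format("2016-09-08T15:20:30"))
```

The first call uses the value written by the update script later in these notes; the second uses a "T" separator, which the pattern does not allow.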
Create the index student with a type article whose subject field has type text and uses the ik_max_word analyzer:
PUT /student/?pretty
{
"settings" : {
"analysis" : {
"analyzer" : {
"ik" : {
"tokenizer" : "ik_max_word"
}
}
}
},
"mappings" : {
"article" : {
"dynamic" : true,
"properties" : {
"subject" : {
"type" : "text",
"analyzer" : "ik_max_word"
}
}
}
}
}
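Unless it is specified explicitly, ik is not used as the default analyzer, and the request above only sets the analyzer for individual fields of the document.
The following request makes ik the default analyzer for the whole index: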
PUT /students
{
"settings" : {
"index" : {
"analysis.analyzer.default.type": "ik_max_word"
}
}
}
A. Inserting a single document
PUT http://localhost:9200/movies/movie/3
{
"title": "To Kill a Mockingbird",
"director": "Robert Mulligan",
"year": 1962
}
PUT url/index/type/id
{
"field": "value",
"field": "value",
"field": "value",
....
}
A request in this format creates the index, the type, and the document.
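The url/index/type/id format can also be assembled from a program. The sketch below builds the request with Python's standard library and does not send it. The helper name build_index_request is hypothetical, and the address is the one used in the notes.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request for url/index/type/id."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200", "movies", "movie", 3,
    {"title": "To Kill a Mockingbird", "director": "Robert Mulligan", "year": 1962},
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose; with a node listening at that address it would be:
# urllib.request.urlopen(request)
```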
{ "_index": "movies", "_type": "movie", "_id": "3", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true }
_version is 1 and result is created.
B. Bulk insert
POST /schools/_bulk
{"index":{"_index":"schools","_type":"school","_id":"1"}}
{"name":"Central School","description":"CBSE Affiliation","street":"Nagan","city":"paprola","state":"HP","zip":"176115","location":[31.8955385,76.8380405],"fees":2000,"tags":["Senior Secondary","beautiful campus"],"rating":"3.5"}
{"index":{"_index":"schools","_type":"school","_id":"2"}}
{"name":"Saint Paul School","description":"ICSE Afiliation","street":"Dawarka","city":"Delhi","state":"Delhi","zip":"110075","location":[28.5733056,77.0122136],"fees":5000,"tags":["Good Faculty","Great Sports"],"rating":"4.5"}
{"index":{"_index":"schools","_type":"school","_id":"3"}}
{"name":"Crescent School","description":"State Board Affiliation","street":"Tonk Road","city":"Jaipur","state":"RJ","zip":"176114","location":[26.8535922,75.7923988],"fees":2500,"tags":["Well equipped labs"],"rating":"4.5"}
Use _bulk to insert documents in batches: each action line is followed by a line holding the document source.
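Writing the alternating action and source lines by hand is error-prone, so a bulk body is usually generated. The following is a minimal sketch of that, assuming documents are supplied as a dict keyed by ID; build_bulk_body is a hypothetical helper name and the documents are shortened versions of the ones above.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return the newline-delimited body for _bulk.

    docs maps document IDs to document sources. Each document yields one
    action line followed by one source line.
    """
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("schools", "school", {
    1: {"name": "Central School", "fees": 2000},
    2: {"name": "Saint Paul School", "fees": 5000},
})
print(body)
```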
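2> Updating documents
The index now holds one movie, so next comes updating it by adding a list of genres. To do that, simply index the document again under the same ID: use exactly the same index request as before, but with the JSON object extended by the genres.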
PUT http://localhost:9200/movies/movie/3
{
"title": "To Kill a Mockingbird",
"director": "Robert Mulligan",
"year": 1962,
"genres": ["Crime", "Drama", "Mystery"]
}
The response:
{ "_index": "movies", "_type": "movie", "_id": "3", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": false }
_version is now 2 and result is updated.
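The version and result behaviour can be pictured with a toy in-memory model. This is only an illustration of the semantics seen in the two responses, not of how Elasticsearch stores documents; ToyIndex and its methods are hypothetical names.

```python
class ToyIndex:
    """In-memory stand-in for one index; illustration only."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        """Store source under doc_id, replacing any previous document entirely."""
        previous = self._docs.get(doc_id)
        version = 1 if previous is None else previous[0] + 1
        self._docs[doc_id] = (version, dict(source))
        result = "created" if previous is None else "updated"
        return {"_id": doc_id, "_version": version, "result": result}

    def get(self, doc_id):
        """Return the stored source of doc_id."""
        return self._docs[doc_id][1]


toy = ToyIndex()
movie = {"title": "To Kill a Mockingbird", "director": "Robert Mulligan", "year": 1962}
first = toy.index("3", movie)
second = toy.index("3", {**movie, "genres": ["Crime", "Drama", "Mystery"]})
print(first, second)
```

The second call sends the whole document again, which is why the request above repeats title, director and year instead of sending only genres.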
Updating individual fields of documents (inline script)
POST schools/school/_update_by_query
{
"script": {
"inline": "ctx._source.TimeFormat ='2016-09-08 15:20:30';ctx._source.zip='1766889'"
},
"query":{
"term":{
"city":"delhi"
}
}
}
3> Deleting documents
To delete a single document from the index by ID, use the same URL as for indexing the document, but change the HTTP method to DELETE.
DELETE http://localhost:9200/movies/movie/3
The response:
{
"_index": "movies",
"_type": "movie",
"_id": "3",
"_version": 2,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 5,
"_primary_term": 1
}
4> Querying documents
To fetch a single document from the index by ID, use the same URL as for indexing the document, but change the HTTP method to GET.
GET http://localhost:9200/movies/movie/3
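Conditional search:
Common queries:
Full-text queries: for text
1. Match everything: match_all
2. Fuzzy matching: match (roughly like LIKE in SQL)
3. Whole-phrase matching: match_phrase (roughly like = in SQL)
4. Multi-field matching: multi_match (searches several fields)
5. Query-string syntax: query_string (write the keywords to match directly)
6. Field query: term (queries a single field. Note that term does not analyze the search text. For example, with the default standard analyzer, "火鍋" (hot pot) stored in ES is split into "火/鍋"; a term query for "火" finds it, but a term query for "火鍋" finds nothing, whereas match splits the search text into "火/鍋" as well before searching)
7. Range query: range
Field queries: for structured data such as numbers and dates.
Pagination:
"from": 10,
"size": 10
constant_score: gives every matching document a fixed score.
filter: a query also returns documents that are merely similar and scores them, whereas a filter works like the = operator: a document either matches or it does not, with nothing in between, so filtering is faster.
1. Match everything: match_all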
POST _search
{
"query": {
"match_all": {}
}
}
2. Fuzzy matching: match (roughly like LIKE in SQL)
POST /schools/school/_search
{
"query": {
"match": {
"name":"Saint Paul School"
}
}
}
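With match, the search text is run through the analyzer and the resulting terms are compared with the analyzed terms of the field. In the example above, the name field in /schools/school/ is searched with the terms produced from Saint Paul School; a document matches as soon as name contains any one of those terms.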
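The "any one term is enough" behaviour can be sketched with a toy analyzer. The analyze function below only lowercases and splits on non-alphanumeric characters, which is a rough stand-in for the standard analyzer and an assumption of this sketch; no scoring is modelled.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a-z or 0-9."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def match(field_value, query_text):
    """True if the field shares at least one analyzed term with the query text."""
    return bool(set(analyze(field_value)) & set(analyze(query_text)))


names = ["Central School", "Saint Paul School", "Crescent School"]
hits = [name for name in names if match(name, "Saint Paul School")]
print(hits)
```

All three names are hits here because each contains the term school; in Elasticsearch the document sharing the most terms would be ranked first.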
3. Whole-phrase matching: match_phrase (roughly like = in SQL)
POST /schools/school/_search
{
"query": {
"match_phrase": {
"name":"Saint Paul School"
}
}
}
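With match_phrase, the search text is analyzed and the resulting terms must match the field's analyzed terms consecutively and in order. In the example above, a document in /schools/school/ matches only if name contains the three terms Saint, Paul and School next to each other. This is roughly the = of SQL, except that whitespace is ignored.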
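The consecutive-terms requirement amounts to looking for the query terms as a contiguous run inside the field's terms. A minimal sketch, reusing the same toy analyzer assumption (lowercase and split on non-alphanumerics); match_phrase here is a plain Python function, not the real implementation:

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a-z or 0-9."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def match_phrase(field_value, query_text):
    """True if the query terms occur in the field consecutively and in the same order."""
    field_terms = analyze(field_value)
    query_terms = analyze(query_text)
    width = len(query_terms)
    if width == 0:
        return False
    return any(
        field_terms[start:start + width] == query_terms
        for start in range(len(field_terms) - width + 1)
    )


print(match_phrase("Saint Paul School", "saint   paul"))
print(match_phrase("Saint Paul School", "Paul Saint"))
```

The first call succeeds despite the extra spaces and different case, because both sides are analyzed; the second fails because the order differs.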
4. Multi-field matching: multi_match (searches several fields)
POST /schools/school/_search
{
"query": {
"multi_match": {
"query":"Saint Paul School",
"fields": [
"name","tags"
]
}
}
}
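multi_match runs a fuzzy search over several fields: the text in query is analyzed and matched against each field separately, and fields lists the fields to search.
5. Query-string syntax: query_string (write the keywords to match directly)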
POST /schools/school/_search
{
"query": {
"query_string": {
"query":"Saint Paul School",
"fields": [
"name","tags"
]
}
}
}
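query_string can also search several fields: the text in query is analyzed and its terms are matched separately, and fields lists the fields to search. Unlike multi_match, query is parsed with the Lucene query syntax, so operators such as AND and OR can be written directly in it.
6. Field query: term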
POST /schools/school/_search
{
"query": {
"term": {
"name":"Saint Paul School"
}
}
}
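A term query does not analyze the search text. Against an analyzed text field the value must therefore be a single term without spaces and in lowercase, otherwise nothing is found; the request above, for instance, is not expected to return any hits.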
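Why the value has to be one lowercase term becomes clearer with a toy inverted index: the field is analyzed when it is indexed, but the term value is looked up exactly as given. The analyzer and helper names below are assumptions of this sketch.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase, then split on anything that is not a-z or 0-9."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def build_inverted_index(docs):
    """Map each analyzed term to the set of document IDs containing it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            inverted[token].add(doc_id)
    return inverted


def term_query(inverted, value):
    """Look the value up as-is: no lowercasing and no splitting."""
    return inverted.get(value, set())


names = {1: "Central School", 2: "Saint Paul School", 3: "Crescent School"}
inverted = build_inverted_index(names)
print(term_query(inverted, "Saint Paul School"))
print(term_query(inverted, "saint"))
```

The index only contains keys such as saint, paul and school, so the full name and the capitalized Saint both miss while saint hits.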
7. Range query: range
POST /schools/school/_search
{
"query": {
"range": {
"fees": {
"from": 1000,
"to": 2500
}
}
}
}
Combining several conditions in one query this way does not work; that requires a bool query.
8. bool query
POST /schools/school/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"fees": {
"from": 1000,
"to": 3000
}
}
},
{
"match": {
"name": "School"
}
},
{
"wildcard": {
"zip": {
"value": "17*15"
}
}
}
],
"boost": 1,
"must_not": [
{
"term": {
"name": {
"value": "to"
}
}
}
],
"should": [
{
"match": {
"city": "paprola"
}
}
]
}
}
}
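The way the three clause lists combine can be sketched in plain Python: every must clause has to hold, no must_not clause may hold, and, when must clauses are present, should clauses are optional and only influence ranking. The predicates below are simplified stand-ins for the range, match, wildcard and term clauses of the request above, and the documents are cut-down versions of the bulk data; bool_match is a hypothetical name.

```python
from fnmatch import fnmatchcase


def bool_match(doc, must=(), must_not=(), should=()):
    """Evaluate a toy bool query; each clause is a predicate taking the document.

    Returns (matched, number of should clauses satisfied).
    """
    if not all(clause(doc) for clause in must):
        return False, 0
    if any(clause(doc) for clause in must_not):
        return False, 0
    return True, sum(1 for clause in should if clause(doc))


schools = [
    {"name": "Central School", "city": "paprola", "zip": "176115", "fees": 2000},
    {"name": "Saint Paul School", "city": "Delhi", "zip": "110075", "fees": 5000},
    {"name": "Crescent School", "city": "Jaipur", "zip": "176114", "fees": 2500},
]
must = [
    lambda d: 1000 <= d["fees"] <= 3000,              # range on fees
    lambda d: "school" in d["name"].lower().split(),  # match on name
    lambda d: fnmatchcase(d["zip"], "17*15"),         # wildcard on zip
]
must_not = [lambda d: "to" in d["name"].lower().split()]  # term on name
should = [lambda d: d["city"].lower() == "paprola"]       # match on city

results = [(d["name"], *bool_match(d, must, must_not, should)) for d in schools]
print(results)
```

Only Central School passes all must clauses: Saint Paul School fails the fees range and Crescent School fails the zip wildcard.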
9. Highlighting
POST /schools/school/_search
{
"query": {
"match": {
"name": "Saint school"
}
},
"highlight": {
"fields": {
"name":{}
}
}
}
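10. Pagination: from is the offset of the first hit to return, starting at 0 (a row offset, not a page number!); size is the number of hits to return. The request below starts at the second hit and returns one document.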
POST /schools/school/_search
{
"query": {
"match": {
"name": "Saint school"
}
},
"highlight": {
"fields": {
"name":{}
}
}
, "from": 1
, "size": 1
}
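Because from is an offset rather than a page number, a user-facing page number has to be converted first. A small sketch, assuming pages are numbered from 1; page_to_from_size is a hypothetical helper:

```python
def page_to_from_size(page, page_size):
    """Convert a 1-based page number into the from/size pair used by _search."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


print(page_to_from_size(1, 10))
print(page_to_from_size(2, 1))
```

The second call reproduces the from and size of the request above: page 2 with one hit per page.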
11. Filtered queries: multiple filter clauses, like multiple sort clauses, are given as an array.
POST /schools/school/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "school"
}
}
],
"filter":[{
"exists": {
"field": "name"
}
},
{
"range": {
"fees": {
"from": 10,
"to": 2000
}
}
}
]
}
}
, "from": 1
, "size": 10
, "sort": [
{
"fees": {
"order": "desc"
}
}
]
}
11.1 ids filter
11.2 range filter
11.3 exists filter
11.4 term/terms filter
POST /schools/school/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "school"
}
}
],
"filter":[{
"exists": {
"field": "name"
}
},
{
"range": {
"fees": {
"from": 10,
"to": 5000
}
}
},
{
"ids":{
"values":[1,2,3]
}
},{
"term":{
"street":"tonk"
}
}
]
}
}
, "from": 0
, "size": 10
, "sort": [
{
"fees": {
"order": "desc"
}
}
]
}
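12. Aggregations
Aggregations group your data and compute statistics over it. The simplest way to understand them is as a rough equivalent of SQL's GROUP BY combined with SQL's aggregate functions.
Common aggregations in ES:
Metric aggregations: mainly for numeric fields; they require ES to do a fair amount of computation.
Bucketing aggregations: define a number of "buckets" and assign the documents to them, much like GROUP BY in SQL.
Format of the aggregations API in ES:
"aggregations" : { // the aggregation section; aggs can be used instead
"<aggregation_name>" : { // aggregation name, any string; used as the key in the response so the result is easy to find
"<aggregation_type>" : { // aggregation type, e.g. min
<aggregation_body> // aggregation body; each type has its own body
}
[,"aggregations" : { [<sub_aggregation>]+ } ]? // nested sub-aggregations, zero or more
}
[,"<aggregation_name_2>" : { ... } ]* // further aggregations, zero or more
}
1. Metric aggregations
A. avg (average), min (minimum), max (maximum), sum (total), and stats (the four above in one result, plus count)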
POST /schools/school/_search
{
"query": {
"match": {
"name": "Saint school"
}
},
"highlight": {
"fields": {
"name": {}
}
},
"aggregations":
{
"fees_avg": {
"avg": {
"field": "fees"
}
}, "fees_min": {
"min": {
"field": "fees"
}
}, "fees_max": {
"max": {
"field": "fees"
}
}, "fees_sum": {
"sum": {
"field": "fees"
}
}, "fees_stats": {
"stats": {
"field": "fees"
}
}
}
,
"from": 0,
"size": 10
}
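The figures that stats reports are easy to reproduce by hand. The sketch below computes them over the fees of the three bulk-inserted schools, ignoring the query part of the request (an assumption made to keep the example small); stats is a plain Python function named after the aggregation.

```python
def stats(values):
    """Compute the figures a stats aggregation reports: count, min, max, avg, sum."""
    values = list(values)
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


fees = [2000, 5000, 2500]
fees_stats = stats(fees)
print(fees_stats)
```

The sketch expects at least one value; an empty list would raise an error here.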
2. Bucketing aggregations
Custom range aggregation (range): from is inclusive, to is exclusive.
POST /schools/school/_search
{
"query": {
"match": {
"name": "Saint school"
}
},
"highlight": {
"fields": {
"name": {}
}
},
"aggregations": {
"fees_range": {
"range": {
"field": "fees",
"ranges": [
{
"from": 0,
"to": 2000
},
{
"from": 2000,
"to": 3000
},
{
"from": 3000,
"to": 5001
}
]
}
}
},
"from": 0,
"size": 10
}
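The inclusive from and exclusive to decide where boundary values such as 2000 land. A minimal sketch over the fees of the three schools, again ignoring the query part; range_buckets is a hypothetical helper and its bucket keys are chosen for readability, not copied from a real response.

```python
def range_buckets(values, ranges):
    """Count values per (from, to) range; from is inclusive, to is exclusive."""
    counts = {f"{low}-{high}": 0 for low, high in ranges}
    for value in values:
        for low, high in ranges:
            if low <= value < high:
                counts[f"{low}-{high}"] += 1
    return counts


buckets = range_buckets([2000, 5000, 2500], [(0, 2000), (2000, 3000), (3000, 5001)])
print(buckets)
```

The fee of exactly 2000 falls into the 2000-3000 bucket, and the upper bound of 5001 in the last range is what lets the fee of 5000 be counted.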
Grouping by field value (terms): by default a text field cannot be used as the field.
POST /schools/school/_search
{
"query": {
"match": {
"name": "Saint school"
}
},
"highlight": {
"fields": {
"name": {}
}
},
"aggregations": {
"fees_term": {
"terms": {
"field": "location",
"size":3
}
}
},
"from": 0,
"size": 10
}
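Date range aggregation (Date Range Aggregation)
# The date range aggregation is dedicated to date fields; its main difference from the range aggregation is that its bounds can be date math expressions.
# now+10y: 10 years from now.
# now+10M: 10 months from now.
# 1990-01-01||+20y: 20 years after 1990-01-01, i.e. 2010-01-01.
# now/y: now rounded to the year.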
POST /schools/school/_search
{
"query": {
"match": {
"name": "Saint school"
}
},
"highlight": {
"fields": {
"name": {}
}
},
"aggregations": {
"fees_term": {
"terms": {
"field": "location",
"size":3
}
},
"time_aggs":{
"date_range":{
"field":"TimeFormat",
"format":"yyyy-MM-dd",
"ranges":[
{
"from":"now/y",
"to":"now"
},
{
"from":"now/y-1y",
"to":"now/y"
},
{
"from":"now/y-3y",
"to":"now/y-1y"
}
]
}
}
},
"from": 0,
"size": 10
}
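The three ranges in the request can be worked out by hand for a fixed "now". The sketch below does that with Python's datetime, assuming that /y rounds down to the start of the year and that the operations apply left to right, so now/y-1y means "start of this year, minus one year". Time zones are ignored, and the helper names are hypothetical.

```python
from datetime import datetime


def start_of_year(moment):
    """/y: round down to the first instant of the year (assumed rounding direction)."""
    return moment.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)


def minus_years(moment, years):
    """-Ny: subtract whole calendar years; Feb 29 falls back to Feb 28 in this sketch."""
    try:
        return moment.replace(year=moment.year - years)
    except ValueError:
        return moment.replace(year=moment.year - years, day=28)


def date_ranges(now):
    """Return the (from, to) pairs of the three ranges in the request above."""
    this_year = start_of_year(now)
    return [
        (this_year, now),                                        # now/y    .. now
        (minus_years(this_year, 1), this_year),                  # now/y-1y .. now/y
        (minus_years(this_year, 3), minus_years(this_year, 1)),  # now/y-3y .. now/y-1y
    ]


for low, high in date_ranges(datetime(2016, 9, 8, 15, 20, 30)):
    print(low, "->", high)
```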
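Histogram aggregation (Histogram Aggregation)
# The histogram aggregation splits the values of a numeric field into equal-width intervals and counts the documents that fall into each one. It closely resembles the range aggregation above,
# except that range allows arbitrary intervals while histogram uses equal spacing. Equal spacing requires a width parameter, which is interval.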
POST /schools/school/_search
{
"query": {
"match": {
"name": "Saint school"
}
},
"highlight": {
"fields": {
"name": {}
}
},
"aggregations": {
"fees_aggs":{
"histogram":{
"field":"fees",
"interval":1000
}
}, "time_agg":{
"date_histogram":{
"field":"TimeFormat",
"interval":"year",
"format":"yyyy-MM-dd"
}
}
},
"from": 0,
"size": 10
}
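Equal-width bucketing comes down to rounding each value down to a multiple of interval. A minimal sketch over the fees of the three schools with interval 1000, ignoring the query part of the request; histogram is a plain Python function here, and unlike a real response it lists only the non-empty buckets.

```python
import math
from collections import Counter


def histogram(values, interval):
    """Count values per bucket; the bucket key is floor(value / interval) * interval."""
    counts = Counter(math.floor(value / interval) * interval for value in values)
    return dict(sorted(counts.items()))


fee_buckets = histogram([2000, 5000, 2500], 1000)
print(fee_buckets)
```

Both 2000 and 2500 round down to the key 2000, so that bucket holds two documents.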