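Chapter 2: Getting Started with Elasticsearch
Indexing documents
There are many ways to add data. Put simply, you place JSON-formatted data into an Elasticsearch index.
The requests below are all issued from the shell with curl, which may look a little unpolished; you can run the same requests directly in Kibana instead.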
Append ?pretty or ?pretty=true to any request and the response is printed in the shell as formatted JSON, for example:
curl -X GET "localhost:9200/bank/_mapping?pretty" -H 'Content-Type: application/json'
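Indexing a single document
Use a PUT request to add a document to the specified index. If the customer index does not exist yet, the request creates it automatically and then adds the document with ID 1.
Command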
curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d' { "name": "John Doe" } '
Response
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
Retrieving a document by ID
curl -X GET "localhost:9200/customer/_doc/1?pretty"
Response
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 26,
"_primary_term" : 4,
"found" : true,
"_source" : {
"name": "John Doe"
}
}
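Bulk indexing documents
If you need to index more than one document at a time, use the bulk API, which is considerably faster than submitting the documents one request at a time. The best batch size depends on the size and complexity of the documents, the indexing and search load, and the resources available to the cluster.
The sample data used below (accounts.json) is provided officially by Elasticsearch.
Commands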
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"
Result (the meaning of each column is covered in detail later)
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 128.6kb 128.6kb
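The bulk request above sends a newline-delimited file. As a rough sketch of what such a file looks like, the script below writes a two-document file in the same layout: one action line followed by one source line per document. The file name bulk-sample.ndjson, the index name bank_demo, the helper send_bulk, and the document values are all made up for illustration; they are not part of the official sample data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name for this sketch; the real sample file is accounts.json.
bulk_file="bulk-sample.ndjson"

# Each document takes two lines: an action line, then the document source.
# The final line must also end with a newline, which the heredoc guarantees.
cat > "$bulk_file" <<'EOF'
{"index":{"_id":"1"}}
{"account_number":1,"balance":1000,"firstname":"Jane","lastname":"Roe","age":30}
{"index":{"_id":"2"}}
{"account_number":2,"balance":2500,"firstname":"John","lastname":"Doe","age":41}
EOF

# send_bulk INDEX FILE
# Posts FILE to the _bulk endpoint of INDEX on a local node.
# --data-binary keeps the newlines intact, which the bulk format relies on.
send_bulk() {
  curl -H "Content-Type: application/json" -XPOST \
    "localhost:9200/$1/_bulk?pretty&refresh" --data-binary "@$2"
}

# Usage (needs Elasticsearch listening on localhost:9200):
#   send_bulk bank_demo "$bulk_file"
```

Writing to a separate index such as bank_demo keeps the experiment from overwriting documents 1 and 2 of the bank index.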
Viewing the index mapping
Command
curl 'http://localhost:9200/bank/_mapping'
Response
{
"bank": {
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"balance": {
"type": "long"
},
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"employer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
Simple search
Every search request is independent: Elasticsearch does not keep any state information between requests.
Command
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
'
Paginating with from and size
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
],
"from": 10,
"size": 10
}
'
Response
{
"took" : 160,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "bank",
"_type" : "_doc",
"_id" : "0",
"_score" : null,
"_source" : {
"account_number" : 0,
"balance" : 16623,
"firstname" : "Bradshaw",
"lastname" : "Mckenzie",
"age" : 29,
"gender" : "F",
"address" : "244 Columbus Place",
"employer" : "Euron",
"email" : "[email protected]",
"city" : "Hobucken",
"state" : "CO"
},
"sort" : [
0
]
},
# remaining results omitted
]
}
}
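Explanation of the response
- took: how long Elasticsearch took to run the query, in milliseconds
- timed_out: whether the search request timed out (a timeout can be set on the request)
- _shards: how many shards were searched, and how many succeeded, failed, or were skipped (this matters in distributed deployments)
- max_score: the score of the most relevant document found
- hits.total.value: how many matching documents were found
- hits.sort: the document's sort position (present when the results are sorted by a specified field instead of by the default relevance score)
- hits._score: the document's relevance score (null here, because the results are sorted by account_number)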
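For the from and size pagination shown earlier, from is the number of hits to skip, so a 1-based page number maps to from = (page - 1) * size. The helper below is a minimal sketch of that arithmetic; page_body is a hypothetical name introduced here, and the query and sort are copied from the request in this section.

```shell
#!/usr/bin/env bash
set -euo pipefail

# page_body PAGE SIZE
# Prints a search body for the given 1-based page number.
# from is the number of hits to skip: (PAGE - 1) * SIZE.
page_body() {
  local page="$1" size="$2"
  local from=$(( (page - 1) * size ))
  printf '{"query":{"match_all":{}},"sort":[{"account_number":"asc"}],"from":%d,"size":%d}\n' \
    "$from" "$size"
}

# Page 2 with 10 hits per page skips the first 10 hits,
# which matches "from": 10, "size": 10 in the request shown earlier.
page_body 2 10

# Usage (needs Elasticsearch listening on localhost:9200):
#   page_body 3 10 | curl -X GET "localhost:9200/bank/_search?pretty" \
#     -H 'Content-Type: application/json' -d @-
```

Keeping the sort fixed across pages matters here: without a stable sort order, consecutive pages could overlap or skip documents.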
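match query (the behaviour depends on the field type; later articles cover this in detail)
Matches documents in which a field contains any of the given words. Here, the address field contains mill or lane.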
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match": { "address": "mill lane" } }
}'
match_phrase query
Phrase search: the field must contain the whole phrase. Here, the address field contains the phrase mill lane.
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": { "match_phrase": { "address": "mill lane" } }
}
'
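To make the difference between the two queries more concrete, here is a sketch of a variant that sits between them: a match query with its operator parameter set to and, which requires both words to appear in the field but not necessarily next to each other. The file name match-and.json and the helper run_search are hypothetical names chosen for this sketch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name for the query body.
query_file="match-and.json"

# match with the default operator (or): address contains mill OR lane.
# match with "operator": "and":         address contains mill AND lane, anywhere.
# match_phrase:                         address contains "mill lane" as a phrase.
cat > "$query_file" <<'EOF'
{
  "query": {
    "match": {
      "address": {
        "query": "mill lane",
        "operator": "and"
      }
    }
  }
}
EOF

# run_search FILE
# Sends the query body in FILE to the bank index on a local node.
run_search() {
  curl -X GET "localhost:9200/bank/_search?pretty" \
    -H 'Content-Type: application/json' -d "@$1"
}

# Usage (needs Elasticsearch listening on localhost:9200):
#   run_search "$query_file"
```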
Combining multiple query criteria with a bool query
The command below expresses: age == 40 && state != "ID"
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
'
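Each criterion in a bool query is designated as required, desirable, or undesirable (roughly analogous to AND, OR, and NOT):
- must: results must satisfy the criterion. It affects the relevance score.
- should: results should satisfy the criterion but do not have to. It affects the relevance score.
- must_not: results must not satisfy the criterion. It does not affect the relevance score.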
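The three clause types above can be combined in one request. The following is a minimal sketch that adds a should clause to the earlier example, so that 40-year-olds outside ID whose address contains lane score higher without excluding the others. The should criterion, the file name bool-query.json, and the helper run_search are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name for the query body.
query_file="bool-query.json"

# must:     required, contributes to the score.
# should:   optional here (a must clause is present), but boosts the score when it matches.
# must_not: excludes documents, does not contribute to the score.
cat > "$query_file" <<'EOF'
{
  "query": {
    "bool": {
      "must":     [ { "match": { "age": "40" } } ],
      "should":   [ { "match": { "address": "lane" } } ],
      "must_not": [ { "match": { "state": "ID" } } ]
    }
  }
}
EOF

# run_search FILE
# Sends the query body in FILE to the bank index on a local node.
run_search() {
  curl -X GET "localhost:9200/bank/_search?pretty" \
    -H 'Content-Type: application/json' -d "@$1"
}

# Usage (needs Elasticsearch listening on localhost:9200):
#   run_search "$query_file"
```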
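Analyzing results with aggregations
- group_by_state: uses a terms aggregation to group all accounts in the bank index by the state field.
  - If order is specified, the buckets are sorted accordingly; otherwise they are returned in descending order of bucket size. Only 10 buckets are returned by default.
- average_balance: uses an avg aggregation on the balance field to compute the average account balance for each state.
Beyond these basic aggregations, Elasticsearch also provides specialized aggregations that operate on multiple fields and analyze particular types of data, such as IP addresses, dates, and geo data.
Request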
curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
'
Response
{
"took" : 255,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1000,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_state" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 827,
"buckets" : [
{
"key" : "CO",
"doc_count" : 14,
"average_balance" : {
"value" : 32460.35714285714
}
},
# remaining buckets omitted
]
}
}
}
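For comparison with the ordered request above, the sketch below shows the same terms aggregation without an order clause, so the buckets come back in the default order (largest bucket first). It also spells out size, the terms parameter that controls how many buckets are returned and that defaults to 10. The file name terms-default.json and the helper run_search are hypothetical names used only for this sketch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name for the request body.
agg_file="terms-default.json"

# The top-level "size": 0 suppresses the search hits so only the
# aggregation is returned. The "size" inside "terms" is the number
# of buckets to return; 10 is also the default when it is left out.
cat > "$agg_file" <<'EOF'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "size": 10
      }
    }
  }
}
EOF

# run_search FILE
# Sends the request body in FILE to the bank index on a local node.
run_search() {
  curl -X GET "localhost:9200/bank/_search?pretty" \
    -H 'Content-Type: application/json' -d "@$1"
}

# Usage (needs Elasticsearch listening on localhost:9200):
#   run_search "$agg_file"
```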