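Installing Kibana
Kibana is installed here mainly for its Dev Tools console, which makes it convenient to send requests to Elasticsearch (ES).
The setup below uses docker-compose and includes a two-node Elasticsearch pseudo-cluster: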
version: '3.7'
networks:
  esnet:
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - http.cors.enabled=true
      - http.cors.allow-origin=*
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet
  elasticsearch2:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch2
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=elasticsearch"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata2:/usr/share/elasticsearch/data
    networks:
      - esnet
  kibana:
    image: docker.elastic.co/kibana/kibana:6.5.1
    environment:
      - SERVER_NAME=kibana
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - XPACK_MONITORING_ENABLED=true
    ports:
      - 5601:5601
    networks:
      - esnet
volumes:
  esdata1:
    driver: local
  esdata2:
    driver: local
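Once the stack is running, one way to confirm that both nodes joined the cluster is to look at the body returned by GET _cluster/health. The sketch below only interprets such a body. The helper name cluster_ready and the sample values are assumptions for illustration, not captured output.

```python
def cluster_ready(health, expected_nodes=2):
    """Return True if the health body reports enough nodes and a non-red status."""
    enough_nodes = health.get("number_of_nodes", 0) >= expected_nodes
    usable = health.get("status") in ("green", "yellow")
    return enough_nodes and usable


# Illustrative body shaped like a _cluster/health response (values are assumed).
sample_health = {
    "cluster_name": "docker-cluster",
    "status": "green",
    "number_of_nodes": 2,
}
print(cluster_ready(sample_health))
```

If number_of_nodes stays at 1, the second node most likely did not find the first one through discovery.zen.ping.unicast.hosts.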
Index API
Creating an index
PUT employee
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}
An index can be customized when it is created; see "index settings" in the official documentation for the available options.
Deleting an index
DELETE employee
Inserting a document
If the index needs no custom settings, the following API creates the index automatically and inserts the data:
PUT /employee/_doc/1
{
"name" : "zhangsan",
"age" : 28,
"signature":"I like watching movies",
"hobby" : ["book","music"]
}
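This PUT request adds one employee record to ES. All data in ES is stored in indices. Starting with 6.x, an index can contain only one type, and "_doc" is the recommended name for it, because 7.x deprecates the concept of types and "_doc" is the only value accepted in the position where older APIs took a type.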
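Outside the Kibana console the same request can be sent by any HTTP client. The following is a minimal sketch using only the Python standard library. It builds the request without sending it, and assumes ES is reachable at http://localhost:9200 (the port published by the compose file). The helper name build_index_request is hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: port 9200 published by the compose file


def build_index_request(index, doc_id, document, doc_type="_doc"):
    """Build (but do not send) a PUT request that indexes one document."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    # ES 6.x rejects request bodies that lack an explicit Content-Type header.
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


employee = {
    "name": "zhangsan",
    "age": 28,
    "signature": "I like watching movies",
    "hobby": ["book", "music"],
}
request = build_index_request("employee", 1, employee)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```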
Query API
Fetching a single document
GET /employee/_doc/1
Checking whether a document exists
HEAD employee/_doc/1
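Simple search
With no conditions, the _search API matches every document in the index (only the first 10 hits are returned by default):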
GET employee/_doc/_search
A condition can also be added with the q parameter:
GET employee/_doc/_search?q=name:zhangsan
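When the q parameter is built outside the Kibana console, the value should be URL-encoded, since the colon and any spaces are not safe in a query string. A small sketch with the standard library follows; search_path is a hypothetical helper name.

```python
from urllib.parse import urlencode


def search_path(index, query_string, doc_type="_doc"):
    """Return the _search path with the q parameter URL-encoded."""
    params = urlencode({"q": query_string})
    return f"/{index}/{doc_type}/_search?{params}"


path = search_path("employee", "name:zhangsan")
print(path)
```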
Searching with the query DSL
Matching all documents
GET /employee/_doc/_search
{
"query":{
"match_all":{ }
}
}
Query with a condition
Find the employees in the employee index whose name is zhangsan:
GET /employee/_doc/_search
{
"query":{
"match":{
"name" : "zhangsan"
}
}
}
Query with a condition and a filter
Find the employees in the employee index whose age is greater than 27 and whose name is zhangsan:
GET /employee/_doc/_search
{
"query": {
"bool": {
"must": {
"match": {
"name": "zhangsan"
}
},
"filter": {
"range": {
"age": {
"gt": 27
}
}
}
}
}
}
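- must holds the conditions that a document has to match
- filter holds the filter conditions, which do not affect the score; range is a range filter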
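When such a query is assembled in application code, the body is just a nested dictionary. The sketch below rebuilds the request body shown above; the function name bool_query and its parameters are illustrative, not part of any client library.

```python
import json


def bool_query(match_field, match_value, range_field, gt):
    """Build a bool query with one must clause (match) and one filter clause (range)."""
    return {
        "query": {
            "bool": {
                # must: the document has to match, and the match contributes to _score
                "must": {"match": {match_field: match_value}},
                # filter: the document has to match, but the clause is not scored
                "filter": {"range": {range_field: {"gt": gt}}},
            }
        }
    }


body = bool_query("name", "zhangsan", "age", 27)
print(json.dumps(body, indent=2))
```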
Full-text search
Find the employees whose signature contains dislike watching movies:
GET /employee/_doc/_search
{
"query":{
"match":{
"signature":"dislike watching movies"
}
}
}
Here is the hits part of the result:
"hits" : {
"total" : 3,
"max_score" : 1.1064433,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.1064433,
"_source" : {
"name" : "lisi",
"age" : 27,
"signature" : "I dislike watching movies,I like reading",
"hobby" : [
"movie",
"music"
]
}
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.29748765,
"_source" : {
"name" : "zhangsan",
"age" : 28,
"signature" : "I like watching movies",
"hobby" : [
"book",
"music"
]
}
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.27407023,
"_source" : {
"name" : "wangwu",
"age" : 26,
"signature" : "I also like watching movies",
"hobby" : [
"book",
"game"
]
}
}
]
}
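Not all three signatures contain the full text dislike watching movies, but every returned signature contains at least one of its words. The employees also have different _score values: lisi, whose signature contains all three words, scores highest, and the three employees are listed in descending order of _score. The _score is the relevance score of the document.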
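The matching behaviour can be illustrated with a toy model: split the query and the field into words, and treat a document as a hit if it shares at least one word with the query. This is only a sketch. The tokenizer is a rough stand-in for the analyzer ES applies, and counting shared words is not how ES computes _score.

```python
import re


def tokenize(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matched_terms(query, field_value):
    """Return the query words that also occur in the field."""
    field_tokens = set(tokenize(field_value))
    return [term for term in tokenize(query) if term in field_tokens]


signatures = {
    "lisi": "I dislike watching movies,I like reading",
    "zhangsan": "I like watching movies",
    "wangwu": "I also like watching movies",
}
for name, signature in signatures.items():
    print(name, matched_terms("dislike watching movies", signature))
```

All three documents share at least one word with the query, which is why all three are returned, and only lisi shares all three.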
Exact phrase matching
GET /employee/_doc/_search
{
"query":{
"match_phrase":{
"signature":"dislike watching movies"
}
}
}
The condition is the same as in the previous query; only the query type changes from match to match_phrase. Again, here is the hits part of the result:
"hits" : {
"total" : 1,
"max_score" : 1.1064433,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.1064433,
"_source" : {
"name" : "lisi",
"age" : 27,
"signature" : "I dislike watching movies,I like reading",
"hobby" : [
"movie",
"music"
]
}
}
]
}
Only one result is returned, and its signature field contains the phrase dislike watching movies.
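The difference from match can be sketched as a check that the query words appear consecutively and in the same order. As before, this is a toy model with a simplified tokenizer, not the actual ES implementation.

```python
import re


def tokenize(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(phrase, field_value):
    """True if the phrase words appear consecutively and in order in the field."""
    wanted = tokenize(phrase)
    tokens = tokenize(field_value)
    width = len(wanted)
    return any(tokens[i:i + width] == wanted for i in range(len(tokens) - width + 1))


signatures = {
    "lisi": "I dislike watching movies,I like reading",
    "zhangsan": "I like watching movies",
    "wangwu": "I also like watching movies",
}
phrase_hits = [name for name, text in signatures.items()
               if contains_phrase("dislike watching movies", text)]
print(phrase_hits)
```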
Highlighted search
The API is highlight; note that it sits at the same level as query.
GET /employee/_doc/_search
{
"query":{
"match":{
"signature":"dislike watching movies"
}
},
"highlight":{
"fields":{
"signature":{}
}
}
}
Here is the result:
"hits" : {
"total" : 3,
"max_score" : 1.1064433,
"hits" : [
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.1064433,
"_source" : {
"name" : "lisi",
"age" : 27,
"signature" : "I dislike watching movies,I like reading",
"hobby" : [
"movie",
"music"
]
},
"highlight" : {
"signature" : [
"I <em>dislike</em> <em>watching</em> <em>movies</em>,I like reading"
]
}
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.29748765,
"_source" : {
"name" : "zhangsan",
"age" : 28,
"signature" : "I like watching movies",
"hobby" : [
"book",
"music"
]
},
"highlight" : {
"signature" : [
"I like <em>watching</em> <em>movies</em>"
]
}
},
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.27407023,
"_source" : {
"name" : "wangwu",
"age" : 26,
"signature" : "I also like watching movies",
"hobby" : [
"book",
"game"
]
},
"highlight" : {
"signature" : [
"I also like <em>watching</em> <em>movies</em>"
]
}
}
]
}
Each result now has an additional highlight field, in which the words of the target field that match the query are wrapped in <em> tags.
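A client usually only needs the highlighted fragments. The sketch below pulls them out of a parsed response. The response dict is a trimmed copy of the result above (only the keys used here), and highlights is a hypothetical helper name.

```python
def highlights(response, field):
    """Map each hit's _id to its highlighted fragments for one field."""
    result = {}
    for hit in response["hits"]["hits"]:
        # Hits without a highlight entry yield an empty list.
        result[hit["_id"]] = hit.get("highlight", {}).get(field, [])
    return result


response = {
    "hits": {
        "hits": [
            {"_id": "2", "highlight": {"signature": [
                "I <em>dislike</em> <em>watching</em> <em>movies</em>,I like reading"]}},
            {"_id": "1", "highlight": {"signature": [
                "I like <em>watching</em> <em>movies</em>"]}},
        ]
    }
}
fragments = highlights(response, "signature")
print(fragments["1"])
```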
Aggregations
This example finds the most popular hobby:
GET /employee/_doc/_search
{
"aggs":{
"all_hobby":{
"terms":{
"field":"hobby"
}
}
}
}
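- aggs starts the aggregation section
- all_hobby is the name of this aggregation; it can be anything
- terms is the aggregation that groups documents by the distinct values (terms) of a field and counts the documents in each group
- field names the field to aggregate on
However, executing this request fails with an error:
"root_cause": [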
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [hobby] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
As the message says, fielddata is disabled on text fields by default and has to be enabled manually:
PUT /employee/_mapping/_doc
{
"properties":{
"hobby":{
"type":"text",
"fielddata":true
}
}
}
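With fielddata set to true on hobby, ES builds an in-memory structure for the field by uninverting its inverted index, so that the field can be used for aggregations, statistics and similar operations. This structure can use a significant amount of memory, so avoid enabling it on fields that hold a lot of data. Alternatively, use a keyword field for aggregations (dynamic mapping creates a hobby.keyword sub-field for string values by default), like this: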
GET /employee/_doc/_search
{
"aggs":{
"all_hobby":{
"terms":{
"field":"hobby.keyword"
}
}
}
}
Running the aggregation again gives the following buckets:
"buckets" : [
{
"key" : "book",
"doc_count" : 2
},
{
"key" : "music",
"doc_count" : 2
},
{
"key" : "game",
"doc_count" : 1
},
{
"key" : "movie",
"doc_count" : 1
}
]
The buckets show that book and music are tied as the most popular hobby, with two employees each.
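The bucket counts can be reproduced by hand from the three documents that appear in the search results earlier. The following sketch imitates what the terms aggregation counts (documents per distinct value). It is a plain-Python illustration, not ES code, and terms_counts is a hypothetical name.

```python
from collections import Counter

employees = [
    {"name": "zhangsan", "age": 28, "hobby": ["book", "music"]},
    {"name": "lisi", "age": 27, "hobby": ["movie", "music"]},
    {"name": "wangwu", "age": 26, "hobby": ["book", "game"]},
]


def terms_counts(docs, field):
    """Count how many documents contain each distinct value of a field."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))  # a document counts once per value
    return counts


counts = terms_counts(employees, "hobby")
top = max(counts.values())
most_popular = sorted(key for key, n in counts.items() if n == top)
print(most_popular)
```

Both book and music reach a count of 2, so the top position is shared.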
Aggregations can also be combined with query, in which case they are computed over the query results, like this:
GET /employee/_doc/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gt": 26
}
}
}
}
},
"aggs": {
"all_hobby": {
"terms": {
"field": "hobby.keyword"
}
}
}
}
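To see what this combination computes, the same filter-then-count logic can be applied by hand to the three sample employees from the earlier results. This is a plain-Python sketch of the semantics, with filtered_terms as a hypothetical name.

```python
from collections import Counter

employees = [
    {"name": "zhangsan", "age": 28, "hobby": ["book", "music"]},
    {"name": "lisi", "age": 27, "hobby": ["movie", "music"]},
    {"name": "wangwu", "age": 26, "hobby": ["book", "game"]},
]


def filtered_terms(docs, field, age_gt):
    """Count field values only over documents whose age is greater than age_gt."""
    counts = Counter()
    for doc in docs:
        if doc["age"] > age_gt:  # mirrors the range filter with "gt"
            counts.update(set(doc[field]))
    return counts


counts = filtered_terms(employees, "hobby", 26)
print(dict(counts))
```

Only zhangsan and lisi pass the filter, so game (which only wangwu has) does not appear in the counts.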
Nested aggregations
For example, to compute the average age of the employees in each hobby:
GET /employee/_doc/_search
{
"aggs":{
"all_hobby":{
"terms":{
"field":"hobby.keyword"
},
"aggs":{
"avg_age":{
"avg":{
"field":"age"
}
}
}
}
}
}
The result:
"buckets" : [
{
"key" : "book",
"doc_count" : 2,
"avg_age" : {
"value" : 27.0
}
},
{
"key" : "music",
"doc_count" : 2,
"avg_age" : {
"value" : 27.5
}
},
{
"key" : "game",
"doc_count" : 1,
"avg_age" : {
"value" : 26.0
}
},
{
"key" : "movie",
"doc_count" : 1,
"avg_age" : {
"value" : 27.0
}
}
]
The result looks somewhat complicated, so take the first bucket:
{
"key" : "book",
"doc_count" : 2,
"avg_age" : {
"value" : 27.0
}
}
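- key is the term being counted, taken from the values of the field
- doc_count is the count for the bucket, that is, how many employees have this term in the field
- avg_age is the name of the inner aggregation, chosen in the query
- value is the average age, computed over the parent bucket; here it is the average age of the two employees whose hobby contains book.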
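The averages can be checked by hand against the three employees from the earlier search results. The sketch below groups ages by hobby and averages each group, which mirrors the terms bucket with its inner avg aggregation. It is plain Python for illustration, and avg_age_by_hobby is a hypothetical name.

```python
from collections import defaultdict

employees = [
    {"name": "zhangsan", "age": 28, "hobby": ["book", "music"]},
    {"name": "lisi", "age": 27, "hobby": ["movie", "music"]},
    {"name": "wangwu", "age": 26, "hobby": ["book", "game"]},
]


def avg_age_by_hobby(docs):
    """Group ages by hobby (outer bucket), then average each group (inner aggregation)."""
    ages = defaultdict(list)
    for doc in docs:
        for hobby in set(doc["hobby"]):
            ages[hobby].append(doc["age"])
    return {hobby: sum(values) / len(values) for hobby, values in ages.items()}


averages = avg_age_by_hobby(employees)
print(averages["book"], averages["music"])
```

For book, the two employees are zhangsan (28) and wangwu (26), which gives the 27.0 shown in the first bucket.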