Elasticsearch 6.5.1 Study Notes (2): Basic APIs

Installing Kibana

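Kibana is installed mainly for its Dev Tools console, which makes it convenient to send requests to ES.
Everything is installed with docker-compose, including a pseudo-cluster made up of two Elasticsearch nodes: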

version: '3.7'
networks:
  esnet:
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - http.cors.enabled=true
      - http.cors.allow-origin=*
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - esnet
  elasticsearch2:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    container_name: elasticsearch2
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=elasticsearch"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata2:/usr/share/elasticsearch/data
    networks:
      - esnet
  kibana: 
    image: docker.elastic.co/kibana/kibana:6.5.1
    environment: 
      - SERVER_NAME=kibana
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - XPACK_MONITORING_ENABLED=true
    ports: 
      - 5601:5601
    networks: 
      - esnet
volumes:
  esdata1:
    driver: local
  esdata2:
    driver: local

Index API

Creating an index

PUT employee
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

The index can be given custom settings when it is created; for the full list of options, see "index settings" in the official documentation.
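The same request can also be issued outside Kibana. The following is a minimal Python sketch, assuming Elasticsearch is reachable at http://localhost:9200 (the port published by the compose file); ES_URL and build_request are names made up for this illustration, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: the compose file publishes Elasticsearch on localhost:9200.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request equivalent to a console line."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # Elasticsearch 6.x requires an explicit content type for request bodies.
        headers["Content-Type"] = "application/json"
    url = ES_URL + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Equivalent of the console request: PUT employee { "settings": {...} }
create_index = build_request(
    "PUT",
    "employee",
    {"settings": {"number_of_shards": 1, "number_of_replicas": 1}},
)

# To actually send it (requires the cluster to be running):
# with urllib.request.urlopen(create_index) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch usable even when no cluster is running.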

Deleting an index

DELETE employee

Inserting a document

If the index does not need custom settings, the following API creates the index automatically and inserts the data:

PUT /employee/_doc/1
{
  "name" : "zhangsan",
  "age" : 28,
  "signature":"I like watching movies",
  "hobby" : ["book","music"]
}

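Here a PUT request adds one employee record to ES. All data in ES is stored in an index. Starting with 6.x, an index can contain only one type, and the recommended name for it is "_doc": types are deprecated in 7.x, where the type position in the old API paths can only be "_doc".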

Query API

Retrieving a single document

GET /employee/_doc/1

Checking whether a single document exists

HEAD employee/_doc/1

Simple search

Without any condition, the _search API matches every document in the index:

GET employee/_doc/_search

A query condition can also be added with the q parameter:

GET employee/_doc/_search?q=name:zhangsan
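When the same URI search is sent from code rather than from the console, the q value should be URL-encoded. A small sketch using only the standard library; search_url is a hypothetical helper name, not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, **params):
    """Return the path and query string for a URI search (hypothetical helper)."""
    path = f"/{index}/{doc_type}/_search"
    # urlencode percent-encodes reserved characters such as ':' and spaces.
    return path + ("?" + urlencode(params) if params else "")


url = search_url("employee", "_doc", q="name:zhangsan")
print(url)  # /employee/_doc/_search?q=name%3Azhangsan
```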

Searching with the query DSL

Match-all query

GET /employee/_doc/_search
{
  "query":{
    "match_all":{   }
  }
}

Query with a condition

Find the employees in the employee index whose name is zhangsan:

GET /employee/_doc/_search
{
  "query":{
    "match":{
      "name" : "zhangsan"
    }
  }
}

Query with a condition and a filter

Find the employees in the employee index whose age is greater than 27 and whose name is zhangsan:

GET /employee/_doc/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "zhangsan"
        }
      },
      "filter": {
        "range": {
          "age": {
            "gt": 27
          }
        }
      }
    }
  }
}
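  1. must holds the conditions that a document has to match; they also contribute to its relevance score
  2. filter holds the filter conditions, which only include or exclude documents without affecting the score; range is a range filter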
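The combined effect of must and filter can be imitated locally on the three sample employees that appear in the search results of these notes. This is only a simplified sketch: the match clause is reduced to an exact comparison, no scoring is done, and bool_search is a made-up name.

```python
# Sample data taken from the search results shown in these notes.
EMPLOYEES = [
    {"name": "zhangsan", "age": 28},
    {"name": "lisi", "age": 27},
    {"name": "wangwu", "age": 26},
]


def bool_search(docs, must_name, age_gt):
    """Keep documents that satisfy both the must clause and the range filter."""
    hits = []
    for doc in docs:
        matches_must = doc["name"] == must_name  # stands in for match on name
        passes_filter = doc["age"] > age_gt      # stands in for range gt
        if matches_must and passes_filter:
            hits.append(doc["name"])
    return hits


print(bool_search(EMPLOYEES, "zhangsan", 27))  # ['zhangsan']
```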

Full-text search

Find the employees whose signature contains dislike watching movies:

GET /employee/_doc/_search
{
  "query":{
    "match":{
      "signature":"dislike watching movies"
    }
  }
}

Here is the hits part of the result:

"hits" : {
    "total" : 3,
    "max_score" : 1.1064433,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1064433,
        "_source" : {
          "name" : "lisi",
          "age" : 27,
          "signature" : "I dislike watching movies,I like reading",
          "hobby" : [
            "movie",
            "music"
          ]
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.29748765,
        "_source" : {
          "name" : "zhangsan",
          "age" : 28,
          "signature" : "I like watching movies",
          "hobby" : [
            "book",
            "music"
          ]
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.27407023,
        "_source" : {
          "name" : "wangwu",
          "age" : 26,
          "signature" : "I also like watching movies",
          "hobby" : [
            "book",
            "game"
          ]
        }
      }
    ]
  }

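Not every one of the three signatures contains all of dislike watching movies, but each returned signature contains at least one of the words. This is because match analyzes the query text into terms and, by default, combines them with OR. The _score values also differ: lisi, whose signature contains all three words, scores highest, and the three employees are listed in descending order of _score. The _score is the relevance score of the document.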
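The "at least one word" behaviour can be sketched locally. The code below is a deliberately crude stand-in: it ranks by the number of distinct query terms found, whereas the real _score also depends on term frequency, term rarity and field length, so only the winner is meaningful. tokenize and match_any are made-up names, and the tokenizer is an assumption rather than the actual analyzer.

```python
import re

# Signatures taken from the search results shown in these notes.
SIGNATURES = {
    "zhangsan": "I like watching movies",
    "lisi": "I dislike watching movies,I like reading",
    "wangwu": "I also like watching movies",
}


def tokenize(text):
    """Lowercase and split on non-alphanumerics (rough analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_any(query, docs):
    """Return (name, matched_term_count) for docs sharing at least one term."""
    terms = set(tokenize(query))
    scored = []
    for name, text in docs.items():
        overlap = len(terms & set(tokenize(text)))
        if overlap:
            scored.append((name, overlap))
    # Stable sort: more matched terms first, ties keep insertion order.
    return sorted(scored, key=lambda item: -item[1])


ranked = match_any("dislike watching movies", SIGNATURES)
print(ranked)
```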

Exact phrase matching

GET /employee/_doc/_search
{
  "query":{
    "match_phrase":{
      "signature":"dislike watching movies"
    }
  }
}

The condition is the same as in the previous query; only the query type changes from match to match_phrase. Again, here is the hits part of the result:

"hits" : {
    "total" : 1,
    "max_score" : 1.1064433,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1064433,
        "_source" : {
          "name" : "lisi",
          "age" : 27,
          "signature" : "I dislike watching movies,I like reading",
          "hobby" : [
            "movie",
            "music"
          ]
        }
      }
    ]
  }

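Only one document is returned, and its signature field is guaranteed to contain the phrase dislike watching movies, with the words adjacent and in that order.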
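The difference from match can be shown with a local sketch that requires the query tokens to appear consecutively and in order. It ignores options such as slop, and tokenize and contains_phrase are made-up names with an assumed tokenizer.

```python
import re

# Signatures taken from the search results shown in these notes.
SIGNATURES = {
    "zhangsan": "I like watching movies",
    "lisi": "I dislike watching movies,I like reading",
    "wangwu": "I also like watching movies",
}


def tokenize(text):
    """Lowercase and split on non-alphanumerics (rough analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True if the phrase tokens occur in text consecutively and in order."""
    tokens = tokenize(text)
    wanted = tokenize(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


phrase_hits = [
    name
    for name, text in SIGNATURES.items()
    if contains_phrase(text, "dislike watching movies")
]
print(phrase_hits)  # ['lisi']
```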

Highlighted search

The API is highlight; note that it sits at the same level as query.

GET /employee/_doc/_search
{
  "query":{
    "match":{
      "signature":"dislike watching movies"
    }
  },
  "highlight":{
    "fields":{
      "signature":{}
    }
  }
}

Here is the query result:

"hits" : {
    "total" : 3,
    "max_score" : 1.1064433,
    "hits" : [
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1064433,
        "_source" : {
          "name" : "lisi",
          "age" : 27,
          "signature" : "I dislike watching movies,I like reading",
          "hobby" : [
            "movie",
            "music"
          ]
        },
        "highlight" : {
          "signature" : [
            "I <em>dislike</em> <em>watching</em> <em>movies</em>,I like reading"
          ]
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.29748765,
        "_source" : {
          "name" : "zhangsan",
          "age" : 28,
          "signature" : "I like watching movies",
          "hobby" : [
            "book",
            "music"
          ]
        },
        "highlight" : {
          "signature" : [
            "I like <em>watching</em> <em>movies</em>"
          ]
        }
      },
      {
        "_index" : "employee",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.27407023,
        "_source" : {
          "name" : "wangwu",
          "age" : 26,
          "signature" : "I also like watching movies",
          "hobby" : [
            "book",
            "game"
          ]
        },
        "highlight" : {
          "signature" : [
            "I also like <em>watching</em> <em>movies</em>"
          ]
        }
      }
    ]
  }

Each hit now has an extra highlight field, in which the words of the target field that match the query are wrapped in <em> tags.
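The shape of the highlight output can be reproduced with a few lines of Python. This is only an imitation of the result format, not how Elasticsearch highlights internally; the function name and its pre and post parameters are made up for the sketch.

```python
import re


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word of text that also occurs in query with pre/post tags."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))

    def wrap(match):
        word = match.group(0)
        return pre + word + post if word.lower() in terms else word

    # Visit each word; punctuation and spacing are left untouched.
    return re.sub(r"[A-Za-z0-9]+", wrap, text)


print(highlight("I like watching movies", "dislike watching movies"))
# I like <em>watching</em> <em>movies</em>
```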

Aggregations

As an example, find the most popular hobby:

GET /employee/_doc/_search
{
  "aggs":{
    "all_hobby":{
      "terms":{
        "field":"hobby"
      }
    }
  }
}
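  1. aggs marks the start of the aggregation section
  2. all_hobby is the name of this aggregation; any name can be used
  3. terms is the terms aggregation: it groups documents into buckets, one per distinct term of the specified field, and counts the documents in each bucket
  4. field specifies the field to aggregate on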

However, running this request fails with an error:

"root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [hobby] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],

As the message says, fielddata is disabled by default on text fields and has to be enabled manually:

PUT /employee/_mapping/_doc
{
  "properties":{
    "hobby":{
      "type":"text",
      "fielddata":true
    }
  }
}

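With fielddata set to true on hobby, ES builds an in-memory structure for the field by uninverting the inverted index, so that it can look up the terms of each document; aggregations, sorting and similar features need this direction of lookup. As the error message warns, this structure can use significant memory, so avoid enabling it on fields that hold a lot of data. The alternative is to aggregate on a keyword field; with the default dynamic mapping, a string field such as hobby also gets a keyword sub-field named hobby.keyword, which can be used like this: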

GET /employee/_doc/_search
{
  "aggs":{
    "all_hobby":{
      "terms":{
        "field":"hobby.keyword"
      }
    }
  }
}
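To make the term "uninverting" from the error message more concrete, here is a toy Python sketch. The inverted index is written by hand from the hobby values of the three sample employees (document ids 1 to 3), and uninvert is a made-up name; real fielddata is a far more compact structure.

```python
from collections import defaultdict

# Inverted index: term -> ids of the documents that contain it.
inverted = {
    "book": [1, 3],
    "game": [3],
    "movie": [2],
    "music": [1, 2],
}


def uninvert(index):
    """Turn term -> doc ids into doc id -> terms."""
    per_doc = defaultdict(list)
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            per_doc[doc_id].append(term)
    return dict(per_doc)


fielddata = uninvert(inverted)
print(fielddata[1])  # ['book', 'music']
```

A search asks "which documents contain this term", which the inverted index answers directly; an aggregation asks "which terms does this document have", which is the opposite direction.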

Running the aggregation again gives the following buckets:

"buckets" : [
        {
          "key" : "book",
          "doc_count" : 2
        },
        {
          "key" : "music",
          "doc_count" : 2
        },
        {
          "key" : "game",
          "doc_count" : 1
        },
        {
          "key" : "movie",
          "doc_count" : 1
        }
      ]

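The most popular hobbies are book and music, tied at 2 employees each.
An aggregation can also be combined with query, in which case it is computed only over the documents that match the query, like this: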

GET /employee/_doc/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gt": 26
          }
        }
      }
    }
  },
  "aggs": {
    "all_hobby": {
      "terms": {
        "field": "hobby.keyword"
      }
    }
  }
}
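What the terms aggregation computes, with and without the range filter, can be checked by hand on the three sample employees. The sketch below is a plain-Python imitation under that assumption; terms_agg and its min_age_exclusive parameter are made-up names, and ties are ordered by key to mirror the bucket order shown in these notes.

```python
from collections import Counter

# Sample data taken from the search results shown in these notes.
EMPLOYEES = [
    {"name": "zhangsan", "age": 28, "hobby": ["book", "music"]},
    {"name": "lisi", "age": 27, "hobby": ["movie", "music"]},
    {"name": "wangwu", "age": 26, "hobby": ["book", "game"]},
]


def terms_agg(docs, field, min_age_exclusive=None):
    """Count documents per distinct value of field, optionally after an age filter."""
    counts = Counter()
    for doc in docs:
        if min_age_exclusive is not None and not doc["age"] > min_age_exclusive:
            continue  # excluded by the range filter
        counts.update(set(doc[field]))  # each document counts once per value
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


all_buckets = terms_agg(EMPLOYEES, "hobby")
filtered_buckets = terms_agg(EMPLOYEES, "hobby", min_age_exclusive=26)
print(all_buckets)
print(filtered_buckets)
```

With the filter age > 26, wangwu drops out, so only music keeps a count of 2.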

Nested aggregations

For example, compute the average age of the employees for each hobby:

GET /employee/_doc/_search
{
  "aggs":{
    "all_hobby":{
      "terms":{
        "field":"hobby.keyword"
      },
      "aggs":{
        "avg_age":{
          "avg":{
            "field":"age"
          }
        }
      }
    }
  }
}

The result is:

"buckets" : [
        {
          "key" : "book",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 27.0
          }
        },
        {
          "key" : "music",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 27.5
          }
        },
        {
          "key" : "game",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 26.0
          }
        },
        {
          "key" : "movie",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 27.0
          }
        }
      ]

The result looks somewhat complex, so take the first bucket:

{
  "key" : "book",
  "doc_count" : 2,
  "avg_age" : {
    "value" : 27.0
  }
}
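  1. key is the term the bucket is built on, taken from the values of the field
  2. doc_count is the number of documents in the bucket, that is, how many employees have this key among their hobby values
  3. avg_age is the name of the inner aggregation, chosen freely in the request
  4. value is the average age, computed over the documents of the enclosing bucket; here it is the average age of the two employees whose hobby contains book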
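The numbers in the buckets can be verified by hand. The following plain-Python sketch recomputes them from the three sample employees; avg_age_by_hobby is a made-up name, and the output imitates the bucket layout rather than calling Elasticsearch.

```python
from collections import defaultdict

# Sample data taken from the search results shown in these notes.
EMPLOYEES = [
    {"name": "zhangsan", "age": 28, "hobby": ["book", "music"]},
    {"name": "lisi", "age": 27, "hobby": ["movie", "music"]},
    {"name": "wangwu", "age": 26, "hobby": ["book", "game"]},
]


def avg_age_by_hobby(docs):
    """Group ages by hobby (outer terms agg), then average them (inner avg agg)."""
    ages = defaultdict(list)
    for doc in docs:
        for hobby in set(doc["hobby"]):
            ages[hobby].append(doc["age"])
    buckets = [
        {"key": hobby, "doc_count": len(a), "avg_age": {"value": sum(a) / len(a)}}
        for hobby, a in ages.items()
    ]
    # Largest buckets first, ties ordered by key.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


buckets = avg_age_by_hobby(EMPLOYEES)
print(buckets[0])
# {'key': 'book', 'doc_count': 2, 'avg_age': {'value': 27.0}}
```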