Chapter 2: Getting Started with Elasticsearch

Indexing documents

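There are many ways to index data; put simply, indexing means putting JSON-formatted data into an Elasticsearch index.
The requests below are issued from the shell with curl. If you prefer, you can run them directly in Kibana instead.
Appending ?pretty or ?pretty=true to a request makes Elasticsearch return the JSON response in an indented, readable form, for example: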

curl -X GET "localhost:9200/bank/_mapping?pretty" -H 'Content-Type: application/json'

Indexing a single document

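Use a PUT request to add a document to the specified index. If the customer index does not exist yet, this request creates it automatically and then adds a document with ID 1.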

Command

curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d' { "name": "John Doe" } '

Response

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
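
The same request can also be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_index_request and its base_url default are assumptions made for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_index_request(index, doc_id, document, base_url="http://localhost:9200"):
    """Build (but do not send) a PUT request that indexes one document.

    build_index_request is a hypothetical helper, not part of any client library.
    """
    url = f"{base_url}/{index}/_doc/{doc_id}?pretty"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request("customer", 1, {"name": "John Doe"})
print(request.get_method(), request.full_url)
# To send it to a running node: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the URL and body before anything touches the cluster.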

Retrieving a document by ID

curl -X GET "localhost:9200/customer/_doc/1?pretty"

Response

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 26,
  "_primary_term" : 4,
  "found" : true,
  "_source" : {
    "name": "John Doe"
  }
}

Bulk indexing documents

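To index more than one document at a time, use the bulk API, which is considerably faster than sending individual requests. The best batch size depends on the size and complexity of the documents, the indexing and search load, and the resources available to the cluster.
The examples below use the sample data (accounts.json) provided by Elasticsearch.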
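
The bulk API expects newline-delimited JSON: each document's source line is preceded by an action line, and the body must end with a newline. As a rough sketch of that layout (build_bulk_body is a hypothetical helper, and the two records are made up for illustration rather than taken from accounts.json):

```python
import json


def build_bulk_body(documents):
    """Return an NDJSON bulk body: one action line plus one source line per document.

    documents is an iterable of (doc_id, source) pairs.
    """
    lines = []
    for doc_id, source in documents:
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps(source))
    # The bulk API requires the final line to end with a newline as well.
    return "\n".join(lines) + "\n"


# Illustrative records only; the real accounts.json holds 1000 accounts.
sample = [
    (1, {"account_number": 1, "balance": 100}),
    (2, {"account_number": 2, "balance": 200}),
]
bulk_body = build_bulk_body(sample)
print(bulk_body, end="")
```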
Run the commands

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"

Response (its meaning is explained in detail later)

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  l7sSYV2cQXmu6_4rJWVIww   5   1       1000            0    128.6kb        128.6kb
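
If you want to use this table in a script, its whitespace-separated columns are easy to split. The following is a small sketch (parse_cat_table is a hypothetical helper) that turns the output above into dictionaries keyed by column name. It assumes no cell contains spaces; requesting JSON output with format=json is likely the more robust choice for real tooling.

```python
def parse_cat_table(text):
    """Parse whitespace-separated _cat output (with ?v headers) into a list of dicts."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


cat_output = """\
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  l7sSYV2cQXmu6_4rJWVIww   5   1       1000            0    128.6kb        128.6kb
"""

indices = parse_cat_table(cat_output)
print(indices[0]["index"], indices[0]["docs.count"])
```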

Viewing the index mapping

Command

curl  'http://localhost:9200/bank/_mapping'

Response

{
    "bank": {
        "mappings": {
            "properties": {
                "account_number": {
                    "type": "long"
                },
                "address": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "age": {
                    "type": "long"
                },
                "balance": {
                    "type": "long"
                },
                "city": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "email": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "employer": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "firstname": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "gender": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "lastname": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "state": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}
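
In this mapping, numeric fields are typed long, while every string field is typed text with an additional keyword sub-field. To see that pattern at a glance, a short sketch like the following can condense the response (summarize_mapping is a hypothetical helper, and the dictionary is an abbreviated copy of the response above):

```python
def summarize_mapping(mapping, index):
    """Return {field: (type, [sub-field names])} for one index's mapping response."""
    properties = mapping[index]["mappings"]["properties"]
    return {
        name: (spec.get("type"), sorted(spec.get("fields", {})))
        for name, spec in properties.items()
    }


# Abbreviated copy of the response above (two of the eleven fields).
mapping = {
    "bank": {"mappings": {"properties": {
        "age": {"type": "long"},
        "state": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        },
    }}}
}
summary = summarize_mapping(mapping, "bank")
for field, (field_type, sub_fields) in summary.items():
    print(field, field_type, sub_fields)
```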

Simple search

Each search request is self-contained: Elasticsearch does not keep any state information between requests.

Command

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'

Paginating with from and size

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
'
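
In the request above, from is the number of hits to skip and size is the number to return, so from 10 with size 10 yields hits 10 through 19. As a sketch of how a page number might map onto these two parameters (page_query and its 1-based page convention are assumptions for illustration):

```python
def page_query(page, page_size=10):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
    }


second_page = page_query(2)
print(second_page["from"], second_page["size"])
```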

Response

{
  "took" : 160,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "[email protected]",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [
          0
        ]
      },
      # remaining results omitted
    ]
  }
}

Response fields

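  1. took: how long Elasticsearch took to run the query, in milliseconds
  2. timed_out: whether the search request timed out (a timeout can be set in the request)
  3. _shards: how many shards were searched, and how many succeeded, failed, or were skipped (relevant in a distributed deployment)
  4. max_score: the score of the most relevant document found
  5. hits.total.value: how many matching documents were found
  6. hits.sort: the document's sort position (present when the search specifies a sort field instead of using the default relevance ordering)
  7. hits._score: the document's relevance score (null in the response above, where results are sorted by account_number)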
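
To make the field list concrete, here is a minimal sketch that pulls those values out of a parsed response. The function name summarize_search is hypothetical, and the dictionary is an abbreviated version of the response shown earlier.

```python
def summarize_search(response):
    """Pull the fields described above out of a search response dict."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "shards_failed": response["_shards"]["failed"],
        "total": hits["total"]["value"],
        "max_score": hits["max_score"],
        "ids": [hit["_id"] for hit in hits["hits"]],
    }


# Abbreviated version of the response shown earlier.
response = {
    "took": 160,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "max_score": None,
        "hits": [{"_index": "bank", "_id": "0", "_score": None, "sort": [0]}],
    },
}
summary = summarize_search(response)
print(summary)
```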

match query (behavior varies with the field type; this is covered in a later article)

Matches documents whose field contains any of the given words. Here, the address field must contain mill or lane.

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill lane" } }
}'

match_phrase query

Phrase search: matches documents whose field contains the given phrase. Here, the address field must contain the phrase mill lane.

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'
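
The difference between the two queries can be illustrated locally with a toy model. This is only a sketch of the semantics, not how Elasticsearch implements them: the tokenizer here just lowercases and splits on whitespace, whereas a real analyzer does considerably more, and the function names are made up for this example.

```python
def tokenize(text):
    """Toy analyzer: lowercase and split on whitespace (far simpler than a real analyzer)."""
    return text.lower().split()


def toy_match(field_value, query):
    """True if the field contains at least one query term (match semantics, default OR)."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query))


def toy_match_phrase(field_value, query):
    """True if the query terms appear consecutively and in order (match_phrase semantics)."""
    field_terms, query_terms = tokenize(field_value), tokenize(query)
    n = len(query_terms)
    if n == 0:
        return False
    return any(field_terms[i:i + n] == query_terms for i in range(len(field_terms) - n + 1))


# Illustrative addresses, not necessarily taken from the sample data.
for address in ["198 Mill Lane", "990 Mill Road", "3 Lane Mill Court"]:
    print(address, toy_match(address, "mill lane"), toy_match_phrase(address, "mill lane"))
```

All three addresses satisfy the match-style check, but only the first contains the two words next to each other in the right order.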

Combining multiple conditions with a bool query

The command below is equivalent to: age == 40 && state != "ID"

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'

Each clause in a bool query declares how its condition must match (loosely analogous to AND, OR, and NOT):

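  1. must: results must satisfy the condition. It contributes to the relevance score.
  2. should: results should satisfy the condition but are not required to. It contributes to the relevance score.
  3. must_not: results must not satisfy the condition. It does not affect the relevance score.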
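
When a query has many conditions, it can help to assemble the body programmatically. The following is a minimal sketch, assuming a hypothetical helper named bool_query that leaves out clause types with no conditions; it reproduces the request body shown above.

```python
import json


def bool_query(must=(), should=(), must_not=()):
    """Assemble a bool query body, omitting clause types that are empty."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {"query": {"bool": {name: items for name, items in clauses.items() if items}}}


body = bool_query(
    must=[{"match": {"age": "40"}}],
    must_not=[{"match": {"state": "ID"}}],
)
print(json.dumps(body, indent=2))
```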

Analyzing results with aggregations

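  1. group_by_state: uses a terms aggregation to group all accounts in the bank index by the state field.
  2. If order is specified, the buckets are sorted by the given criterion; otherwise they are returned in descending order of document count. By default, 10 buckets are returned.
  3. average_balance: computes the average account balance for each state from the balance field.
  4. Beyond these basic aggregations, Elasticsearch also provides specialized aggregations that operate on multiple fields or analyze particular kinds of data, such as IP addresses, dates, and geo data.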

Request

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

Response

{
  "took" : 255,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 827,
      "buckets" : [
        {
          "key" : "CO",
          "doc_count" : 14,
          "average_balance" : {
            "value" : 32460.35714285714
          }
        },
        # remaining buckets omitted
      ]
    }
  }
}
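
Conceptually, this aggregation groups by state, averages the balances in each group, sorts the groups by that average, and keeps the top ones. The following plain-Python sketch mirrors that computation on a handful of made-up accounts; it illustrates the logic only, is not how Elasticsearch computes it internally, and uses a hypothetical function name.

```python
from collections import defaultdict


def average_balance_by_state(accounts, size=10):
    """Group accounts by state, average the balances, and sort by that average, descending."""
    balances = defaultdict(list)
    for account in accounts:
        balances[account["state"]].append(account["balance"])
    buckets = [
        {"key": state, "doc_count": len(values), "average_balance": sum(values) / len(values)}
        for state, values in balances.items()
    ]
    buckets.sort(key=lambda bucket: bucket["average_balance"], reverse=True)
    return buckets[:size]


# Made-up accounts for illustration; not the real sample data.
accounts = [
    {"state": "CO", "balance": 30000},
    {"state": "CO", "balance": 34000},
    {"state": "ID", "balance": 10000},
    {"state": "TX", "balance": 40000},
]
buckets = average_balance_by_state(accounts)
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"], bucket["average_balance"])
```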