Chapter 2: Getting Started with Elasticsearch

Indexing documents

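There are many ways to add data. Put simply, indexing means putting JSON data into an Elasticsearch index.
The requests below are all sent with curl from a shell, which admittedly is not the most polished approach; you can also run them directly in Kibana.
Appending ?pretty (or ?pretty=true) to a request makes Elasticsearch return indented, human-readable JSON in the shell, for example: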

curl -X GET "localhost:9200/bank/_mapping?pretty" -H 'Content-Type: application/json'

Indexing a single document

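Send a PUT request to add a document to the specified index. If the customer index does not exist yet, the request creates it automatically and then adds the document with ID 1.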

Command

curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d' { "name": "John Doe" } '

Response

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
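The same PUT request can also be issued from a program. Below is a minimal sketch using only Python's standard library, assuming a node at localhost:9200 as in the curl command; the helper names build_index_request and was_created are hypothetical. It only constructs the request (passing it to urllib.request.urlopen would actually send it, which needs a running node) and checks the result field of a response like the one above.

```python
import json
import urllib.request

def build_index_request(host, index, doc_id, doc):
    """Build (but do not send) a PUT /<index>/_doc/<id> request.

    Hypothetical helper; pass the result to urllib.request.urlopen()
    to send it to a running node.
    """
    url = f"http://{host}/{index}/_doc/{doc_id}?pretty"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

def was_created(response_json):
    """True if the index response reports a newly created document."""
    return response_json.get("result") == "created"

req = build_index_request("localhost:9200", "customer", 1, {"name": "John Doe"})
print(req.get_method(), req.full_url)
```

Sending the same PUT a second time replaces the stored document; the response then reports "result": "updated" and _version increases, so was_created returns False.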

Retrieving a document by ID

curl -X GET "localhost:9200/customer/_doc/1?pretty"

Response

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 26,
  "_primary_term" : 4,
  "found" : true,
  "_source" : {
    "name": "John Doe"
  }
}
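In client code you usually only need the found flag and the _source body. A small sketch with a hypothetical helper name: when the ID does not exist, Elasticsearch answers with "found": false (and HTTP status 404), which the helper maps to None.

```python
import json

def source_or_none(get_response):
    """Return _source from a GET /<index>/_doc/<id> response,
    or None when the document was not found."""
    if not get_response.get("found", False):
        return None
    return get_response["_source"]

# Abridged copy of the response above.
resp = json.loads('{"_index": "customer", "_id": "1", "found": true, '
                  '"_source": {"name": "John Doe"}}')
print(source_or_none(resp))  # {'name': 'John Doe'}
```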

Indexing documents in bulk

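If you have more than one document to index, you can submit them together with the bulk API, which is considerably faster than sending one request per document. The optimal batch size depends on several factors: document size and complexity, indexing and search load, and the resources available to the cluster.
The example below uses the sample data set provided by Elasticsearch (accounts.json).
Run the following commands: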

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"

Output (the meaning of each column is explained later):

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  l7sSYV2cQXmu6_4rJWVIww   5   1       1000            0    128.6kb        128.6kb
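The bulk API expects newline-delimited JSON (NDJSON): an action line such as {"index": {"_id": ...}} followed by the document itself, each on its own line, and the body must end with a newline. accounts.json is already in this format. A minimal sketch that builds such a body from Python dicts; the helper name and the two sample documents are hypothetical.

```python
import json

def build_bulk_body(docs, id_field):
    """Serialize documents into an NDJSON body for POST /<index>/_bulk.

    Each document yields two lines: an "index" action line carrying
    the _id, then the document source. The body ends with a newline,
    which the bulk API requires.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

docs = [
    {"account_number": 1, "balance": 100},
    {"account_number": 2, "balance": 250},
]
body = build_bulk_body(docs, "account_number")
print(body, end="")
```

Written to a file, the body can be sent with curl --data-binary exactly as in the command above; plain -d would strip the newlines the format depends on.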

Viewing the index mapping

Command

curl  'http://localhost:9200/bank/_mapping'

Output

{
    "bank": {
        "mappings": {
            "properties": {
                "account_number": {
                    "type": "long"
                },
                "address": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "age": {
                    "type": "long"
                },
                "balance": {
                    "type": "long"
                },
                "city": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "email": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "employer": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "firstname": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "gender": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "lastname": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "state": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}
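With dynamic mapping, every string field in accounts.json was mapped as text with a keyword sub-field (ignore_above: 256 means longer values are not indexed into that sub-field). Text fields cannot be used for sorting or terms aggregations by default, so those operations use the keyword variant, which is why the aggregation later in this chapter targets state.keyword. A simplified sketch (hypothetical helper, top-level properties only) that lists the field paths suitable for such operations:

```python
def aggregatable_fields(mapping_response, index):
    """List top-level field paths usable for sorting and terms aggregations.

    Simplified: nested objects are skipped. Text fields are replaced by
    their keyword sub-fields; other leaf types (long, keyword, date, ...)
    are listed as-is.
    """
    props = mapping_response[index]["mappings"]["properties"]
    result = []
    for name, spec in sorted(props.items()):
        field_type = spec.get("type")
        if field_type == "text":
            for sub_name, sub in spec.get("fields", {}).items():
                if sub.get("type") == "keyword":
                    result.append(f"{name}.{sub_name}")
        elif field_type is not None:
            result.append(name)
    return result

# Abridged copy of the mapping above.
mapping = {"bank": {"mappings": {"properties": {
    "age": {"type": "long"},
    "state": {"type": "text",
              "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
}}}}
print(aggregatable_fields(mapping, "bank"))  # ['age', 'state.keyword']
```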

Simple search

Each search request is self-contained: Elasticsearch does not maintain any state between requests.

Command

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'

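Paginating with from and size: from is the zero-based offset of the first hit to return and size is the number of hits, so the request below skips the first 10 hits and returns the next 10.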

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
'
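Because each request is stateless, paging means recomputing from for every page: page n (counting from 0) starts at n × size. A minimal sketch that builds the request body above for any page; the function name is hypothetical. By default Elasticsearch rejects searches where from + size exceeds 10,000 (the index.max_result_window setting), and deeper paging is usually done with search_after instead.

```python
import json

MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window

def page_query(page, size=10):
    """Build a match_all body sorted by account_number for a 0-based page."""
    start = page * size
    if start + size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the default index.max_result_window")
    return {
        "query": {"match_all": {}},
        "sort": [{"account_number": "asc"}],
        "from": start,
        "size": size,
    }

print(json.dumps(page_query(1)))  # same body as the curl request: from 10, size 10
```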

Output of the first (unpaginated) query, truncated:

{
  "took" : 160,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [
          0
        ]
      },
      # remaining results omitted
    ]
  }
}

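Explanation of the response fields

  1. took: how long Elasticsearch took to run the query, in milliseconds
  2. timed_out: whether the search request timed out (a timeout can be set in the request)
  3. _shards: how many shards were searched, and how many succeeded, failed, or were skipped (relevant to distributed deployments)
  4. max_score: the score of the most relevant document found (null here because the results are sorted by a field)
  5. hits.total.value: how many matching documents were found
  6. hits.sort: the document's sort values (present when you sort by a specified field instead of the default relevance score)
  7. hits._score: the document's relevance score (null here for the same reason); hits._source holds the original document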
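The fields listed above can be pulled out of a response in a few lines. A sketch with a hypothetical helper name, fed an abridged copy of the response shown earlier. hits.total.relation is "eq" when the count is exact and "gte" when it is only a lower bound (by default exact counts are tracked up to 10,000 hits).

```python
def summarize_hits(search_response):
    """Extract the commonly used fields from a _search response."""
    hits = search_response["hits"]
    return {
        "took_ms": search_response["took"],
        "timed_out": search_response["timed_out"],
        "total": hits["total"]["value"],
        "total_is_exact": hits["total"]["relation"] == "eq",
        "max_score": hits["max_score"],
        "ids": [h["_id"] for h in hits["hits"]],
        # "sort" is only present when the request sorted by a field
        "sort_values": [h.get("sort") for h in hits["hits"]],
    }

resp = {
    "took": 160,
    "timed_out": False,
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "max_score": None,
        "hits": [
            {"_index": "bank", "_id": "0", "_score": None,
             "_source": {"account_number": 0, "balance": 16623},
             "sort": [0]},
        ],
    },
}
print(summarize_hits(resp))
```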

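match query (behavior differs depending on the field type; this is covered in detail in later articles)

Matches documents whose field contains any of the given words. Here, the address field contains mill or lane.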

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill lane" } }
}'

match_phrase query

Phrase search: matches documents whose field contains the exact phrase. Here, the address field contains the phrase mill lane.

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'
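The difference between match and match_phrase can be illustrated with a deliberately simplified model. The sketch below approximates the standard analyzer by lowercasing and splitting on whitespace (the real analyzer tokenizes more carefully), treats match as "any query term present" (the default or operator), and match_phrase as "all terms present, adjacent and in order". It ignores scoring entirely, and the addresses are made up.

```python
def tokens(text):
    # Crude stand-in for the standard analyzer: lowercase + whitespace split.
    return text.lower().split()

def match(field_value, query):
    """match with the default "or" operator: any query term present."""
    field_terms = set(tokens(field_value))
    return any(t in field_terms for t in tokens(query))

def match_phrase(field_value, query):
    """match_phrase with slop 0: all query terms adjacent and in order."""
    field_terms, q = tokens(field_value), tokens(query)
    n = len(q)
    return any(field_terms[i:i + n] == q for i in range(len(field_terms) - n + 1))

addresses = ["198 Mill Lane", "990 Mill Road", "12 Lane Mill Court", "7 Oak Street"]
print([a for a in addresses if match(a, "mill lane")])
print([a for a in addresses if match_phrase(a, "mill lane")])
```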

Combining multiple query conditions with a bool query

What the command does: age = 40 && state != "ID"

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'

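Each clause in a bool query is given an occurrence type that determines how it has to match (roughly analogous to AND, OR, and NOT):

  1. must: results must match the clause. Contributes to the relevance score.
  2. should: results should match the clause but are not required to (unless the bool query has no must or filter clause, in which case at least one should clause must match). Contributes to the relevance score.
  3. must_not: results must not match the clause. Does not affect the relevance score.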
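A simplified model of how the three occurrence types filter documents, ignoring scoring. Each clause is modeled as a plain Python predicate rather than a real Elasticsearch query, filter clauses are not modeled, and the documents are hypothetical. The should rule follows the default described in the list above.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Simplified bool query: each clause is a predicate doc -> bool; no scoring."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Default minimum_should_match: with no must clause (and no filter clause,
    # which this sketch does not model) at least one should clause must match.
    # Otherwise should clauses only affect the score, which is not modeled here.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True

docs = [
    {"age": 40, "state": "ID"},
    {"age": 40, "state": "CO"},
    {"age": 35, "state": "CO"},
]

# age = 40 && state != "ID", as in the curl example
hits = [d for d in docs
        if bool_matches(d,
                        must=[lambda doc: doc["age"] == 40],
                        must_not=[lambda doc: doc["state"] == "ID"])]
print(hits)
```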

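Analyzing data with aggregations

  1. group_by_state: uses a terms aggregation to group all accounts in the bank index by the state field (state.keyword).
  2. If order is specified, buckets are sorted by that parameter; otherwise they are returned in descending order of document count. By default, 10 buckets are returned.
  3. average_balance: an avg sub-aggregation on the balance field that computes the average account balance for each state.
  4. Beyond these basic aggregations, Elasticsearch provides specialized aggregations that operate on multiple fields and analyze particular types of data, such as IP addresses, dates, and geographic data.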
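The computation described in the list above can be reproduced on a small in-memory list, which may help when sanity-checking results. A minimal sketch, assuming hypothetical documents with state and balance fields: group by state (as the terms aggregation does on state.keyword), average balance per group (the avg sub-aggregation), sort buckets by that average in descending order, and keep the top size buckets. Elasticsearch computes terms buckets per shard and merges them, so on multi-shard indices its result can be approximate; this version is exact.

```python
from collections import defaultdict

def avg_balance_by_state(docs, size=10):
    """Group docs by state, average balance per group, top `size` by average."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for doc in docs:
        sums[doc["state"]] += doc["balance"]
        counts[doc["state"]] += 1
    buckets = [
        {"key": state, "doc_count": counts[state],
         "average_balance": sums[state] / counts[state]}
        for state in sums
    ]
    buckets.sort(key=lambda b: b["average_balance"], reverse=True)
    return buckets[:size]

docs = [
    {"state": "CO", "balance": 30000},
    {"state": "CO", "balance": 34000},
    {"state": "TX", "balance": 10000},
    {"state": "ID", "balance": 20000},
]
for bucket in avg_balance_by_state(docs, size=2):
    print(bucket)
```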

Request

curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'

Output (truncated)

{
  "took" : 255,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 827,
      "buckets" : [
        {
          "key" : "CO",
          "doc_count" : 14,
          "average_balance" : {
            "value" : 32460.35714285714
          }
        },
        # remaining buckets omitted
      ]
    }
  }
}
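To consume this response programmatically, note that each bucket's sub-aggregation result sits under its own name (average_balance.value), and sum_other_doc_count is the number of documents that fall into buckets not included in the response (827 of the 1,000 here). A sketch with a hypothetical helper name, fed an abridged copy of the response above:

```python
def bucket_values(search_response, agg_name, sub_agg_name):
    """Map each bucket key to its sub-aggregation value and also return
    sum_other_doc_count (documents in buckets that were not returned)."""
    agg = search_response["aggregations"][agg_name]
    values = {b["key"]: b[sub_agg_name]["value"] for b in agg["buckets"]}
    return values, agg["sum_other_doc_count"]

# Abridged copy of the response above (only the CO bucket).
resp = {
    "aggregations": {
        "group_by_state": {
            "doc_count_error_upper_bound": -1,
            "sum_other_doc_count": 827,
            "buckets": [
                {"key": "CO", "doc_count": 14,
                 "average_balance": {"value": 32460.35714285714}},
            ],
        }
    }
}
values, other = bucket_values(resp, "group_by_state", "average_balance")
print(values, other)
```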