ElasticSearch: Getting Started

 

ElasticSearch Study Notes

1. Installing ElasticSearch

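Install the IK analyzer plugin (ik): its version must match the ElasticSearch version, otherwise ElasticSearch reports an error.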

A Java JDK must be installed and configured.

 

2. Simple CRUD with ElasticSearch

1> Create: index ------>> type ------>> document

Define the type of a field (mapping):

PUT /schools/_mapping/school

{

    "properties":{

        "TimeFormat":{

            "type":"date",

            "format":"yyyy-MM-dd HH:mm:ss"

        }

    }

}
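Once TimeFormat is mapped with this format, the values sent for it must follow the pattern exactly. A minimal Python sketch of producing and checking such strings on the client side, assuming the Java pattern yyyy-MM-dd HH:mm:ss corresponds to the strftime pattern %Y-%m-%d %H:%M:%S; the helper names to_es_date and from_es_date are hypothetical:

```python
from datetime import datetime

# strftime equivalent of the mapping's "yyyy-MM-dd HH:mm:ss" pattern (assumed correspondence).
ES_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_es_date(dt: datetime) -> str:
    """Format a datetime the way the TimeFormat mapping expects (hypothetical helper)."""
    return dt.strftime(ES_DATE_FORMAT)


def from_es_date(text: str) -> datetime:
    """Parse a TimeFormat string back; raises ValueError if it does not fit the pattern."""
    return datetime.strptime(text, ES_DATE_FORMAT)


print(to_es_date(datetime(2016, 9, 8, 15, 20, 30)))  # 2016-09-08 15:20:30
```

A date-only value such as "2016-09-08" does not fit this single pattern, so it would most likely be rejected when the document is indexed.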

 

Create the index student with the type article, whose field subject is of type text and is analyzed with the ik_max_word analyzer:

PUT /student/?pretty

{

        "settings" : {

        "analysis" : {

            "analyzer" : {

                "ik" : {

                    "tokenizer" : "ik_max_word"

                }

            }

        }

    },

    "mappings" : {

        "article" : {

            "dynamic" : true,

            "properties" : {

                "subject" : {

                    "type" : "text",

                    "analyzer" : "ik_max_word"

                }

            }

        }

    }

}

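If you do not specify it explicitly, the IK analyzer is not used by default. Also, the request above only sets it for the listed document field (subject).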

The following request makes IK the default analyzer for the whole index:

PUT /students

{

    "settings" : {

        "index" : {

            "analysis.analyzer.default.type": "ik_max_word"

        }

    }

}

A. Insert a single document

PUT http://localhost:9200/movies/movie/3

{

    "title": "To Kill a Mockingbird",

    "director": "Robert Mulligan",

    "year": 1962

}

PUT url/index/type/id

{

    "field": value,

    "field": value,

    "field": value,

    ...

}

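A request in this format creates the index, the type and the document in one step; the index and type are created automatically if they do not exist yet. The response: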

 

{ "_index": "movies", "_type": "movie", "_id": "3", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true }

_version is 1 and result is created.

 

B. Bulk insert

POST /schools/_bulk

{"index":{"_index":"schools","_type":"school","_id":"1"}}

{"name":"Central School","description":"CBSE Affiliation","street":"Nagan","city":"paprola","state":"HP","zip":"176115","location":[31.8955385,76.8380405],"fees":2000,"tags":["Senior Secondary","beautiful campus"],"rating":"3.5"}

{"index":{"_index":"schools","_type":"school","_id":"2"}}

{"name":"Saint Paul School","description":"ICSE Afiliation","street":"Dawarka","city":"Delhi","state":"Delhi","zip":"110075","location":[28.5733056,77.0122136],"fees":5000,"tags":["Good Faculty","Great Sports"],"rating":"4.5"}

{"index":{"_index":"schools","_type":"school","_id":"3"}}

{"name":"Crescent School","description":"State Board Affiliation","street":"Tonk Road","city":"Jaipur","state":"RJ","zip":"176114","location":[26.8535922,75.7923988],"fees":2500,"tags":["Well equipped labs"],"rating":"4.5"}

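Use _bulk to insert many documents in one request: each document takes two lines, an action line ({"index": {...}}) followed by the document source.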
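A minimal Python sketch of assembling such a _bulk body from (id, document) pairs. build_bulk_body is a hypothetical helper; it only builds the text and does not send anything. The body is newline-delimited JSON and must end with a newline.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build an NDJSON _bulk body from (id, source) pairs (hypothetical helper).

    Each document contributes two lines: an "index" action line and the source line.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("schools", "school", [
    (1, {"name": "Central School", "city": "paprola", "fees": 2000}),
    (2, {"name": "Saint Paul School", "city": "Delhi", "fees": 5000}),
])
print(body, end="")
```

The resulting string would be sent as the body of POST /schools/_bulk, typically with the Content-Type header application/x-ndjson.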

 

 

2> Update a document

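There is now one movie in the index. Next, let's update it by adding a list of genres. To do this, simply index the document again under the same ID: send exactly the same index request as before, but with the JSON object extended by a genres field.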

PUT  http://localhost:9200/movies/movie/3

{

    "title": "To Kill a Mockingbird",

    "director": "Robert Mulligan",

    "year": 1962,

    "genres": ["Crime", "Drama", "Mystery"]

}

 

The response:

{ "_index": "movies", "_type": "movie", "_id": "3", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": false }

_version has become 2 and result is updated.

 

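Update individual fields with a script (inline): the request below uses _update_by_query to set TimeFormat and zip on every document whose city matches delhi.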

 

POST schools/school/_update_by_query

{

  "script": {

    "inline": "ctx._source.TimeFormat ='2016-09-08 15:20:30';ctx._source.zip='1766889'"

  },

  "query":{

      "term":{

          "city":"delhi"

      }

  }

  

}

 

 

3> Delete a document

To delete a single document by ID, use the same URL as for fetching the document, but change the HTTP method to DELETE:

 

DELETE http://localhost:9200/movies/movie/3

 

The response:

{

   "_index": "movies",

   "_type": "movie",

   "_id": "3",

   "_version": 2,

   "result": "deleted",

   "_shards": {

      "total": 2,

      "successful": 1,

      "failed": 0

   },

   "_seq_no": 5,

   "_primary_term": 1

}

 

4> Query documents

To fetch a single document by ID, send a GET request to the document's URL (the same URL that was used to index it):

 

GET http://localhost:9200/movies/movie/3

 

Conditional search

 

Common queries:

Full-text queries (for analyzed text):

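1. Match all: match_all

2. Token match: match (roughly like SQL LIKE)

3. Phrase match: match_phrase (closer to SQL =)

4. Multi-field match: multi_match (query several fields at once)

5. Query-string syntax: query_string (write the search expression, keywords and operators, directly)

6. Term query: term (queries a single field; note that term does not analyze its input. For example, if "火鍋" is stored and analyzed into the tokens "火" / "鍋", a term query for "火" finds it, but a term query for "火鍋" finds nothing; match, by contrast, also splits the query into "火" / "鍋" before searching)

7. Range query: range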

Term-level queries, such as term and range above, target structured data: numbers, dates and the like.

 

Pagination:

"from": 10,

"size": 10

 

constant_score: gives every matching document the same fixed score.

 

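filter: a query clause also returns documents that only partly match and ranks them by relevance, whereas a filter works like the = operator: a document either matches or it does not, with nothing in between, and no score is computed, so filtering is relatively fast.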

 

1. Match all: match_all

POST _search

{

   "query": {

      "match_all": {}

   }

}

 

2. Token match: match (roughly like SQL LIKE)

POST /schools/school/_search

{

    "query": {

        "match": {

            "name":"Saint Paul School"

          

        }

    }

}

 

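With match, the search text is run through the analyzer and the resulting tokens are compared with the tokens of the indexed text. In the example above, the name field under /schools/school/ is searched for the tokens of "Saint Paul School" (saint, paul, school); a document matches as long as its name contains at least one of these tokens.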
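To make the any-token behavior concrete, here is a local Python simulation (not Elasticsearch itself). It assumes a much simplified analyzer, lowercasing and splitting on non-word characters, in place of the real standard or IK analyzers; analyze and match_query are hypothetical names.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer (assumption): lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def match_query(field_text, query_text):
    """match with the default OR operator: one shared token is enough."""
    field_tokens = set(analyze(field_text))
    return any(tok in field_tokens for tok in analyze(query_text))


names = ["Central School", "Saint Paul School", "Crescent School"]
print([n for n in names if match_query(n, "Saint Paul School")])
# ['Central School', 'Saint Paul School', 'Crescent School']  (all share "school")
```

Every sample name contains the token school, so all three documents match; Elasticsearch would additionally rank them by relevance score, which this sketch does not model.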

 

3. Phrase match: match_phrase (closer to SQL =)

 

POST /schools/school/_search

{

    "query": {

        "match_phrase": {

            "name":"Saint Paul School"

          

        }

    }

}

 

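With match_phrase, the search text is also analyzed, but the tokens must appear in the indexed text consecutively and in the same order. In the example above, only documents whose name contains the three tokens Saint Paul School next to each other match. This is close to SQL =, but not identical: the name may contain other words before or after the phrase, and differences in case and spacing are ignored by the analyzer.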
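A local Python sketch of the consecutive, in-order requirement (slop 0), under the assumption of a simplified analyzer that lowercases and splits on non-word characters; analyze and match_phrase are hypothetical helpers, not the Elasticsearch implementation.

```python
import re


def analyze(text):
    """Simplified analyzer (assumption): lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def match_phrase(field_text, query_text):
    """True if all query tokens appear consecutively and in order (slop 0)."""
    field_tokens = analyze(field_text)
    query_tokens = analyze(query_text)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(field_tokens[i:i + n] == query_tokens
               for i in range(len(field_tokens) - n + 1))


print(match_phrase("Saint Paul School", "saint   paul school"))      # True
print(match_phrase("Saint Paul School Delhi", "Saint Paul School"))  # True
print(match_phrase("Saint Paul School", "Paul Saint School"))        # False
```

The second call shows why match_phrase is not a strict SQL =: extra words around the phrase do not prevent a match.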

 

4. Multi-field match: multi_match (query several fields at once)

 

POST /schools/school/_search

{

    "query": {

        "multi_match": {

            "query":"Saint Paul School",

            "fields": [

               "name","tags"

            ]

          

        }

    }

}

 

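multi_match runs a match-style search over several fields: the text in query is analyzed, each token is matched against each field, and fields lists the fields to search.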

 

5. Query-string syntax: query_string (write the search expression directly)

 

POST /schools/school/_search

{

    "query": {

        "query_string": {

            "query":"Saint Paul School",

            "fields": [

               "name","tags"

            ]

          

        }

    }

}

 

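query_string can also search several fields: the text in query is analyzed, each token is matched, and fields lists the fields to search. Unlike multi_match, the query text is parsed with Lucene query syntax, so operators such as AND, OR and quoted phrases can be written inside it.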

6. Term query: term

POST /schools/school/_search

{

    "query": {

        "term": {

            "name":"Saint Paul School"

          

        }

    }

}

 

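term queries are not analyzed. Against an analyzed text field such as name, the value must therefore be exactly one of the indexed tokens: a single word, without spaces, in lowercase (for example "saint"). Otherwise nothing is found, which is why the query above for "Saint Paul School" returns no hits.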
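A small Python simulation of why the term query above finds nothing: the stored name is analyzed into lowercase tokens, but the term value is compared as-is. The simplified analyzer is an assumption standing in for the real one; term_query is a hypothetical helper.

```python
import re


def analyze(text):
    """Simplified analyzer (assumption): lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def term_query(field_text, value):
    """term does not analyze `value`; it must equal one indexed token exactly."""
    return value in analyze(field_text)


name = "Saint Paul School"
print(term_query(name, "Saint Paul School"))  # False: not a single token
print(term_query(name, "Saint"))              # False: tokens are lowercase
print(term_query(name, "saint"))              # True
```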

 

7. Range query: range

 

POST /schools/school/_search

{

    "query": {

        "range": {

           "fees": {

              "from": 1000,

              "to": 2500

           }

        }

       

    }

}
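In this range query, from and to are inclusive by default; they behave like gte and lte, which newer versions recommend writing instead. A small Python check that mirrors those bounds on the sample fees. in_range is a hypothetical helper, and its include_lower / include_upper flags mirror the older range-query options of the same names.

```python
def in_range(value, from_=None, to=None, include_lower=True, include_upper=True):
    """Check one value against range-query bounds; both bounds are inclusive by default."""
    if from_ is not None:
        if value < from_ or (not include_lower and value == from_):
            return False
    if to is not None:
        if value > to or (not include_upper and value == to):
            return False
    return True


fees = {"Central School": 2000, "Saint Paul School": 5000, "Crescent School": 2500}
print([name for name, fee in fees.items() if in_range(fee, 1000, 2500)])
# ['Central School', 'Crescent School']
```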

 

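A query object can hold only one query type, so conditions cannot be combined by simply listing them side by side; to combine them, use a bool query.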

 

8. bool query

 

POST /schools/school/_search

{

    "query": {

        "bool": {

            "must": [

               {

                   "range": {

                      "fees": {

                         "from": 1000,

                         "to": 3000

                      }

                   }

               },

               {

                   "match": {

                      "name": "School"

                   }

               },

               {

                   "wildcard": {

                      "zip": {

                         "value": "17*15"

                      }

                   }

               }

               

            ],

            "boost": 1,

            "must_not": [

               {

                      "term": {

                         "name": {

                            "value": "to"

                         }

                      }

               }

            ],

            "should": [

            {

               "match": {

                  "city": "paprola"

               }

            }

         ]

        }

 

       

    }

}
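Ignoring scoring, the clauses of a bool query combine like this: every must and filter clause has to match, no must_not clause may match, and should clauses are optional whenever must or filter is present (they then only raise the score); with only should clauses, at least one has to match. A Python sketch of that logic on the three sample schools, with each clause modeled as a plain predicate. bool_query is hypothetical, fnmatchcase stands in for wildcard, and the match on name and the must_not clause are left out for brevity.

```python
from fnmatch import fnmatchcase


def bool_query(doc, must=(), filters=(), must_not=(), should=()):
    """Matching logic of a bool query, ignoring scores (hypothetical helper).

    must / filters: every clause must match; must_not: no clause may match;
    should: optional if must or filters is non-empty, otherwise at least one must match.
    """
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if should and not must and not filters:
        return any(clause(doc) for clause in should)
    return True


schools = [
    {"name": "Central School", "city": "paprola", "zip": "176115", "fees": 2000},
    {"name": "Saint Paul School", "city": "Delhi", "zip": "110075", "fees": 5000},
    {"name": "Crescent School", "city": "Jaipur", "zip": "176114", "fees": 2500},
]

hits = [s["name"] for s in schools if bool_query(
    s,
    must=[lambda d: 1000 <= d["fees"] <= 3000,       # range: fees 1000..3000
          lambda d: fnmatchcase(d["zip"], "17*15")],  # wildcard: zip 17*15
    should=[lambda d: d["city"] == "paprola"],       # optional here: affects score only
)]
print(hits)  # ['Central School']
```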

9. Highlighting

 

POST /schools/school/_search

{

    "query": {

        "match": {

           "name": "Saint school"

        }

    },

    "highlight": {

        "fields": {

            "name":{}

        }

    }

}

 

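10. Pagination: from is the 0-based offset of the first hit to return (a row offset, not a page number!) and size is the number of hits to return. The example below starts at the second hit and returns one hit.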

POST /schools/school/_search

{

    "query": {

        "match": {

           "name": "Saint school"

        }

    },

    "highlight": {

        "fields": {

            "name":{}

        }

    }

    , "from": 1

    , "size": 1

}
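Because from is a hit offset rather than a page number, a page number has to be converted first. A minimal Python sketch; page_to_from and paginate are hypothetical helpers, and paginate imitates from/size on an already sorted list of hits.

```python
def page_to_from(page, page_size):
    """Convert a 1-based page number to the 0-based hit offset used by `from`."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return (page - 1) * page_size


def paginate(hits, from_, size):
    """Return `size` hits starting at offset `from_`, like from/size on sorted results."""
    return hits[from_:from_ + size]


print(page_to_from(3, 10))                   # 20
print(paginate(["a", "b", "c"], 1, 1))       # ['b']  (second hit, one hit)
```

By default Elasticsearch rejects requests where from + size exceeds index.max_result_window (10,000); deeper paging needs search_after or scroll.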

11. Filtered queries: multiple filter clauses and sort criteria are each given as arrays.

 

POST /schools/school/_search

{

    "query": {

        "bool": {

            "must": [

               {

                   "match": {

                      "name": "school"

                   }

               }

            ],

            "filter":[{

                "exists": {

                   "field": "name"

                }

                

            },

        {

                "range": {

                   "fees": {

                      "from": 10,

                      "to": 2000

                   }

                }

                

            }

            ]

        }

         

 

    }

    , "from": 1

    , "size": 10

    , "sort": [

       {

          "fees": {

             "order": "desc"

          }

       }

    ]

}

11.1 ids filter

11.2 range filter

11.3 exists filter

11.4 term/terms filter

 

POST /schools/school/_search

{

    "query": {

        "bool": {

            "must": [

               {

                   "match": {

                      "name": "school"

                   }

               }

            ],

            "filter":[{

                "exists": {

                   "field": "name"

                }

                

            },

        {

                "range": {

                   "fees": {

                      "from": 10,

                      "to": 5000

                   }

                }

                

            },

                    {

                "ids":{

                    "values":[1,2,3]

                }

                

            },{

                "term":{

                    "street":"tonk"

                }

            }

            ]

        }

         

 

    }

    , "from": 0

    , "size": 10

    , "sort": [

       {

          "fees": {

             "order": "desc"

          }

       }

    ]

}

 

12. Aggregations

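Aggregations let you group your data and compute statistics over it. The simplest way to think of them is, roughly, as SQL GROUP BY combined with SQL aggregate functions.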

Commonly used aggregations in ES:

Metric aggregations: mainly operate on numeric fields and require ES to do a fair amount of computation.

Bucket aggregations: divide the data into "buckets" and assign documents to them, much like GROUP BY in SQL.

 

Aggregation API in ES (format):

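"aggregations" : {          // the aggregation section; "aggs" can be used instead

  "<aggregation_name>" : {  // any name you like; used as the key in the response, so the result is easy to find

    "<aggregation_type>" : {   // the kind of aggregation, e.g. min

      <aggregation_body>    // the body; its content depends on the aggregation type

   }

   [,"aggregations" : { [<sub_aggregation>]+ } ]? // optional nested sub-aggregations, zero or more

 }

 [,"<aggregation_name_2>" : { ... } ]* // further aggregations, zero or more

}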

1. Metric aggregations

A. avg (average), min (minimum), max (maximum), sum (total), and stats (the four above, plus count, in a single aggregation)

POST /schools/school/_search

{

   "query": {

      "match": {

         "name": "Saint school"

      }

   },

   "highlight": {

      "fields": {

         "name": {}

      }

   },

   "aggregations":

      {

         "fees_avg": {

            "avg": {

               "field": "fees"

            }

         },         "fees_min": {

            "min": {

               "field": "fees"

            }

         },         "fees_max": {

            "max": {

               "field": "fees"

            }

         },         "fees_sum": {

            "sum": {

               "field": "fees"

            }

         },        "fees_stats": {

            "stats": {

               "field": "fees"

            }

         }

      }

   ,

   "from": 0,

   "size": 10

}
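What these metric aggregations compute can be reproduced locally on the fees of the sample data (2000, 5000, 2500; all three names contain the token school, so all match "Saint school"). A Python sketch; stats is a hypothetical helper, and it returns floats because Elasticsearch reports these metric values as floating-point numbers.

```python
def stats(values):
    """Local counterpart of the stats aggregation: count, min, max, avg and sum."""
    if not values:
        # This sketch returns None for min/max/avg when there are no values.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


print(stats([2000, 5000, 2500]))
```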

 

2. Bucket aggregations

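range aggregation with custom ranges: each range includes its from value but excludes its to value (hence "to": 5001 below, so that fees of 5000 fall into the last range).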

POST /schools/school/_search

{

   "query": {

      "match": {

         "name": "Saint school"

      }

   },

   "highlight": {

      "fields": {

         "name": {}

      }

   },

   "aggregations": {

      "fees_range": {

         "range": {

            "field": "fees",

            "ranges": [

               {

                  "from": 0,

                  "to": 2000

               },

               {

                  "from": 2000,

                  "to": 3000

               },

               {

                  "from": 3000,

                  "to": 5001

               }

            ]

         }

      }

   },

   "from": 0,

   "size": 10

}
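A Python sketch of how the three ranges above split the sample fees (2000, 5000, 2500), with from included and to excluded; range_buckets is a hypothetical helper that only counts values and does not reproduce the exact response format.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is included and `to` is excluded."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(1 for v in values
                    if (lo is None or v >= lo) and (hi is None or v < hi))
        buckets.append({"from": lo, "to": hi, "doc_count": count})
    return buckets


ranges = [{"from": 0, "to": 2000}, {"from": 2000, "to": 3000}, {"from": 3000, "to": 5001}]
for bucket in range_buckets([2000, 5000, 2500], ranges):
    print(bucket)
```

A fee of exactly 2000 lands in the 2000 to 3000 bucket, not in 0 to 2000, because to is exclusive.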

 

terms aggregation, grouping by the values of a field (a text field cannot be used for this by default):

POST /schools/school/_search

{

   "query": {

      "match": {

         "name": "Saint school"

      }

   },

   "highlight": {

      "fields": {

         "name": {}

      }

   },

   "aggregations": {

      "fees_term": {

         "terms": {

            "field": "location",

            "size":3

            

         }

      }

   },

   "from": 0,

   "size": 10

}
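A Python sketch of what a terms aggregation counts: documents per distinct value, where a document with several values (an array field) counts once for each distinct value. Elasticsearch orders buckets by doc_count, highest first, by default; breaking ties by key is this sketch's own choice. terms_agg and the sample values are hypothetical.

```python
from collections import Counter


def terms_agg(values_per_doc, size=10):
    """Count documents per distinct value, most frequent first (ties broken by key)."""
    counts = Counter()
    for values in values_per_doc:
        counts.update(set(values))  # a document counts once per distinct value
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


# hypothetical keyword-like values, one list per document
docs = [["Senior Secondary", "beautiful campus"], ["Good Faculty"], ["Good Faculty", "Great Sports"]]
print(terms_agg(docs, size=2))
```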

 

Date range aggregation (date_range)

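# The date range aggregation is meant for date fields. Its main difference from the range aggregation is that it accepts date math expressions.

#now+10y: ten years from now.

#now+10M: ten months from now.

#1990-01-01||+20y: twenty years after 1990-01-01, i.e. 2010-01-01.

#now/y: rounds to the year (down to the start of the current year).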

POST /schools/school/_search

{

   "query": {

      "match": {

         "name": "Saint school"

      }

   },

   "highlight": {

      "fields": {

         "name": {}

      }

   },

   "aggregations": {

      "fees_term": {

         "terms": {

            "field": "location",

            "size":3

            

         }

      },

      "time_aggs":{

          "date_range":{

              "field":"TimeFormat",

              "format":"yyyy-MM-dd",

              "ranges":[

                  {

                  "from":"now/y",

                  "to":"now"

                  },

                                    {

                  "from":"now/y-1y",

                  "to":"now/y"

                  },

                                    {

                  "from":"now/y-3y",

                  "to":"now/y-1y"

                  }

                  

                  ]

          }

      }

   },

   "from": 0,

   "size": 10

}
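To see which dates the ranges above resolve to, here is a Python sketch that evaluates a small subset of date math: an anchor (now, or a yyyy-MM-dd date followed by ||), then +N / -N with units y, M, d, and rounding down with /y, /M, /d, applied left to right. eval_date_math is hypothetical; Elasticsearch supports more units (such as w, h, m, s), works in UTC, and in range queries rounds up rather than down for gt and lte.

```python
import calendar
import re
from datetime import datetime, timedelta

_OPS = re.compile(r"([+-])(\d+)([yMd])|/([yMd])")


def _add_months(dt, months):
    """Shift dt by a number of months, clamping the day to the target month's length."""
    index = dt.year * 12 + (dt.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(dt.day, calendar.monthrange(year, month0 + 1)[1])
    return dt.replace(year=year, month=month0 + 1, day=day)


def _round_down(dt, unit):
    """Round dt down to the start of the day, month or year."""
    dt = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "M":
        return dt.replace(day=1)
    if unit == "y":
        return dt.replace(month=1, day=1)
    return dt  # unit == "d"


def eval_date_math(expr, now):
    """Evaluate 'now...' or 'yyyy-MM-dd||...' with +N/-N (y, M, d) and /unit rounding."""
    if expr.startswith("now"):
        dt, ops = now, expr[3:]
    else:
        anchor, _, ops = expr.partition("||")
        dt = datetime.strptime(anchor, "%Y-%m-%d")
    if not re.fullmatch(r"(?:[+-]\d+[yMd]|/[yMd])*", ops):
        raise ValueError(f"unsupported expression: {expr}")
    for sign, amount, unit, rounding in _OPS.findall(ops):
        if rounding:
            dt = _round_down(dt, rounding)
            continue
        n = int(amount) if sign == "+" else -int(amount)
        if unit == "y":
            dt = _add_months(dt, 12 * n)
        elif unit == "M":
            dt = _add_months(dt, n)
        else:
            dt = dt + timedelta(days=n)
    return dt


now = datetime(2024, 5, 17, 10, 30)  # fixed "now" so the output is reproducible
for e in ["now/y", "now/y-1y", "now/y-3y", "now+10M", "1990-01-01||+20y"]:
    print(e, "->", eval_date_math(e, now))
```

With this "now", the three ranges in the request cover the current year so far, the previous year, and the two years before that.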

Histogram aggregation (histogram)

# Histogram Aggregation

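# The histogram aggregation divides a numeric field into equal-width intervals and counts the documents that fall into each. It is very similar to the range aggregation above,

# except that range lets you define arbitrary intervals while histogram uses equal spacing; hence the required interval parameter, the width of each bucket.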

 

POST /schools/school/_search

{

   "query": {

      "match": {

         "name": "Saint school"

      }

   },

   "highlight": {

      "fields": {

         "name": {}

      }

   },

   "aggregations": {

      "fees_aggs":{

          "histogram":{

              "field":"fees",

              "interval":1000

             

          }

      },      "time_agg":{

          "date_histogram":{

              "field":"TimeFormat",

              "interval":"year",

              "format":"yyyy-MM-dd"

             

          }

      }

   },

   "from": 0,

   "size": 10

}
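The histogram aggregation places each value in the bucket whose key is floor((value - offset) / interval) * interval + offset, and by default (min_doc_count 0) it also returns empty buckets between the lowest and highest key. A Python sketch of that bucketing on the sample fees with interval 1000; histogram is a hypothetical helper.

```python
import math
from collections import Counter


def histogram(values, interval, offset=0):
    """Equal-width buckets, including empty ones between the lowest and highest key."""
    if not values:
        return []
    counts = Counter(math.floor((v - offset) / interval) * interval + offset for v in values)
    buckets = []
    key = min(counts)
    while key <= max(counts):
        buckets.append({"key": key, "doc_count": counts.get(key, 0)})
        key += interval
    return buckets


for bucket in histogram([2000, 5000, 2500], 1000):
    print(bucket)
```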

 

 
