Elasticsearch Commands and Related Notes

 Spring MVC configuration (without Spring Boot)

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:context="http://www.springframework.org/schema/context"
    xmlns:elasticsearch="http://www.springframework.org/schema/data/elasticsearch"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
        http://www.springframework.org/schema/data/elasticsearch http://www.springframework.org/schema/data/elasticsearch/spring-elasticsearch-1.0.xsd">
    <!-- Scan the package that contains the Elasticsearch repositories -->
    <elasticsearch:repositories base-package="com.huangliwei.elasticsearch.dao" />

    <!-- Scan the service package -->
    <context:component-scan base-package="com.huangliwei.elasticsearch.service" />

    <!-- Configure the Elasticsearch connection -->
    <elasticsearch:transport-client id="client"
        cluster-nodes="127.0.0.1:9300" />

    <!-- ElasticsearchTemplate provided by Spring Data Elasticsearch -->
    <bean id="elasticsearchTemplate"
        class="org.springframework.data.elasticsearch.core.ElasticsearchTemplate">
        <constructor-arg name="client" ref="client"></constructor-arg>
    </bean>
</beans>

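As of Q2 2018, Elasticsearch had reached 6.2 and 6.3 had not yet been officially released. For production use, the older 5.6.x or 2.x lines were the recommended choice: they were more stable and better documented.

If you work on the Java stack you will most likely use the Spring Boot family. Spring Boot is now at 2.x, where spring-boot-starter-data-elasticsearch defaults to Elasticsearch 5.6.9; if you are still on Spring Boot 1.x, the default Elasticsearch version is 2.x.

Clients

The Java stack currently offers three options: Node Client, Transport Client, and the REST API. Note that the Node Client is officially deprecated and the Transport Client is deprecated as of 7.x, after which everything converges on the REST API. For now the Transport Client is the most widely used; the REST API has better compatibility across versions; the Node Client is not recommended except for unit tests running in in-memory mode.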

 

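1.2 Basic Elasticsearch concepts


Index
  Comparable to a database in MySQL.


Type
  Comparable to a table in MySQL. Types are created inside an index and described by a mapping. (From 6.0 an index can contain only one type.)


Document
  Elasticsearch stores data as documents. One record is one document, comparable to a row in MySQL; a document can have many fields, just as a row can have many columns.


Field
  The fields of a document correspond to the columns of a MySQL table.


Mapping
  Comparable to the schema in MySQL or Solr. Elasticsearch can also detect field types dynamically, which looks powerful but is not recommended in production; it is better to define the schema explicitly up front.


indexed
  Whether a field is indexed. In MySQL you usually add indexes to frequently queried columns to speed up queries; in Elasticsearch every field is indexed by default unless you explicitly disable indexing for a field that is only stored for display. Which to choose depends on your requirements.


Query DSL
  Comparable to SQL statements in MySQL, except that Elasticsearch queries are written as JSON. The technical term is Query DSL.

GET/PUT/POST/DELETE

Roughly comparable to SELECT, INSERT or UPDATE, and DELETE in MySQL.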

 

ElasticSearch Head (installing the plugin)


 

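CRUD with the Elasticsearch API (basic commands)

Viewing tokenization:

# Create an index
# With custom settings:
PUT /lib/
{
  "settings":{
      "index":{
        "number_of_shards": 5,   # default is 5; cannot be changed after the index is created
        "number_of_replicas": 1   # default is 1; can be changed at any time
        }
      }
}
# With default settings: a leading or trailing / around lib makes no difference, although with a trailing / Kibana offers no autocompletion
PUT lib


# View index settings
GET /lib/_settings   # a single index
GET _all/_settings   # all indices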

 

# Index a document:
# The id is set to 1 here. To let Elasticsearch generate the id, omit it and submit with POST
PUT /lib/user/1 
{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}


 # No id is given here, so it is generated automatically; this requires POST
POST /lib/user/   
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         23,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}


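# The _source metadata field holds the request body sent when the document was indexed; the _source parameter selects which of its fields are returned:
GET /lib/user/1
GET /lib/user/
GET /lib/user/1?_source=age,interests   # return only the age and interests fields of document 1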

 

# Overwrite: equivalent to re-inserting the document. Any field missing from the new body is gone afterwards, so fields can be lost
PUT /lib/user/1    
{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         36,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}

# Partial update: only the fields listed under doc are changed
POST /lib/user/1/_update  
{
  "doc":{
      "age":33
      }
}
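
To make the difference between the two update styles concrete, here is a small Python sketch that models a stored document as a plain dict. The helper names `put_document` and `partial_update` are made up for illustration; they only imitate the behaviour of PUT and of `_update` with `doc`, and do not talk to Elasticsearch.

```python
def put_document(stored, body):
    """Model of PUT /index/type/id: the request body replaces the whole document."""
    return dict(body)


def partial_update(stored, doc):
    """Model of POST .../_update with {"doc": ...}: merge fields into the stored document."""
    merged = dict(stored)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # objects are merged recursively
        else:
            merged[key] = value  # scalars and arrays are replaced
    return merged


jane = {"first_name": "Jane", "last_name": "Smith", "age": 32,
        "about": "I like to collect rock albums", "interests": ["music"]}

overwritten = put_document(jane, {"age": 36})   # every other field is lost
updated = partial_update(jane, {"age": 33})     # every other field is kept
print(overwritten)
print(updated)
```

Sending only `{"age": 36}` with PUT leaves a document with a single field, while the partial update keeps the name, about, and interests fields.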


# Delete a document
DELETE /lib/user/1

# Delete an index
DELETE /lib


 

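Using the Multi Get API:

With the Multi Get API you can fetch a set of documents in one request by index name, type name, and document id. The documents may come from the same index or from different indices.

# _source can list specific fields, or be left out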
GET /_mget
{
    "docs":[
       {
           "_index": "lib",
           "_type": "user",
           "_id": 1,
           "_source": "interests"   # return only this field
       },
       {
           "_index": "lib",
           "_type": "user",
           "_id": 2,
           "_source": ["age","interests"]  # return only these fields
       }
     ]
}

# Fetch several documents from the same index and type:

GET /lib/user/_mget
{
    "docs":[
       {
           "_id": 1
       },
       {
           "_type": "user",
           "_id": 2
       }
     ]
}
GET /lib/user/_mget
{
   "ids": ["1","2"]
}

 

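Batch operations with the Bulk API

Bulk format (_mget only reads; bulk can create, index, update, and delete):

 # create: creates the document only if it does not exist yet
 # update: partially updates an existing document
 # index: creates a new document or replaces an existing one
 # delete: deletes a document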


POST lib/user/_bulk
{"delete":{"_index":"lib","_type":"user","_id":"1"}}


# Bulk insert: each action line is followed by the document to index
# (_id 1 is Java at 55, _id 2 is Html5 at 45, and so on)

POST /lib2/books/_bulk
{"index":{"_id":1}}
{"title":"Java","price":55}
{"index":{"_id":2}}
{"title":"Html5","price":45}
{"index":{"_id":3}}
{"title":"Php","price":35}
{"index":{"_id":4}}
{"title":"Python","price":50}


# Fetch them in one request

GET /lib2/books/_mget
{
"ids": ["1","2","3","4"]
}



# Mixed operations: a delete action has no document line after it

POST /lib2/books/_bulk
{"delete":{"_index":"lib2","_type":"books","_id":4}}
{"create":{"_index":"tt","_type":"ttt","_id":"100"}}
{"name":"lisi"}
{"index":{"_index":"tt","_type":"ttt"}}
{"name":"zhaosi"}
{"update":{"_index":"lib2","_type":"books","_id":"4"}}
{"doc":{"price":58}}


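How much data can one bulk request handle?

A bulk request is loaded into memory before it is processed, so its size is limited. The best size is not a fixed number:
    it depends on your hardware, the size and complexity of your documents, and the indexing and search load.
  A common starting point is 1000-5000 documents or 5-15 MB per request. By default a request cannot exceed 100 MB;
    the limit can be changed in the Elasticsearch configuration file (elasticsearch.yml under config in $ES_HOME) with the http.max_content_length setting.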
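
As a rough illustration of the bulk format and of batching, the following Python sketch serializes actions into the newline-delimited body that `_bulk` expects and splits a large list into batches. The function names `bulk_body` and `chunked` and the batch size of 1000 are assumptions chosen for this example (1000 is the low end of the guideline above); nothing is sent to a server.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) triples into the newline-delimited bulk format."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if action != "delete":           # a delete action has no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"       # the body must end with a newline


def chunked(actions, max_docs=1000):
    """Yield slices of at most max_docs actions, one slice per bulk request."""
    for start in range(0, len(actions), max_docs):
        yield actions[start:start + max_docs]


actions = [
    ("index", {"_id": 1}, {"title": "Java", "price": 55}),
    ("index", {"_id": 2}, {"title": "Html5", "price": 45}),
    ("delete", {"_id": 4}, None),
]
body = bulk_body(actions)
print(body)
```

In practice you would measure a few batch sizes against your own cluster and keep the one with the best throughput.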


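 Supported data types:

(1) Core datatypes

String types: text and keyword (from 5.0 these replace the single string type of earlier versions)

The main difference: text is analyzed (tokenized) and keyword is not.
A text field is meant for long, full-text content. Before indexing, the text is split into terms, and the terms go into the inverted index,
which lets Elasticsearch search for individual words. text fields cannot be used for sorting or aggregations unless fielddata is enabled. keyword fields are not analyzed;
they can be used for filtering, sorting, and aggregations, and can only be searched by their exact value.


Numeric: long, integer, short, byte, double, float
Date: date
Boolean: boolean
Binary: binary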

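 (2) Complex datatypes

Array datatype: arrays need no dedicated element type declaration, for example:
String array: [ "one", "two" ]
Integer array: [ 1, 2 ]
Array of arrays: [ 1, [ 2, 3 ]], which is equivalent to [ 1, 2, 3 ]
Array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]
Object datatype: object, for a single JSON object;
Nested datatype: nested, for arrays of JSON objects that must be queried independently of each other;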
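
The difference between object and nested is easier to see with data. The Python sketch below is only a model of the flattening behaviour, with invented helper names `flatten_values` and `flatten_objects` and an invented field name `people`: it shows how nested arrays collapse into one list, and how an array of objects stored as plain object loses the link between name and age.

```python
def flatten_values(value):
    """Flatten nested lists the way array values are treated: [1, [2, 3]] -> [1, 2, 3]."""
    if not isinstance(value, list):
        return [value]
    flat = []
    for item in value:
        flat.extend(flatten_values(item))
    return flat


def flatten_objects(field, objects):
    """Model of an object array: each inner field becomes one multi-valued field."""
    columns = {}
    for obj in objects:
        for key, value in obj.items():
            columns.setdefault(f"{field}.{key}", []).append(value)
    return columns


people = [{"name": "Mary", "age": 12}, {"name": "John", "age": 10}]
flat = flatten_objects("people", people)
print(flatten_values([1, [2, 3]]))
print(flat)

# On the flattened form, "name is Mary AND age is 10" looks like a match
# although no single object has both values; nested keeps the objects apart.
false_match = "Mary" in flat["people.name"] and 10 in flat["people.age"]
true_match = any(p["name"] == "Mary" and p["age"] == 10 for p in people)
print(false_match, true_match)
```

This cross-object false match is the usual reason to map an array of objects as nested.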

 (3) Geo datatypes

Geo-point datatype: geo_point, for latitude/longitude points;
Geo-shape datatype: geo_shape, for complex shapes such as polygons;

(4) Specialised datatypes

IP datatype: ip, for IPv4 addresses (and IPv6 from 5.0);
Completion datatype: completion, for auto-complete suggestions;
Token count datatype: token_count, which stores the number of tokens the analyzer produces for a string;
mapper-murmur3: with this plugin, the murmur3 type computes a hash of the field values at index time and stores it in the index;
Attachment datatype: with the mapper-attachments plugin, the attachment type indexes files such as Microsoft Office documents, Open Document files, ePub, and HTML.

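Supported mapping parameters:

"store":false  // whether to store the field value separately from _source; default false (the value is still returned as part of _source)

"index": true  // whether the field is indexed and therefore searchable; default true. With false the field is not indexed and cannot be queried

"analyzer":"ik"  // analyzer for the field; the default is the standard analyzer

"boost":1.23  // field-level score boost; default 1.0

"doc_values":false  // on by default for not_analyzed fields and not available for analyzed fields; greatly speeds up sorting and aggregations and saves heap memory

"fielddata":{"format":"disabled"}  // for analyzed fields: fielddata is what lets an analyzed field take part in sorting or aggregations; not-analyzed fields should use doc_values instead (2.x syntax; 5.x and later use "fielddata": true or false)

"fields":{"raw":{"type":"string","index":"not_analyzed"}} // multi-fields: index the same value in several ways, for example once analyzed and once not analyzed (2.x syntax; from 5.0 the sub-field would be {"type":"keyword"})

"ignore_above":100 // strings longer than 100 characters are ignored and not indexed

"include_in_all":true // whether the field is included in the _all field; default true unless index is set to no

"index_options":"docs" // four options: docs (document numbers), freqs (document numbers + term frequencies), positions (+ positions, needed for phrase and proximity queries), offsets (+ character offsets, typically used for highlighting). Analyzed fields default to positions, all others to docs

"norms":{"enabled":true,"loading":"lazy"} // default for analyzed fields; not-analyzed fields default to {"enabled":false}. Norms store the length normalization factor and index-time boost; recommended for fields that take part in scoring, at the cost of extra memory

"null_value":"NULL" // value indexed in place of an explicit null; it must have the same datatype as the field. On an analyzed field the null_value is analyzed as well

"position_increment_gap":0 // affects phrase and proximity queries on multi-valued analyzed fields: the artificial gap inserted between values, which a query can bridge with slop; default 100

"search_analyzer":"ik" // analyzer used at search time; defaults to the value of analyzer. For example, index with standard + ngram and search with standard to implement auto-suggest

"similarity":"BM25" // scoring algorithm for the field; only applies to analyzed string fields. The default was TF/IDF before 5.0 and is BM25 from 5.0

"term_vector":"no" // term vectors are not stored by default. Options: yes (terms), with_positions (terms + positions), with_offsets (terms + offsets), with_positions_offsets (terms + positions + offsets). They speed up the fast vector highlighter but increase index size, so they are a poor fit for large data sets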

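 Mapping types:


# dynamic can be set on the root object or on any field of type object; the requests here create a mapping type for index lib2

# dynamic takes one of three values:

# true: the default; new fields are added to the mapping automatically
# false: new fields are ignored (kept in _source but not indexed)
# strict: an unknown field causes an exception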
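
The three dynamic settings can be mimicked with a few lines of Python. This is a simplified model, not Elasticsearch code: `apply_dynamic` is a hypothetical helper, the mapping is a dict of field name to type name, and Python type names stand in for real type detection.

```python
def apply_dynamic(mapping, document, dynamic="true"):
    """Return the mapping after indexing `document` under the given dynamic setting."""
    updated = dict(mapping)
    for field, value in document.items():
        if field in updated:
            continue  # already mapped, nothing to decide
        if dynamic == "strict":
            raise ValueError(f"unknown field [{field}] rejected by strict mapping")
        if dynamic == "true":
            updated[field] = type(value).__name__  # stand-in for real type detection
        # dynamic == "false": the field stays in _source but is not mapped
    return updated


base = {"title": "str"}
print(apply_dynamic(base, {"title": "Java", "price": 55}, "true"))
print(apply_dynamic(base, {"title": "Java", "price": 55}, "false"))
```

With "true" the price field is added to the mapping, with "false" the mapping stays unchanged, and with "strict" the same call raises an error.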

PUT /lib2
{
    "settings":{
    "number_of_shards" : 3,
    "number_of_replicas" : 0
    },
     "mappings":{
      "books":{
        "properties":{
            "title":{"type":"text"},
            "name":{"type":"text","index":false},
            "publish_date":{"type":"date","index":false},
            "price":{"type":"double"},
            "number":{"type":"integer"}
        }
      }
     }
}




PUT /lib2
{
    "settings":{
    "number_of_shards" : 3,
    "number_of_replicas" : 0
    },
     "mappings":{
      "books":{
        "properties":{
            "title":{"type":"text"},
            "name":{"type":"text","index":false},
            "publish_date":{"type":"date","index":false},
            "price":{"type":"double"},
            "number":{
                "type":"object",
                "dynamic":true
            }
        }
      }
     }
}

 

 2.7 Basic queries (Query DSL)

PUT /lib3
{
    "settings":{
    "number_of_shards" : 3,
    "number_of_replicas" : 0
    },
     "mappings":{
      "user":{
        "properties":{
            "name": {"type":"text"},
            "address": {"type":"text"},
            "age": {"type":"integer"},
            "interests": {"type":"text"},
            "birthday": {"type":"date"}
        }
      }
     }
}

PUT lib3/user/1
{
  "name":"zhang san",
  "address": "shatian",
  "age":18,
  "interests": "drink,dance",
  "birthday": "2018-05-21"
}
PUT lib3/user/2
{
  "name":"li si",
  "address": "tongfan",
  "age":28,
  "interests": "shopping",
  "birthday": "2000-01-21"
}
PUT lib3/user/3
{
  "name":"wang wu",
  "address": "guangfeng",
  "age":20,
  "interests": "reading,drink,dance",
  "birthday": "2015-05-21"
}

GET /lib3/user/_search?q=name:lisi
GET /lib3/user/_search?q=name:zhaoliu&sort=age:desc   # the sort field should be numeric, such as long or integer

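term and terms queries (not analyzed)


# A term query looks up the exact term in the inverted index; it knows nothing about analyzers. It suits
# keyword, numeric, and date fields, which are not analyzed. [A term query for a full name on the name field returns nothing: name is a text field, and the input of a term query is not analyzed, whereas the input of match is.]

term: finds documents whose field contains the given term


# Exact lookup. The value in a term query should not contain spaces or other separators, because the inverted index holds single terms, not phrases (use match_phrase for phrases).
# So if the field content is a,b,c, a term query for b,c finds nothing: the query value is not analyzed, so it would have to match b,c as a whole,
# but a,b,c was indexed as the separate terms a, b, and c.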
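
A toy inverted index shows why term and match behave differently on a text field. The sketch below is an assumption-laden model: `analyze` is a crude stand-in for the standard analyzer (lowercase, split on anything that is not a letter or digit), and `term_query` and `match_query` are illustrative functions, not client calls. The sample documents are the three users indexed into lib3 above.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def build_index(docs, field):
    """Inverted index: token -> set of document ids."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """term: look the value up as-is, without analysis."""
    return index.get(value, set())


def match_query(index, text):
    """match: analyze the query text, then OR the resulting terms."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {
    1: {"interests": "drink,dance"},
    2: {"interests": "shopping"},
    3: {"interests": "reading,drink,dance"},
}
index = build_index(docs, "interests")
print(term_query(index, "drink"))         # single indexed term: found
print(term_query(index, "drink,dance"))   # not a term in the index: empty
print(match_query(index, "drink,dance"))  # analyzed into drink and dance: found
```

The second lookup returns nothing because no single term "drink,dance" exists in the index, which is the situation described in the comments above.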


# term: find documents whose interests field contains the term changge
GET /lib3/user/_search/
{
  "query": {
      "term": {"interests": "changge"}
  }
}


# terms: finds documents whose field contains any of several terms

GET /lib3/user/_search
{
    "query":{
        "terms":{
            "interests": ["hejiu","changge"]
       }
   }
}


# Control how many hits are returned
GET /lib3/user/_search
{
    "from":0,    # offset of the first hit
    "size":2,    # number of hits to return
    "query":{
        "terms":{
            "interests": ["hejiu","changge"]
        }
    }
}
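
When paging through results, from is usually derived from a page number. A minimal sketch, assuming 1-based page numbers; `page_params` is a hypothetical helper and not part of any client library.

```python
def page_params(page, size=10):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


print(page_params(1, 2))
print(page_params(3, 2))
```

Keep in mind that from + size is capped by the index.max_result_window setting, which is 10000 by default, so deep paging needs a different mechanism such as scroll.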


# Return the version number
GET /lib3/user/_search
{
    "version":true,   # hits carry no version number by default
    "query":{
        "terms":{
            "interests": ["hejiu","changge"]
        }
    }
}


match queries (analyzed)

# A match query is analyzer-aware: it analyzes the query text for the field first and then searches
GET /lib3/user/_search
{
    "query":{
        "match":{
            "name": "zhaoliu"
        }
    }
}

GET /lib3/user/_search
{
    "query":{
        "match":{
            "age": 20
        }
    }
}

# match_all: matches every document
GET /lib3/user/_search
{
  "query": {
    "match_all": {}
  }
}


# multi_match: runs the query against several fields
GET /lib3/user/_search
{
    "query":{
        "multi_match": {
            "query": "lvyou",
            "fields": ["interests","name"]
         }
    }
}


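# match_phrase: phrase query

# Elasticsearch first analyzes the query string and builds a phrase query from the analyzed text. Every term of
# the phrase must match, and the terms must keep their relative positions: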
GET lib3/user/_search
{
  "query":{  
      "match_phrase":{  
         "interests": "duanlian,shuoxiangsheng"
      }
   }
}


# Choose which fields are returned
GET /lib3/user/_search
{
    "_source": ["address","name"],
    "query": {
        "match": {
            "interests": "changge"
        }
    }
}


# Control which fields are loaded
GET /lib3/user/_search
{
    "query": {
        "match_all": {}
    },
    
    "_source": {
          "includes": ["name","address"],  # fields to include
          "excludes": ["age","birthday"]   # fields to exclude
      }
}


# Using the wildcard * in field names
GET /lib3/user/_search
{
    "_source": {
          "includes": "addr*",
          "excludes": ["name","bir*"]
        
    },
    "query": {
        "match_all": {}
    }
}

# Sorting with sort
# desc: descending, asc: ascending:
GET /lib3/user/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
           "age": {
               "order":"asc"
           }
        }
    ]
}

GET /lib3/user/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
           "age": {
               "order":"desc"
           }
        }
    ]
}



# Prefix matching
GET /lib3/user/_search
{
  "query": {
    "match_phrase_prefix": {
        "name": {
            "query": "zhao"   # matches terms that start with this prefix
        }
    }
  }
}


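# range: range queries

# from, to: both bounds are inclusive by default
# gte: greater than or equal, gt: greater than;
# lte: less than or equal, lt: less than;
# include_lower: whether the lower bound is included; default true
# include_upper: whether the upper bound is included; default true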
GET /lib3/user/_search
{
    "query": {
        "range": {
            "birthday": {
                "from": "1990-10-10",
                "to": "2018-05-01"
            }
        }
    }
}

GET /lib3/user/_search
{
    "query": {
        "range": {
            "age": {
                "from": 20,
                "to": 25,
                "include_lower": true,
                "include_upper": false
            }
        }
    }
}
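
The from/to/include_lower/include_upper form and the gt/gte/lt/lte form describe the same ranges. The following Python sketch converts one into the other; `normalize_range` is an illustrative helper written for this note, assuming the defaults listed above (both bounds inclusive).

```python
def normalize_range(bounds):
    """Rewrite from/to/include_lower/include_upper into gt/gte/lt/lte keys."""
    result = {}
    if "from" in bounds:
        lower_key = "gte" if bounds.get("include_lower", True) else "gt"
        result[lower_key] = bounds["from"]
    if "to" in bounds:
        upper_key = "lte" if bounds.get("include_upper", True) else "lt"
        result[upper_key] = bounds["to"]
    return result


age_range = normalize_range(
    {"from": 20, "to": 25, "include_lower": True, "include_upper": False})
print(age_range)
```

So the age query above selects ages from 20 up to, but not including, 25.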



# wildcard query
# Supports the wildcards * and ?: * matches zero or more characters, ? matches exactly one character
GET /lib3/user/_search
{
    "query": {
        "wildcard": {
             "name": "zhao*"
        }
    }
}
GET /lib3/user/_search
{
    "query": {
        "wildcard": {
             "name": "li?i"
        }
    }
}


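# fuzzy: approximate matching (slow). value: the term to search for; boost: weight of the query, default 1.0
# fuzziness: the maximum edit distance allowed between the query term and a matching term (0, 1, or 2). The default AUTO allows 0 edits for terms of 1-2 characters, 1 edit for 3-5 characters, and 2 edits for longer terms
# prefix_length: number of leading characters that must match exactly; default 0
# max_expansions: maximum number of terms the query expands to; default 50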
GET /lib3/user/_search
{
    "query": {
        "fuzzy": {
             "interests": "chagge"
        }
    }
}

GET /lib3/user/_search
{
    "query": {
        "fuzzy": {
             "interests": {
                 "value": "chagge"
             }
        }
    }
}
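
Fuzziness is easier to reason about with the edit distance written out. The sketch below computes the plain Levenshtein distance and applies the AUTO thresholds; the function names are made up for this example. Elasticsearch by default also counts a swap of two adjacent characters as one edit, which this simplified version does not model.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edit distance allowed by fuzziness AUTO with its default length thresholds."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


def fuzzy_match(query, term):
    """True if term is within the AUTO edit distance of query."""
    return levenshtein(query, term) <= auto_fuzziness(query)


print(levenshtein("chagge", "changge"))
print(fuzzy_match("chagge", "changge"))
```

The misspelled chagge is one insertion away from changge, so a fuzzy query for it can still find documents containing changge.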




# Highlighting search results
GET /lib3/user/_search
{
    "query":{
        "match":{
            "interests": "changge"
        }
    },
    "highlight": {
        "fields": {
             "interests": {}
        }
    }
}




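# boost: raising a field's weight. When creating the mapping, set the weight of title to twice the normal value; this changes the final score of matches so that results rank the way users expect. (Mapping-level boost is deprecated from 5.0; query-time boost is preferred.)

# Set when defining the mapping.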
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "boost": 2     # matches on title count twice as much as normal
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}


# Set at query time:
GET lib3/user/_search
{
  "query": {
    "match": {
      "name": {
        "query": "zhan si san",
        "boost": 2    # multiplies the _score contributed by this query
      }
    }
  }
}



# Filter queries: a filter does not compute relevance and can be cached, so a filter is faster than a query.
# Insert some data to query
POST /lib4/items/_bulk
{"index": {"_id": 1}}
{"price": 40,"itemID": "ID100123"}
{"index": {"_id": 2}}
{"price": 50,"itemID": "ID100124"}
{"index": {"_id": 3}}
{"price": 25,"itemID": "ID100124"}
{"index": {"_id": 4}}
{"price": 30,"itemID": "ID100125"}
{"index": {"_id": 5}}
{"price": null,"itemID": "ID100127"}

# Simple filter queries
GET /lib4/items/_search
{ 
       "post_filter": {
             "term": {
                 "price": 40
             }
       }
}

GET /lib4/items/_search
{
      "post_filter": {
          "terms": {
                 "price": [25,40]
              }
        }
}

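# This query finds nothing: itemID was mapped dynamically as text, so its value was analyzed and stored in lowercase.
# Two fixes: search for the lowercase form id100123, or map the field as keyword so that it is not analyzed.
GET /lib4/items/_search
{
    "post_filter": {
        "term": {
            "itemID": "ID100123"
          }
      }
}

# Check the mapping to see how the field is analyzed; if the item id should not be analyzed, recreate the index with an explicit mapping.
# After recreating the index the documents must be indexed again, and term filters on itemID must then use the exact stored value.
# GET /lib4/_mapping

DELETE lib4

PUT /lib4
{
    "mappings": {
        "items": {
            "properties": {
                "itemID": {
                    "type": "keyword"
                }
            }
        }
    }
}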




# bool filters combine several filter conditions
{
    "bool": {
        "must": [],
        "should": [],
        "must_not": []
    }
}
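# must: conditions that must match (AND)
# filter: like must, but the score is ignored; filter clauses run in filter context
# should: conditions that may or may not match (OR)
# must_not: conditions that must not match (NOT)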

GET /lib4/items/_search
{
    "post_filter": {
          "bool": {
               "should": [
                    {"term": {"price":25}},
                    {"term": {"itemID": "id100123"}}
                   
                  ],
                "must_not": {
                    "term":{"price": 30}
                   }
                       
                }
             }
}

GET /lib4/items/_search
{
    "post_filter": {
          "bool": {
                "should": [
                    {"term": {"itemID": "id100123"}},
                    {
                      "bool": {
                          "must": [
                              {"term": {"itemID": "id100124"}},
                              {"term": {"price": 40}}
                            ]
                          }
                    }
                  ]
                }
            }
}
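
The two bool filters above can be checked by hand with a small Python model. This is a sketch under stated assumptions: `term` and `bool_filter` are invented helpers, itemID values are written in the lowercase form that an analyzed text field would hold, and a should clause is treated as required only when there is no must clause, which is how these two filters behave.

```python
def term(field, value):
    """Filter that matches documents whose field equals value exactly."""
    return lambda doc: doc.get(field) == value


def bool_filter(must=(), should=(), must_not=()):
    """All of must, none of must_not, and at least one of should when there is no must."""
    def matches(doc):
        if not all(f(doc) for f in must):
            return False
        if any(f(doc) for f in must_not):
            return False
        if should and not must:
            return any(f(doc) for f in should)
        return True
    return matches


items = {
    1: {"price": 40, "itemID": "id100123"},
    2: {"price": 50, "itemID": "id100124"},
    3: {"price": 25, "itemID": "id100124"},
    4: {"price": 30, "itemID": "id100125"},
    5: {"price": None, "itemID": "id100127"},
}

first = bool_filter(should=[term("price", 25), term("itemID", "id100123")],
                    must_not=[term("price", 30)])
second = bool_filter(should=[
    term("itemID", "id100123"),
    bool_filter(must=[term("itemID", "id100124"), term("price", 40)]),
])

first_hits = [doc_id for doc_id, doc in items.items() if first(doc)]
second_hits = [doc_id for doc_id, doc in items.items() if second(doc)]
print(first_hits, second_hits)
```

In the second filter the nested bool contributes nothing, because no item combines id100124 with a price of 40.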


# Range filter
# gt: >     lt: <     gte: >=     lte: <=
GET /lib4/items/_search
{
     "post_filter": {
          "range": {
              "price": {
                   "gt": 25,
                   "lt": 50
                }
            }
      }
}

GET /lib4/items/_search
{
  "query": {
    "bool": {
      "filter": {
          "exists":{
             "field":"price"
         }
      }
    }
  }
}

# Keep only documents whose field is not null
GET /lib4/items/_search
{
    "query" : {
        "constant_score" : {
            "filter": {
                "exists" : { "field" : "price" }
            }
        }
    }
}




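# Filter cache
# Elasticsearch provides a dedicated cache, the filter cache, that stores the results of filters. Cached
# filters need little memory (they only record which documents match the filter) and can be reused by
# every later query that uses the same filter, which improves query performance considerably.

# Note: Elasticsearch does not cache every filter by default. The lists here describe 1.x behaviour.

Filters that are not cached by default: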
numeric_range
script
geo_bbox
geo_distance
geo_distance_range
geo_polygon
geo_shape
and
or
not

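Filters that are cached by default:
exists,missing,range,term,terms

# To turn caching on in 1.x, add this to the filter clause (the option was removed in 2.0, where caching became automatic):
"_cache":true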



# Aggregations

# sum
GET /lib4/items/_search
{
  "size":0,
  "aggs": {
     "price_of_sum": {
         "sum": {
           "field": "price"
         }
     }
  }
}

#min
GET /lib4/items/_search
{
  "size": 0, 
  "aggs": {
     "price_of_min": {
         "min": {
           "field": "price"
         }
     }
  }
}


# max
GET /lib4/items/_search
{
  "size": 0, 
  "aggs": {
     "price_of_max": {
         "max": {
           "field": "price"
         }
     }
  }
}


# avg
GET /lib4/items/_search
{
  "size":0,
  "aggs": {
     "price_of_avg": {
         "avg": {
           "field": "price"
         }
     }
  }
}


# cardinality: the number of distinct values, similar to DISTINCT; null is not counted (null, 1, 2, 3 has a cardinality of 3)
GET /lib4/items/_search
{
  "size":0,
  "aggs": {
     "price_of_cardi": {
         "cardinality": {
           "field": "price"
         }
     }
  }
}


# terms: group by value
GET /lib4/items/_search
{
  "size":0,
  "aggs": {
     "price_group_by": {
         "terms": {
           "field": "price"
         }
     }
  }
}
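
For the five lib4 items indexed earlier, the results of these aggregations can be worked out by hand. The Python sketch below does that arithmetic locally as a cross-check; it is not a client call, and it assumes that the document whose price is null is simply skipped. Note also that the real cardinality aggregation is approximate on large data sets.

```python
from collections import Counter

prices = [40, 50, 25, 30, None]                  # price of documents 1-5 in lib4
values = [p for p in prices if p is not None]    # documents without a value are skipped

metrics = {
    "sum": sum(values),
    "min": min(values),
    "max": max(values),
    "avg": sum(values) / len(values),
    "cardinality": len(set(values)),
}
buckets = Counter(values)   # terms aggregation: one bucket per distinct value

print(metrics)
print(dict(buckets))
```

Each price occurs once, so the terms aggregation yields four buckets with a doc_count of 1.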


# Group the users whose interests include changge by age, ordered by average age descending
GET /lib3/user/_search
{
  "query": {
      "match": {
        "interests": "changge"
      }
   },
   "size": 0, 
   "aggs":{
       "age_group_by":{
           "terms": {
             "field": "age",
             "order": {
               "avg_of_age": "desc"
             }
           },
           "aggs": {
             "avg_of_age": {
               "avg": {
                 "field": "age"
               }
             }
           }
       }
   }
}

Additional notes on aggregations:

 

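Partial updates with scripts

Elasticsearch has built-in scripting support for complex update logic. Groovy was the default script language up to 2.x; from 5.0 the default is Painless, and Groovy was removed in 6.0. The update scripts here run as Painless on 5.x and 6.x.

# Increment the age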
POST /lib/user/4/_update
{
  "script": "ctx._source.age+=1"
}



# Change the name (append a suffix to last_name)
POST /lib/user/4/_update
{
  "script": "ctx._source.last_name+='hehe'"
}



# Add an interest
POST /lib/user/4/_update
{
  "script": {
    "source": "ctx._source.interests.add(params.tag)",
    "params": {
      "tag":"picture"
    }
  }
}



# Remove an interest
POST /lib/user/4/_update
{
  "script": {
    "source": "ctx._source.interests.remove(ctx._source.interests.indexOf(params.tag))",
    "params": {
      "tag":"picture"
    }
  }
}



# Delete the document, but only if its age equals the count parameter
POST /lib/user/4/_update
{
  "script": {
    "source": "ctx.op=ctx._source.age==params.count?'delete':'none'",
    "params": {
        "count":29   # numeric values must not be quoted
    }
  }
}



# upsert: if the document exists, run the script; if it does not, insert the upsert body as a new document
POST /lib/user/4/_update
{
  "script": "ctx._source.age += 1",

  "upsert": {
     "first_name" : "Jane",
     "last_name" :   "Lucy",
     "age" :  20,
     "about" :       "I like to collect rock albums",
     "interests":  [ "music" ]
  }
}
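
The upsert logic is a simple either/or, which the following Python model spells out. `scripted_upsert` and `increment_age` are hypothetical names used only here, and the store is an in-memory dict rather than an index.

```python
def scripted_upsert(store, doc_id, script, upsert):
    """If doc_id exists, run script on the stored document; otherwise insert the upsert body."""
    if doc_id in store:
        script(store[doc_id])
        return "updated"
    store[doc_id] = dict(upsert)   # the script is not run on the freshly inserted document
    return "created"


def increment_age(source):
    """Counterpart of the script ctx._source.age += 1."""
    source["age"] += 1


store = {}
upsert_doc = {"first_name": "Jane", "last_name": "Lucy", "age": 20}

first = scripted_upsert(store, 4, increment_age, upsert_doc)
age_after_first = store[4]["age"]
second = scripted_upsert(store, 4, increment_age, upsert_doc)
age_after_second = store[4]["age"]
print(first, age_after_first, second, age_after_second)
```

The first call inserts the document with age 20; only the second call runs the script and raises the age to 21.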



# Use the ik analyzer together with a pinyin tokenizer
# Request: PUT localhost:9200/ceshi1

{
	"settings" : {
		"number_of_shards": 2,
		"number_of_replicas": 0,
		"analysis" : {
			"analyzer" : {
				"default" : {
					"tokenizer" : "ik_max_word"
				},
				"pinyin_analyzer" : {
	                "tokenizer" : "my_pinyin"
	            }
			},
			"tokenizer" : {
	            "my_pinyin" : {
	                "type" : "pinyin",
	                "keep_separate_first_letter" : false,
	                "keep_full_pinyin" : true,
	                "keep_original" : true,
	                "limit_first_letter_length" : 16,
	                "lowercase" : true,
	                "remove_duplicated_term" : true
	            }
	        }
		}   
	},
	"mappings" : {
		"blog1" : {
			"properties" : {
				"name" : {
					"type" : "text",
					"analyzer" : "ik_max_word",
					"search_analyzer" : "ik_max_word"
				},
				"title" : {
					"type" : "text",
					"analyzer" : "ik_smart"
				},
				"ceshi" : {
					"type" : "keyword"
				}
			}
		}
	}
}


# Add fields to the mapping of this index
# Request: PUT localhost:9200/ceshi1/blog1/_mapping

{
	"properties" : {
		"shijian" : {
			"type" : "date",
			"format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
		},
		"namepinyin" :{
			"type" : "text",
			"analyzer" : "ik_max_word",
			"fields" : {
				"pinyin" : {
					"type" : "text",
					"analyzer" : "pinyin_analyzer",
					"search_analyzer" : "pinyin_analyzer"
				}
			}
		}
	}
}



# Test a pinyin search
# GET localhost:9200/ceshi1/blog1/_search

{
  "query":{
    "match": {
      "namepinyin.pinyin": "guo"
    }
  }
}

Online shop example:

# settings that combine ik, pinyin, and edge n-gram (for digits) analysis (shop example)
{
	"index": {
		"analysis": {
			"analyzer": {
				"full_pinyin_analyzer": {
					"tokenizer" : "ik_smart",
					"filter": [
						"full_pinyin",
						"unique"
					]
				},
				"first_letter_pinyin_analyzer": {
					"tokenizer" : "ik_smart",
					"filter": [
						"first_letter_pinyin"
					]
				},
				"edge_ngram_analyzer": {
				   "type": "custom",
				   "tokenizer": "standard",
				   "filter": [
					  "lowercase",
					  "edge_ngram_filter"
				   ]
				}
			},
			"filter": {
				"full_pinyin": {
					"type": "pinyin",
					"lowercase": true,
					"keep_full_pinyin": false,
					"keep_joined_full_pinyin": true,
					"keep_none_chinese": true,
					"keep_none_chinese_together": true,
					"none_chinese_pinyin_tokenize": false
				},
				"first_letter_pinyin": {
					"type": "pinyin",
					"keep_first_letter": true,
					"keep_separate_first_letter": true,
					"keep_full_pinyin": false,
					"keep_joined_full_pinyin": false,
					"limit_first_letter_length": 16,
					"keep_original": false,
					"lowercase": true
				},
				"edge_ngram_filter": {
				   "type": "edge_ngram",
				   "min_gram": 1,
				   "max_gram": 10
				}
			}
		}
	}
}
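
The edge_ngram_filter in these settings turns every token into its prefixes, which is what makes prefix-style matching on specification strings possible. A minimal Python sketch of that expansion, using the same min_gram and max_gram values; `edge_ngrams` is an illustrative function, and the sample token 64gb is invented.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Prefixes of token whose length lies between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("64gb"))
print(len(edge_ngrams("abcdefghijkl")))
```

Because the mapping searches these fields with the standard analyzer, a query for 64 is not expanded itself; it simply matches the indexed prefix 64.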



# mapping: type settings for each field (shop example)

{
	"goods": {
		"_all": {
			"enabled": false
		},
		"properties": {
			"appCommission": {
				"type": "double"
			},
			"appPrice0": {
				"type": "double"
			},
			"appPrice1": {
				"type": "double"
			},
			"appPrice2": {
				"type": "double"
			},
			"appPriceMin": {
				"type": "double"
			},
			"appUsable": {
				"type": "byte"
			},
			"batchNum0": {
				"type": "integer"
			},
			"batchNum1": {
				"type": "integer"
			},
			"batchNum2": {
				"type": "integer"
			},
			"batchPrice0": {
				"type": "double"
			},
			"batchPrice1": {
				"type": "double"
			},
			"batchPrice2": {
				"type": "double"
			},
			"brandEnglish": {
				"type": "keyword"
			},
			"brandId": {
				"type": "long"
			},
			"brandName": {
				"type": "keyword"
			},
			"categoryId": {
				"type": "integer"
			},
			"categoryIds": {
				"type": "integer"
			},
			"categoryName": {
				"type": "keyword"
			},
			"commissionRate": {
				"type": "integer"
			},
			"commissionTotal": {
				"type": "double"
			},
			"commonId": {
				"type": "integer"
			},
			"evaluateNum": {
				"type": "integer"
			},
			"freightArea": {
				"type": "integer"
			},
			"freightTemplateId": {
				"type": "integer"
			},
			"goodsFavorite": {
				"type": "integer"
			},
			"goodsFreight": {
				"type": "double"
			},
			"goodsModal": {
				"type": "byte"
			},
			"goodsName": {
				"type": "text",
				"fields": {
					"full_pinyin": {
						"type": "text",
						"analyzer": "full_pinyin_analyzer"
					},
					"first_letter": {
						"type": "text",
						"analyzer": "first_letter_pinyin_analyzer"
					},
					"goodsName": {
						"type": "text",
						"analyzer": "ik_smart",
						"search_analyzer": "ik_smart"
					},
					"goodsNameMax": {
						"type": "text",
						"analyzer": "ik_max_word",
						"search_analyzer": "ik_max_word"
					}
				}
			},
			"jingle": {
				"type": "text",
				"analyzer": "ik_smart",
				"search_analyzer": "ik_smart"
			},
			"specString": {
				"type": "text",
				"analyzer": "edge_ngram_analyzer",
				"search_analyzer": "standard"
			},
			"goodsSpecList": {
				"type": "nested",
				"properties": {
					"goodsId": {
						"type": "integer"
					},
					"spec": {
						"type": "text",
						"analyzer": "edge_ngram_analyzer",
						"search_analyzer": "standard"
					}
				}
			},
			"areaInfo": {
				"type": "text",
				"index": false
			},
			"goodsRate": {
				"type": "integer"
			},
			"goodsSaleNum": {
				"type": "integer"
			},
			"goodsState": {
				"type": "byte"
			},
			"goodsStatus": {
				"type": "byte"
			},
			"goodsVerify": {
				"type": "byte"
			},
			"imageName": {
				"type": "text",
				"index": false
			},
			"goodsImageList": {
				"type": "nested",
				"properties": {
					"imageId": {
						"type": "integer"
					},
					"commonId": {
						"type": "integer"
					},
					"colorId": {
						"type": "integer"
					},
					"imageName": {
						"type": "text",
						"index": false
					},
					"imageSort": {
						"type": "integer"
					},
					"isDefault": {
						"type": "integer"
					},
					"imageSrc": {
						"type": "text",
						"index": false
					}
				}
			},
			"isDistribution": {
				"type": "byte"
			},
			"isGift": {
				"type": "byte"
			},
			"isOwnShop": {
				"type": "byte"
			},
			"ordersCount": {
				"type": "integer"
			},
			"promotionId": {
				"type": "integer"
			},
			"promotionState": {
				"type": "byte"
			},
			"promotionType": {
				"type": "byte"
			},
			"promotionStartTime": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"promotionEndTime": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"sellerId": {
				"type": "integer"
			},
			"storeId": {
				"type": "integer"
			},
			"storeName": {
				"type": "text",
				"analyzer": "ik_smart",
				"search_analyzer": "ik_smart"
			},
			"labelIdList": {
				"type": "integer"
			},
			"joinBigSale": {
				"type": "integer"
			},
			"unitName": {
				"type": "text",
				"index": false
			},
			"usableVoucher": {
				"type": "byte"
			},
			"webCommission": {
				"type": "double"
			},
			"webPrice0": {
				"type": "double"
			},
			"webPrice1": {
				"type": "double"
			},
			"webPrice2": {
				"type": "double"
			},
			"webPriceMin": {
				"type": "double"
			},
			"webUsable": {
				"type": "byte"
			},
			"WechatCommission": {
				"type": "double"
			},
			"wechatPrice0": {
				"type": "double"
			},
			"wechatPrice1": {
				"type": "double"
			},
			"wechatPrice2": {
				"type": "double"
			},
			"wechatPriceMin": {
				"type": "double"
			},
			"wechatUsable": {
				"type": "byte"
			},
			"searchBoost": {
				"type": "integer"
			},
			"extendString0": {
				"type": "text",
				"index": false
			},
			"extendString1": {
				"type": "text",
				"index": false
			},
			"extendString2": {
				"type": "text",
				"index": false
			},
			"extendString3": {
				"type": "text",
				"index": false
			},
			"extendString4": {
				"type": "text",
				"index": false
			},
			"extendString5": {
				"type": "text",
				"index": false
			},
			"extendString6": {
				"type": "text",
				"index": false
			},
			"extendString7": {
				"type": "text",
				"index": false
			},
			"extendString8": {
				"type": "text",
				"index": false
			},
			"extendString9": {
				"type": "text",
				"index": false
			},
			"extendInt0": {
				"type": "integer"
			},
			"extendInt1": {
				"type": "integer"
			},
			"extendInt2": {
				"type": "integer"
			},
			"extendInt3": {
				"type": "integer"
			},
			"extendInt4": {
				"type": "integer"
			},
			"extendInt5": {
				"type": "integer"
			},
			"extendInt6": {
				"type": "integer"
			},
			"extendInt7": {
				"type": "integer"
			},
			"extendInt8": {
				"type": "integer"
			},
			"extendInt9": {
				"type": "integer"
			},
			"extendPrice0": {
				"type": "double"
			},
			"extendPrice1": {
				"type": "double"
			},
			"extendPrice2": {
				"type": "double"
			},
			"extendPrice3": {
				"type": "double"
			},
			"extendPrice4": {
				"type": "double"
			},
			"extendPrice5": {
				"type": "double"
			},
			"extendPrice6": {
				"type": "double"
			},
			"extendPrice7": {
				"type": "double"
			},
			"extendPrice8": {
				"type": "double"
			},
			"extendPrice9": {
				"type": "double"
			},
			"extendTime0": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime1": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime2": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime3": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime4": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime5": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime6": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime7": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime8": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			},
			"extendTime9": {
				"type": "date",
				"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
			}
		}
	}
}

 

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>6.7.1</version>
        </dependency>


        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>6.7.1</version>
        </dependency>

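A simple example:

import org.apache.http.HttpHost;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

    private static String host = "localhost";
    private static int port = 9200;

// High-level REST client
RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost(host, port)));


  // 1. Fetch a document by the id it was indexed with (GET request)
        GetRequest getRequest = new GetRequest("index", "blog", "1");
        GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.getSource());
        client.close();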


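import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

  // 2. Search API. SearchRequest is used for any operation that involves searching documents, aggregations, or suggestions, and it also offers ways to request highlighting on the resulting documents. (1)
        SearchRequest searchRequest = new SearchRequest();
        // Most search parameters are added to the SearchSourceBuilder. It has setters for everything that goes into the search request body. (2)
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Add a match_all query to the SearchSourceBuilder (match every document). (3)
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());
        // Add the SearchSourceBuilder to the SearchRequest. (4)
        searchRequest.source(searchSourceBuilder);
        // Restrict the search to specific indices; any number of index names can be passed as separate arguments. (5)
        searchRequest.indices("index");
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // Iterate over the returned hits
        for (SearchHit hit : response.getHits()) {
            String json = hit.getSourceAsString();
            System.out.println(json);
        }
        client.close();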



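import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

        // Build the document as a Map
        Map<String, Object> jsonMap = new HashMap<>();
        jsonMap.put("id", "2");
        jsonMap.put("title", "PHP設計模式");
        jsonMap.put("content", "這個是測試的第二個插入數據");
        jsonMap.put("postdate", "2018-12-13");
        jsonMap.put("url", "www.baidu.com");

        // Alternative: build the document with the built-in XContentBuilder.
        // It is not sent in this example; pass it to indexRequest.source(builder) to use it.
        XContentBuilder builder = XContentFactory.jsonBuilder();
        builder.startObject();
        {
            builder.field("title", "PHP設計模式");
            builder.field("content", "這個是測試的第二個插入數據");
            builder.timeField("postdate", new Date());
            builder.field("url", "www.baidu.com");

        }
        builder.endObject();

        // Index the document
        IndexRequest indexRequest = new IndexRequest("index", "blog", "1");
        indexRequest.source(jsonMap);

        IndexResponse indexResponse = client.index(indexRequest, RequestOptions.DEFAULT);
        System.out.println(indexResponse.getResult());
        client.close();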

Java example of connecting to a cluster:

Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-term-vectors.html
