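What is Elasticsearch
Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Elasticsearch is written in Java, is released as open source under the Apache License, and is currently a popular enterprise search engine. It was designed for cloud environments and offers near-real-time search; it is stable, reliable, fast, and easy to install and use.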
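Elasticsearch use cases
Elasticsearch (hereafter ES) is mainly used to store semi-structured data and to provide search and analytics on it.
ES stores data as JSON documents.
Its core data structure is the inverted index.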
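To make the idea of an inverted index concrete, here is a minimal Java sketch (toy code, not how Lucene stores its index): each term maps to the sorted set of IDs of the documents that contain it, so a lookup reads one posting list instead of scanning every document. The tokenizer is an assumption (lowercase plus whitespace split); real ES analyzers such as IK are far more elaborate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index: term -> sorted set of document IDs (illustration only). */
class InvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Index a document; the assumed tokenizer just lowercases and splits on whitespace. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Return the IDs of documents containing the term (empty set if none). */
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "Elasticsearch is built on Lucene");
        index.add(2, "Lucene uses an inverted index");
        index.add(3, "Kibana talks to Elasticsearch");
        System.out.println(index.lookup("lucene"));        // [1, 2]
        System.out.println(index.lookup("elasticsearch")); // [1, 3]
    }
}
```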
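Common tools
Head plugin (built on Node.js; must be installed manually):
Access and management UI: http://XXXX:9100/
Enter http://XXXX:9200/ in the connection box and click Connect.
Kibana:
Console for running requests (Dev Tools): http://XXXX:5601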
Storage structure
Comparison of the data storage structures of ES, ArteryBase, and Greenplum
Structural hierarchy
field => document => type => index => shard => node => cluster
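Replicas
Replicas ensure that when a node goes down, the shard data it holds is not lost. Users can set the number of shards and replicas per index: the number of primary shards is set when the index is created and cannot be changed in place, while the number of replicas can be changed at any time.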
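Why replicas protect against a node failure can be shown with a toy allocator in Java. This is an assumed, simplified placement scheme (copies of a shard go on consecutive nodes), not ES's real allocation logic; the one rule it shares with ES is that two copies of the same shard never sit on the same node. The 5 shards and 1 replica used below happen to be the ES 5.x defaults.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy shard allocator (illustration only, not ES's allocation algorithm). */
class ReplicaDemo {
    /** For each shard, the nodes holding a copy (index 0 = primary, the rest = replicas). */
    static List<List<Integer>> allocate(int shards, int replicas, int nodes) {
        if (replicas + 1 > nodes) {
            throw new IllegalArgumentException("need at least replicas + 1 nodes");
        }
        List<List<Integer>> table = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            List<Integer> copies = new ArrayList<>();
            for (int c = 0; c <= replicas; c++) {
                copies.add((s + c) % nodes); // consecutive nodes, so copies never share a node
            }
            table.add(copies);
        }
        return table;
    }

    /** True if every shard still has at least one copy after downNode fails. */
    static boolean survives(List<List<Integer>> table, int downNode) {
        for (List<Integer> copies : table) {
            if (copies.stream().allMatch(n -> n == downNode)) {
                return false; // every copy of this shard was on the failed node
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> withReplica = allocate(5, 1, 3);
        List<List<Integer>> noReplica = allocate(5, 0, 3);
        for (int node = 0; node < 3; node++) {
            System.out.printf("node %d down: replicas=1 -> %b, replicas=0 -> %b%n",
                    node, survives(withReplica, node), survives(noReplica, node));
        }
    }
}
```

With one replica every single-node failure is survivable; with zero replicas, losing any of the three nodes loses at least one shard.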
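JSON syntax (Query DSL)
Queries:
Creating an index (with or without an analyzer; using dynamic templates)
Adding a document, and how ES handles _id
Choosing the fields to return, the search conditions, and the maximum number of hits
Usage of and differences between term, match, and match_phrase
Aggregations in ES
Usage of and differences between query and filter
See the requests below for detailed examples.
1. Basic query usage
1) Queries
Analysis (tokenization)
IK analyzer (the third-party elasticsearch-analysis-ik plugin, installed separately)
Index without analysis (every dynamically mapped string field becomes keyword; in ES 5 this replaces the old "type": "string" with "index": "not_analyzed")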
PUT /syltest
{
  "mappings": {
    "syltest": {
      "dynamic_templates": [
        {
          "es": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}
Check how the IK analyzer tokenizes a sample string:
GET /syltest/_analyze
{
  "analyzer": "ik_max_word",
  "text": "es測試用例"
}
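ik_max_word produces the finest-grained segmentation, so its tokens can overlap. As a rough illustration only (this is NOT IK's actual algorithm), the Java sketch below emits every substring of the text that appears in a small hypothetical dictionary; the dictionary contents are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy "max word" segmentation: emit every dictionary word found in the text. */
class MaxWordDemo {
    static List<String> maxWordTokens(String text, Set<String> dictionary) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                String candidate = text.substring(start, end);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate); // overlapping matches are all kept
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary; the real IK dictionary is much larger.
        Set<String> dict = new HashSet<>(Arrays.asList("es", "測試", "試用", "用例", "測試用例"));
        System.out.println(maxWordTokens("es測試用例", dict)); // [es, 測試, 測試用例, 試用, 用例]
    }
}
```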
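Index with analysis (each example here recreates syltest, so run DELETE /syltest before trying another one)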
PUT /syltest
{
  "mappings": {
    "syltest": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
      }
    }
  }
}
Dynamic template (every dynamically added string field is analyzed with ik_max_word)
PUT /syltest
{
  "mappings": {
    "syltest": {
      "dynamic_templates": [
        {
          "es": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "analyzer": "ik_max_word"
            }
          }
        }
      ]
    }
  }
}
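Dynamic templates that keep one field (content) unanalyzed while analyzing all other string fields. Templates are tried in order and the first match wins, so the content rule must come first. A keyword field takes no analyzer setting, and "not_analyzed" is not an analyzer name, so the keyword mapping only needs its type:
PUT /syltest
{
  "mappings": {
    "syltest": {
      "dynamic_templates": [
        {
          "eh": {
            "match": "content",
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "es": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "analyzer": "ik_max_word"
            }
          }
        }
      ]
    }
  }
}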
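The "first matching template wins" rule can be modeled in a few lines of Java. This is a toy model for illustration: the Template class and the resolve helper are hypothetical, the wildcard support is limited to '*', and the mapping is just a descriptive string rather than real ES mapping JSON.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Toy model of dynamic template selection (illustration only). */
class DynamicTemplateDemo {
    static final class Template {
        final String name;
        final String match;            // field-name pattern; null means "any field"
        final String matchMappingType; // detected JSON type, e.g. "string" or "long"
        final String mapping;          // resulting mapping, as a short description

        Template(String name, String match, String matchMappingType, String mapping) {
            this.name = name;
            this.match = match;
            this.matchMappingType = matchMappingType;
            this.mapping = mapping;
        }
    }

    /** Simple wildcard match where '*' stands for any sequence of characters. */
    static boolean wildcardMatches(String pattern, String value) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), value);
    }

    /** Returns "template -> mapping" for the first template that fits the new field. */
    static String resolve(List<Template> templates, String field, String jsonType) {
        for (Template t : templates) {
            String pattern = (t.match == null) ? "*" : t.match;
            if (wildcardMatches(pattern, field) && t.matchMappingType.equals(jsonType)) {
                return t.name + " -> " + t.mapping;
            }
        }
        return "no template -> default dynamic mapping";
    }

    public static void main(String[] args) {
        // Same order as the "eh" / "es" templates in the request above.
        List<Template> templates = Arrays.asList(
                new Template("eh", "content", "string", "keyword"),
                new Template("es", null, "string", "text (ik_max_word)"));
        System.out.println(resolve(templates, "content", "string"));
        System.out.println(resolve(templates, "title", "string"));
        System.out.println(resolve(templates, "count", "long"));
    }
}
```

Swapping the two templates would send content to the catch-all "es" rule, which is why the more specific rule is listed first.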
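term query: exact match. The query text is not analyzed; it is looked up as-is in the inverted index, so on an analyzed text field it only matches a value equal to one of the indexed tokens. size caps the number of hits returned, and _source.includes/excludes choose which fields come back.
GET /syltest/syltest/_search
{
  "query": {
    "term": {
      "title": "測試"
    }
  },
  "size": 20,
  "_source": {
    "includes": ["title"],
    "excludes": []
  }
}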
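A common surprise with term is that it misses values that look like they should match. The toy Java model below shows why: the field value is analyzed when indexed, but the term query string is compared unchanged. The analyzer here (lowercase, split on anything that is not a letter or digit) is an assumption standing in for a real ES analyzer.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy illustration of term queries against an analyzed text field. */
class TermQueryDemo {
    /** Assumed analyzer: lowercase, then split on non-letter/non-digit characters. */
    static Set<String> analyze(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** term query: the raw query string is looked up among the indexed tokens. */
    static boolean termMatches(String indexedValue, String queryTerm) {
        return analyze(indexedValue).contains(queryTerm);
    }

    public static void main(String[] args) {
        String title = "Elasticsearch Guide";
        System.out.println(termMatches(title, "elasticsearch"));       // true
        System.out.println(termMatches(title, "Elasticsearch"));       // false: query not lowercased
        System.out.println(termMatches(title, "elasticsearch guide")); // false: not a single token
    }
}
```

To match a whole value exactly, run term against a keyword field instead; with the ES 5.x default dynamic mapping (no custom templates), a string field such as title also gets a title.keyword sub-field for this purpose.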
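Analyzed queries, match vs. match_phrase:
Both analyze the query text first. match returns documents that contain any of the resulting terms (OR by default), while match_phrase requires all of the terms to appear in the same order and next to each other.
GET /syltest/syltest/_search
{
  "query": {
    "match": {
      "title": "用例測試"
    }
  }
}
GET /syltest/syltest/_search
{
  "query": {
    "match_phrase": {
      "title": "用例測試"
    }
  }
}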
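The difference between the two can be sketched in Java over already-analyzed token lists. This is a toy model: the whitespace analyzer and the sample titles are assumptions, match uses the default OR operator, and match_phrase is modeled with a slop of 0 (terms consecutive and in order).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy comparison of match (OR) and match_phrase (ordered, adjacent). */
class MatchPhraseDemo {
    /** Assumed analyzer: lowercase and split on whitespace. */
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    /** match with the default OR operator: at least one query token is present. */
    static boolean match(String doc, String query) {
        List<String> docTokens = analyze(doc);
        for (String q : analyze(query)) {
            if (docTokens.contains(q)) return true;
        }
        return false;
    }

    /** match_phrase with slop 0: the query tokens appear consecutively, in order. */
    static boolean matchPhrase(String doc, String query) {
        return Collections.indexOfSubList(analyze(doc), analyze(query)) >= 0;
    }

    public static void main(String[] args) {
        String doc1 = "es test case";
        String doc2 = "case study for es test";
        String query = "test case";
        System.out.println("match:        " + match(doc1, query) + ", " + match(doc2, query));
        System.out.println("match_phrase: " + matchPhrase(doc1, query) + ", " + matchPhrase(doc2, query));
    }
}
```

Both documents contain "test" and "case", so match returns both; only doc1 has them adjacent and in order, so match_phrase returns only doc1.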
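2) Aggregations
GROUP BY equivalent: a terms aggregation ("groupBy" is just the name chosen for this aggregation; it is used as the key in the response)
GET /etllog/_search
{
  "aggs": {
    "groupBy": {
      "terms": { "field": "dbsource.keyword" }
    }
  }
}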
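On a single shard, a terms aggregation amounts to "count documents per value and keep the largest buckets". The Java sketch below models that; the dbsource values are hypothetical sample data, and breaking ties by key is this toy's own choice. In ES the default size is 10 buckets ordered by doc_count descending, and on multi-shard indices the per-bucket counts can themselves be approximate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Toy single-shard terms aggregation: GROUP BY value, COUNT(*), keep top `size`. */
class TermsAggDemo {
    static List<Map.Entry<String, Long>> termsAgg(List<String> values, int size) {
        // TreeMap keeps keys sorted, so the stable sort below breaks count ties by key.
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical dbsource values, one per document.
        List<String> dbsource = Arrays.asList("mysql", "oracle", "mysql", "pg", "mysql", "oracle");
        for (Map.Entry<String, Long> bucket : termsAgg(dbsource, 10)) {
            System.out.println(bucket.getKey() + ": " + bucket.getValue());
        }
    }
}
```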
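COUNT(DISTINCT) equivalent: a cardinality aggregation. The result is an approximation based on the HyperLogLog++ algorithm: counts below precision_threshold (default 3000) are expected to be close to exact, while larger counts may be slightly off.
GET /etllog/_search
{
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "dbsource.keyword"
      }
    }
  }
}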
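The cardinality result is approximate because ES estimates distinct counts instead of storing every value. Below is a minimal plain HyperLogLog sketch in Java that shows the idea: memory is a fixed array of registers no matter how many values are added, and the answer is an estimate. It is a simplified textbook version with an assumed hash (String.hashCode() passed through a SplitMix64-style mixer), not ES's HyperLogLog++ implementation.

```java
/** Minimal HyperLogLog sketch (illustration only; ES uses the HyperLogLog++ variant). */
class HyperLogLog {
    private final int p;            // hash bits used to choose a register
    private final int m;            // number of registers, 2^p
    private final byte[] registers; // highest "rank" seen per register

    HyperLogLog(int p) {
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    /** SplitMix64-style finalizer: spreads the 32-bit hashCode over 64 bits. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(String value) {
        long h = mix(value.hashCode());
        int index = (int) (h >>> (64 - p));   // top p bits choose the register
        long rest = h << p;                   // remaining 64 - p bits
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    double estimate() {
        double sum = 0;
        int zeroRegisters = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeroRegisters++;
        }
        double alpha = 0.7213 / (1 + 1.079 / m); // bias constant for m >= 128
        double e = alpha * m * m / sum;
        if (e <= 2.5 * m && zeroRegisters > 0) {
            e = m * Math.log((double) m / zeroRegisters); // linear counting for small sets
        }
        return e;
    }

    public static void main(String[] args) {
        HyperLogLog hll = new HyperLogLog(12);   // 4096 registers, about 1.6% standard error
        for (int i = 0; i < 20000; i++) {
            hll.add("db-" + (i % 5000));         // 20000 values, 5000 distinct
        }
        System.out.printf("estimated distinct: %.0f (exact: 5000)%n", hll.estimate());
    }
}
```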
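3) Filters
Clauses placed in a filter context do not compute scores or relevance. A constant_score query gives every hit the same score, 1 by default (its boost), while a bool query that contains only filter clauses returns a score of 0 for every hit. (A top-level "filter" key in the search body is not accepted in ES 5.x; wrap the filter in a query as below.)
GET /syltest/syltest/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "title": "測試"
        }
      }
    }
  }
}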
GET /syltest/syltest/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "title": "測試"
        }
      }
    }
  }
}
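To combine a query with a filter, older versions wrapped both in {query:{filtered:{}}}. The filtered query was removed in ES 5.0; use a bool query instead, with the scored conditions under must and the unscored ones under filter. The benefit is that the filter can cheaply narrow down the matching documents without scoring them (and its results can be cached), while the query clauses compute relevance only for the documents that remain.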
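The split between must (scored) and filter (yes/no only) can be sketched with a toy model in Java. Everything here is an assumption for illustration: the documents and their status field are hypothetical, and the score is plain term frequency rather than Lucene's real scoring.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy bool query: filter includes/excludes documents, only must affects the score. */
class BoolFilterDemo {
    static final class Doc {
        final int id;
        final String title;
        final String status; // hypothetical field used by the filter clause

        Doc(int id, String title, String status) {
            this.id = id;
            this.title = title;
            this.status = status;
        }
    }

    /** Returns matching doc IDs mapped to scores; mustTerm may be null (filter-only query). */
    static Map<Integer, Double> search(List<Doc> docs, String mustTerm, String filterStatus) {
        Map<Integer, Double> hits = new LinkedHashMap<>();
        for (Doc d : docs) {
            if (!d.status.equals(filterStatus)) {
                continue;               // filter: yes/no decision, never scored
            }
            double score = 0.0;         // no scoring clause -> score 0
            if (mustTerm != null) {
                long tf = Arrays.stream(d.title.split("\\s+")).filter(mustTerm::equals).count();
                if (tf == 0) {
                    continue;           // must: the document has to match
                }
                score = tf;             // must: contributes to the score (toy: term frequency)
            }
            hits.put(d.id, score);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(1, "es test test", "published"),
                new Doc(2, "es test", "published"),
                new Doc(3, "es test", "draft"),
                new Doc(4, "kibana guide", "published"));
        System.out.println(search(docs, "test", "published")); // must + filter
        System.out.println(search(docs, null, "published"));   // filter only
    }
}
```

The first call prints {1=2.0, 2=1.0}: document 3 is dropped by the filter without affecting anyone's score. The second prints {1=0.0, 2=0.0, 4=0.0}, mirroring how a filter-only bool query scores every hit 0.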
Java API (transport client) documentation:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html
package com.thunisoft.test;

import org.elasticsearch.action.search.SearchAction;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryParseContext;
import org.elasticsearch.plugins.SearchPlugin;
import org.elasticsearch.search.SearchModule;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.xpack.client.PreBuiltXPackTransportClient;

import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;

/**
 * @ProjectName: CallAble
 * @Package: com.thunisoft.test
 * @ClassName: ESTest
 * @Author: songyulin
 * @CreateDate: 2018/8/2 15:41
 * @UpdateRemark: Update notes
 * @Version: 1.0
 */
public class ESTest {
    static TransportClient client = null;

    static {
        Settings settings = Settings.builder()
                .put("cluster.name", "esCluster")
                .put("xpack.security.transport.ssl.enabled", false)
                .put("client.transport.ping_timeout", "120s")
                .put("indices.store.throttle.type", "none")
                .build();
        // Replace "IP" with a node address and set the node's transport port (9300 by default)
        int port = 9300;
        try {
            client = new PreBuiltXPackTransportClient(settings).addTransportAddress(
                    new InetSocketTransportAddress(InetAddress.getByName("IP"), port));
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            queryMethod();
            //queryMethodSecond();
        } finally {
            if (client != null) {
                client.close(); // release the client's threads and connections
            }
        }
    }

    // Build the query with the QueryBuilders helpers
    public static void queryMethod() {
        QueryBuilder query = QueryBuilders.termQuery("title", "測試");
        SearchResponse sResponse = client.prepareSearch("syltest")
                .setTypes("syltest")
                .setQuery(query)
                .setSize(20)
                .execute().actionGet();
        System.out.println(sResponse);
    }

    // Parse a raw JSON query string into a SearchSourceBuilder
    public static void queryMethodSecond() throws IOException {
        String jsonStr = "{\n" +
                "  \"query\": {\n" +
                "    \"term\": {\n" +
                "      \"title\": \"測試\"\n" +
                "    }\n" +
                "  }\n" +
                "}";
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        SearchModule searchModule = new SearchModule(Settings.EMPTY, false, Collections.<SearchPlugin>emptyList());
        try (XContentParser parser = XContentFactory.xContent(XContentType.JSON)
                .createParser(new NamedXContentRegistry(searchModule.getNamedXContents()), jsonStr)) {
            searchSourceBuilder.parseXContent(new QueryParseContext(parser));
        }
        SearchRequestBuilder searchRequestBuilder = new SearchRequestBuilder(client, SearchAction.INSTANCE);
        SearchResponse searchResponse = searchRequestBuilder.setSource(searchSourceBuilder)
                .setIndices("syltest")
                .execute().actionGet();
        System.out.println(searchResponse);
    }
}
POM:
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>5.6.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.6.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>x-pack-transport</artifactId>
    <version>5.6.1</version>
</dependency>
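New features in ES 5
The new field types text and keyword replace string.
A keyword field is matched only as a whole value, which suits data that should not be analyzed and works very well for filtering and aggregations; text is the analyzed type used for full-text search.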
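What the two types put into the inverted index can be contrasted with a tiny Java sketch. The analyzer for text is an assumption (lowercase plus whitespace split). Grouping on the text terms would bucket "new" and "york" separately, which is one reason aggregations are normally run on keyword fields (such as dbsource.keyword above); ES 5.x also disables fielddata on text fields by default.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy contrast of the terms indexed for a keyword field vs a text field. */
class TextVsKeywordDemo {
    /** keyword: the whole value is indexed as one term, unchanged. */
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    /** text: the value is analyzed (assumed analyzer: lowercase + whitespace split). */
    static List<String> textTerms(String value) {
        return Arrays.asList(value.toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        String value = "New York";
        System.out.println("keyword: " + keywordTerms(value)); // [New York]
        System.out.println("text:    " + textTerms(value));    // [new, york]
    }
}
```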
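Cleanup of deleted indices
When a node that still holds data for an old index is restarted, it would otherwise load that index back into the cluster. ES 5.0 keeps a record of the last 500 deleted indices in the cluster state, so if the index turns out to have been deleted already, it is cleaned up automatically instead of being added back.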
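The mechanism above can be modeled as a bounded list of "tombstones" for deleted indices. The Java sketch is a toy: the class and method names are hypothetical, and the limit is set to 2 instead of 500 so the effect of old records falling out of the list is visible.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of deleted-index tombstones kept in the cluster state (illustration only). */
class TombstoneDemo {
    private final int limit;
    private final Deque<String> tombstones = new ArrayDeque<>();

    TombstoneDemo(int limit) {
        this.limit = limit;
    }

    /** Record a deleted index, forgetting the oldest record once the limit is exceeded. */
    void deleteIndex(String name) {
        tombstones.addLast(name);
        if (tombstones.size() > limit) {
            tombstones.removeFirst();
        }
    }

    /** What a restarted node should do with an index it still finds on disk. */
    String onNodeRejoin(String leftoverIndex) {
        return tombstones.contains(leftoverIndex) ? "delete local copy" : "import as dangling index";
    }

    public static void main(String[] args) {
        TombstoneDemo cluster = new TombstoneDemo(2); // ES 5.0 keeps 500; 2 keeps the demo short
        cluster.deleteIndex("log-1");
        cluster.deleteIndex("log-2");
        cluster.deleteIndex("log-3");                 // "log-1" falls out of the list
        System.out.println("log-3: " + cluster.onNodeRejoin("log-3"));
        System.out.println("log-1: " + cluster.onNodeRejoin("log-1"));
    }
}
```

The second line shows the limit's practical consequence: a node that missed more deletions than the list can hold may bring a long-deleted index back.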
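Java REST client
The new HTTP-based client does not depend on Elasticsearch itself, so there are no jar conflicts. It offers automatic discovery of cluster nodes (sniffing, via the companion sniffer module), logging, and automatic round-robin retries when a request to a node fails, which makes full use of Elasticsearch's high availability, with performance on par with the transport client.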
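ES query profiling: profile
The profile API provides detailed timing information about the execution of the individual components of a search request. It shows how the request is executed at a lower (Lucene) level, so you can understand why some requests are slow and take steps to improve them.
Lucene execution plan (Explain):
GET /syltest/syltest/_search
{
  "profile": true,
  "query": {
    "term": {
      "title": "測試"
    }
  }
}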
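Learning resources
Reading materials
"Elasticsearch: The Definitive Guide" (Chinese edition, online; written for ES 2.x, so some examples such as the filtered query no longer work in 5.x):
https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html
"Elasticsearch: The Definitive Guide", PDF edition
Official ES reference documentation (5.6):
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/index.html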