Elasticsearch Index Quick Start: A Real-Time Full-Text Search Engine

1. What is ES

  Search & Analyze Data in Real Time

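  Its core function is search. ES is a full-text search framework and a powerful near-real-time search engine built on Lucene. Newly uploaded or modified documents become searchable almost immediately, after the next index refresh, which happens every second by default.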

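Advantages:

1. Distributed, horizontally scalable and highly available

2. Real-time search with built-in text analysis (tokenization)

3. A powerful RESTful API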

2. Use cases

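     With terabytes of data, you need full-text search that returns matching results in real time.

For example, a user types a combination of keywords into a single search box and gets back, in real time, a list of the best-matching results. The index holds a large number of products (terabytes of data). The keyword combination 火鍋 辣 ("hot pot", "spicy") is matched against the title field of the indexed documents, and results like these come back: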

1.【通州區】麻合辣重慶九宮格火鍋

2. 【平谷城區】北京嗨辣激情火鍋

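The analyzer splits the title 【通州區】麻合辣重慶九宮格火鍋 into the tokens [通,州,區,麻,合,辣,重,慶,九,宮,格,火,鍋]. The default standard analyzer emits one token per Chinese character. The query is tokenized the same way and its tokens are matched against the document tokens. Each matching document then gets a relevance score, and the results are sorted by that score so the best matches come first.

For more details on analyzers, see the official documentation.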
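The following toy sketch in Java makes the analyze, match, score and sort steps concrete. It makes two simplifying assumptions. First, it emits one token per CJK ideograph and ignores Latin words and punctuation. Second, it scores a title by the number of distinct query tokens found in it. ES does not compute relevance this way: Lucene scoring in 2.x is TF-IDF based, and BM25 became the default in 5.0. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of analyze -> match -> score -> sort. Not ES's real scoring.
final class ToyAnalyzer {

    // One token per CJK ideograph; brackets such as 【】, spaces and Latin text are dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(Character::isIdeographic)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Simplified score: how many distinct query tokens also occur in the title.
    static int score(String query, String title) {
        Set<String> titleTokens = new HashSet<>(tokenize(title));
        int hits = 0;
        for (String t : new HashSet<>(tokenize(query))) {
            if (titleTokens.contains(t)) hits++;
        }
        return hits;
    }

    // Keeps titles with a non-zero score and sorts them best match first.
    static List<String> search(String query, List<String> titles) {
        List<String> result = new ArrayList<>();
        for (String title : titles) {
            if (score(query, title) > 0) result.add(title);
        }
        result.sort(Comparator.comparingInt((String t) -> score(query, t)).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "【通州區】麻合辣重慶九宮格火鍋",
                "【平谷城區】北京嗨辣激情火鍋");
        System.out.println(tokenize(titles.get(0)));
        System.out.println(search("火鍋 辣", titles));
    }
}
```

Both sample titles contain 火, 鍋 and 辣, so in this toy model they tie with a score of 3.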

 

3. Installation (single node)

https://www.elastic.co/downloads/elasticsearch

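Download and extract the archive, then go into the bin directory and run

./elasticsearch

ES is running once the startup log reports that the node has started. By default it serves the REST API on port 9200 and the transport protocol, which the Java client below uses, on port 9300.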

4. ES terminology

ES introduces several new terms, such as node, index, type, document and id. Understanding them gives you a good start.

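node: a single server in the cluster;

index: a collection of documents with broadly similar characteristics;

type: you can define one or more types inside an index. A type is a logical category of the index's data. Mapping types were removed in later ES versions, but they exist in the 2.x version used here;

document: the basic unit of information that can be indexed, expressed as JSON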

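Compared with MySQL, the concepts map as follows:

  mysql:   db    - table - row

  es:      index - type  - document (identified by id)

A MySQL database corresponds to an ES index, a table corresponds to a type, and a row corresponds to a document, which is addressed by its id.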
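The index, type, id addressing can be pictured as three nested maps, much like db, table and row. The sketch below is a hypothetical in-memory model for intuition only; the class name EsAddressModel is made up, and the code says nothing about how ES actually stores data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: index -> type -> id -> document (a JSON-like map).
final class EsAddressModel {
    private final Map<String, Map<String, Map<String, Map<String, Object>>>> store = new HashMap<>();

    // Roughly like PUT /{index}/{type}/{id}: stores or overwrites one document.
    void put(String index, String type, String id, Map<String, Object> doc) {
        store.computeIfAbsent(index, k -> new HashMap<>())
             .computeIfAbsent(type, k -> new HashMap<>())
             .put(id, doc);
    }

    // Roughly like GET /{index}/{type}/{id}: null when any level is missing.
    Map<String, Object> get(String index, String type, String id) {
        Map<String, Map<String, Map<String, Object>>> types = store.get(index);
        if (types == null) return null;
        Map<String, Map<String, Object>> docs = types.get(type);
        return docs == null ? null : docs.get(id);
    }

    public static void main(String[] args) {
        EsAddressModel es = new EsAddressModel();
        Map<String, Object> doc = new HashMap<>();
        doc.put("account_number", 1);
        doc.put("state", "IL");
        es.put("bank", "account", "1", doc);  // database=bank, table=account, row id=1
        System.out.println(es.get("bank", "account", "1"));
    }
}
```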

5. RESTful API

https://github.com/bly2k/files/blob/master/accounts.zip?raw=true  (1000 sample JSON records for bulk loading)

Extract accounts.json into the current directory and run:

curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary "@accounts.json" 

This uploads 1000 documents into the account type of the bank index.

_bulk: the bulk (batch) API

?pretty: return the response as pretty-printed JSON
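A _bulk body is newline-delimited JSON. Each document contributes an action line followed by its source line, and the body must end with a final newline. The sketch below assembles such a body in plain Java. It assumes the documents are already serialized to JSON strings, and BulkBody is a hypothetical helper. When the URL already names the index and type, as in bank/account/_bulk, the action line can carry only the _id.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds an NDJSON body for the _bulk endpoint.
final class BulkBody {

    // One {"index":{...}} action line plus one source line per document; ends with "\n".
    static String build(String index, String type, List<String> ids, List<String> sources) {
        if (ids.size() != sources.size()) {
            throw new IllegalArgumentException("ids and sources must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ids.size(); i++) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type)
              .append("\",\"_id\":\"").append(ids.get(i)).append("\"}}\n");
            sb.append(sources.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up documents for illustration
        String body = build("bank", "account",
                Arrays.asList("1", "2"),
                Arrays.asList("{\"balance\":100}", "{\"balance\":200}"));
        System.out.print(body);
    }
}
```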

The following examples cover the common kinds of search requests.

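Pagination. from is the zero-based offset of the first hit and size is the number of hits, so this request returns hits 11 to 20:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "match_all": {} }, "from": 10, "size": 10 }'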
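A page number converts to from and size as from = (page - 1) * size. The sketch below uses a hypothetical Paging class. It also rejects requests where from + size exceeds 10000, which is the default value of the index.max_result_window setting. Deeper result sets should be read with a scroll, as in the delete example of the Java client section.

```java
// Hypothetical helper: 1-based page number -> from/size for a _search request body.
final class Paging {

    // Default of the index.max_result_window setting; from + size above it is rejected.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10000;

    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        long from = (long) (page - 1) * size;  // long arithmetic avoids int overflow
        if (from + size > DEFAULT_MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds max_result_window; use scroll instead");
        }
        return (int) from;
    }

    static String body(int page, int size) {
        return "{ \"query\": { \"match_all\": {} }, \"from\": " + from(page, size) + ", \"size\": " + size + " }";
    }

    public static void main(String[] args) {
        // Page 2 with 10 hits per page gives the request body used above
        System.out.println(body(2, 10));
    }
}
```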

Sorting (by balance, descending):

curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' 
{ "query": { "match_all": {} }, "sort": { "balance": { "order": "desc" } } }'

Returning only some fields (list them in _source):

curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "match": {"account_number":20} }, "_source": ["account_number", "email"] }'

Match query. Space-separated terms are combined with OR, so this matches addresses that contain mill or lane:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "match": { "address": "mill lane" } } }'

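Compound (bool) query. Every clause under must has to match, so this matches only addresses that contain both mill and lane: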

curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "bool": { "must": [ { "match": { "address": "mill" } }, { "match": { "address": "lane" } } ] } } }'
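The difference between the two queries above is OR versus AND over the analyzed terms. The toy model below assumes the address is analyzed by lowercasing and splitting on whitespace, which is close to what the standard analyzer does for plain English words. MatchSemantics is a hypothetical name. In the REST API, a single match query can also require all terms through its operator parameter, for example { "match": { "address": { "query": "mill lane", "operator": "and" } } }.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of match-query semantics on a text field (not the real ES implementation).
final class MatchSemantics {

    // Assumed analysis: lowercase, then split on whitespace.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // requireAll = false: like the default OR match, any query term may match.
    // requireAll = true: like bool/must with one match per term, or "operator": "and".
    static boolean matches(String query, String fieldValue, boolean requireAll) {
        Set<String> fieldTerms = analyze(fieldValue);
        Set<String> queryTerms = analyze(query);
        if (queryTerms.isEmpty()) return false;
        if (requireAll) return fieldTerms.containsAll(queryTerms);
        for (String t : queryTerms) {
            if (fieldTerms.contains(t)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up addresses for illustration
        String[] addresses = {"990 Mill Road", "198 Mill Lane", "244 Columbus Place"};
        for (String a : addresses) {
            System.out.println(a + " | OR: " + matches("mill lane", a, false)
                    + " | AND: " + matches("mill lane", a, true));
        }
    }
}
```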
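Range filter. This keeps balances from 20000 to 30000 inclusive. Clauses under filter narrow the results without affecting the score: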
curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "query": { "bool": { "must": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } }'
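The range bounds gte and lte are inclusive; gt and lt are the exclusive variants. Below is a minimal Java sketch of the filter step using made-up balances. The class name RangeFilter is hypothetical. As with a filter clause, documents are only kept or dropped and nothing is re-scored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a range filter with inclusive bounds (gte/lte).
final class RangeFilter {

    static boolean inRange(long value, long gte, long lte) {
        return value >= gte && value <= lte;
    }

    // Keeps matching values in their original order.
    static List<Long> filter(List<Long> balances, long gte, long lte) {
        List<Long> kept = new ArrayList<>();
        for (long b : balances) {
            if (inRange(b, gte, lte)) kept.add(b);
        }
        return kept;
    }

    public static void main(String[] args) {
        // The boundary values 20000 and 30000 are kept
        List<Long> balances = Arrays.asList(19999L, 20000L, 25000L, 30000L, 30001L);
        System.out.println(filter(balances, 20000, 30000));
    }
}
```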

Aggregations, similar to SQL GROUP BY. "size": 0 suppresses the hits, so only the aggregation result is returned:

curl -XPOST 'localhost:9200/bank/_search?pretty' -d ' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state" } } } }'
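Conceptually, the terms aggregation above behaves like SELECT state, COUNT(*) FROM account GROUP BY state ORDER BY COUNT(*) DESC LIMIT 10. Unless size says otherwise, a terms aggregation returns the top 10 buckets by document count. Here is a stdlib Java sketch of that grouping. TermsAgg is a hypothetical name, and this sketch keeps ties in alphabetical order by its own choice.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of a terms aggregation.
final class TermsAgg {

    // Counts each value (GROUP BY + COUNT), then keeps the `size` largest buckets.
    static List<Map.Entry<String, Long>> topBuckets(List<String> values, int size) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("TX", "ID", "TX", "AL", "ID", "TX");  // made-up data
        System.out.println(topBuckets(states, 10));
    }
}
```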

For more details on the RESTful API, see the official documentation.

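6. Java client

1. Add the dependency with Maven. The transport client must have the same major version as the cluster; this example uses 2.4.0.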

<dependency>
 <groupId>org.elasticsearch</groupId>
 <artifactId>elasticsearch</artifactId>
 <version>2.4.0</version>
</dependency>
 

2. Index documents

import java.net.InetAddress;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Random;

import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class elasticSearch_local {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_local.class);
    private static Random r = new Random();
    static int[] typeConstant = new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    static String[] roomTypeNameConstant = new String[]{"標準大牀房", "標準小牀房", "豪華大房", "主題情侶房間"};

    public static void main(String[] agre) throws Exception {
        // Create a client connected to the local ES transport port 9300
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        try {
            long startTime = System.currentTimeMillis();
            for (int i = 0; i < 1000; i++) {
                // Index one document: first argument is the index, second the type, the source is the document body
                IndexResponse response = client.prepareIndex("hotel", "room")
                        .setSource(getEsDataString())
                        .get();
            }
            logger.info(" run 1000 index consume time : " + (System.currentTimeMillis() - startTime));
        } finally {
            client.close();  // release the client's threads and connections
        }
    }

    public static XContentBuilder getEsDataString() throws Exception {
        SimpleDateFormat sp = new SimpleDateFormat("yyyy-MM-dd");
        Date d = new Date();
        int offset = r.nextInt(15);
        // The L suffix keeps 864000008 * offset in long arithmetic; as an int it overflows once offset >= 3
        long shifted = System.currentTimeMillis() - (864000008L * offset);
        // The native ES API builds JSON: jsonBuilder().startObject().field(key, value)...endObject()
        // Every value is stored as a string (+ "")
        XContentBuilder object = jsonBuilder()
                .startObject().field("gmtCreate", shifted + "").field("gmtModified", shifted + "")
                .field("sourceType", typeConstant[r.nextInt(10)] + "").field("partnerId", r.nextInt(999999999) + "").field("poiId", r.nextInt(999999999) + "")
                .field("roomType", r.nextInt(999999999) + "").field("roomName", roomTypeNameConstant[r.nextInt(4)]).field("bizDay", r.nextInt(999999999) + "")
                .field("status", typeConstant[r.nextInt(10)] + "").field("freeCount", r.nextInt(99999) + "").field("soldPrice", r.nextInt(99999) + "")
                .field("marketPrice", r.nextInt(99999) + "").field("ratePlanId", r.nextInt(99999) + "").field("accessCode", r.nextInt(999999999) + "")
                .field("basePrice", r.nextInt(999999999) + "").field("memPrice", r.nextInt(999999999) + "").field("priceCheck", typeConstant[r.nextInt(10)] + "")
                .field("shardPart", typeConstant[r.nextInt(10)] + "").field("sourceCode", typeConstant[r.nextInt(10)] + "").field("realRoomType", r.nextInt(999999999) + "")
                .field("typeLimitValue", typeConstant[r.nextInt(10)] + "").field("openInventoryByAccessCodeList", "").field("closeInventoryByAccessCodeList", "")
                .field("openOrClose", "1").field("openInventoryByAccessCodeListSize", r.nextInt(999999999) + "").field("openInventoryByAccessCodeListIterator", r.nextInt(999999999) + "")
                .field("closeInventoryByAccessCodeListSize", r.nextInt(999999999) + "").field("closeInventoryByAccessCodeListIterator", r.nextInt(999999999) + "")
                .field("datetime", sp.format(d))
                .endObject();
        return object;
    }
}
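The loop above sends 1000 separate index requests, and each one costs a network round trip. A common alternative is to group documents into batches and send each batch as a single bulk request; the 2.x Java client offers client.prepareBulk() for this. The sketch below shows only the batching step in plain Java. Batching.partition is a hypothetical helper, and the batch size of 500 in main is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split items into fixed-size batches, one bulk request per batch.
final class Batching {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));  // copy, independent of the source list
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docs.add(i);
        // 1000 documents in batches of 500 -> 2 bulk requests instead of 1000 index requests
        System.out.println(partition(docs, 500).size());
    }
}
```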

 

3. Query code

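import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

public class elasticSearch_formeituanSearch {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_formeituanSearch.class);

    public static void main(String[] agre) throws Exception {
        // Connect to the cluster and initialize the client
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        try {
            /* Alternative: dis_max query
            QueryBuilder queryBuilder = QueryBuilders
                    .disMaxQuery()
                    .add(QueryBuilders.termQuery("roomName", "豪華大牀"))
                    .add(QueryBuilders.termQuery("status", "0"));*/
            // Query condition: use matchQuery for text; termQuery is for exact matches
            // (numbers, long values) because a term query does not analyze its input
            QueryBuilder qb = boolQuery().must(matchQuery("roomName", "豪華大房"));
            /* Alternative: several conditions that must all match
            QueryBuilder qb = boolQuery()
                    .must(matchQuery("roomName", "豪華大房"))
                    .must(matchQuery("status", "0"))
                    .must(matchQuery("sourceCode", "4"))
                    .must(matchQuery("typeLimitValue", "5"))
                    .must(matchQuery("soldPrice", "11673"));*/

            SearchResponse response = client.prepareSearch("hotel")    // index: hotel
                    .setTypes("room")                                    // type: room
                    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)      // DFS first gathers global term statistics for more accurate scoring
                    .setQuery(qb)                                        // the query
                    .setPostFilter(QueryBuilders.rangeQuery("datetime").gte("2016-10-20").lte("2016-10-21").format("yyyy-MM-dd"))  // filter the query results by date afterwards
                    .setFrom(0).setSize(10)                              // pagination
                    .setExplain(true)                                    // attach a scoring explanation to each hit
                    .execute()                                           // execute
                    .actionGet();

            long count = response.getHits().getTotalHits();  // total number of hits
            System.out.println(count);
            SearchHit[] hits = response.getHits().getHits();
            for (SearchHit hit : hits) {
                System.out.println(hit.getSource());
            }
        } finally {
            client.close();
        }
    }
}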

 

4. Delete data

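import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.sort.SortParseElement;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.elasticsearch.index.query.QueryBuilders.matchAllQuery;

public class elasticSearch_fordelete {
    private static final Logger logger = LoggerFactory.getLogger(elasticSearch_fordelete.class);

    public static void main(String[] agre) throws Exception {
        TransportClient client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        try {
            // Match all documents and walk through them with a scroll, 50 per batch;
            // each loop iteration pulls the next batch. Scroll is recommended for large data sets.
            QueryBuilder qb = matchAllQuery();
            SearchResponse response = client.prepareSearch("hotelindex")
                    .setTypes("poidata")
                    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                    .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)  // sort by _doc, the cheapest order for scrolling
                    .setScroll(new TimeValue(60000))                           // keep the scroll context alive for 60 s between requests
                    .setQuery(qb)
                    .setFrom(0)
                    .setSize(50)
                    .execute()
                    .actionGet();

            long count = response.getHits().getTotalHits();  // total number of documents to delete
            while (true) {
                for (SearchHit hit : response.getHits().getHits()) {
                    client.prepareDelete(hit.getIndex(), hit.getType(), hit.getId()).get();
                }
                try {
                    response = client.prepareSearchScroll(response.getScrollId())
                            .setScroll(new TimeValue(60000)).execute().actionGet();
                } catch (Exception e) {
                    // Stop instead of looping forever on the same failing scroll id
                    logger.error("scroll request failed", e);
                    break;
                }
                // Break condition: no hits are returned
                if (response.getHits().getHits().length == 0) {
                    break;
                }
            }
        } finally {
            client.close();
        }
    }
}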
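The loop above works because a scroll returns pages from a snapshot taken when the search started. Deleting documents between scroll requests therefore does not shift the pages that remain. The toy model below mimics that behaviour with an in-memory list. ScrollDelete and its nested Scroll class are hypothetical. They reproduce only the shape of the loop: process a page, fetch the next one, and stop on an empty page.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of scroll-and-delete over a snapshot.
final class ScrollDelete {

    static final class Scroll {
        private final List<String> snapshot;  // copy taken when the "search" starts
        private final int pageSize;
        private int position = 0;

        Scroll(Collection<String> liveIds, int pageSize) {
            this.snapshot = new ArrayList<>(liveIds);
            this.pageSize = pageSize;
        }

        // Returns the next page; an empty page means the scroll is exhausted.
        List<String> next() {
            int end = Math.min(position + pageSize, snapshot.size());
            List<String> page = new ArrayList<>(snapshot.subList(position, end));
            position = end;
            return page;
        }
    }

    // Deletes from the live store while paging over the snapshot; returns the number deleted.
    static int deleteAll(Set<String> store, int pageSize) {
        Scroll scroll = new Scroll(store, pageSize);
        int deleted = 0;
        List<String> page = scroll.next();
        while (!page.isEmpty()) {
            for (String id : page) {
                if (store.remove(id)) deleted++;
            }
            page = scroll.next();
        }
        return deleted;
    }

    public static void main(String[] args) {
        Set<String> store = new LinkedHashSet<>();
        for (int i = 0; i < 120; i++) store.add("doc-" + i);
        System.out.println(deleteAll(store, 50) + " deleted, " + store.size() + " left");
    }
}
```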

 

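Query types compared:

When matching text, always use matchQuery. termQuery is meant for exact matches such as numbers and long values, because a term query does not analyze its input.

match_query: full-text query; the query text is analyzed first

term_query: exact query; the query text is not analyzed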
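A common surprise is a term query that finds nothing on an analyzed text field. The toy model below assumes the field was indexed with a lowercasing, whitespace-splitting analyzer, so the index holds mill and lane rather than Mill and Lane. TermVsMatch is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy model contrasting term and match lookups against indexed terms.
final class TermVsMatch {

    // Assumed index-time analysis: lowercase, split on whitespace.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // term: the query value is looked up exactly as given, without analysis.
    static boolean termQuery(Set<String> indexedTerms, String value) {
        return indexedTerms.contains(value);
    }

    // match: the query text is analyzed like the field, then any resulting term may match.
    static boolean matchQuery(Set<String> indexedTerms, String text) {
        for (String t : analyze(text)) {
            if (indexedTerms.contains(t)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("198 Mill Lane");
        System.out.println("term Mill:  " + termQuery(indexed, "Mill"));
        System.out.println("match Mill: " + matchQuery(indexed, "Mill"));
    }
}
```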

 

 

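Mappings:

  A mapping defines the data type of each field; many data types are supported.

  Note: the mapping of an existing field cannot be changed. To change it, create a new index with the new mapping and reindex the data.

  For the ways to define a mapping, see: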

  https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
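As an illustration, an index for the hotel/room example could declare its mappings when it is created, by sending the body below with PUT /hotel. The sketch assembles an ES 2.x style body in plain Java. The field choices are assumptions: roomName is analyzed full text, status is an exact not_analyzed string suited to term queries, and datetime is a date in yyyy-MM-dd format. MappingBody is a hypothetical name. From 5.x on, the string type was replaced by text and keyword.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical builder for an ES 2.x create-index body with explicit mappings.
final class MappingBody {

    // In 2.x, "string" fields are analyzed unless "index": "not_analyzed" is set.
    static String stringField(boolean analyzed) {
        return analyzed
                ? "{\"type\":\"string\"}"
                : "{\"type\":\"string\",\"index\":\"not_analyzed\"}";
    }

    static String dateField(String format) {
        return "{\"type\":\"date\",\"format\":\"" + format + "\"}";
    }

    // Wraps field definitions as {"mappings":{<type>:{"properties":{...}}}}.
    static String createIndexBody(String type, Map<String, String> properties) {
        StringJoiner props = new StringJoiner(",");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            props.add("\"" + e.getKey() + "\":" + e.getValue());
        }
        return "{\"mappings\":{\"" + type + "\":{\"properties\":{" + props + "}}}}";
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();  // keeps field order stable
        props.put("roomName", stringField(true));
        props.put("status", stringField(false));
        props.put("datetime", dateField("yyyy-MM-dd"));
        System.out.println(createIndexBody("room", props));
    }
}
```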



For the source code, join the tech group: 468246651

