Elasticsearch 2.0 ☞ Java Client API

Translator's note on two potentially confusing terms:
noise: here it refers to the unnecessary cluster-state chatter produced when node clients frequently join and leave the cluster (see "Node Client Downsides" below).
hop: a network round-trip between nodes; a "double hop" means a request first lands on one node and is then forwarded to the node that actually executes it.

Preface

This chapter covers using the Java API as an elasticsearch client.
All elasticsearch operations are executed using a Client object.
All operations are completely asynchronous in nature (they either accept a listener, or return a future).
Additionally, operations on a client may be accumulated and executed in bulk.
Note that all APIs are exposed through the Java API (internally, elasticsearch itself uses the Java API to execute them).
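
For example, here is a minimal sketch of the two asynchronous styles (the index and type names are illustrative; the same client object is used throughout this chapter):

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.index.IndexResponse;

String json = "{\"user\":\"kimchy\"}";

// Style 1: block on the returned future
IndexResponse indexed = client.prepareIndex("twitter", "tweet").setSource(json).get();

// Style 2: register a listener instead of blocking
client.prepareIndex("twitter", "tweet").setSource(json)
        .execute(new ActionListener<IndexResponse>() {
            @Override
            public void onResponse(IndexResponse response) { /* handle success */ }

            @Override
            public void onFailure(Throwable e) { /* handle failure */ }
        });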

Maven Repository

Deploying in JBoss EAP6 module

Client

You can use the Java client in several ways:

  • Perform standard index, get, delete and search operations on an existing cluster
  • Perform administrative tasks on a running cluster
  • Start full nodes when you want to run elasticsearch embedded in your own application, or when you want to launch unit or integration tests

Obtaining an elasticsearch client is simple. The most common way to get one is to:

  1. Create an embedded node that acts as a node within your cluster
  2. Request a client from that node

Another way is to create a TransportClient that connects to the cluster.
Note

It is recommended to use a client whose version matches the cluster's; otherwise you may run into incompatibility issues.

Node Client

Instantiating a node-based client is the simplest way to get a Client that can execute operations against elasticsearch.

import static org.elasticsearch.node.NodeBuilder.*;

//on startup

Node node = nodeBuilder().node();
Client client = node.client();

//on shutdown

node.close();

When you start a Node, it joins an elasticsearch cluster. You can connect to a different cluster simply by setting the cluster.name setting, or explicitly using the clusterName method on the builder.
There are two ways to configure it:

  • Via a configuration file

    Define the cluster.name property in the /src/main/resources/elasticsearch.yml file of your project:
    cluster.name: yourclustername

  • In Java

    Node node = nodeBuilder().clusterName("yourclustername").node();
    Client client = node.client();

The benefit of using the Client is that operations are automatically routed to the node(s) they need to be executed on, without performing a "double hop". For example, the index operation will automatically be executed on the shard it should end up on.

When you start a Node, the most important decision is whether it should hold data or not; in other words, whether shards and indices should be allocated to it. Many times we want a client to be just a client, without shards allocated to it. This is simple to configure: set node.data to false or node.client to true (the NodeBuilder has the respective helper methods for both).

import static org.elasticsearch.node.NodeBuilder.*;

// on startup

// Embedded node clients behave just like standalone nodes,
// which means that they will leave the HTTP port open!
Node node =
    nodeBuilder()
        .settings(Settings.settingsBuilder().put("http.enabled", false))
        .client(true)
    .node();

Client client = node.client();

// on shutdown

node.close();

Another common usage is to start a Node for unit or integration tests. In this case we just want to start a "local" Node (with "local" discovery and transport). Again, this is just a matter of a simple setting when starting the Node. Note that "local" here means local on the JVM level (well, actually the class loader level), which means that two local servers started within the same JVM will discover each other and form a cluster.

import static org.elasticsearch.node.NodeBuilder.*;

// on startup

Node node = nodeBuilder().local(true).node();
Client client = node.client();

// on shutdown

node.close();

Node Client Downsides
Embedding a node client into your application is the easiest way to connect to an elasticsearch cluster, but it comes with downsides:

  • Frequently starting and stopping one or more node clients creates unnecessary noise (cluster-state churn) across the cluster
  • An embedded node client will respond to outside requests, just like any other client
    • You almost always want to disable HTTP for an embedded node client

Transport Client

The TransportClient connects remotely to an Elasticsearch cluster using the transport module. It does not join the cluster, but simply gets one or more initial transport addresses and communicates with them in round-robin fashion on each action (though most actions will probably be "two hop" operations).

// on startup

Client client = TransportClient.builder().build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300))
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host2"), 9300));

// on shutdown

client.close();

Note that you have to set the cluster name if you use a cluster named something other than the default "elasticsearch":

Settings settings = Settings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = TransportClient.builder().settings(settings).build();
//Add transport addresses and do something with the client...

Alternatively, you can set it in the elasticsearch.yml file, as described in the Node Client section above.

The client also allows sniffing the rest of the cluster, which adds the cluster's other data nodes to its connection list. In that case, note that the addresses used are the ones the other nodes were started with (their "publish" addresses), discovered starting from the addresses given to addTransportAddress. To enable this feature, set client.transport.sniff to true:

Settings settings = Settings.settingsBuilder()
        .put("client.transport.sniff", true).build();
TransportClient client = TransportClient.builder().settings(settings).build();

Other transport client level settings include:

Parameter                                 Description
client.transport.ignore_cluster_name     Set to true to ignore cluster name validation of connected nodes. (since 0.19.4)
client.transport.ping_timeout            The time to wait for a ping response from a node. Defaults to 5s.
client.transport.nodes_sampler_interval  How often to sample / ping the nodes listed and connected. Defaults to 5s.
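
For example, a minimal sketch combining these settings with the sniffing option shown above (the values chosen here are illustrative):

Settings settings = Settings.settingsBuilder()
        .put("cluster.name", "myClusterName")
        .put("client.transport.sniff", true)
        .put("client.transport.ping_timeout", "10s")           // wait up to 10s for a ping response
        .put("client.transport.nodes_sampler_interval", "10s") // sample/ping connected nodes every 10s
        .build();
TransportClient client = TransportClient.builder().settings(settings).build();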

Document APIs

This part of the documentation describes the CRUD APIs:

  • Index API
  • Get API
  • Delete API
  • Update API

Multi-document APIs:

  • Multi Get APIs
  • Bulk API

Note:
All the CRUD APIs are single-index APIs. The index parameter accepts a single index name, or an alias that points to a single index.

Index API

The index API allows you to index a typed JSON document into a specific index and make it searchable.

Generate JSON document

There are several different ways of generating a JSON document:

  • Manually, using a native byte[] or a String
  • Using a Map that will be automatically converted to its JSON equivalent
  • Using a third-party serialization library such as Jackson
  • Using the built-in helper XContentFactory.jsonBuilder()

Internally, each type ends up being converted to byte[] (so a String is converted to a byte[]). Therefore, if the object is already in that form it can be used directly. The jsonBuilder is a highly optimized JSON generator that constructs a byte[] directly.

  • Do it yourself (concatenate the JSON string by hand)
    Nothing difficult here; just be careful with the date format.
String json = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
    "}";
  • Using a Map
    A Map is a key:value collection, so it maps naturally onto a JSON structure:
Map<String, Object> json = new HashMap<String, Object>();
json.put("user", "kimchy");
json.put("postDate", new Date());
json.put("message", "trying out Elasticsearch");
  • Serialize your beans
    Elasticsearch already uses Jackson, so you can use it to serialize your beans to JSON:
import com.fasterxml.jackson.databind.*;

// instance a json mapper
ObjectMapper mapper = new ObjectMapper(); // create once, reuse

// generate json
byte[] json = mapper.writeValueAsBytes(yourbeaninstance);
  • Use Elasticsearch helpers
    Elasticsearch provides built-in helpers to generate JSON content:
import static org.elasticsearch.common.xcontent.XContentFactory.*;

XContentBuilder builder = jsonBuilder()
    .startObject()
        .field("user", "kimchy")
        .field("postDate", new Date())
        .field("message", "trying out Elasticsearch")
    .endObject();

Note that you can also add arrays with the startArray(String) and endArray() methods. By the way, the field method accepts many object types: you can pass numbers, dates and even other XContentBuilder objects directly. If you need to see the generated JSON string, use the string() method.

String json = builder.string();
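
For instance, a short sketch of the array support mentioned above:

XContentBuilder builder = jsonBuilder()
    .startObject()
        .field("user", "kimchy")
        .startArray("tags")       // begins a JSON array named "tags"
            .value("elasticsearch")
            .value("java")
        .endArray()
    .endObject();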

Index document

In the following example we index a JSON document into an index called twitter, under a type called tweet, with id 1:

import static org.elasticsearch.common.xcontent.XContentFactory.*;

IndexResponse response = client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        .get();

Note that you can also index your documents as a JSON String, and that you don't have to give the document an ID:

String json = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
    "}";

IndexResponse response = client.prepareIndex("twitter", "tweet")
        .setSource(json)
        .get();

IndexResponse gives you a report:

// Index name
String _index = response.getIndex();
// Type name
String _type = response.getType();
// Document ID (generated or not)
String _id = response.getId();
// Version (if it's the first time you index this document, you will get: 1)
long _version = response.getVersion();
// isCreated() is true if the document is a new one, false if it has been updated
boolean created = response.isCreated();

For more information, see the REST index documentation.

Operation Threading

The index API allows you to set the threading model the operation will be performed on when the actual execution of the API happens on the same node (i.e. the API is executed on a shard allocated to the same machine). The options are to execute the operation on a different thread, or on the calling thread (note that the API still remains asynchronous). By default, operationThreaded is set to true, which means the operation is executed on a different thread.

Get API

The get API allows you to get a typed JSON document from the index based on its id. The following example gets a JSON document from the index called twitter, under the type called tweet, with id 1:

GetResponse response = client.prepareGet("twitter", "tweet", "1").get();
For more information on the get operation, check out the REST get docs.

Operation Threading

The get API allows you to set the threading model the operation will be performed on when the actual execution of the API happens on the same node (i.e. the API is executed on a shard allocated to the same server).

By default operationThreaded is set to true, meaning the operation executes on a different thread. Below is an example that sets it to false:

GetResponse response = client.prepareGet("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .get();
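
Once you have the response, here is a short sketch of reading the document back (the same accessors appear in the Multi Get example later in this chapter):

if (response.isExists()) {
    String sourceAsJson = response.getSourceAsString();      // the _source as a JSON string
    Map<String, Object> sourceAsMap = response.getSource();  // or as a Map
}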

Delete API

The delete API allows you to delete a typed JSON document from a specific index based on its id:

DeleteResponse response = client.prepareDelete("twitter", "tweet", "1").get();
For more information on the delete operation, check out the delete API docs.
Operation Threading
Same as with the get API, operationThreaded defaults to true:

DeleteResponse response = client.prepareDelete("twitter", "tweet", "1")
        .setOperationThreaded(false)
        .get();

Update API

You can either create an UpdateRequest and send it to the client:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("type");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

Or you can use the prepareUpdate() method:

client.prepareUpdate("ttl", "doc", "1")   //1
        .setScript(new Script("ctx._source.gender = \"male\""  , ScriptService.ScriptType.INLINE, null, null))
        .get();

client.prepareUpdate("ttl", "doc", "1") //2
        .setDoc(jsonBuilder()               
            .startObject()
                .field("gender", "male")
            .endObject())
        .get();

1. Update with an inline script. The script can also be the name of a locally stored script file; in that case, use ScriptService.ScriptType.FILE.
2. Update by merging a partial document: the provided document will be merged into the existing one.

Note that you can't provide both script and doc.
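
For illustration, a minimal sketch of the file variant, assuming a script file named gender_script (a hypothetical name) has been deployed under config/scripts on every node:

client.prepareUpdate("ttl", "doc", "1")
        .setScript(new Script("gender_script", ScriptService.ScriptType.FILE, null, null)) // hypothetical file name
        .get();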

Update by script

UpdateRequest updateRequest = new UpdateRequest("ttl", "doc", "1")
        .script(new Script("ctx._source.gender = \"male\""));
client.update(updateRequest).get();

Update by merging documents

The update API also supports passing a partial document, which will be merged into the existing document (simple recursive merge, inner merging of objects, replacing core "keys/values" and arrays). For example:

UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject());
client.update(updateRequest).get();

Upsert

Upserts are also supported. If the document does not exist, the content of the upsert element (the indexRequest below) will be used to index the fresh document:

IndexRequest indexRequest = new IndexRequest("index", "type", "1")
        .source(jsonBuilder()
            .startObject()
                .field("name", "Joe Smith")
                .field("gender", "male")
            .endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .upsert(indexRequest);              
client.update(updateRequest).get();

If the document index/type/1 already exists, we will have after this operation a document like:

{
    "name"  : "Joe Dalton",
    "gender": "male"        
}

The gender field was added by the update request. If the document does not exist, a new document will be created from the content of the upsert element:

{
    "name" : "Joe Smith",
    "gender": "male"
}

Multi Get API

The multi get API allows you to get a list of documents based on their index, type and id:

MultiGetResponse multiGetItemResponses = client.prepareMultiGet()
    .add("twitter", "tweet", "1")           
    .add("twitter", "tweet", "2", "3", "4") 
    .add("another", "type", "foo")          
    .get();

for (MultiGetItemResponse itemResponse : multiGetItemResponses) { 
    GetResponse response = itemResponse.getResponse();
    if (response.isExists()) {                      
        String json = response.getSourceAsString(); 
    }
}

In the example above:

  • get by a single id
  • or by a list of ids for the same index / type
  • you can also get from another index
  • iterate over the result set
  • you can check if the document exists
  • access the _source field

Bulk API
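
The bulk API allows one to index and delete several documents in a single request. Here is a sample usage: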

import static org.elasticsearch.common.xcontent.XContentFactory.*;

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "tweet", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        );

bulkRequest.add(client.prepareIndex("twitter", "tweet", "2")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "another post")
                    .endObject()
                  )
        );

BulkResponse bulkResponse = bulkRequest.get();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}
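
If you need more detail than hasFailures(), here is a minimal sketch of inspecting the individual items (BulkResponse is iterable over them):

import org.elasticsearch.action.bulk.BulkItemResponse;

for (BulkItemResponse item : bulkResponse) {
    if (item.isFailed()) {
        // report the failed action's coordinates and the failure message
        System.err.println(item.getIndex() + "/" + item.getType() + "/"
                + item.getId() + " failed: " + item.getFailureMessage());
    }
}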

Using Bulk Processor

The BulkProcessor class offers a simple interface that flushes bulk operations automatically, based on the number or size of the accumulated requests, or after a given interval:

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

BulkProcessor bulkProcessor = BulkProcessor.builder(
        client,   //1
        new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId,
                                   BulkRequest request) { ... }  //2

            @Override
            public void afterBulk(long executionId,
                                  BulkRequest request,
                                  BulkResponse response) { ... }  //3

            @Override
            public void afterBulk(long executionId,
                                  BulkRequest request,
                                  Throwable failure) { ... }  //4
        })
        .setBulkActions(10000)  //5
        .setBulkSize(new ByteSizeValue(1, ByteSizeUnit.GB)) //6
        .setFlushInterval(TimeValue.timeValueSeconds(5)) //7
        .setConcurrentRequests(1) //8
        .build();

Explanation:
1. Add your elasticsearch client
2. This method is called just before bulk is executed. You can for example see the numberOfActions with request.numberOfActions()
3. This method is called after bulk execution. You can for example check if there was some failing requests with response.hasFailures()
4. This method is called when the bulk failed and raised a Throwable
5. We want to execute the bulk every 10 000 requests
6. We want to flush the bulk every 1gb
7. We want to flush the bulk every 5 seconds whatever the number of requests
8. Set the number of concurrent requests. A value of 0 means that only a single request will be allowed to be executed. A value of 1 means 1 concurrent request is allowed to be executed while accumulating new bulk requests.

Then you can simply add your requests to the BulkProcessor:

bulkProcessor.add(new IndexRequest("twitter", "tweet", "1").source(/* your doc here */));
bulkProcessor.add(new DeleteRequest("twitter", "tweet", "2"));

By default, BulkProcessor:
- sets bulkActions to 1000
- sets bulkSize to 5mb
- does not set flushInterval
- sets concurrentRequests to 1

When all documents are loaded to the BulkProcessor it can be closed by using awaitClose or close methods:

bulkProcessor.awaitClose(10, TimeUnit.MINUTES);

or

bulkProcessor.close();

Both methods flush any remaining documents and disable all other scheduled flushes if they were scheduled by setting flushInterval. If concurrent requests were enabled, the awaitClose method waits for up to the specified timeout for all bulk requests to complete, then returns true; if the specified waiting time elapses before all bulk requests complete, false is returned. The close method doesn't wait for any remaining bulk requests to complete and exits immediately.

Search APIs

The search API allows one to execute a search query and get back search hits that match the query. It can be executed across one or more indices and across one or more types. The query can be provided using the query Java API. The body of the search request is built using the SearchSourceBuilder. Here is an example:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.QueryBuilders;
SearchResponse response = client.prepareSearch("index1", "index2")
        .setTypes("type1", "type2")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.termQuery("multi", "test"))                 // Query
        .setPostFilter(QueryBuilders.rangeQuery("age").from(12).to(18))     // Filter
        .setFrom(0).setSize(60).setExplain(true)
        .execute()
        .actionGet();

Note that all parameters are optional. Here is the smallest search call you can write:

// MatchAll on the whole cluster with all default options
SearchResponse response = client.prepareSearch().execute().actionGet();

Note

Although the Java API defines the additional search types QUERY_AND_FETCH and DFS_QUERY_AND_FETCH, these modes are internal optimizations and should not be specified explicitly by users of the API.

For more information on the search operation, check out the REST search docs.

Using scrolls in Java

Read the scroll documentation first!

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch("test")
        .setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
while (true) {

    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
    //Break condition: No hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}
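
When you are done scrolling, it is good practice to release the scroll context on the cluster; a minimal sketch using the client's clear-scroll helper:

client.prepareClearScroll()
        .addScrollId(scrollResp.getScrollId())
        .execute().actionGet();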

MultiSearch API

See the MultiSearch API Query documentation.

SearchRequestBuilder srb1 = node.client()
    .prepareSearch().setQuery(QueryBuilders.queryStringQuery("elasticsearch")).setSize(1);
SearchRequestBuilder srb2 = node.client()
    .prepareSearch().setQuery(QueryBuilders.matchQuery("name", "kimchy")).setSize(1);

MultiSearchResponse sr = node.client().prepareMultiSearch()
        .add(srb1)
        .add(srb2)
        .execute().actionGet();

// You will get all individual responses from MultiSearchResponse#getResponses()
long nbHits = 0;
for (MultiSearchResponse.Item item : sr.getResponses()) {
    SearchResponse response = item.getResponse();
    nbHits += response.getHits().getTotalHits();
}

Using Aggregations

The following code shows how to add two aggregations within your search:

SearchResponse sr = node.client().prepareSearch()
    .setQuery(QueryBuilders.matchAllQuery())
    .addAggregation(
            AggregationBuilders.terms("agg1").field("field")
    )
    .addAggregation(
            AggregationBuilders.dateHistogram("agg2")
                    .field("birth")
                    .interval(DateHistogramInterval.YEAR)
    )
    .execute().actionGet();

// Get your aggregation results
Terms agg1 = sr.getAggregations().get("agg1");
DateHistogram agg2 = sr.getAggregations().get("agg2");
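
To read the terms buckets back, here is a short sketch using the usual bucket accessors:

for (Terms.Bucket bucket : agg1.getBuckets()) {
    String term = bucket.getKeyAsString(); // the term itself
    long docCount = bucket.getDocCount();  // number of documents containing the term
}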

See Aggregations Java API documentation for details.

Terminate After

The maximum number of documents to collect for each shard; upon reaching that number, the query execution will terminate early. If set, you will be able to check whether the operation terminated early by calling isTerminatedEarly() on the SearchResponse object:

SearchResponse sr = client.prepareSearch(INDEX)
    .setTerminateAfter(1000)    
    .get();

if (sr.isTerminatedEarly()) {
    // We finished early
}

The code above stops collecting once 1,000 documents have been gathered within each shard.

Count API

Aggregations

Percolate API

Query DSL

Indexed Scripts API
