spark12--ElasticSearch: installation, plugins, curl operations, and Java operations

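1 Introduction to ElasticSearch

Elasticsearch is an open-source search engine built on top of Apache Lucene™, a full-text search engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library available today, open source or proprietary.

But Lucene is only a library. To use its full power, you have to work in Java and integrate Lucene directly into your application. Worse, you may need a degree in information retrieval to understand how it works. Lucene is very complex.

Elasticsearch is also written in Java and uses Lucene internally for indexing and searching, but its goal is to make full-text search easy by hiding Lucene's complexity behind a simple, consistent RESTful API.

However, Elasticsearch is more than Lucene, and more than just a full-text search engine. It can accurately be described as:

• A distributed real-time document store in which every field can be indexed and searched
• A distributed real-time analytics search engine
• Able to scale to hundreds of server nodes and to handle petabytes of structured or unstructured data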

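2 Installing and running ElasticSearch

2.1 Linux installation

Extract the archive (replace x.y.z with the version you downloaded):

tar -zxvf elasticsearch-x.y.z.tar.gz

Start Elasticsearch in the foreground, or optionally in the background:

./bin/elasticsearch [-d]    # add -d to run in the background as a daemon

Check that Elasticsearch started successfully: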

curl 'http://localhost:9200/?pretty'

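2.2 Windows installation

Extract the archive into a folder with an archive tool; WinRAR is recommended.
Go to the bin directory under the extracted path and double-click elasticsearch.bat; a console window opens and shows the startup log.
Open http://localhost:9200/ in a browser to check that the installation succeeded.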

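2.3 Installing the visualization plugin on Windows

2.3.1 Option 1: with an Internet connection, use the plugin command.

elasticsearch/bin/plugin.bat -install mobz/elasticsearch-head
(the -install form is the Elasticsearch 1.x syntax; on 2.x, write install without the leading dash)
Start ES
Open http://localhost:9200/_plugin/head/

2.3.2 Option 2: download the source from git and run it locally.

Enter the IP address and port of the ES server in the address bar and click connect to connect to the cluster. The main screen then shows basic information about the ES cluster, such as the state of the nodes and of the indices.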

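2.4 Installing curl on Windows

Step 1: download the tool: http://curl.haxx.se/download.html

Step 2: extract the archive. To avoid problems when running it, use a path that contains no Chinese characters.
Step 3: install
[Usage 1]: run it from the curl.exe directory
Extract the downloaded archive and, in a cmd window, change to the directory that contains curl.exe.
Once there, run curl --help as a test.
[Usage 2]: put it in System32
Extract the downloaded file and copy curl.exe to C:\Windows\System32
You can then run the curl command from any location in a DOS window.

[Usage 3]: configure environment variables (recommended)
In the system's advanced environment variable settings, configure
CURL_HOME ----- "your curl directory"
path ---- append ";%CURL_HOME%;" at the end
This has the same effect as Usage 2.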

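3 Operating ElasticSearch with curl

3.1 Creating an index

The general form of an Elasticsearch command is: REST VERB HOST:9200/index/doc-type, where REST VERB is PUT, GET, or DELETE. (Use curl's -X option followed by the verb to specify the HTTP method explicitly.)
To create an index, run the following command in your shell: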

curl -XPUT "http://localhost:9200/blog01/"

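Although Elasticsearch is schemaless, it uses Lucene under the hood, and Lucene does use schemas. Elasticsearch hides that complexity from you. In practice, you can think of an Elasticsearch document type as a sub-index or a table name. If you want to, you can still specify a schema, so you can treat Elasticsearch as a schema-optional data store.

3.2 Inserting a document

To create a type under the /blog01 index, insert a document.
To insert a document titled "Whatiselasticsearch" into the index, run the following command (the tripled double quotes are Windows cmd escaping; on Linux or macOS, wrap the JSON body in single quotes instead):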

curl -XPUT "http://localhost:9200/blog01/article/1" -d  "{"""id""": """1""", """title""": """Whatiselasticsearch"""}"

The preceding command uses the PUT verb to add a document to the /article document type and assigns it the ID 1. The URL path has the form index/doctype/ID.
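
As a small illustration of the index/doctype/ID layout, the sketch below assembles such a URL using only the JDK. The class DocUrl and its build method are hypothetical helpers invented for this example; they are not part of any Elasticsearch API, and localhost:9200 is simply the address used in the commands above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not an Elasticsearch API): builds a document URL
// in the index/doctype/ID form.
class DocUrl {
    static URI build(String host, int port, String index, String type, String id) {
        try {
            // The multi-argument URI constructor percent-encodes illegal path characters
            return new URI("http", null, host, port,
                    "/" + index + "/" + type + "/" + id, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid URL parts", e);
        }
    }

    public static void main(String[] args) {
        // prints http://localhost:9200/blog01/article/1
        System.out.println(build("localhost", 9200, "blog01", "article", "1"));
    }
}
```

The same path is what the curl commands in this section send their PUT, GET, and DELETE requests to.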

3.3 Viewing the document

To view the document, use a simple GET command:

curl -XGET "http://localhost:9200/blog01/article/1"

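Elasticsearch responds with the JSON content that you PUT into the index earlier.

3.4 Updating the document

What if you realize that the title field is wrong and want to change it to Whatislucene? Run the following command to update the document: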

curl -XPUT "http://localhost:9200/blog01/article/1" -d "{"""id""": """1""", """title""": """Whatislucene"""}"

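Because this command uses the same unique ID, 1, the document is updated.

3.5 Searching documents

It's time to run a basic query, one that is a little more complex than the simple GET you ran to fetch the document by ID. The document URL has a built-in _search endpoint for this purpose. To find all documents whose title contains the word Whatislucene: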

curl -XGET "http://localhost:9200/blog01/article/_search?q=title:'Whatislucene'"

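3.6 Examining the search response

The result that Elasticsearch returns for the preceding query contains several JSON objects. The first contains metadata about the request: how many milliseconds it took (took) and whether it timed out (timed_out). The _shards field reflects the fact that Elasticsearch is a clustered service; even in this single-node local deployment, Elasticsearch is logically partitioned into shards. Further down, the hits object contains:
• the total field, which tells you how many results were found
• max_score, the highest relevance score among the hits, used for full-text search
• the actual results
Because the query above did not specify a fields parameter, each result contains _source, which holds the complete matching document; if you add a fields parameter to the query, the result contains a fields property instead. _index, _type, and _id identify the index, document type, and ID; _score is the relevance score of the full-text match. These four fields are always returned in the results.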

3.7 Deleting the document

Don't delete the document yet; it is enough to know how to delete it:

curl -XDELETE "http://localhost:9200/blog01/article/1"

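4 Using the Java client (getting started)

The Elasticsearch Java client is very powerful; it can stand up an embedded instance and run administrative tasks when necessary.
When you run a Java application with Elasticsearch, two modes of operation are available. The application can take a more active or a more passive role in the Elasticsearch cluster. In the more active case (called Node Client), the application instance receives requests from the cluster and determines which node should handle each request, just as a normal node does. (The application can even host indexes and serve requests.) The other mode, called Transport Client, forwards every request to another Elasticsearch node, which determines the final destination.

4.1 Create a Maven project and import the dependencies in the pom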

 <dependencies>
  	<dependency>
  		<groupId>org.elasticsearch</groupId>
  		<artifactId>elasticsearch</artifactId>
  		<version>2.4.0</version>
  	</dependency>
  	<dependency>
  		<groupId>junit</groupId>
  		<artifactId>junit</artifactId>
  		<version>4.12</version>
  	</dependency>
  </dependencies>

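When you index a document into ElasticSearch directly and the index does not exist, the index is created automatically by default, using the default mapping.

Port 9300 is the default transport port, which the Java client connects to.
Port 9200 is the default HTTP port, used by the REST API (curl) and the web management pages.

4.2 Java operations

4.2.1 Getting a client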

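import java.net.InetAddress;
import java.util.HashMap;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.junit.Before;

public class ESTest {
    private Client client;

    @Before
    public void getClient() throws Exception {
        // Map that holds the configuration settings
        final HashMap<String, String> map = new HashMap<>();
        // Cluster name; not needed if the cluster uses the default name "elasticsearch"
        map.put("cluster.name", "es");

        final Settings.Builder settings = Settings.builder().put(map);
        // The ES Java API talks to the transport port, 9300
        // Register several nodes so that if one is unreachable because of a network problem,
        // the client can send its requests through another listed node
        client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("node1"), 9300))
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("node2"), 9300))
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("node3"), 9300));
    }
}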
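
The node list does not have to be hard-coded. As a sketch under assumptions, the JDK-only helper below parses a comma-separated host:port string into socket addresses. The class NodeList, its parse method, and the configuration string format are all hypothetical and exist only for this example; nothing here is an Elasticsearch API.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "node1:9300,node2:9300" into socket addresses.
class NodeList {
    // Entries without an explicit port fall back to defaultPort.
    static List<InetSocketAddress> parse(String nodes, int defaultPort) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : nodes.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty entries such as a trailing comma
            }
            int colon = trimmed.lastIndexOf(':');
            String host = colon < 0 ? trimmed : trimmed.substring(0, colon);
            int port = colon < 0 ? defaultPort : Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved performs no DNS lookup, so this runs without a network
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetSocketAddress address : parse("node1:9300, node2:9300, node3", 9300)) {
            System.out.println(address.getHostString() + ":" + address.getPort());
        }
    }
}
```

Each parsed host and port could then be wrapped in new InetSocketTransportAddress(InetAddress.getByName(host), port) and passed to addTransportAddress, as in the client setup above.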

4.2.2 Printing results

// Helper used by the search tests below. It is not a test itself, so it has no @Test
// annotation, and it is not static because the tests call it as this.getResponse(...)
private void getResponse(SearchResponse searchResponse) {
    // Get the hits returned by the search
    SearchHits hits = searchResponse.getHits();
    // Print the number of hits
    System.out.println("Found " + hits.getTotalHits() + " documents");

    // Iterate over the hits
    Iterator<SearchHit> it = hits.iterator();
    while (it.hasNext()) {
        SearchHit searchHit = it.next();

        // System ID (_id)
        System.out.println("system id: " + searchHit.getId());
        // Application-level id field stored in the document
        System.out.println("document id: " + searchHit.getSource().get("id"));

        // Print the whole document
        System.out.println("row: " + searchHit.getSourceAsString());

        // Title
        System.out.println("title: " + searchHit.getSource().get("title"));
        // Content
        System.out.println("content: " + searchHit.getSource().get("content"));
    }
}

4.2.3 Writing data

4.2.3.1 Creating a document from a JSON string

@Test
public void createDoc_1() {
    // JSON string
    String source = "{\"id\":\"1\",\"title\":\"三緘其口\",\"content\":\"孔師收徒，算是震驚大陸的大事，別管是名師，還是什麼，只要是修煉者，獲此殊榮，必然會興奮的東南西北都找不着。\"}";
    // Create the document, specifying the index, document type, and primary-key ID
    final IndexResponse indexResponse = client.prepareIndex("blog", "article", "1").setSource(source).get();

    // Print the response details
    System.out.println("index: " + indexResponse.getIndex());
    System.out.println("type: " + indexResponse.getType());
    System.out.println("id: " + indexResponse.getId());
    System.out.println("version: " + indexResponse.getVersion());
    System.out.println("isCreated: " + indexResponse.isCreated());
}

4.2.3.2 Inserting data from a map

@Test
public void createDoc_2() {
    HashMap<String, Object> source = new HashMap<>();
    source.put("id", 2);
    source.put("title", "悲催的林琅");
    source.put("content", "兩年前，屠殺他滿門的時候，對方不過通玄境初期，本以爲就算進步，巔峯就是極限了，沒想到……和他一樣，也是宗師中期");

    IndexResponse indexResponse = client.prepareIndex("blog", "article", "2").setSource(source).get();

    // Print the response details
    System.out.println("index: " + indexResponse.getIndex());
    System.out.println("type: " + indexResponse.getType());
    System.out.println("id: " + indexResponse.getId());
    System.out.println("version: " + indexResponse.getVersion());
    System.out.println("isCreated: " + indexResponse.isCreated());
    client.close();
}

4.2.4 Inserting data with the ES helper class (XContentBuilder)

@Test
public void createDoc_3() throws Exception {
    XContentBuilder source = XContentFactory.jsonBuilder()
            .startObject()
            .field("id", 3)
            .field("title", "Apache Hadoop")
            .field("content", "The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.")
            .endObject();

    IndexResponse indexResponse = client.prepareIndex("blog", "article", "3").setSource(source).get();
    // Print the response details
    System.out.println("index: " + indexResponse.getIndex());
    System.out.println("type: " + indexResponse.getType());
    System.out.println("id: " + indexResponse.getId());
    System.out.println("version: " + indexResponse.getVersion());
    System.out.println("isCreated: " + indexResponse.isCreated());

    client.close();
}

4.2.5 Retrieving documents

4.2.5.1 Retrieving a single document by ID

@Test
public void getData_1() {
    GetResponse getResponse = client.prepareGet("blog", "article", "1").get();
    System.out.println(getResponse.getSourceAsString());
    client.close();
}

4.2.5.2 Retrieving several documents in one request

@Test
public void getData_2() {
    // Add each document to retrieve with the add method
    MultiGetResponse multiGetItemResponses = client.prepareMultiGet()
            .add("blog", "article", "1")
            .add("blog", "article", "2")
            .add("blog", "article", "3")
            .get();

    // Iterate over all returned items
    for (MultiGetItemResponse item : multiGetItemResponses) {
        GetResponse response = item.getResponse();
        // Print the document only if it exists
        if (response.isExists()) {
            System.out.println(response.getSourceAsString());
        }
    }
    client.close();
}

4.2.6 Updating data

4.2.6.1 Updating with an UpdateRequest

@Test
public void updateData_1() throws Exception {
    UpdateRequest request = new UpdateRequest();
    request.index("blog");
    request.type("article");
    request.id("1");

    request.doc(XContentFactory.jsonBuilder()
            .startObject()
            .field("id", "1")
            .field("title", "Modules")
            .field("content", "The project includes these modules:\n" +
                    "\n" +
                    "Hadoop Common: The common utilities that support the other Hadoop modules.\n" +
                    "Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.\n" +
                    "Hadoop YARN: A framework for job scheduling and cluster resource management.\n" +
                    "Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.\n" +
                    "Hadoop Ozone: An object store for Hadoop")
            .endObject());

    client.update(request).get();
    client.close();
}

4.2.6.2 Updating with the helper class

@Test
public void updateData_2() throws Exception {
    // Use the helper class (XContentBuilder) to build the update
    client.update(new UpdateRequest("blog", "article", "2")
            .doc(XContentFactory.jsonBuilder()
                    .startObject()
                    .field("id", "2")
                    .field("title", "Who Uses Hadoop?")
                    .field("content", "A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page.")
                    .endObject())).get(); // wait for the update to finish before the client is closed
    client.close();
}

4.2.7 Upserting a document

The request looks the document up by ID: if no document is found, the full document is added; if one is found, it is updated.

@Test
public void updateData_3() throws Exception {
    IndexRequest source = new IndexRequest("blog", "article", "4")
            .source(XContentFactory.jsonBuilder()
                    .startObject()
                    .field("id", "4")
                    .field("title", "Getting Started")
                    .field("content", "The Hadoop documentation includes the information you need to get started using Hadoop. Begin with the Single Node Setup which shows you how to set up a single-node Hadoop installation. Then move on to the Cluster Setup to learn how to set up a multi-node Hadoop installation.")
                    .endObject());
    // The fields to update when the document already exists
    UpdateRequest updateRequest = new UpdateRequest("blog", "article", "4")
            .doc(XContentFactory.jsonBuilder()
                    .startObject()
                    .field("title", "this is update")
                    .endObject())
            .upsert(source);

    client.update(updateRequest).get();
    client.close();
}
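
To make the upsert rule concrete, here is a toy in-memory model of it built only on JDK maps. It sketches the semantics described above, not how Elasticsearch implements them; the class UpsertDemo, its store field, and its upsert method are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of upsert semantics (not an Elasticsearch API).
class UpsertDemo {
    // In-memory stand-in for an index: document ID -> document fields
    static final Map<String, Map<String, Object>> store = new HashMap<>();

    // If the ID is absent, store the full upsert document;
    // otherwise merge the partial document into the existing one.
    static void upsert(String id, Map<String, Object> partialDoc, Map<String, Object> upsertDoc) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            store.put(id, new HashMap<>(upsertDoc));
        } else {
            existing.putAll(partialDoc);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> full = new HashMap<>();
        full.put("id", "4");
        full.put("title", "Getting Started");
        Map<String, Object> partial = new HashMap<>();
        partial.put("title", "this is update");

        upsert("4", partial, full); // first call: document 4 is created from the full document
        System.out.println(store.get("4").get("title")); // prints Getting Started
        upsert("4", partial, full); // second call: only the title is overwritten
        System.out.println(store.get("4").get("title")); // prints this is update
    }
}
```

Running the Java test above twice against a real cluster should show the same two-step behaviour: the first run creates document 4, the second changes only its title.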

4.2.8 Deleting a document

@Test
public void delData() {
    client.prepareDelete("blog", "article", "4").get();
    client.close();
}

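4.2.9 Searching

   /**
     * Searching: SearchResponse supports all kinds of queries.
     * Note:
     * ES ships with a built-in analyzer that works well for English words, but for Chinese it only splits text into single characters; it cannot segment Chinese text into words.
     * To segment Chinese text into words you need a third-party analyzer; the most widely used one today is IK.
     */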
@Test
public void search() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            .setQuery(QueryBuilders.queryStringQuery("this")).get();
    this.getResponse(searchResponse);

    client.close();
}
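
The note above about the built-in analyzer can be illustrated with a toy tokenizer. This is only an approximation written for this example: it is not Lucene's analyzer, and the class ToyTokenizer is hypothetical. It groups letters and digits into lowercase words and emits every Han character as a token of its own, which is the behaviour the note describes. The sample text uses Unicode escapes for 悲催 (\u60b2\u50ac) so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.List;

// Toy approximation of "words for English, single characters for Chinese".
class ToyTokenizer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                // Non-Han letters and digits accumulate into one lowercase word
                word.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            // Anything else (space, punctuation, Han character) ends the current word
            if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (han) {
                // Each Han character becomes its own token
                tokens.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four tokens: apache, hadoop, and one token per Han character
        System.out.println(tokenize("Apache Hadoop \u60b2\u50ac"));
    }
}
```

Because the two-character word is split into separate single-character tokens, a search for the whole word cannot match a single indexed term, which is why a word-aware analyzer such as IK is used for Chinese text.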

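4.2.10 Creating an index (before using the IK analyzer)

@Test
public void createIndex() {
    // Create the index
    client.admin().indices().prepareCreate("blog").get();
    // Delete the index (skip this call if you want to keep the index for the mapping step that follows)
    client.admin().indices().prepareDelete("blog").get();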
}

4.2.11 Creating a mapping and specifying the analyzer

@Test
public void createIndexMapping() throws Exception {
    XContentBuilder mappingBuilder = XContentFactory.jsonBuilder()
            .startObject()
            .startObject("article")
            .startObject("properties")
            .startObject("id")
            .field("type", "integer")
            .field("store", "yes")
            .endObject()
            .startObject("title")
            .field("type", "string")
            .field("store", "yes")
            .field("analyzer", "ik")
            .endObject()
            .startObject("content")
            .field("type", "string")
            .field("store", "yes")
            .field("analyzer", "ik")
            .endObject()
            .endObject()
            .endObject()
            .endObject();
    PutMappingRequest source = Requests.putMappingRequest("blog").type("article").source(mappingBuilder);
    client.admin().indices().putMapping(source).get();
    client.close();
}

4.2.12 Queries

4.2.12.1 Querying all documents

@Test
public void queryAll() {
    SearchResponse searchResponse = client.prepareSearch("blog").setTypes("article").setQuery(QueryBuilders.matchAllQuery()).get();
    this.getResponse(searchResponse);
    client.close();
}

4.2.12.2 Query-string query

@Test
public void queryString() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            .setQuery(QueryBuilders.queryStringQuery("悲催").field("title").field("content"))
            .get();

    this.getResponse(searchResponse);
    client.close();

}

4.2.12.3 Wildcard query

@Test
public void wildcardQuery() {
    SearchResponse searchResponse = client.prepareSearch("blog").setTypes("article")
            .setQuery(QueryBuilders.wildcardQuery("content", "震*"))
            .get();
    this.getResponse(searchResponse);
    client.close();
}

4.2.12.4 Term query

@Test
public void termQuery() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            .setQuery(QueryBuilders.termQuery("title", "悲催"))
            .get();
    this.getResponse(searchResponse);
    client.close();
}

4.2.12.5 Match query on a field

@Test
public void fieldMatchQuery() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            // fuzziness is the maximum edit distance allowed between the query term and a matching term
            .setQuery(QueryBuilders.matchQuery("title", "悲催").analyzer("ik").fuzziness(1))
            .get();
    this.getResponse(searchResponse);
    client.close();
}

4.2.12.6 IDs query

@Test
public void idQuery() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            .setQuery(QueryBuilders.idsQuery().ids("2", "4"))
            .get();
    this.getResponse(searchResponse);
    client.close();
}

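4.2.12.7 Fuzzy (similarity) query

@Test
public void fuzzyQuery() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            // "updata" is one edit away from "update", so the fuzzy query can still match it
            .setQuery(QueryBuilders.fuzzyQuery("title", "updata"))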
            .get();
    this.getResponse(searchResponse);
    client.close();
}
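
Fuzzy matching is based on edit distance: the number of single-character changes needed to turn one term into another. The sketch below computes the plain Levenshtein distance with the JDK only, to show why "updata" can match "update". It is an illustration, not Lucene's implementation; the class EditDistance is a hypothetical name, and the real fuzzy query may count edits slightly differently (for example, by treating a swap of two adjacent characters as a single edit).

```java
// Plain Levenshtein distance, for illustration only.
class EditDistance {
    // Minimum number of single-character insertions, deletions,
    // and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("updata", "update")); // prints 1
    }
}
```

A distance of 1 is within the fuzziness of 1 used in the match query earlier, so a term indexed as "update" is still reachable from the misspelled "updata".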

4.2.12.8 Range query

@Test
public void rangeQuery() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            .setQuery(QueryBuilders.rangeQuery("id").gt(2).lte(3))
            .get();
    this.getResponse(searchResponse);
    client.close();
}

4.2.12.9 Bool query, which combines other queries

@Test
public void boolQuery() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            .setQuery(QueryBuilders.boolQuery()
                    .must(QueryBuilders.termQuery("title", "update"))
                    .should(QueryBuilders.rangeQuery("id").from(1).to(3)))
            .get();
    this.getResponse(searchResponse);
    client.close();
}
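
As a simplified model of how must and should combine in the bool query above: must clauses decide whether a document matches at all, and when a must clause is present the should clauses are optional and only make a matching document rank higher. The JDK-only sketch below mimics that with predicates. It is a toy model rather than Elasticsearch's scoring; the class BoolDemo, its Doc type, and the "bonus" count are hypothetical stand-ins for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Toy model of bool-query semantics (not Elasticsearch's scoring).
class BoolDemo {
    static class Doc {
        final int id;
        final String title;

        Doc(int id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Returns -1 if any must clause fails (no match);
    // otherwise returns how many should clauses the document satisfies.
    static <T> int evaluate(T doc, List<Predicate<T>> must, List<Predicate<T>> should) {
        for (Predicate<T> clause : must) {
            if (!clause.test(doc)) {
                return -1;
            }
        }
        int bonus = 0;
        for (Predicate<T> clause : should) {
            if (clause.test(doc)) {
                bonus++;
            }
        }
        return bonus;
    }

    public static void main(String[] args) {
        Predicate<Doc> titleHasUpdate = d -> d.title.contains("update");
        Predicate<Doc> idBetween1And3 = d -> d.id >= 1 && d.id <= 3;
        List<Predicate<Doc>> must = Collections.singletonList(titleHasUpdate);
        List<Predicate<Doc>> should = Collections.singletonList(idBetween1And3);

        // Matches the must clause but not the should clause: prints 0
        System.out.println(evaluate(new Doc(4, "this is update"), must, should));
        // Fails the must clause: prints -1
        System.out.println(evaluate(new Doc(2, "Who Uses Hadoop?"), must, should));
    }
}
```

In this model, document 4 still matches even though its id is outside the 1 to 3 range, because the range appears only as a should clause.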

4.2.12.10 Sorted query, specifying descending or ascending order

Like the bool query, sorting is used together with other queries.

@Test
public void sortQuery() {
    SearchResponse searchResponse = client.prepareSearch("blog")
            .setTypes("article")
            .setQuery(QueryBuilders.boolQuery()
                    .must(QueryBuilders.termQuery("title", "update"))
                    .must(QueryBuilders.rangeQuery("id").from(1).to(3)))
            .addSort("id", SortOrder.DESC)
            .get();
    this.getResponse(searchResponse);
    client.close();
}