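Logstash elasticsearch output plugin: differences between the actions (index/create/update)

Contents

Actions of the elasticsearch plugin

Summary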

Actions of the elasticsearch plugin

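Logstash ships more than 40 output plugins for sending processed data to downstream systems. The most commonly used one is, naturally, the elasticsearch plugin. Its action setting supports several operation types. This article covers the three that write documents: index, create, and update. (delete is also available but is not discussed here.)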

The official documentation describes the action setting as follows:

Value type is string
Default value is "index"

The Elasticsearch action to perform. Valid actions are:

- index: indexes a document (an event from Logstash).
- delete: deletes a document by id (An id is required for this action)
- create: indexes a document, fails if a document by that id already exists in the index.
- update: updates a document by id. Update has a special case where you can upsert: update a document if not already present. See the upsert option. NOTE: This does not work and is not supported in Elasticsearch 1.x. Please upgrade to ES 2.x or greater to use this feature with Logstash!
- A sprintf style string to change the action based on the content of the event. The value %{[foo]} would use the foo field for the action

For more details on actions, check out the Elasticsearch bulk API documentation
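To make the sprintf-style option concrete, here is a minimal Python sketch of how a template such as %{[foo]} could be resolved against an event. It is a toy, not Logstash's implementation: the resolve_action helper and the FIELD_REF pattern are hypothetical names, and only top-level field references are handled.

```python
import re

# Toy pattern for a top-level field reference such as %{[foo]}.
# Assumption: nested references and date formats are out of scope here.
FIELD_REF = re.compile(r"%\{\[([^\]]+)\]\}")


def resolve_action(template, event):
    """Replace each %{[field]} reference with that field's value from the event."""
    # A missing field raises KeyError in this sketch; that is a choice made
    # for the example, not a statement about how Logstash behaves.
    return FIELD_REF.sub(lambda m: str(event[m.group(1)]), template)


static_action = resolve_action("index", {"message": "hello"})
dynamic_action = resolve_action("%{[foo]}", {"foo": "update", "message": "hello"})
print(static_action, dynamic_action)
```

With a setting like action => "%{[foo]}", each event can therefore pick its own action, as long as the field holds one of the valid action names.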

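This explanation does not spell out what happens in each situation, so the following points remain confusing:

- How index relates to document_id
- How index, create, and update resemble and differ from one another

As a result, it is hard to know which action to choose. The sections below explain each one in detail.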

create

Equivalent to:

POST _bulk
{ "create" : { "_index" : "<index>", "_id" : "<id>" } }
{ "<field>" : "<value>" }

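Its behavior is:

- A document_id must be specified. Without one, the request fails immediately.
- If the document_id does not exist in Elasticsearch, a new document is created with document_id as its _id.
- If the document_id already exists in Elasticsearch, the request fails immediately.

This only suits cases where a record must not change after it has been created, for example a user record that is written once.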
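The create rules listed above can be sketched with a small in-memory model. This is only an illustration in Python: the dictionary stands in for an index, and the create function and CreateError class are hypothetical names, not part of Logstash or any Elasticsearch client.

```python
class CreateError(Exception):
    """Raised when the toy create action is rejected."""


def create(store, doc, doc_id=None):
    """Toy model of action => "create" as described in this article."""
    if doc_id is None:
        # Per the behavior listed above: no document_id means failure.
        raise CreateError("document_id is required")
    if doc_id in store:
        # Per the behavior listed above: an existing _id means failure.
        raise CreateError(f"document {doc_id!r} already exists")
    store[doc_id] = dict(doc)
    return doc_id


store = {}
create(store, {"user": "alice"}, doc_id="u1")
try:
    create(store, {"user": "alice-v2"}, doc_id="u1")
    second_attempt = "created"
except CreateError:
    second_attempt = "rejected"
print(second_attempt, store)
```

The second write is rejected and the stored document keeps its original content, which is the write-once property that makes create useful.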

index

index is the default value of action.
Equivalent to:

POST _bulk
{ "index" : { "_index" : "<index>", "_id" : "<id>" } }
{ "<field>" : "<value>" }

or

POST _bulk
{ "index" : { "_index" : "<index>" } }
{ "<field>" : "<value>" }

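Its behavior is:

Without a document_id, the data is indexed into Elasticsearch and Elasticsearch generates the document's _id. Because no document_id is given, duplicate input produces several documents with identical content but different _id values.
With a document_id:
- If the document_id does not exist in Elasticsearch, a new document is created with document_id as its _id.
- If the document_id already exists in Elasticsearch, the existing document is overwritten (the whole document is replaced).

Note: specifying a document_id lowers write throughput, because Elasticsearch has to check whether that document_id already exists. For write-only workloads that never update documents, such as log analysis, do not set the id yourself.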
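The difference between auto-generated and explicit ids can be illustrated with a toy in-memory model. This is a sketch under stated assumptions: the index function is a hypothetical helper, a dictionary stands in for the Elasticsearch index, and uuid4 stands in for the _id that Elasticsearch would generate itself.

```python
import uuid


def index(store, doc, doc_id=None):
    """Toy model of action => "index": auto-generate an id or overwrite by id."""
    if doc_id is None:
        # Stand-in for the _id that Elasticsearch would generate.
        doc_id = uuid.uuid4().hex
    store[doc_id] = dict(doc)  # a document already stored under this id is replaced
    return doc_id


# Same event sent twice without an id: two separate documents.
auto_ids = {}
index(auto_ids, {"message": "same line"})
index(auto_ids, {"message": "same line"})

# Same event sent twice with a fixed id: one document, written twice.
fixed_ids = {}
index(fixed_ids, {"message": "same line"}, doc_id="my_id")
index(fixed_ids, {"message": "same line"}, doc_id="my_id")

print(len(auto_ids), len(fixed_ids))
```

In this model a replayed event duplicates data unless an id is supplied, which is the trade-off against the extra lookup cost mentioned in the note.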

update

Equivalent to:

POST _bulk
{ "update" : { "_index" : "<index>", "_id" : "<id>" } }
{ "doc" : { "<field>" : "<value>" } }

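Its behavior is:

A document_id must be specified. Without one, the pipeline fails to start.
With a document_id:
- If the document_id does not exist in Elasticsearch, a new document is created with document_id as its _id, provided one of the upsert options is set:
  - If doc_as_upsert is true, the event's fields become the document content.
  - If scripted_upsert is true, the script produces the document content.
  - Otherwise, if upsert is set, its value becomes the document content.
  - If none of these is set, the update fails with a document_missing_exception (see the Exception section).
- If the document_id already exists in Elasticsearch, that document is updated.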
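The way update picks the content for a missing document can be modeled in a few lines of Python. This is a simplified sketch, not the plugin's code: update and DocumentMissing are hypothetical names, scripted_upsert is left out, and the merge is shallow.

```python
class DocumentMissing(Exception):
    """Stand-in for the document_missing_exception returned by Elasticsearch."""


def update(store, doc_id, event, doc_as_upsert=False, upsert=None):
    """Toy model of action => "update" (scripted_upsert is not modeled)."""
    if doc_id in store:
        store[doc_id].update(event)  # partial update: merge fields (shallow here)
    elif doc_as_upsert:
        store[doc_id] = dict(event)  # the event itself becomes the new document
    elif upsert is not None:
        store[doc_id] = dict(upsert)  # the configured upsert body is used instead
    else:
        raise DocumentMissing(doc_id)  # nothing to update and nothing to insert
    return store[doc_id]


event = {"id": "my_id", "message": "aaaaa"}

from_event = update({}, "my_id", event, doc_as_upsert=True)
from_upsert = update({}, "my_id", event, upsert={"message": "hello"})
try:
    update({}, "my_id", event)
    outcome = "updated"
except DocumentMissing:
    outcome = "document missing"
print(from_event, from_upsert, outcome)
```

The three calls correspond to the doc_as_upsert, upsert, and exception cases covered in the following sections of the article.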

doc_as_upsert

input {
  stdin {
    add_field => {
      id => "my_id"
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    document_id => "%{id}"
    action => "update"
    doc_as_upsert => true
    # doc_as_upsert => true and upsert cannot be set at the same time
    # upsert => '{"message":"hello"}'
  }
  stdout {}
}

Typing "aaaaa" on standard input produces this document:

{
  "_index" : "logstash-2019.09.24-000001",
  "_type" : "_doc",
  "_id" : "my_id",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "host" : "lexlideMacBook-Pro.local",
    "id" : "my_id",
    "@timestamp" : "2019-09-24T09:52:41.661Z",
    "@version" : "1",
    "message" : "aaaaa"
  }
}

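As you can see, for a document that does not exist yet this behaves the same as index. The difference shows up when the document already exists: update merges the event's fields into the stored document, whereas index replaces the whole document.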

upsert

input {
  stdin {
    add_field => {
      id => "my_id"
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    document_id => "%{id}"
    action => "update"
    # doc_as_upsert => true and upsert cannot be set at the same time
    #doc_as_upsert => true
    upsert => '{"message":"hello"}'
  }
  stdout {}
}

Typing "aaaaa" on standard input produces this document:

{
  "_index" : "logstash-2019.09.24-000001",
  "_type" : "_doc",
  "_id" : "my_id",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "message" : "hello"
  }
}

With this setting, you can supply custom content for a document whose id does not exist yet.

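Exception

If neither doc_as_upsert nor upsert is set and the document does not exist, the update fails and Logstash logs a warning: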

[2019-09-24T17:59:26,671][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>404, :action=>["update", {:_id=>"my_id", :_index=>"logstash", :_type=>"_doc", :routing=>nil, :retry_on_conflict=>1}, #<LogStash::Event:0x61b3501a>], :response=>{"update"=>{"_index"=>"logstash-2019.09.24-000001", "_type"=>"_doc", "_id"=>"my_id", "status"=>404, "error"=>{"type"=>"document_missing_exception", "reason"=>"[_doc][my_id]: document missing", "index_uuid"=>"XWAUuIwqTtmZMnnhnVnMkA", "shard"=>"0", "index"=>"logstash-2019.09.24-000001"}}}}

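Summary

In most cases the default index action is all you need. Use the other actions only in specific situations:

- Use create when a document with a given id must be created exactly once.
- Use update when a document with a given id must be updated and, if it does not exist yet, must be created from custom content rather than from the original event data.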
