Elasticsearch version: 5.4.5

Ingest

In ES, Ingest exists mainly to preprocess data. Its workflow is roughly as follows:

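A number of pipelines are defined and configured in advance. Each pipeline is set up with several processors, and each processor defines how the data should be handled.
When a node receives data, it uses the pipeline id given in the request parameters to find the matching registered pipeline, runs the data through it, and then passes the processed data on through the standard Elasticsearch indexing flow.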
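
The flow described above can be sketched in a few lines of plain Java. This is only a simplified model to make the lookup-then-process order concrete: `IngestFlowSketch`, `register` and `ingest` are hypothetical names, a document is reduced to a `Map`, and none of this is Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Simplified model of the ingest flow; none of these types are Elasticsearch classes.
class IngestFlowSketch {

    // A "pipeline" here is just an ordered list of steps that mutate a document.
    static final Map<String, List<Consumer<Map<String, Object>>>> PIPELINES = new HashMap<>();

    // Register a pipeline under an id (assumption: stands in for PUT _ingest/pipeline/<id>).
    static void register(String id, List<Consumer<Map<String, Object>>> processors) {
        PIPELINES.put(id, processors);
    }

    // Look up the pipeline named in the request, run each processor in order,
    // and return the document that would then continue through normal indexing.
    static Map<String, Object> ingest(String pipelineId, Map<String, Object> source) {
        List<Consumer<Map<String, Object>>> processors = PIPELINES.get(pipelineId);
        if (processors == null) {
            throw new IllegalArgumentException("pipeline with id [" + pipelineId + "] does not exist");
        }
        Map<String, Object> doc = new HashMap<>(source);
        for (Consumer<Map<String, Object>> processor : processors) {
            processor.accept(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        List<Consumer<Map<String, Object>>> steps = new ArrayList<>();
        // Rough stand-in for a "convert" processor that turns foo into an integer.
        steps.add(doc -> doc.put("foo", Integer.parseInt(String.valueOf(doc.get("foo")))));
        register("my-pipeline-id", steps);

        Map<String, Object> source = new HashMap<>();
        source.put("foo", "42");
        System.out.println(ingest("my-pipeline-id", source));
    }
}
```

The point of the model is the ordering: the pipeline is resolved by id first, every processor sees the document in sequence, and only the final result moves on to indexing.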

Pipelines are independent of indices. Pipeline configuration is stored in the ClusterState and can be queried through the state API.

Here is an example, since many of the examples found online do not actually run:

# Create the pipeline
curl -XPUT 'localhost:9200/_ingest/pipeline/my-pipeline-id?pretty' -H 'Content-Type: application/json' -d '{
    "description" : "describe pipeline",
    "processors" : [
        {
            "convert" : {
                "field": "foo",
                "type": "integer"
            }
        }
    ]
}'

# Test a pipeline with the simulate API
curl -XPOST 'localhost:9200/_ingest/pipeline/_simulate' -H 'Content-Type: application/json' -d '
{
    "pipeline" : {
        "description" : "describe pipeline",
        "processors" : [
            {
                "set" : {
                    "field": "foo",
                    "value": "bar"
                }
            }
        ]
    },
    "docs" : [
        {
            "_index": "index",
            "_type": "type",
            "_id": "id",
            "_source": {
                "foo" : "bar"
            }
        }
    ]
}'

Exploring Processor

A processor implementation may modify the data belonging to a document.
Whether changes are made and what exactly is modified is up to the implementation.
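As mentioned earlier, a Processor preprocesses the raw data (adding, reading, removing, simulating). ES ships with more than 20 different Processors; which ones are worth studying depends on your use case. Next, let's look at the Processor interface.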

interface Processor

Defines the methods that every Processor must implement.

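    // Main entry point: processes (and may modify) the document
    void execute(IngestDocument ingestDocument) throws Exception;
    // Returns the type of the processor
    String getType();
    // Returns the tag of the processor
    String getTag();
    // Processors are created through a factory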
    interface Factory {
        /**
         * Creates a processor based on the specified map of maps config.
         *
         * @param processorFactories Other processors which may be created inside this processor
         * @param tag The tag for the processor
         * @param config The configuration for the processor
         *
         * <b>Note:</b> Implementations are responsible for removing the used configuration keys, so that after
         * creating a pipeline ingest can verify if all configurations settings have been used.
         */
        Processor create(Map<String, Processor.Factory> processorFactories, String tag,
                         Map<String, Object> config) throws Exception;
    }
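
To make the contract more concrete, here is a minimal sketch of what an implementation and its factory might look like. So that it compiles without Elasticsearch on the classpath, it uses a stand-in interface (`SketchProcessor`) that takes a plain `Map` instead of an `IngestDocument` and drops the `processorFactories` argument. `SketchProcessor`, `UppercaseSketchProcessor` and the `"field"` config key are all assumptions made for illustration, not Elasticsearch classes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Stand-in for the interface excerpted above, reduced to a plain Map document.
interface SketchProcessor {
    void execute(Map<String, Object> document) throws Exception;
    String getType();
    String getTag();

    interface Factory {
        SketchProcessor create(String tag, Map<String, Object> config) throws Exception;
    }
}

// Hypothetical processor that upper-cases one string field.
final class UppercaseSketchProcessor implements SketchProcessor {
    static final String TYPE = "uppercase_sketch";

    private final String tag;
    private final String field;

    UppercaseSketchProcessor(String tag, String field) {
        this.tag = tag;
        this.field = field;
    }

    @Override
    public void execute(Map<String, Object> document) {
        Object value = document.get(field);
        if (!(value instanceof String)) {
            throw new IllegalArgumentException("field [" + field + "] is missing or not a string");
        }
        document.put(field, ((String) value).toUpperCase(Locale.ROOT));
    }

    @Override
    public String getType() { return TYPE; }

    @Override
    public String getTag() { return tag; }

    static final class Factory implements SketchProcessor.Factory {
        @Override
        public UppercaseSketchProcessor create(String tag, Map<String, Object> config) {
            // remove() rather than get(): consumed keys leave the map, so anything
            // still present afterwards can be reported as an unknown setting.
            Object field = config.remove("field");
            if (!(field instanceof String)) {
                throw new IllegalArgumentException("[field] required property is missing");
            }
            return new UppercaseSketchProcessor(tag, (String) field);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        config.put("field", "foo");
        UppercaseSketchProcessor processor = new Factory().create("my-tag", config);

        Map<String, Object> doc = new HashMap<>();
        doc.put("foo", "bar");
        processor.execute(doc);
        System.out.println(doc + ", leftover config: " + config);
    }
}
```

The factory illustrates the note in the Javadoc: because it removes every key it reads, an empty config map after creation means all settings were understood.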

IngestDocument

Represents a single document being captured before indexing and holds the source and metadata (like id, type and index).
This is the object a Processor operates on. It stores the doc's data and its metadata, and mainly consists of two objects:

Map<String, Object> sourceAndMetadata
Map<String, Object> ingestMetadata

Besides the doc's source, sourceAndMetadata also holds some other state information, for example:

index
type
id
routing: optional
parent: optional
timestamp: optional
ttl: optional

For sourceAndMetadata, IngestDocument provides several methods for manipulating the raw data:

removeField
appendFieldValue
setFieldValue
getFieldValue

// Example
// Add a field to the raw data
document.setFieldValue(%targetField%, %newValue%);
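
To show roughly what these four operations do, here is a small self-contained sketch over a nested `Map`, using dot-separated paths such as `a.b`. `FieldPathSketch` is a hypothetical stand-in, not the Elasticsearch class: the real `IngestDocument` methods have different signatures and stricter error handling, so treat this as an approximation of the idea only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the field accessors; not the Elasticsearch implementation.
class FieldPathSketch {
    private final Map<String, Object> sourceAndMetadata;

    FieldPathSketch(Map<String, Object> sourceAndMetadata) {
        this.sourceAndMetadata = sourceAndMetadata;
    }

    // Walk to the map that holds the last path element, optionally creating parents.
    @SuppressWarnings("unchecked")
    private Map<String, Object> parentOf(String[] parts, boolean create) {
        Map<String, Object> current = sourceAndMetadata;
        for (int i = 0; i < parts.length - 1; i++) {
            Object next = current.get(parts[i]);
            if (next == null && create) {
                next = new HashMap<String, Object>();
                current.put(parts[i], next);
            }
            if (!(next instanceof Map)) {
                throw new IllegalArgumentException("cannot resolve [" + parts[i] + "]");
            }
            current = (Map<String, Object>) next;
        }
        return current;
    }

    void setFieldValue(String path, Object value) {
        String[] parts = path.split("\\.");
        parentOf(parts, true).put(parts[parts.length - 1], value);
    }

    Object getFieldValue(String path) {
        String[] parts = path.split("\\.");
        return parentOf(parts, false).get(parts[parts.length - 1]);
    }

    void removeField(String path) {
        String[] parts = path.split("\\.");
        parentOf(parts, false).remove(parts[parts.length - 1]);
    }

    // Append to a field, turning a single existing value into a list first.
    @SuppressWarnings("unchecked")
    void appendFieldValue(String path, Object value) {
        String[] parts = path.split("\\.");
        Map<String, Object> parent = parentOf(parts, true);
        String leaf = parts[parts.length - 1];
        Object existing = parent.get(leaf);
        List<Object> values = new ArrayList<>();
        if (existing instanceof List) {
            values.addAll((List<Object>) existing);
        } else if (existing != null) {
            values.add(existing);
        }
        values.add(value);
        parent.put(leaf, values);
    }

    public static void main(String[] args) {
        FieldPathSketch document = new FieldPathSketch(new HashMap<String, Object>());
        document.setFieldValue("user.name", "bar");
        document.appendFieldValue("tags", "ingest");
        System.out.println(document.getFieldValue("user.name"));
        document.removeField("user.name");
        System.out.println(document.getFieldValue("user.name"));
    }
}
```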
