Steps for Building a Custom NiFi Processor

1.Environment requirements

1.1 NiFi is written in Java, so a JDK is required

1.2 Project dependencies required in Maven

<dependencies>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-api</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-utils</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-processor-utils</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-mock</artifactId>
        <version>${nifi.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.7.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>

1.2.1 nifi-api
1.2.2 nifi-utils
1.2.3 nifi-processor-utils, which provides abstract helper classes and utilities for processors
1.2.4 nifi-mock and junit, for testing
1.2.5 ?

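A ?? plugin also appears to be needed: it provides the nar packaging type, which bundles the classes into a NiFi component (similar to a war package). The packaging step needs the nifi-api dependency; the role of the other dependencies becomes clear later on.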
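
The nar packaging mentioned above is commonly provided by the nifi-nar-maven-plugin. The fragment below is a minimal sketch of how a pom.xml might enable it, not a complete build file. The plugin version and the idea of a dedicated nar module are assumptions; match the version to the NiFi release in use.

```xml
<!-- Assumed to sit in the pom.xml of the module that produces the .nar file -->
<packaging>nar</packaging>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.nifi</groupId>
            <artifactId>nifi-nar-maven-plugin</artifactId>
            <!-- Assumed version; choose the one that matches your NiFi release -->
            <version>1.2.0</version>
            <!-- Lets Maven recognise the "nar" packaging type -->
            <extensions>true</extensions>
        </plugin>
    </plugins>
</build>
```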

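1.3 Developing in IntelliJ IDEA

(Some guides online scaffold the project from the command line. I ran into errors doing that, so I worked in IDEA instead, which is simple and convenient.)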

2.Developing

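2.1 Create the processor service file

Under /src/main/resources/META-INF/services/, create a file named org.apache.nifi.processor.Processor. It works like a configuration file: it lists the fully qualified class name of each custom Processor, for example: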

rocks.nifi.examples.processors.JsonProcessor

2.2 Create the custom processor

Define a simple Java class at the location named in the service file, for example rocks.nifi.examples.processors.JsonProcessor.

2.2.1 Apache Nifi Processor Header

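import org.apache.nifi.annotation.behavior.SideEffectFree;
import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.processor.AbstractProcessor;

// The processor's work has no side effects outside NiFi and does not depend on external context
@SideEffectFree

// Tags for the processor
@Tags({"JSON","SHA0W.PUB"})

// Description of the processor
@CapabilityDescription("Fetch value from json path.")

// Most processors simply extend AbstractProcessor; more complicated tasks may need to go a level deeper and extend AbstractSessionFactoryProcessor.
public class JsonProcessor extends AbstractProcessor{
}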

2.2.2 Variable Declaration

Add the properties and relationships for the processor. The nifi-processor-utils package offers a large selection of validators; see the official developer guide.

// Holds the property descriptors that can be configured on this processor
private List<PropertyDescriptor> properties;
// Holds the relationships that determine where this processor routes its data
private Set<Relationship> relationships;

public static final String MATCH_ATTR = "match";

public static final PropertyDescriptor JSON_PATH = new PropertyDescriptor.Builder()
        // Property name, shown as the label next to the input field
        .name("Json Path")
        // Whether a value is required
        .required(true)
        // Add a validator for the entered value
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        // Build the descriptor once everything has been set
        .build();

public static final Relationship SUCCESS = new Relationship.Builder()
        .name("SUCCESS")
        .description("Success relationship")
        .build();

// A property restricted to a fixed set of choices is defined as follows
// (REGULAR and VERBOSE are assumed to be declared the same way as EXTENSIVE)
public static final AllowableValue EXTENSIVE = new AllowableValue("Extensive", "Extensive",
    "Everything will be logged - use with caution!");

public static final PropertyDescriptor LOG_LEVEL = new PropertyDescriptor.Builder()
  .name("Amount to Log")
  .description("How much the Processor should log")
  .allowableValues(REGULAR, VERBOSE, EXTENSIVE)
  .defaultValue(REGULAR.getValue())
  ...
  .build();

2.2.3 Apache Nifi Init

The init method is called once when NiFi initializes the processor; its main job is to load the Relationship and PropertyDescriptor objects defined in the processor. Remember that this is a highly multi-threaded environment, so be careful what you do here. This is why both the list of properties and the set of relationships are wrapped in unmodifiable collections. I put the getters for the properties and relationships here as well; these two getters are mainly needed so that the processor displays correctly in the UI.

@Override
public void init(final ProcessorInitializationContext context){
    List<PropertyDescriptor> properties = new ArrayList<>();
    properties.add(JSON_PATH);
    // Publish a read-only view so that no thread can add to the list later
    this.properties = Collections.unmodifiableList(properties);
    Set<Relationship> relationships = new HashSet<>();
    relationships.add(SUCCESS);
    this.relationships = Collections.unmodifiableSet(relationships);
}

// These two getters are mainly needed so that the processor displays correctly in the UI
@Override
public Set<Relationship> getRelationships(){
    return relationships;
}

@Override
public List<PropertyDescriptor> getSupportedPropertyDescriptors(){
    return properties;
}
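
The effect of wrapping the collections in init can be seen in isolation with plain JDK classes. The sketch below is not NiFi code: UnmodifiableDemo and its methods are hypothetical names used only for illustration, and the list holds strings instead of PropertyDescriptor objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-alone illustration using only the JDK; all names here are hypothetical.
class UnmodifiableDemo {

    // Mirrors init(): fill a local list, then publish only a read-only view of it.
    static List<String> buildProperties() {
        List<String> properties = new ArrayList<>();
        properties.add("Json Path");
        return Collections.unmodifiableList(properties);
    }

    // Returns true if an attempt to add to the published list is rejected.
    static boolean rejectsAdd(List<String> published) {
        try {
            published.add("Another Property");
            return false;
        } catch (UnsupportedOperationException expected) {
            // The read-only view refuses every mutation
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> published = buildProperties();
        System.out.println("size = " + published.size());
        System.out.println("add rejected = " + rejectsAdd(published));
    }
}
```

Because the local list never escapes the method, the read-only view is the only reference other threads can reach, so its contents cannot change after init returns.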

2.2.4 The onTrigger method

The onTrigger method is called whenever a flow file is passed to the processor. For more details on the context and session variables, please refer again to the official developer guide. The unit of work is the flow file: this method implements the business logic that decides what to do with each flow file as it arrives.

@Override
public void onTrigger(ProcessContext processContext, ProcessSession processSession) throws ProcessException {

    // Carries the extracted value out of the read callback
    final AtomicReference<String> value = new AtomicReference<>();

    // First, get the flow file to process from the session
    FlowFile flowFile = processSession.get();
    if (flowFile == null) {
        // Nothing is queued for this processor
        return;
    }

    // read(FlowFile, InputStreamCallback) reads the content of a flow file.
    // write(FlowFile, OutputStreamCallback) writes content to a flow file.
    // write(FlowFile, StreamCallback) handles input and output together, with nearly all the work done inside
    // the callback. Because the result still has to be routed according to the outcome once processing finishes,
    // this third form only suits processors with simple business logic and code.
    // For more complex processors, prefer reading the data first, processing it, and then writing it back,
    // using both InputStreamCallback and OutputStreamCallback, to reduce the cost of flow file reads and writes.

    // Read the flow file content and extract the value at the configured JSON path
    final String jsonPath = processContext.getProperty(JSON_PATH).getValue();
    processSession.read(flowFile, in -> {
        try {
            String json = IOUtils.toString(in, StandardCharsets.UTF_8);
            String result = JsonPath.read(json, jsonPath);
            value.set(result);
        } catch (Exception ex) {
            getLogger().error("Failed to read json string.", ex);
        }
    });

    String results = value.get();
    if (results != null && !results.isEmpty()) {
        // Write the result to an attribute
        flowFile = processSession.putAttribute(flowFile, MATCH_ATTR, results);
        // Write the result back out to the flow file content
        flowFile = processSession.write(flowFile, out -> out.write(results.getBytes(StandardCharsets.UTF_8)));
    }

    // Finally, every flow file that is generated needs to be removed or transferred
    processSession.transfer(flowFile, SUCCESS);
}
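
The onTrigger code above uses an AtomicReference to carry a value out of the read callback. The sketch below shows that pattern on its own with plain JDK classes. CallbackValueDemo, readContent and extract are hypothetical names, and readContent only stands in for a session read; it is not the NiFi API.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Stand-alone illustration using only the JDK; the names are hypothetical and not part of NiFi.
class CallbackValueDemo {

    // Stands in for a session read: hands the content to a callback and returns nothing.
    static void readContent(String content, Consumer<String> callback) {
        callback.accept(content);
    }

    // Returns the upper-cased content, or null when the callback stored nothing.
    static String extract(String content) {
        // A plain local String could not be reassigned inside the lambda,
        // because captured local variables must be effectively final.
        final AtomicReference<String> value = new AtomicReference<>();
        readContent(content, c -> {
            if (!c.isEmpty()) {
                value.set(c.toUpperCase());
            }
        });
        // Still null if the callback never called set(), so callers must check for null
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println(extract("nifi"));
        System.out.println(extract(""));
    }
}
```

The null case is why the result is checked before it is written to an attribute: when the callback fails or finds nothing, the reference is never set.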

In general, you pull the flow file out of the session, read and write its content, and add attributes where needed. To work on flow file content, NiFi provides three callback interfaces.

2.2.5 InputStreamCallback

Reads the contents of the flow file through an input stream.

session.read(flowfile, new InputStreamCallback() {
    @Override
    public void process(InputStream in) throws IOException {
        try{
            //Using Apache Commons to read the input stream out to a string.
            String json = IOUtils.toString(in);
            //Use JsonPath to attempt to read the json and set a value to the pass on.
            String result = JsonPath.read(json, "$.hello");
            value.set(result);
        }catch(Exception ex){
            // On an exception, best practice is normally to route the original flow file to a failure relationship.
            getLogger().error("Failed to read json string.", ex);
        }
    }
});  

2.2.6 OutputStreamCallback

Writes to a flow file; this overwrites the existing content rather than appending to it. Here we simply write out the value we received in the InputStreamCallback.

flowfile = session.write(flowfile, new OutputStreamCallback() {
    @Override
    public void process(OutputStream out) throws IOException {
        out.write(value.get().getBytes());
    }
});

2.2.7 StreamCallback

This is for both reading from and writing to the same flow file. With both the OutputStreamCallback and the StreamCallback, remember to assign the result back to the flow file variable. This callback is not used in the code above, though it could have been. The choice was deliberate, to show a way of moving data out of callbacks and back in.

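flowfile = session.write(flowfile, new StreamCallback() {
    @Override
    public void process(InputStream in, OutputStream out) throws IOException {
        // Read the incoming content, extract the value, and write it out in one pass
        String json = IOUtils.toString(in, StandardCharsets.UTF_8);
        String result = JsonPath.read(json, "$.hello");
        out.write(result.getBytes(StandardCharsets.UTF_8));
    }
});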

2.3 Test

Test the processor inside the project first, to check that it behaves as designed.
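
A minimal sketch of such a test, using the nifi-mock and junit dependencies listed in section 1.2. It assumes the JsonProcessor class shown earlier, with JSON_PATH, SUCCESS and MATCH_ATTR declared as public constants. The test class name and the sample JSON are made up for illustration, and the json-path and commons-io libraries used by the processor are assumed to be on the classpath.

```java
package rocks.nifi.examples.processors;

import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class JsonProcessorTest {

    @Test
    public void extractsValueAtJsonPath() throws Exception {
        // Wrap the processor in the in-memory test harness from nifi-mock
        TestRunner runner = TestRunners.newTestRunner(new JsonProcessor());

        // "Json Path" is a required property, so it must be set before running
        runner.setProperty(JsonProcessor.JSON_PATH, "$.hello");

        // Queue one flow file with sample (made-up) JSON content
        runner.enqueue("{\"hello\":\"nifi rocks\"}".getBytes(StandardCharsets.UTF_8));

        // Invoke onTrigger once
        runner.run(1);

        // The input queue should be drained and one flow file routed to SUCCESS
        runner.assertQueueEmpty();
        runner.assertAllFlowFilesTransferred(JsonProcessor.SUCCESS, 1);

        List<MockFlowFile> results = runner.getFlowFilesForRelationship(JsonProcessor.SUCCESS);
        MockFlowFile result = results.get(0);

        // The extracted value should appear in the "match" attribute and in the content
        result.assertAttributeEquals(JsonProcessor.MATCH_ATTR, "nifi rocks");
        result.assertContentEquals("nifi rocks");
    }
}
```

Running the test harness this way exercises property validation, onTrigger and relationship routing without starting a NiFi instance.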

3.Deployment

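3.1 Packaging

Open a command line in the project directory and run mvn clean install.

3.2 Upload

In the build output, find the line [INFO] Installing D:\ideaSpace\nifi-1.3.0\self-define\first-processors\nifi-demo-nar\target\nifi-demo-nar-1.0.nar to D:\SoftWares\apache-maven-3.2.3\repo\first\nifi-demo-nar\1.0\nifi-demo-nar-1.0.nar and note the file nifi-demo-nar-1.0.nar.
Upload this .nar file to the lib directory on the NiFi server.

3.3 Restart NiFi and open the UI

The processor can now be used from the UI!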

References:
https://blog.csdn.net/mianshui1105/article/details/75313480
https://blog.csdn.net/larygry/article/details/89092573
https://blog.csdn.net/yitengtongweishi/article/details/88807934
https://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/
