1. Environment requirements
1.1 NiFi is written in Java, so a JDK is required
1.2 Project dependencies required in Maven
<dependencies>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-api</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-utils</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-processor-utils</artifactId>
        <version>${nifi.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-mock</artifactId>
        <version>${nifi.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.7.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>
1.2.1 nifi-api
1.2.2 nifi-utils
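1.2.3 nifi-processor-utils, which provides the abstract Processor base classes
1.2.4 nifi-mock and junit, for testing
1.2.5 ?
It also seems that a ?? plugin is needed: it provides a way to package the classes as a NiFi component in a NAR archive (similar to a WAR). The packaging step requires the nifi-api dependency; the roles of the other dependencies become clear later on.
1.3 Developing in IDEA
(Some guides online build the project skeleton from the command line. When I tried that I ran into errors, so I work in IDEA instead, which is simple and convenient.)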
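2. Developing
2.1 Create the processor service file
Under /src/main/resources/META-INF/services/, create a file named org.apache.nifi.processor.Processor. It works like a configuration file that points to the custom Processor class, for example: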
rocks.nifi.examples.processors.JsonProcessor
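The file above follows the standard Java provider-configuration format read by java.util.ServiceLoader: one fully qualified class name per line, with blank lines and # comments ignored. A minimal stand-alone sketch of how such a file is interpreted is shown here; the class ProviderFileSketch and the method providerClassNames are hypothetical names for illustration, not part of NiFi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the provider-configuration file format.
final class ProviderFileSketch {

    // Returns the class names listed in the file, skipping comments and blank lines.
    static List<String> providerClassNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            // Everything after '#' on a line is a comment.
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# custom processors packaged in this project",
                "",
                "rocks.nifi.examples.processors.JsonProcessor");
        System.out.println(providerClassNames(lines));
    }
}
```

If the project later contains several processors, each one gets its own line in the same file.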
2.2 Create a custom processor
Define a simple Java class with the name registered in the service file above, e.g. rocks.nifi.examples.processors.JsonProcessor.
2.2.1 Apache Nifi Processor Header
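import org.apache.nifi.annotation.behavior.SideEffectFree;
import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.processor.AbstractProcessor;

// Declares that the processor has no side effects outside NiFi, so it does not need to care about external context
@SideEffectFree
// Tags for the processor
@Tags({"JSON","SHA0W.PUB"})
// Description of the processor
@CapabilityDescription("Fetch value from json path.")
// Most processors simply extend AbstractProcessor; more complicated tasks may need to go one level deeper and extend AbstractSessionFactoryProcessor.
public class JsonProcessor extends AbstractProcessor{
}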
2.2.2 Variable Declaration
Add the properties and relationships to the processor. The nifi-processor-utils package offers a large selection of validators; see the official developer guide.
// properties holds the configuration parameters defined for this processor
private List<PropertyDescriptor> properties;
// relationships holds the outgoing data routes defined for this processor
private Set<Relationship> relationships;
public static final String MATCH_ATTR = "match";
public static final PropertyDescriptor JSON_PATH = new PropertyDescriptor.Builder()
        // Property name, displayed in front of the input field
        .name("Json Path")
        // Whether the property is mandatory
        .required(true)
        // Add a validator
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        // Build the descriptor once everything has been added
        .build();
public static final Relationship SUCCESS = new Relationship.Builder()
        .name("SUCCESS")
        .description("Success relationship")
        .build();
// A property with a fixed set of selectable values is defined as follows
// (REGULAR and VERBOSE are not shown; they are assumed to be AllowableValue constants declared like EXTENSIVE)
public static final AllowableValue EXTENSIVE = new AllowableValue("Extensive", "Extensive",
        "Everything will be logged - use with caution!");
public static final PropertyDescriptor LOG_LEVEL = new PropertyDescriptor.Builder()
        .name("Amount to Log")
        .description("How much the Processor should log")
        .allowableValues(REGULAR, VERBOSE, EXTENSIVE)
        .defaultValue(REGULAR.getValue())
        ...
        .build();
2.2.3 Apache Nifi Init
The init function is called when Apache NiFi starts up and initializes the processor. Remember that this is a highly multi-threaded environment, so be careful what you do here; this is why both the list of properties and the set of relationships are stored as unmodifiable collections. The getters for the properties and relationships are placed here as well; these two get methods are mainly what lets the processor display correctly in the UI.
init is mainly used to load the Relationship and PropertyDescriptor objects defined in the processor.
@Override
public void init(final ProcessorInitializationContext context){
    List<PropertyDescriptor> properties = new ArrayList<>();
    properties.add(JSON_PATH);
    // Publish a read-only view so that no other thread can add to the list
    this.properties = Collections.unmodifiableList(properties);
    Set<Relationship> relationships = new HashSet<>();
    relationships.add(SUCCESS);
    this.relationships = Collections.unmodifiableSet(relationships);
}
// These two getters are mainly what lets the processor display correctly in the UI
@Override
public Set<Relationship> getRelationships(){
    return relationships;
}
@Override
public List<PropertyDescriptor> getSupportedPropertyDescriptors(){
    return properties;
}
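To see what the unmodifiable wrappers in init buy you, here is a small stand-alone sketch that uses only the JDK. Plain strings stand in for PropertyDescriptor, and the class name UnmodifiableInitSketch is made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-alone sketch (no NiFi classes): strings stand in for PropertyDescriptor.
final class UnmodifiableInitSketch {
    private List<String> properties;

    // Mirrors init(): fill a local mutable list, then publish a read-only view of it.
    void init() {
        List<String> props = new ArrayList<>();
        props.add("Json Path");
        this.properties = Collections.unmodifiableList(props);
    }

    List<String> getSupportedPropertyDescriptors() {
        return properties;
    }

    // Returns true if an attempt to modify the published list is rejected.
    static boolean rejectsModification(List<String> list) {
        try {
            list.add("Injected");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        UnmodifiableInitSketch sketch = new UnmodifiableInitSketch();
        sketch.init();
        System.out.println(sketch.getSupportedPropertyDescriptors());
        System.out.println(rejectsModification(sketch.getSupportedPropertyDescriptors()));
    }
}
```

Because the mutable list never leaves init, callers of the getter can read it from any thread but cannot change it.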
2.2.4 The onTrigger method
The onTrigger method is called whenever a flow file is passed to the processor. For more details on the context and session variables, again refer to the official developer guide. The unit of processing is the flowfile: this method decides what is done with each piece of the data flow as it arrives, and it is where the business logic is implemented:
@Override
public void onTrigger(ProcessContext processContext, ProcessSession processSession) throws ProcessException {
    final AtomicReference<String> value = new AtomicReference<>();
    // First get the flowfile to process from the session; there may be nothing queued
    FlowFile flowFile = processSession.get();
    if (flowFile == null) {
        return;
    }
    // read(FlowFile, InputStreamCallback): reads the content of the flowfile
    // write(FlowFile, OutputStreamCallback): writes content to the flowfile
    // write(FlowFile, StreamCallback): handles input and output together, with essentially all of the work done inside the callback. Once processing is finished, the result still has to be routed according to its outcome, so this third form only suits components whose business logic and code are fairly simple.
    // For processors with more complex business logic, prefer reading the data first, then processing it, then writing it back. Both InputStreamCallback and OutputStreamCallback are needed for that, and it keeps the cost of reading and writing the flowfile down.
    // Read the content of the flowfile
    processSession.read(flowFile, in -> {
        try{
            String json = IOUtils.toString(in, StandardCharsets.UTF_8);
            String result = JsonPath.read(json, "$.hello");
            value.set(result);
        }catch(Exception ex){
            getLogger().error("Failed to read json string.", ex);
        }
    });
    // Write the result to an attribute
    String results = value.get();
    if(results != null && !results.isEmpty()){
        flowFile = processSession.putAttribute(flowFile, MATCH_ATTR, results);
    }
    // Write the result back out to the flowfile; skipped when nothing was extracted, because the value is then null
    if(results != null){
        flowFile = processSession.write(flowFile, out -> out.write(results.getBytes(StandardCharsets.UTF_8)));
    }
    // Finally, every flowfile must be either transferred or removed.
    processSession.transfer(flowFile, SUCCESS);
}
In general you pull the flow file out of the session, read from and write to it, and add attributes where needed. To work on flow file content, NiFi provides three callback interfaces.
2.2.5 InputStreamCallback
For reading the contents of the flow file through an input stream.
session.read(flowfile, new InputStreamCallback() {
    @Override
    public void process(InputStream in) throws IOException {
        try{
            // Use Apache Commons IO to read the input stream into a string.
            String json = IOUtils.toString(in, StandardCharsets.UTF_8);
            // Use JsonPath to try to read the JSON and set the value to pass on.
            String result = JsonPath.read(json, "$.hello");
            value.set(result);
        }catch(Exception ex){
            // Best practice on an exception would normally be to route the original flow file to an error relationship.
            getLogger().error("Failed to read json string.", ex);
        }
    }
});
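The value holder used in the callback deserves a short illustration. A lambda or anonymous class cannot assign to a local variable of the enclosing method, so the result is handed out through an AtomicReference. The sketch below reproduces that hand-off with the JDK only; ReadCallback and read are hypothetical stand-ins for NiFi's InputStreamCallback and session.read, and trimming the text replaces the JsonPath lookup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

// Stand-alone sketch: ReadCallback and read(...) are hypothetical stand-ins
// for NiFi's InputStreamCallback and session.read(flowFile, callback).
final class CallbackHandoffSketch {

    interface ReadCallback {
        void process(InputStream in) throws IOException;
    }

    // Opens the content as a stream and hands it to the callback.
    static void read(byte[] content, ReadCallback callback) {
        try (InputStream in = new ByteArrayInputStream(content)) {
            callback.process(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drains a stream into a UTF-8 string.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // The callback stores its result in the holder; the caller reads it afterwards.
    static String extractTrimmed(byte[] content) {
        final AtomicReference<String> value = new AtomicReference<>();
        read(content, in -> value.set(readFully(in).trim()));
        return value.get();
    }

    public static void main(String[] args) {
        byte[] content = "  hello world \n".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractTrimmed(content));
    }
}
```

If the callback fails before calling value.set, the holder stays null, so code that uses the value afterwards should check for that.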
2.2.6 OutputStreamCallback
For writing to a flowfile; this overwrites the existing content rather than appending to it. We simply write out the value we received in the InputStreamCallback.
flowfile = session.write(flowfile, new OutputStreamCallback() {
    @Override
    public void process(OutputStream out) throws IOException {
        out.write(value.get().getBytes());
    }
});
2.2.7 StreamCallback
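This is for both reading from and writing to the same flow file. With both the OutputStreamCallback and the StreamCallback, remember to assign the result back to the flow file variable. This callback is not used in the processor code, although it could have been; the choice was deliberate, to show a way of moving data out of callbacks and back in.
flowfile = session.write(flowfile, new StreamCallback() {
    @Override
    public void process(InputStream in, OutputStream out) throws IOException {
        // Read the incoming content and write the extracted value in one pass.
        String json = IOUtils.toString(in, StandardCharsets.UTF_8);
        String result = JsonPath.read(json, "$.hello");
        out.write(result.getBytes(StandardCharsets.UTF_8));
    }
});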
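The read-and-write shape of a stream callback, and the reason the return value has to be assigned back, can be sketched without NiFi. In the example below, ReadWriteCallback and write are hypothetical stand-ins for StreamCallback and session.write, and a byte array plays the role of the flow file content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Stand-alone sketch: ReadWriteCallback and write(...) are hypothetical stand-ins
// for NiFi's StreamCallback and session.write(flowFile, callback).
final class StreamCallbackSketch {

    interface ReadWriteCallback {
        void process(InputStream in, OutputStream out) throws IOException;
    }

    // Feeds the old content to the callback and returns what the callback wrote.
    // The input array is left untouched, so the caller must keep the return value.
    static byte[] write(byte[] content, ReadWriteCallback callback) {
        try (InputStream in = new ByteArrayInputStream(content);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            callback.process(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Upper-cases ASCII letters while copying input to output in a single pass.
    static byte[] upperCase(byte[] content) {
        return write(content, (in, out) -> {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b >= 'a' && b <= 'z' ? b - ('a' - 'A') : b);
            }
        });
    }

    public static void main(String[] args) {
        byte[] original = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] updated = upperCase(original);
        System.out.println(new String(original, StandardCharsets.US_ASCII));
        System.out.println(new String(updated, StandardCharsets.US_ASCII));
    }
}
```

The original array still reads hello after the call, which is the stand-alone analogue of forgetting to write flowfile = session.write(...).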
2.3 Test
Test the processor inside the project first, to check that it behaves as designed.
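3. Deployment
3.1 Packaging
Open a command line in the project directory and run the command mvn clean install
3.2 Uploading
In the build output, find the line [INFO] Installing D:\ideaSpace\nifi-1.3.0\self-define\first-processors\nifi-demo-nar\target\nifi-demo-nar-1.0.nar to D:\SoftWares\apache-maven-3.2.3\repo\first\nifi-demo-nar\1.0\nifi-demo-nar-1.0.nar; the file you need is nifi-demo-nar-1.0.nar
Upload this .nar file to the lib directory on the NiFi server
3.3 Restart NiFi and open the UI
The processor is now available in the UI!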
References:
https://blog.csdn.net/mianshui1105/article/details/75313480
https://blog.csdn.net/larygry/article/details/89092573
https://blog.csdn.net/yitengtongweishi/article/details/88807934
https://www.nifi.rocks/developing-a-custom-apache-nifi-processor-json/