Learning NiFi: Word Count

A poor pen beats a good memory. Write things down when you can; rereading them later often gives you a fresh perspective.

Using NiFi to count how often each word occurs

a. Drag a Processor from the toolbar onto the canvas, search for GenerateFlowFile in the dialog that opens, and confirm. Then configure the GenerateFlowFile properties.

Pay attention to the property settings, especially the Custom Text property, which must be filled in. The text used in this example is:

With each release of Apache NiFi, we tend to see at least one pretty powerful new application-level feature, in addition to all of the new and improved Processors that are added. And the latest release of NiFi, version 1.8.0, is no exception! Version 1.8.0 brings us a very powerful new feature, known as Load-Balanced Connections, which makes it much easier to move data around a cluster. Prior to this feature, when a user needs to spread data from one node in a cluster to all the nodes of the cluster, the best option was to use Remote Process Groups and Site-to-Site to move the data. The approach looks like this:

First, an Input Port has to be added to the Root Group. Then, a Remote Process Group has to be added to the flow in order to transfer data to that Root Group Input Port. The connection has to be drawn from the Processor to the Remote Process Group, and the Input Port has to be chosen. Next, the user will configure the specific Port within the Remote Process Group and set the Batch Size to 1 FlowFile, so that data is Round-Robin'ed between the nodes. A connection must then be made from the Root Group Input Port all the way back down, through the Process Groups, to the desired destination, so that the FlowFiles that are sent to the Remote Process Group are transferred to where they need to go in the flow. This may involve adding several more Local Input Ports to the Process Groups, to ensure that the data can flow to the correct destination. Finally, all of those newly created components have to be started.

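b. Drag another Processor from the toolbar onto the canvas, search for ExecuteScript in the dialog that opens, and confirm. Then configure the ExecuteScript properties as follows: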

  1. Set Script Engine to Groovy.

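  2. Enter the Groovy code that does the counting. (Copied code can break because of character-encoding or similar problems, so it is best to re-edit it once in a UTF-8 editor to make sure it runs correctly.)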

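import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback

// Take the next FlowFile from the incoming queue; do nothing if none is waiting.
def flowFile = session.get()
if (!flowFile) return

// Replace the FlowFile content with one "word: count" line per distinct word.
flowFile = session.write(flowFile, { inputStream, outputStream ->
    def wordCount = [:]

    // Read the whole content as UTF-8, split on punctuation and whitespace,
    // and lowercase every word so that "NiFi" and "nifi" are counted together.
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def words = text.split(/(!|\?|-|\.|\"|:|;|,|\s)+/)*.toLowerCase()

    // Skip the empty token that split() yields when the text starts with a delimiter.
    words.findAll { it }.each { word ->
        wordCount[word] = (wordCount[word] ?: 0) + 1
    }

    def output = wordCount.collect { word, count -> "${word}: ${count}\n" }.join('')
    outputStream.write(output.getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

// Name the result file, then route the FlowFile to the success relationship.
flowFile = session.putAttribute(flowFile, 'filename', 'wordcount.txt')
session.transfer(flowFile, REL_SUCCESS)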

Paste this code into the Script Body property.
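Before pasting the script into NiFi, it can help to try the counting logic on its own. The following is a minimal sketch that applies the same split pattern to a plain String, so it runs with the plain `groovy` command and needs no NiFi classes. The helper name `countWords` and the sample sentence are assumptions made for illustration; the sketch also skips empty tokens, which `split()` can produce when the text starts with a delimiter.

```groovy
// Standalone sketch of the counting step, for trying outside NiFi.
// countWords is a hypothetical helper name, not part of NiFi.
Map countWords(String text) {
    def counts = [:]
    // Same delimiter pattern as the ExecuteScript body: punctuation and whitespace.
    def words = text.split(/(!|\?|-|\.|\"|:|;|,|\s)+/)*.toLowerCase()
    // Ignore empty tokens, then increment the count for each word.
    words.findAll { it }.each { word ->
        counts[word] = (counts[word] ?: 0) + 1
    }
    return counts
}

// Sample input made up for this sketch.
def sample = 'NiFi moves data. NiFi routes data, and NiFi transforms data!'
def counts = countWords(sample)
counts.each { word, count -> println "${word}: ${count}" }
```

If the pattern was damaged by copy and paste (for example, a straight quote replaced by a typographic one), the counts printed here will look wrong before the flow is ever started.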

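c. Drag a third Processor from the toolbar onto the canvas, search for PutFile in the dialog that opens, and configure its properties. The main one is Directory, the location where the output is written.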

d. Connect the three processors in this order: GenerateFlowFile -> ExecuteScript -> PutFile.

Check the final result: the output directory should contain a file named wordcount.txt.

This shows that the example works as expected.
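The script writes the words in the order they first appear, which makes the result hard to scan. As an optional follow-up, here is a small sketch that parses text in the `word: count` format the script produces and lists the most frequent words first. The helper name `topWords` and the sample lines are assumptions for illustration, not real output from the flow.

```groovy
// Sketch: parse "word: count" lines and return the n most frequent words.
// topWords is a hypothetical helper name.
List topWords(String report, int n) {
    def pairs = report.readLines().findAll { it.contains(': ') }.collect { line ->
        // Split at the last ": " into the word and its numeric count.
        int idx = line.lastIndexOf(': ')
        [line.substring(0, idx), line.substring(idx + 2).trim().toInteger()]
    }
    // Highest count first.
    return pairs.sort { a, b -> b[1] <=> a[1] }.take(n)
}

// Made-up sample in the same format as wordcount.txt.
def report = 'the: 5\nnifi: 3\ndata: 4\n'
topWords(report, 2).each { pair -> println "${pair[0]} (${pair[1]})" }
```

To use it on a real result, replace the sample string with the text of the file written by PutFile, for example via `new File(path).text`, where `path` is wherever your Directory property points.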

 
