Preface:

This article walks through writing a Storm topology from scratch and explains how Storm works along the way, rather than reciting dry theory.

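1. Create a Maven project

We use Maven to manage the project, which makes it easy to reference library dependencies and control their versions.

Create a minimal pom.xml as follows: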

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.edi.storm</groupId>
<artifactId>storm-samples</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>


<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>


<repositories>
<repository>
<id>clojars.org</id>
<url>http://clojars.org/repo</url>
</repository>
</repositories>


<build>
<finalName>storm-samples</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
<encoding>${project.build.sourceEncoding}</encoding>
</configuration>
</plugin>


<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>


</plugin>
</plugins>
</build>


<dependencies>
<dependency>
<groupId>storm</groupId>
<artifactId>storm</artifactId>
<version>0.9.0-rc2</version>
<scope>provided</scope>
</dependency>
</dependencies>
</project>

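I added two extra build plugins here:

maven-compiler-plugin: lets us pin the JDK level used for compilation. Some of Storm's dependencies target JDK 1.5.

and

maven-assembly-plugin: bundles all dependencies into a single jar, which simplifies testing and deployment. Later I will cover what to do if you would rather not build a single jar.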

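2. Create the Spout

As mentioned in an earlier article, a spout in Storm is responsible for emitting data.

Let's implement the following spout:

It emits a random series of sentences, each in the format who:what they said.

The code: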

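// Imports assume Storm 0.9.x, where the classes live under backtype.storm.
import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class RandomSpout extends BaseRichSpout {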

	private SpoutOutputCollector collector;

	private Random rand;
	
	private static String[] sentences = new String[] {"edi:I'm happy", "marry:I'm angry", "john:I'm sad", "ted:I'm excited", "laden:I'm dangerous"};
	
	@Override
	public void open(Map conf, TopologyContext context,
			SpoutOutputCollector collector) {
		this.collector = collector;
		this.rand = new Random();
	}

	@Override
	public void nextTuple() {
		String toSay = sentences[rand.nextInt(sentences.length)];
		this.collector.emit(new Values(toSay));
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("sentence"));
	}

}

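To follow this code, we first need to understand the concept of a Tuple.

In Storm, data is carried by tuples; the Tuple is Storm's general abstraction over data. The values placed in a tuple must be serializable.

The two most important parts of this spout are:

a. collector.emit() emits a tuple. We do not build the tuple ourselves; we only supply its values and Storm creates the tuple for us. The Values constructor takes a variable number of arguments. A tuple stores its values in a List, ordered by their position in new Values(obj1, obj2, ...). For example, after emit(new Values("v1", "v2")), the tuple holds: { [ "v1" ], [ "v2" ] }

b. declarer.declare() gives each value we emit a name on the stream; think of the name as a key. The names must be unique within one declaration, that is, within the Fields object passed to declare().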
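To make the relationship between positional values and declared field names concrete, here is a minimal plain-Java sketch. It deliberately does not use Storm's classes: TupleFieldsSketch and MiniTuple are hypothetical stand-ins that only mimic the idea of a value list that can be addressed by index or by declared name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class TupleFieldsSketch {

    // Hypothetical stand-in for a tuple; this is NOT Storm's Tuple class.
    public static final class MiniTuple {
        private final List<String> fields;  // names given at declare time
        private final List<Object> values;  // values given at emit time

        public MiniTuple(List<String> fields, Object... values) {
            // Field names must not repeat within one declaration.
            if (new HashSet<String>(fields).size() != fields.size()) {
                throw new IllegalArgumentException("duplicate field name in " + fields);
            }
            // Every declared field needs exactly one value.
            if (fields.size() != values.length) {
                throw new IllegalArgumentException("expected " + fields.size() + " values");
            }
            this.fields = new ArrayList<String>(fields);
            this.values = Arrays.asList(values.clone());
        }

        // Positional access, in the spirit of tuple.getValue(0).
        public Object getValue(int index) {
            return values.get(index);
        }

        // Named access: look up the position of the field, then read that value.
        public Object getValueByField(String name) {
            int index = fields.indexOf(name);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
            return values.get(index);
        }
    }

    public static void main(String[] args) {
        MiniTuple tuple = new MiniTuple(Arrays.asList("speaker", "sentence"), "edi", "I'm happy");
        System.out.println(tuple.getValue(1));                // prints: I'm happy
        System.out.println(tuple.getValueByField("speaker")); // prints: edi
    }
}
```

The point of the sketch is that the declared names add no data of their own; they are only a second way to reach the same positions in the value list.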

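3. Create the Bolts

Now that we have a source, let's build the nodes that process the data flowing out of it. For demonstration purposes we will do something trivial: append "!" to each sentence, then print it.

Two functions, two bolts.

First, the bolt that appends "!":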

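// Imports assume Storm 0.9.x (backtype.storm packages).
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class ExclaimBasicBolt extends BaseBasicBolt {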

	@Override
	public void execute(Tuple tuple, BasicOutputCollector collector) {
		//String sentence = tuple.getString(0);
		String sentence = (String) tuple.getValue(0);
		String out = sentence + "!";
		collector.emit(new Values(out));
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("excl_sentence"));
	}

}

RandomSpout emits tuples such as { [ "edi:I'm happy" ] }, so value 0 in the tuple's value list is certainly a String. We fetch it with tuple.getValue(0).

Storm's Tuple also provides typed accessors for common types; for a String we can call getString(int i) directly.

After appending "!", we emit a new tuple and name its only value field "excl_sentence".

The printing bolt:

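// Imports assume Storm 0.9.x (backtype.storm packages).
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class PrintBolt extends BaseBasicBolt {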

	@Override
	public void execute(Tuple tuple, BasicOutputCollector collector) {
		String rec = tuple.getString(0);
		System.err.println("String recieved: " + rec);
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		// do nothing
	}

}

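Again we read the first value (index 0), since the incoming tuple only carries one.

4. Create the Topology

We now have all the main components of the topology, so we can build it.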

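// Imports assume Storm 0.9.x (backtype.storm packages).
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.utils.Utils;

public class ExclaimBasicTopo {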

	public static void main(String[] args) throws Exception {
		TopologyBuilder builder = new TopologyBuilder();
		
		builder.setSpout("spout", new RandomSpout());
		builder.setBolt("exclaim", new ExclaimBasicBolt()).shuffleGrouping("spout");
		builder.setBolt("print", new PrintBolt()).shuffleGrouping("exclaim");

		Config conf = new Config();
		conf.setDebug(false);

		if (args != null && args.length > 0) {
			conf.setNumWorkers(3);

			StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
		} else {

			LocalCluster cluster = new LocalCluster();
			cluster.submitTopology("test", conf, builder.createTopology());
			Utils.sleep(100000);
			cluster.killTopology("test");
			cluster.shutdown();
		}
	}
}

Simple, right?

Here,

builder.setSpout("spout", new RandomSpout());

defines a spout with the id "spout".

builder.setBolt("exclaim", new ExclaimBasicBolt()).shuffleGrouping("spout");

defines a bolt with the id "exclaim" that receives the tuples emitted by "spout" through shuffle (random) grouping.

builder.setBolt("print", new PrintBolt()).shuffleGrouping("exclaim");

defines a bolt with the id "print" that receives the tuples emitted by "exclaim" through shuffle (random) grouping.

.shuffleGrouping

tells Storm which strategy to use when distributing tuples to the downstream bolt.
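As a rough illustration of what a shuffle-style strategy does, the plain-Java sketch below hands each tuple to a randomly ordered downstream task while keeping the load even. It is only a sketch under stated assumptions: ShuffleGroupingSketch and chooseTask are hypothetical names, no Storm classes are involved, and the article does not describe how Storm implements this internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of a shuffle-style grouping; not Storm's code.
final class ShuffleGroupingSketch {
    private final List<Integer> targetTasks = new ArrayList<Integer>();
    private final Random random;
    private int next = 0;

    ShuffleGroupingSketch(int numTasks, Random random) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        for (int i = 0; i < numTasks; i++) {
            targetTasks.add(i);
        }
        this.random = random;
        Collections.shuffle(targetTasks, random); // random order for the first round
    }

    // Returns the index of the downstream task that receives the next tuple.
    int chooseTask() {
        if (next == targetTasks.size()) {
            // Every task has received one tuple in this round: reshuffle and start over.
            Collections.shuffle(targetTasks, random);
            next = 0;
        }
        return targetTasks.get(next++);
    }

    public static void main(String[] args) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(3, new Random());
        int[] received = new int[3];
        for (int i = 0; i < 9; i++) {
            received[grouping.chooseTask()]++;
        }
        // Nine tuples are three full rounds, so this prints [3, 3, 3].
        System.out.println(Arrays.toString(received));
    }
}
```

The order in which tasks receive tuples is random, but over each full round every task gets exactly one, which is why the counts stay balanced.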

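As you can see, when the program runs without arguments, the topology is submitted to a LocalCluster, meaning all tasks execute inside a single local JVM. LocalCluster is handy for debugging. When the program runs with one argument, that argument is used as the topology name and the topology is submitted to the real cluster.

Import the project into Eclipse with the M2E plugin and run it directly: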

String recieved: marry:I'm angry!
String recieved: edi:I'm happy!
String recieved: john:I'm sad!
String recieved: edi:I'm happy!
String recieved: ted:I'm excited!
String recieved: laden:I'm dangerous!
String recieved: edi:I'm happy!
String recieved: edi:I'm happy!

We did not specify any parallelism here, so each spout and bolt is run by a single thread.

Let's change the code to set the parallelism:

		builder.setBolt("exclaim", new ExclaimBasicBolt(), 2).shuffleGrouping("spout");
		builder.setBolt("print", new PrintBolt(),3).shuffleGrouping("exclaim");

Since we did not set the number of tasks explicitly, by default two executors run the two ExclaimBasicBolt tasks and three executors run the three PrintBolt tasks.
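The relationship between tasks and executors can be sketched in plain Java. The code below assumes tasks are split across executors as evenly as possible, which matches the one-task-per-executor default described above; the exact assignment Storm uses when the task count exceeds the executor count is not covered in this article, so treat that part as an assumption. ExecutorAssignmentSketch and assign are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Storm's scheduling code.
final class ExecutorAssignmentSketch {

    // Splits task indexes 0..numTasks-1 into numExecutors contiguous groups
    // whose sizes differ by at most one.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors <= 0 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need 1 <= numExecutors <= numTasks");
        }
        List<List<Integer>> executors = new ArrayList<List<Integer>>();
        int base = numTasks / numExecutors;   // minimum tasks per executor
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more
        int task = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<Integer>();
            for (int i = 0; i < size; i++) {
                tasks.add(task++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        // The PrintBolt case: 3 tasks on 3 executors, one task each.
        System.out.println(assign(3, 3)); // prints [[0], [1], [2]]
        // Assumed behaviour when there are more tasks than executors.
        System.out.println(assign(4, 2)); // prints [[0, 1], [2, 3]]
    }
}
```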

To make the parallelism visible, we change PrintBolt as follows:

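// Imports assume Storm 0.9.x (backtype.storm packages).
import java.util.Map;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class PrintBolt extends BaseBasicBolt {

	private int indexId;

	@Override
	public void prepare(Map stormConf, TopologyContext context) {
		// Remember which task of this bolt we are, so the output can show it.
		this.indexId = context.getThisTaskIndex();
	}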

	@Override
	public void execute(Tuple tuple, BasicOutputCollector collector) {
		String rec = tuple.getString(0);
		System.err.println(String.format("Bolt[%d] String recieved: %s",this.indexId, rec));
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		// do nothing
	}

}

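Here we read the bolt's task index from the context. We asked for a parallelism of 3, so in theory there are three tasks. The index is zero-based, so its possible values are 0, 1 and 2.

Run it and see: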

Bolt[0] String recieved: marry:I'm angry!
Bolt[2] String recieved: john:I'm sad!
Bolt[2] String recieved: ted:I'm excited!
Bolt[2] String recieved: john:I'm sad!
Bolt[2] String recieved: john:I'm sad!

This confirms that the bolt really runs in parallel.

With local testing done, we build with mvn clean install, copy the generated target/storm-samples-jar-with-dependencies.jar to the nimbus machine, and run:

./storm jar storm-samples-jar-with-dependencies.jar com.edi.storm.topos.ExclaimBasicTopo test

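In the Storm UI, click into test.

The spout has already emitted 11347280 tuples, and the bolt with the id exclaim has received 2906920 tuples. The print bolt emits nothing, so its emit count is 0.

At this point, a simple Storm topology is complete.

However, some questions remain:

1. What is an acker?

2. Why are there two base classes and interfaces for bolts?

3. How many ways are there to submit a topology?

4. Besides shuffle grouping, what other grouping strategies exist?

5. How does Storm guarantee that tuples are not lost?

6. The spout emits data far faster than the bolts can process it. Can I sleep inside the spout?

7. How should the parallelism be chosen?

To find out, stay tuned for the next installment.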

