A first Storm example

A simple Storm example:

This article does not go into the details; it is just a minimal Storm program, meant as a quick start.

Example overview

A data source continually picks one of the strings aa, bb, cc, dd, ee, ff at random and sends it to a program for processing. That program prints the string to the console and writes it to the log, then passes it on to the next program, which saves the string to a local file.
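Before bringing in Storm, the three stages can be sketched in plain Java to make the data flow concrete (all names here are illustrative, not Storm APIs):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class PipelineSketch {
    static final List<String> WORDS = Arrays.asList("aa", "bb", "cc", "dd", "ee", "ff");

    // Stage 1: the source picks a random word (what the Spout will do).
    static String source(Random r) {
        return WORDS.get(r.nextInt(WORDS.size()));
    }

    // Stage 2: print the word, then pass it downstream (what PrintBolt will do).
    static String printStage(String word) {
        System.out.println("print: " + word);
        return word;
    }

    // Stage 3: persist the word; a StringBuilder stands in for the file (WriteBolt).
    static void writeStage(String word, StringBuilder file) {
        file.append(word).append('\n');
    }

    public static void main(String[] args) {
        Random r = new Random();
        StringBuilder file = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            writeStage(printStage(source(r)), file);
        }
        System.out.println("lines written: " + file.toString().split("\n").length);
    }
}
```

Storm's contribution is to run each stage as its own distributed component and move the strings between them as tuples.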

Design

This is a stream-processing pipeline, which maps naturally onto a Storm topology:
spout-flow
So we use a Spout to produce the data, plus two Bolts: one Bolt prints the string to the console and writes it to the log, and the other writes the string to a local file.
All that remains is to follow this design.

Topology: introduction and code

The topology controls the direction of the data flow: which Bolt the data enters after leaving the Spout, and which Bolt each Bolt feeds in turn.
It plays a role similar to the main method of a Hadoop job. A few settings are needed.
First, the configuration:

        // configuration
        Config cfg = new Config(); //import backtype.storm.Config;
        cfg.setNumWorkers(2); // use 2 workers; no need to dig into this yet
        cfg.setDebug(true);

Next, set up the direction of the data flow and the groupings.
From the code below: the data is produced by the new PWSpout() object, registered under the name spout;
new PrintBolt() consumes spout, processes the data, and is registered as print-bolt; then
new WriteBolt() consumes print-bolt, processes the data, and is registered as write-bolt. No Bolt follows it, so the data is not processed further.
I use shuffleGrouping here, which distributes the tuples in the stream randomly, so each bolt task receives roughly the same number of tuples.

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new PWSpout());
        builder.setBolt("print-bolt", new PrintBolt()).shuffleGrouping("spout");
        builder.setBolt("write-bolt", new WriteBolt()).shuffleGrouping("print-bolt");
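shuffleGrouping guarantees only a random, roughly even split, not exactly equal counts. A plain-Java simulation of the idea (this is not Storm's implementation, just an illustration):

```java
import java.util.Random;

public class ShuffleSketch {
    public static void main(String[] args) {
        Random r = new Random(42);         // fixed seed so the run is repeatable
        int tasks = 2;                     // pretend the bolt runs two tasks
        int[] counts = new int[tasks];
        for (int i = 0; i < 10000; i++) {  // 10,000 simulated tuples
            counts[r.nextInt(tasks)]++;    // each tuple goes to a random task
        }
        int diff = Math.abs(counts[0] - counts[1]);
        System.out.println("total=" + (counts[0] + counts[1]) + " balanced=" + (diff < 500));
    }
}
```

The split lands close to 5000/5000 but is rarely exact, which is the behavior to expect from shuffleGrouping as well.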

Finally, choose how the topology runs.
Since this example runs locally, configure it like this:
create a LocalCluster, name the topology top1, and shut everything down after 10 seconds.

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("top1", cfg, builder.createTopology());
        Thread.sleep(10000);
        cluster.killTopology("top1");
        cluster.shutdown();

To run on a cluster instead, submit the topology like this:

        StormSubmitter.submitTopology("top1", cfg, builder.createTopology());

The complete Topology code:

public class PWTopology1 {

    public static void main(String[] args) throws Exception {
        // configuration
        Config cfg = new Config();
        cfg.setNumWorkers(2);
        cfg.setDebug(true);


        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new PWSpout());
        builder.setBolt("print-bolt", new PrintBolt()).shuffleGrouping("spout");
        builder.setBolt("write-bolt", new WriteBolt()).shuffleGrouping("print-bolt");


        //1 local mode
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("top1", cfg, builder.createTopology());
        Thread.sleep(10000);
        cluster.killTopology("top1");
        cluster.shutdown();

        //2 cluster mode
//      StormSubmitter.submitTopology("top1", cfg, builder.createTopology());

    }
}

Spout: introduction and code

The Spout created above produces the data.
A Spout extends BaseRichSpout and overrides a few of its methods:
- open: declared in the ISpout interface. As the source below shows, open receives the execution context:
the first argument is the conf passed down from the topology, the second is the TopologyContext, and the third is the collector used to emit tuples.

    /**
     * Called when a task for this component is initialized within a worker on the cluster.
     * It provides the spout with the environment in which the spout executes.
     *
     * <p>This includes the:</p>
     *
     * @param conf The Storm configuration for this spout. This is the configuration provided to the topology merged in with cluster configuration on this machine.
     * @param context This object can be used to get information about this task's place within the topology, including the task id and component id of this task, input and output information, etc.
     * @param collector The collector is used to emit tuples from this spout. Tuples can be emitted at any time, including the open and close methods. The collector is thread-safe and should be saved as an instance variable of this spout object.
     */
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);

We need the collector to send data on to the next node, so we save it in the spout's collector field:

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        //initialize the spout
        this.collector = collector;
        //System.out.println(this.collector);
    }
- nextTuple: the method that continually produces the data:
    /**
     * When this method is called, Storm is requesting that the Spout emit tuples to the 
     * output collector. This method should be non-blocking, so if the Spout has no tuples
     * to emit, this method should return. nextTuple, ack, and fail are all called in a tight
     * loop in a single thread in the spout task. When there are no tuples to emit, it is courteous
     * to have nextTuple sleep for a short amount of time (like a single millisecond)
     * so as not to waste too much CPU.
     */
    void nextTuple();

We emit one random word every 0.5 seconds to the bolt downstream:

    @Override
    public void nextTuple() {
        //emit a random word
        final Random r = new Random();
        int num = r.nextInt(6);
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        this.collector.emit(new Values(lists.get(num)));
    }
- declareOutputFields: declares the field name of the emitted data; downstream Bolts can get this field to read the value:
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        //declare the output field
        declarer.declare(new Fields("print"));
    }

The complete Spout code:

public class PWSpout extends BaseRichSpout {

    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector collector;

    private static List<String> lists  = null;
    static{
        lists = Arrays.asList("aa", "bb", "cc", "dd", "ee", "ff");
    }
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        //initialize the spout
        this.collector = collector;
        //System.out.println(this.collector);
    }

    /**
     * Polls for the next tuple.
     * @see backtype.storm.spout.ISpout#nextTuple()
     */
    @Override
    public void nextTuple() {
        //emit a random word
        final Random r = new Random();
        int num = r.nextInt(6);
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        this.collector.emit(new Values(lists.get(num)));
    }

    /**
     * Declares the field of the emitted data.
     * @see backtype.storm.topology.IComponent#declareOutputFields(backtype.storm.topology.OutputFieldsDeclarer)
     */
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        //declare the output field
        declarer.declare(new Fields("print"));
    }
}
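The word-selection logic inside nextTuple can be exercised on its own, without a running topology (SpoutLogicCheck is an illustrative class, not part of the project):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class SpoutLogicCheck {
    public static void main(String[] args) {
        // same list and selection as PWSpout.nextTuple
        List<String> lists = Arrays.asList("aa", "bb", "cc", "dd", "ee", "ff");
        Random r = new Random();
        boolean allValid = true;
        for (int i = 0; i < 100; i++) {
            String word = lists.get(r.nextInt(6));
            allValid &= lists.contains(word);
        }
        System.out.println("allValid=" + allValid);
    }
}
```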

Bolt: introduction and code

A Bolt typically extends BaseBasicBolt (implementing the interfaces directly also works) and has the following methods:
- prepare, similar to setup in Hadoop: it runs once, the first time the task starts.

    void prepare(Map stormConf, TopologyContext context);

- execute, the method that processes the data stream:

    void execute(Tuple input, BasicOutputCollector collector);

- declareOutputFields, which declares the field name of the emitted data; downstream Bolts can get this field to read the value:

    void declareOutputFields(OutputFieldsDeclarer declarer);

The two bolts created above play different roles.

PrintBolt

Prints the string to the console and writes it to the log.
The complete code:

public class PrintBolt extends BaseBasicBolt {

    private static final Log log = LogFactory.getLog(PrintBolt.class);

    private static final long serialVersionUID = 1L;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        //get the Field declared by the upstream component
        String print = input.getStringByField("print");
        log.info("【print】: " + print);
        System.out.println("print: "+print);
        collector.emit(new Values(print));

    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("write"));
    }

}

WriteBolt

Writes each received string to a local file, choosing the output path based on the operating system. The complete code:

public class WriteBolt extends BaseBasicBolt {

    private static final long serialVersionUID = 1L;

    private static final Log log = LogFactory.getLog(WriteBolt.class);

    private FileWriter writer;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        //get the Field declared by the upstream component
        String text = input.getStringByField("write");
        try {
            if (writer == null) {
                // the three identical Windows branches collapse into one check
                String os = System.getProperty("os.name");
                if (os.startsWith("Windows")) {
                    writer = new FileWriter("F:\\testdir\\" + this);
                } else if (os.equals("Linux")) {
                    System.out.println("----:" + os);
                    writer = new FileWriter("/usr/local/temp/" + this);
                }
            }
            log.info("【write】: writing to file");
            writer.write(text);
            writer.write("\n");
            writer.flush();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {

    }

}
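The path selection in WriteBolt depends only on the os.name system property, so it can be sketched and checked in isolation (chooseDir is a hypothetical helper, not part of the original code):

```java
public class OsPathSketch {
    // hypothetical helper: map an os.name value to an output directory
    static String chooseDir(String osName) {
        if (osName.startsWith("Windows")) {
            return "F:\\testdir\\";
        } else if (osName.equals("Linux")) {
            return "/usr/local/temp/";
        }
        return null; // unknown OS: the caller must handle this case
    }

    public static void main(String[] args) {
        System.out.println(chooseDir("Windows 10"));   // F:\testdir\
        System.out.println(chooseDir("Linux"));        // /usr/local/temp/
    }
}
```

Note that if chooseDir returns null (an unrecognized OS), WriteBolt's writer stays null and the write throws, which the original code only surfaces via printStackTrace.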

The program needs the Storm jars; the Maven pom.xml is shown below.
The key dependency to pull in:

    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.9.2-incubating</version>
    </dependency>

The complete pom:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>storm01</groupId>
  <artifactId>storm01</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>storm01</name>
  <url>http://maven.apache.org</url>
  <repositories>

      <repository>
          <id>central</id>
          <name>Central Repository</name>
          <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
          <layout>default</layout>
          <snapshots>
              <enabled>false</enabled>
          </snapshots>
      </repository>

      <repository>
          <id>central2</id>
          <name>Maven Repository Switchboard</name>
          <layout>default</layout>
          <url>http://repo2.maven.org/maven2</url>
          <snapshots>
              <enabled>false</enabled>
          </snapshots>
      </repository>

        <!-- Repository where we can find the storm dependencies -->
        <repository>
            <id>clojars.org</id>
            <url>http://clojars.org/repo</url>
        </repository>
  </repositories>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.9.2-incubating</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>




    <build>
    <finalName>storm01</finalName>
   <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-war-plugin</artifactId>
            <version>2.4</version>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.5</version>
            <configuration>
                <source>1.7</source>
                <target>1.7</target>
            </configuration>
        </plugin>
        <!-- unit tests -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
                <skip>true</skip>
                <includes>
                    <include>**/*Test*.java</include>
                </includes>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-source-plugin</artifactId>
            <version>2.1.2</version>
            <executions>
                <!-- bind to the package phase, then run maven-source-plugin's jar-no-fork goal -->
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>jar-no-fork</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>   
    </plugins>    
  </build>
</project>

The directory layout is shown below:
storm01 structure diagram

After running locally,
random strings are printed to the console, and a file under F:\testdir\ on the local machine records the same strings.
