Submitting the Topology
As most users know, submitting a Storm topology to the cluster is done with the following command:
${STORM_HOME}/bin/storm jar xxxxxxxxxxx.jar ${main class} [args ...]
The storm script under the bin directory is a Python file; let's look at its main method:
def main():
    if len(sys.argv) <= 1:
        print_usage()
        sys.exit(-1)
    global CONFIG_OPTS
    config_list, args = parse_config_opts(sys.argv[1:])
    parse_config(config_list)
    COMMAND = args[0]
    ARGS = args[1:]
    (COMMANDS.get(COMMAND, unknown_command))(*ARGS)

if __name__ == "__main__":
    main()
main() first parses the command-line arguments and then hands them to COMMANDS, which dispatches to the correct handler. COMMANDS is a dict whose keys are command names (strings) and whose values are the corresponding handler functions:
COMMANDS = {"jar": jar, "kill": kill, "shell": shell, "nimbus": nimbus, "ui": ui,
            "logviewer": logviewer, "drpc": drpc, "supervisor": supervisor,
            "localconfvalue": print_localconfvalue,
            "remoteconfvalue": print_remoteconfvalue, "repl": repl,
            "classpath": print_classpath, "activate": activate,
            "deactivate": deactivate, "rebalance": rebalance, "help": print_usage,
            "list": listtopos, "dev-zookeeper": dev_zookeeper,
            "version": version, "monitor": monitor}
For submission we go through the jar function:
def jar(jarfile, klass, *args):
    """Syntax: [storm jar topology-jar-path class ...]

    Runs the main method of class with the specified arguments.
    The storm jars and configs in ~/.storm are put on the classpath.
    The process is configured so that StormSubmitter
    (http://storm.incubator.apache.org/apidocs/backtype/storm/StormSubmitter.html)
    will upload the jar at topology-jar-path when the topology is submitted.
    """
    exec_storm_class(
        klass,
        jvmtype="-client",
        extrajars=[jarfile, USER_CONF_DIR, STORM_DIR + "/bin"],
        args=args,
        jvmopts=JAR_JVM_OPTS + ["-Dstorm.jar=" + jarfile])
exec_storm_class is invoked with a few default arguments; note that jvmtype is -client. Why start the JVM in client mode rather than server mode? For the difference between the two, see an earlier post: "Real differences between 'java -server' and 'java -client'". Beyond that, the function mainly passes the system configuration through:
def exec_storm_class(klass, jvmtype="-server", jvmopts=[], extrajars=[], args=[], fork=False):
    global CONFFILE
    storm_log_dir = confvalue("storm.log.dir", [CLUSTER_CONF_DIR])
    if(storm_log_dir == None or storm_log_dir == "nil"):
        storm_log_dir = STORM_DIR + "/logs"
    all_args = [
        JAVA_CMD, jvmtype, get_config_opts(),
        "-Dstorm.home=" + STORM_DIR,
        "-Dstorm.log.dir=" + storm_log_dir,
        "-Djava.library.path=" + confvalue("java.library.path", extrajars),
        "-Dstorm.conf.file=" + CONFFILE,
        "-cp", get_classpath(extrajars),
    ] + jvmopts + [klass] + list(args)
    print("Running: " + " ".join(all_args))
    if fork:
        os.spawnvp(os.P_WAIT, JAVA_CMD, all_args)
    else:
        os.execvp(JAVA_CMD, all_args)  # replaces the current process and never returns
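The essence of exec_storm_class is assembling a java command line: JVM type flag, -D system properties, classpath, then the main class and its arguments. A rough, self-contained sketch of that assembly (the paths and class names below are hypothetical placeholders, not Storm's real values):

```python
JAVA_CMD = "java"
STORM_DIR = "/opt/storm"  # hypothetical install location

def build_args(klass, jvmtype="-server", jvmopts=None, args=None):
    # mirror the structure of all_args in exec_storm_class:
    # [java, jvmtype, -D properties..., -cp, classpath] + jvmopts + [class] + args
    all_args = [
        JAVA_CMD, jvmtype,
        "-Dstorm.home=" + STORM_DIR,
        "-cp", STORM_DIR + "/lib/*",
    ] + (jvmopts or []) + [klass] + list(args or [])
    return all_args

cmd = build_args("com.example.MyTopology",
                 jvmtype="-client",
                 jvmopts=["-Dstorm.jar=topo.jar"],
                 args=["prod"])
print(" ".join(cmd))
```

The real script then replaces itself with this java process via os.execvp, so once the JVM starts, the Python launcher is gone.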
Component Initialization
Once the JVM process starts, your own topology code runs. We usually build a topology with TopologyBuilder, which has three fields:
private Map<String, IRichBolt> _bolts = new HashMap<String, IRichBolt>();
private Map<String, IRichSpout> _spouts = new HashMap<String, IRichSpout>();
private Map<String, ComponentCommon> _commons = new HashMap<String, ComponentCommon>();
_bolts and _spouts are self-explanatory: they hold the bolts and spouts you register via the setBolt()/setSpout() calls, with componentId as the key and your component implementation as the value.
_commons stores each component's extra information: parallelism hint, additional configuration, and so on. Every time a component is set, the builder initializes its ComponentCommon:
private void initCommon(String id, IComponent component, Number parallelism) {
    ComponentCommon common = new ComponentCommon();
    common.set_inputs(new HashMap<GlobalStreamId, Grouping>());
    if (parallelism != null) common.set_parallelism_hint(parallelism.intValue());
    Map conf = component.getComponentConfiguration();
    if (conf != null) common.set_json_conf(JSONValue.toJSONString(conf));
    _commons.put(id, common);
}
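A minimal Python analog of this bookkeeping may make the pattern clearer: each registration records the component's "common" metadata, with the per-component config serialized to JSON, just as set_json_conf does. The class and method names here are illustrative only, not Storm's API:

```python
import json

class Builder:
    def __init__(self):
        self._commons = {}  # componentId -> common metadata, like _commons above

    def init_common(self, component_id, parallelism=None, conf=None):
        common = {"inputs": {}}  # starts with no input streams
        if parallelism is not None:
            common["parallelism_hint"] = int(parallelism)
        if conf is not None:
            # per-component config travels as a JSON string, like set_json_conf
            common["json_conf"] = json.dumps(conf, sort_keys=True)
        self._commons[component_id] = common

b = Builder()
b.init_common("spout", parallelism=2, conf={"topology.max.spout.pending": 100})
print(b._commons["spout"]["parallelism_hint"])  # 2
```

Storing the config as a JSON string keeps ComponentCommon serializable by Thrift without needing to model arbitrary config values.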
Later, when createTopology assembles the StormTopology, getComponentCommon is invoked for each component:
private ComponentCommon getComponentCommon(String id, IComponent component) {
    ComponentCommon ret = new ComponentCommon(_commons.get(id));
    OutputFieldsGetter getter = new OutputFieldsGetter();
    component.declareOutputFields(getter);
    ret.set_streams(getter.getFieldsDeclaration());
    return ret;
}
Notice that this method calls the component's declareOutputFields method. Of the methods you typically override (open, nextTuple, etc. for a Spout; prepare, execute, etc. for a Bolt), declareOutputFields is therefore invoked first, at topology build time. So you must not reference variables in declareOutputFields that are only initialized later (we normally initialize state in open or prepare rather than in the constructor, because of Storm's serialization mechanism); doing so throws a NullPointerException.
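The call-order pitfall can be demonstrated with a small mock, sketched here in Python with hypothetical names. declareOutputFields runs at build time, while prepare runs much later on the worker, so any field initialized only in prepare is still unset when output fields are declared:

```python
class Declarer:
    """Stand-in for OutputFieldsDeclarer: just records declared field names."""
    def __init__(self):
        self.declared = []
    def declare(self, fields):
        self.declared.extend(fields)

class MyBolt:
    def __init__(self):
        self.fields = None  # deliberately not set here; heavy ctor work is discouraged

    def declare_output_fields(self, declarer):
        # Using self.fields here would fail: it is still None at build time.
        declarer.declare(["word", "count"])  # declare literals instead

    def prepare(self):
        # worker-side initialization, like Bolt.prepare
        self.fields = ["word", "count"]

bolt = MyBolt()
declarer = Declarer()
bolt.declare_output_fields(declarer)  # called first, during topology building
assert bolt.fields is None            # prepare has not run yet
bolt.prepare()                        # called much later, on the worker
print(declarer.declared)              # ['word', 'count']
```

The safe pattern is to declare literal field names (or values computed in the constructor from constructor arguments), never state that prepare or open will create.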
After all bolts and spouts have been set, calling createTopology generates a StormTopology, which StormSubmitter then submits:
/**
 * Submits a topology to run on the cluster. A topology runs forever or until
 * explicitly killed.
 *
 * @param name the name of the storm.
 * @param stormConf the topology-specific configuration. See {@link Config}.
 * @param topology the processing to execute.
 * @param opts to manipulate the starting of the topology
 * @param progressListener to track the progress of the jar upload process
 * @throws AlreadyAliveException if a topology with this name is already running
 * @throws InvalidTopologyException if an invalid topology was submitted
 */
public static void submitTopology(String name, Map stormConf, StormTopology topology, SubmitOptions opts, ProgressListener progressListener) throws AlreadyAliveException, InvalidTopologyException {
    if (!Utils.isValidConf(stormConf)) {
        throw new IllegalArgumentException("Storm conf is not valid. Must be json-serializable");
    }
    stormConf = new HashMap(stormConf);
    stormConf.putAll(Utils.readCommandLineOpts());
    Map conf = Utils.readStormConfig();
    conf.putAll(stormConf);
    try {
        String serConf = JSONValue.toJSONString(stormConf);
        if (localNimbus != null) {
            LOG.info("Submitting topology " + name + " in local mode");
            localNimbus.submitTopology(name, null, serConf, topology);
        } else {
            NimbusClient client = NimbusClient.getConfiguredClient(conf);
            if (topologyNameExists(conf, name)) {
                throw new RuntimeException("Topology with name `" + name + "` already exists on cluster");
            }
            submitJar(conf, progressListener);
            try {
                LOG.info("Submitting topology " + name + " in distributed mode with conf " + serConf);
                if (opts != null) {
                    client.getClient().submitTopologyWithOpts(name, submittedJar, serConf, topology, opts);
                } else {
                    // this is for backwards compatibility
                    client.getClient().submitTopology(name, submittedJar, serConf, topology);
                }
            } catch (InvalidTopologyException e) {
                LOG.warn("Topology submission exception: " + e.get_msg());
                throw e;
            } catch (AlreadyAliveException e) {
                LOG.warn("Topology already alive exception", e);
                throw e;
            } finally {
                client.close();
            }
        }
        LOG.info("Finished submitting topology: " + name);
    } catch (TException e) {
        throw new RuntimeException(e);
    }
}
So submitting a topology amounts to: initialize a NimbusClient, check that no topology with the same name is already running, upload the jar, and submit over Thrift. Once that completes, the rest is up to Nimbus.
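Before any of that, submitTopology layers the configuration: the topology's own conf is copied, command-line overrides are applied on top, and the result is layered over the cluster's base storm config; only the topology-level conf is serialized to JSON. A small sketch of this dict layering (function and key names here are illustrative):

```python
import json

def merge_conf(storm_conf, cmdline_opts, base_conf):
    storm_conf = dict(storm_conf)    # stormConf = new HashMap(stormConf)
    storm_conf.update(cmdline_opts)  # stormConf.putAll(Utils.readCommandLineOpts())
    conf = dict(base_conf)           # conf = Utils.readStormConfig()
    conf.update(storm_conf)          # conf.putAll(stormConf)
    # only the topology-level conf is serialized, like JSONValue.toJSONString(stormConf)
    return conf, json.dumps(storm_conf, sort_keys=True)

conf, ser = merge_conf({"topology.workers": 2},
                       {"topology.debug": True},
                       {"nimbus.host": "localhost", "topology.workers": 1})
print(conf["topology.workers"])  # 2 (topology conf overrides the base config)
print(conf["nimbus.host"])       # localhost (inherited from the base config)
```

The effective precedence is therefore: command-line options > topology conf > cluster config, with cluster-only keys (such as the Nimbus address) surviving into the merged map.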
Nimbus
Nimbus is arguably the most central part of Storm. It has two main responsibilities:
- allocating resources to a topology's tasks
- receiving user commands and handling them accordingly: topology submission, kill, activate, and so on
Sorry, it's getting late; the rest will follow in a later post.