Submitting the Topology
As most users know, submitting a Storm topology to the cluster is done with the following command:
${STORM_HOME}/bin/storm jar xxxxxxxxxxx.jar ${main class} [args ...]
The storm script under the bin directory is a Python file; let's look at its main method:
def main():
    if len(sys.argv) <= 1:
        print_usage()
        sys.exit(-1)
    global CONFIG_OPTS
    config_list, args = parse_config_opts(sys.argv[1:])
    parse_config(config_list)
    COMMAND = args[0]
    ARGS = args[1:]
    (COMMANDS.get(COMMAND, unknown_command))(*ARGS)

if __name__ == "__main__":
    main()
main() first parses the command-line arguments and then hands them to COMMANDS, which dispatches to the correct handler. COMMANDS is a dict whose keys are command names (strings) and whose values are the corresponding handler functions:
COMMANDS = {"jar": jar, "kill": kill, "shell": shell, "nimbus": nimbus, "ui": ui,
            "logviewer": logviewer, "drpc": drpc, "supervisor": supervisor,
            "localconfvalue": print_localconfvalue,
            "remoteconfvalue": print_remoteconfvalue, "repl": repl,
            "classpath": print_classpath, "activate": activate,
            "deactivate": deactivate, "rebalance": rebalance, "help": print_usage,
            "list": listtopos, "dev-zookeeper": dev_zookeeper,
            "version": version, "monitor": monitor}
For submission we go through the jar function:
def jar(jarfile, klass, *args):
    """Syntax: [storm jar topology-jar-path class ...]

    Runs the main method of class with the specified arguments.
    The storm jars and configs in ~/.storm are put on the classpath.
    The process is configured so that StormSubmitter
    (http://storm.incubator.apache.org/apidocs/backtype/storm/StormSubmitter.html)
    will upload the jar at topology-jar-path when the topology is submitted.
    """
    exec_storm_class(
        klass,
        jvmtype="-client",
        extrajars=[jarfile, USER_CONF_DIR, STORM_DIR + "/bin"],
        args=args,
        jvmopts=JAR_JVM_OPTS + ["-Dstorm.jar=" + jarfile])
exec_storm_class is invoked with a few default arguments; note that jvmtype is -client. Why start the JVM in client mode rather than server mode? For the difference between the two, see an earlier post: "Real differences between 'java -server' and 'java -client'". Beyond that, the function mainly passes the system configuration through:
def exec_storm_class(klass, jvmtype="-server", jvmopts=[], extrajars=[], args=[], fork=False):
    global CONFFILE
    storm_log_dir = confvalue("storm.log.dir", [CLUSTER_CONF_DIR])
    if(storm_log_dir == None or storm_log_dir == "nil"):
        storm_log_dir = STORM_DIR + "/logs"
    all_args = [
        JAVA_CMD, jvmtype, get_config_opts(),
        "-Dstorm.home=" + STORM_DIR,
        "-Dstorm.log.dir=" + storm_log_dir,
        "-Djava.library.path=" + confvalue("java.library.path", extrajars),
        "-Dstorm.conf.file=" + CONFFILE,
        "-cp", get_classpath(extrajars),
    ] + jvmopts + [klass] + list(args)
    print("Running: " + " ".join(all_args))
    if fork:
        os.spawnvp(os.P_WAIT, JAVA_CMD, all_args)
    else:
        os.execvp(JAVA_CMD, all_args)  # replaces the current process and never returns
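The essence of exec_storm_class is assembling a java command line: JVM type flag, -D system properties, classpath, then the main class and its arguments. A rough, self-contained sketch of that assembly (the paths and class names below are hypothetical placeholders, not Storm's real values):

```python
JAVA_CMD = "java"
STORM_DIR = "/opt/storm"  # hypothetical install location

def build_args(klass, jvmtype="-server", jvmopts=None, args=None):
    # mirror the structure of all_args in exec_storm_class:
    # [java, jvmtype, -D properties..., -cp, classpath] + jvmopts + [class] + args
    all_args = [
        JAVA_CMD, jvmtype,
        "-Dstorm.home=" + STORM_DIR,
        "-cp", STORM_DIR + "/lib/*",
    ] + (jvmopts or []) + [klass] + list(args or [])
    return all_args

cmd = build_args("com.example.MyTopology",
                 jvmtype="-client",
                 jvmopts=["-Dstorm.jar=topo.jar"],
                 args=["prod"])
print(" ".join(cmd))
```

The real script then replaces itself with this java process via os.execvp, so once the JVM starts, the Python launcher is gone.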
Component Initialization
Once the JVM process starts, your own topology code runs. We usually build a topology with TopologyBuilder, which has three fields:
private Map<String, IRichBolt> _bolts = new HashMap<String, IRichBolt>();
private Map<String, IRichSpout> _spouts = new HashMap<String, IRichSpout>();
private Map<String, ComponentCommon> _commons = new HashMap<String, ComponentCommon>();
_bolts and _spouts are self-explanatory: they hold the bolts and spouts you register via the setBolt()/setSpout() calls, with componentId as the key and your component implementation as the value.
_commons stores each component's extra information: parallelism hint, additional configuration, and so on. Every time a component is set, the builder initializes its ComponentCommon:
private void initCommon(String id, IComponent component, Number parallelism) {
    ComponentCommon common = new ComponentCommon();
    common.set_inputs(new HashMap<GlobalStreamId, Grouping>());
    if (parallelism != null) common.set_parallelism_hint(parallelism.intValue());
    Map conf = component.getComponentConfiguration();
    if (conf != null) common.set_json_conf(JSONValue.toJSONString(conf));
    _commons.put(id, common);
}
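A minimal Python analog of this bookkeeping may make the pattern clearer: each registration records the component's "common" metadata, with the per-component config serialized to JSON, just as set_json_conf does. The class and method names here are illustrative only, not Storm's API:

```python
import json

class Builder:
    def __init__(self):
        self._commons = {}  # componentId -> common metadata, like _commons above

    def init_common(self, component_id, parallelism=None, conf=None):
        common = {"inputs": {}}  # starts with no input streams
        if parallelism is not None:
            common["parallelism_hint"] = int(parallelism)
        if conf is not None:
            # per-component config travels as a JSON string, like set_json_conf
            common["json_conf"] = json.dumps(conf, sort_keys=True)
        self._commons[component_id] = common

b = Builder()
b.init_common("spout", parallelism=2, conf={"topology.max.spout.pending": 100})
print(b._commons["spout"]["parallelism_hint"])  # 2
```

Storing the config as a JSON string keeps ComponentCommon serializable by Thrift without needing to model arbitrary config values.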
Later, when createTopology assembles the StormTopology, getComponentCommon is invoked for each component:
private ComponentCommon getComponentCommon(String id, IComponent component) {
    ComponentCommon ret = new ComponentCommon(_commons.get(id));
    OutputFieldsGetter getter = new OutputFieldsGetter();
    component.declareOutputFields(getter);
    ret.set_streams(getter.getFieldsDeclaration());
    return ret;
}
Notice that this method calls the component's declareOutputFields method. Of the methods you typically override (open, nextTuple, etc. for a Spout; prepare, execute, etc. for a Bolt), declareOutputFields is therefore invoked first, at topology build time. So you must not reference variables in declareOutputFields that are only initialized later (we normally initialize state in open or prepare rather than in the constructor, because of Storm's serialization mechanism); doing so throws a NullPointerException.
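The call-order pitfall can be demonstrated with a small mock, sketched here in Python with hypothetical names. declareOutputFields runs at build time, while prepare runs much later on the worker, so any field initialized only in prepare is still unset when output fields are declared:

```python
class Declarer:
    """Stand-in for OutputFieldsDeclarer: just records declared field names."""
    def __init__(self):
        self.declared = []
    def declare(self, fields):
        self.declared.extend(fields)

class MyBolt:
    def __init__(self):
        self.fields = None  # deliberately not set here; heavy ctor work is discouraged

    def declare_output_fields(self, declarer):
        # Using self.fields here would fail: it is still None at build time.
        declarer.declare(["word", "count"])  # declare literals instead

    def prepare(self):
        # worker-side initialization, like Bolt.prepare
        self.fields = ["word", "count"]

bolt = MyBolt()
declarer = Declarer()
bolt.declare_output_fields(declarer)  # called first, during topology building
assert bolt.fields is None            # prepare has not run yet
bolt.prepare()                        # called much later, on the worker
print(declarer.declared)              # ['word', 'count']
```

The safe pattern is to declare literal field names (or values computed in the constructor from constructor arguments), never state that prepare or open will create.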
After all bolts and spouts have been set, calling createTopology generates a StormTopology, which StormSubmitter then submits:
/**
 * Submits a topology to run on the cluster. A topology runs forever or until
 * explicitly killed.
 *
 * @param name the name of the storm.
 * @param stormConf the topology-specific configuration. See {@link Config}.
 * @param topology the processing to execute.
 * @param opts to manipulate the starting of the topology
 * @param progressListener to track the progress of the jar upload process
 * @throws AlreadyAliveException if a topology with this name is already running
 * @throws InvalidTopologyException if an invalid topology was submitted
 */
public static void submitTopology(String name, Map stormConf, StormTopology topology, SubmitOptions opts, ProgressListener progressListener) throws AlreadyAliveException, InvalidTopologyException {
    if (!Utils.isValidConf(stormConf)) {
        throw new IllegalArgumentException("Storm conf is not valid. Must be json-serializable");
    }
    stormConf = new HashMap(stormConf);
    stormConf.putAll(Utils.readCommandLineOpts());
    Map conf = Utils.readStormConfig();
    conf.putAll(stormConf);
    try {
        String serConf = JSONValue.toJSONString(stormConf);
        if (localNimbus != null) {
            LOG.info("Submitting topology " + name + " in local mode");
            localNimbus.submitTopology(name, null, serConf, topology);
        } else {
            NimbusClient client = NimbusClient.getConfiguredClient(conf);
            if (topologyNameExists(conf, name)) {
                throw new RuntimeException("Topology with name `" + name + "` already exists on cluster");
            }
            submitJar(conf, progressListener);
            try {
                LOG.info("Submitting topology " + name + " in distributed mode with conf " + serConf);
                if (opts != null) {
                    client.getClient().submitTopologyWithOpts(name, submittedJar, serConf, topology, opts);
                } else {
                    // this is for backwards compatibility
                    client.getClient().submitTopology(name, submittedJar, serConf, topology);
                }
            } catch (InvalidTopologyException e) {
                LOG.warn("Topology submission exception: " + e.get_msg());
                throw e;
            } catch (AlreadyAliveException e) {
                LOG.warn("Topology already alive exception", e);
                throw e;
            } finally {
                client.close();
            }
        }
        LOG.info("Finished submitting topology: " + name);
    } catch (TException e) {
        throw new RuntimeException(e);
    }
}
So submitting a topology amounts to: initialize a NimbusClient, check that no topology with the same name is already running, upload the jar, and submit over Thrift. Once that completes, the rest is up to Nimbus.
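Before any of that, submitTopology layers the configuration: the topology's own conf is copied, command-line overrides are applied on top, and the result is layered over the cluster's base storm config; only the topology-level conf is serialized to JSON. A small sketch of this dict layering (function and key names here are illustrative):

```python
import json

def merge_conf(storm_conf, cmdline_opts, base_conf):
    storm_conf = dict(storm_conf)    # stormConf = new HashMap(stormConf)
    storm_conf.update(cmdline_opts)  # stormConf.putAll(Utils.readCommandLineOpts())
    conf = dict(base_conf)           # conf = Utils.readStormConfig()
    conf.update(storm_conf)          # conf.putAll(stormConf)
    # only the topology-level conf is serialized, like JSONValue.toJSONString(stormConf)
    return conf, json.dumps(storm_conf, sort_keys=True)

conf, ser = merge_conf({"topology.workers": 2},
                       {"topology.debug": True},
                       {"nimbus.host": "localhost", "topology.workers": 1})
print(conf["topology.workers"])  # 2 (topology conf overrides the base config)
print(conf["nimbus.host"])       # localhost (inherited from the base config)
```

The effective precedence is therefore: command-line options > topology conf > cluster config, with cluster-only keys (such as the Nimbus address) surviving into the merged map.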
Nimbus
Nimbus is arguably the most central part of Storm. It has two main responsibilities:
- allocating resources to a topology's tasks
- receiving user commands and handling them accordingly: topology submission, kill, activate, and so on
Sorry, it's getting late; the rest will follow in a later post.