JStorm Source Code Reading (1): The Topology Submission Process

Client side

On the client side, a topology is submitted with the command

jstorm jar xxxx.jar xxxx.xxxx.xxxx args...

so we start directly from the submitTopology method of the StormSubmitter class. Its core code is as follows:

    try {
        String serConf = Utils.to_json(stormConf);
        if (localNimbus != null) {
            // local mode: hand the topology straight to the in-process Nimbus
            LOG.info("Submitting topology " + name + " in local mode");
            localNimbus.submitTopology(name, null, serConf, topology);
        } else {
            // cluster mode: talk to the remote Nimbus over Thrift
            NimbusClient client = NimbusClient.getConfiguredClient(conf);
            try {
                // topology names must be unique on the cluster
                if (topologyNameExists(client, conf, name)) {
                    throw new RuntimeException("Topology with name `" + name + "` already exists on cluster");
                }
                // upload the application jar (and any lib jars) before submitting
                submitJar(client, conf);
                LOG.info("Submitting topology " + name + " in distributed mode with conf " + serConf);
                if (opts != null) {
                    client.getClient().submitTopologyWithOpts(name, path, serConf, topology, opts);
                } else {
                    // this is for backwards compatibility
                    client.getClient().submitTopology(name, path, serConf, topology);
                }
            } finally {
                client.close();
            }
        }
        LOG.info("Finished submitting topology: " + name);
    } catch (InvalidTopologyException e) {
        LOG.warn("Topology submission exception", e);
        throw e;
    } catch (AlreadyAliveException e) {
        LOG.warn("Topology already alive exception", e);
        throw e;
    } catch (TopologyAssignException e) {
        LOG.warn("Failed to assign " + e.get_msg(), e);
        throw new RuntimeException(e);
    } catch (TException e) {
        LOG.warn("Failed to assign ", e);
        throw new RuntimeException(e);
    }

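Since we are in cluster mode, the else branch is taken: it first creates a NimbusClient from the configuration files under the jstorm directory, then checks that the name of the submitted topology is not already in use on the cluster, and then calls submitJar to upload the jar. The core code of submitJar is as follows: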

    // path of the client jar, passed in by bin/jstorm.py as -Dstorm.jar=...
    String localJar = System.getProperty("storm.jar");
    // ask Nimbus for the directory to upload into
    path = client.getClient().beginFileUpload();
    String[] pathCache = path.split("/");
    String uploadLocation = path + "/stormjar-" + pathCache[pathCache.length - 1] + ".jar";
    List<String> lib = (List<String>) conf.get(GenericOptionsParser.TOPOLOGY_LIB_NAME);
    Map<String, String> libPath = (Map<String, String>) conf.get(GenericOptionsParser.TOPOLOGY_LIB_PATH);
    if (lib != null && lib.size() != 0) {
        // upload every extra lib jar into <path>/lib/
        for (String libName : lib) {
            String jarPath = path + "/lib/" + libName;
            client.getClient().beginLibUpload(jarPath);
            submitJar(conf, libPath.get(libName), jarPath, client);
        }
    } else {
        if (localJar == null) {
            // no lib, no client jar
            throw new RuntimeException("No client app jar, please upload it");
        }
    }
    if (localJar != null) {
        submittedJar = submitJar(conf, localJar, uploadLocation, client);
    } else {
        // no client jar, but with lib jar
        client.getClient().finishFileUpload(uploadLocation);
    }

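First, the full path of the jar is read from the storm.jar property. When we run the jstorm command, what actually executes is bin/jstorm.py in the source tree, and the jar function in jstorm.py is the code invoked by jstorm jar. When it launches the Java program it adds "-Dstorm.jar=" + jarfile to the command line, which is why the Java side can obtain the jar's path from the storm.jar property.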
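The two client-side details above, reading the jar path from the storm.jar system property and deriving the upload file name from the directory Nimbus returns, can be reproduced in plain Java. This is a minimal sketch: StormJarPaths and its methods are hypothetical names, and the example Nimbus directory is made up; only the split("/") logic is taken from the submitJar excerpt.

```java
import java.util.Optional;

// Hypothetical helper reproducing two steps of the client-side submitJar excerpt.
final class StormJarPaths {
    private StormJarPaths() {}

    // jstorm.py launches the JVM with -Dstorm.jar=<jarfile>, so the client jar path
    // is available as a system property (empty when the JVM was started some other way).
    static Optional<String> localJar() {
        return Optional.ofNullable(System.getProperty("storm.jar"));
    }

    // Same logic as the excerpt: take the last segment of the upload directory
    // returned by beginFileUpload and build "<path>/stormjar-<segment>.jar".
    static String uploadLocation(String path) {
        String[] pathCache = path.split("/");
        return path + "/stormjar-" + pathCache[pathCache.length - 1] + ".jar";
    }
}
```

For a made-up directory such as /jstorm/nimbus/inbox/abc-123, uploadLocation returns /jstorm/nimbus/inbox/abc-123/stormjar-abc-123.jar, so each upload directory holds a jar named after itself.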

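Here the client first sends a beginFileUpload request to the Nimbus server to get the path to upload to. submitJar then calls uploadChunk to send the jar to the server, and once the transfer is complete it calls finishFileUpload, also inside submitJar, to end the upload. Control then returns to submitTopology, which calls the server's submitTopology so that the server carries out the remaining work for the topology.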
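The client-side call order (name check, beginFileUpload, a series of uploadChunk calls, finishFileUpload, then submitTopology) can be illustrated against an in-memory stand-in for Nimbus. This is a simplified sketch under stated assumptions: FakeNimbus and JarUploader are hypothetical, the method names only mirror the Thrift calls mentioned above, the signatures are simplified rather than the real generated Thrift interface, and the 4-byte chunk size is chosen just to force several chunks. The real client uploads to a stormjar-*.jar file inside the returned directory and passes the directory to submitTopology; the sketch collapses both into one string.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for Nimbus; NOT the real Thrift interface.
class FakeNimbus {
    final Map<String, ByteArrayOutputStream> inProgress = new HashMap<>();
    final Map<String, byte[]> finished = new HashMap<>();
    final Set<String> topologies = new HashSet<>();
    private int uploadCounter = 0;

    // Hands out a fresh upload location, like beginFileUpload.
    String beginFileUpload() {
        return "/nimbus/inbox/upload-" + (++uploadCounter);
    }

    // Appends one chunk to the file being uploaded at this location.
    void uploadChunk(String location, ByteBuffer chunk) {
        ByteArrayOutputStream out = inProgress.computeIfAbsent(location, k -> new ByteArrayOutputStream());
        byte[] bytes = new byte[chunk.remaining()];
        chunk.get(bytes);
        out.write(bytes, 0, bytes.length);
    }

    // Marks the upload as complete.
    void finishFileUpload(String location) {
        ByteArrayOutputStream out = inProgress.remove(location);
        finished.put(location, out == null ? new byte[0] : out.toByteArray());
    }

    // Accepts the topology only if its jar upload was finished.
    void submitTopology(String name, String uploadedJarLocation) {
        if (!finished.containsKey(uploadedJarLocation)) {
            throw new IllegalStateException("jar not uploaded: " + uploadedJarLocation);
        }
        topologies.add(name);
    }
}

class JarUploader {
    static final int CHUNK_SIZE = 4; // tiny on purpose so the demo sends several chunks

    // Mirrors the client order: name check, begin upload, chunks, finish, submit.
    static String submit(FakeNimbus nimbus, String name, byte[] jar) {
        if (nimbus.topologies.contains(name)) {
            throw new RuntimeException("Topology with name `" + name + "` already exists on cluster");
        }
        String path = nimbus.beginFileUpload();
        for (int off = 0; off < jar.length; off += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, jar.length - off);
            nimbus.uploadChunk(path, ByteBuffer.wrap(jar, off, len));
        }
        nimbus.finishFileUpload(path);
        nimbus.submitTopology(name, path);
        return path;
    }
}
```

Submitting the same name twice fails at the name check before any bytes are sent, which matches why the real client calls topologyNameExists before submitJar.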


Server side

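When NimbusServer starts, it instantiates a ServiceHandler object and passes it as a parameter when starting the Thrift server. Every subsequent Thrift call is dispatched to a method of ServiceHandler, so we go straight to the submitTopology method of this class.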

    @Override
    public void submitTopology(String name, String uploadedJarLocation, String jsonConf, StormTopology topology) throws TException {
        // no options supplied: the topology starts in the ACTIVE state
        SubmitOptions options = new SubmitOptions(TopologyInitialStatus.ACTIVE);
        submitTopologyWithOpts(name, uploadedJarLocation, jsonConf, topology, options);
    }

It calls submitTopologyWithOpts, which begins with some validity checks on the topology:

    // reject topology names that contain illegal characters
    if (!Common.charValidate(topologyName)) {
        throw new InvalidTopologyException(topologyName + " is not a valid topology name");
    }
    // the topology must not already be alive
    checkTopologyActive(data, topologyName, false);
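The shape of this entry point, a plain submitTopology that delegates with a default ACTIVE initial status and then rejects invalid or duplicate names, is sketched below. This is a toy built on assumptions: SubmitHandler, InitialStatus and isValidName are hypothetical, and the allowed character set (letters, digits, _ and -) is an assumed rule, not necessarily what Common.charValidate actually checks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical counterpart of TopologyInitialStatus.
enum InitialStatus { ACTIVE, INACTIVE }

// Hypothetical handler illustrating the delegation and the up-front checks.
class SubmitHandler {
    // ASSUMED naming rule, for illustration only.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z0-9_-]+");
    final Map<String, InitialStatus> alive = new HashMap<>();

    static boolean isValidName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    // Like ServiceHandler.submitTopology: no options given, so default to ACTIVE.
    void submitTopology(String name) {
        submitTopologyWithOpts(name, InitialStatus.ACTIVE);
    }

    void submitTopologyWithOpts(String name, InitialStatus status) {
        if (!isValidName(name)) {
            throw new IllegalArgumentException(name + " is not a valid topology name");
        }
        // Stand-in for checkTopologyActive(data, topologyName, false): the name must not be alive.
        if (alive.containsKey(name)) {
            throw new IllegalStateException("Topology " + name + " is already alive");
        }
        alive.put(name, status);
    }
}
```

Keeping the option-less method as a thin wrapper means every submission, old-style or new, goes through the same validation path.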

Next, it builds the configuration:

    // deserialize the topology configuration sent by the client
    Map<Object, Object> serializedConf = (Map<Object, Object>) JStormUtils.from_json(jsonConf);
    if (serializedConf == null) {
        LOG.warn("Failed to serialized Configuration");
        throw new InvalidTopologyException("Failed to serialize topology configuration");
    }
    serializedConf.put(Config.TOPOLOGY_ID, topologyId);
    serializedConf.put(Config.TOPOLOGY_NAME, topologyName);
    Map<Object, Object> stormConf;
    stormConf = NimbusUtils.normalizeConf(conf, serializedConf, topology);
    LOG.info("Normalized configuration:" + stormConf);
    // start from the cluster config, then let the topology config override it
    Map<Object, Object> totalStormConf = new HashMap<Object, Object>(conf);
    totalStormConf.putAll(stormConf);
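The last two lines build totalStormConf by copying the cluster-wide conf and then overlaying the normalized topology configuration, so any key present in both takes the topology's value. A minimal sketch of that precedence with plain maps; ConfMerge is a hypothetical name and the keys and values used with it are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

class ConfMerge {
    // Copy the cluster defaults first, then let topology-level entries win on conflicts,
    // the same order used to build totalStormConf. Neither input map is modified.
    static Map<Object, Object> merge(Map<Object, Object> clusterConf, Map<Object, Object> stormConf) {
        Map<Object, Object> total = new HashMap<Object, Object>(clusterConf);
        total.putAll(stormConf);
        return total;
    }
}
```

With a cluster value topology.workers=1 and a topology value topology.workers=4, the merged map holds 4, while cluster-only keys such as nimbus.host pass through unchanged.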

It then normalizes the topology and runs further validation:

    // normalize the topology
    StormTopology normalizedTopology = NimbusUtils.normalizeTopology(stormConf, topology, true);
    // validate IDs and fields, and check that the worker and acker counts are legal
    Common.validate_basic(normalizedTopology, totalStormConf, topologyId);
    // create the various local files for the topology
    setupStormCode(conf, topologyId, uploadedJarLocation, stormConf, normalizedTopology);

    // create task info on zk for every spout and bolt
    setupZkTaskInfo(conf, topologyId, stormClusterState);
    // create an assignment event for the topology and put it on a queue to wait for processing;
    // the actual assignment is done by TopologyAssign on another thread
    makeAssignment(topologyName, topologyId, options.get_initial_status());
    // send a start event to the topology-metrics thread; it is handled by TopologyMetricsRunnable on another thread
    StartTopologyEvent startEvent = new StartTopologyEvent();
    startEvent.clusterName = this.data.getClusterName();
    startEvent.topologyId = topologyId;
    startEvent.timestamp = System.currentTimeMillis();
    startEvent.sampleRate = metricsSampleRate;
    this.data.getMetricRunnable().pushEvent(startEvent);
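The body of setupZkTaskInfo is not shown here, but the idea of giving every spout and bolt its own set of task IDs can be illustrated. This sketch is a hypothetical simplification, not JStorm's code: TaskIdAllocator is an invented name, and it assumes task IDs start at 1, are handed out contiguously in component-name order, and that each component's parallelism equals its task count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaskIdAllocator {
    // componentId -> number of tasks in; componentId -> assigned task IDs out.
    static Map<String, List<Integer>> allocate(Map<String, Integer> parallelism) {
        Map<String, List<Integer>> result = new TreeMap<>();
        int next = 1; // ASSUMPTION: IDs start at 1
        // Iterating a TreeMap copy gives a deterministic, name-sorted order.
        for (Map.Entry<String, Integer> e : new TreeMap<>(parallelism).entrySet()) {
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < e.getValue(); i++) {
                ids.add(next++);
            }
            result.put(e.getKey(), ids);
        }
        return result;
    }
}
```

With a spout of parallelism 2 and a bolt named count-bolt of parallelism 3, the bolt (sorted first) gets tasks 1 to 3 and the spout gets tasks 4 and 5. A deterministic order matters because every later step, including the scheduler, refers to tasks only by these IDs.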

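That is the end of ServiceHandler's work; it plays no further part. Next, let's see how TopologyAssign handles the TopologyAssignEvent sent to it. This class implements Runnable and runs as a thread started when NimbusServer initializes. Its run method loops, reading assignment events from the queue and handling them with the doTopologyAssignment method. The core code is in the mkAssignment method: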

    String topologyId = event.getTopologyId();
    LOG.info("Determining assignment for " + topologyId);
    // build the topology assignment context, which wraps cluster, task and component information
    TopologyAssignContext context = prepareTopologyAssign(event);
    Set<ResourceWorkerSlot> assignments = null;
    if (!StormConfig.local_mode(nimbusData.getConf())) {
        // get the scheduler, which performs the actual task scheduling
        IToplogyScheduler scheduler = schedulers.get(DEFAULT_SCHEDULER_NAME);
        assignments = scheduler.assignTasks(context);
    } else {
        assignments = mkLocalAssignment(context);
    }
    Assignment assignment = null;
    if (assignments != null && assignments.size() > 0) {
        ...
        assignment = new Assignment(codeDir, assignments, nodeHost, startTimes);
        StormClusterState stormClusterState = nimbusData.getStormClusterState();
        // write the scheduler's assignment result to zk
        stormClusterState.set_assignment(topologyId, assignment);
        ...
    }
    return assignment;
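The hand-off described above, where makeAssignment only enqueues an event and the TopologyAssign thread later takes it off the queue, computes an assignment and writes it to ZooKeeper, is a classic producer/consumer pattern. Below is a minimal sketch built on a LinkedBlockingQueue; AssignEvent, AssignLoop, the placeholder "scheduler" string and the map standing in for ZooKeeper are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical event carrying only the topology id.
class AssignEvent {
    final String topologyId;
    AssignEvent(String topologyId) { this.topologyId = topologyId; }
}

class AssignLoop implements Runnable {
    final BlockingQueue<AssignEvent> queue = new LinkedBlockingQueue<>();
    // Stand-in for the assignment nodes in ZooKeeper.
    final Map<String, String> zkAssignments = new ConcurrentHashMap<>();
    private volatile boolean running = true;

    // Producer side, analogous to makeAssignment: enqueue and return immediately.
    void push(AssignEvent e) { queue.offer(e); }

    void stop() { running = false; }

    // Consumer side, analogous to the TopologyAssign thread's loop.
    @Override
    public void run() {
        while (running) {
            try {
                AssignEvent e = queue.poll(100, TimeUnit.MILLISECONDS);
                if (e == null) continue; // nothing queued; re-check the running flag
                // Placeholder for scheduler.assignTasks(context) plus set_assignment.
                zkAssignments.put(e.topologyId, "assignment-of-" + e.topologyId);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Polls the store until the assignment appears or the timeout expires.
    String awaitAssignment(String topologyId, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            String a = zkAssignments.get(topologyId);
            if (a != null) return a;
            try {
                Thread.sleep(10);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return zkAssignments.get(topologyId);
    }
}
```

Decoupling the two sides this way lets the Thrift call return as soon as the event is queued, while slow scheduling work happens on the assignment thread.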

I'll leave the scheduler aside for now. Overall, that completes the submission of a topology. The basic process is:

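  1. The client uploads the jar to the server and notifies the server that a topology has been submitted.
  2. After some basic validation, the server generates an event and sends it to TopologyAssign.
  3. TopologyAssign handles the assignment event and hands it to the Scheduler, which computes how tasks and workers are distributed across supervisors.
  4. The task assignment is written to zk; supervisors subscribe to it and then start and run the tasks.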
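Step 4 relies on ZooKeeper notifications: Nimbus writes the assignment and each supervisor reacts to the change. The sketch below imitates that with an in-memory store that calls every registered watcher on each write. AssignmentStore, SupervisorWatcher and the host-to-task-IDs shape of an assignment are hypothetical simplifications; classic ZooKeeper watches are one-shot and must be re-registered, which this toy ignores.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the assignment path in ZooKeeper.
class AssignmentStore {
    private final List<BiConsumer<String, Map<String, List<Integer>>>> watchers = new CopyOnWriteArrayList<>();

    void watch(BiConsumer<String, Map<String, List<Integer>>> watcher) {
        watchers.add(watcher);
    }

    // Like set_assignment: persisting is omitted here; every watcher is notified.
    void setAssignment(String topologyId, Map<String, List<Integer>> hostToTasks) {
        for (BiConsumer<String, Map<String, List<Integer>>> w : watchers) {
            w.accept(topologyId, hostToTasks);
        }
    }
}

// Hypothetical supervisor that keeps only the tasks assigned to its own host.
class SupervisorWatcher {
    final String host;
    final List<Integer> localTasks = new ArrayList<>();

    SupervisorWatcher(String host, AssignmentStore store) {
        this.host = host;
        store.watch((topologyId, hostToTasks) ->
                localTasks.addAll(hostToTasks.getOrDefault(host, Collections.emptyList())));
    }
}
```

Each supervisor filters the shared assignment down to its own host, which is why Nimbus never needs to contact supervisors directly.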