JStorm Source Code Reading (1): The Topology Submission Process

Client side

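The client side is the process that runs when we submit a topology with the command

jstorm jar xxxx.jar xxxx.xxxx.xxxx args...

The main class named in that command builds the topology and hands it to StormSubmitter, so we start directly from the submitTopology method of the StormSubmitter class. The core code is as follows: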

    try {
        String serConf = Utils.to_json(stormConf);
        if (localNimbus != null) {
            LOG.info("Submitting topology " + name + " in local mode");
            localNimbus.submitTopology(name, null, serConf, topology);
        } else {
            // build a Thrift client to Nimbus from the local jstorm configuration
            NimbusClient client = NimbusClient.getConfiguredClient(conf);
            try {
                if (topologyNameExists(client, conf, name)) {
                    throw new RuntimeException("Topology with name `" + name + "` already exists on cluster");
                }
                // upload the jar(s); this also sets the static field path used below
                submitJar(client, conf);
                LOG.info("Submitting topology " + name + " in distributed mode with conf " + serConf);
                if (opts != null) {
                    client.getClient().submitTopologyWithOpts(name, path, serConf, topology, opts);
                } else {
                    // this is for backwards compatibility
                    client.getClient().submitTopology(name, path, serConf, topology);
                }
            } finally {
                client.close();
            }
        }
        LOG.info("Finished submitting topology: " + name);
    } catch (InvalidTopologyException e) {
        LOG.warn("Topology submission exception", e);
        throw e;
    } catch (AlreadyAliveException e) {
        LOG.warn("Topology already alive exception", e);
        throw e;
    } catch (TopologyAssignException e) {
        LOG.warn("Failed to assign " + e.get_msg(), e);
        throw new RuntimeException(e);
    } catch (TException e) {
        LOG.warn("Failed to assign ", e);
        throw new RuntimeException(e);
    }

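Since we are in cluster mode, the else branch is taken. It first creates a NimbusClient from the configuration files under the jstorm directory, then checks that the name of the submitted topology is unique on the cluster, and then calls submitJar to upload the jar. The core code of submitJar is as follows: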

String localJar = System.getProperty("storm.jar");
path = client.getClient().beginFileUpload();
String[] pathCache = path.split("/");
String uploadLocation = path + "/stormjar-" + pathCache[pathCache.length - 1] + ".jar";
List<String> lib = (List<String>) conf.get(GenericOptionsParser.TOPOLOGY_LIB_NAME);
Map<String, String> libPath = (Map<String, String>) conf.get(GenericOptionsParser.TOPOLOGY_LIB_PATH);
if (lib != null && lib.size() != 0) {
    for (String libName : lib) {
        String jarPath = path + "/lib/" + libName;
        client.getClient().beginLibUpload(jarPath);
        submitJar(conf, libPath.get(libName), jarPath, client);
    }
} else {
    if (localJar == null) {
        // no lib, no client jar
        throw new RuntimeException("No client app jar, please upload it");
    }
}
if (localJar != null) {
    submittedJar = submitJar(conf, localJar, uploadLocation, client);
} else {
    // no client jar, but with lib jar
    client.getClient().finishFileUpload(uploadLocation);
}

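First, the full path of the jar is read from the storm.jar system property. When we invoke the jstorm command, what actually runs is the bin/jstorm.py script in the source tree, and its jar function is the code executed for `jstorm jar`. When that function launches the Java program it passes "-Dstorm.jar=" + jarfile, which is why reading the storm.jar property in Java yields the path of the jar.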
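A minimal sketch of the receiving side of that mechanism: the launcher puts `-Dstorm.jar=<path>` on the java command line, and the Java code reads it back with `System.getProperty`. The class name `StormJarProperty` is hypothetical; only the property name `storm.jar` comes from the code above.

```java
public class StormJarProperty {

    // Returns the jar path passed by the launcher via -Dstorm.jar=..., or null if absent.
    static String localJarPath() {
        return System.getProperty("storm.jar");
    }

    public static void main(String[] args) {
        String jar = localJarPath();
        if (jar == null) {
            System.out.println("storm.jar not set; run with -Dstorm.jar=/path/to/app.jar");
        } else {
            System.out.println("client jar: " + jar);
        }
    }
}
```

Run it, for example, as `java -Dstorm.jar=/tmp/app.jar StormJarProperty`. Without the flag the property is null, which submitJar turns into the "No client app jar" error when no lib jars are configured either.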

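Here the client first sends a beginFileUpload request to the Nimbus server to obtain the upload path. (Each extra lib jar listed under TOPOLOGY_LIB_NAME is first announced with beginLibUpload.) The four-argument submitJar then calls uploadChunk to send the jar to the server, and once the transfer is complete the same method calls finishFileUpload to end the upload. Control then returns to StormSubmitter.submitTopology, which calls the server's submitTopology (or submitTopologyWithOpts) to have it carry out the rest of the topology processing.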
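The begin/upload/finish sequence can be sketched as a self-contained simulation. This is not StormSubmitter's code: `UploadService` is a hypothetical interface that only mirrors the names of the three Nimbus calls described above, and the in-memory server, the upload directory and the chunk size are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class ChunkedUploadSketch {

    // Hypothetical stand-in for the three Nimbus upload calls.
    interface UploadService {
        String beginFileUpload();
        void uploadChunk(String location, ByteBuffer chunk);
        void finishFileUpload(String location);
    }

    // In-memory fake server: reassembles the chunks written to each location.
    static class InMemoryUploadService implements UploadService {
        final Map<String, ByteArrayOutputStream> files = new HashMap<>();
        int chunkCalls = 0;
        String finishedLocation = null;

        @Override
        public String beginFileUpload() {
            return "/nimbus/inbox/abc123"; // hypothetical upload directory
        }

        @Override
        public void uploadChunk(String location, ByteBuffer chunk) {
            byte[] bytes = new byte[chunk.remaining()];
            chunk.get(bytes);
            files.computeIfAbsent(location, k -> new ByteArrayOutputStream())
                 .write(bytes, 0, bytes.length);
            chunkCalls++;
        }

        @Override
        public void finishFileUpload(String location) {
            finishedLocation = location;
        }
    }

    // Client side: begin, stream fixed-size chunks, then finish.
    static String upload(UploadService service, InputStream jar, int chunkSize) {
        String dir = service.beginFileUpload();
        String[] parts = dir.split("/");
        // Same naming rule as uploadLocation in submitJar.
        String location = dir + "/stormjar-" + parts[parts.length - 1] + ".jar";
        byte[] buf = new byte[chunkSize];
        try {
            int n;
            while ((n = jar.read(buf)) > 0) {
                service.uploadChunk(location, ByteBuffer.wrap(buf, 0, n));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        service.finishFileUpload(location);
        return location;
    }

    public static void main(String[] args) {
        InMemoryUploadService service = new InMemoryUploadService();
        byte[] fakeJar = new byte[10];
        String location = upload(service, new ByteArrayInputStream(fakeJar), 4);
        // prints "/nimbus/inbox/abc123/stormjar-abc123.jar chunks=3"
        System.out.println(location + " chunks=" + service.chunkCalls);
    }
}
```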

Server side

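When NimbusServer starts, it instantiates a ServiceHandler object and passes it in as a parameter when starting the Thrift server. Every subsequent call to the Thrift server is dispatched back to a method of ServiceHandler, so we can go straight to the submitTopology method of that class.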
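Before reading submitTopology itself, here is a toy sketch of that callback structure: a server object is built once around a handler, and every incoming request is routed to a method of that handler. It does not use Thrift; `NimbusService`, `ToyServiceHandler` and `ToyRpcServer` are hypothetical names. In a real Thrift server, the generated Processor class plays the routing role.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class HandlerDispatchSketch {

    // Hypothetical service interface, standing in for a Thrift-generated Iface.
    interface NimbusService {
        String submitTopology(String name);
        String getClusterInfo();
    }

    // Plays the role of ServiceHandler: it holds the actual logic.
    static class ToyServiceHandler implements NimbusService {
        public String submitTopology(String name) { return "submitted " + name; }
        public String getClusterInfo() { return "1 supervisor"; }
    }

    // Toy "server": created once with the handler, then routes every request to it.
    static class ToyRpcServer {
        private final Map<String, Function<String, String>> routes = new HashMap<>();

        ToyRpcServer(NimbusService handler) {
            routes.put("submitTopology", handler::submitTopology);
            routes.put("getClusterInfo", arg -> handler.getClusterInfo());
        }

        String handle(String method, String arg) {
            Function<String, String> route = routes.get(method);
            if (route == null) {
                throw new IllegalArgumentException("unknown method " + method);
            }
            return route.apply(arg);
        }
    }

    public static void main(String[] args) {
        ToyRpcServer server = new ToyRpcServer(new ToyServiceHandler());
        System.out.println(server.handle("submitTopology", "wordcount")); // prints "submitted wordcount"
    }
}
```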

@Override
public void submitTopology(String name, String uploadedJarLocation, String jsonConf, StormTopology topology) throws TException {
    SubmitOptions options = new SubmitOptions(TopologyInitialStatus.ACTIVE);
    submitTopologyWithOpts(name, uploadedJarLocation, jsonConf, topology, options);
}

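submitTopology simply calls submitTopologyWithOpts with an initial status of ACTIVE. submitTopologyWithOpts starts with some validity checks on the topology: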

if (!Common.charValidate(topologyName)) {
    throw new InvalidTopologyException(topologyName + " is not a valid topology name");
}
checkTopologyActive(data, topologyName, false); // false: the topology must not be active yet
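A hedged sketch of these two checks. The character rule in `isValidName` is an assumption (letters, digits, `-`, `_` and `.`) and may not match `Common.charValidate` exactly; `checkNotActive` is a hypothetical simplification of `checkTopologyActive(data, topologyName, false)` that consults an in-memory set instead of the cluster state.

```java
import java.util.HashSet;
import java.util.Set;

public class TopologyNameCheck {

    // Assumed rule: letters, digits, '-', '_' and '.' only (hypothetical stand-in for Common.charValidate).
    static boolean isValidName(String name) {
        return name != null && name.matches("[a-zA-Z0-9_.-]+");
    }

    // Hypothetical stand-in for checkTopologyActive(data, name, false):
    // fails if a topology with this name is already running.
    static void checkNotActive(Set<String> activeNames, String name) {
        if (activeNames.contains(name)) {
            throw new IllegalStateException("Topology " + name + " is already active");
        }
    }

    static void validate(Set<String> activeNames, String name) {
        if (!isValidName(name)) {
            throw new IllegalArgumentException(name + " is not a valid topology name");
        }
        checkNotActive(activeNames, name);
    }

    public static void main(String[] args) {
        Set<String> active = new HashSet<>();
        active.add("wordcount");
        validate(active, "pageview-v2"); // passes
        try {
            validate(active, "wordcount");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Topology wordcount is already active
        }
    }
}
```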

Next, some configuration entries are constructed:

Map<Object, Object> serializedConf = (Map<Object, Object>) JStormUtils.from_json(jsonConf);
if (serializedConf == null) {
    LOG.warn("Failed to serialized Configuration");
    throw new InvalidTopologyException("Failed to serialize topology configuration");
}
// topologyId was generated earlier in submitTopologyWithOpts (not shown in this excerpt)
serializedConf.put(Config.TOPOLOGY_ID, topologyId);
serializedConf.put(Config.TOPOLOGY_NAME, topologyName);
Map<Object, Object> stormConf;
stormConf = NimbusUtils.normalizeConf(conf, serializedConf, topology);
LOG.info("Normalized configuration:" + stormConf);
Map<Object, Object> totalStormConf = new HashMap<Object, Object>(conf);
totalStormConf.putAll(stormConf);
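The last two lines build `totalStormConf` by copying the Nimbus-side `conf` and then overlaying `stormConf`, so a key present in both takes the topology's value. A minimal sketch of that precedence rule; the class name and sample keys are only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfMergeSketch {

    // Same pattern as totalStormConf: copy the cluster conf, then let topology entries override it.
    static Map<Object, Object> merge(Map<Object, Object> clusterConf, Map<Object, Object> topologyConf) {
        Map<Object, Object> total = new HashMap<Object, Object>(clusterConf);
        total.putAll(topologyConf);
        return total;
    }

    public static void main(String[] args) {
        Map<Object, Object> cluster = new HashMap<>();
        cluster.put("topology.workers", 1);
        cluster.put("nimbus.host", "nimbus-1");
        Map<Object, Object> topo = new HashMap<>();
        topo.put("topology.workers", 4);
        Map<Object, Object> total = merge(cluster, topo);
        System.out.println(total.get("topology.workers")); // 4 (topology wins)
        System.out.println(total.get("nimbus.host"));      // nimbus-1 (inherited from cluster)
    }
}
```

Because the copy is made first, neither input map is modified.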

Then the topology is normalized and validated, after which the remaining setup steps run:

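// normalize the topology
StormTopology normalizedTopology = NimbusUtils.normalizeTopology(stormConf, topology, true);
// validate IDs and fields, and check that the worker and acker counts are valid
Common.validate_basic(normalizedTopology, totalStormConf, topologyId);

// create the various local files for the topology on the Nimbus host
setupStormCode(conf, topologyId, uploadedJarLocation, stormConf, normalizedTopology);

// create task info in ZooKeeper for every spout and bolt
setupZkTaskInfo(conf, topologyId, stormClusterState);
// create an assignment event for the topology and put it on a queue to be processed;
// the actual assignment is done by TopologyAssign on another thread
makeAssignment(topologyName, topologyId, options.get_initial_status());
// push a start event to the thread that monitors topologies; it is handled
// by TopologyMetricsRunnable on another thread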
StartTopologyEvent startEvent = new StartTopologyEvent();
startEvent.clusterName = this.data.getClusterName();
startEvent.topologyId = topologyId;
startEvent.timestamp = System.currentTimeMillis();
startEvent.sampleRate = metricsSampleRate;
this.data.getMetricRunnable().pushEvent(startEvent);
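The hand-off between `makeAssignment` and `TopologyAssign` is a producer/consumer queue. A minimal sketch of that pattern with a `BlockingQueue`, assuming a simplified event class; the names `AssignEvent`, `STOP` and `stopAndWait` are hypothetical, and the poison-pill shutdown exists only so the sketch can terminate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AssignQueueSketch {

    // Hypothetical, heavily simplified TopologyAssignEvent.
    static final class AssignEvent {
        final String topologyId;
        AssignEvent(String topologyId) { this.topologyId = topologyId; }
    }

    // Poison pill used only by this sketch to stop the consumer thread.
    private static final AssignEvent STOP = new AssignEvent(null);

    final BlockingQueue<AssignEvent> queue = new LinkedBlockingQueue<>();
    final List<String> assigned = Collections.synchronizedList(new ArrayList<String>());

    // Producer side, like makeAssignment(): enqueue the event and return at once.
    void makeAssignment(String topologyId) {
        queue.offer(new AssignEvent(topologyId));
    }

    // Consumer side, like TopologyAssign.run(): block on the queue and process events in order.
    Thread start() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    AssignEvent event = queue.take();
                    if (event == STOP) {
                        break;
                    }
                    assigned.add(event.topologyId); // stands in for doTopologyAssignment(event)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "topology-assign");
        t.start();
        return t;
    }

    // Sketch-only shutdown: enqueue the poison pill and wait for the queue to drain.
    void stopAndWait(Thread t) {
        queue.offer(STOP);
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AssignQueueSketch nimbus = new AssignQueueSketch();
        Thread assignThread = nimbus.start();
        nimbus.makeAssignment("topo-1");
        nimbus.makeAssignment("topo-2");
        nimbus.stopAndWait(assignThread);
        System.out.println(nimbus.assigned); // [topo-1, topo-2]
    }
}
```

The producer never waits for the assignment to finish, which matches the point that ServiceHandler's job ends once the event is queued.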

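That is the end of ServiceHandler's work; it plays no further part. Next, let's see how TopologyAssign processes the TopologyAssignEvent sent to it. This class implements Runnable and runs as a thread started when NimbusServer is initialized. Its run method loops, taking assignment events off the queue and handing them to doTopologyAssignment. The core code is in the mkAssignment method: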

String topologyId = event.getTopologyId();
LOG.info("Determining assignment for " + topologyId);
// build the topology assignment context, which wraps cluster, task and component information
TopologyAssignContext context = prepareTopologyAssign(event);
Set<ResourceWorkerSlot> assignments = null;
if (!StormConfig.local_mode(nimbusData.getConf())) {
    // get the scheduler, which performs the actual task scheduling
    IToplogyScheduler scheduler = schedulers.get(DEFAULT_SCHEDULER_NAME);
    assignments = scheduler.assignTasks(context);
} else {
    assignments = mkLocalAssignment(context);
}
Assignment assignment = null;
if (assignments != null && assignments.size() > 0) {
...
assignment = new Assignment(codeDir, assignments, nodeHost, startTimes);
StormClusterState stormClusterState = nimbusData.getStormClusterState();
// write the scheduler's assignment result to ZooKeeper
stormClusterState.set_assignment(topologyId, assignment);
...
}
return assignment;
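The overall shape of mkAssignment (build a context, let a scheduler map tasks to worker slots, then publish the result) can be sketched in memory. Everything here is hypothetical: `WorkerSlot` and `Scheduler` are simplified stand-ins for `ResourceWorkerSlot` and `IToplogyScheduler`, the round-robin policy is a plain substitute and not JStorm's scheduling algorithm, and a `Map` stands in for ZooKeeper with an illustrative path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {

    // Hypothetical, simplified worker slot: a supervisor host, a port, and the tasks it runs.
    static final class WorkerSlot {
        final String host;
        final int port;
        final List<Integer> tasks = new ArrayList<>();

        WorkerSlot(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port + tasks;
        }
    }

    // Hypothetical stand-in for IToplogyScheduler.assignTasks(context).
    interface Scheduler {
        List<WorkerSlot> assignTasks(List<Integer> taskIds, List<WorkerSlot> freeSlots);
    }

    // Plain round-robin placement; NOT JStorm's default scheduling algorithm.
    static final Scheduler ROUND_ROBIN = (taskIds, freeSlots) -> {
        List<WorkerSlot> used = new ArrayList<>();
        if (freeSlots.isEmpty()) {
            return used;
        }
        for (int i = 0; i < taskIds.size(); i++) {
            freeSlots.get(i % freeSlots.size()).tasks.add(taskIds.get(i));
        }
        for (WorkerSlot slot : freeSlots) {
            if (!slot.tasks.isEmpty()) {
                used.add(slot);
            }
        }
        return used;
    };

    // Mirrors mkAssignment: schedule, then publish the result only if it is non-empty.
    static List<WorkerSlot> mkAssignment(Map<String, List<WorkerSlot>> fakeZk, String topologyId,
                                         List<Integer> taskIds, List<WorkerSlot> freeSlots) {
        List<WorkerSlot> assignment = ROUND_ROBIN.assignTasks(taskIds, freeSlots);
        if (!assignment.isEmpty()) {
            fakeZk.put("/assignments/" + topologyId, assignment); // like set_assignment(...)
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, List<WorkerSlot>> zk = new HashMap<>();
        List<WorkerSlot> slots = Arrays.asList(new WorkerSlot("host1", 6700), new WorkerSlot("host2", 6700));
        List<WorkerSlot> result = mkAssignment(zk, "topo-1", Arrays.asList(1, 2, 3, 4, 5), slots);
        System.out.println(result); // [host1:6700[1, 3, 5], host2:6700[2, 4]]
    }
}
```

With five tasks and two slots, round-robin gives host1 tasks 1, 3 and 5 and host2 tasks 2 and 4; a real scheduler would also weigh resources and component placement.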

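We will leave the scheduler for another time. Overall, that completes the topology submission process. The basic steps are:

1. The client uploads the jar to the server and notifies it that a topology has been submitted.
2. After some basic validation, the server generates an assignment event and sends it to TopologyAssign.
3. TopologyAssign processes the assignment event and has the Scheduler compute how tasks and workers are distributed to supervisors.
4. The task assignment result is written to ZooKeeper; supervisors subscribe to it and then start and run the tasks.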
