datax源碼閱讀二：JobContainer

原創

2019-04-04 21:34

前面介紹的python文件和Engine都是用於做初始化準備，真正的執行都是在這裏完成，start代碼如下：

/**
 * jobContainer主要負責的工作全部在start()裏面，包括init、prepare、split、scheduler、
 * post以及destroy和statistics
 */
@Override
public void start() {
    LOG.info("DataX jobContainer starts job.");

    boolean hasException = false;
    boolean isDryRun = false;
    try {
        this.startTimeStamp = System.currentTimeMillis();
        isDryRun = configuration.getBool(CoreConstant.DATAX_JOB_SETTING_DRYRUN, false);
        if(isDryRun) {
            LOG.info("jobContainer starts to do preCheck ...");
            this.preCheck();
        } else {
            userConf = configuration.clone();
            LOG.debug("jobContainer starts to do preHandle ...");
            this.preHandle();

            LOG.debug("jobContainer starts to do init ...");
            this.init();
            LOG.info("jobContainer starts to do prepare ...");
            this.prepare();
            LOG.info("jobContainer starts to do split ...");
            this.totalStage = this.split();
            LOG.info("jobContainer starts to do schedule ...");
            this.schedule();
            LOG.debug("jobContainer starts to do post ...");
            this.post();

            LOG.debug("jobContainer starts to do postHandle ...");
            this.postHandle();
            LOG.info("DataX jobId [{}] completed successfully.", this.jobId);

            this.invokeHooks();
        }
    } catch (Throwable e) {
        LOG.error("Exception when job run", e);

        hasException = true;

        if (e instanceof OutOfMemoryError) {
            this.destroy();
            System.gc();
        }
        if (super.getContainerCommunicator() == null) {
            // 由於 containerCollector 是在 scheduler() 中初始化的，所以當在 scheduler() 之前出現異常時，需要在此處對 containerCollector 進行初始化

            AbstractContainerCommunicator tempContainerCollector;
            // standalone
            tempContainerCollector = new StandAloneJobContainerCommunicator(configuration);

            super.setContainerCommunicator(tempContainerCollector);
        }

        Communication communication = super.getContainerCommunicator().collect();
        // 彙報前的狀態，不需要手動進行設置
        // communication.setState(State.FAILED);
        communication.setThrowable(e);
        communication.setTimestamp(this.endTimeStamp);

        Communication tempComm = new Communication();
        tempComm.setTimestamp(this.startTransferTimeStamp);

        Communication reportCommunication = CommunicationTool.getReportCommunication(communication, tempComm, this.totalStage);
        super.getContainerCommunicator().report(reportCommunication);

        throw DataXException.asDataXException(
                FrameworkErrorCode.RUNTIME_ERROR, e);
    } finally {
        if(!isDryRun) {
            this.destroy();
            this.endTimeStamp = System.currentTimeMillis();
            if (!hasException) {
                //最後打印cpu的平均消耗，GC的統計
                VMInfo vmInfo = VMInfo.getVmInfo();
                if (vmInfo != null) {
                    vmInfo.getDelta(false);
                    LOG.info(vmInfo.totalString());
                    LogUtil.logVmInfo(vmInfo);
                }

                LOG.info(PerfTrace.getInstance().summarizeNoException());
                this.logStatistics();
            }
        }
    }
}

主要執行流程爲：

1、preHandle()：job前置操作
2、init()：初始化reader和writer
3、prepare()：執行插件的prepare操作
4、split()：切分任務
5、schedule()：執行任務
6、post()：執行插件的post操作
7、postHandle()：job後置操作
8、invokeHooks()：調用hook

上述任務流是順序執行的，第5步會將切分的task分配到多個taskGroup中併發執行

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

datax源碼閱讀二：JobContainer

datax常見問題

datax源碼閱讀四：TaskGroupContainer

datax源碼閱讀二：JobContainer

datax源碼閱讀二：Engine流程

datax源碼閱讀一

Mac下配置sublime實現LaTeX

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結