In a MongoDB replica set, nodes stay consistent by syncing and replaying the oplog. MongoDB uses a pull model: each node actively fetches the oplog from its sync source, and the pulling node must promptly report its latest synced point back upstream.
As shown in the figure above, two Secondaries pull the oplog from the Primary; whenever a Secondary's synced position changes, it sends a replSetUpdatePosition command to tell its sync source about the new position.
Inside mongod there is a dedicated thread named SyncSourceFeedback that reports the current node's progress to its sync source. The Primary itself does not need it, because it does not sync data from any other node; nodes that store no data, such as Arbiters, do not need it either. Two classes carry out this task, SyncSourceFeedback and Reporter, and their call relationship is shown in the figure below:
SyncSourceFeedback
SyncSourceFeedback is responsible for:
- deciding whether the node needs to report its position to its sync source;
- reacting to role transitions: for example, once a node steps up from secondary to primary, it no longer needs to report its position;
- reacting to sync-source changes: for example, switching from syncing from node A to syncing from node B;
- invoking Reporter to report the position; the actual reporting work is delegated to Reporter.
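The first decision above (report or not, depending on role and sync source) can be sketched as a small predicate. The types and names below are illustrative stand-ins, not MongoDB's real MemberState or HostAndPort:

```cpp
#include <string>

// Simplified stand-ins for MongoDB's MemberState and HostAndPort;
// the names are illustrative, not the real API.
enum class MemberState { Startup, Secondary, Primary, Arbiter };

// A node reports its position only when it is actually replicating:
// a primary and an arbiter have nothing to report, and a node without
// a sync source has nowhere to send the update.
bool shouldReportPosition(MemberState state, const std::string& syncSource) {
    if (state == MemberState::Primary || state == MemberState::Arbiter ||
        state == MemberState::Startup) {
        return false;
    }
    return !syncSource.empty();
}
```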
void SyncSourceFeedback::run(executor::TaskExecutor* executor,
                             BackgroundSync* bgsync,
                             ReplicationCoordinator* replCoord) {
    Client::initThread("SyncSourceFeedback");
    HostAndPort syncTarget;
    // keepAliveInterval indicates how frequently to forward progress in the absence of updates.
    Milliseconds keepAliveInterval(0);
    while (true) {  // breaks once _shutdownSignaled is true
        // Wait until the position changes or shutdown is signaled; on a
        // keep-alive timeout, fall through anyway unless the node is a
        // primary or still starting up.
        {
            stdx::unique_lock<stdx::mutex> lock(_mtx);
            while (!_positionChanged && !_shutdownSignaled) {
                if (_cond.wait_for(lock, keepAliveInterval.toSystemDuration()) ==
                    stdx::cv_status::timeout) {
                    MemberState state = replCoord->getMemberState();
                    if (!(state.primary() || state.startup())) {
                        break;
                    }
                }
            }
            // Exit the thread if shutdown has been signaled.
            if (_shutdownSignaled) {
                break;
            }
            _positionChanged = false;
        }
        // A primary (or a node still starting up) has nothing to report.
        {
            stdx::lock_guard<stdx::mutex> lock(_mtx);
            MemberState state = replCoord->getMemberState();
            if (state.primary() || state.startup()) {
                continue;
            }
        }
        // Has the sync source changed?
        const HostAndPort target = bgsync->getSyncTarget();
        if (target.empty()) {
            if (syncTarget != target) {
                syncTarget = target;
            }
            // Loop back around again; the keepalive functionality will cause us to retry.
            continue;
        }
        if (syncTarget != target) {
            LOG(1) << "setting syncSourceFeedback to " << target;
            syncTarget = target;
        }
        // Create a Reporter bound to the current sync source.
        Reporter reporter(executor,
                          makePrepareReplSetUpdatePositionCommandFn(replCoord, syncTarget, bgsync),
                          syncTarget,
                          keepAliveInterval,
                          syncSourceFeedbackNetworkTimeoutSecs);
        {
            stdx::lock_guard<stdx::mutex> lock(_mtx);
            if (_shutdownSignaled) {
                break;
            }
            _reporter = &reporter;
        }
        // Report the position upstream.
        auto status = _updateUpstream(&reporter);
    }
}
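The wait-for-update handshake at the top of run() can be shown as a self-contained sketch. PositionWaiter and its member names below are illustrative, not the real implementation:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal sketch of the wait pattern in SyncSourceFeedback::run(): one
// thread blocks until either the applied position changes or shutdown is
// signaled (the keep-alive timeout is omitted for brevity).
class PositionWaiter {
public:
    // Returns true if woken by a position change, false if shut down.
    bool waitForUpdate() {
        std::unique_lock<std::mutex> lock(_mtx);
        _cond.wait(lock, [this] { return _positionChanged || _shutdownSignaled; });
        if (_shutdownSignaled) {
            return false;
        }
        _positionChanged = false;  // consume the update, as run() does
        return true;
    }
    void notifyPositionChanged() {
        std::lock_guard<std::mutex> lock(_mtx);
        _positionChanged = true;
        _cond.notify_all();
    }
    void shutdown() {
        std::lock_guard<std::mutex> lock(_mtx);
        _shutdownSignaled = true;
        _cond.notify_all();
    }

private:
    std::mutex _mtx;
    std::condition_variable _cond;
    bool _positionChanged = false;
    bool _shutdownSignaled = false;
};

// Drives one waiter through a position update followed by a shutdown.
bool demoUpdateThenShutdown() {
    PositionWaiter w;
    std::thread t([&w] { w.notifyPositionChanged(); });
    bool gotUpdate = w.waitForUpdate();  // woken by the update
    t.join();
    w.shutdown();
    bool afterShutdown = w.waitForUpdate();  // false: shutdown signaled
    return gotUpdate && !afterShutdown;
}
```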
Status SyncSourceFeedback::_updateUpstream(Reporter* reporter) {
    auto syncTarget = reporter->getTarget();
    auto triggerStatus = reporter->trigger();
    if (!triggerStatus.isOK()) {
        warning() << "unable to schedule reporter to update replication progress on " << syncTarget
                  << ": " << triggerStatus;
        return triggerStatus;
    }
    auto status = reporter->join();
    if (!status.isOK()) {
        log() << "SyncSourceFeedback error sending update to " << syncTarget << ": " << status;
    }
    // Sync source blacklisting will be done in BackgroundSync and SyncSourceResolver.
    return status;
}
Reporter
Reporter mainly drives executor::TaskExecutor through the command's lifecycle: building the request, scheduling it, and handling the response in a callback.
The command itself is built by TopologyCoordinator::prepareReplSetUpdatePositionCommand; Reporter::trigger() kicks off a command and Reporter::join() waits for it to complete.
Status Reporter::join() {
    stdx::unique_lock<stdx::mutex> lk(_mutex);
    _condition.wait(lk, [this]() { return !_isActive_inlock(); });
    return _status;
}
Status Reporter::trigger() {
    stdx::lock_guard<stdx::mutex> lk(_mutex);
    if (_keepAliveTimeoutWhen != Date_t()) {
        // Reset keep alive expiration to signal handler that it was canceled internally.
        invariant(_prepareAndSendCommandCallbackHandle.isValid());
        _keepAliveTimeoutWhen = Date_t();
        _executor->cancel(_prepareAndSendCommandCallbackHandle);
        return Status::OK();
    } else if (_isActive_inlock()) {
        _isWaitingToSendReporter = true;
        return Status::OK();
    }
    auto scheduleResult =
        _executor->scheduleWork([=](const executor::TaskExecutor::CallbackArgs& args) {
            _prepareAndSendCommandCallback(args, true);
        });
    _status = scheduleResult.getStatus();
    if (!_status.isOK()) {
        return _status;
    }
    _prepareAndSendCommandCallbackHandle = scheduleResult.getValue();
    return _status;
}
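The trigger()/join() contract can be sketched independently of MongoDB's executor. TinyReporter, demoReport, and the use of a plain std::thread below are illustrative assumptions; the real Reporter also coalesces concurrent triggers and handles keep-alives, which are omitted here:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Minimal sketch of the contract: trigger() schedules one unit of
// asynchronous work; join() blocks until that work has finished.
class TinyReporter {
public:
    void trigger(std::function<void()> work) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_isActive) {
                return;  // already running; the real code coalesces instead
            }
            _isActive = true;
        }
        // A plain thread stands in for executor::TaskExecutor::scheduleWork.
        _worker = std::thread([this, work] {
            work();
            std::lock_guard<std::mutex> lk(_mutex);
            _isActive = false;
            _condition.notify_all();
        });
    }
    void join() {
        {
            std::unique_lock<std::mutex> lk(_mutex);
            _condition.wait(lk, [this] { return !_isActive; });
        }
        if (_worker.joinable()) {
            _worker.join();
        }
    }

private:
    std::mutex _mutex;
    std::condition_variable _condition;
    bool _isActive = false;
    std::thread _worker;
};

// One round trip: trigger a "report", then join and observe the result.
int demoReport() {
    TinyReporter r;
    int reported = 0;
    r.trigger([&reported] { reported = 42; });
    r.join();
    return reported;
}
```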
void Reporter::_prepareAndSendCommandCallback(const executor::TaskExecutor::CallbackArgs& args,
                                              bool fromTrigger) {
    // Must call without holding the lock.
    auto prepareResult = _prepareCommand();
    stdx::lock_guard<stdx::mutex> lk(_mutex);
    _status = prepareResult.getStatus();
    if (!_status.isOK()) {
        _onShutdown_inlock();
        return;
    }
    _sendCommand_inlock(prepareResult.getValue(), _updatePositionTimeout);
    if (!_status.isOK()) {
        _onShutdown_inlock();
        return;
    }
    invariant(_remoteCommandCallbackHandle.isValid());
    _prepareAndSendCommandCallbackHandle = executor::TaskExecutor::CallbackHandle();
    _keepAliveTimeoutWhen = Date_t();
}
void Reporter::_sendCommand_inlock(BSONObj commandRequest, Milliseconds netTimeout) {
    LOG(2) << "Reporter sending slave oplog progress to upstream updater " << _target << ": "
           << commandRequest;
    auto scheduleResult = _executor->scheduleRemoteCommand(
        executor::RemoteCommandRequest(_target, "admin", commandRequest, nullptr, netTimeout),
        [this](const executor::TaskExecutor::RemoteCommandCallbackArgs& rcbd) {
            _processResponseCallback(rcbd);
        });
    _status = scheduleResult.getStatus();
    if (!_status.isOK()) {
        return;
    }
    _remoteCommandCallbackHandle = scheduleResult.getValue();
}
As the code above shows, essentially all of the work is delegated to executor::TaskExecutor* const _executor, with executor::TaskExecutor::scheduleRemoteCommand ultimately performing the call.
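That delegation pattern (schedule a callback, let a worker thread run it) can be approximated in a few lines. TinyExecutor below is a hypothetical stand-in, not the real executor::TaskExecutor, which also handles networking, cancellation, and timeouts:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Minimal sketch of the executor pattern: callers schedule callbacks and a
// single worker thread drains them in order. Both scheduleWork and
// scheduleRemoteCommand boil down to "queue a callback" in this view.
class TinyExecutor {
public:
    TinyExecutor() : _worker([this] { _run(); }) {}
    ~TinyExecutor() { shutdown(); }
    void schedule(std::function<void()> cb) {
        std::lock_guard<std::mutex> lk(_mutex);
        _queue.push_back(std::move(cb));
        _cond.notify_one();
    }
    // Drains all queued work, then stops the worker thread.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_shutdown) {
                return;
            }
            _shutdown = true;
            _cond.notify_one();
        }
        _worker.join();
    }

private:
    void _run() {
        for (;;) {
            std::function<void()> cb;
            {
                std::unique_lock<std::mutex> lk(_mutex);
                _cond.wait(lk, [this] { return _shutdown || !_queue.empty(); });
                if (_queue.empty()) {
                    return;  // shutdown requested and no pending work
                }
                cb = std::move(_queue.front());
                _queue.pop_front();
            }
            cb();  // run outside the lock, like callback execution in mongod
        }
    }
    std::mutex _mutex;
    std::condition_variable _cond;
    std::deque<std::function<void()>> _queue;
    bool _shutdown = false;
    std::thread _worker;  // must be declared last: _run() uses the fields above
};

// Schedules a "send command" step and a "process response" step in order.
int demoScheduleRemoteCommand() {
    int response = 0;
    TinyExecutor exec;
    exec.schedule([&response] { response = 1; });    // send the command
    exec.schedule([&response] { response += 10; });  // handle the response
    exec.shutdown();  // drains the queue before returning
    return response;
}
```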
What replSetUpdatePosition sends
The command is assembled by TopologyCoordinator::prepareReplSetUpdatePositionCommand, which mainly packages, for every replica-set member tracked in std::vector<MemberData> _memberData, the last applied optime, the last durable optime, and the rest of the member information, and sends it all to the sync source.
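For orientation, the sketch below builds a JSON-ish rendering of such a command, one entry per member. The field names follow what mongod logs of that era show, but treat the exact layout as an approximation rather than the authoritative format:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative only: per-member progress as carried by replSetUpdatePosition.
// appliedTs/durableTs are simplified stand-ins for full optime documents.
struct MemberProgress {
    int memberId;
    long long appliedTs;  // last applied optime (simplified)
    long long durableTs;  // last durable optime (simplified)
};

// Renders an approximation of the command document as a string.
std::string buildUpdatePositionCommand(const std::vector<MemberProgress>& members) {
    std::ostringstream out;
    out << "{ replSetUpdatePosition: 1, optimes: [";
    for (size_t i = 0; i < members.size(); ++i) {
        if (i) {
            out << ",";
        }
        out << " { memberId: " << members[i].memberId
            << ", appliedOpTime: " << members[i].appliedTs
            << ", durableOpTime: " << members[i].durableTs << " }";
    }
    out << " ] }";
    return out.str();
}
```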
Below is the command as printed in the log: