In a MongoDB replica set, nodes stay consistent by syncing and replaying the oplog. MongoDB uses a pull model: each node actively fetches the oplog from its sync source, and the pulling node must promptly report its latest synced point back upstream.
As shown in the figure above, two Secondaries pull the oplog from the Primary; whenever a Secondary's synced position changes, it sends a replSetUpdatePosition command to tell its sync source about the new position.
Inside mongod there is a dedicated thread named SyncSourceFeedback that reports the current node's progress to its sync source. The Primary itself does not need it, because it does not sync data from any other node; nodes that store no data, such as Arbiters, do not need it either. Two classes carry out this task, SyncSourceFeedback and Reporter, and their call relationship is shown in the figure below:
SyncSourceFeedback
SyncSourceFeedback is responsible for:
- deciding whether the node needs to report its position to its sync source;
- reacting to role transitions: for example, once a node steps up from secondary to primary, it no longer needs to report its position;
- reacting to sync-source changes: for example, switching from syncing from node A to syncing from node B;
- invoking Reporter to report the position; the actual reporting work is delegated to Reporter.
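The first decision above (report or not, depending on role and sync source) can be sketched as a small predicate. The types and names below are illustrative stand-ins, not MongoDB's real MemberState or HostAndPort:

```cpp
#include <string>

// Simplified stand-ins for MongoDB's MemberState and HostAndPort;
// the names are illustrative, not the real API.
enum class MemberState { Startup, Secondary, Primary, Arbiter };

// A node reports its position only when it is actually replicating:
// a primary and an arbiter have nothing to report, and a node without
// a sync source has nowhere to send the update.
bool shouldReportPosition(MemberState state, const std::string& syncSource) {
    if (state == MemberState::Primary || state == MemberState::Arbiter ||
        state == MemberState::Startup) {
        return false;
    }
    return !syncSource.empty();
}
```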
void SyncSourceFeedback::run(executor::TaskExecutor* executor,
                             BackgroundSync* bgsync,
                             ReplicationCoordinator* replCoord) {
    Client::initThread("SyncSourceFeedback");
    HostAndPort syncTarget;
    // keepAliveInterval indicates how frequently to forward progress in the absence of updates.
    Milliseconds keepAliveInterval(0);
    while (true) {  // breaks once _shutdownSignaled is true
        // Wait until the position changes or shutdown is signaled; on a
        // keep-alive timeout, fall through anyway unless the node is a
        // primary or still starting up.
        {
            stdx::unique_lock<stdx::mutex> lock(_mtx);
            while (!_positionChanged && !_shutdownSignaled) {
                if (_cond.wait_for(lock, keepAliveInterval.toSystemDuration()) ==
                    stdx::cv_status::timeout) {
                    MemberState state = replCoord->getMemberState();
                    if (!(state.primary() || state.startup())) {
                        break;
                    }
                }
            }
            // Exit the thread if shutdown has been signaled.
            if (_shutdownSignaled) {
                break;
            }
            _positionChanged = false;
        }
        // A primary (or a node still starting up) has nothing to report.
        {
            stdx::lock_guard<stdx::mutex> lock(_mtx);
            MemberState state = replCoord->getMemberState();
            if (state.primary() || state.startup()) {
                continue;
            }
        }
        // Has the sync source changed?
        const HostAndPort target = bgsync->getSyncTarget();
        if (target.empty()) {
            if (syncTarget != target) {
                syncTarget = target;
            }
            // Loop back around again; the keepalive functionality will cause us to retry.
            continue;
        }
        if (syncTarget != target) {
            LOG(1) << "setting syncSourceFeedback to " << target;
            syncTarget = target;
        }
        // Create a Reporter bound to the current sync source.
        Reporter reporter(executor,
                          makePrepareReplSetUpdatePositionCommandFn(replCoord, syncTarget, bgsync),
                          syncTarget,
                          keepAliveInterval,
                          syncSourceFeedbackNetworkTimeoutSecs);
        {
            stdx::lock_guard<stdx::mutex> lock(_mtx);
            if (_shutdownSignaled) {
                break;
            }
            _reporter = &reporter;
        }
        // Report the position upstream.
        auto status = _updateUpstream(&reporter);
    }
}
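The wait-for-update handshake at the top of run() can be shown as a self-contained sketch. PositionWaiter and its member names below are illustrative, not the real implementation:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal sketch of the wait pattern in SyncSourceFeedback::run(): one
// thread blocks until either the applied position changes or shutdown is
// signaled (the keep-alive timeout is omitted for brevity).
class PositionWaiter {
public:
    // Returns true if woken by a position change, false if shut down.
    bool waitForUpdate() {
        std::unique_lock<std::mutex> lock(_mtx);
        _cond.wait(lock, [this] { return _positionChanged || _shutdownSignaled; });
        if (_shutdownSignaled) {
            return false;
        }
        _positionChanged = false;  // consume the update, as run() does
        return true;
    }
    void notifyPositionChanged() {
        std::lock_guard<std::mutex> lock(_mtx);
        _positionChanged = true;
        _cond.notify_all();
    }
    void shutdown() {
        std::lock_guard<std::mutex> lock(_mtx);
        _shutdownSignaled = true;
        _cond.notify_all();
    }

private:
    std::mutex _mtx;
    std::condition_variable _cond;
    bool _positionChanged = false;
    bool _shutdownSignaled = false;
};

// Drives one waiter through a position update followed by a shutdown.
bool demoUpdateThenShutdown() {
    PositionWaiter w;
    std::thread t([&w] { w.notifyPositionChanged(); });
    bool gotUpdate = w.waitForUpdate();  // woken by the update
    t.join();
    w.shutdown();
    bool afterShutdown = w.waitForUpdate();  // false: shutdown signaled
    return gotUpdate && !afterShutdown;
}
```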
Status SyncSourceFeedback::_updateUpstream(Reporter* reporter) {
    auto syncTarget = reporter->getTarget();
    auto triggerStatus = reporter->trigger();
    if (!triggerStatus.isOK()) {
        warning() << "unable to schedule reporter to update replication progress on " << syncTarget
                  << ": " << triggerStatus;
        return triggerStatus;
    }
    auto status = reporter->join();
    if (!status.isOK()) {
        log() << "SyncSourceFeedback error sending update to " << syncTarget << ": " << status;
    }
    // Sync source blacklisting will be done in BackgroundSync and SyncSourceResolver.
    return status;
}
Reporter
Reporter mainly drives executor::TaskExecutor through the command's lifecycle: building the request, scheduling it, and handling the response in a callback.
The command itself is built by TopologyCoordinator::prepareReplSetUpdatePositionCommand; Reporter::trigger() kicks off a command and Reporter::join() waits for it to complete.
Status Reporter::join() {
    stdx::unique_lock<stdx::mutex> lk(_mutex);
    _condition.wait(lk, [this]() { return !_isActive_inlock(); });
    return _status;
}
Status Reporter::trigger() {
    stdx::lock_guard<stdx::mutex> lk(_mutex);
    if (_keepAliveTimeoutWhen != Date_t()) {
        // Reset keep alive expiration to signal handler that it was canceled internally.
        invariant(_prepareAndSendCommandCallbackHandle.isValid());
        _keepAliveTimeoutWhen = Date_t();
        _executor->cancel(_prepareAndSendCommandCallbackHandle);
        return Status::OK();
    } else if (_isActive_inlock()) {
        _isWaitingToSendReporter = true;
        return Status::OK();
    }
    auto scheduleResult =
        _executor->scheduleWork([=](const executor::TaskExecutor::CallbackArgs& args) {
            _prepareAndSendCommandCallback(args, true);
        });
    _status = scheduleResult.getStatus();
    if (!_status.isOK()) {
        return _status;
    }
    _prepareAndSendCommandCallbackHandle = scheduleResult.getValue();
    return _status;
}
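The trigger()/join() contract can be sketched independently of MongoDB's executor. TinyReporter, demoReport, and the use of a plain std::thread below are illustrative assumptions; the real Reporter also coalesces concurrent triggers and handles keep-alives, which are omitted here:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Minimal sketch of the contract: trigger() schedules one unit of
// asynchronous work; join() blocks until that work has finished.
class TinyReporter {
public:
    void trigger(std::function<void()> work) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_isActive) {
                return;  // already running; the real code coalesces instead
            }
            _isActive = true;
        }
        // A plain thread stands in for executor::TaskExecutor::scheduleWork.
        _worker = std::thread([this, work] {
            work();
            std::lock_guard<std::mutex> lk(_mutex);
            _isActive = false;
            _condition.notify_all();
        });
    }
    void join() {
        {
            std::unique_lock<std::mutex> lk(_mutex);
            _condition.wait(lk, [this] { return !_isActive; });
        }
        if (_worker.joinable()) {
            _worker.join();
        }
    }

private:
    std::mutex _mutex;
    std::condition_variable _condition;
    bool _isActive = false;
    std::thread _worker;
};

// One round trip: trigger a "report", then join and observe the result.
int demoReport() {
    TinyReporter r;
    int reported = 0;
    r.trigger([&reported] { reported = 42; });
    r.join();
    return reported;
}
```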
void Reporter::_prepareAndSendCommandCallback(const executor::TaskExecutor::CallbackArgs& args,
                                              bool fromTrigger) {
    // Must call without holding the lock.
    auto prepareResult = _prepareCommand();
    stdx::lock_guard<stdx::mutex> lk(_mutex);
    _status = prepareResult.getStatus();
    if (!_status.isOK()) {
        _onShutdown_inlock();
        return;
    }
    _sendCommand_inlock(prepareResult.getValue(), _updatePositionTimeout);
    if (!_status.isOK()) {
        _onShutdown_inlock();
        return;
    }
    invariant(_remoteCommandCallbackHandle.isValid());
    _prepareAndSendCommandCallbackHandle = executor::TaskExecutor::CallbackHandle();
    _keepAliveTimeoutWhen = Date_t();
}
void Reporter::_sendCommand_inlock(BSONObj commandRequest, Milliseconds netTimeout) {
    LOG(2) << "Reporter sending slave oplog progress to upstream updater " << _target << ": "
           << commandRequest;
    auto scheduleResult = _executor->scheduleRemoteCommand(
        executor::RemoteCommandRequest(_target, "admin", commandRequest, nullptr, netTimeout),
        [this](const executor::TaskExecutor::RemoteCommandCallbackArgs& rcbd) {
            _processResponseCallback(rcbd);
        });
    _status = scheduleResult.getStatus();
    if (!_status.isOK()) {
        return;
    }
    _remoteCommandCallbackHandle = scheduleResult.getValue();
}
As the code above shows, essentially all of the work is delegated to executor::TaskExecutor* const _executor, with executor::TaskExecutor::scheduleRemoteCommand ultimately performing the call.
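That delegation pattern (schedule a callback, let a worker thread run it) can be approximated in a few lines. TinyExecutor below is a hypothetical stand-in, not the real executor::TaskExecutor, which also handles networking, cancellation, and timeouts:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Minimal sketch of the executor pattern: callers schedule callbacks and a
// single worker thread drains them in order. Both scheduleWork and
// scheduleRemoteCommand boil down to "queue a callback" in this view.
class TinyExecutor {
public:
    TinyExecutor() : _worker([this] { _run(); }) {}
    ~TinyExecutor() { shutdown(); }
    void schedule(std::function<void()> cb) {
        std::lock_guard<std::mutex> lk(_mutex);
        _queue.push_back(std::move(cb));
        _cond.notify_one();
    }
    // Drains all queued work, then stops the worker thread.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_shutdown) {
                return;
            }
            _shutdown = true;
            _cond.notify_one();
        }
        _worker.join();
    }

private:
    void _run() {
        for (;;) {
            std::function<void()> cb;
            {
                std::unique_lock<std::mutex> lk(_mutex);
                _cond.wait(lk, [this] { return _shutdown || !_queue.empty(); });
                if (_queue.empty()) {
                    return;  // shutdown requested and no pending work
                }
                cb = std::move(_queue.front());
                _queue.pop_front();
            }
            cb();  // run outside the lock, like callback execution in mongod
        }
    }
    std::mutex _mutex;
    std::condition_variable _cond;
    std::deque<std::function<void()>> _queue;
    bool _shutdown = false;
    std::thread _worker;  // must be declared last: _run() uses the fields above
};

// Schedules a "send command" step and a "process response" step in order.
int demoScheduleRemoteCommand() {
    int response = 0;
    TinyExecutor exec;
    exec.schedule([&response] { response = 1; });    // send the command
    exec.schedule([&response] { response += 10; });  // handle the response
    exec.shutdown();  // drains the queue before returning
    return response;
}
```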
What replSetUpdatePosition sends
The command is assembled by TopologyCoordinator::prepareReplSetUpdatePositionCommand, which mainly packages, for every replica-set member tracked in std::vector<MemberData> _memberData, the last applied optime, the last durable optime, and the rest of the member information, and sends it all to the sync source.
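For orientation, the sketch below builds a JSON-ish rendering of such a command, one entry per member. The field names follow what mongod logs of that era show, but treat the exact layout as an approximation rather than the authoritative format:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Illustrative only: per-member progress as carried by replSetUpdatePosition.
// appliedTs/durableTs are simplified stand-ins for full optime documents.
struct MemberProgress {
    int memberId;
    long long appliedTs;  // last applied optime (simplified)
    long long durableTs;  // last durable optime (simplified)
};

// Renders an approximation of the command document as a string.
std::string buildUpdatePositionCommand(const std::vector<MemberProgress>& members) {
    std::ostringstream out;
    out << "{ replSetUpdatePosition: 1, optimes: [";
    for (size_t i = 0; i < members.size(); ++i) {
        if (i) {
            out << ",";
        }
        out << " { memberId: " << members[i].memberId
            << ", appliedOpTime: " << members[i].appliedTs
            << ", durableOpTime: " << members[i].durableTs << " }";
    }
    out << " ] }";
    return out.str();
}
```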
Below is the command as printed in the log: