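A Quorum request is a request that is forwarded to the Leader for processing and must be acknowledged by a quorum of Followers. These requests include:
1) znode write operations (OpCode.create, OpCode.delete, OpCode.setData, OpCode.setACL);
2) Session creation and close operations (OpCode.createSession and OpCode.closeSession);
3) the OpCode.multi operation.
This post analyzes how the Client, Follower, and Leader cooperate to complete a Quorum request. Note that an OpCode.sync request must also be forwarded to the Leader, but it does not need to be acknowledged by a quorum of Followers. The post covers the OpCode.sync operation as well.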
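The classification above can be sketched as a small lookup. This is a minimal illustration, not ZooKeeper code: the `Op` enum and the `RequestRouting` class are hypothetical stand-ins for the OpCode constants, and `GET_DATA` and `PING` are included only as examples of operations that stay local.

```java
import java.util.EnumSet;
import java.util.Set;

class RequestRouting {
    // Hypothetical stand-in for ZooKeeper's OpCode constants.
    enum Op { CREATE, DELETE, SET_DATA, SET_ACL, CREATE_SESSION, CLOSE_SESSION, MULTI, SYNC, GET_DATA, PING }

    // Operations that must be acknowledged by a quorum of Followers.
    static final Set<Op> QUORUM_OPS = EnumSet.of(
            Op.CREATE, Op.DELETE, Op.SET_DATA, Op.SET_ACL,
            Op.CREATE_SESSION, Op.CLOSE_SESSION, Op.MULTI);

    static boolean needsQuorumAck(Op op) {
        return QUORUM_OPS.contains(op);
    }

    // sync is forwarded to the Leader too, but without a quorum acknowledgement.
    static boolean forwardedToLeader(Op op) {
        return needsQuorumAck(op) || op == Op.SYNC;
    }

    public static void main(String[] args) {
        for (Op op : Op.values()) {
            System.out.println(op + " forwarded=" + forwardedToLeader(op)
                    + " quorumAck=" + needsQuorumAck(op));
        }
    }
}
```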
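Data structures
Request objects: the data structure passed around inside a Server.
Attribute | Description
sessionId | Session ID
cxid | Client transaction ID
type | Operation type, e.g. OpCode.setData
request | Request Record (in serialized form), e.g. SetDataRequest
cnxn | Connection object between the Server and the Client
hdr | Transaction header of the request (TxnHeader)
txn | Transaction body of the request (Record); for an OpCode.setData request this is a SetDataTxn object
zxid | ZooKeeper transaction ID
authInfo | Authentication information
createTime | Creation time
owner | Owner
e | Exception raised during processing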
QuorumPacket objects: the packets exchanged between ZooKeeper servers.
Attribute | Description
type | QuorumPacket type, e.g. Leader.REQUEST or Leader.ACK
zxid | ZooKeeper transaction ID
data | Packet payload. In Leader.REQUEST, in order: Request.sessionId, Request.cxid, Request.type, Request.request. In Leader.PROPOSAL, in order: Request.hdr, Request.txn. In Leader.ACK: null. In Leader.COMMIT: null.
authinfo | Authentication information
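To make the Leader.REQUEST payload order concrete, here is a minimal sketch that packs and unpacks the four fields with a plain ByteBuffer. The field widths (long, int, int) follow the reads in the LearnerHandler listing later in this post; the class name, the placeholder body bytes, and the type value 5 are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RequestPayloadSketch {
    // Packs the fields in the order listed for Leader.REQUEST:
    // sessionId (long), cxid (int), type (int), then the serialized request bytes.
    static byte[] encode(long sessionId, int cxid, int type, byte[] request) {
        ByteBuffer bb = ByteBuffer.allocate(8 + 4 + 4 + request.length);
        bb.putLong(sessionId).putInt(cxid).putInt(type).put(request);
        return bb.array();
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a serialized request record.
        byte[] body = "serialized request".getBytes(StandardCharsets.UTF_8);
        byte[] data = encode(0x1234L, 7, 5, body); // the type value 5 is illustrative

        // Read the fields back in the same order.
        ByteBuffer bb = ByteBuffer.wrap(data);
        long sessionId = bb.getLong();
        int cxid = bb.getInt();
        int type = bb.getInt();
        ByteBuffer rest = bb.slice(); // remaining bytes: the request body
        byte[] decoded = new byte[rest.remaining()];
        rest.get(decoded);
        System.out.println(Long.toHexString(sessionId) + " " + cxid + " " + type + " "
                + new String(decoded, StandardCharsets.UTF_8));
    }
}
```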
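Quorum request flow
Assume the topology shown in the figure below, in which Client A is connected to Follower A.
The data flow is shown in the diagram below. In the diagram, the number in front of each connector label gives the order in which the events occur. The main sequence uses a bare number, and a smaller number means an earlier event (for example, 1 Client Request happens before 2 Request). Operations that run concurrently with the main sequence are labelled with the main sequence number followed by a number in parentheses; for example, 7(1)-n Request runs concurrently with 7 Request, and the n in 7(1)-n gives the order among the operations that start with 7(1).
We start from Step 1 of the data flow diagram: Client A sends a Quorum request to Follower A.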
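【Client A, Step 1】Client A calls the method that corresponds to the Quorum request:
calling the ZooKeeper constructor issues an OpCode.createSession request;
calling ZooKeeper.setData issues an OpCode.setData operation.
Either way, ClientCnxn.submitRequest is eventually called. It puts the request on the outgoingQueue and blocks until Follower A replies. The ClientCnxn.SendThread thread takes the request from outgoingQueue and sends it to Follower A.
The ZooKeeper.setData method below shows how Client A builds the objects that are sent to Follower A: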
public Stat setData(final String path, byte data[], int version)
        throws KeeperException, InterruptedException {
    final String clientPath = path;
    PathUtils.validatePath(clientPath);
    // Build the full server-side path from the path passed in
    final String serverPath = prependChroot(clientPath);
    // Build the request header
    RequestHeader h = new RequestHeader();
    // Set the type to setData
    h.setType(ZooDefs.OpCode.setData);
    // Build the SetData request body
    SetDataRequest request = new SetDataRequest();
    // Set the serverPath of the node to modify
    request.setPath(serverPath);
    // Set the new data for the node
    request.setData(data);
    // Set the expected version of the node
    request.setVersion(version);
    // Build the SetDataResponse object
    SetDataResponse response = new SetDataResponse();
    // Submit the request and wait for the result
    ReplyHeader r = cnxn.submitRequest(h, request, response, null);
    // A non-zero r.getErr() indicates an error, so throw an exception
    if (r.getErr() != 0) {
        throw KeeperException.create(KeeperException.Code.get(r.getErr()),
                clientPath);
    }
    return response.getStat();
}
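The blocking hand-off between the calling thread and the send thread described in Step 1 can be sketched as follows. This is a simplified model under stated assumptions, not the ClientCnxn implementation: the `Packet` class, the `sendOne` method, and the string reply are hypothetical, and the "server" is simulated by filling in the reply locally.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SubmitRequestSketch {
    // Hypothetical packet: a request plus a slot for the reply.
    static class Packet {
        final String request;
        String reply;
        boolean finished;
        Packet(String request) { this.request = request; }
    }

    final BlockingQueue<Packet> outgoingQueue = new LinkedBlockingQueue<>();

    // Caller side: enqueue the packet, then block until the sender marks it finished.
    String submitRequest(String request) {
        Packet p = new Packet(request);
        outgoingQueue.add(p);
        synchronized (p) {
            while (!p.finished) {
                try {
                    p.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for reply", e);
                }
            }
        }
        return p.reply;
    }

    // Sender side: take one packet, "send" it, and wake the waiting caller.
    void sendOne() {
        Packet p;
        try {
            p = outgoingQueue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        synchronized (p) {
            p.reply = "ok:" + p.request; // stands in for the server's response
            p.finished = true;
            p.notifyAll();
        }
    }

    public static void main(String[] args) {
        SubmitRequestSketch cnxn = new SubmitRequestSketch();
        Thread sendThread = new Thread(cnxn::sendOne);
        sendThread.start();
        System.out.println(cnxn.submitRequest("setData /a")); // prints "ok:setData /a"
    }
}
```

The `finished` flag is checked in a loop under the packet's monitor, so the caller does not hang if the sender completes before the caller starts waiting.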
【Follower A, Step 2,3】The NIOServerCnxn class of Follower A receives Client A's request and calls ZooKeeperServer.processPacket. That method builds a Request object and calls processRequest on the first processor, FollowerRequestProcessor, which puts the Request object on the FollowerRequestProcessor.queuedRequests queue. The FollowerRequestProcessor thread loops, taking Request objects from that queue and performing the following steps for each one:
1) It calls processRequest on the next processor, CommitProcessor, which puts the Request object on the CommitProcessor.queuedRequests queue.
2) It checks Request.type. For a Quorum request it calls Learner.request(request) directly, which wraps the Request object in a Leader.REQUEST Quorum packet and sends it to the Leader.
An OpCode.sync operation is also forwarded to the Leader through Learner.request, but the Request object is first added to the pendingSyncs queue.
The FollowerRequestProcessor.run method is as follows:
public void run() {
    try {
        while (!finished) {
            // Take a Request object from the queuedRequests queue
            Request request = queuedRequests.take();
            if (LOG.isTraceEnabled()) {
                ZooTrace.logRequest(LOG, ZooTrace.CLIENT_REQUEST_TRACE_MASK,
                        'F', request, "");
            }
            // Request.requestOfDeath is a poison pill: leave the while loop
            // and end the FollowerRequestProcessor thread
            if (request == Request.requestOfDeath) {
                break;
            }
            // Pass the request to the next processor before submitting it to
            // the leader, so that we are ready to receive the Leader's response
            nextProcessor.processRequest(request);
            // Only Quorum operations and sync call Follower.request, which
            // forwards a Leader.REQUEST packet to the Leader.
            // sync differs slightly from the Quorum operations: we need to keep
            // track of the fact that this Follower has a sync pending, so the
            // request is added to pendingSyncs first.
            switch (request.type) {
            case OpCode.sync:
                // Add the OpCode.sync request to the pendingSyncs queue
                zks.pendingSyncs.add(request);
                zks.getFollower().request(request);
                break;
            case OpCode.create:
            case OpCode.delete:
            case OpCode.setData:
            case OpCode.setACL:
            case OpCode.createSession:
            case OpCode.closeSession:
            case OpCode.multi:
                // Quorum request: call Follower.request directly
                zks.getFollower().request(request);
                break;
            }
        }
    } catch (Exception e) {
        LOG.error("Unexpected exception causing exit", e);
    }
    LOG.info("FollowerRequestProcessor exited loop!");
}
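The requestOfDeath check in the loop above is a poison-pill shutdown: a sentinel object is placed on the queue and recognized by identity. A minimal sketch of the pattern, with a hypothetical `Request` class and a list standing in for the next processor:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

class PoisonPillSketch {
    // Hypothetical request type; only the identity of the sentinel matters here.
    static final class Request {
        final String name;
        Request(String name) { this.name = name; }
    }

    // Sentinel compared by reference, like Request.requestOfDeath in the listing.
    static final Request REQUEST_OF_DEATH = new Request("requestOfDeath");

    final LinkedBlockingQueue<Request> queuedRequests = new LinkedBlockingQueue<>();
    final List<String> forwarded = new ArrayList<>();

    // Drains the queue until the sentinel arrives, then returns.
    void run() {
        try {
            while (true) {
                Request request = queuedRequests.take();
                if (request == REQUEST_OF_DEATH) {
                    break;
                }
                forwarded.add(request.name); // stands in for nextProcessor.processRequest(request)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PoisonPillSketch p = new PoisonPillSketch();
        p.queuedRequests.add(new Request("setData"));
        p.queuedRequests.add(new Request("create"));
        p.queuedRequests.add(REQUEST_OF_DEATH);
        p.queuedRequests.add(new Request("never processed"));
        p.run();
        System.out.println(p.forwarded); // prints [setData, create]
    }
}
```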
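【Leader A, Step 4】The LearnerHandler thread of Leader A loops, reading the Quorum packets received from its Learner. If a packet is of type Leader.REQUEST, the thread parses the packet contents and checks the operation type.
If the operation type is not OpCode.sync, it builds a Request object and calls ZooKeeperServer.submitRequest (the same submitRequest method that a Follower uses when it receives a request). This eventually calls processRequest on the first processor, PrepRequestProcessor, which puts the Request object on the PrepRequestProcessor.submittedRequests queue.
If the operation type is OpCode.sync, it builds a LearnerSyncRequest object (a subclass of Request) and hands it to PrepRequestProcessor in the same way.
The code in LearnerHandler.run that handles Leader.REQUEST packets is as follows: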
public void run() {
    ......
    case Leader.REQUEST:
        bb = ByteBuffer.wrap(qp.getData());
        // Read the sessionId from the QuorumPacket
        sessionId = bb.getLong();
        // Read the cxid from the QuorumPacket
        cxid = bb.getInt();
        // Read the operation type from the QuorumPacket
        type = bb.getInt();
        bb = bb.slice();
        Request si;
        // If the operation type is OpCode.sync, build a LearnerSyncRequest object
        if (type == OpCode.sync) {
            si = new LearnerSyncRequest(this, sessionId, cxid, type,
                    bb, qp.getAuthinfo());
        }
        // Otherwise build a plain Request object
        else {
            si = new Request(null, sessionId, cxid, type,
                    bb, qp.getAuthinfo());
        }
        // Set the owner
        si.setOwner(this);
        // Submit the request
        leader.zk.submitRequest(si);
        break;
    ......
}
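The PrepRequestProcessor thread takes Request objects from the PrepRequestProcessor.submittedRequests queue, builds a TxnHeader and a Record according to the request type, and assigns them to Request.hdr and Request.txn. It then calls processRequest on the next processor, ProposalRequestProcessor, to hand the Request object over. (If an exception occurs, it creates an error Record instead.)
The PrepRequestProcessor.run method is as follows: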
public void run() {
    try {
        while (true) {
            // Take the first request object from the submittedRequests queue
            Request request = submittedRequests.take();
            long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
            // For an OpCode.ping operation, use ZooTrace.CLIENT_PING_TRACE_MASK instead
            if (request.type == OpCode.ping) {
                traceMask = ZooTrace.CLIENT_PING_TRACE_MASK;
            }
            if (LOG.isTraceEnabled()) {
                ZooTrace.logRequest(LOG, traceMask, 'P', request, "");
            }
            // If the request is requestOfDeath, leave the while loop
            if (Request.requestOfDeath == request) {
                break;
            }
            // Process the request
            pRequest(request);
        }
    } catch (InterruptedException e) {
        LOG.error("Unexpected interruption", e);
    } catch (RequestProcessorException e) {
        if (e.getCause() instanceof XidRolloverException) {
            LOG.info(e.getCause().getMessage());
        }
        LOG.error("Unexpected exception", e);
    } catch (Exception e) {
        LOG.error("Unexpected exception", e);
    }
    LOG.info("PrepRequestProcessor exited loop!");
}
The PrepRequestProcessor.pRequest2Txn method, which is called from pRequest, builds the TxnHeader and Record objects. The code that handles an OpCode.setData request is shown below:
protected void pRequest2Txn(int type, long zxid, Request request, Record record, boolean deserialize)
        throws KeeperException, IOException, RequestProcessorException {
    request.hdr = new TxnHeader(request.sessionId, request.cxid, zxid,
            zks.getTime(), type);
    switch (type) {
    ......
    case OpCode.setData:
        // Check the session
        zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
        // Cast the record to SetDataRequest
        SetDataRequest setDataRequest = (SetDataRequest) record;
        if (deserialize) {
            // Deserialize the bytes in Request.request into the setDataRequest object
            ByteBufferInputStream.byteBuffer2Record(request.request, setDataRequest);
        }
        // Get the path of the znode to modify
        path = setDataRequest.getPath();
        // Look up the in-memory record of the znode for this path
        nodeRecord = getRecordForPath(path);
        // Check that the client has write permission on the znode
        checkACL(zks, nodeRecord.acl, ZooDefs.Perms.WRITE,
                request.authInfo);
        // Get the version supplied by the client
        version = setDataRequest.getVersion();
        // Get the current version of the node
        int currentVersion = nodeRecord.stat.getVersion();
        // If the client's version is not -1 and differs from the current version,
        // throw KeeperException.BadVersionException
        if (version != -1 && version != currentVersion) {
            throw new KeeperException.BadVersionException(path);
        }
        // The new version is the current version plus 1
        version = currentVersion + 1;
        // Build the SetDataTxn object and assign it to request.txn
        request.txn = new SetDataTxn(path,
                setDataRequest.getData(), version);
        // Copy the nodeRecord
        nodeRecord = nodeRecord.duplicate(request.hdr.getZxid());
        // Set the version of the copied nodeRecord to the new version
        nodeRecord.stat.setVersion(version);
        // Add the nodeRecord to outstandingChanges and
        // map its path to it in outstandingChangesForPath
        addChangeRecord(nodeRecord);
        break;
    ......
    }
}
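The version check in the setData branch is an optimistic concurrency test: -1 means "any version", any other value must equal the current version, and a successful write bumps the version by one. A minimal sketch of just that rule, with a hypothetical unchecked exception standing in for KeeperException.BadVersionException:

```java
class VersionCheckSketch {
    // Hypothetical stand-in for KeeperException.BadVersionException.
    static class BadVersionException extends RuntimeException {
        BadVersionException(String path) { super("bad version for " + path); }
    }

    // Returns the version the znode will have after the write.
    // expectedVersion == -1 skips the comparison.
    static int nextVersion(String path, int expectedVersion, int currentVersion) {
        if (expectedVersion != -1 && expectedVersion != currentVersion) {
            throw new BadVersionException(path);
        }
        return currentVersion + 1;
    }

    public static void main(String[] args) {
        System.out.println(nextVersion("/a", -1, 3)); // prints 4
        System.out.println(nextVersion("/a", 3, 3));  // prints 4
        try {
            nextVersion("/a", 2, 3);
        } catch (BadVersionException e) {
            System.out.println(e.getMessage());       // prints "bad version for /a"
        }
    }
}
```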
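【Leader A, Step 5,6】The ProposalRequestProcessor first checks whether the Request object is a LearnerSyncRequest.
If it is not a LearnerSyncRequest (that is, it is a Quorum request), the processor proceeds as follows:
1) It calls processRequest on the next processor, CommitProcessor, which puts the Request object on the CommitProcessor.queuedRequests queue.
2) It sends the proposal to all Followers.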
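3) It calls processRequest on the SyncRequestProcessor, which puts the request on the SyncRequestProcessor.queuedRequests queue. (【Leader A, Step 7(1)】The SyncRequestProcessor thread writes the request to the log and then passes it to AckRequestProcessor, which records the Leader's own ACK by calling Leader.processAck directly instead of sending a Leader.ACK packet over the network.)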
If it is a LearnerSyncRequest, the request is an OpCode.sync operation, and the processor calls Leader.processSync directly.
The ProposalRequestProcessor.processRequest method is as follows:
public void processRequest(Request request) throws RequestProcessorException {
    // For a sync operation, call Leader.processSync
    if (request instanceof LearnerSyncRequest) {
        zks.getLeader().processSync((LearnerSyncRequest) request);
    }
    // Otherwise
    else {
        // Pass the request to the next processor
        nextProcessor.processRequest(request);
        if (request.hdr != null) {
            // We need to sync and get consensus on any transactions
            try {
                // Send the proposal to all followers
                zks.getLeader().propose(request);
            } catch (XidRolloverException e) {
                throw new RequestProcessorException(e.getMessage(), e);
            }
            // Call processRequest on the SyncRequestProcessor
            syncProcessor.processRequest(request);
        }
    }
}
The Leader.propose method is as follows:
/**
 * Create a Proposal and send it to all the members
 * @param request
 * @return the proposal that is queued to send to all the members
 */
public Proposal propose(Request request) throws XidRolloverException {
    // Handle zxid rollover: when the low 32 bits are exhausted, force a new
    // leader election, which starts a new epoch and resets them.
    // See ZOOKEEPER-1277
    if ((request.zxid & 0xffffffffL) == 0xffffffffL) {
        String msg =
                "zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start";
        shutdown(msg);
        throw new XidRolloverException(msg);
    }
    // Serialize request.hdr and request.txn into boa
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    BinaryOutputArchive boa = BinaryOutputArchive.getArchive(baos);
    try {
        request.hdr.serialize(boa, "hdr");
        if (request.txn != null) {
            request.txn.serialize(boa, "txn");
        }
        baos.close();
    } catch (IOException e) {
        LOG.warn("This really should be impossible", e);
    }
    // Build the Leader.PROPOSAL QuorumPacket
    QuorumPacket pp = new QuorumPacket(Leader.PROPOSAL, request.zxid,
            baos.toByteArray(), null);
    // Build the Proposal object
    Proposal p = new Proposal();
    p.packet = pp;
    p.request = request;
    synchronized (this) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Proposing:: " + request);
        }
        // Remember the packet's zxid as the last proposed zxid
        lastProposed = p.packet.getZxid();
        // Add p to the outstandingProposals map of unfinished proposals
        outstandingProposals.put(lastProposed, p);
        // Send the packet to all Followers
        sendPacket(pp);
    }
    return p;
}
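The bit masks on the zxid in propose (and in processAck and logRequest further down) make more sense once the zxid layout is spelled out: the high 32 bits hold the epoch and the low 32 bits hold a per-epoch counter. The helper names below are hypothetical and exist only to illustrate the masks used in the listings.

```java
class ZxidSketch {
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Mirrors the rollover test in propose(): the counter has no values left.
    static boolean counterExhausted(long zxid) {
        return (zxid & 0xffffffffL) == 0xffffffffL;
    }

    // Mirrors the test in processAck() and logRequest(): the low 32 bits are all zero.
    static boolean lowBitsZero(long zxid) {
        return (zxid & 0xffffffffL) == 0;
    }

    public static void main(String[] args) {
        long zxid = (5L << 32) | 0xffffffffL;
        System.out.println("zxid=0x" + Long.toHexString(zxid)
                + " epoch=" + epochOf(zxid)
                + " counter=0x" + Long.toHexString(counterOf(zxid))
                + " exhausted=" + counterExhausted(zxid));
    }
}
```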
The Follower.processPacket method is as follows:
/**
 * Examine the packet received in qp and dispatch based on its contents
 * @param qp
 * @throws IOException
 */
protected void processPacket(QuorumPacket qp) throws IOException {
    switch (qp.getType()) {
    case Leader.PING:
        ping(qp);
        break;
    case Leader.PROPOSAL:
        TxnHeader hdr = new TxnHeader();
        // Deserialize the txn from the packet qp
        Record txn = SerializeUtils.deserializeTxn(qp.getData(), hdr);
        if (hdr.getZxid() != lastQueued + 1) {
            LOG.warn("Got zxid 0x"
                    + Long.toHexString(hdr.getZxid())
                    + " expected 0x"
                    + Long.toHexString(lastQueued + 1));
        }
        lastQueued = hdr.getZxid();
        fzk.logRequest(hdr, txn);
        break;
    case Leader.COMMIT:
        fzk.commit(qp.getZxid());
        break;
    case Leader.UPTODATE:
        LOG.error("Received an UPTODATE message after Follower started");
        break;
    case Leader.REVALIDATE:
        revalidate(qp);
        break;
    case Leader.SYNC:
        fzk.sync();
        break;
    }
}
The FollowerZooKeeperServer.logRequest method is as follows:
public void logRequest(TxnHeader hdr, Record txn) {
    // Build the Request object
    Request request = new Request(null, hdr.getClientId(), hdr.getCxid(),
            hdr.getType(), null, null);
    request.hdr = hdr;
    request.txn = txn;
    request.zxid = hdr.getZxid();
    // If the low 32 bits of request.zxid are not all 0, add it to the pendingTxns queue
    if ((request.zxid & 0xffffffffL) != 0) {
        pendingTxns.add(request);
    }
    // Hand the request to the SyncRequestProcessor
    syncProcessor.processRequest(request);
}
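【All Followers, Step 8】The SyncRequestProcessor here does the same job as the Leader's SyncRequestProcessor: it writes the request to the log and then passes the Request to the next processor. On a Follower, however, the next processor is SendAckRequestProcessor, which builds a Leader.ACK Quorum packet and sends it to the Leader.
The SendAckRequestProcessor.processRequest method is as follows: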
public void processRequest(Request si) {
    if (si.type != OpCode.sync) {
        // Build the Leader.ACK Quorum packet
        QuorumPacket qp = new QuorumPacket(Leader.ACK, si.hdr.getZxid(), null,
                null);
        try {
            // Send the Leader.ACK Quorum packet to the Leader
            learner.writePacket(qp, false);
        } catch (IOException e) {
            LOG.warn("Closing connection to leader, exception during packet send", e);
            try {
                if (!learner.sock.isClosed()) {
                    learner.sock.close();
                }
            } catch (IOException e1) {
                // Nothing to do, we are shutting things down, so an exception here is irrelevant
                LOG.debug("Ignoring error closing the connection", e1);
            }
        }
    }
}
【Leader A, Step 9】The LearnerHandler thread loops, reading the Quorum packets received from its Learner. When it sees a Leader.ACK packet sent by a Follower, it calls Leader.processAck. In Leader.processAck, once a quorum of Followers has sent Leader.ACK for the proposal, the following three steps are executed:
1) Leader.commit is called to send a Leader.COMMIT Quorum packet to all Followers.
2) Leader.inform is called to notify all Observers.
3) CommitProcessor.commit is called to put the Request object on the CommitProcessor.committedRequests queue. (【Leader A, Step 10(1)-1,10(1)-2】The CommitProcessor thread takes the committed Request object from CommitProcessor.committedRequests, finds that it matches nextPending, copies the committed Request's contents into nextPending, and puts nextPending on the toProcess queue. On the next loop iteration it takes nextPending from toProcess and calls processRequest on the next processor, Leader.ToBeAppliedRequestProcessor, which in turn calls processRequest on FinalRequestProcessor. FinalRequestProcessor.processRequest then updates the in-memory Session information or znode data according to the operation in the Request object.)
The Leader.processAck method is as follows:
/**
 * Keep a count of the acks received for a particular proposal
 *
 * @param zxid
 *            the zxid of the proposal that was sent out
 * @param followerAddr
 */
synchronized public void processAck(long sid, long zxid, SocketAddress followerAddr) {
    if (LOG.isTraceEnabled()) {
        LOG.trace("Ack zxid: 0x{}", Long.toHexString(zxid));
        for (Proposal p : outstandingProposals.values()) {
            long packetZxid = p.packet.getZxid();
            LOG.trace("outstanding proposal: 0x{}",
                    Long.toHexString(packetZxid));
        }
        LOG.trace("outstanding proposals all");
    }
    // If the low 32 bits of the zxid are all 0, return immediately
    if ((zxid & 0xffffffffL) == 0) {
        /*
         * We no longer process NEWLEADER ack by this method. However,
         * the learner sends ack back to the leader after it gets UPTODATE
         * so we just ignore the message.
         */
        return;
    }
    // If there are no outstanding proposals, return immediately
    if (outstandingProposals.size() == 0) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("outstanding is 0");
        }
        return;
    }
    // If the last committed zxid is greater than or equal to the acked zxid,
    // the acked proposal has already been committed, so return immediately
    if (lastCommitted >= zxid) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("proposal has already been committed, pzxid: 0x{} zxid: 0x{}",
                    Long.toHexString(lastCommitted), Long.toHexString(zxid));
        }
        // The proposal has already been committed
        return;
    }
    // Look up the proposal object by zxid
    Proposal p = outstandingProposals.get(zxid);
    // If outstandingProposals has no proposal for this zxid,
    // the ack refers to a proposal that has not been made yet
    if (p == null) {
        LOG.warn("Trying to commit future proposal: zxid 0x{} from {}",
                Long.toHexString(zxid), followerAddr);
        return;
    }
    // Add the sid of the acking Follower to the Proposal.ackSet set
    p.ackSet.add(sid);
    if (LOG.isDebugEnabled()) {
        LOG.debug("Count for zxid: 0x{} is {}",
                Long.toHexString(zxid), p.ackSet.size());
    }
    // If the ackSet now contains a quorum
    if (self.getQuorumVerifier().containsQuorum(p.ackSet)) {
        if (zxid != lastCommitted + 1) {
            LOG.warn("Commiting zxid 0x{} from {} not first!",
                    Long.toHexString(zxid), followerAddr);
            LOG.warn("First is 0x{}", Long.toHexString(lastCommitted + 1));
        }
        // Remove the proposal for this zxid from outstandingProposals
        outstandingProposals.remove(zxid);
        // If p.request is not null, add the proposal to the toBeApplied list
        if (p.request != null) {
            toBeApplied.add(p);
        }
        if (p.request == null) {
            LOG.warn("Going to commmit null request for proposal: {}", p);
        }
        // Send a Leader.COMMIT packet to all Followers
        commit(zxid);
        // Notify all Observers
        inform(p);
        // Call commit on the CommitProcessor
        zk.commitProcessor.commit(p.request);
        // If any sync is waiting for this zxid to commit,
        // send a Leader.SYNC packet to the corresponding Follower
        if (pendingSyncs.containsKey(zxid)) {
            for (LearnerSyncRequest r : pendingSyncs.remove(zxid)) {
                sendSync(r);
            }
        }
    }
}
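The containsQuorum call above decides when a proposal can be committed. The listing does not show which verifier self.getQuorumVerifier() returns, so the sketch below simply assumes a plain majority rule over a fixed number of voting members; the class name and constructor are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class MajorityQuorumSketch {
    final int votingMembers;

    MajorityQuorumSketch(int votingMembers) {
        this.votingMembers = votingMembers;
    }

    // True once a strict majority of the voting members has acknowledged.
    boolean containsQuorum(Set<Long> ackSet) {
        return ackSet.size() > votingMembers / 2;
    }

    public static void main(String[] args) {
        MajorityQuorumSketch verifier = new MajorityQuorumSketch(5);
        Set<Long> ackSet = new HashSet<>();
        for (long sid = 1; sid <= 5; sid++) {
            ackSet.add(sid); // a repeated ack from the same sid would not grow the set
            System.out.println("acks=" + ackSet.size()
                    + " quorum=" + verifier.containsQuorum(ackSet));
        }
    }
}
```

Because the acks are kept in a set keyed by server id, a Follower that acknowledges the same zxid twice is counted only once.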
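【All Follower, Step 10】Follower.followLeader loops, reading the Quorum packets sent by the Leader and passing each one to Follower.processPacket, which dispatches on the packet contents. For a Leader.COMMIT packet it calls FollowerZooKeeperServer.commit with the packet's zxid. That method takes the matching Request object from the head of pendingTxns and eventually calls CommitProcessor.commit, which puts the Request object on the CommitProcessor.committedRequests queue.
The FollowerZooKeeperServer.commit method is as follows: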
/**
 * Called when a COMMIT message is received. The zxid in the COMMIT message
 * is matched against the zxid of the first object in the pendingTxns queue.
 * If they are equal, the request is passed to the CommitProcessor to be committed.
 * @param zxid - must correspond to the head of pendingTxns if it exists
 */
public void commit(long zxid) {
    if (pendingTxns.size() == 0) {
        LOG.warn("Committing " + Long.toHexString(zxid)
                + " without seeing txn");
        return;
    }
    // Read the zxid of the first element of pendingTxns
    long firstElementZxid = pendingTxns.element().zxid;
    // If the first element's zxid differs from the zxid in the COMMIT message,
    // exit the process
    if (firstElementZxid != zxid) {
        LOG.error("Committing zxid 0x" + Long.toHexString(zxid)
                + " but next pending txn 0x"
                + Long.toHexString(firstElementZxid));
        System.exit(12);
    }
    // Remove and return the first element of pendingTxns
    Request request = pendingTxns.remove();
    // Pass that request object to the CommitProcessor to be committed
    commitProcessor.commit(request);
}
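Taken together, logRequest and commit treat pendingTxns as a FIFO: proposals are appended in zxid order, and every COMMIT must name the zxid at the head. A minimal sketch of that invariant follows. It stores only zxids rather than Request objects, and it throws an exception where the listing calls System.exit(12); both are simplifications chosen for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class PendingTxnSketch {
    // zxids of proposals that were logged but not yet committed, oldest first
    final Queue<Long> pendingTxns = new ArrayDeque<>();

    void logRequest(long zxid) {
        pendingTxns.add(zxid);
    }

    // Returns the committed zxid, or -1 when no proposal was pending.
    long commit(long zxid) {
        if (pendingTxns.isEmpty()) {
            return -1;
        }
        long first = pendingTxns.element();
        if (first != zxid) {
            throw new IllegalStateException("committing 0x" + Long.toHexString(zxid)
                    + " but next pending txn is 0x" + Long.toHexString(first));
        }
        return pendingTxns.remove();
    }

    public static void main(String[] args) {
        PendingTxnSketch follower = new PendingTxnSketch();
        follower.logRequest(0x100000001L);
        follower.logRequest(0x100000002L);
        System.out.println(Long.toHexString(follower.commit(0x100000001L))); // prints 100000001
        System.out.println(Long.toHexString(follower.commit(0x100000002L))); // prints 100000002
    }
}
```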
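【All Follower, Step 11】The CommitProcessor thread handles the committed Request object.
On Follower A, nextPending matches the committed Request object, so the committed Request's contents are copied into nextPending, which is then put on the toProcess queue. On the next loop iteration it is taken from toProcess and passed to the next processor, FinalRequestProcessor. (This is the same logic as in the Leader's CommitProcessor thread.)
On any other Follower there are two possible cases:
1) queuedRequests is empty and nextPending is null, meaning this Follower has no forwarded request of its own in flight;
2) nextPending is not null, meaning a forwarded request is in flight, but nextPending necessarily differs from the committed Request object.
In either case the committed Request object is added directly to the toProcess queue, and the CommitProcessor thread then takes it from there and passes it to the next processor, FinalRequestProcessor.
The CommitProcessor.run method is as follows: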
public void run() {
    try {
        Request nextPending = null;
        while (!finished) {
            int len = toProcess.size();
            for (int i = 0; i < len; i++) {
                nextProcessor.processRequest(toProcess.get(i));
            }
            // Once every request has been passed to the next processor
            // (FinalRequestProcessor), clear toProcess
            toProcess.clear();
            synchronized (this) {
                // Wait if committedRequests is empty and either queuedRequests
                // is empty or we are still waiting on nextPending
                if ((queuedRequests.size() == 0 || nextPending != null)
                        && committedRequests.size() == 0) {
                    wait();
                    continue;
                }
                // First check whether this commit came in for the pending request.
                // A commit has arrived, and either queuedRequests is empty
                // or nextPending is set
                if ((queuedRequests.size() == 0 || nextPending != null)
                        && committedRequests.size() > 0) {
                    Request r = committedRequests.remove();
                    /*
                     * We match with nextPending so that we can move to the
                     * next request when it is committed. We also want to
                     * use nextPending because it has the cnxn member set
                     * properly.
                     */
                    // nextPending is set and matches the commit by sessionId and cxid
                    if (nextPending != null
                            && nextPending.sessionId == r.sessionId
                            && nextPending.cxid == r.cxid) {
                        // we want to send our version of the request.
                        // the pointer to the connection in the request
                        nextPending.hdr = r.hdr;
                        nextPending.txn = r.txn;
                        nextPending.zxid = r.zxid;
                        toProcess.add(nextPending);
                        nextPending = null;
                    } else {
                        // this request came from someone else so just
                        // send the commit packet
                        // (a sync request, or a request that this Follower did not originate)
                        toProcess.add(r);
                    }
                }
            }
            // We haven't matched the pending request yet, so go back to waiting
            if (nextPending != null) {
                continue;
            }
            synchronized (this) {
                // Process the next requests in queuedRequests
                while (nextPending == null && queuedRequests.size() > 0) {
                    // Remove and return the first request in queuedRequests
                    Request request = queuedRequests.remove();
                    switch (request.type) {
                    case OpCode.create:
                    case OpCode.delete:
                    case OpCode.setData:
                    case OpCode.multi:
                    case OpCode.setACL:
                    case OpCode.createSession:
                    case OpCode.closeSession:
                        // Not an OpCode.sync operation: make the request the nextPending
                        nextPending = request;
                        break;
                    case OpCode.sync:
                        if (matchSyncs) {
                            nextPending = request;
                        }
                        // If matchSyncs is false, add it straight to toProcess
                        // without waiting for a commit
                        else {
                            toProcess.add(request);
                        }
                        break;
                    default:
                        toProcess.add(request);
                    }
                }
            }
        }
    } catch (InterruptedException e) {
        LOG.warn("Interrupted exception while waiting", e);
    } catch (Throwable e) {
        LOG.error("Unexpected exception causing CommitProcessor to exit", e);
    }
    LOG.info("CommitProcessor exited loop!");
}
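The core of the loop above is the match between an incoming commit and nextPending. The single-threaded sketch below isolates that decision; it leaves out the queues, the wait/notify handshake, and the hdr and txn fields, and the `Request` class with a string `cnxn` is a hypothetical simplification.

```java
import java.util.ArrayList;
import java.util.List;

class CommitMatchSketch {
    static class Request {
        final long sessionId;
        final int cxid;
        final String cnxn; // stand-in for the client connection; null if the client is not local
        long zxid;

        Request(long sessionId, int cxid, String cnxn, long zxid) {
            this.sessionId = sessionId;
            this.cxid = cxid;
            this.cnxn = cnxn;
            this.zxid = zxid;
        }
    }

    Request nextPending; // locally submitted request that is waiting for its commit
    final List<Request> toProcess = new ArrayList<>();

    void onCommit(Request committed) {
        if (nextPending != null
                && nextPending.sessionId == committed.sessionId
                && nextPending.cxid == committed.cxid) {
            // Keep the local copy because it carries the client connection.
            nextPending.zxid = committed.zxid;
            toProcess.add(nextPending);
            nextPending = null;
        } else {
            // The request originated elsewhere: apply it, nobody to answer here.
            toProcess.add(committed);
        }
    }

    public static void main(String[] args) {
        CommitMatchSketch cp = new CommitMatchSketch();
        cp.nextPending = new Request(1L, 10, "client-A", 0L);
        cp.onCommit(new Request(2L, 4, null, 100L));  // someone else's request
        cp.onCommit(new Request(1L, 10, null, 101L)); // matches nextPending
        for (Request r : cp.toProcess) {
            System.out.println("zxid=" + r.zxid + " cnxn=" + r.cnxn);
        }
    }
}
```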
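【All Follower, Step 12】The FinalRequestProcessor updates the in-memory Session information or znode data.
On Follower A, it builds a Response and returns it to Client A.
On the other Followers, no Response needs to be returned to a client, so the method simply returns.
The FinalRequestProcessor.processRequest method is as follows. Of the code that builds the Response, only the part for the SetData request is shown.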
public void processRequest(Request request) {
    if (LOG.isDebugEnabled()) {
        LOG.debug("Processing request:: " + request);
    }
    // request.addRQRec(">final");
    long traceMask = ZooTrace.CLIENT_REQUEST_TRACE_MASK;
    if (request.type == OpCode.ping) {
        traceMask = ZooTrace.SERVER_PING_TRACE_MASK;
    }
    if (LOG.isTraceEnabled()) {
        ZooTrace.logRequest(LOG, traceMask, 'E', request, "");
    }
    ProcessTxnResult rc = null;
    synchronized (zks.outstandingChanges) {
        // Remove every ChangeRecord in outstandingChanges whose zxid is
        // less than or equal to request.zxid
        while (!zks.outstandingChanges.isEmpty()
                && zks.outstandingChanges.get(0).zxid <= request.zxid) {
            ChangeRecord cr = zks.outstandingChanges.remove(0);
            if (cr.zxid < request.zxid) {
                LOG.warn("Zxid outstanding " + cr.zxid
                        + " is less than current " + request.zxid);
            }
            if (zks.outstandingChangesForPath.get(cr.path) == cr) {
                zks.outstandingChangesForPath.remove(cr.path);
            }
        }
        // If request.hdr is not null, apply the request to the in-memory DataTree
        if (request.hdr != null) {
            TxnHeader hdr = request.hdr;
            Record txn = request.txn;
            rc = zks.processTxn(hdr, txn);
        }
        // Check whether this request type requires a Quorum ack;
        // if so, add it to the committed proposals
        if (Request.isQuorum(request.type)) {
            zks.getZKDatabase().addCommittedProposal(request);
        }
    }
    if (request.hdr != null && request.hdr.getType() == OpCode.closeSession) {
        ServerCnxnFactory scxn = zks.getServerCnxnFactory();
        if (scxn != null && request.cnxn == null) {
            scxn.closeSession(request.sessionId);
            return;
        }
    }
    // If the request has no cnxn, there is no client to answer: return
    if (request.cnxn == null) {
        return;
    }
    // The rest of the method builds the response
    ServerCnxn cnxn = request.cnxn;
    String lastOp = "NA";
    zks.decInProcess();
    Code err = Code.OK;
    Record rsp = null;
    boolean closeSession = false;
    try {
        if (request.hdr != null && request.hdr.getType() == OpCode.error) {
            throw KeeperException.create(KeeperException.Code.get(
                    ((ErrorTxn) request.txn).getErr()));
        }
        KeeperException ke = request.getException();
        if (ke != null && request.type != OpCode.multi) {
            throw ke;
        }
        if (LOG.isDebugEnabled()) {
            LOG.debug("{}", request);
        }
        switch (request.type) {
        ......
        case OpCode.setData: {
            lastOp = "SETD";
            // Build the SetDataResponse
            rsp = new SetDataResponse(rc.stat);
            err = Code.get(rc.err);
            break;
        }
        ......
        }
    } catch (SessionMovedException e) {
        cnxn.sendCloseSession();
        return;
    } catch (KeeperException e) {
        // On a KeeperException, record its error code
        err = e.code();
    } catch (Exception e) {
        // log at error level as we are returning a marshalling
        // error to the user
        LOG.error("Failed to process " + request, e);
        StringBuilder sb = new StringBuilder();
        ByteBuffer bb = request.request;
        bb.rewind();
        while (bb.hasRemaining()) {
            sb.append(Integer.toHexString(bb.get() & 0xff));
        }
        LOG.error("Dumping request buffer: 0x" + sb.toString());
        err = Code.MARSHALLINGERROR;
    }
    // Read the last processed zxid
    long lastZxid = zks.getZKDatabase().getDataTreeLastProcessedZxid();
    ReplyHeader hdr =
            new ReplyHeader(request.cxid, lastZxid, err.intValue());
    zks.serverStats().updateLatency(request.createTime);
    cnxn.updateStatsForResponse(request.cxid, lastZxid, lastOp,
            request.createTime, System.currentTimeMillis());
    try {
        // Send the Response to the client
        cnxn.sendResponse(hdr, rsp, "response");
        if (closeSession) {
            cnxn.sendCloseSession();
        }
    } catch (IOException e) {
        LOG.error("FIXMSG", e);
    }
}
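The first loop in processRequest prunes the pending change records that PrepRequestProcessor registered through addChangeRecord. The sketch below reproduces only that bookkeeping, with a hypothetical `ChangeRecord` reduced to a zxid and a path, to show why the per-path map entry is removed only when it still points at the record being dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OutstandingChangesSketch {
    static class ChangeRecord {
        final long zxid;
        final String path;

        ChangeRecord(long zxid, String path) {
            this.zxid = zxid;
            this.path = path;
        }
    }

    final List<ChangeRecord> outstandingChanges = new ArrayList<>();
    final Map<String, ChangeRecord> outstandingChangesForPath = new HashMap<>();

    void add(ChangeRecord cr) {
        outstandingChanges.add(cr);
        outstandingChangesForPath.put(cr.path, cr);
    }

    // Drops every record whose zxid is less than or equal to the zxid being applied.
    void prune(long appliedZxid) {
        while (!outstandingChanges.isEmpty()
                && outstandingChanges.get(0).zxid <= appliedZxid) {
            ChangeRecord cr = outstandingChanges.remove(0);
            // A newer pending change to the same path must stay visible,
            // so remove the map entry only if it is this exact record.
            if (outstandingChangesForPath.get(cr.path) == cr) {
                outstandingChangesForPath.remove(cr.path);
            }
        }
    }

    public static void main(String[] args) {
        OutstandingChangesSketch s = new OutstandingChangesSketch();
        s.add(new ChangeRecord(1L, "/a"));
        s.add(new ChangeRecord(2L, "/b"));
        s.add(new ChangeRecord(3L, "/a"));
        s.prune(2L);
        System.out.println(s.outstandingChanges.size() + " pending, paths="
                + s.outstandingChangesForPath.keySet()); // prints "1 pending, paths=[/a]"
    }
}
```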