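Leader election, also called master election, is one of the most classic ZooKeeper use cases. So why is leader election needed?
With ZooKeeper, one Leader is elected from all the services (think of them as servers), and that Leader is then responsible for managing the cluster. The other servers in the cluster become Followers of this Leader. When the Leader fails, the next Leader must be elected quickly from among the Followers. This is the Leader mechanism; below we briefly describe how to implement Leader Election with ZooKeeper.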
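The core idea is as follows. First create a PERSISTENT node, for example "/election" (it has to be persistent, because ephemeral nodes cannot have children). Each participating server then creates a SEQUENCE|EPHEMERAL node under it, for example "/election/n_". With the SEQUENCE flag, ZooKeeper automatically appends a sequence number that is larger than any number it assigned earlier under the same parent. Among the servers that created a node, the one holding the smallest sequence number becomes the Leader.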
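The rule "smallest sequence number wins" can be sketched without a running ZooKeeper server. The class below is a hypothetical helper, not part of ZooKeeper; it assumes child names of the form "n_" followed by the zero-padded sequence number that ZooKeeper appends, and it sorts explicitly because getChildren() makes no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper (not a ZooKeeper class): picks the leader from child node names. */
final class ElectionOrder {
    static final String PREFIX = "n_";

    /** Parses the sequence number that follows the "n_" prefix. */
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(PREFIX.length()));
    }

    /** Returns the child with the smallest sequence number, which is the leader. */
    static String leaderOf(List<String> children) {
        if (children.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        // Copy before sorting so the caller's list is left untouched.
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(ElectionOrder::sequenceOf));
        return sorted.get(0);
    }

    public static void main(String[] args) {
        // Deliberately unsorted, as a getChildren() result may be.
        List<String> children = List.of("n_0000000007", "n_0000000003", "n_0000000005");
        System.out.println("leader: " + leaderOf(children));
    }
}
```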
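In practice we also need to guarantee that when the Leader fails, the system can quickly pick the next server as Leader. A simple approach is to have every Follower watch the Leader's node. When the Leader fails, its ephemeral node is deleted automatically, which fires the watch of every server watching the Leader. These servers are thereby notified of the failure and start the next round of Leader election. However, this approach causes the "herd effect", which is especially noticeable when the cluster has many servers and network latency is high. To avoid the herd effect, ZooKeeper's recipe does the following: each Follower sets a watch only on the node with the largest sequence number among the nodes whose sequence number is smaller than its own. A Follower performs the Leader election step only when its own watch fires. If the node it watched belonged to the Leader, it becomes the next Leader; otherwise it simply moves its watch to the next smaller node. This election is clearly fast, because each round involves almost nothing beyond a single Follower.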
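The herd-avoiding rule can be written as a small pure function: given all child names and our own node, return the node with the largest sequence number below ours. This is a minimal sketch with hypothetical names (WatchTarget, nodeToWatch), again assuming the "n_" prefix; it does not talk to ZooKeeper.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not a ZooKeeper class): decides which node a follower should watch. */
final class WatchTarget {
    private static final String PREFIX = "n_";

    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(PREFIX.length()));
    }

    /**
     * Returns the node with the largest sequence number that is still smaller
     * than ownNode's, or an empty Optional if ownNode is the smallest (the leader).
     */
    static Optional<String> nodeToWatch(List<String> children, String ownNode) {
        int ownSeq = sequenceOf(ownNode);
        String best = null;
        int bestSeq = -1; // assumes non-negative sequence numbers
        for (String child : children) {
            int seq = sequenceOf(child);
            if (seq < ownSeq && seq > bestSeq) {
                best = child;
                bestSeq = seq;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> children = List.of("n_0000000005", "n_0000000003", "n_0000000007");
        for (String node : children) {
            System.out.println(node + " watches "
                    + nodeToWatch(children, node).orElse("nothing (it is the leader)"));
        }
    }
}
```

Because every node is watched by at most one other node, deleting a node wakes up a single follower rather than the whole cluster.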
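Now let's look at how the source code implements this, in org.apache.zookeeper.recipes.leader.LeaderElectionSupport.
The implementation logic lives in this class. It begins with the start() method, so let's look at that first.
As you can see, it first calls makeOffer() and then determineElectionStatus().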
/**
 * Entry point of the election.
 */
public synchronized void start() {
    state = State.START;
    // Broadcast that the election is starting
    dispatchEvent(EventType.START);
    LOG.info("Starting leader election support");
    if (zooKeeper == null) {
        throw new IllegalStateException(
            "No instance of zookeeper provided. Hint: use setZooKeeper()");
    }
    if (hostName == null) {
        throw new IllegalStateException(
            "No hostname provided. Hint: use setHostName()");
    }
    try {
        makeOffer();
        determineElectionStatus();
    } catch (KeeperException | InterruptedException e) {
        becomeFailed(e);
    }
}
Next, let's look at makeOffer(). This method mainly creates the ephemeral sequential node.
/**
 * Where the election actually begins: creates a node under the root node.
 * @throws KeeperException
 * @throws InterruptedException
 */
private void makeOffer() throws KeeperException, InterruptedException {
    state = State.OFFER;
    dispatchEvent(EventType.OFFER_START);
    LeaderOffer newLeaderOffer = new LeaderOffer();
    byte[] hostnameBytes;
    synchronized (this) {
        newLeaderOffer.setHostName(hostName);
        hostnameBytes = hostName.getBytes();
        newLeaderOffer.setNodePath(zooKeeper.create(rootNodeName + "/" + "n_",
            hostnameBytes, ZooDefs.Ids.OPEN_ACL_UNSAFE,
            // ephemeral (and sequential) node
            CreateMode.EPHEMERAL_SEQUENTIAL));
        leaderOffer = newLeaderOffer;
    }
    LOG.debug("Created leader offer {}", leaderOffer);
    dispatchEvent(EventType.OFFER_COMPLETE);
}
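One detail of makeOffer() is easy to miss: the path returned by create() is not the path that was requested, because ZooKeeper appends a zero-padded 10-digit sequence number to it. That is why the method stores the return value via setNodePath(). The following is a simplified, hypothetical model of that naming behavior (SequentialNameModel is not a ZooKeeper class); real sequence numbers are only guaranteed to increase and may contain gaps.

```java
import java.util.Locale;

/** Hypothetical model of how a parent node names its sequential children. */
final class SequentialNameModel {
    // Simplification: a plain per-parent counter that starts at 0 and never skips.
    private int counter = 0;

    /** Returns the requested prefix plus a zero-padded 10-digit sequence number. */
    String createSequential(String pathPrefix) {
        return String.format(Locale.ROOT, "%s%010d", pathPrefix, counter++);
    }

    public static void main(String[] args) {
        SequentialNameModel parent = new SequentialNameModel();
        String requested = "/election/n_";
        // The caller must keep the returned path; the requested one is only a prefix.
        System.out.println(requested + " -> " + parent.createSequential(requested));
        System.out.println(requested + " -> " + parent.createSequential(requested));
    }
}
```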
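Then comes determineElectionStatus().
This method fetches all the child nodes under the root node. The offer with the smallest sequence number becomes the leader, and every other node sets a watch on the offer immediately ahead of it. Note that the loop relies on the list returned by toLeaderOffers() being ordered by id.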
/**
 * Finds the offer with the smallest sequence number; the corresponding machine is the leader.
 * @throws KeeperException
 * @throws InterruptedException
 */
private void determineElectionStatus() throws KeeperException, InterruptedException {
    state = State.DETERMINE;
    dispatchEvent(EventType.DETERMINE_START);
    LeaderOffer currentLeaderOffer = getLeaderOffer();
    String[] components = currentLeaderOffer.getNodePath().split("/");
    currentLeaderOffer.setId(Integer.valueOf(components[components.length - 1].substring("n_".length())));
    List<LeaderOffer> leaderOffers = toLeaderOffers(zooKeeper.getChildren(rootNodeName, false));
    /*
     * For each leader offer, find out where we fit in. If we're first, we
     * become the leader. If we're not elected the leader, attempt to stat the
     * offer just less than us. If they exist, watch for their failure, but if
     * they don't, become the leader.
     */
    for (int i = 0; i < leaderOffers.size(); i++) {
        LeaderOffer leaderOffer = leaderOffers.get(i);
        if (leaderOffer.getId().equals(currentLeaderOffer.getId())) {
            LOG.debug("There are {} leader offers. I am {} in line.", leaderOffers.size(), i);
            dispatchEvent(EventType.DETERMINE_COMPLETE);
            if (i == 0) {
                // The smallest one becomes the leader
                becomeLeader();
            } else {
                // Everyone else is a non-leader and watches the offer just ahead of it
                becomeReady(leaderOffers.get(i - 1));
            }
            /* Once we've figured out where we are, we're done. */
            break;
        }
    }
}
A node that does not become the leader watches the node ahead of it. If that node fails, the determination method above is executed again.
private void becomeReady(LeaderOffer neighborLeaderOffer)
    throws KeeperException, InterruptedException {
    LOG.info(
        "{} not elected leader. Watching node: {}",
        getLeaderOffer().getNodePath(),
        neighborLeaderOffer.getNodePath());
    /*
     * Make sure to pass an explicit Watcher because we could be sharing this
     * zooKeeper instance with someone else.
     */
    /*
     * Set a watch on the previous node. If that node is deleted,
     * determineElectionStatus is called again.
     */
    Stat stat = zooKeeper.exists(neighborLeaderOffer.getNodePath(), this);
    if (stat != null) {
        dispatchEvent(EventType.READY_START);
        LOG.debug(
            "We're behind {} in line and they're alive. Keeping an eye on them.",
            neighborLeaderOffer.getNodePath());
        state = State.READY;
        dispatchEvent(EventType.READY_COMPLETE);
    } else {
        /*
         * If the stat fails, the node has gone missing between the call to
         * getChildren() and exists(). We need to try and become the leader.
         */
        LOG.info(
            "We were behind {} but it looks like they died. Back to determination.",
            neighborLeaderOffer.getNodePath());
        determineElectionStatus();
    }
}
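To see how this watch chain behaves when nodes fail, here is a small in-memory simulation. It is only a sketch under stated assumptions: ElectionSimulation and its methods are hypothetical, candidates are represented by their sequence numbers alone, and no ZooKeeper API is involved. It shows that a failure notifies only the immediate successor, and that the leader changes only when the smallest node disappears.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory model of the election chain (not a ZooKeeper class). */
final class ElectionSimulation {
    private final TreeSet<Integer> live = new TreeSet<>();
    private int nextSequence = 0;

    /** A new candidate joins and receives the next sequence number. */
    int join() {
        int id = nextSequence++;
        live.add(id);
        return id;
    }

    /** The live candidate with the smallest sequence number, or null if none. */
    Integer leader() {
        return live.isEmpty() ? null : live.first();
    }

    /** The candidate that id watches: the largest live id below it, or null if id leads. */
    Integer watchedBy(int id) {
        return live.lower(id);
    }

    /** Removes a candidate and returns the ids whose watch fires (at most one). */
    List<Integer> fail(int id) {
        List<Integer> notified = new ArrayList<>();
        Integer successor = live.higher(id); // the only node watching id
        if (live.remove(id) && successor != null) {
            notified.add(successor);
        }
        return notified;
    }

    public static void main(String[] args) {
        ElectionSimulation sim = new ElectionSimulation();
        for (int i = 0; i < 4; i++) {
            sim.join();
        }
        System.out.println("leader fails, notified: " + sim.fail(0) + ", new leader: " + sim.leader());
        System.out.println("middle node fails, notified: " + sim.fail(2) + ", leader: " + sim.leader());
    }
}
```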
More annotated comments can be found here:
https://github.com/haha174/zookeeper/commit/1174717483578074654bbc6a8a1e4744b9c255a9