ZooKeeper Source Code Analysis: Cluster-Mode Startup Overview

References

<<从PAXOS到ZOOKEEPER分布式一致性原理与实践>> (From Paxos to ZooKeeper: Distributed Consistency Principles and Practice)
zookeeper-3.0.0

ZooKeeper Overview

ZooKeeper is a distributed, open-source coordination service for distributed applications. It aims to provide a high-performance, highly available coordination service with strict ordering guarantees (write operations are strictly ordered).

Starting a ZooKeeper Cluster

Startup script and configuration file

Look at zkServer.sh in the bin directory:

ZOOBIN=`readlink -f "$0"`
ZOOBINDIR=`dirname "$ZOOBIN"`

. $ZOOBINDIR/zkEnv.sh                                                                               # set up the runtime environment variables

case $1 in
start) 
    echo -n "Starting zookeeper ... "
    java  "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
    -cp $CLASSPATH $JVMFLAGS org.apache.zookeeper.server.quorum.QuorumPeerMain $ZOOCFG &            # launch ZooKeeper; the main class is QuorumPeerMain
    echo STARTED
    ;;
stop) 
    echo -n "Stopping zookeeper ... " 
    echo kill | nc localhost $(grep clientPort $ZOOCFG | sed -e 's/.*=//')                          # send the kill command to the server
    echo STOPPED
    ;;
upgrade)
    shift
    echo "upgrading the servers to 3.*"
    java "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
    -cp $CLASSPATH $JVMFLAGS org.apache.zookeeper.server.upgrade.UpgradeMain ${@}            
    echo "Upgrading ... "
    ;;
restart)
    shift
    $0 stop ${@}
    sleep 3
    $0 start ${@}                                                                                   # restart = stop the process, then start again
    ;;
status)
    STAT=`echo stat | nc localhost $(grep clientPort $ZOOCFG | sed -e 's/.*=//') 2> /dev/null| grep Mode`
    if [ "x$STAT" = "x" ]
    then
        echo "Error contacting service. It is probably not running."                               # status check
    else
        echo $STAT
    fi
    ;;
*)
    echo "Usage: $0 {start|stop|restart|status}" >&2

esac

Typing zkServer.sh start in a terminal starts ZooKeeper. By default the configuration file is zoo_sample.cfg; for a cluster deployment this file must be edited to list the IP addresses of all cluster members, and every member must start with the same configuration. A reference configuration follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/export/crawlspace/mahadev/zookeeper/server1/data
# the port at which the clients will connect
clientPort=2181

# cluster member list
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888

The purpose of each parameter can be seen as it is consumed during argument parsing at startup.
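As a standalone illustration of this parsing (not the actual QuorumPeerConfig code), the sketch below loads zoo.cfg-style text with java.util.Properties and splits a server.N entry the same way; all values are made up:

```java
import java.io.StringReader;
import java.util.Properties;

// Sketch of how zoo.cfg-style text loads as java.util.Properties and how a
// "server.N=host:quorumPort:electionPort" value splits. Illustrative only.
public class CfgSketch {
    public static void main(String[] args) throws Exception {
        Properties cfg = new Properties();
        cfg.load(new StringReader("tickTime=2000\nserver.1=192.168.0.1:2888:3888\n"));
        for (String key : cfg.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                long sid = Long.parseLong(key.substring(key.indexOf('.') + 1)); // numeric id after "server."
                String[] parts = cfg.getProperty(key).split(":");               // host, quorum port, election port
                System.out.println(sid + " -> " + parts[0]
                        + " quorum=" + parts[1] + " election=" + parts[2]);
            }
        }
    }
}
```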

ZooKeeper Cluster Startup Flow
public class QuorumPeerMain {
    
    private static final Logger LOG = Logger.getLogger(QuorumPeerMain.class);

    /**
     * To start the replicated server specify the configuration file name on the
     * command line.
     * @param args command line
     */
    public static void main(String[] args) {
        if (args.length == 2) {
            ZooKeeperServerMain.main(args);                                                         // started with explicit arguments: run standalone directly
            return;
        }
        QuorumPeerConfig.parse(args);
        if (!QuorumPeerConfig.isStandalone()) {
            runPeer(new QuorumPeer.Factory() {                                                      // anonymous QuorumPeer.Factory implementing create and createConnectionFactory
                public QuorumPeer create(NIOServerCnxn.Factory cnxnFactory) throws IOException {
                    QuorumPeer peer = new QuorumPeer();                                             // create the peer instance
                    peer.setClientPort(ServerConfig.getClientPort());                               // client port this instance listens on
                    peer.setTxnFactory(new FileTxnSnapLog(
                                new File(QuorumPeerConfig.getDataLogDir()), 
                                new File(QuorumPeerConfig.getDataDir())));                   
                    peer.setQuorumPeers(QuorumPeerConfig.getServers());                             // register the configured servers
                    peer.setElectionType(QuorumPeerConfig.getElectionAlg());                        // election algorithm type
                    peer.setMyid(QuorumPeerConfig.getServerId());                                   // server id
                    peer.setTickTime(QuorumPeerConfig.getTickTime());                               
                    peer.setInitLimit(QuorumPeerConfig.getInitLimit());
                    peer.setSyncLimit(QuorumPeerConfig.getSyncLimit());
                    peer.setCnxnFactory(cnxnFactory);                                               // connection factory handling client network requests
                    return peer;
                }
                public NIOServerCnxn.Factory createConnectionFactory() throws IOException {
                    return new NIOServerCnxn.Factory(getClientPort());                              // the NIO (IO-multiplexing) connection factory
                }
            });
        }else{
            // there is only server in the quorum -- run as standalone
            ZooKeeperServerMain.main(args); 
        }
    }
    
    public static void runPeer(QuorumPeer.Factory qpFactory) {
        try {
            QuorumStats.registerAsConcrete();
            QuorumPeer self = qpFactory.create(qpFactory.createConnectionFactory());                // create the peer
            self.start();                                                                           // start the peer thread
            self.join();                                                                            // block until the thread exits
        } catch (Exception e) {
            LOG.fatal("Unexpected exception",e);
        }
        System.exit(2);
    }

}

The startup logic, in short: main first checks whether two command-line arguments were given, in which case it starts standalone; otherwise it parses the configuration file via QuorumPeerConfig and checks whether the configuration describes a cluster. If it does, runPeer is called with a QuorumPeer.Factory; runPeer calls create, then start to launch the peer thread, and blocks until the thread finishes.
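That dispatch can be condensed into a small sketch (a hypothetical helper, not ZooKeeper's API): two arguments mean "port datadir" standalone mode, while one argument is a config file that may still describe a single server:

```java
// Hypothetical condensation of the dispatch in QuorumPeerMain.main.
public class DispatchSketch {
    static String dispatch(int argCount, boolean standaloneConfig) {
        if (argCount == 2) return "standalone";          // explicit "port datadir" arguments
        return standaloneConfig ? "standalone" : "cluster"; // config file decides otherwise
    }

    public static void main(String[] args) {
        System.out.println(dispatch(2, false)); // standalone
        System.out.println(dispatch(1, false)); // cluster
    }
}
```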

(Flowchart: start → are there two command-line arguments? → yes: standalone startup; no → parse the configuration file → is it a standalone configuration? → yes: standalone startup; no: cluster startup.)
The configuration class QuorumPeerConfig
public class QuorumPeerConfig extends ServerConfig {                                        // extends ServerConfig, which holds a singleton config instance
    private static final Logger LOG = Logger.getLogger(QuorumPeerConfig.class);

    private int tickTime;
    private int initLimit;
    private int syncLimit;
    private int electionAlg;
    private int electionPort;
    private HashMap<Long,QuorumServer> servers = null;
    private long serverId;

    private QuorumPeerConfig(int port, String dataDir, String dataLogDir) {                // delegate to the parent constructor
        super(port, dataDir, dataLogDir);
    }

    public static void parse(String[] args) {                                             // parse the configuration file
        if(instance!=null)
            return;

        try {
            if (args.length != 1) {                                                      // the single argument must be the config file path
                System.err.println("USAGE: configFile");
                System.exit(2);
            }
            File zooCfgFile = new File(args[0]);                                        // wrap the config file path
            if (!zooCfgFile.exists()) {                                                 // check that the config file exists
                LOG.error(zooCfgFile.toString() + " file is missing");
                System.exit(2);
            }
            Properties cfg = new Properties();     
            FileInputStream zooCfgStream = new FileInputStream(zooCfgFile);             // read the file
            try {
                cfg.load(zooCfgStream);
            } finally {
                zooCfgStream.close();
            }
            HashMap<Long,QuorumServer> servers = new HashMap<Long,QuorumServer>();      // holds the cluster member info
            String dataDir = null;
            String dataLogDir = null;
            int clientPort = 0;
            int tickTime = 0;
            int initLimit = 0;
            int syncLimit = 0;
            int electionAlg = 3;
            int electionPort = 2182;
            for (Entry<Object, Object> entry : cfg.entrySet()) {                        // iterate over the parsed configuration entries
                String key = entry.getKey().toString();                                 // convert to String
                String value = entry.getValue().toString();
                if (key.equals("dataDir")) {                                            // data directory
                    dataDir = value;
                } else if (key.equals("dataLogDir")) {                                  // log directory
                    dataLogDir = value;
                } else if (key.equals("clientPort")) {                                  // client connection port
                    clientPort = Integer.parseInt(value);
                } else if (key.equals("tickTime")) {                                    // base time unit (tick)
                    tickTime = Integer.parseInt(value);
                } else if (key.equals("initLimit")) {                                   // initial sync may take this many ticks
                    initLimit = Integer.parseInt(value);
                } else if (key.equals("syncLimit")) {                                   // leader and follower may be at most this many ticks apart
                    syncLimit = Integer.parseInt(value);
                } else if (key.equals("electionAlg")) {                                 // election type; several strategies are available
                    electionAlg = Integer.parseInt(value);
                } else if (key.startsWith("server.")) {                                 // parse the cluster host/port entries
                    int dot = key.indexOf('.');
                    long sid = Long.parseLong(key.substring(dot + 1));                  // server id taken from the key after the dot
                    String parts[] = value.split(":");                                  // host and port(s)
                    if ((parts.length != 2) &&                                      
                            (parts.length != 3)){
                        LOG.error(value
                                + " does not have the form host:port or host:port:port");
                    }
                    InetSocketAddress addr = new InetSocketAddress(parts[0],
                            Integer.parseInt(parts[1]));                                // quorum address (host and port)
                    if(parts.length == 2)
                        servers.put(Long.valueOf(sid), new QuorumServer(sid, addr));
                    else if(parts.length == 3){
                        InetSocketAddress electionAddr = new InetSocketAddress(parts[0],
                                Integer.parseInt(parts[2]));                            // election address
                        servers.put(Long.valueOf(sid), new QuorumServer(sid, addr, electionAddr));   // register the server
                    }
                } else {
                    System.setProperty("zookeeper." + key, value);                      // other keys become zookeeper.* system properties
                }
            }
            if (dataDir == null) {                                                      // validate the parameters; abort if required ones are missing
                LOG.error("dataDir is not set");
                System.exit(2);
            }
            if (dataLogDir == null) {
                dataLogDir = dataDir;
            } else {
                if (!new File(dataLogDir).isDirectory()) {
                    LOG.error("dataLogDir " + dataLogDir+ " is missing.");
                    System.exit(2);
                }
            }
            if (clientPort == 0) {
                LOG.error("clientPort is not set");
                System.exit(2);
            }
            if (tickTime == 0) {
                LOG.error("tickTime is not set");
                System.exit(2);
            }
            if (servers.size() > 1 && initLimit == 0) {
                LOG.error("initLimit is not set");
                System.exit(2);
            }
            if (servers.size() > 1 && syncLimit == 0) {
                LOG.error("syncLimit is not set");
                System.exit(2);
            }
            QuorumPeerConfig conf = new QuorumPeerConfig(clientPort, dataDir,
                    dataLogDir);                                                    // build the instance and fill in its fields
            conf.tickTime = tickTime;
            conf.initLimit = initLimit;
            conf.syncLimit = syncLimit;
            conf.electionAlg = electionAlg;
            conf.servers = servers;
            if (servers.size() > 1) {                                               // more than one server configured
                /*
                 * If using FLE, then every server requires a separate election port.
                 */
                if(electionAlg != 0){
                   for(QuorumServer s : servers.values()){
                       if(s.electionAddr == null)
                           LOG.error("Missing election port for server: " + s.id);
                   }
                }
                
                File myIdFile = new File(dataDir, "myid");                        // the myid file holds this instance's server id
                if (!myIdFile.exists()) {
                    LOG.error(myIdFile.toString() + " file is missing");
                    System.exit(2);
                }
                BufferedReader br = new BufferedReader(new FileReader(myIdFile));       // read the server id
                String myIdString;
                try {
                    myIdString = br.readLine();
                } finally {
                    br.close();
                }
                try {
                    conf.serverId = Long.parseLong(myIdString); 
                } catch (NumberFormatException e) {
                    LOG.error(myIdString + " is not a number");
                    System.exit(2);
                }
            }
            instance=conf;                                                      // store the parsed config in instance; later reads come from it
        } catch (Exception e) {
            LOG.error("FIXMSG",e);
            System.exit(2);
        }
    }
    ...
}

The parent class is ServerConfig; its parse method is the main part to look at.

    public static int getClientPort(){
        assert instance!=null;
        return instance.clientPort;
    }
    public static String getDataDir(){
        assert instance!=null;
        return instance.dataDir;
    }
    public static String getDataLogDir(){
        assert instance!=null;
        return instance.dataLogDir;
    }
    public static boolean isStandalone(){
        assert instance!=null;
        return instance.isStandaloneServer();
    }
    
    protected static ServerConfig instance=null;
    
    public static void parse(String[] args) {                                              // parsing creates the singleton
        if(instance!=null)
            return;
        if (args.length != 2) {                                                            // exactly two arguments are required
            System.err.println("USAGE: ZooKeeperServer port datadir\n");
            System.exit(2);
        }
        try {
              instance=new ServerConfig(Integer.parseInt(args[0]),args[1],args[1]);
        } catch (NumberFormatException e) {
            System.err.println(args[0] + " is not a valid port number");
            System.exit(2);
        }
    }
QuorumPeer Execution Flow

Since QuorumPeer extends Thread, calling start invokes QuorumPeer's own start method, which in turn launches the thread and executes run.

    @Override
    public synchronized void start() {
        startLeaderElection();                                              // kick off leader election
        super.start();                                                      // Thread.start(), which eventually runs this class's run method
    }

startLeaderElection is called here to start the cluster election.

    synchronized public void startLeaderElection() {
        currentVote = new Vote(myid, getLastLoggedZxid());                              // vote for ourselves first, using the last logged zxid
        for (QuorumServer p : quorumPeers.values()) {                                   // locate our own entry among the peers
            if (p.id == myid) {
                myQuorumAddr = p.addr;                                                  // our quorum address
                break;
            }
        }
        if (myQuorumAddr == null) {                                                     // fail if we are not in the peer list
            throw new RuntimeException("My id " + myid + " not in the peer list");
        }
        if (electionType == 0) {                                                        // election type 0
            try {
                udpSocket = new DatagramSocket(myQuorumAddr.getPort());                 // this election runs over UDP on the quorum port
                responder = new ResponderThread();                                      // start the responder thread
                responder.start();
            } catch (SocketException e) {
                throw new RuntimeException(e);
            }
        }
        this.electionAlg = createElectionAlgorithm(electionType);                       // create the configured election algorithm
    }

With election type 0 the election runs over UDP; the ResponderThread answers queries from peers.

    class ResponderThread extends Thread {
        ResponderThread() {
            super("ResponderThread");
        }

        volatile boolean running = true;
        
        @Override
        public void run() {
            try {
                byte b[] = new byte[36];
                ByteBuffer responseBuffer = ByteBuffer.wrap(b);
                DatagramPacket packet = new DatagramPacket(b, b.length);
                while (running) {
                    udpSocket.receive(packet);                                          // receive a packet
                    if (packet.getLength() != 4) {             
                        LOG.warn("Got more than just an xid! Len = "
                                + packet.getLength());
                    } else {
                        responseBuffer.clear();
                        responseBuffer.getInt(); // Skip the xid
                        responseBuffer.putLong(myid);                                   
                        Vote current = getCurrentVote();                                // the current vote
                        switch (getPeerState()) {                                   
                        case LOOKING:                                                   // still electing
                            responseBuffer.putLong(current.id);                         // append the vote's id and zxid
                            responseBuffer.putLong(current.zxid);
                            break;
                        case LEADING:
                            responseBuffer.putLong(myid);                              // leading: reply with this server's id
                            try {
                                responseBuffer.putLong(leader.lastProposed);           // and the leader's last proposed zxid
                            } catch (NullPointerException npe) {
                                // This can happen in state transitions,
                                // just ignore the request
                            }
                            break;
                        case FOLLOWING:
                            responseBuffer.putLong(current.id);                         // append the current vote's id
                            try {
                                responseBuffer.putLong(follower.getZxid());             // and the follower's zxid
                            } catch (NullPointerException npe) {
                                // This can happen in state transitions,
                                // just ignore the request
                            }
                        }
                        packet.setData(b);
                        udpSocket.send(packet);                                        // send the response back
                    }
                    packet.setLength(b.length);
                }
            } catch (Exception e) {
                LOG.warn("Unexpected exception",e);
            } finally {
                LOG.warn("QuorumPeer responder thread exited");
            }
        }
    }

The response depends on the server's current role; during the election the server id and transaction id (zxid) are exchanged so the peers can converge on a leader. The election itself is analyzed in detail in a later article.
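The 36-byte response layout can be sketched independently of ZooKeeper (the values below are invented, and the real responder reuses the request buffer and skips over the xid rather than writing it):

```java
import java.nio.ByteBuffer;

// Sketch of the 36-byte UDP election response: a 4-byte xid slot followed by
// three longs (responder's id, vote id, zxid). Values are made up.
public class VoteResponseSketch {
    static byte[] encode(int xid, long myid, long voteId, long zxid) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[36]);
        buf.putInt(xid);      // 4-byte slot the responder skips with getInt()
        buf.putLong(myid);    // responding server's id
        buf.putLong(voteId);  // id the current vote is for
        buf.putLong(zxid);    // last seen transaction id
        return buf.array();
    }

    public static void main(String[] args) {
        ByteBuffer r = ByteBuffer.wrap(encode(7, 1L, 3L, 100L));
        r.getInt(); // a reader skips the xid slot, as ResponderThread does
        System.out.println(r.getLong() + " " + r.getLong() + " " + r.getLong());
    }
}
```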

The main loop
@Override
public void run() {
    setName("QuorumPeer:" + cnxnFactory.getLocalAddress());                 // thread name ends with the client listen address

    /*
     * Main loop
     */
    while (running) {
        switch (getPeerState()) {                                           // current peer state
        case LOOKING:
            try {
                LOG.info("LOOKING");
                setCurrentVote(makeLEStrategy().lookForLeader());           // run the election and record the winning vote
            } catch (Exception e) {
                LOG.warn("Unexpected exception",e);                         // on error fall back to LOOKING
                setPeerState(ServerState.LOOKING);
            }
            break;
        case FOLLOWING:
            try {
                LOG.info("FOLLOWING");
                setFollower(makeFollower(logFactory));                      // become a follower and follow the leader
                follower.followLeader();
            } catch (Exception e) {
                LOG.warn("Unexpected exception",e);
            } finally {
                follower.shutdown();
                setFollower(null);
                setPeerState(ServerState.LOOKING);
            }
            break;
        case LEADING:
            LOG.info("LEADING");
            try {
                setLeader(makeLeader(logFactory));                          // become the leader
                leader.lead();                                              // serve all requests as leader
                setLeader(null);                                            // leadership lost: clear the leader
            } catch (Exception e) {
                LOG.warn("Unexpected exception",e);
            } finally {
                if (leader != null) {                                       // clean up and reset the state
                    leader.shutdown("Forcing shutdown");
                    setLeader(null);
                }
                setPeerState(ServerState.LOOKING);
            }
            break;
        }
    }
    LOG.warn("QuorumPeer main thread exited");
}

Each state drives a different behavior: the leader accepts connections from followers and processes what they send, while followers forward write requests (such as creates) to the leader for processing. The full process is described in detail later.
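The core invariant of the loop, that every role ends back in LOOKING when its work finishes, can be sketched with a hypothetical step function (the election outcome is stubbed, not computed):

```java
// Sketch of the LOOKING/FOLLOWING/LEADING state machine in QuorumPeer.run():
// whenever lead() or followLeader() returns, the peer re-enters LOOKING.
public class PeerLoopSketch {
    enum ServerState { LOOKING, FOLLOWING, LEADING }

    static ServerState step(ServerState s, boolean wonElection) {
        switch (s) {
        case LOOKING:                           // election decides the next role
            return wonElection ? ServerState.LEADING : ServerState.FOLLOWING;
        case LEADING:                           // lead() returned (e.g. quorum lost)
        case FOLLOWING:                         // followLeader() returned
        default:
            return ServerState.LOOKING;         // always fall back to re-election
        }
    }

    public static void main(String[] args) {
        ServerState s = step(ServerState.LOOKING, true);
        System.out.println(s);                  // LEADING
        System.out.println(step(s, true));      // LOOKING
    }
}
```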

Client-facing server startup

During initialization the server also opens the port that accepts client connection requests:

new NIOServerCnxn.Factory(getClientPort())

The constructor looks like this:

        public Factory(int port) throws IOException {
            super("NIOServerCxn.Factory:" + port);                          // thread name carries the server port
            setDaemon(true);                                                
            this.ss = ServerSocketChannel.open();                           // open the server channel
            ss.socket().bind(new InetSocketAddress(port));                  // bind the port
            ss.configureBlocking(false);                                    // non-blocking mode
            ss.register(selector, SelectionKey.OP_ACCEPT);                  // register for accept events
            start();                                                        // start the thread
        }

Since this class also extends Thread, calling start runs its overridden run method.

        public void run() {
            while (!ss.socket().isClosed()) {                                                                       // loop until the socket is closed
                try {
                    selector.select(1000);                                                                          // IO multiplexing
                    Set<SelectionKey> selected;
                    synchronized (this) {
                        selected = selector.selectedKeys();                                                         // under the lock, grab the ready keys
                    }
                    ArrayList<SelectionKey> selectedList = new ArrayList<SelectionKey>(
                            selected);
                    Collections.shuffle(selectedList);
                    for (SelectionKey k : selectedList) {                                                           // iterate over the ready keys
                        if ((k.readyOps() & SelectionKey.OP_ACCEPT) != 0) {                                         // a new connection arrived
                            SocketChannel sc = ((ServerSocketChannel) k 
                                    .channel()).accept();                                                           // accept it
                            sc.configureBlocking(false);                                                            // non-blocking
                            SelectionKey sk = sc.register(selector,
                                    SelectionKey.OP_READ);                                                          // register for read events
                            NIOServerCnxn cnxn = createConnection(sc, sk);                                          // wrap it in a NIOServerCnxn
                            sk.attach(cnxn);                                                                        // attach it to the key and track it
                            addCnxn(cnxn);
                        } else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {          // read or write ready
                            NIOServerCnxn c = (NIOServerCnxn) k.attachment();
                            c.doIO(k);                                                                              // let the connection handle the IO
                        }
                    }
                    selected.clear();                                                                               // reset the ready set
                } catch (Exception e) { 
                    LOG.error("FIXMSG",e);                                                                          // log errors
                }
            }
            ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
                                     "NIOServerCnxn factory exitedloop.");
            clear();
            LOG.error("=====> Goodbye cruel world <======");
            // System.exit(0);
        }

The run method is a textbook IO-multiplexing loop; every incoming client request is handled by this service. The detailed handling flow is analyzed later.
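The same accept-then-register-for-read pattern can be reproduced in a few lines of standalone NIO (port 0 and the helper name are illustrative, not from ZooKeeper):

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Minimal sketch of the accept path: a non-blocking server channel registered
// for OP_ACCEPT, and one select round that accepts a connection and
// re-registers it for reads. Port 0 asks the OS for a free port.
public class AcceptSketch {
    static boolean acceptOne() throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel ss = ServerSocketChannel.open();
        ss.socket().bind(new InetSocketAddress(0));
        ss.configureBlocking(false);
        ss.register(selector, SelectionKey.OP_ACCEPT);

        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", ss.socket().getLocalPort()));

        boolean accepted = false;
        selector.select(1000);                                    // wait for the ACCEPT event
        for (SelectionKey k : selector.selectedKeys()) {
            if (k.isAcceptable()) {
                SocketChannel sc = ((ServerSocketChannel) k.channel()).accept();
                sc.configureBlocking(false);
                sc.register(selector, SelectionKey.OP_READ);      // now watch for reads
                accepted = true;
                sc.close();
            }
        }
        client.close();
        ss.close();
        selector.close();
        return accepted;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(acceptOne() ? "accepted" : "no event");
    }
}
```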

Summary

This article gave a brief overview of ZooKeeper's cluster-mode startup and a rough sketch of what runs during it: a thread is started to accept client requests, an election port is opened for leader election, and a port is opened for data synchronization between cluster members. With these three ports each serving a different purpose, the main cluster startup is complete. Corrections are welcome.
