ZooKeeper in Practice (6): Cluster Monitoring and Master Election

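1. Cluster machine monitoring

This pattern suits scenarios that place high demands on machine state and availability within a cluster and need to react quickly when machines change. Such setups usually include a monitoring system that checks in real time whether each machine is alive.

Two ZooKeeper features, watches on reads and ephemeral nodes, are enough to build a liveness monitor for cluster machines:

1. When a client registers a Watcher on the children of node x, it is notified when those children change. The watch fires once, so the client has to register it again to keep receiving notifications.
2. A node created as EPHEMERAL disappears as soon as the session between the client and the server ends or expires.

Together, these two features let you monitor clients both for state changes and for going online or offline.

For example, the monitoring system registers a Watcher on /Monitor, and every machine added later creates an EPHEMERAL node /Monitor/{hostname} under it. The monitoring system then learns about machines joining and leaving as it happens; how it reacts is up to its own business logic.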
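
A child watch only reports that the children of /Monitor changed, not which machine joined or left. One way to recover that, sketched here as plain C with no ZooKeeper calls, is to keep the previous child list and compare it with the new one. The names list_difference and report_changes are made up for this illustration; the arrays stand in for the names a child-list query would return.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if name occurs in list[0..count-1], else 0. */
static int contains(const char *const *list, int count, const char *name)
{
    int i;
    for (i = 0; i < count; ++i) {
        if (strcmp(list[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/*
 * Copies into out[] the entries of a[] that are missing from b[] and
 * returns how many were found. Only pointers are copied, not strings.
 * out must have room for a_count entries.
 */
int list_difference(const char *const *a, int a_count,
                    const char *const *b, int b_count,
                    const char **out)
{
    int i, n = 0;
    assert(a != NULL && b != NULL && out != NULL);
    for (i = 0; i < a_count; ++i) {
        if (!contains(b, b_count, a[i])) {
            out[n++] = a[i];
        }
    }
    return n;
}

/* Prints which hosts went online or offline between two snapshots. */
void report_changes(const char *const *before, int before_count,
                    const char *const *after, int after_count)
{
    const char *diff[64];
    int i, n;
    assert(before_count <= 64 && after_count <= 64);
    n = list_difference(after, after_count, before, before_count, diff);
    for (i = 0; i < n; ++i) {
        printf("online:  %s\n", diff[i]);
    }
    n = list_difference(before, before_count, after, after_count, diff);
    for (i = 0; i < n; ++i) {
        printf("offline: %s\n", diff[i]);
    }
}
```

A monitor would call report_changes from its watcher with the snapshot it saved last time and the list it just fetched, then keep the new list as the next "before".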

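2. Master election

In a distributed environment, some business logic only needs to run on one machine in the cluster, with the other machines sharing the result. This cuts out a great deal of duplicated computation and improves performance, and it calls for a master election.

ZooKeeper's strong consistency guarantees that node creation is globally unique even under high concurrency: when many clients request creation of /currentMaster at the same time, only one of those requests succeeds. That property makes cluster election straightforward.

You can also use ZooKeeper's EPHEMERAL_SEQUENTIAL nodes for dynamic election: every client creates an EPHEMERAL_SEQUENTIAL node under /Master/, and because ZooKeeper guarantees the ordering of SEQUENTIAL nodes, the client whose node has the smallest sequence number is simply taken as the Master.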
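
The "smallest sequence number wins" rule can be sketched in plain C. ZooKeeper appends a 10-digit, zero-padded counter to the name of a sequential node; the helper names sequence_of and smallest_sequence_index below are invented for this example, and the child names are sample data rather than output from a real server.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SEQ_DIGITS 10  /* length of the zero-padded counter ZooKeeper appends */

/* Returns the sequence number at the end of name, or -1 if there is none. */
long sequence_of(const char *name)
{
    size_t len, i;
    assert(name != NULL);
    len = strlen(name);
    if (len < SEQ_DIGITS) {
        return -1;
    }
    for (i = len - SEQ_DIGITS; i < len; ++i) {
        if (name[i] < '0' || name[i] > '9') {
            return -1;
        }
    }
    return strtol(name + len - SEQ_DIGITS, NULL, 10);
}

/* Returns the index of the child with the smallest sequence number, or -1. */
int smallest_sequence_index(const char *const *children, int count)
{
    int i, best = -1;
    long best_seq = -1;
    assert(children != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        long seq = sequence_of(children[i]);
        /* Skip names without a counter; keep the lowest counter seen. */
        if (seq >= 0 && (best < 0 || seq < best_seq)) {
            best = i;
            best_seq = seq;
        }
    }
    return best;
}
```

Comparing the parsed numbers instead of the raw strings keeps the result correct even if the children under the election node do not all share one name prefix.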

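3. Scenario analysis

Suppose we want to monitor a group of live business processes in a cluster and pick one of them as the monitoring Master. Each process is identified by its IP address plus process ID, i.e. {ip:pid}. When a new business process comes online, it creates an ephemeral sequential (EPHEMERAL_SEQUENTIAL) node under /Monitor and fetches the list of children of /Monitor. If its own node is the smallest, it promotes itself to Master; otherwise it stays an ordinary business process. When a process exits, its node is deleted automatically and the remaining processes try the election again, so a new Master is promoted whenever the current Master exits.

For example, suppose the cluster starts with no processes:

  1. Process A1 starts and creates /Monitor/proc-1 under /Monitor. Since it is the only node under /Monitor, A1 is promoted to Master.
  2. Process A2 starts and creates /Monitor/proc-2 under /Monitor. It loses the election and runs as a Slave. A1, which watches the children of /Monitor, is notified that a new process was created and runs show_list.
  3. Process A3 starts and creates /Monitor/proc-3 under /Monitor. It loses the election and runs as a Slave. A1, which watches the children of /Monitor, is notified that a new process was created and runs show_list.
  4. Process A1 is killed. The other processes see the child-change event on /Monitor and try the election. A2 now holds the smallest sequence number, so A2 becomes Master and A3 stays a Slave.
  5. Process A4 starts and creates /Monitor/proc-4 under /Monitor. It loses the election and runs as a Slave. A2, which watches the children of /Monitor, is notified that a new process was created and runs show_list.

The table below summarizes the run; (M) marks the process that is Master at that step: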

Step  A1                    A2            A3      A4
1     create, show_list(M)
2     show_list(M)          create
3     show_list(M)          -             create
4     killed                show_list(M)  -
5     -                     show_list(M)  -       create
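
The five steps can be replayed with a tiny in-memory model that involves no ZooKeeper at all. This is only a sketch of the rule "the Master is the live process with the smallest sequence number"; struct cluster and the cluster_* functions are hypothetical names, and sequence numbers start at 1 to match proc-1 through proc-4 in the walk-through.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 16

/* alive[i] != 0 means the process with sequence number i+1 still has its node. */
struct cluster {
    int alive[MAX_PROCS];
    int next;   /* how many sequence numbers have been handed out so far */
};

/* A process starts: it gets the next sequence number, which is returned. */
int cluster_join(struct cluster *c)
{
    assert(c != NULL && c->next < MAX_PROCS);
    c->alive[c->next] = 1;
    c->next++;
    return c->next;
}

/* A process exits: its ephemeral node disappears. */
void cluster_leave(struct cluster *c, int seq)
{
    assert(c != NULL && seq >= 1 && seq <= c->next);
    c->alive[seq - 1] = 0;
}

/* The Master is the live process with the smallest sequence number; 0 if none. */
int cluster_master(const struct cluster *c)
{
    int i;
    assert(c != NULL);
    for (i = 0; i < c->next; ++i) {
        if (c->alive[i]) {
            return i + 1;
        }
    }
    return 0;
}
```

Joining three processes, removing the first and then joining a fourth leaves process 2 as Master, which matches steps 4 and 5 of the walk-through.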

4. Hands-on practice

First, get the local IP address and the PID of the current process, and return them through ip_pid.

void getlocalhost(char *ip_pid,int len)
{
    char hostname[64] = {0};
    struct hostent *hent ;

    gethostname(hostname,sizeof(hostname));
    hent = gethostbyname(hostname);

    char * localhost = inet_ntoa(*((struct in_addr*)(hent->h_addr_list[0])));

    //getpid() returns pid_t; cast to int so the argument matches %d
    snprintf(ip_pid,len,"%s:%d",localhost,(int)getpid());
}


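The election function fetches all children of path, picks the one with the smallest sequence number and reads its ip_pid. If that matches the current process, the current process is elected Master and the global variable g_mode is set to MODE_MONITOR; otherwise g_mode is left unchanged.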

void choose_mater(zhandle_t *zkhandle,const char *path)
{
    //start empty so the cleanup loop below is safe when zoo_get_children fails
    struct String_vector procs = {0, NULL};
    int i = 0;
    int ret = zoo_get_children(zkhandle,path,1,&procs);

    if(ret != ZOK || procs.count == 0){
        fprintf(stderr,"failed to get the children of path %s!\n",path);
    }else{
        char master_path[512] ={0};
        char ip_pid[64] = {0};
        int ip_pid_len = sizeof(ip_pid);

        char master[512]={0};
        char localhost[512]={0};

        getlocalhost(localhost,sizeof(localhost));

        strcpy(master,procs.data[0]);
        for(i = 1; i < procs.count; ++i){
            if(strcmp(master,procs.data[i])>0){
                strcpy(master,procs.data[i]);
            }
        }

        sprintf(master_path,"%s/%s",path,master);

        ret = zoo_get(zkhandle,master_path,0,ip_pid,&ip_pid_len,NULL);
        if(ret != ZOK){
            fprintf(stderr,"failed to get the data of path %s!\n",master_path);
        }else if(strcmp(ip_pid,localhost)==0){
            g_mode = MODE_MONITOR;
        }

    }

    for(i = 0; i < procs.count; ++i){
        free(procs.data[i]);
        procs.data[i] = NULL;
    }

}
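
choose_mater reads the data of the smallest child and compares it with the local ip:pid. A possible alternative, sketched here under the assumption that each process keeps the path returned when it created its own node (path_buffer in main), is to compare node names directly and skip the extra read. node_name and owns_smallest_node are hypothetical helpers, not part of the ZooKeeper client.

```c
#include <assert.h>
#include <string.h>

/* Returns the part of path after the last '/', or path itself if there is none. */
const char *node_name(const char *path)
{
    const char *slash;
    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/*
 * Returns 1 if created_path names the smallest entry in children[], else 0.
 * Names are compared with strcmp, as choose_mater does, which is valid as
 * long as all children share one prefix and a zero-padded counter.
 */
int owns_smallest_node(const char *created_path,
                       const char *const *children, int count)
{
    const char *mine;
    int i;
    assert(created_path != NULL && (children != NULL || count == 0));
    mine = node_name(created_path);
    for (i = 0; i < count; ++i) {
        if (strcmp(children[i], mine) < 0) {
            return 0;   /* someone else holds a smaller node */
        }
    }
    /* Our own name must be in the list, otherwise our node is gone. */
    for (i = 0; i < count; ++i) {
        if (strcmp(children[i], mine) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Because the children in a String_vector are stored as char **, passing procs.data to this sketch would need a cast to const char *const *.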


show_list is the Master's routine: it prints the ip_pid of every child node under path.

void show_list(zhandle_t *zkhandle,const char *path)
{

    //start empty so the cleanup loop below is safe when zoo_get_children fails
    struct String_vector procs = {0, NULL};
    int i = 0;
    char localhost[512]={0};

    getlocalhost(localhost,sizeof(localhost));

    int ret = zoo_get_children(zkhandle,path,1,&procs);

    if(ret != ZOK){
        fprintf(stderr,"failed to get the children of path %s!\n",path);
    }else{
        char child_path[512] ={0};
        char ip_pid[64] = {0};
        int ip_pid_len = sizeof(ip_pid);
        printf("--------------\n");
        printf("ip\tpid\n");
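        for(i = 0; i < procs.count; ++i){
            sprintf(child_path,"%s/%s",path,procs.data[i]);
            //zoo_get overwrites ip_pid_len with the data length, so reset the
            //buffer and its length before every call
            memset(ip_pid,0,sizeof(ip_pid));
            ip_pid_len = sizeof(ip_pid)-1;
            ret = zoo_get(zkhandle,child_path,0,ip_pid,&ip_pid_len,NULL);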
            if(ret != ZOK){
                fprintf(stderr,"failed to get the data of path %s!\n",child_path);
            }else if(strcmp(ip_pid,localhost)==0){
                printf("%s(Master)\n",ip_pid);
            }else{
                printf("%s\n",ip_pid);
            }
        }
    }

    for(i = 0; i < procs.count; ++i){
        free(procs.data[i]);
        procs.data[i] = NULL;
    }
}
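
show_list prints an "ip\tpid" header but then emits each entry as a single ip:pid string. If two real columns are wanted, the string can be split at its last colon. The function split_ip_pid below is a hypothetical helper written for this example and is not used by the original program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Splits "ip:pid" at the last ':'. Copies the ip part into ip as a
 * NUL-terminated string and stores the pid in *pid.
 * Returns 0 on success, -1 if the input is malformed or ip is too small.
 */
int split_ip_pid(const char *ip_pid, char *ip, size_t ip_size, long *pid)
{
    const char *colon;
    char *end;
    size_t ip_len;
    long value;
    assert(ip_pid != NULL && ip != NULL && pid != NULL);
    colon = strrchr(ip_pid, ':');
    if (colon == NULL || colon == ip_pid || colon[1] == '\0') {
        return -1;              /* no colon, empty ip or empty pid */
    }
    ip_len = (size_t)(colon - ip_pid);
    if (ip_len >= ip_size) {
        return -1;              /* ip does not fit into the buffer */
    }
    value = strtol(colon + 1, &end, 10);
    if (*end != '\0') {
        return -1;              /* trailing characters after the number */
    }
    memcpy(ip, ip_pid, ip_len);
    ip[ip_len] = '\0';
    *pid = value;
    return 0;
}
```

With this in place, a row could be printed as printf("%s\t%ld\n", ip, pid) so that the output lines up with the header.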


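The watcher function is shown below. When it sees that the children of path have changed, it tries the election again; if the current process is elected Master, it immediately runs show_list to print the ip_pid of every child under path.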

void zktest_watcher_g(zhandle_t* zh, int type, int state, const char* path, void* watcherCtx)  
{  
/*  
    printf("watcher event\n");  
    printf("type: %d\n", type);  
    printf("state: %d\n", state);  
    printf("path: %s\n", path);  
    printf("watcherCtx: %s\n", (char *)watcherCtx);  
*/  

    if(type == ZOO_CHILD_EVENT &&
       state == ZOO_CONNECTED_STATE ){

        choose_mater(zh,path);
        if(g_mode == MODE_MONITOR){
            show_list(zh,path);
        }
    }
}


The complete code follows:
1. monitor.c

#include <stdio.h>
#include <stdlib.h>   //exit, free
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <netdb.h>
#include <arpa/inet.h>
#include "zookeeper.h"
#include "zookeeper_log.h"

enum WORK_MODE{MODE_MONITOR,MODE_WORKER} g_mode;
char g_host[512]= "172.17.0.36:2181";  

//watch function when child list changed
void zktest_watcher_g(zhandle_t* zh, int type, int state, const char* path, void* watcherCtx);
//show all process ip:pid
void show_list(zhandle_t *zkhandle,const char *path);
//if success,the g_mode will become MODE_MONITOR
void choose_mater(zhandle_t *zkhandle,const char *path);
//get localhost ip:pid
void getlocalhost(char *ip_pid,int len);

void print_usage();
void get_option(int argc,const char* argv[]);

/**********util*********************/
void print_usage()
{
    printf("Usage : [monitor] [-h] [-m] [-s ip:port] \n");
    printf("        -h Show help\n");
    printf("        -m set monitor mode\n");
    printf("        -s zookeeper server ip:port\n");
    printf("For example:\n");
    printf("monitor -m -s172.17.0.36:2181 \n");
}

void get_option(int argc,const char* argv[])
{
    extern char    *optarg;
    int            optch;
    int            dem = 1;
    const char    optstring[] = "hms:";

    //default    
    g_mode = MODE_WORKER;

    while((optch = getopt(argc , (char * const *)argv , optstring)) != -1 )
    {
        switch( optch )
        {
        case 'h':
            print_usage();
            exit(-1);
        case '?':
            print_usage();
            printf("unknown parameter: %c\n", optopt);
            exit(-1);
        case ':':
            print_usage();
            printf("need parameter: %c\n", optopt);
            exit(-1);
        case 'm':
                g_mode = MODE_MONITOR;
            break;
        case 's':
            //leave room for the terminating NUL; g_host is zero-filled past its initial value
            strncpy(g_host,optarg,sizeof(g_host)-1);
            break;
        default:
            break;
        }
    }
} 
void zktest_watcher_g(zhandle_t* zh, int type, int state, const char* path, void* watcherCtx)  
{  
/*  
    printf("watcher event\n");  
    printf("type: %d\n", type);  
    printf("state: %d\n", state);  
    printf("path: %s\n", path);  
    printf("watcherCtx: %s\n", (char *)watcherCtx);  
*/  

    if(type == ZOO_CHILD_EVENT &&
       state == ZOO_CONNECTED_STATE ){

        choose_mater(zh,path);
        if(g_mode == MODE_MONITOR){
            show_list(zh,path);
        }
    }
}  
void getlocalhost(char *ip_pid,int len)
{
    char hostname[64] = {0};
    struct hostent *hent ;

    gethostname(hostname,sizeof(hostname));
    hent = gethostbyname(hostname);

    char * localhost = inet_ntoa(*((struct in_addr*)(hent->h_addr_list[0])));

    //getpid() returns pid_t; cast to int so the argument matches %d
    snprintf(ip_pid,len,"%s:%d",localhost,(int)getpid());
}

void choose_mater(zhandle_t *zkhandle,const char *path)
{
    //start empty so the cleanup loop below is safe when zoo_get_children fails
    struct String_vector procs = {0, NULL};
    int i = 0;
    int ret = zoo_get_children(zkhandle,path,1,&procs);

    if(ret != ZOK || procs.count == 0){
        fprintf(stderr,"failed to get the children of path %s!\n",path);
    }else{
        char master_path[512] ={0};
        char ip_pid[64] = {0};
        int ip_pid_len = sizeof(ip_pid);

        char master[512]={0};
        char localhost[512]={0};

        getlocalhost(localhost,sizeof(localhost));

        strcpy(master,procs.data[0]);
        for(i = 1; i < procs.count; ++i){
            if(strcmp(master,procs.data[i])>0){
                strcpy(master,procs.data[i]);
            }
        }

        sprintf(master_path,"%s/%s",path,master);

        ret = zoo_get(zkhandle,master_path,0,ip_pid,&ip_pid_len,NULL);
        if(ret != ZOK){
            fprintf(stderr,"failed to get the data of path %s!\n",master_path);
        }else if(strcmp(ip_pid,localhost)==0){
            g_mode = MODE_MONITOR;
        }

    }

    for(i = 0; i < procs.count; ++i){
        free(procs.data[i]);
        procs.data[i] = NULL;
    }

}
void show_list(zhandle_t *zkhandle,const char *path)
{

    //start empty so the cleanup loop below is safe when zoo_get_children fails
    struct String_vector procs = {0, NULL};
    int i = 0;
    char localhost[512]={0};

    getlocalhost(localhost,sizeof(localhost));

    int ret = zoo_get_children(zkhandle,path,1,&procs);

    if(ret != ZOK){
        fprintf(stderr,"failed to get the children of path %s!\n",path);
    }else{
        char child_path[512] ={0};
        char ip_pid[64] = {0};
        int ip_pid_len = sizeof(ip_pid);
        printf("--------------\n");
        printf("ip\tpid\n");
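        for(i = 0; i < procs.count; ++i){
            sprintf(child_path,"%s/%s",path,procs.data[i]);
            //zoo_get overwrites ip_pid_len with the data length, so reset the
            //buffer and its length before every call
            memset(ip_pid,0,sizeof(ip_pid));
            ip_pid_len = sizeof(ip_pid)-1;
            ret = zoo_get(zkhandle,child_path,0,ip_pid,&ip_pid_len,NULL);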
            if(ret != ZOK){
                fprintf(stderr,"failed to get the data of path %s!\n",child_path);
            }else if(strcmp(ip_pid,localhost)==0){
                printf("%s(Master)\n",ip_pid);
            }else{
                printf("%s\n",ip_pid);
            }
        }
    }

    for(i = 0; i < procs.count; ++i){
        free(procs.data[i]);
        procs.data[i] = NULL;
    }
}

int main(int argc, const char *argv[])  
{  
    int timeout = 30000;  
    char path_buffer[512];  
    int bufferlen=sizeof(path_buffer);  

    zoo_set_debug_level(ZOO_LOG_LEVEL_WARN); //set the log level to suppress other output

    get_option(argc,argv);

    zhandle_t* zkhandle = zookeeper_init(g_host,zktest_watcher_g, timeout, 0, (char *)"Monitor Test", 0);  

    if (zkhandle ==NULL)  
    {  
        fprintf(stderr, "Error when connecting to zookeeper servers...\n");  
        exit(EXIT_FAILURE);  
    }  

    char path[512]="/Monitor";

    int ret = zoo_exists(zkhandle,path,0,NULL); 
    if(ret != ZOK){
        ret = zoo_create(zkhandle,path,"1.0",strlen("1.0"),
                          &ZOO_OPEN_ACL_UNSAFE,0,
                          path_buffer,bufferlen);
        if(ret == ZNODEEXISTS){
            //another process created the path between our check and our create
            ret = ZOK;
        }else if(ret != ZOK){
            fprintf(stderr,"failed to create the path %s!\n",path);
        }else{
            printf("create path %s successfully!\n",path);
        }
    }

    if(ret == ZOK && g_mode == MODE_WORKER){

        char localhost[512]={0};
        getlocalhost(localhost,sizeof(localhost));

        char child_path[512];
        sprintf(child_path,"%s/proc-",path);
        ret = zoo_create(zkhandle,child_path,localhost,strlen(localhost),  
                          &ZOO_OPEN_ACL_UNSAFE,ZOO_SEQUENCE|ZOO_EPHEMERAL,  
                          path_buffer,bufferlen);  
        if(ret != ZOK){
            fprintf(stderr,"failed to create the child_path %s,buffer:%s!\n",child_path,path_buffer);
        }else{
            printf("create child path %s successfully!\n",path_buffer);
        }
        choose_mater(zkhandle,path);

    }

    if(g_mode == MODE_MONITOR){
        show_list(zkhandle,path);
    }

    getchar();

    zookeeper_close(zkhandle); 

    return 0;
}


2. Makefile

CC=gcc
CFLAGS=-g 
ZOOKEEPER_INSTALL=/usr/local
ZOOKEEPER_INC=-I${ZOOKEEPER_INSTALL}/include/zookeeper
ZOOKEEPER_LIB= -L${ZOOKEEPER_INSTALL}/lib -lzookeeper_mt

APP=monitor
all:
	${CC} monitor.c -DTHREADED ${CFLAGS} ${ZOOKEEPER_INC} ${ZOOKEEPER_LIB} -o ${APP}
clean:
	rm -f ${APP}


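You can start the program several times on a single machine, since each instance gets a different PID, or start it on several machines in a cluster.
The -s option gives the IP and port of the ZooKeeper server (not the IP and port of the Master).
The -m option makes the process a standalone monitor. A process started with this option does not take part in the election, because it does not create a node under /Monitor.
Example run:
monitor -s172.17.0.36:2181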

