Zookeeper Study Notes: Client Program Analysis, Part 1

The Zookeeper client is available in several languages, such as C and Java. Since most of my project work is in C, these notes focus on the C client.

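The C client source lives in the src/c directory of the extracted zookeeper tarball. From that directory it can be built and installed with ./configure, make && sudo make install. After installation there are two main executables: cli_mt and cli_st. Note that they link against libzookeeper_mt.so.2 and libzookeeper_st.so.2 respectively, so you may need to point the loader at them with export LD_LIBRARY_PATH before running them.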

Let's start with the client's logging mechanism. It is implemented mainly in src/c/include/zookeeper_log.h and src/c/src/zk_log.c, and the available log levels are defined in zookeeper.h:

	typedef enum {ZOO_LOG_LEVEL_ERROR=1,ZOO_LOG_LEVEL_WARN=2,ZOO_LOG_LEVEL_INFO=3,ZOO_LOG_LEVEL_DEBUG=4} ZooLogLevel;

The macros in zookeeper_log.h show that the actual logging is done by the log_message function:

#define LOG_ERROR(x) if(logLevel>=ZOO_LOG_LEVEL_ERROR) \
    log_message(ZOO_LOG_LEVEL_ERROR,__LINE__,__func__,format_log_message x)
#define LOG_WARN(x) if(logLevel>=ZOO_LOG_LEVEL_WARN) \
    log_message(ZOO_LOG_LEVEL_WARN,__LINE__,__func__,format_log_message x)
#define LOG_INFO(x) if(logLevel>=ZOO_LOG_LEVEL_INFO) \
    log_message(ZOO_LOG_LEVEL_INFO,__LINE__,__func__,format_log_message x)
#define LOG_DEBUG(x) if(logLevel==ZOO_LOG_LEVEL_DEBUG) \
    log_message(ZOO_LOG_LEVEL_DEBUG,__LINE__,__func__,format_log_message x)
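The x argument of these macros is an entire parenthesized argument list, so `format_log_message x` expands into an ordinary variadic call, and callers write LOG_ERROR(("...", args)) with double parentheses. The standalone sketch below reproduces that pattern under hypothetical names (DEMO_LOG_WARN, demo_format, demo_emit); it is not the ZooKeeper implementation, and its single static format buffer is only safe in a single-threaded program.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical names; this mirrors only the shape of the ZooKeeper LOG_* macros. */
typedef enum { DEMO_ERROR = 1, DEMO_WARN = 2, DEMO_INFO = 3, DEMO_DEBUG = 4 } DemoLogLevel;

static DemoLogLevel demo_log_level = DEMO_WARN;   /* current threshold */
static DemoLogLevel demo_last_level;              /* level of the last emitted line */
static char demo_last_msg[128];                   /* copy of the last emitted message */
static int demo_emit_count = 0;

/* printf-style formatting into one static buffer (single-threaded sketch). */
static const char *demo_format(const char *fmt, ...) {
    static char buf[128];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    return buf;
}

/* Stand-in for log_message: records and prints one line. */
static void demo_emit(DemoLogLevel lvl, int line, const char *func, const char *msg) {
    demo_last_level = lvl;
    snprintf(demo_last_msg, sizeof demo_last_msg, "%s", msg);
    demo_emit_count++;
    fprintf(stderr, "%d@%s@%d: %s\n", (int)lvl, func, line, msg);
}

/* x must be a parenthesized argument list: DEMO_LOG_WARN(("n=%d", n)). */
#define DEMO_LOG_WARN(x) if (demo_log_level >= DEMO_WARN) \
    demo_emit(DEMO_WARN, __LINE__, __func__, demo_format x)
#define DEMO_LOG_DEBUG(x) if (demo_log_level >= DEMO_DEBUG) \
    demo_emit(DEMO_DEBUG, __LINE__, __func__, demo_format x)
```

Because the formatting call sits inside the if, the arguments are not evaluated at all when the level filters the message out. The bare if also has the classic dangling-else problem: an else written after `DEMO_LOG_WARN((...));` binds to the macro's hidden if, exactly as it would with the original LOG_* macros.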

log_message itself is implemented in zk_log.c, so let's walk through that file.

__attribute__((constructor)) void prepareTSDKeys() {
    pthread_key_create (&time_now_buffer, freeBuffer);
    pthread_key_create (&format_log_msg_buffer, freeBuffer);
}
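prepareTSDKeys is marked __attribute__((constructor)), a GCC/Clang extension that makes the function run automatically before main(), or when the shared library containing it is loaded, so the keys exist before any thread can log. The minimal standalone sketch below shows the same arrangement with hypothetical names (demo_time_key, demo_keys_ready); unlike prepareTSDKeys, it also checks the return value of pthread_key_create.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t demo_time_key;   /* hypothetical counterpart of time_now_buffer */
static int demo_keys_ready = 0;       /* set only if key creation succeeded */

/* GCC/Clang extension: runs before main() (or at shared-library load time). */
__attribute__((constructor)) static void demo_prepare_keys(void) {
    /* free() becomes the destructor for each thread's non-NULL buffer. */
    if (pthread_key_create(&demo_time_key, free) == 0)
        demo_keys_ready = 1;
}
```

A portable alternative that avoids the compiler extension is to create the key lazily with pthread_once() on the first logging call.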

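Here pthread_key_create is called (in a constructor function, before main runs) to create the keys time_now_buffer and format_log_msg_buffer, through which every thread gets its own private buffer. Why is this needed? In a multithreaded program several threads may call the log functions at the same time, and every log line contains a timestamp that the program first has to format into a string: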

static const char* time_now(char* now_str){
    struct timeval tv;
    struct tm lt;
    time_t now = 0;
    size_t len = 0;

    gettimeofday(&tv,0);

    now = tv.tv_sec;
    localtime_r(&now, &lt);

    // clone the format used by log4j ISO8601DateFormat
    // specifically: "yyyy-MM-dd HH:mm:ss,SSS"

    len = strftime(now_str, TIME_NOW_BUF_SIZE,
                          "%Y-%m-%d %H:%M:%S",
                          &lt);

    len += snprintf(now_str + len,
                    TIME_NOW_BUF_SIZE - len,
                    ",%03d",
                    (int)(tv.tv_usec/1000));

    return now_str;
}

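Allocating a fresh buffer with malloc on every log call just to hold this formatted string would be inefficient. A single static buffer shared by all threads does not work either, because one thread could overwrite the timestamp while another thread is still printing it. The Zookeeper client therefore formats the timestamp into a fixed, pre-allocated buffer that belongs to the calling thread: the now_str argument in the code above is the start address of that per-thread buffer. This is visible in the implementation of log_message: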
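The overwrite hazard that the per-thread buffers avoid can be seen even without threads. The sketch below uses a hypothetical helper, demo_format_shared, that always formats into the same static buffer, which is what a single shared buffer for time_now would amount to.

```c
#include <stdio.h>

/* Hypothetical helper: every call formats into the same static buffer,
   which is what one shared (non-TSD) buffer amounts to. */
static const char *demo_format_shared(int value) {
    static char buf[16];
    snprintf(buf, sizeof buf, "value=%d", value);
    return buf;
}
```

For the same reason, printf("%s %s\n", demo_format_shared(1), demo_format_shared(2)) prints one string twice. With threads, the second "call" is simply another thread formatting its timestamp between this thread's time_now and its fprintf.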

void log_message(ZooLogLevel curLevel,int line,const char* funcName,
    const char* message)
{
    static const char* dbgLevelStr[]={"ZOO_INVALID","ZOO_ERROR","ZOO_WARN",
            "ZOO_INFO","ZOO_DEBUG"};
    static pid_t pid=0;
#ifdef WIN32
    char timebuf [TIME_NOW_BUF_SIZE];
#endif
    if(pid==0)pid=getpid();
#ifndef THREADED
    fprintf(LOGSTREAM, "%s:%d:%s@%s@%d: %s\n", time_now(get_time_buffer()),pid,
            dbgLevelStr[curLevel],funcName,line,message);
#else
#ifdef WIN32
    fprintf(LOGSTREAM, "%s:%d(0x%lx):%s@%s@%d: %s\n", time_now(timebuf),pid,
            (unsigned long int)(pthread_self().thread_id),
            dbgLevelStr[curLevel],funcName,line,message);
#else
    fprintf(LOGSTREAM, "%s:%d(0x%lx):%s@%s@%d: %s\n", time_now(get_time_buffer()),pid,
            (unsigned long int)pthread_self(),
            dbgLevelStr[curLevel],funcName,line,message);
#endif
#endif
    fflush(LOGSTREAM);
}

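The implementation of get_time_buffer shows that a single-threaded build uses a static buffer, while a multithreaded build (THREADED defined) calls pthread_getspecific() on the keys created above to get the calling thread's own buffer, allocating it with calloc on first use: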

#ifdef THREADED
char* getTSData(pthread_key_t key,int size){
    char* p=pthread_getspecific(key);
    if(p==0){
        int res;
        p=calloc(1,size);
        res=pthread_setspecific(key,p);
        if(res!=0){
            fprintf(stderr,"Failed to set TSD key: %d",res);
        }
    }
    return p;
}

char* get_time_buffer(){
    return getTSData(time_now_buffer,TIME_NOW_BUF_SIZE);
}

char* get_format_log_buffer(){
    return getTSData(format_log_msg_buffer,FORMAT_LOG_BUF_SIZE);
}
#else
char* get_time_buffer(){
    static char buf[TIME_NOW_BUF_SIZE];
    return buf;
}

char* get_format_log_buffer(){
    static char buf[FORMAT_LOG_BUF_SIZE];
    return buf;
}
#endif
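The lazy per-thread allocation in getTSData can be checked with a small standalone sketch. The names (demo_get_buffer, DEMO_BUF_SIZE and the helpers) are hypothetical; key creation uses pthread_once instead of a constructor, and, unlike getTSData, the buffer is freed if pthread_setspecific fails. The check confirms that repeated calls in one thread return the same buffer and that a second thread gets a different one.

```c
#include <pthread.h>
#include <stdlib.h>

#define DEMO_BUF_SIZE 64   /* hypothetical buffer size */

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static int demo_key_ok = 0;

static void demo_make_key(void) { demo_key_ok = (pthread_key_create(&demo_key, free) == 0); }

/* Same lazy pattern as getTSData: allocate the calling thread's buffer on first use. */
static char *demo_get_buffer(void) {
    pthread_once(&demo_once, demo_make_key);
    if (!demo_key_ok) return NULL;
    char *p = pthread_getspecific(demo_key);
    if (p == NULL) {
        p = calloc(1, DEMO_BUF_SIZE);
        /* Unlike getTSData, release the buffer if it cannot be registered. */
        if (p != NULL && pthread_setspecific(demo_key, p) != 0) {
            free(p);
            p = NULL;
        }
    }
    return p;
}

struct demo_check { char *main_buf; int distinct; };

/* Runs in a second thread while the main thread's buffer is still alive. */
static void *demo_worker(void *arg) {
    struct demo_check *c = arg;
    char *mine = demo_get_buffer();
    c->distinct = (mine != NULL && mine != c->main_buf);
    return NULL;   /* on exit, free() is called on this thread's buffer */
}

/* Returns 1 if another thread gets a different buffer from the caller's. */
static int demo_buffers_are_per_thread(void) {
    struct demo_check c = { demo_get_buffer(), 0 };
    pthread_t t;
    if (c.main_buf == NULL || pthread_create(&t, NULL, demo_worker, &c) != 0)
        return 0;
    pthread_join(t, NULL);
    return c.distinct;
}
```

The comparison is done inside the worker, while both buffers are still alive, because comparing against a pointer whose buffer has already been freed by the destructor is not well-defined C.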

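That covers essentially the whole logging mechanism of the Zookeeper C client. The design is simple and makes no special effort to be fast, which is reasonable for code that only runs in a client. What I want to look at here are a few side issues that this design raises: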

1. int pthread_key_create(pthread_key_t *key, void (*destr_function) (void*));

The number of keys this function can create is limited:

pthread_key_create allocates a new TSD key. The key is stored in the location pointed to by key. There is a limit of PTHREAD_KEYS_MAX on the number of keys allocated at a given time. The value initially associated with the returned key is NULL in all currently executing threads.

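Also, destr_function is only invoked when a thread exits, whether by calling pthread_exit, returning from its start routine, or being cancelled. If the thread's value for the key is NULL, destr_function is not called for it, and the order in which the destructors of different keys run is unspecified.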

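If, while the destructors are running, a key whose destructor has already been called is set to a non-NULL value again, the whole destruction pass is repeated. The repetition is bounded: after PTHREAD_DESTRUCTOR_ITERATIONS passes an implementation is allowed to stop calling destructors.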
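These rules can be observed with a small sketch, assuming a POSIX system. The names are hypothetical. A thread that never sets the key triggers no destructor call; a thread that sets it triggers one, and because the destructor stores a non-NULL value again on its first call, a second pass runs (POSIX requires at least 4 passes to be allowed, so one repeat is always honored). sysconf(_SC_THREAD_KEYS_MAX) reports the runtime key limit, or -1 if there is no determinate limit.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_key_t demo_tsd_key;
static int demo_destructor_calls = 0;   /* written by exiting threads, read after pthread_join */

static void demo_destructor(void *value) {
    demo_destructor_calls++;
    /* The key's value was reset to NULL before this call. Storing a non-NULL
       value again on the first call triggers one more destructor pass. */
    if (demo_destructor_calls == 1)
        pthread_setspecific(demo_tsd_key, value);
}

static void *demo_sets_value(void *arg) {
    static int payload;
    (void)arg;
    pthread_setspecific(demo_tsd_key, &payload);
    return NULL;   /* returning from the start routine is an implicit pthread_exit */
}

static void *demo_leaves_null(void *arg) {
    (void)arg;
    return NULL;   /* value stays NULL, so no destructor call */
}

static int demo_run_thread(void *(*fn)(void *)) {
    pthread_t t;
    if (pthread_create(&t, NULL, fn, NULL) != 0) return -1;
    return pthread_join(t, NULL);
}

static int demo_tsd_init(void) { return pthread_key_create(&demo_tsd_key, demo_destructor); }

/* Runtime limit on keys per process; -1 means no determinate limit. */
static long demo_keys_max(void) { return sysconf(_SC_THREAD_KEYS_MAX); }
```

Re-storing a value from inside a destructor is shown here only to make the repeated pass visible; in real code it risks leaks or, without a guard like the counter above, an endless series of passes.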

2. The logging code above writes to a file or terminal with fprintf. This raises a question: in a multithreaded environment there is no explicit lock around each fprintf call, so is fprintf thread-safe?
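POSIX requires every stdio function that operates on a FILE to behave as if it locks that FILE internally, using the same lock that flockfile() and funlockfile() expose, so one fprintf call is not interleaved with other threads' output on the same FILE. Only the individual call is atomic, though; to keep several calls together the caller has to hold the lock itself. In the sketch below, demo_write_record, demo_writer and demo_count_bad_lines are hypothetical helpers: several threads write three-part records to one FILE, and a check counts any line that was torn apart.

```c
#include <pthread.h>
#include <stdio.h>

/* Writes one record in three stdio calls while holding the FILE lock,
   so no other thread's output can land between the parts. */
static void demo_write_record(FILE *fp, int id, const char *msg) {
    flockfile(fp);
    fprintf(fp, "[%d] ", id);
    fputs(msg, fp);
    fputc('\n', fp);
    funlockfile(fp);
}

struct demo_arg { FILE *fp; int id; };

static void *demo_writer(void *p) {
    struct demo_arg *a = p;
    char msg[32];
    snprintf(msg, sizeof msg, "payload-%d", a->id);
    for (int i = 0; i < 200; i++)
        demo_write_record(a->fp, a->id, msg);
    return NULL;
}

/* Returns the number of malformed lines (0 means no interleaving). */
static int demo_count_bad_lines(FILE *fp) {
    char line[128];
    int bad = 0, id, id2;
    rewind(fp);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "[%d] payload-%d", &id, &id2) != 2 || id != id2)
            bad++;
    }
    return bad;
}
```

Without the flockfile/funlockfile pair, each of the three calls would still be atomic on its own, but another thread's "[n] " prefix could appear between this thread's prefix and its payload.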

We can work this out from the answers of some experts on Stack Overflow (http://stackoverflow.com/questions/11664434/how-fprintf-behavior-when-multi-threaded-and-multi-processed) and by reading the source ourselves:

If you're using a single FILE object to perform output on an open file, then whole fprintf calls on that FILE will be atomic, i.e. lock is held on the FILE for the duration of the fprintf call. Since a FILE is local to a single process's address space, this setup is only possible in multi-threaded applications; it does not apply to multi-process setups where several different processes are accessing separate FILE objects referring to the same underlying open file. Even though you're using fprintf here, each process has its own FILE it can lock and unlock without the others seeing the changes, so writes can end up interleaved. There are several ways to prevent this from happening:

a. Allocate a synchronization object (e.g. a process-shared semaphore or mutex) in shared memory and make each process obtain the lock before writing to the file (so only one process can write at a time); OR

b. Use filesystem-level advisory locking, e.g. fcntl locks or the (non-POSIX) BSD flock interface; OR

c. Instead of writing directly to the log file, write to a pipe that another process will feed into the log file. Writes to a pipe are guaranteed (by POSIX) to be atomic as long as they are smaller than PIPE_BUF bytes long. You cannot use fprintf in this case (since it might perform multiple underlying write operations), but you could use snprintf to a PIPE_BUF-sized buffer followed by write.
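Option c can be sketched as follows, assuming a POSIX system. Each record is formatted with vsnprintf into a PIPE_BUF-sized buffer and handed to the kernel with a single write(), so a concurrent writer's record cannot land in the middle of it. demo_pipe_log is a hypothetical name; overly long records are truncated rather than split, and the 512-byte fallback is the POSIX minimum for PIPE_BUF (_POSIX_PIPE_BUF), used only if the header does not define it.

```c
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

#ifndef PIPE_BUF
#define PIPE_BUF 512   /* POSIX minimum (_POSIX_PIPE_BUF) if the header does not define it */
#endif

/* Formats one record and emits it with a single write(). Writes of at most
   PIPE_BUF bytes to a pipe are atomic under POSIX, so records never interleave. */
static ssize_t demo_pipe_log(int fd, const char *fmt, ...) {
    char buf[PIPE_BUF];
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0) return -1;
    if ((size_t)n >= sizeof buf) n = (int)sizeof buf - 1;   /* truncate instead of splitting */
    return write(fd, buf, (size_t)n);
}
```

The reading end of the pipe would typically belong to a single logger process that copies records into the log file, so only one process ever writes to the file itself.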
