Write Your Own Cache System - tmcache

Author: heiyeluren
Date: 2008-10-24
Blog: http://blog.csdn.net/heiyeshuwu

【How It Works】

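tmcache is essentially a cache server similar to Memcache. Anyone who has used Memcache will have a rough idea of how it runs; to make things easier to follow, here is a brief description.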

Sending a request:
Client (PHP/Java/C++) --> cache server --> memory (shared memory)

Receiving data:
Memory (shared memory) --> cache server --> client

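In short: a client (any language or tool that can open a socket) connects to a given port on the cache server and issues store/read/delete operations. The cache server receives the command, performs the operation in memory, and writes the result back to the client. The server therefore consists of these modules: socket communication, protocol parsing, data storage, and data expiration control.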

The code below is organized by these modules and is taken from tmcache - TieMa(Tiny&Mini) Memory Cache. tmcache currently supports:

* In-memory data storage
* Compatible with the memcached communication protocol
* A small set of operations that is simple to use
* Configurable port, max_clients, and memory limit

tmcache downloads (the Windows build runs as-is):

Windows: http://heiyeluren.googlecode.com/files/tmcache-1.0.0_alpha-win32.zip
Unix/Linux: http://heiyeluren.googlecode.com/files/tmcache-1.0.0_alpha.tar.gz

【Implementation】

I. Communication and protocol handling module

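One part of this module is listening on and handling the socket; in tmcache the init_server_listen() function does the listening. Accepting connections concurrently is an important part of the program. The options are multiplexed I/O with select/poll, event-based mechanisms with epoll/kqueue, or threads. For portability and simplicity, tmcache uses threads.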

Core thread-handling code:

void tm_thread( int serversock, unsigned int max_client ){
    int clientsock, *arg;
    struct sockaddr_in client_addr;
    char currtime[32];
    socklen_t clientlen;
    pthread_attr_t thread_attr;

    /* Create every worker thread detached so it never has to be joined */
    pthread_attr_init(&thread_attr);
    pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_DETACHED);

    /* Run until cancelled */
    while (1){
        pthread_t thread;
        clientlen = sizeof(client_addr);
        memset(currtime, 0, sizeof(currtime));
        getdate(currtime);   /* tmcache's own helper: fills in the current time string */

        /* Wait for client connection */
        if ((clientsock = accept(serversock, (struct sockaddr *) &client_addr, &clientlen)) < 0){
            die("Failed to accept client connection");
        }
        /* Give each thread its own copy of the descriptor: passing &clientsock
           would race with the next accept(). This assumes tm_thread_callback
           free()s the pointer once it has read the descriptor. */
        if ((arg = malloc(sizeof(*arg))) == NULL){
            die("Out of memory");
        }
        *arg = clientsock;
        /* Use a thread to process the new connection (pthread_create takes &thread) */
        if (pthread_create(&thread, &thread_attr, tm_thread_callback, (void *)arg) != 0){
            die("Create new thread failed");
        }
    }
    /* Destroy pthread attribute (not reached while the loop runs forever) */
    (void)pthread_attr_destroy(&thread_attr);
}

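Protocol handling is the core of the server. It covers the commands for storing data (set/add/replace/append), fetching data (get/gets), deleting data (delete/remove), and reporting status (stats/stat). The main handler is proc_request(), which parses the protocol and calls the corresponding interfaces.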
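
To make the parsing step concrete, here is a minimal sketch of how one storage command line of the memcached text protocol ("<command> <key> <flags> <exptime> <bytes>") could be split into fields. It is not tmcache's proc_request(); the names storage_cmd and parse_storage_cmd are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_MAX 250   /* the memcached protocol limits keys to 250 bytes */

/* Hypothetical parsed form of one storage command line */
struct storage_cmd {
    char name[16];            /* "set", "add", "replace" or "append" */
    char key[KEY_MAX + 1];
    unsigned flags;           /* opaque client flags */
    unsigned exptime;         /* expiration time as sent by the client */
    unsigned bytes;           /* length of the data block that follows */
};

/* Parse "<name> <key> <flags> <exptime> <bytes>\r\n".
 * Returns 1 for a well-formed storage command, 0 otherwise. */
int parse_storage_cmd(const char *line, struct storage_cmd *out){
    assert(line != NULL && out != NULL);
    /* The field widths keep sscanf from overflowing name and key */
    if (sscanf(line, "%15s %250s %u %u %u",
               out->name, out->key,
               &out->flags, &out->exptime, &out->bytes) != 5){
        return 0;
    }
    return strcmp(out->name, "set") == 0 ||
           strcmp(out->name, "add") == 0 ||
           strcmp(out->name, "replace") == 0 ||
           strcmp(out->name, "append") == 0;
}
```

A real server would then read exactly `bytes` bytes of payload plus the trailing "\r\n" before answering, for example with "STORED\r\n".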

II. Data processing module

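This is the core of data storage. It uses a hash table to store the data, a queue that records insertion order and doubles as the data structure for handling memory exhaustion, and a probabilistic algorithm that purges expired data from time to time.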

1. Hash table data storage

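Data is stored in a hash table, which is simple and fast: inserts and lookups are O(1) on average, a good fit for Key => Value storage. The core hash function is the classic Times33 algorithm: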

unsigned tm_hash( const char *str, unsigned table_size ){
    unsigned long hash = 5381;
    int c;
    /* Times33: hash = hash * 33 + c for every byte of the key */
    while ((c = *str++) != 0) hash = ((hash << 5) + hash) + c;
    /* Reduce to a bucket index when a table size is given */
    hash = table_size > 0 ? hash % table_size : hash;
    return hash;
}
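
When two keys hash to the same slot, the collision is resolved by separate chaining. Below is the data structure of one hash node; the next field points to the next entry that maps to the same hash value: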

/* Hash data item struct */
struct tm_hash_entry_t {
    char *key;                    /* data key string */
    char *data;                   /* data value string */
    size_t length;                /* data length */
    unsigned created;             /* data create time (Unix Timestamp) */
    unsigned expired;             /* data expire time (Unix Timestamp) */
    struct tm_hash_entry_t *next; /* key conflict link next data node pointer */
};
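
The following is a minimal sketch of how separate chaining works with a Times33 hash, not tmcache's actual code. The names entry, chain_put and chain_find and the deliberately tiny TABLE_SIZE are assumptions made for the example; the entry struct is a cut-down stand-in for tm_hash_entry_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 8   /* tiny on purpose so collisions are easy to hit */

/* Hypothetical cut-down entry: key, value, and the chain pointer */
struct entry {
    char *key;
    char *data;
    struct entry *next;   /* next entry in the same bucket */
};

static struct entry *table[TABLE_SIZE];

/* Times33 hash reduced to a bucket index */
static unsigned times33(const char *str){
    unsigned long hash = 5381;
    int c;
    while ((c = (unsigned char)*str++) != 0) hash = hash * 33 + c;
    return (unsigned)(hash % TABLE_SIZE);
}

static char *copy_str(const char *s){
    char *p = malloc(strlen(s) + 1);
    if (p != NULL) strcpy(p, s);
    return p;
}

/* Walk the bucket's chain and compare keys; NULL means "not found" */
struct entry *chain_find(const char *key){
    struct entry *e;
    assert(key != NULL);
    for (e = table[times33(key)]; e != NULL; e = e->next){
        if (strcmp(e->key, key) == 0) return e;
    }
    return NULL;
}

/* Insert or overwrite; returns 1 on success, 0 if memory runs out */
int chain_put(const char *key, const char *data){
    struct entry *e = chain_find(key);
    char *copy = copy_str(data);
    unsigned idx;
    if (copy == NULL) return 0;
    if (e != NULL){               /* key already present: replace the value */
        free(e->data);
        e->data = copy;
        return 1;
    }
    e = malloc(sizeof(*e));
    if (e == NULL){ free(copy); return 0; }
    e->key = copy_str(key);
    if (e->key == NULL){ free(copy); free(e); return 0; }
    e->data = copy;
    idx = times33(key);
    e->next = table[idx];         /* link the new entry at the head of the chain */
    table[idx] = e;
    return 1;
}
```

Storing nine distinct keys in eight buckets guarantees at least one collision, and every key stays reachable because a lookup compares the full key along the chain.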

2. Expiration handling

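Expiration is currently handled in two ways. First, when a data node is accessed and the current time is already past its expired field, the node is removed. Second, during data operations a probabilistic algorithm occasionally purges data that has already expired. Here is the probability implementation: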

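status get_gc_probability(unsigned probaility, unsigned divisor){
    int n;
    struct timeval tv;
    /* Seed the generator from the current time (seconds + microseconds) */
    gettimeofday(&tv , (struct timezone *)NULL);
    srand((unsigned)(tv.tv_usec + tv.tv_sec));
    /* Draw n from 1..divisor; double avoids float rounding near RAND_MAX */
    n = 1 + (int)( (double)divisor * rand() / (RAND_MAX+1.0) );
    /* TRUE with a chance of probaility / divisor */
    return (n <= probaility ? TRUE : FALSE);
}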

The chance is probaility / divisor. The default is 1/100, so on average one operation in a hundred triggers a purge of expired data, which keeps the extra load on normal operations low.
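
The first strategy, expiring on access, comes down to a timestamp comparison. Here is a minimal sketch under two assumptions that the article does not state: expired holds an absolute Unix timestamp, and a value of 0 means "never expires". The names cache_entry and entry_is_expired are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trimmed-down entry: only the fields the check needs */
struct cache_entry {
    unsigned created;   /* creation time (Unix timestamp) */
    unsigned expired;   /* absolute expiry time; 0 is assumed to mean "never" */
};

/* Returns 1 if the entry should be removed when accessed at time `now` */
int entry_is_expired(const struct cache_entry *e, unsigned now){
    assert(e != NULL);
    return e->expired != 0 && e->expired <= now;
}
```

A get handler would call this right after the hash lookup and, when it returns 1, remove the node and report a miss to the client.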

3. What happens when memory runs out

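Suppose tmcache was started with a 16MB memory limit and eventually runs out. The only way to make room for new data is to remove cached data that was inserted earlier. A queue is used for this because it follows the first in, first out (FIFO) principle. Code: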

/* If storing the new item would exceed g_max_mem_size, evict the oldest
   entries from the queue (and from the hash table) until it fits */
if ( (get_mem_used() + length) > g_max_mem_size ){
    struct tm_queue_node_t *qnode;
    while ( (get_mem_used() + length) > g_max_mem_size ){
        qnode = tm_qremove( g_qlist );
        /* Assumes tm_qremove() returns NULL once the queue is empty */
        if ( qnode == NULL ) break;
        remove_data( qnode->key );
    }
}
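
To show the eviction logic in isolation, here is a self-contained sketch of a FIFO that tracks item sizes and evicts the oldest items until a new one fits. It is not tmcache's queue; the struct fifo, fifo_push, fifo_make_room and the fixed capacity QCAP are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 16   /* hypothetical fixed capacity of the ring buffer */

/* Hypothetical FIFO of item sizes, standing in for tmcache's queue */
struct fifo {
    size_t sizes[QCAP];
    size_t head;    /* index of the oldest item */
    size_t count;   /* number of items currently queued */
    size_t used;    /* sum of all queued sizes */
};

/* Append an item at the tail; returns 0 if the ring buffer is full */
int fifo_push(struct fifo *q, size_t size){
    assert(q != NULL);
    if (q->count == QCAP) return 0;
    q->sizes[(q->head + q->count) % QCAP] = size;
    q->count++;
    q->used += size;
    return 1;
}

/* Evict the oldest items until `incoming` bytes fit under `limit`.
 * Returns the number of evicted items, or -1 if the item can never fit. */
int fifo_make_room(struct fifo *q, size_t incoming, size_t limit){
    int evicted = 0;
    assert(q != NULL);
    if (incoming > limit) return -1;
    while (q->used + incoming > limit && q->count > 0){
        q->used -= q->sizes[q->head];
        q->head = (q->head + 1) % QCAP;
        q->count--;
        evicted++;
    }
    return evicted;
}
```

The early return for an item larger than the whole limit matters: without it, the loop would empty the entire cache and the item still would not fit.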

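The drawback is obvious: data that has not expired can still be deleted. A cache therefore cannot be treated like persistent storage. Every cache lookup must be prepared for a miss and be paired with the corresponding store operation, because there is no guarantee that the data is still in memory.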
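
On the client side this usually becomes a "look up, and on a miss load and store again" routine. The sketch below illustrates the idea with a hypothetical one-slot in-process cache and a fake backend; cache_get, cache_set, cache_evict, load_from_backend and get_or_load are invented for the example and are not part of tmcache or any client library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-slot cache, for illustration only */
static char cached_key[32];
static char cached_val[32];
static int  cache_valid = 0;
static int  backend_hits = 0;   /* counts how often the slow path runs */

static const char *cache_get(const char *key){
    return (cache_valid && strcmp(cached_key, key) == 0) ? cached_val : NULL;
}

static void cache_set(const char *key, const char *val){
    /* The last byte of each buffer is never written, so it stays '\0' */
    strncpy(cached_key, key, sizeof(cached_key) - 1);
    strncpy(cached_val, val, sizeof(cached_val) - 1);
    cache_valid = 1;
}

/* Simulates the server dropping the item, e.g. through FIFO eviction */
void cache_evict(void){ cache_valid = 0; }

/* Stand-in for the authoritative data source (database, file, ...) */
static const char *load_from_backend(const char *key){
    (void)key;
    backend_hits++;
    return "value-from-db";
}

/* Never assume the item is cached: on a miss, reload it and store it again */
const char *get_or_load(const char *key){
    const char *val;
    assert(key != NULL);
    val = cache_get(key);
    if (val == NULL){
        val = load_from_backend(key);
        cache_set(key, val);
    }
    return val;
}
```

The second lookup of the same key is served from the cache, and after an eviction the backend is consulted again, so the caller never sees the difference.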

【Closing Remarks】

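tmcache is a very simple cache system and falls far short of Memcache. It is above all a learning project that sketches out some basic ideas, and I hope it can serve as a starting point for anyone who wants to build a complete, complex, and stable cache system. tmcache is therefore not a stable or reliable cache system and is not suitable for production use; it works better as a small reference to learn from. :-)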

For anything not covered above, reading the tmcache source code is the best way to learn more.

Download: http://code.google.com/p/heiyeluren/downloads

This article comes from the CSDN blog; please credit the source when reposting: http://blog.csdn.net/heiyeshuwu/archive/2008/10/24/3132977.aspx