Redis: network model, acceptTcpHandler, readQueryFromClient, bio

http://blog.csdn.net/john_zzl/article/category/1231787

#define AE_SETSIZE (1024*10) /* Max number of fd supported */

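The file descriptors handled by Redis's network model must be smaller than AE_SETSIZE (1024*10 = 10240); a larger fd does not fit in the events array of the struct below.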

/* State of an event based program */
typedef struct aeEventLoop {
    int maxfd;
    long long timeEventNextId;
    aeFileEvent events[AE_SETSIZE]; /* Registered events */
    aeFiredEvent fired[AE_SETSIZE]; /* Fired events */
    aeTimeEvent *timeEventHead;
    int stop;
    void *apidata; /* This is used for polling API specific data */
    aeBeforeSleepProc *beforesleep;
} aeEventLoop;

/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;

typedef void aeFileProc(struct aeEventLoop *eventLoop, int fd, void *clientData, int mask);

/* A fired event */
typedef struct aeFiredEvent {
    int fd;
    int mask;
} aeFiredEvent;

int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask, aeFileProc *proc, void *clientData);
void aeDeleteFileEvent(aeEventLoop *eventLoop, int fd, int mask);

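Handing an fd over to the event loop registers it in the events array, so the managed aeFileEvent is found directly as aeEventLoop.events[fd].

Each fd is associated with a mask (the managed events, readable or writable), a read handler (called when the fd is readable), a write handler (called when the fd is writable), and clientData (passed to both handlers).

aeFiredEvent plays a role similar to epoll's epoll_event.

First, epoll_wait returns the ready events, which are then copied into the fired array.

Each fd in the fired array is used to look up its managed aeFileEvent, which yields the read handler, the write handler and clientData; the handlers are then called according to the mask stored in the fired array.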
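
The lookup-and-dispatch flow described above can be sketched roughly as follows. This is a minimal illustration, not the ae.c code: every `sk_` name, the small `SKETCH_SETSIZE` table and the simplified handler signature are assumptions made for the example, and the polling backend that would fill `fired[]` is left out.

```c
#include <stddef.h>

#define SKETCH_SETSIZE 64   /* hypothetical table size, far smaller than AE_SETSIZE */
#define SK_READABLE 1
#define SK_WRITABLE 2

typedef void sk_file_proc(int fd, void *client_data, int mask);

typedef struct {
    int mask;               /* SK_READABLE and/or SK_WRITABLE */
    sk_file_proc *rproc;
    sk_file_proc *wproc;
    void *client_data;
} sk_file_event;

typedef struct { int fd; int mask; } sk_fired_event;

static sk_file_event sk_events[SKETCH_SETSIZE];   /* indexed directly by fd */

/* Register a handler for fd; returns -1 when fd does not fit in the table. */
static int sk_register(int fd, int mask, sk_file_proc *proc, void *client_data) {
    if (fd < 0 || fd >= SKETCH_SETSIZE) return -1;
    sk_file_event *fe = &sk_events[fd];
    fe->mask |= mask;
    if (mask & SK_READABLE) fe->rproc = proc;
    if (mask & SK_WRITABLE) fe->wproc = proc;
    fe->client_data = client_data;
    return 0;
}

/* Dispatch the events that a polling backend copied into fired[]. */
static int sk_dispatch(const sk_fired_event *fired, int numevents) {
    int processed = 0;
    for (int i = 0; i < numevents; i++) {
        int fd = fired[i].fd;
        int mask = fired[i].mask;
        sk_file_event *fe = &sk_events[fd];
        if ((fe->mask & mask & SK_READABLE) && fe->rproc)
            fe->rproc(fd, fe->client_data, mask);
        if ((fe->mask & mask & SK_WRITABLE) && fe->wproc)
            fe->wproc(fd, fe->client_data, mask);
        processed++;
    }
    return processed;
}

/* Example handler: counts how many times it was invoked. */
static void sk_count_calls(int fd, void *client_data, int mask) {
    (void)fd;
    (void)mask;
    (*(int *)client_data)++;
}
```

Because the table is indexed by fd, registration has to reject any fd that is not smaller than the table size, which is the same constraint the article states for AE_SETSIZE.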

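aeFiredEvent exists because Redis's network model supports not only epoll but also select and kqueue, so an intermediate abstraction layer is needed.

Besides managed fds, the event loop also supports timers. All timers are stored in the linked list timeEventHead and are checked after every round of polling.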

Redis: acceptTcpHandler

aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL);

aeCreateFileEvent(server.el,server.ipfd,AE_READABLE,acceptTcpHandler,NULL);

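For every client fd it accepts, acceptTcpHandler calls createClient for that fd (subject to the configurable maxclients limit).

redisClient *createClient(int fd)

1. Set the fd to non-blocking and enable TCP_NODELAY.

2. Call aeCreateFileEvent(server.el,fd,AE_READABLE,readQueryFromClient, c) to hand the fd over to the network model; when the fd becomes readable, readQueryFromClient is called to handle it.

3. Initialize the client structure.

4. Append c to the server.clients list.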

Redis: readQueryFromClient

void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask)

Reads data into redisClient::querybuf.

If the amount of buffered data exceeds client_max_querybuf_len, the client is freed immediately with freeClient(c).

processInputBuffer(c);

1. If the input starts with *, reqtype is REDIS_REQ_MULTIBULK and processMultibulkBuffer(c) is called.

2. Otherwise, reqtype is REDIS_REQ_INLINE and processInlineBuffer(c) is called.

3. processCommand(c) is called.

int processInlineBuffer(redisClient *c)

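strstr(c->querybuf,"\r\n") finds the end of the line.

argv = sdssplitlen(c->querybuf,querylen," ",1,&argc) splits the line on spaces into individual arguments.

c->querybuf is updated to drop the consumed line.

c->argv is set from argv.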
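
As a rough illustration of the inline path, the sketch below finds the line end with strstr and splits the line on spaces in place. It uses plain C strings instead of sds, so `sketch_parse_inline` and its parameters are hypothetical names for this example only, and skipping empty tokens is a choice made here.

```c
#include <stddef.h>
#include <string.h>

/* Split the first "\r\n"-terminated line of buf on single spaces, in place.
 * Returns the number of arguments, or -1 when no complete line is buffered yet.
 * *consumed receives the number of bytes to drop from the front of the buffer. */
static int sketch_parse_inline(char *buf, char *argv[], int max_args, size_t *consumed) {
    char *newline = strstr(buf, "\r\n");
    if (newline == NULL) return -1;
    *newline = '\0';
    *consumed = (size_t)(newline - buf) + 2;

    int argc = 0;
    char *p = buf;
    while (argc < max_args) {
        char *space = strchr(p, ' ');
        if (space) *space = '\0';
        if (*p != '\0') argv[argc++] = p;   /* skip empty tokens */
        if (space == NULL) break;
        p = space + 1;
    }
    return argc;
}
```

The caller would then remove `consumed` bytes from the query buffer, which corresponds to the "update c->querybuf" step.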

int processMultibulkBuffer(redisClient *c)

Parses the request, which has the following format:

*multibulklen\r\n

$bulklen\r\n

..........\r\n

$bulklen\r\n

..........\r\n

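multibulklen specifies how many bulks follow; each bulklen specifies the length of one bulk.

multibulklen must be in the range [0, 1024*1024].

bulklen must be in the range [0, 512*1024*1024].

c->querybuf is updated.

c->argv is set from the parsed arguments.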
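
The multibulk format and its two range limits can be illustrated with a small standalone parser. This is only a sketch under simplifying assumptions: it works on a complete in-memory buffer, it reports incomplete and malformed input the same way, and all `sketch_` names are made up for the example.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *ptr; long long len; } sketch_bulk;

/* Parse "<prefix><digits>\r\n" at *p, enforce 0 <= value <= max, advance *p. */
static int sketch_read_len(const char **p, const char *end, char prefix,
                           long long max, long long *out) {
    const char *q = *p;
    if (q >= end || *q != prefix) return -1;
    q++;
    long long v = 0;
    int digits = 0;
    while (q < end && *q >= '0' && *q <= '9') {
        v = v * 10 + (*q - '0');
        if (v > max) return -1;          /* out of the allowed range */
        q++;
        digits++;
    }
    if (digits == 0 || end - q < 2 || q[0] != '\r' || q[1] != '\n') return -1;
    *out = v;
    *p = q + 2;
    return 0;
}

/* Returns the number of bulks, or -1 if the request is malformed or incomplete. */
static int sketch_parse_multibulk(const char *buf, size_t len,
                                  sketch_bulk *argv, int max_args) {
    const char *p = buf, *end = buf + len;
    long long count, bulklen;
    if (sketch_read_len(&p, end, '*', 1024 * 1024, &count) != 0) return -1;
    if (count > max_args) return -1;
    for (long long i = 0; i < count; i++) {
        if (sketch_read_len(&p, end, '$', 512LL * 1024 * 1024, &bulklen) != 0) return -1;
        if (end - p < bulklen + 2) return -1;                 /* body not complete */
        if (p[bulklen] != '\r' || p[bulklen + 1] != '\n') return -1;
        argv[i].ptr = p;                                      /* points into buf */
        argv[i].len = bulklen;
        p += bulklen + 2;
    }
    return (int)count;
}
```

For example, the request `*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n` yields two bulks, `GET` and `foo`.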

int processCommand(redisClient *c)

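Handles the command sent by the client.

Redis: bio

bio starts threads that perform close and fdatasync operations in the background.

It uses job lists (one job list per thread; each thread takes jobs from its list and runs them), i.e. the producer-consumer model.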

Redis: VM

aeCreateFileEvent(server.el, server.io_ready_pipe_read, AE_READABLE, vmThreadedIOCompletedJob, NULL)

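The main thread and the background threads communicate through a pipe:

Every time a background thread finishes a job, it writes one byte to server.io_ready_pipe_write.

The main thread watches server.io_ready_pipe_read; when data is readable, a job has completed and vmThreadedIOCompletedJob is called to handle it.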
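
The one-byte-per-job notification can be sketched with a plain POSIX pipe. The two helper names are hypothetical, and the example leaves out the threads and the event loop; it only shows the write side and the read side of the handshake.

```c
#include <unistd.h>

/* Worker side: signal one finished job by writing a single byte. */
static int sketch_notify_done(int write_fd) {
    return write(write_fd, "x", 1) == 1 ? 0 : -1;
}

/* Main-thread side: consume one notification byte.
 * Returns 1 if a byte was read (one job is ready), 0 otherwise. */
static int sketch_consume_notification(int read_fd) {
    char byte;
    return read(read_fd, &byte, 1) == 1 ? 1 : 0;
}
```

Since each completed job produces exactly one byte, the reader can treat the number of bytes read as the number of completed jobs waiting to be picked up.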

int vmSwapObjectThreaded(robj *key, robj *val, redisDb *db)

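Submits a job to the background threads to swap val out to disk.

The job type is REDIS_IOJOB_PREPARE_SWAP.

The storage of val is updated to REDIS_VM_SWAPPING.

queueIOJob appends the job to the server.io_newjobs queue, from which the background threads take jobs to run.

In queueIOJob, if server.io_active_threads < server.vm_max_threads, spawnIOThread() is called to create a background worker thread.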

void *IOThreadEntryPoint(void *arg)
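The background worker thread function.

It takes the job at the head of server.io_newjobs and appends it to the tail of server.io_processing, marking the job as in progress.

It then processes the job:

REDIS_IOJOB_LOAD: vmReadObjectFromSwap reads val from the file.

REDIS_IOJOB_PREPARE_SWAP: starts the swap process. This job computes how many pages are needed to store val; after it succeeds, a REDIS_IOJOB_DO_SWAP job completes the swap.

REDIS_IOJOB_DO_SWAP: vmWriteObjectOnSwap stores val in the file.

Once the job has been processed, it is removed from server.io_processing and appended to the tail of server.io_processed; the thread then writes one byte ("x") to the pipe (server.io_ready_pipe_write) to tell the main thread to call vmThreadedIOCompletedJob.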

void vmThreadedIOCompletedJob(aeEventLoop *el, int fd, void *privdata, int mask)

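Called when the main thread receives a readable event on server.io_ready_pipe_read.

It reads one byte.

It takes one completed job from the head of server.io_processed.

If the job was canceled, it is simply ignored.

Job type REDIS_IOJOB_LOAD:

val has been read from disk into memory.

The pages it occupied are released.

The value is replaced, from a vmpointer to the object.

All clients blocked on this key are handled; if every key a client needs is now in memory, the client is added to the server.io_ready_clients queue.

Job type REDIS_IOJOB_PREPARE_SWAP:

The number of pages needed to store val has been computed.

If swapping out is not possible or vmFindContiguousPages(&j->page,j->pages) fails, the swap fails: the swap process is aborted and storage is set back to REDIS_VM_MEMORY.

If swapping out is possible and a block large enough for val was found, vmMarkPagesUsed(j->page,j->pages) marks those pages as used, the job type is changed to REDIS_IOJOB_DO_SWAP, and the job is submitted with queueIOJob.

Job type REDIS_IOJOB_DO_SWAP:

val has been successfully stored on disk.

The value is replaced, from the object to a vmpointer, and the object is released.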

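Redis: ziplist

This article examines the implementation of Redis's ziplist.

A ziplist is a space-efficient doubly linked list whose elements are strings and integers. It is compact, but every insertion or deletion triggers a memory reallocation, so a ziplist works very well as a large read-only list and is a poor fit for frequent inserts or deletes.

Internal layout of a ziplist:

<zlbytes><zltail><zllen><entry><entry><zlend>

zlbytes is an unsigned int holding the total number of bytes the ziplist occupies;
zltail is the offset of the last element, also an unsigned int;
zllen is an unsigned short holding the number of elements; once the count reaches 0xFFFF or more, the list must be traversed to count the elements;
entry is an element itself;
zlend is the special end marker 0xFF.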

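Internal layout of an entry:

<prelen><curlen><body>

prelen is the size of the previous element;
curlen is the size of the current element;
body holds the element's content.

prelen is stored in one of two ways:

if it is less than 254, it is stored directly in one byte;
if it is greater than or equal to 254, it takes five bytes: the first byte holds 254 and the next four bytes hold prelen.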
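
The two prelen forms can be sketched as a pair of encode/decode helpers. The function names are hypothetical, and the little-endian order of the four length bytes is an assumption of this example; the text above does not state the byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BIG_PREVLEN 254

/* Number of bytes needed to store a given prelen. */
static size_t sketch_prevlen_size(uint32_t prevlen) {
    return prevlen < SKETCH_BIG_PREVLEN ? 1 : 5;
}

/* Write prelen at p and return the number of bytes written. */
static size_t sketch_prevlen_encode(unsigned char *p, uint32_t prevlen) {
    if (prevlen < SKETCH_BIG_PREVLEN) {
        p[0] = (unsigned char)prevlen;
        return 1;
    }
    p[0] = SKETCH_BIG_PREVLEN;
    for (int i = 0; i < 4; i++)               /* assumed little-endian */
        p[1 + i] = (unsigned char)(prevlen >> (8 * i));
    return 5;
}

/* Read prelen from p and return the number of bytes consumed. */
static size_t sketch_prevlen_decode(const unsigned char *p, uint32_t *prevlen) {
    if (p[0] < SKETCH_BIG_PREVLEN) {
        *prevlen = p[0];
        return 1;
    }
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)p[1 + i] << (8 * i);
    *prevlen = v;
    return 5;
}
```

The jump from one byte to five at 254 is what makes an entry's own size depend on the size of its predecessor.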

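curlen is stored in one of six ways.

When the element is a string:

a length less than or equal to 0x3F (63) takes one byte, |00pppppp|, where 00 is the type and pppppp is the value;
a length greater than 0x3F (63) and less than or equal to 0x3FFF (16383) takes two bytes, |01pppppp|qqqqqqqq|, where 01 is the type and ppppppqqqqqqqq is the value;
a length greater than 0x3FFF (16383) takes five bytes, |10______|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt|, where 10 is the type and the following four bytes are the value.

When the element is an integer:

one byte, |1100____|, means the integer is stored as int16_t, so the implied body size is 2 bytes;
one byte, |1101____|, means the integer is stored as int32_t, so the implied body size is 4 bytes;
one byte, |1110____|, means the integer is stored as int64_t, so the implied body size is 8 bytes.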
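
A sketch of the six curlen forms follows. It encodes string lengths and decodes any header byte, returning the header size. The helper names are hypothetical, and the big-endian order of the four length bytes in the five-byte form is an assumption of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the curlen header for a string of rawlen bytes; returns the header size. */
static size_t sketch_strlen_encode(unsigned char *p, uint32_t rawlen) {
    if (rawlen <= 0x3F) {
        p[0] = (unsigned char)rawlen;                  /* |00pppppp| */
        return 1;
    }
    if (rawlen <= 0x3FFF) {
        p[0] = (unsigned char)(0x40 | (rawlen >> 8));  /* |01pppppp| */
        p[1] = (unsigned char)(rawlen & 0xFF);         /* |qqqqqqqq| */
        return 2;
    }
    p[0] = 0x80;                                       /* |10______| */
    p[1] = (unsigned char)((rawlen >> 24) & 0xFF);     /* assumed big-endian */
    p[2] = (unsigned char)((rawlen >> 16) & 0xFF);
    p[3] = (unsigned char)((rawlen >> 8) & 0xFF);
    p[4] = (unsigned char)(rawlen & 0xFF);
    return 5;
}

/* Decode a curlen header. *len receives the body size in bytes, *is_int tells
 * whether the body is an integer. Returns the header size, or 0 if unknown. */
static size_t sketch_curlen_decode(const unsigned char *p, uint32_t *len, int *is_int) {
    *is_int = 0;
    switch (p[0] & 0xC0) {
    case 0x00:
        *len = p[0] & 0x3F;
        return 1;
    case 0x40:
        *len = ((uint32_t)(p[0] & 0x3F) << 8) | p[1];
        return 2;
    case 0x80:
        *len = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 8) | p[4];
        return 5;
    default:
        *is_int = 1;
        switch (p[0] & 0xF0) {
        case 0xC0: *len = 2; return 1;   /* int16_t */
        case 0xD0: *len = 4; return 1;   /* int32_t */
        case 0xE0: *len = 8; return 1;   /* int64_t */
        default:   *len = 0; return 0;   /* not one of the six forms */
        }
    }
}
```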

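Because a ziplist is a compact format that keeps everything in contiguous memory, inserting and deleting elements is expensive and requires reallocating memory.

Inserting an element:

An attempt is made to encode a string element as an integer.

Memory required: lensize(prevlen) + lensize(curlen) + curlen + nextdiff.

From the next entry's point of view, its previous entry has changed, so its prelen must be updated.

Because prelen is variable-length, lensize(prelen) may change, which changes the size of the next entry itself.

This update may cascade until the end of the list or until it converges.

nextdiff is the change in size of the next entry: <size after adjustment> - <size before adjustment>.

When nextdiff is 0, the adjustment has converged.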
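
Since only the prelen field of the next entry changes, nextdiff can be sketched as the difference between two field sizes. The helper names below are hypothetical and the 254 threshold is the one given for prelen earlier in the article.

```c
#include <stddef.h>

/* Bytes used by a prelen field that stores the given length. */
static int sketch_lensize(size_t len) {
    return len < 254 ? 1 : 5;
}

/* Size change of the next entry when the size of its previous entry
 * changes from old_prevlen to new_prevlen. */
static int sketch_nextdiff(size_t old_prevlen, size_t new_prevlen) {
    return sketch_lensize(new_prevlen) - sketch_lensize(old_prevlen);
}
```

Under this model nextdiff can only be -4, 0 or +4, and a result of 0 is the point where the cascade stops.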

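Deleting an element:

When an element is deleted, the following elements must be moved to keep the memory compact.

The entry after the deleted element may need its prelen updated, and this update cascades as well.

Both insertion and deletion need to update zltail.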

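Redis: zipmap

A zipmap stores key-value pairs in contiguous memory.

Because the storage is contiguous, every insertion or deletion may cause a memory reallocation.

To reduce reallocation pressure, each value keeps a free field that records the number of unused bytes available after it (at most 4).

Storage layout:

<zmlen><len>"foo"<len><free>"bar"<len>"hello"<len><free>"world"<ZIPMAP_END>

zmlen is the number of key-value pairs; if the number is greater than or equal to 254, the whole map must be traversed to count the pairs;
len is the length of a key or value; if len < 254 it takes one byte, otherwise it takes five bytes: the first byte is 254 and the next four bytes hold the length;
free is one byte recording how many reserved free bytes follow the value.

Every lookup of the value for a key has to traverse the zipmap.

Inserting and deleting elements also requires moving data and reallocating memory.

When the inserted key already exists and its existing space is suitable (the new value fits and not too much space would be wasted), no reallocation is needed.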
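
The linear lookup over this layout can be sketched as below. To stay short, the example assumes every key and value length is below 254 so that each len fits in one byte; `sketch_zipmap_find` is a hypothetical name and not the zipmap.c function.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_ZIPMAP_END 0xFF

/* Look up key in a zipmap whose lengths are all < 254 (one-byte len fields).
 * Returns a pointer to the value and stores its length in *vlen, or NULL. */
static const unsigned char *sketch_zipmap_find(const unsigned char *zm,
                                               const char *key, size_t klen,
                                               size_t *vlen) {
    const unsigned char *p = zm + 1;              /* skip <zmlen> */
    while (*p != SKETCH_ZIPMAP_END) {
        size_t this_klen = *p++;                  /* <len> of the key */
        const unsigned char *this_key = p;
        p += this_klen;
        size_t this_vlen = *p++;                  /* <len> of the value */
        size_t free_bytes = *p++;                 /* <free> */
        if (this_klen == klen && memcmp(this_key, key, klen) == 0) {
            *vlen = this_vlen;
            return p;
        }
        p += this_vlen + free_bytes;              /* skip value and its padding */
    }
    return NULL;
}
```

Each step has to skip the free bytes as well as the value, which is why a lookup cannot jump straight to a key and must walk the whole map in the worst case.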

Redis: intset

The intset struct:

typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset;

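encoding holds the encoding: INTSET_ENC_INT16, INTSET_ENC_INT32 or INTSET_ENC_INT64.

length holds the number of elements.

contents holds the actual array: int16_t[], int32_t[] or int64_t[].

The elements are kept as a sorted array.

Inserting an element:

If the new element cannot be stored with intset->encoding, i.e. it is outside the range that intset->encoding can represent, intset->encoding must be upgraded to the encoding that fits the new element.

Otherwise, a binary search finds the insert position, followed by resize && move && set.

Deleting an element:

First, the value is filtered by encoding: a value that needs a wider encoding than intset->encoding cannot be in the set.

Then a binary search finds the element's position, followed by move && resize.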
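
The two building blocks of insertion, choosing an element width and locating the position by binary search, can be sketched as follows. For brevity the example keeps all values in an int64_t array with a fixed capacity instead of resizing and upgrading a packed buffer; all `sketch_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Smallest element width in bytes (2, 4 or 8) that can hold v. */
static int sketch_value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return 8;
    if (v < INT16_MIN || v > INT16_MAX) return 4;
    return 2;
}

/* Binary search in a sorted array. Returns 1 and sets *pos to the index of
 * value if present; otherwise returns 0 and sets *pos to the insert position. */
static int sketch_search(const int64_t *a, size_t n, int64_t value, size_t *pos) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value) lo = mid + 1;
        else hi = mid;
    }
    *pos = lo;
    return lo < n && a[lo] == value;
}

/* Insert value keeping the array sorted: move the tail, then set.
 * Returns 1 if inserted, 0 if already present or the array is full. */
static int sketch_insert(int64_t *a, size_t *n, size_t cap, int64_t value) {
    size_t pos;
    if (sketch_search(a, *n, value, &pos)) return 0;
    if (*n >= cap) return 0;
    memmove(a + pos + 1, a + pos, (*n - pos) * sizeof(a[0]));
    a[pos] = value;
    (*n)++;
    return 1;
}
```

Deletion under the same model would run the search, move the tail one slot to the left, and shrink the count.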

Redis: dict

Redis's dict is a hash table that rehashes automatically. To keep performance even, the rehash is not done in one go but spread across individual operations.

typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;
} dictEntry;

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

/* This is our hash table structure. Every dictionary has two of this as we
* implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    int rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running */

} dict;

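A dict contains two hash tables. During a rehash, the elements of ht[0] are moved to ht[1] step by step; once all of them have been moved, ht[0]=ht[1] and the rehash ends.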

int dictRehash(dict *d, int n)

Rehashes the elements of n non-empty buckets of ht[0] into ht[1] and updates rehashidx.

If all elements have been rehashed, ht[0]=ht[1], reset(ht[1]), and rehashidx is set to -1.
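
The incremental scheme can be illustrated with a deliberately small two-table dictionary. This is a sketch, not dict.c: keys are unsigned long values hashed with a plain modulo, allocation failures are not handled, and every `sk_` name is invented for the example.

```c
#include <stdlib.h>

typedef struct sk_entry { unsigned long key; int val; struct sk_entry *next; } sk_entry;
typedef struct { sk_entry **table; unsigned long size, used; } sk_ht;
typedef struct { sk_ht ht[2]; long rehashidx; } sk_dict;   /* -1: not rehashing */

static void sk_ht_init(sk_ht *h, unsigned long size) {
    h->table = size ? calloc(size, sizeof(sk_entry *)) : NULL;
    h->size = size;
    h->used = 0;
}

static void sk_dict_init(sk_dict *d, unsigned long size) {
    sk_ht_init(&d->ht[0], size);
    sk_ht_init(&d->ht[1], 0);
    d->rehashidx = -1;
}

/* Create ht[1] and start rehashing from bucket 0. */
static void sk_start_rehash(sk_dict *d, unsigned long newsize) {
    sk_ht_init(&d->ht[1], newsize);
    d->rehashidx = 0;
}

/* Move up to n non-empty buckets. Returns 1 if work remains, 0 when done. */
static int sk_rehash(sk_dict *d, int n) {
    if (d->rehashidx == -1) return 0;
    while (n--) {
        if (d->ht[0].used == 0) {             /* everything moved: swap tables */
            free(d->ht[0].table);
            d->ht[0] = d->ht[1];
            sk_ht_init(&d->ht[1], 0);
            d->rehashidx = -1;
            return 0;
        }
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        sk_entry *e = d->ht[0].table[d->rehashidx];
        while (e) {                           /* relink the whole bucket chain */
            sk_entry *next = e->next;
            unsigned long idx = e->key % d->ht[1].size;
            e->next = d->ht[1].table[idx];
            d->ht[1].table[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].table[d->rehashidx++] = NULL;
    }
    return 1;
}

/* Look in ht[0] first, then in ht[1] only while a rehash is in progress. */
static sk_entry *sk_find(sk_dict *d, unsigned long key) {
    for (int t = 0; t <= 1; t++) {
        if (d->ht[t].size != 0) {
            sk_entry *e = d->ht[t].table[key % d->ht[t].size];
            for (; e; e = e->next)
                if (e->key == key) return e;
        }
        if (d->rehashidx == -1) break;
    }
    return NULL;
}

/* New entries go to ht[1] during a rehash, otherwise to ht[0]. */
static int sk_add(sk_dict *d, unsigned long key, int val) {
    if (sk_find(d, key)) return -1;
    sk_ht *h = d->rehashidx != -1 ? &d->ht[1] : &d->ht[0];
    sk_entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    e->key = key;
    e->val = val;
    unsigned long idx = key % h->size;
    e->next = h->table[idx];
    h->table[idx] = e;
    h->used++;
    return 0;
}
```

In this sketch the caller drives the migration by calling `sk_rehash(d, 1)` repeatedly; lookups stay correct at every intermediate point because both tables are consulted while `rehashidx` is not -1.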

The operations that trigger _dictRehashStep (which moves elements only when no iterator is active) are: dictAdd, dictReplace, dictGenericDelete, dictDelete, dictDeleteNoFree, dictFind.

dict *dictCreate(dictType *type, void *privDataPtr); // create a dict

int dictExpand(dict *d, unsigned long size); // if ht[0].table is NULL, create the hash table; otherwise create ht[1], set rehashidx to 0 and start rehashing

int dictAdd(dict *d, void *key, void *val);

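It first checks whether a rehash is needed. The trigger is: the number of elements is greater than or equal to the number of buckets and rehashing is enabled, or the number of elements is more than 5 times the number of buckets.

It looks up the key in ht[0]; during a rehash it also checks whether the key exists in ht[1]. If the key exists, the add fails.

During a rehash the element is added to ht[1]; otherwise it is added to ht[0].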
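
The trigger condition can be written as a small predicate. The function name, the `can_resize` flag and the strict "more than 5 times" comparison are this example's reading of the sentence above, not a copy of the dict.c check.

```c
#define SKETCH_FORCE_RATIO 5   /* ratio taken from the text above */

/* Returns 1 when a table with `size` buckets holding `used` elements
 * should start a rehash, 0 otherwise. */
static int sketch_needs_rehash(unsigned long used, unsigned long size, int can_resize) {
    if (used >= size && can_resize) return 1;      /* normal growth */
    return used > SKETCH_FORCE_RATIO * size;       /* forced even when disabled */
}
```

The second branch means that disabling rehashing only postpones it: once the table is overloaded past the ratio, growth is forced anyway.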

int dictReplace(dict *d, void *key, void *val);

It first calls dictAdd and returns directly on success.

A failure means the key already exists, so dictFind is called to get the dictEntry and dictEntry->val is replaced.

static int dictGenericDelete(dict *d, const void *key, int nofree);

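It first tries to delete the element with this key from ht[0].

If ht[0] has no such element and a rehash is in progress, it tries to delete the key from ht[1].

void dictRelease(dict *d); // release the dict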

dictEntry * dictFind(dict *d, const void *key);

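It looks for key in ht[0] first and returns immediately if it is found.

If the key is not found in ht[0] and a rehash is in progress, it looks for key in ht[1].

dictIterator *dictGetSafeIterator(dict *d); // create an iterator

dictEntry *dictNext(dictIterator *iter); // advance to the next element

The difference from an ordinary hash-table iterator is that, if the dict is being rehashed, iteration continues with ht[1] after ht[0] is finished.

While an iterator is iterating over the dict, moving elements from ht[0] to ht[1] is not allowed.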

dictEntry *dictGetRandomKey(dict *d);

Returns a random element from the dict.
