Redis單線程不行了,快來割VM/BIO/IO多線程的韭菜!(附源碼)

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis在早期,曾因單線程“聞名”。在Redis的FAQ裏有一個提問"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"《Redis is single threaded. How can I exploit multiple CPU\/cores?》"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"https:\/\/redis.io\/topics\/faq,說明了redis使用單線程的原因:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU通常並不是Redis的瓶頸,因爲Redis通常要麼受內存限制,要麼受網絡限制。比如說,一般在Linux系統上運行的流水線Redis,每秒可以交付一百萬個請求,如果你的應用程序主要使用O(N)或O(log(N))命令,幾乎不會使用過多的CPU 。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"......"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不過從Redis 4.0開始,Redis就開始使用更多的線程了。目前使用多線程的場景(Redis 4.0),僅限於在後臺刪除對象,以及通過Redis modules實現的阻塞命令。在未來的版本中,計劃是讓Redis越來越線程化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這不禁讓我好奇,Redis一開始是單線程的嗎?又是怎麼朝多線程演化的呢,又是爲什麼讓Redis越來越線程化呢。在閱讀了幾篇文章後,我決定自己讀一遍相關源代碼,瞭解Redis的多線程演化歷史。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Redis 
多線程源碼分析系列指南:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis VM線程(Redis 1.3.x - Redis 2.4)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis BIO線程(Redis 2.4+ 和 Redis 4.0+)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis 網絡IO線程(Redis 6.0+)"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Redis VM線程(Redis 1.3.x - Redis 2.4)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實際上Redis很早就用到多線程,我們在 Redis 1.3.x(2010年)的源代碼中,能看到 Redis VM 相關的多線程代碼,這部分代碼主要是在 Redis 中實現線程化VM的能力。Redis VM 可以將 Redis 中很少訪問的 value 存到磁盤中,也可以將佔用內存大的 value 存到磁盤。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis VM 的底層是讀寫磁盤,所以在從磁盤讀寫 value 時,阻塞VM會阻塞主線程,影響所有的客戶端,導致所有客戶端耗時增加。所以 Redis VM 又提供了線程化VM,可以將讀寫文件數據的操作放在IO線程中執行,這樣就只影響一個客戶端(需要從文件中讀出數據的客戶端),從而避免像阻塞VM那樣增加所有客戶端的耗時。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們從《Virtual Memory technical specification》https:\/\/redis.io\/topics\/internals-vm 
能看到線程化VM的優勢。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"線程化VM的設計目標,按重要性排序如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實現簡單,條件競爭少,加鎖簡單,VM系統與其餘Redis代碼基本解耦。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"良好的性能,客戶端訪問內存中的 value 時無需加鎖。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"能夠在I\/O線程中,對對象進行解碼\/編碼。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但其實,Redis VM 是一個被棄用的短壽特性。在 Redis 1.3.x 出現 Redis VM 之後,Redis 2.4 是最後支持它的版本。Redis 1.3.x 在 2010年發佈,Redis 2.6 在 2012年發佈,Redis VM 在 Redis 項目中的生命只持續了兩年。我們現在從《Virtual Memory》https:\/\/redis.io\/topics\/virtual-memory 能看到棄用 Redis VM 的原因:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"……我們發現使用VM有許多缺點和問題。在未來,我們只想提供有史以來最好的內存數據庫(但仍像往常一樣在磁盤上持久化),而至少現在,不考慮對大於RAM的數據庫的支持。我們未來的工作重點是提供腳本,羣集和更好的持久性。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我個人以爲,去掉Redis 
VM的根本原因,可能是定位問題。Redis的準確定位是磁盤備份的內存數據庫,去掉VM後的Redis更純粹,更簡單,更容易讓用戶理解和使用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面簡單介紹下 Redis VM 的多線程代碼。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis主線程和IO線程使用任務隊列和單個互斥鎖進行通信。隊列定義和互斥鎖定義如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* Global server state structure *\/\nstruct redisServer {\n...\n list *io_newjobs; \/* List of VM I\/O jobs yet to be processed *\/\n list *io_processing; \/* List of VM I\/O jobs being processed *\/\n list *io_processed; \/* List of VM I\/O jobs already processed *\/\n list *io_ready_clients; \/* Clients ready to be unblocked. 
All keys loaded *\/\n pthread_mutex_t io_mutex; \/* lock to access io_jobs\/io_done\/io_thread_job *\/\n pthread_mutex_t io_swapfile_mutex; \/* So we can lseek + write *\/\n pthread_attr_t io_threads_attr; \/* attributes for threads creation *\/\n...\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis在需要處理IO任務時(比如使用的內存超過最大內存等情況),會通過queueIOJob函數,將一個IO任務(iojob)入隊到任務隊列(io_newjobs)。在queueIOJob中,會根據VM的最大線程數,判斷是否需要創建新的IO線程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nvoid queueIOJob(iojob *j) {\n redisLog(REDIS_DEBUG,\"Queued IO Job %p type %d about key '%s'\\n\",\n (void*)j, j->type, (char*)j->key->ptr);\n listAddNodeTail(server.io_newjobs,j);\n if (server.io_active_threads < server.vm_max_threads)\n spawnIOThread();\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"創建出的IO線程,主邏輯是IOThreadEntryPoint。IO線程會先從io_newjobs隊列中取出一個iojob,推入io_processing隊列,再根據iojob中的type來執行對應的任務:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從磁盤讀數據到內存"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算需要的page數"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"將內存swap到磁盤"}]}]}]},{"type":"para
graph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"執行完成後,將iojob推入io_processed隊列。最後,IO線程通過UNIX管道,向主線程發送一個字節,告訴主線程,有一個新的任務處理完成,需要主線程處理結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\ntypedef struct iojob {\n int type; \/* Request type, REDIS_IOJOB_* *\/\n redisDb *db;\/* Redis database *\/\n robj *key; \/* This I\/O request is about swapping this key *\/\n robj *id; \/* Unique identifier of this job:\n this is the object to swap for REDIS_IOREQ_*_SWAP, or the\n vmpointer objct for REDIS_IOREQ_LOAD. *\/\n robj *val; \/* the value to swap for REDIS_IOREQ_*_SWAP, otherwise this\n * field is populated by the I\/O thread for REDIS_IOREQ_LOAD. *\/\n off_t page; \/* Swap page where to read\/write the object *\/\n off_t pages; \/* Swap pages needed to save object. PREPARE_SWAP return val *\/\n int canceled; \/* True if this command was canceled by blocking side of VM *\/\n pthread_t thread; \/* ID of the thread processing this entry *\/\n} iojob;\n#define REDIS_IOJOB_LOAD 0 \/* Load from disk to memory *\/\n#define REDIS_IOJOB_PREPARE_SWAP 1 \/* Compute needed pages *\/\n#define REDIS_IOJOB_DO_SWAP 2 \/* Swap from memory to disk *\/\nvoid *IOThreadEntryPoint(void *arg) {\n iojob *j;\n listNode *ln;\n REDIS_NOTUSED(arg);\n pthread_detach(pthread_self());\n while(1) {\n \/* Get a new job to process *\/\n lockThreadedIO();\n if (listLength(server.io_newjobs) == 0) {\n \/* No new jobs in queue, exit. 
*\/\n ...\n unlockThreadedIO();\n return NULL;\n }\n ln = listFirst(server.io_newjobs);\n j = ln->value;\n listDelNode(server.io_newjobs,ln);\n \/* Add the job in the processing queue *\/\n j->thread = pthread_self();\n listAddNodeTail(server.io_processing,j);\n ln = listLast(server.io_processing); \/* We use ln later to remove it *\/\n unlockThreadedIO();\n ...\n \/* Process the Job *\/\n if (j->type == REDIS_IOJOB_LOAD) {\n vmpointer *vp = (vmpointer*)j->id;\n j->val = vmReadObjectFromSwap(j->page,vp->vtype);\n } else if (j->type == REDIS_IOJOB_PREPARE_SWAP) {\n j->pages = rdbSavedObjectPages(j->val);\n } else if (j->type == REDIS_IOJOB_DO_SWAP) {\n if (vmWriteObjectOnSwap(j->val,j->page) == REDIS_ERR)\n j->canceled = 1;\n }\n \/* Done: insert the job into the processed queue *\/\n ...\n lockThreadedIO();\n listDelNode(server.io_processing,ln);\n listAddNodeTail(server.io_processed,j);\n unlockThreadedIO();\n \/* Signal the main thread there is new stuff to process *\/\n redisAssert(write(server.io_ready_pipe_write,\"x\",1) == 1);\n }\n return NULL; \/* never reached *\/\n}"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲 Redis VM 特性已經從Redis中刪除,相關代碼也比較古早,就不展開闡述了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了學習到多線程下,Redis 對數據讀寫的優化,我們在學習源碼和Redis的官方博客時,能夠明顯感受到:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“去掉 Redis VM 
的根本原因,可能是定位問題。Redis的準確定位是磁盤備份的內存數據庫,去掉VM後的Redis更純粹,更簡單,更容易讓用戶理解和使用。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有時候,砍掉性能不好、意義不明的特性代碼,就是最好的性能優化吧。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Redis BIO線程(Redis 2.4+ 和 Redis 4.0+)"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Redis BIO線程(Redis 2.4+)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從系列上一篇我們知道,從一開始,除了“短壽”的VM特性和VM線程,Redis主要還是單線程的。不過,我們在Redis的官方文章裏能看到,從 Redis 2.4 (2011年)開始,Redis會使用線程在後臺執行一些主要跟磁盤I\/O有關的慢速的I\/O操作。我們把代碼切到 Redis 2.4 分支上,能發現有兩個 BIO 線程,協助 Redis 進行AOF文件同步刷盤和文件刪除的工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"怎麼找到多線程相關的代碼?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據Redis的配置appendfsync,我們在代碼裏面找到配置對應的定義。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ config.c\n...\n else if (!strcasecmp(c->argv[2]->ptr,\"appendfsync\")) {\n if (!strcasecmp(o->ptr,\"no\")) {\n server.appendfsync = APPENDFSYNC_NO;\n } else if (!strcasecmp(o->ptr,\"everysec\")) {\n server.appendfsync = APPENDFSYNC_EVERYSEC;\n } else if (!strcasecmp(o->ptr,\"always\")) {\n server.appendfsync = APPENDFSYNC_ALWAYS;\n } else {\n goto badfmt;\n }\n 
}\n..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過搜索 APPENDFSYNC_EVERYSEC,我們找到了 backgroundRewriteDoneHandler:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ aof.c\nvoid backgroundRewriteDoneHandler(int statloc) {\n......\n else if (server.appendfsync == APPENDFSYNC_EVERYSEC)\n aof_background_fsync(newfd);\n......\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 aof_background_fsync 函數中,發現了後臺任務相關函數:"}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ aof.c\nvoid aof_background_fsync(int fd) {\n bioCreateBackgroundJob(REDIS_BIO_AOF_FSYNC,(void*)(long)fd,NULL,NULL);\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"搜索關鍵詞 REDIS_BIO_AOF_FSYNC,最後我們找到了BIO模塊的頭文件(bio.h),包含了BIO相關的接口和常量定義:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ bio.h\n\/* Exported API *\/\nvoid bioInit(void);\nvoid bioCreateBackgroundJob(int type, void *arg1, void *arg2, void *arg3);\nunsigned long long bioPendingJobsOfType(int type);\nvoid bioWaitPendingJobsLE(int type, unsigned long long num);\ntime_t bioOlderJobOfType(int type);\n\/* Background job opcodes *\/\n#define REDIS_BIO_CLOSE_FILE 0 \/* Deferred close(2) syscall. *\/\n#define REDIS_BIO_AOF_FSYNC 1 \/* Deferred AOF fsync. 
*\/\n#define REDIS_BIO_NUM_OPS 2"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,我們找到了 bioInit,發現 Redis 創建了2個 BIO 線程來執行 bioProcessBackgroundJobs 函數,而 bioInit 又是在 redis.c 的 main 函數中,通過 initServer 函數來調用:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ bio.c\n\/* Initialize the background system, spawning the thread. *\/\nvoid bioInit(void) {\n pthread_attr_t attr;\n pthread_t thread;\n size_t stacksize;\n int j;\n \/* Initialization of state vars and objects *\/\n for (j = 0; j < REDIS_BIO_NUM_OPS; j++) {\n pthread_mutex_init(&bio_mutex[j],NULL);\n pthread_cond_init(&bio_condvar[j],NULL);\n bio_jobs[j] = listCreate();\n bio_pending[j] = 0;\n }\n \/* Set the stack size as by default it may be small in some system *\/\n pthread_attr_init(&attr);\n pthread_attr_getstacksize(&attr,&stacksize);\n if (!stacksize) stacksize = 1; \/* The world is full of Solaris Fixes *\/\n while (stacksize < REDIS_THREAD_STACK_SIZE) stacksize *= 2;\n pthread_attr_setstacksize(&attr, stacksize);\n \/* Ready to spawn our threads. We use the single argument the thread\n * function accepts in order to pass the job ID the thread is\n * responsible of. 
*\/\n for (j = 0; j < REDIS_BIO_NUM_OPS; j++) {\n void *arg = (void*)(unsigned long) j;\n if (pthread_create(&thread,&attr,bioProcessBackgroundJobs,arg) != 0) {\n redisLog(REDIS_WARNING,\"Fatal: Can't initialize Background Jobs.\");\n exit(1);\n }\n }\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"BIO多線程的意義"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 backgroundRewriteDoneHandler 函數中,我們會給 BIO 線程增加後臺任務,然後讓 BIO 線程在後臺處理一些工作,爲了搞清楚 Redis 使用 BIO 多線程的意義,我們可以先弄清楚這個函數是做什麼的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"看註釋的描述,這個函數是在後臺AOF重寫(BGREWRITEAOF)結束時調用,然後我們繼續往下看代碼,主要是一些寫文件的操作,直到我們看到 aof.c 
中有一段很詳細的註釋:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"剩下要做的唯一事情就是將臨時文件重命名爲配置的文件,並切換用於執行AOF寫入的文件描述符。我們不希望close(2)或rename(2)調用在刪除舊文件時阻塞服務器。有兩種可能的方案:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AOF已禁用,這是一次重寫。臨時文件將重命名爲配置的文件。當該文件已經存在時,它將被取消鏈接(unlink),這可能會阻塞server。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AOF已啓用,重寫的AOF將立即開始接收寫操作。將臨時文件重命名爲配置文件後,原始AOF文件描述符將關閉。由於這將是對該文件的最後一個引用,因此關閉該文件將導致底層文件被取消鏈接(unlink),這可能會阻塞server。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了減輕取消鏈接(unlink)操作的阻塞效果(由方案1中的rename(2)或方案2中的close(2)引起),我們使用後臺線程來解決此問題。首先,通過打開目標文件,使方案1與方案2相同。rename(2)之後的取消鏈接(unlink)操作將在爲其描述符調用close(2)時執行。到那時,保證這條分支原子性的一切都已發生,因此,只要文件描述符再次被釋放,我們就不在乎該關閉操作的影響或持續時間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們發現了Redis使用BIO線程(REDIS_BIO_CLOSE_FILE)的目的——後臺線程刪除文件,避免因爲刪除大文件耗時過長導致主線程阻塞:在AOF重寫時,rename(2)或者close(2)文件,可能會導致系統調用執行刪除文件的操作,而刪除文件的操作是在當前進程執行(內核態),所以如果文件較大,當前進程刪除文件的耗時就會比較長。而如果在主線程刪除比較大的文件,就會導致主線程被磁盤IO阻塞。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/aof.c\n\/* A background append only file rewriting (BGREWRITEAOF) terminated its work.\n * Handle this. 
*\/\nvoid backgroundRewriteDoneHandler(int statloc) {\n int exitcode = WEXITSTATUS(statloc);\n int bysignal = WIFSIGNALED(statloc);\n if (!bysignal && exitcode == 0) {\n int newfd, oldfd;\n int nwritten;\n char tmpfile[256];\n long long now = ustime();\n ...\n \/* Flush the differences accumulated by the parent to the\n * rewritten AOF. *\/\n snprintf(tmpfile,256,\"temp-rewriteaof-bg-%d.aof\",\n (int)server.bgrewritechildpid);\n newfd = open(tmpfile,O_WRONLY|O_APPEND);\n if (newfd == -1) {\n redisLog(REDIS_WARNING,\n \"Unable to open the temporary AOF produced by the child: %s\", strerror(errno));\n goto cleanup;\n }\n nwritten = write(newfd,server.bgrewritebuf,sdslen(server.bgrewritebuf));\n if (nwritten != (signed)sdslen(server.bgrewritebuf)) {\n if (nwritten == -1) {\n redisLog(REDIS_WARNING,\n \"Error trying to flush the parent diff to the rewritten AOF: %s\", strerror(errno));\n } else {\n redisLog(REDIS_WARNING,\n \"Short write trying to flush the parent diff to the rewritten AOF: %s\", strerror(errno));\n }\n close(newfd);\n goto cleanup;\n }\n redisLog(REDIS_NOTICE,\n \"Parent diff successfully flushed to the rewritten AOF (%lu bytes)\", nwritten);\n \/* The only remaining thing to do is to rename the temporary file to\n * the configured file and switch the file descriptor used to do AOF\n * writes. We don't want close(2) or rename(2) calls to block the\n * server on old file deletion.\n *\n * There are two possible scenarios:\n *\n * 1) AOF is DISABLED and this was a one time rewrite. The temporary\n * file will be renamed to the configured file. When this file already\n * exists, it will be unlinked, which may block the server.\n *\n * 2) AOF is ENABLED and the rewritten AOF will immediately start\n * receiving writes. 
After the temporary file is renamed to the\n * configured file, the original AOF file descriptor will be closed.\n * Since this will be the last reference to that file, closing it\n * causes the underlying file to be unlinked, which may block the\n * server.\n *\n * To mitigate the blocking effect of the unlink operation (either\n * caused by rename(2) in scenario 1, or by close(2) in scenario 2), we\n * use a background thread to take care of this. First, we\n * make scenario 1 identical to scenario 2 by opening the target file\n * when it exists. The unlink operation after the rename(2) will then\n * be executed upon calling close(2) for its descriptor. Everything to\n * guarantee atomicity for this switch has already happened by then, so\n * we don't care what the outcome or duration of that close operation\n * is, as long as the file descriptor is released again. *\/\n if (server.appendfd == -1) {\n \/* AOF disabled *\/\n \/* Don't care if this fails: oldfd will be -1 and we handle that.\n * One notable case of -1 return is if the old file does\n * not exist. *\/\n oldfd = open(server.appendfilename,O_RDONLY|O_NONBLOCK);\n } else {\n \/* AOF enabled *\/\n oldfd = -1; \/* We'll set this to the current AOF filedes later. *\/\n }\n \/* Rename the temporary file. This will not unlink the target file if\n * it exists, because we reference it with \"oldfd\". *\/\n if (rename(tmpfile,server.appendfilename) == -1) {\n redisLog(REDIS_WARNING,\n \"Error trying to rename the temporary AOF: %s\", strerror(errno));\n close(newfd);\n if (oldfd != -1) close(oldfd);\n goto cleanup;\n }\n if (server.appendfd == -1) {\n \/* AOF disabled, we don't need to set the AOF file descriptor\n * to this new file, so we can close it. *\/\n close(newfd);\n } else {\n \/* AOF enabled, replace the old fd with the new one. 
*\/\n oldfd = server.appendfd;\n server.appendfd = newfd;\n if (server.appendfsync == APPENDFSYNC_ALWAYS)\n aof_fsync(newfd);\n else if (server.appendfsync == APPENDFSYNC_EVERYSEC)\n aof_background_fsync(newfd);\n server.appendseldb = -1; \/* Make sure SELECT is re-issued *\/\n aofUpdateCurrentSize();\n server.auto_aofrewrite_base_size = server.appendonly_current_size;\n \/* Clear regular AOF buffer since its contents was just written to\n * the new AOF from the background rewrite buffer. *\/\n sdsfree(server.aofbuf);\n server.aofbuf = sdsempty();\n }\n redisLog(REDIS_NOTICE, \"Background AOF rewrite successful\");\n \/* Asynchronously close the overwritten AOF. *\/\n if (oldfd != -1) bioCreateBackgroundJob(REDIS_BIO_CLOSE_FILE,(void*)(long)oldfd,NULL,NULL);\n redisLog(REDIS_VERBOSE,\n \"Background AOF rewrite signal handler took %lldus\", ustime()-now);\n } else if (!bysignal && exitcode != 0) {\n redisLog(REDIS_WARNING,\n \"Background AOF rewrite terminated with error\");\n } else {\n redisLog(REDIS_WARNING,\n \"Background AOF rewrite terminated by signal %d\",\n WTERMSIG(statloc));\n }\ncleanup:\n sdsfree(server.bgrewritebuf);\n server.bgrewritebuf = sdsempty();\n aofRemoveTempFile(server.bgrewritechildpid);\n server.bgrewritechildpid = -1;\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們回到 backgroundRewriteDoneHandler 函數中調用的 aof_background_fsync 函數,在這個函數裏,我們發現了另一個BIO線程(REDIS_BIO_AOF_FSYNC)的任務創建代碼:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nvoid aof_background_fsync(int fd) {\n 
bioCreateBackgroundJob(REDIS_BIO_AOF_FSYNC,(void*)(long)fd,NULL,NULL);\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"閱讀 bioCreateBackgroundJob 函數的代碼,我們發現 Redis 在寫對應Job類型的任務隊列前,會先加上互斥鎖(mutex);入隊後,先通過條件變量發出信號(signal),喚醒等待該條件變量的 BIO線程,再釋放互斥鎖,讓 BIO線程繼續執行任務隊列中的任務,這樣就保證了隊列在多線程下的數據一致性(這裏還增加了對應 BIO類型的待處理任務計數 bio_pending,暫時我們用不上),而 Redis BIO 線程就是不斷從 BIO 的任務隊列中取任務的:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ bio.c\nvoid bioCreateBackgroundJob(int type, void *arg1, void *arg2, void *arg3) {\n struct bio_job *job = zmalloc(sizeof(*job));\n job->time = time(NULL);\n job->arg1 = arg1;\n job->arg2 = arg2;\n job->arg3 = arg3;\n pthread_mutex_lock(&bio_mutex[type]);\n listAddNodeTail(bio_jobs[type],job);\n bio_pending[type]++;\n pthread_cond_signal(&bio_condvar[type]);\n pthread_mutex_unlock(&bio_mutex[type]);\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接着我們回到 BIO 線程的主函數 bioProcessBackgroundJobs,我們驗證了 BIO 線程執行邏輯,BIO線程通過等待互斥鎖和條件變量來判斷是否繼續讀取隊列。如前面的註釋所說,在執行 REDIS_BIO_CLOSE_FILE 類型的任務時,調用的是 close(fd) 函數。繼續閱讀代碼,發現在執行 REDIS_BIO_AOF_FSYNC 類型的任務時,調用的是函數 aof_fsync:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ bio.c\nvoid *bioProcessBackgroundJobs(void *arg) {\n struct bio_job *job;\n unsigned long type = (unsigned long) arg;\n pthread_detach(pthread_self());\n pthread_mutex_lock(&bio_mutex[type]);\n while(1) {\n listNode *ln;\n \/* The loop always starts with the lock hold. 
*\/\n if (listLength(bio_jobs[type]) == 0) {\n pthread_cond_wait(&bio_condvar[type],&bio_mutex[type]);\n continue;\n }\n \/* Pop the job from the queue. *\/\n ln = listFirst(bio_jobs[type]);\n job = ln->value;\n \/* It is now possible to unlock the background system as we know have\n * a stand alone job structure to process.*\/\n pthread_mutex_unlock(&bio_mutex[type]);\n \/* Process the job accordingly to its type. *\/\n if (type == REDIS_BIO_CLOSE_FILE) {\n close((long)job->arg1);\n } else if (type == REDIS_BIO_AOF_FSYNC) {\n aof_fsync((long)job->arg1);\n } else {\n redisPanic(\"Wrong job type in bioProcessBackgroundJobs().\");\n }\n zfree(job);\n \/* Lock again before reiterating the loop, if there are no longer\n * jobs to process we'll block again in pthread_cond_wait(). *\/\n pthread_mutex_lock(&bio_mutex[type]);\n listDelNode(bio_jobs[type],ln);\n bio_pending[type]--;\n }\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們繼續看 aof_fsync 的函數定義,發現 aof_fsync 其實就是 fdatasync 和 fsync:"}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* Define aof_fsync to fdatasync() in Linux and fsync() for all the rest *\/\n#ifdef __linux__\n#define aof_fsync fdatasync\n#else\n#define aof_fsync fsync\n#endif"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"熟悉 Redis 的朋友知道,這是 Redis 2.4 中 BIO線程關於 Redis AOF 持久性的設計:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用AOF,Redis更加持久;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"你有不同的fsync策略:完全不fsync,每秒fsync,每個查詢fsync。使用每秒fsync的默認策略,寫入性能仍然很好(fsync是使用後臺線程執行的,當沒有fsync在執行時,主線程會盡力執行寫入操作),但是你會損失一秒鐘的寫入數據。——《Redis 
Persistence》https:\/\/redis.io\/topics\/persistence"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AOF advantages"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而爲什麼fsync需要使用 BIO線程在後臺執行,其實就很簡單了。因爲 Redis 需要保證數據的持久化,數據寫入文件時,其實只是寫到緩衝區,只有數據刷入磁盤,才能保證數據不會丟失,而 fsync將緩衝區刷入磁盤是一個同步IO操作。所以,在主線程執行緩衝區刷盤的操作,雖然能更好的保證數據的持久化,但是卻會阻塞主線程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,爲了減少阻塞,Redis 使用 BIO線程處理 fsync。但其實這並不意味着 Redis 不再受 fsync 的影響,實際上如果 fsync 過於緩慢(數據2S以上未刷盤),Redis主線程會不計代價的阻塞執行文件寫入(Redis persistence demystified http:\/\/oldblog.antirez.com\/m\/p.php?i=251  #appendfsync everysec)。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Redis BIO線程(Redis 4.0+)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從 Redis 4.0 (2017年)開始,又增加了一個新的BIO線程,我們在 bio.h 中發現了新的定義——BIO_LAZY_FREE,這個線程主要用來協助 Redis 異步釋放內存。在antirez的《Lazy Redis is better Redis》http:\/\/antirez.com\/news\/93中,我們能瞭解到爲什麼要將釋放內存放在異步線程中:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(漸進式回收內存)這是一個很好的技巧,效果很好。但是,我們還是必須在一個線程中執行此操作,這仍然讓我感到很難過。當有很多邏輯需要處理,並且lazy 
free也非常頻繁時,ops(每秒的操作數)會減少到正常值的65%左右。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在另一個線程中釋放對象會更簡單:如果有一個線程只忙於執行釋放操作,那麼釋放總是會比往數據集中添加新值更快。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當然,主線程和lazy free線程之間在調用內存分配器上也存在一些競爭,但是Redis只會花一小部分時間在內存分配上,而將更多的時間花在I\/O、命令分派、緩存未命中等等上面。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對這個特性背景感興趣的朋友還可以看看這個issue: Lazy free of keys and databases #1748  github.com\/redis\/re...ues\/1748"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ bio.h\n\/* Background job opcodes *\/\n#define BIO_CLOSE_FILE 0 \/* Deferred close(2) syscall. *\/\n#define BIO_AOF_FSYNC 1 \/* Deferred AOF fsync. *\/\n#define BIO_LAZY_FREE 2 \/* Deferred objects freeing. 
*\/\n#define BIO_NUM_OPS 3\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們回頭看,發現在原來的基礎上,增加了 BIO_LAZY_FREE 的部分。lazy free 的任務有三種:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"釋放對象"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"釋放 Redis Database"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"釋放 跳錶(skip list)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ bio.c\nvoid *bioProcessBackgroundJobs(void *arg) {\n struct bio_job *job;\n unsigned long type = (unsigned long) arg;\n sigset_t sigset;\n \/* Check that the type is within the right interval. *\/\n if (type >= BIO_NUM_OPS) {\n serverLog(LL_WARNING,\n \"Warning: bio thread started with wrong type %lu\",type);\n return NULL;\n }\n \/* Make the thread killable at any time, so that bioKillThreads()\n * can work reliably. *\/\n pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);\n pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);\n pthread_mutex_lock(&bio_mutex[type]);\n \/* Block SIGALRM so we are sure that only the main thread will\n * receive the watchdog signal. 
*\/\n sigemptyset(&sigset);\n sigaddset(&sigset, SIGALRM);\n if (pthread_sigmask(SIG_BLOCK, &sigset, NULL))\n serverLog(LL_WARNING,\n \"Warning: can't mask SIGALRM in bio.c thread: %s\", strerror(errno));\n while(1) {\n listNode *ln;\n \/* The loop always starts with the lock hold. *\/\n if (listLength(bio_jobs[type]) == 0) {\n pthread_cond_wait(&bio_newjob_cond[type],&bio_mutex[type]);\n continue;\n }\n \/* Pop the job from the queue. *\/\n ln = listFirst(bio_jobs[type]);\n job = ln->value;\n \/* It is now possible to unlock the background system as we know have\n * a stand alone job structure to process.*\/\n pthread_mutex_unlock(&bio_mutex[type]);\n \/* Process the job accordingly to its type. *\/\n if (type == BIO_CLOSE_FILE) {\n close((long)job->arg1);\n } else if (type == BIO_AOF_FSYNC) {\n aof_fsync((long)job->arg1);\n } else if (type == BIO_LAZY_FREE) {\n \/* What we free changes depending on what arguments are set:\n * arg1 -> free the object at pointer.\n * arg2 & arg3 -> free two dictionaries (a Redis DB).\n * only arg3 -> free the skiplist. *\/\n if (job->arg1)\n lazyfreeFreeObjectFromBioThread(job->arg1);\n else if (job->arg2 && job->arg3)\n lazyfreeFreeDatabaseFromBioThread(job->arg2,job->arg3);\n else if (job->arg3)\n lazyfreeFreeSlotsMapFromBioThread(job->arg3);\n } else {\n serverPanic(\"Wrong job type in bioProcessBackgroundJobs().\");\n }\n zfree(job);\n \/* Unblock threads blocked on bioWaitStepOfType() if any. *\/\n pthread_cond_broadcast(&bio_step_cond[type]);\n \/* Lock again before reiterating the loop, if there are no longer\n * jobs to process we'll block again in pthread_cond_wait(). 
*\/\n pthread_mutex_lock(&bio_mutex[type]);\n listDelNode(bio_jobs[type],ln);\n bio_pending[type]--;\n }\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中釋放對象的主要邏輯在 decrRefCount 中:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ lazyfree.c\n\/* Release objects from the lazyfree thread. It's just decrRefCount()\n * updating the count of objects to release. *\/\nvoid lazyfreeFreeObjectFromBioThread(robj *o) {\n decrRefCount(o);\n atomicDecr(lazyfree_objects,1);\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"按照不同的數據類型,執行不同的內存釋放邏輯:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ object.c\nvoid decrRefCount(robj *o) {\n if (o->refcount == 1) {\n switch(o->type) {\n case OBJ_STRING: freeStringObject(o); break;\n case OBJ_LIST: freeListObject(o); break;\n case OBJ_SET: freeSetObject(o); break;\n case OBJ_ZSET: freeZsetObject(o); break;\n case OBJ_HASH: freeHashObject(o); break;\n case OBJ_MODULE: freeModuleObject(o); break;\n default: serverPanic(\"Unknown object type\"); break;\n }\n zfree(o);\n } else {\n if (o->refcount <= 0) serverPanic(\"decrRefCount against refcount <= 0\");\n if (o->refcount != OBJ_SHARED_REFCOUNT) o->refcount--;\n }\n}"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"擴展"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其他的相關內容就不一一說明了,這裏有一個擴展內容,算是 Redis 
開發背後的故事。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我參考學習了文章《Lazy Redis is better Redis》http:\/\/antirez.com\/news\/93,發現其實 antirez 在設計 lazy free 時還是比較糾結的。因爲 lazy free 的特性涉及到了 Redis 本身的內部特性 —— 共享對象 (sharing objects),lazy free 特性的推進受到了共享對象的影響。這裏只說說結論,最後爲了實現 lazy free 的特性,antirez 去掉了共享對象的特性。直到現在 (Redis 6.0),共享對象僅在少部分地方出現,我們追蹤代碼的話,可以發現 robj 結構體的 refcount 目前大部分情況下等於 1。當然還有少部分情況,比如 server.c 中初始化創建整型數字的共享字符串,又或者手動增加計數來降低內存對象的回收速度等等。這就是爲什麼 Redis 明明去掉了共享對象的設計,但是我們還能看到 refcount 相關的代碼,這大概就是歷史遺留原因吧(手動狗頭)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ server.c\n#define OBJ_SHARED_REFCOUNT INT_MAX\ntypedef struct redisObject {\n unsigned type:4;\n unsigned encoding:4;\n unsigned lru:LRU_BITS; \/* LRU time (relative to global lru_clock) or\n * LFU data (least significant 8 bits frequency\n * and most significant 16 bits access time). 
*\/\n int refcount;\n void *ptr;\n} robj;\n\/\/ server.c\nvoid createSharedObjects(void) {\n......\n for (j = 0; j < OBJ_SHARED_INTEGERS; j++) {\n shared.integers[j] =\n makeObjectShared(createObject(OBJ_STRING,(void*)(long)j));\n shared.integers[j]->encoding = OBJ_ENCODING_INT;\n }\n......\n}"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Redis 網絡IO線程(Redis 6.0+)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從2020年正式發佈的 Redis 6.0 開始,Redis增加了負責客戶端讀寫的IO線程,減輕主線程處理客戶端網絡IO的負擔。而實際上,這個設想在2015年開發 lazy free 特性的時候就已經出現了。《Lazy Redis is better Redis》http:\/\/antirez.com\/news\/93 #Not just lazy freeing :"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"既然聚合數據類型的值是完全不共享的,並且客戶端輸出緩衝區也不包含共享對象,有很多地方可以利用這一點。例如,最終有可能在 Redis 中實現線程化I\/O,以便由不同的線程爲不同的客戶端提供服務。這意味着我們僅在訪問數據庫時才具有全局鎖定,但是客戶端讀取\/寫入系統調用,甚至解析客戶端發送的指令數據,都可以在不同的線程中進行。這是一種類似 memcached 的設計,我期待去實現和測試。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而且,有可能實現對某一線程中的聚合數據類型執行某些慢速操作,只會導致“幾個”鍵被“阻塞”,而所有其他客戶端都可以繼續工作。這可以通過與我們當前使用阻塞操作(請參閱blocking.c)非常相似的方式來實現,此外還可以使用哈希表來存儲當前正在使用哪些鍵以及它使用的客戶端。因此,如果客戶要求使用SMEMBERS之類的東西,就能夠僅鎖定鍵,處理創建輸出緩衝區的請求,然後再次釋放鍵。如果某個鍵被阻塞了,則嘗試訪問同一鍵的客戶端都將被阻塞。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所有這些都需要進行更大幅度的內部修改,但是最重要的是,我們的禁忌要少一些。我們可以用更少的緩存丟失和更少內存佔用的聚合數據類型,來彌補對象複製的時間,我們現在可以暢想無共享設計的線程化 Redis 
,這是唯一可以輕鬆戰勝我們單線程架構的設計。過去,如果爲了實現併發訪問,在數據結構和對象中增加一系列互斥鎖,始終會被視爲一個壞主意。但現在幸運的是,有方法可以兩全其美。我們現在可以仍然像過去那樣,從主線程繼續執行所有快速的操作。而要在性能方面有所收穫,需要增加一些複雜性作爲代價。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上述是 antirez 在《Lazy Redis is better Redis》的 Not just lazy freeing 部分所分享的內容,理解這個,我們就能知道爲何 Redis 要實現 IO 線程化了:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"IO單線程時,某些鍵的阻塞操作會阻塞整個線程,而使用多線程,可以實現只有訪問相同鍵的客戶端被阻塞。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"去掉了共享對象,讓IO線程化更加簡單,不再需要向數據結構和對象中增加一系列的互斥鎖來實現多線程,從而保留了Redis單線程的“傳統藝能”。(PS:去掉共享對象,會增加內存的複製,但是也可以帶來內存上更緊湊的數據類型,也因爲內存上更加連續帶來更少的緩存丟失。)"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來,我們從 redis server.c 中的main()函數開始,看看IO線程是怎麼運行的。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"IO線程的創建"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過 pthread_create 搜索到 initThreadedIO() 
函數,然後整理下IO線程的創建過程:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"無論是否哨兵模式,Redis都會執行InitServerLast:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nint main(int argc, char **argv) {\n struct timeval tv;\n int j;\n server.supervised = redisIsSupervised(server.supervised_mode);\n int background = server.daemonize && !server.supervised;\n if (background) daemonize();\n ......some log......\n readOOMScoreAdj();\n initServer();\n if (background || server.pidfile) createPidFile();\n redisSetProcTitle(argv[0]);\n redisAsciiArt();\n checkTcpBacklogSettings();\n if (!server.sentinel_mode) {\n moduleLoadFromQueue();\n ACLLoadUsersAtStartup();\n InitServerLast();\n loadDataFromDisk();\n ......\n } else {\n InitServerLast();\n sentinelIsRunning();\n ......\n }\n ......\n redisSetCpuAffinity(server.server_cpulist);\n setOOMScoreAdj(-1);\n aeMain(server.el);\n aeDeleteEventLoop(server.el);\n return 0;\n}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"initServer()中,Redis會初始化相關的任務隊列,而在InitServerLast中,纔會初始化網絡IO相關的線程資源,因爲Redis的網絡IO多線程是可以配置的。Redis實現了網絡IO多線程,但是網絡IO的邏輯,既可以在ThreadedIO線程執行,也可以在主線程執行,給用戶提供了選擇:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nvoid initServer(void) {\n ......\n \/* Initialization after setting defaults from the config system. *\/\n server.aof_state = server.aof_enabled ? 
AOF_ON : AOF_OFF;\n server.hz = server.config_hz;\n server.pid = getpid();\n server.in_fork_child = CHILD_TYPE_NONE;\n server.main_thread_id = pthread_self();\n server.current_client = NULL; \/\/ 當前正在執行命令的客戶端\n server.errors = raxNew();\n server.fixed_time_expire = 0;\n server.clients = listCreate(); \/\/ 活躍的客戶端列表\n server.clients_index = raxNew(); \/\/ 按照 client_id 索引的活躍的客戶端字典\n server.clients_to_close = listCreate(); \/\/ 需要異步關閉的客戶端列表\n server.slaves = listCreate();\n server.monitors = listCreate();\n server.clients_pending_write = listCreate(); \/\/ 等待寫或者安裝handler的客戶端列表\n server.clients_pending_read = listCreate(); \/\/ 等待讀socket緩衝區的客戶端列表\n server.clients_timeout_table = raxNew();\n server.replication_allowed = 1;\n server.slaveseldb = -1; \/* Force to emit the first SELECT command. *\/\n server.unblocked_clients = listCreate(); \/\/ 下一個循環之前,要取消阻塞的客戶端列表\n server.ready_keys = listCreate();\n server.clients_waiting_acks = listCreate();\n server.get_ack_from_slaves = 0;\n server.client_pause_type = 0;\n server.paused_clients = listCreate();\n server.events_processed_while_blocked = 0;\n server.system_memory_size = zmalloc_get_memory_size();\n server.blocked_last_cron = 0;\n server.blocking_op_nesting = 0;\n ......\n}\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 InitServerLast()中,除了 initThreadedIO(Redis網絡IO線程),我們還能看到bioInit(background I\/O 初始化),兩個模塊使用了不同的資源:"}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* Some steps in server initialization need to be done last (after modules\n * are loaded).\n * Specifically, creation of threads due to a race bug in ld.so, in which\n * Thread Local Storage initialization collides with dlopen call.\n * see: https:\/\/sourceware.org\/bugzilla\/show_bug.cgi?id=19329 *\/\nvoid InitServerLast() {\n bioInit();\n initThreadedIO();\n set_jemalloc_bg_thread(server.jemalloc_bg_thread);\n server.initial_memory_usage = 
zmalloc_used_memory();\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來我們來看看 Redis 源碼的 networking.c 文件:io_threads 線程池,io_threads_mutex 互斥鎖,io_threads_pending IO線程客戶端等待數,io_threads_list 每個IO線程的客戶端列表。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* ==========================================================================\n * Threaded I\/O\n * ========================================================================== *\/\n#define IO_THREADS_MAX_NUM 128\n#define IO_THREADS_OP_READ 0\n#define IO_THREADS_OP_WRITE 1\npthread_t io_threads[IO_THREADS_MAX_NUM];\npthread_mutex_t io_threads_mutex[IO_THREADS_MAX_NUM];\nredisAtomic unsigned long io_threads_pending[IO_THREADS_MAX_NUM];\nint io_threads_op; \/* IO_THREADS_OP_WRITE or IO_THREADS_OP_READ. *\/\n\/* This is the list of clients each thread will serve when threaded I\/O is\n * used. We spawn io_threads_num-1 threads, since one is the main thread\n * itself. *\/\nlist *io_threads_list[IO_THREADS_MAX_NUM];"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"然後就是創建線程的initThreadedIO 函數。初始化的時候IO線程處於未激活狀態,等待後續激活,如果 Redis 配置的 io_threads_num 爲 1,代表IO使用主線程單線程處理,如果線程數配置超過最大值 IO_THREADS_MAX_NUM (128) 則異常退出,最後,創建的線程都將被鎖上直到被喚醒:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* Initialize the data structures needed for threaded I\/O. *\/\nvoid initThreadedIO(void) {\n server.io_threads_active = 0; \/* We start with threads not active. 
*\/\n \/* Don't spawn any thread if the user selected a single thread:\n * we'll handle I\/O directly from the main thread. *\/\n if (server.io_threads_num == 1) return;\n if (server.io_threads_num > IO_THREADS_MAX_NUM) {\n serverLog(LL_WARNING,\"Fatal: too many I\/O threads configured. \"\n \"The maximum number is %d.\", IO_THREADS_MAX_NUM);\n exit(1);\n }\n \/* Spawn and initialize the I\/O threads. *\/\n for (int i = 0; i < server.io_threads_num; i++) {\n \/* Things we do for all the threads including the main thread. *\/\n io_threads_list[i] = listCreate();\n if (i == 0) continue; \/* Thread 0 is the main thread. *\/\n \/* Things we do only for the additional threads. *\/\n pthread_t tid;\n pthread_mutex_init(&io_threads_mutex[i],NULL);\n io_threads_pending[i] = 0;\n pthread_mutex_lock(&io_threads_mutex[i]); \/* Thread will be stopped. *\/\n if (pthread_create(&tid,NULL,IOThreadMain,(void*)(long)i) != 0) {\n serverLog(LL_WARNING,\"Fatal: Can't initialize IO thread.\");\n exit(1);\n }\n io_threads[i] = tid;\n }\n}"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"IO線程的工作流程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Redis 在啓動時,初始化函數 initServer 將 beforeSleep 和 afterSleep 註冊爲事件循環休眠前和休眠後的handler :"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nvoid initServer(void) {\n......\n server.el = aeCreateEventLoop(server.maxclients+CONFIG_FDSET_INCR);\n......\n \/* Register before and after sleep handlers (note this needs to be done\n * before loading persistence since it is used by processEventsWhileBlocked. 
*\/\n aeSetBeforeSleepProc(server.el,beforeSleep);\n aeSetAfterSleepProc(server.el,afterSleep);\n......\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事件循環執行 beforeSleep 時,會調用handleClientsWithPendingReadsUsingThreads 和handleClientsWithPendingWritesUsingThreads,分別是IO讀寫任務的分配邏輯。特殊情況下,在AOF和RDB數據恢復(從文件讀取數據到內存)的時候,Redis會通過processEventsWhileBlocked調用 beforeSleep,此時只會執行handleClientsWithPendingReadsUsingThreads,而IO寫則通過handleClientsWithPendingWrites同步執行:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* This function gets called every time Redis is entering the\n * main loop of the event driven library, that is, before to sleep\n * for ready file descriptors.\n *\n * Note: This function is (currently) called from two functions:\n * 1. aeMain - The main server loop\n * 2. processEventsWhileBlocked - Process clients during RDB\/AOF load\n *\n * If it was called from processEventsWhileBlocked we don't want\n * to perform all actions (For example, we don't want to expire\n * keys), but we do need to perform some actions.\n *\n * The most important is freeClientsInAsyncFreeQueue but we also\n * call some other low-risk functions. *\/\nvoid beforeSleep(struct aeEventLoop *eventLoop) {\n......\n \/* Just call a subset of vital functions in case we are re-entering\n * the event loop from processEventsWhileBlocked(). Note that in this\n * case we keep track of the number of events we are processing, since\n * processEventsWhileBlocked() wants to stop ASAP if there are no longer\n * events to handle. 
*\/\n if (ProcessingEventsWhileBlocked) {\n uint64_t processed = 0;\n processed += handleClientsWithPendingReadsUsingThreads();\n processed += tlsProcessPendingData();\n processed += handleClientsWithPendingWrites();\n processed += freeClientsInAsyncFreeQueue();\n server.events_processed_while_blocked += processed;\n return;\n }\n......\n \/* We should handle pending reads clients ASAP after event loop. *\/\n handleClientsWithPendingReadsUsingThreads();\n......\n \/* Handle writes with pending output buffers. *\/\n handleClientsWithPendingWritesUsingThreads();\n \/* Close clients that need to be closed asynchronous *\/\n freeClientsInAsyncFreeQueue();\n......\n \/* Before we are going to sleep, let the threads access the dataset by\n * releasing the GIL. Redis main thread will not touch anything at this\n * time. *\/\n if (moduleCount()) moduleReleaseGIL();\n \/* Do NOT add anything below moduleReleaseGIL !!! *\/\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在handleClientsWithPendingReadsUsingThreads函數中,Redis會執行IO讀的任務分配邏輯,當Redis配置了IO線程的讀取和解析(io_threads_do_reads),可讀的handler會將普通的客戶端放到客戶端隊列中處理,而不是同步處理。這個函數將隊列分配給IO線程處理,累積讀取buffer中的數據:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"IO線程在初始化時未激活,Redis配置了用IO線程讀取和解析數據(io_threads_do_reads),纔會繼續執行;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀取待處理的客戶端列表 
clients_pending_read,將任務按照取模平均分配到不同線程的任務隊列io_threads_list[target_id];"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過setIOPendingCount給對應的IO線程設置條件變量,激活IO線程;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"依然在主線程處理一些客戶端請求;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果客戶端等待寫入,並且響應的buffer還有待寫數據,或有待發送給客戶端的響應對象,則給客戶端的連接安裝寫handler;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/* When threaded I\/O is also enabled for the reading + parsing side, the\n * readable handler will just put normal clients into a queue of clients to\n * process (instead of serving them synchronously). This function runs\n * the queue using the I\/O threads, and process them in order to accumulate\n * the reads in the buffers, and also parse the first command available\n * rendering it in the client structures. *\/\nint handleClientsWithPendingReadsUsingThreads(void) {\n \/\/ IO線程在初始化時未激活,Redis配置了用IO線程讀取和解析數據(io_threads_do_reads),纔會繼續執行\n if (!server.io_threads_active || !server.io_threads_do_reads) return 0;\n int processed = listLength(server.clients_pending_read);\n if (processed == 0) return 0;\n \/* Distribute the clients across N different lists. 
*\/\n \/\/ 讀取待處理的客戶端列表 clients_pending_read,\n \/\/ 將任務按照取模平均分配到不同線程的任務隊列io_threads_list[target_id]\n listIter li;\n listNode *ln;\n listRewind(server.clients_pending_read,&li);\n int item_id = 0;\n while((ln = listNext(&li))) {\n client *c = listNodeValue(ln);\n int target_id = item_id % server.io_threads_num;\n listAddNodeTail(io_threads_list[target_id],c);\n item_id++;\n }\n \/* Give the start condition to the waiting threads, by setting the\n * start condition atomic var. *\/\n \/\/ 通過setIOPendingCount給對應的IO線程設置條件變量,激活IO線程\n io_threads_op = IO_THREADS_OP_READ;\n for (int j = 1; j < server.io_threads_num; j++) {\n int count = listLength(io_threads_list[j]);\n setIOPendingCount(j, count);\n }\n \/* Also use the main thread to process a slice of clients. *\/\n \/\/ 依然在主線程處理一些客戶端請求\n listRewind(io_threads_list[0],&li);\n while((ln = listNext(&li))) {\n client *c = listNodeValue(ln);\n readQueryFromClient(c->conn);\n }\n listEmpty(io_threads_list[0]);\n \/* Wait for all the other threads to end their work. *\/\n while(1) {\n unsigned long pending = 0;\n for (int j = 1; j < server.io_threads_num; j++)\n pending += getIOPendingCount(j);\n if (pending == 0) break;\n }\n \/* Run the list of clients again to process the new buffers. *\/\n while(listLength(server.clients_pending_read)) {\n ln = listFirst(server.clients_pending_read);\n client *c = listNodeValue(ln);\n c->flags &= ~CLIENT_PENDING_READ;\n listDelNode(server.clients_pending_read,ln);\n if (processPendingCommandsAndResetClient(c) == C_ERR) {\n \/* If the client is no longer valid, we avoid\n * processing the client later. So we just go\n * to the next. 
*\/\n continue;\n }\n processInputBuffer(c);\n \/* We may have pending replies if a thread readQueryFromClient() produced\n * replies and did not install a write handler (it can't).\n *\/\n \/\/ 如果客戶端等待寫入,\n \/\/ 並且響應的buffer還有待寫數據,或有待發送給客戶端的響應對象,\n \/\/ 則給客戶端的連接安裝寫handler\n if (!(c->flags & CLIENT_PENDING_WRITE) && clientHasPendingReplies(c))\n clientInstallWriteHandler(c);\n }\n \/* Update processed count on server *\/\n server.stat_io_reads_processed += processed;\n return processed;\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 handleClientsWithPendingWritesUsingThreads 中,Redis會執行IO線程的啓動,IO線程寫任務的分配等邏輯:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果沒有開啓多線程,或者等待的客戶端數量小於線程數的兩倍,則執行同步代碼;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果 IO 線程沒有激活,則激活(在initThreadedIO函數創建線程時處於未激活狀態);"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果遇到需要關閉的客戶端(CLIENT_CLOSE_ASAP),則將其從待處理的客戶端列表裏刪除;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讀取待處理的客戶端列表 clients_pending_write 
,將任務按照取模平均分配到不同線程的任務隊列io_threads_list[target_id];"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過setIOPendingCount給對應的IO線程設置條件變量,激活IO線程;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"依然在主線程處理一些客戶端請求;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果響應的buffer還有待寫數據,或者還有待發送給客戶端的響應對象,則給客戶端的連接安裝寫handler;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後調用freeClientAsync 將待釋放的客戶端放入clients_to_close隊列,等待beforeSleep執行freeClientsInAsyncFreeQueue時實現異步釋放客戶端;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nint handleClientsWithPendingWritesUsingThreads(void) {\n int processed = listLength(server.clients_pending_write);\n if (processed == 0) return 0; \/* Return ASAP if there are no clients. *\/\n \/* If I\/O threads are disabled or we have few clients to serve, don't\n * use I\/O threads, but the boring synchronous code. *\/\n \/\/ 如果沒有開啓多線程,或者等待的客戶端數量小於線程數的兩倍,則執行同步代碼\n if (server.io_threads_num == 1 || stopThreadedIOIfNeeded()) {\n return handleClientsWithPendingWrites();\n }\n \/* Start threads if needed. *\/\n \/\/ 如果 IO 線程沒有激活,則激活(在initThreadedIO函數創建線程時處於未激活狀態)\n if (!server.io_threads_active) startThreadedIO();\n \/* Distribute the clients across N different lists. 
*\/\n listIter li;\n listNode *ln;\n listRewind(server.clients_pending_write,&li);\n int item_id = 0;\n while((ln = listNext(&li))) {\n client *c = listNodeValue(ln);\n c->flags &= ~CLIENT_PENDING_WRITE;\n \/* Remove clients from the list of pending writes since\n * they are going to be closed ASAP. *\/\n \/\/ 如果遇到需要關閉的客戶端(CLIENT_CLOSE_ASAP),則將其從待處理的客戶端列表裏刪除\n if (c->flags & CLIENT_CLOSE_ASAP) {\n listDelNode(server.clients_pending_write, ln);\n continue;\n }\n int target_id = item_id % server.io_threads_num;\n listAddNodeTail(io_threads_list[target_id],c);\n item_id++;\n }\n \/* Give the start condition to the waiting threads, by setting the\n * start condition atomic var. *\/\n \/\/ 通過setIOPendingCount給對應的IO線程設置條件變量,激活IO線程\n io_threads_op = IO_THREADS_OP_WRITE;\n for (int j = 1; j < server.io_threads_num; j++) {\n int count = listLength(io_threads_list[j]);\n setIOPendingCount(j, count);\n }\n \n \/* Also use the main thread to process a slice of clients. *\/\n \/\/ 依然在主線程處理一些客戶端請求\n listRewind(io_threads_list[0],&li);\n while((ln = listNext(&li))) {\n client *c = listNodeValue(ln);\n writeToClient(c,0);\n }\n listEmpty(io_threads_list[0]);\n \/* Wait for all the other threads to end their work. *\/\n while(1) {\n unsigned long pending = 0;\n for (int j = 1; j < server.io_threads_num; j++)\n pending += getIOPendingCount(j);\n if (pending == 0) break;\n }\n \/* Run the list of clients again to install the write handler where\n * needed. *\/\n listRewind(server.clients_pending_write,&li);\n while((ln = listNext(&li))) {\n client *c = listNodeValue(ln);\n \/* Install the write handler if there are pending writes in some\n * of the clients. 
*\/\n \/\/ 如果響應的buffer還有待寫數據,或者還有待發送給客戶端的響應對象,\n \/\/ 則給客戶端的連接安裝寫handler\n if (clientHasPendingReplies(c) &&\n connSetWriteHandler(c->conn, sendReplyToClient) == AE_ERR)\n {\n \/\/ 將待釋放的客戶端放入clients_to_close隊列,\n \/\/ 等待beforeSleep執行freeClientsInAsyncFreeQueue時實現異步釋放客戶端\n freeClientAsync(c);\n }\n }\n listEmpty(server.clients_pending_write);\n \/* Update processed count on server *\/\n server.stat_io_writes_processed += processed;\n return processed;\n}"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"IO線程的主邏輯"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 IOThreadMain 函數中,是 Redis IO線程的主邏輯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們發現IO線程在創建後,會通過redisSetCpuAffinity函數和server_cpulist參數,來設置線程的CPU的親和性,合理配置線程的CPU親和性,能夠一定程度上提升性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"之後,IO線程會根據條件變量 io_threads_pending[id] 判斷是否有等待的IO需要處理,然後從 io_threads_list[myid] 中獲取分給自己的 client,再根據 io_thread_op 來判斷,這個時候需要執行讀寫IO中的哪一個, readQueryFromClient 還是 writeToClient :"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nvoid *IOThreadMain(void *myid) {\n \/* The ID is the thread number (from 0 to server.iothreads_num-1), and is\n * used by the thread to just manipulate a single sub-array of clients. 
*\/\n long id = (unsigned long)myid;\n char thdname[16];\n snprintf(thdname, sizeof(thdname), \"io_thd_%ld\", id);\n redis_set_thread_title(thdname);\n redisSetCpuAffinity(server.server_cpulist);\n makeThreadKillable();\n while(1) {\n \/* Wait for start *\/\n for (int j = 0; j < 1000000; j++) {\n if (io_threads_pending[id] != 0) break;\n }\n \/* Give the main thread a chance to stop this thread. *\/\n if (io_threads_pending[id] == 0) {\n pthread_mutex_lock(&io_threads_mutex[id]);\n pthread_mutex_unlock(&io_threads_mutex[id]);\n continue;\n }\n serverAssert(io_threads_pending[id] != 0);\n if (tio_debug) printf(\"[%ld] %d to handle\\n\", id, (int)listLength(io_threads_list[id]));\n \/* Process: note that the main thread will never touch our list\n * before we drop the pending count to 0. *\/\n listIter li;\n listNode *ln;\n listRewind(io_threads_list[id],&li);\n while((ln = listNext(&li))) {\n client *c = listNodeValue(ln);\n if (io_threads_op == IO_THREADS_OP_WRITE) {\n writeToClient(c,0);\n } else if (io_threads_op == IO_THREADS_OP_READ) {\n readQueryFromClient(c->conn);\n } else {\n serverPanic(\"io_threads_op value is unknown\");\n }\n }\n listEmpty(io_threads_list[id]);\n io_threads_pending[id] = 0;\n if (tio_debug) printf(\"[%ld] Done\\n\", id);\n }\n}"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從Redis VM開始,到Redis BIO,再到最後的IO多線程,我們能看到 Redis 正在逐漸的向線程化的方向發展。特別是在實現Lazy Free之後(Redis BIO),antirez似乎嚐到了多線程的好處,在保證db操作單線程的情況下,讓Redis發揮CPU一部分多核多線程的實力。我們不難發現,Redis 
的多線程不過是順勢而爲罷了,如果單線程沒有瓶頸,就不會產生使用多線程的Redis。再結合現狀來看,畢竟時代變了,從多年前的單核服務器,到後來的雙核,四核服務器,再到現在動輒八核,十六核的服務器:單線程模型固然簡單,代碼清晰,但是在摩爾定律失效,多核多線程的時代洪流下,有誰能夠拒絕多線程的好處呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"作者介紹:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Insutanto,"},{"type":"text","text":"一個普通的編程手藝人。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:dbaplus社羣(ID:dbaplus)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/pno27D0FcwfiCxv6iM0leA","title":"xxx","type":null},"content":[{"type":"text","text":"Redis單線程不行了,快來割VM\/ BIO\/ IO多線程的韭菜!(附源碼)"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}