How to Quickly Implement a Timer?

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"I. What Is a Timer"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A timer is a tool that starts a task at a specified time (timers that repeat a task periodically also exist, but we leave them aside here). It is often associated with the concept of a delay queue. So in which scenarios do we need a timer?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consider the following business scenarios:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When an order stays unpaid, how do we close it promptly and return the stock to inventory?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How do we periodically check whether orders in the refunding state have actually been refunded?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A newly created shop has uploaded no products within N days. How does the system detect this and send an activation SMS?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The simplest and most direct way to solve these problems is to scan the table on a schedule, with each business maintaining its own scan logic. As the number of businesses grows, the scan logic turns out to be very similar across them. We can extract it from the individual business logic into a shared component. This is where the timer comes in."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"II. The Essence of a Timer"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"o
rigin":null},"content":[{"type":"text","text":"In essence, a timer is a data structure in which tasks with a nearer deadline have higher priority, and which provides the following basic operations:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Add: add a task"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Delete: remove a task"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Run: execute expired tasks \/ notify the owning business that a task has expired"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Update: change the expiry time (optional)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Run usually works in one of two ways: 1. Polling: at every time slice, look up which tasks have expired. 2. Sleep\/wake-up: repeatedly find the task with the nearest deadline, execute it if it has expired, and otherwise sleep until it expires. If a task is added or deleted during the sleep, the task with the nearest deadline may change, so the thread is woken up and repeats the lookup."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Its design goals usually include the following requirements:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Support basic functions such as task submission (message publishing), task deletion, and task notification (message subscription)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Reliable message delivery: once a message enters the delay queue, it is guaranteed to be consumed at least once (expiry notifications guarantee at-least-once and aim for exactly-once)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Data reliability: data must be persisted to prevent loss."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. 
High availability: multi-instance deployment must be supported at a minimum. When one instance goes down, standby instances keep serving, and the service can scale horizontally."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. Timeliness: make a best effort to deliver on time; some timing error is allowed, within a controllable range."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"III. Data Structures"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, the data structures behind a timer. Timers are usually inseparable from delay queues. What is a delay queue? As the name suggests, it is a message queue with a delay feature. A delay queue is usually built on one of the following data structures:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Sorted linked list: the most intuitive and the easiest to understand."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Heap: used, for example, by DelayQueue in the Java JDK and by Go's built-in timers."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. 
Time wheel \/ hierarchical time wheel: used, for example, by the Linux kernel timer, Netty's HashedWheelTimer utility class, and Kafka's internal timer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The time wheel (TimeWheel) deserves a closer look. A time wheel is a ring structure that can be pictured as a clock face divided into many slots. Each slot represents a span of time (the shorter the span, the higher the timer's precision) and holds a list of all tasks that expire in that slot. A pointer advances one slot at a time as time passes and executes every expired task in the corresponding list. A task is assigned to a slot by a modulo operation. The diagram below illustrates this:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/57\/57484b7ac1c292798babb256dac7e246.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Time wheel"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If tasks span a long period and there are many of them, a traditional single-level time wheel leaves tasks with large round counts and slots with long task lists that persist for a long time. In that case the wheel can be split into levels by time granularity (much like the dials of a water meter), as shown below:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9b\/9b9c2ec5761ce2272731dfa0df3f4b54.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Hierarchical time wheel"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The time wheel is a fairly elegant implementation, and a hierarchical time wheel is also quite efficient."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"IV. Industry Implementations"}]},{"type":"paragraph","attrs":{"i
ndent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In engineering practice, timers and delay queues are usually built on one of the following approaches:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Redis ZSet."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. A queue with a built-in delay option, such as RabbitMQ, Beanstalkd, or Tencent TDMQ."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. The timing-wheel algorithm."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"V. Our Design in Detail"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the background covered, let us turn to our own system, starting with the requirements. Our team's business needs delayed tasks. A typical scenario: after a merchant initiates a charge request, the user immediately receives a pre-charge notice and the charge completes 24 hours later; or a coupon is issued to a user, who is notified 3 days later that it has expired. These requirements led us to build a timer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We first surveyed timer implementations inside and outside the company to avoid reinventing the wheel. These included external ones such as Quartz and Youzan's delay queue, internal ones such as PCG 
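tikker and TDMQ, and several implementations from WeChat Pay teams including marketing, auto-debit, and Pay Score. Weighing availability, reliability, ease of use, timeliness, code style, and operational cost, we decided to draw on these earlier designs and, based on our team's technical experience and available components, design and implement our own timer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first step was to settle the timer's storage data structure. We borrowed the idea of the time wheel and persisted tasks in tablekv, the storage component most widely used by the WeChat team. We chose tablekv because it natively supports sharding tables by uin, with the number of shards reaching tens of millions or more. In addition, a single table holds a very large number of records with efficient reads and writes, and tasks can be filtered by condition much as in MySQL."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our goal is second-level timestamp precision, with the business notified only once when a task expires. The core idea is therefore to use tablekv and "},{"type":"text","marks":[{"type":"strong"}],"text":"shard tables by task execution time"},{"type":"text","text":": the start_time (a timestamp) specified by the caller serves as the sharding uin, which is also the time-wheel bucket. Why not a hierarchical time wheel? First, kv supports hundreds of millions of records in a single table. Second, kv supports a very large number of shards: with 10 million shards, for example, two timestamps must be about 115 days apart before they hash to the same shard. So a hierarchical time wheel is not needed for now."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We settled on 10 million shards, with uin = timestamp mod shard count. Note that taking the timestamp mod the shard count bounds the key space, 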
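which prevents keys from growing without limit as timestamps increase. The diagram below illustrates this:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a2\/a25abc544c8e99a8a3992a22da8ec173.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"kv time wheel"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once tasks are persisted, a daemon scans the tables periodically, takes out the expired tasks, and calls back the business with the business data carried in the request (biz_data is supplied when the task is added; the timer passes it through without inspecting its contents). Seen this way, the flow is quite simple."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The scan resembles the time-wheel algorithm described above: a pointer (call it time_pointer) keeps moving forward so that no bucket's tasks are missed. We store CurrentTime, the timestamp currently being processed, in commkv (roughly, a kv store read and written in key-value form, itself built on tablekv). On each poll the daemon reads CurrentTime through the GetByKey interface. If it is greater than the current machine time, the daemon sleeps for a while. If it is less than or equal to the current machine time, the daemon fetches the TaskList from the tablekv shard whose uin corresponds to CurrentTime and processes it. When the poll finishes, CurrentTime is incremented by one and written back to commkv through SetByKey. We refer to this working mode as the Scheduler."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After fetching a task, the Scheduler only needs to call back the business. With a synchronous callback, the business side's timeout behavior is beyond our control, so delivering a single task may take a long time and slow down the notification of every task at that time point. It is therefore natural to adopt "},{"type":"text","marks":[{"type":"strong"}],"text":"asynchronous decoupling"},{"type":"text","text":": tasks are published to the event center (a highly available, highly reliable messaging platform inside WeChat that supports both transactional and non-transactional messages). Because publishing one task to the event center takes only tens of milliseconds, in theory everything can be processed within 1 s when the task volume is small, and time_pointer then follows the current timestamp closely. When a large number of tasks must be processed, multiple threads\/coroutines are needed to process them concurrently and keep delivery on time. The broker subscribes to event-center messages and, on receiving one, calls back the business, so the broker also acts as the Notifier. The overall architecture is shown below:"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e9\/e987f58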
72606e594c53f9f78878414c7.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"Architecture diagram"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main modules are:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Task-scanning daemon"},{"type":"text","text":": acts as the Scheduler. It scans all expired tasks and publishes them to the event center, which notifies the broker; the broker's Notifier then notifies the business."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Timer broker"},{"type":"text","text":": combines business access and the Notifier in one component."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The task state diagram is shown below; there are only two states. A task is in the pending state once it has been inserted into kv successfully, and enters the finish state once it has been taken out and the business has been notified successfully."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a8\/a8b653eec936a449b7261bbc11bb4ca0.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"State diagram"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"VI. Implementation Details and Challenges"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The sections below explain several technical details of the design above."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1. Business isolation"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Business types are defined by biz_type. Different biz_type values can be given different priorities (not yet supported), and each task stores its biz_type."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Business information (keyed by biz_type) is managed in the overseas configuration center, which makes it easy to onboard new businesses and change configuration. When onboarding, a business adds parameters such as callback information, a callback retry limit, and a callback rate limit to the configuration. The purpose of business isolation is to keep each business unaffected by the others. Because our timer currently serves only our own team's businesses, we simply apply a separate rate-limiting rule to each business and have not optimized further, so we will not go into detail."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2. Idle spinning of the time wheel"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With 10 million shards, most buckets are bound to be empty, so advancing the time-wheel pointer is inefficient. Think of queuing at a restaurant: staff often come round to record which numbers are still present so that absent numbers can be skipped and calling moves faster. Likewise, to reduce this idle advancing, Kafka introduced a DelayQueue that enqueues whole buckets; time advances only when a bucket expires, that is, when queue.poll returns a result, which cuts the cost of idle thread spinning. We can apply a similar optimization here by maintaining an ordered queue of the timestamps whose tables are non-empty. Readers may wish to think about how to implement it; we will not detail the scheme."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3. 
Rate limiting"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The timer writes to kv and also calls back the business, so calls to downstream services must be rate-limited to keep those services from avalanching. This is a "},{"type":"text","marks":[{"type":"strong"}],"text":"distributed rate-limiting"},{"type":"text","text":" problem. We use WeChat Pay's rate-limiting component to ensure that: 1. task insertion does not exceed the rate configured by the timer administrator; 2. Notifier callbacks to a business do not exceed the rate that business configured when it applied for access. As a result: 1. kv and the event center are not put under excessive pressure; 2. downstream businesses are not hit by more requests than they can handle."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"4. Distributed single-instance disaster recovery"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For disaster recovery, we want the daemon to be fault-tolerant: if a daemon instance hangs or exits abnormally, instance processes on other machines can carry on executing tasks. At the same time, we want only one instance to run at any moment, that is, a “distributed single instance”. Our full requirement can therefore be summarized as "},{"type":"text","marks":[{"type":"strong"}],"text":"“distributed single-instance disaster-recovery deployment”"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are many ways to achieve this, for example:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Connect to a “scheduling center” that schedules the machines;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Have each node compete for a distributed lock before executing tasks, so that only the node holding the lock executes them;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Have the nodes elect a “master” through communication to run the logic and keep exchanging heartbeats; if the master goes offline, a standby takes over as master and continues."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mainly considering development cost and operational support, we chose a chubby-based distributed lock to implement single-instance disaster-recovery deployment. As a consequence, which machine actually executes the business logic is effectively random."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"5. Reliable delivery"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is a core question: how do we guarantee that task notifications satisfy at-least-once?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our system relies mainly on the following two measures."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. A task is persisted to tablekv as soon as it arrives, and an expiry is set only after the business has been notified successfully (the record is kept for a while and then deleted). Every task is therefore stored data, which allows reconciliation afterwards."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. A reliable event center is introduced. We use its ordinary messages rather than transactional messages, in effect treating it as a highly available message queue."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The message queue serves the following purposes:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Decouple task scheduling from task execution (the scheduling service does not need to care about execution results)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"co
ntent":[{"type":"text","text":"Make the flow asynchronous so that the scheduling service runs efficiently; its execution time is measured in milliseconds."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Rely on the message queue for reliable task consumption."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What advantages does the event center have over an ordinary message queue?"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some message queues may lose messages (depending on how they are implemented), whereas the event center's underlying distributed architecture gives it very high availability and reliability, so message loss can essentially be ignored."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The event center supports multiple retries at configurable, graduated time intervals (the callback times are configurable)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The event center can deduplicate messages by a custom business ID."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Introducing the event center essentially guarantees reliability from the Scheduler to the Notifier."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, the most thorough approach is to add another asynchronous daemon as a fallback that scans for all overdue, undelivered tasks and delivers them. The idea is straightforward, so we will not elaborate."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"6. 
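Timely delivery"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a large number of tasks fall on the same time point, publishing them to the event center serially may still delay callbacks. The natural answer is concurrent processing with multiple threads\/coroutines. In this system we use WeChat's BatchTask library, which wraps each RPC task to be run concurrently in a function closure (return value + function + arguments) and then schedules coroutines to run the tasks (BatchTask's underlying coroutine library is libco). Existing synchronous functions can easily be run in batches through the BatchTask API. The daemon submits the event-publishing tasks to the thread pool + coroutine pool created by BatchTask (the thread and coroutine counts are tunable through parameters). By exploiting pipelining and concurrency, this greatly shortens the overall latency of processing a task list and makes a best effort to notify the business on time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"7. Deleting expired tasks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To save storage, a task should be deleted once the business has been notified successfully. Deletion should be asynchronous, though, because the record needs to be kept for a while for purposes such as log lookups. The usual approach is to run a daemon that deletes completed tasks asynchronously. Our system instead uses tablekv's automatic deletion: after the callback completes, besides setting the task status to finished, we set the kv record's expiry to 1 month through tablekv's update interface. This avoids an asynchronous table-scanning daemon and simplifies the implementation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"8. 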
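Other risks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. The initial CurrentTime of time_pointer is set to the machine time of the first daemon instance to run, and every poll compares the current daemon instance's machine time with CurrentTime, so an incorrect machine clock may disrupt task scheduling. Since all production machines run a time-correction script, this problem can essentially be ignored."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. The architecture depends heavily on the event center: the timer's availability and reliability depend on the event center's. Although both are currently very high, if every failure mode is to be considered, a brief event-center outage or delayed and backlogged message dequeuing for subscribers are problems that must be faced. One solution is to deliver messages over a second path using an MQ, removing the single-point dependency on the event center."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Closing Remarks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This timer service currently supports only overseas timer needs. Call volume is still small, and the service meets the basic business requirements. Supporting a higher task volume would require more thought and optimization. You are welcome to get in touch and discuss at any time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, a plug: the overseas payments team is looking for like-minded people in its pursuit of excellence. You are welcome to join our team (click the link below to join)."}]}]}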
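
The bucket assignment and pointer advance described in the design section can be sketched roughly as follows. This is only an illustration under stated assumptions: `InMemoryWheel`, `bucket_for`, and `poll` are hypothetical names, and a Python dict stands in for the tablekv shards and the commkv-stored CurrentTime, whose real interfaces are not shown in the article.

```python
from collections import defaultdict

NUM_SHARDS = 10_000_000  # shard count quoted in the article


def bucket_for(start_time: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a Unix timestamp in seconds to a shard id (the wheel bucket)."""
    return start_time % num_shards


class InMemoryWheel:
    """Hypothetical in-memory stand-in for the shards plus the stored pointer."""

    def __init__(self, start_pointer: int, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        self.current_time = start_pointer  # plays the role of CurrentTime
        self.shards = defaultdict(list)    # shard id -> [(start_time, biz_data)]

    def add(self, start_time: int, biz_data: str) -> None:
        key = bucket_for(start_time, self.num_shards)
        self.shards[key].append((start_time, biz_data))

    def poll(self, now: int) -> list:
        """Process at most one bucket and return the biz_data of due tasks."""
        if self.current_time > now:
            return []  # pointer is ahead of the clock: the caller should sleep
        key = bucket_for(self.current_time, self.num_shards)
        tasks = self.shards.pop(key, [])
        due = [biz for ts, biz in tasks if ts <= self.current_time]
        # Tasks that share the bucket but belong to a later lap stay stored.
        later = [(ts, biz) for ts, biz in tasks if ts > self.current_time]
        if later:
            self.shards[key] = later
        self.current_time += 1  # advance the pointer by one second
        return due
```

With 10 million one-second buckets the wheel wraps after 10,000,000 / 86,400 ≈ 115.7 days, which matches the interval of about 115 days quoted in the article. The sketch keeps later-lap tasks in their bucket by comparing the stored start_time with the pointer; how the real system tells laps apart is not described, so that check is an assumption.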