閒魚社區如何快準穩的完成無縫數據遷移

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"背景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在內容社區中,內容標籤用來輔助說明內容,內容標籤對內容分發和理解具有重要作用,能夠幫助內容社區將對的內容分發到對的人。目前閒魚會玩社區的標籤分爲分類和屬性標籤兩種,兩種標籤的用法不同,但是目前二者都是存在於同一個標籤系統中,這樣會出現打錯標和屬性與分類無法獨立擴展的問題。爲了解決以上問題,現在要將原標籤系統拆分爲分類系統和屬性系統。整個過程,不僅需要遷移底層數據,還需要將屬性服務從原標籤系統遷移到新屬性系統。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ac\/acc0317792709d3908639fb990c31d22.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"現狀"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如下圖所示,閒魚會玩社區對屬性標籤依賴非常之多,除了自身的業務系統之外,還有圖中標藍的算法、搜索、數據等也會依賴。遷移過程中需要保證這些上層依賴不受影響。另外,業務方希望能夠儘早啓用屬性系統,儘快提高打標效率。總結起來,遷移過程中面臨瞭如下的挑戰。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"快,短時間內遷移完成,儘早使用新屬性系統"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"穩,依賴衆多,在遷移時需要保證屬性服務可用,不影響上層業務,用戶無感知"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"準,遷移最最最基本的要求就是要保證數據正確"}]}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/17\/17bcc9649072a7745d7cc1a627b899d7.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"遷移方案"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們按照影響面從小到大的順序,制定了1.底層存儲同步、2.建設隔離層、3.業務讀寫遷移、4.依賴遷移、5.啓用新屬性系統共五步。其中依賴是指搜索、算法和數據,這些依賴影響面較廣,遷移需要更加慎重,需要有己方業務遷移正確這個前提作爲保證。因此依賴遷移需要單獨列爲一部分,且放在己方業務遷移完成後進行。遷移中的每一步遷移需要保證上一步正確,且出現問題,每一步都可以回滾回上一個正確狀態。爲了保證數據的一致性,我們全程都有數據校對服務。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/11\/11cc6dedde25e3c73d80d773957f083b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"遷移過程"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"存儲遷移"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"整個遷移過程,我們首先先從底層存儲開始同步,爲了保證原標籤系統和屬性系統中的屬性標籤相同。實現這個目標需要從兩方面做起,一方面是全量同步,就是將全部數據同步一遍。一方面是實時增量同步,一旦有新的屬性標籤被寫入原標籤系統中,那麼屬性系統中也要同步一份。這個套路比較固定。下面是我們全量同步的數據處理過程。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e9\/e9362e4ee840f030a0e9acc24818e4c8.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中同步任務我們直接使用了 阿里雲 提供的 scheduleX 分佈式任務工具。全量同步過程中要注意,控制同步速率、配置告警、做好異常記錄和完成進度記錄。控制同步速率是需要避免把依賴服務系統打掛,影響其他正常業務。配置告警是爲了及時發現異常,及時排查。其中異常記錄是爲了排查問題的,必然要記錄的。完成進度記錄也是非常重要的,一旦出現問題,修復完了要繼續時,可以直接從上次記錄的同步位點開始同步。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"講完了全量同步,下面講一下實時增量拆分的過程,數據流過程如下圖所示。即一旦觸發了原標籤系統變更,會實時將屬性標的變化同步到屬性系統。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f4\/f482165b3699dd03a6b98d6903a46e87.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"隔離層建設"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"屏蔽遷移細節"},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"做了存儲遷移,那麼現在屬性系統和原標籤系統中的屬性標相同了,下面可以做業務讀遷移了。業務遷移,這是遷移過程中工作量最大最複雜的地方,需要梳理清楚依賴很多。經過梳理髮現,多個應用多個業務場景都在使用原標籤系統。假如逐個遷移改造,一方面重複性工作比較多,需要重複驗證的功能點多,另一方面,遷移細節無法統一管理,一旦有問題,需要逐個應用修改。所以在做業務遷移之前,我們做了一件事情,就是將散落在多個應用的讀寫原標籤系統的操作都收歸到了一個 jar 包中,然後在該 jar 中統一控制讀寫切換。jar 包相當於一個隔離層,隔離了上層業務和屬性標籤的存儲邏輯。這樣上層業務系統不必關心屬性標是從原標籤系統讀取的抑或是從屬性系統中讀取的,只管從隔離層提供的方法中讀取即可。有了隔離層統一管理,可以同時統一遷移,減少不必要的重複工作。具體的變化從 a 圖變換到 b 圖。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fa\/fa677773ed90084b5869cc8180d9e3ef.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"接口適配 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於分類系統和屬性系統兩種標籤的業務語義和功能不同,內容中臺對這兩個系統的領域建模也不同,對應的數據服務接口能力也不同,爲了讓上層業務對此無感知,我們保持原服務接口不變,在隔離層做了對於分類系統和屬性系統兩個系統的適配。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"業務讀寫遷移"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"讀遷移"},{"type":"text","text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前置工作做完之後,我們開始具體的讀遷移。讀遷移是最好做的,可以先按照比例切流,逐步放量從屬性系統中讀取屬性標,一旦發現有問題,那麼直接將流量切回讀原標籤系統 ,沒有問題,就直至全量。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/be\/becc2e84d881ba2d313f306313d64138.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"寫遷移 "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"寫遷移是很難一刀切的,一方面,即使是經過了梳理,還是可能會有遺漏的地方。另一方面,還有部分依賴方的寫入(比如內容中臺的標籤寫入)不在我們這邊,暫時無法遷移。假如己方業務的寫遷移了,那麼己方業務只會寫到屬性系統 ,而沒遷移的依賴方只會寫到原標籤系統。這會造成原標籤系統和屬性系統的屬性標數據不一致。這導致兩方面的問題,一方面,影響依賴方讀,依賴方讀的還是原標籤系統,讀取不到屬性系統上的屬性標。另一方面,兩方不一致的話,如果屬性系統出現問題,那麼寫無法即時回滾到原標籤系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此在遷移過程中,還需要保證原標籤系統和屬性系統上的屬性標時刻是相同的,在存儲拆分中,已經有了 原標籤系統-> 屬性系統的同步鏈路,現在還需要將只寫入屬性系統的屬性標同步回原標籤系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼現在有 原標籤系統->屬性系統 和 屬性系統 -> 原標籤系統 兩條同步鏈路。如果不做任何控制的話,明顯會出現同步死循環原標籤系統->屬性系統->原標籤系統。解決這種雙向同步死循環的一般辦法是加標誌,用標誌標識這次數據變更是源自哪一方,當發現自己的更新消息迴流到了自己這裏,那麼就不再更新,這樣就切斷了更新的死循環。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b4\/b432a6b655b3c93d5081d1f8322a3158.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"依賴方遷移"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當己方業務能夠保證無誤的時候,那麼可以做依賴遷移了。依賴方一般爲算法、搜索和數據等。鏈路分爲實時鏈路和離線鏈路。對於實時鏈路,這部分工作因爲有了前面的業務讀遷移,可以複用;對於離線鏈路,提供好離線數據表給到相關依賴方,讓他們按時遷移完成即可。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"啓用新屬性系統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"依賴方也遷移完成後,經過一段時間的觀察,沒有問題,便可以使用新的屬性標籤系統來打標了。啓用新系統打標的之後,便不再需要 原標籤系統->屬性系統 的同步鏈路了,可以停止。後續再觀察一段時間後,確認使用屬性系統無問題後,便不再有將讀寫切回原標籤系統的必要了,可以再切斷 屬性系統 -> 原標籤系統 的同步鏈路,至此遷移完成。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據校對任務"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在整個同步過程中,還要有對賬任務來實時查看同步的數據是否正確。如果發現數據不正確,要及時找到原因,修正,然後做好數據矯正。"}]},{"type":"heading","attrs":{"align":null,"level":2}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/81\/814aef7e0d8c0a3190871984bbe17d73.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結與思考"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本次遷移排除依賴方遷移所用時間外,共計兩週。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"遷移過程中屬性服務正常。零客訴輿情。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"遷移後,數據零差異。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後捋一下本次遷移中的經驗點"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"遷移過程中,如果服務依賴衆多,可以用隔離層來隔離依賴對遷移的感知。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雙寫,過程中避免出現同步死循環,可以通過給更新消息加標誌避免。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:閒魚技術(ID:XYtech_Alibaba)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/mAwaZHiFDyh-iir3Tf4zPQ","title":"xxx","type":null},"content":[{"type":"text","text":"閒魚社區如何快準穩的完成無縫數據遷移"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章