Xiaohongshu: Grown on a Cloud-Native Architecture

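{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Interviewees | Zhang Lei, Gao Fei"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Content community” has become the label outsiders most readily attach to Xiaohongshu. Xiaohongshu is also an internet company, and the large technical team that keeps the community running deserves just as much attention. That team is divided into groups for back-end infrastructure, SRE, big data, AI algorithms, client-side technology, audio and video, and security. The business technology teams take requirements directly from the business and help it move fast. The middle-platform technology teams focus on more fundamental, longer-term R&D and bring new technology into the business."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Years of work by this team produced Xiaohongshu's “home-grown” cloud-native architecture and a more advanced algorithm for distributing diverse content. In this article, InfoQ interviews Zhang Lei, Xiaohongshu's head of technology, and Gao Fei, its head of system architecture, for an inside look at the team."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Grown on a Cloud-Native Architecture"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The story of Xiaohongshu's cloud-native architecture begins before the company itself was founded."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"In 2013, "},{"type":"text","text":"Matt Stine of Pivotal formally introduced the concept of cloud native. Cloud native began to attract wide public attention, and usable IaaS products had appeared on the market. That same "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"June, Xiaohongshu was founded in Shanghai. "},{"type":"text","text":"When it designed its system architecture at founding, Xiaohongshu chose cloud native, which was itself just emerging. That choice is what sets its architecture apart."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“When you're starting a company, choosing the cloud is really the natural thing to do,” said Zhang Lei, Xiaohongshu's head of technology."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On cost, building your own IDC is too heavy at the start. Building and managing your own data centers carries high operations costs, which are a big burden for a newly founded company, and the cloud spares it those expenses. On the business side, Xiaohongshu had not yet launched e-commerce, but e-commerce was firmly on the roadmap. One hallmark of e-commerce is its heavy demand for elasticity. For a startup, buying machines just to cover the temporary resource spikes of promotional campaigns makes no sense, whereas the cloud handles this well."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today, much of Xiaohongshu's cloud-native R&D is done by its infrastructure technology department. The department includes teams for middleware, storage, caching, databases, SRE, and quality assurance. An architecture that grew up natively on the cloud gives Xiaohongshu a strong first-mover advantage and leaves the team with far fewer worries when adopting new technology."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After eight years of continuous development and upgrades, 80% of Xiaohongshu's overall architecture is now containerized, and the architecture iterates much more efficiently as a result."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The cloud-native architecture has done more than speed up its own iteration. Xiaohongshu depends heavily on algorithms, and for such a company the architecture has also greatly shortened the path from model experiment to production."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, the cloud-native architecture lets container-based workloads run with hardware resource isolation. Model training and online serving are now largely containerized, so engineers no longer need to worry much about their development environments and can focus on iterating on algorithms."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, combining a sound architecture with suitable hardware unlocks more compute. As utilization of GPUs and distributed computing frameworks improves, available compute grows substantially. That gives models more headroom in both training and online serving."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, real-time streaming data keeps algorithms more current. On the earlier, traditional architecture, models were updated on a scale of days. By building on streaming data with Kafka and Flink, Xiaohongshu has cut model update latency to the scale of minutes."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"An In-House Algorithm for Diversified Content Distribution"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Xiaohongshu is a platform for sharing a wide variety of lifestyles, so distributing content well is a major test for its technical team."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Content distribution usually works like this. A user uploads content. The platform analyzes the content and then reviews it. Content that passes review enters the distribution system, which surfaces it through recommendation, search, and other channels. Exposed content is seen by users, who interact with it by liking, sharing, or commenting. The system captures these interactions, analyzes user preferences, and then refines its distribution strategy. According to Zhang Lei, Xiaohongshu differs from other companies in that it gives more weight to exposing diverse content at every one of these steps."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Zhang Lei's team sees it, there are countless lifestyles, and unlike products, lifestyles do not produce “blockbuster hits.” Embracing diversified content distribution is therefore a natural choice for Xiaohongshu. “If you only distribute top content, the content that already gets the most exposure gets even more, and the diversity users can perceive shrinks. Diversified distribution of mid- and long-tail content is critical for a lifestyle platform.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The process, however, poses two main technical challenges:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, representing mid- and long-tail content and measuring its similarity. Mid- and long-tail content has much sparser user interaction data than top content. Purely content-based (CB) similarity measures depend on large amounts of labeled data and do not necessarily reflect the similarity users perceive. The sparse interaction data also makes similarity measures based on collaborative filtering (CF) inaccurate."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, capturing and representing, in technical terms, the diversity that users perceive. The classic DPP (determinantal point process) method works on sets of items and is insensitive to the order in which items appear. In real app usage, however, the order in which items appear strongly affects how diverse the content feels to users."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For the second problem, Xiaohongshu uses its in-house Sliding Spectrum Decomposition (SSD) model, which captures how users perceive diversity while browsing long sequences of items. Unlike other diversified recommendation methods, SSD treats the item sequence as a time series observed by the user. It combines multiple sliding windows across the whole sequence so that the model matches what the user perceives while browsing."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/24\/5e\/2497e4f9435520cf1b930198e47c485e.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"The content tensor in the SSD model, stacked from multiple windows"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For the problem of measuring similarity among mid- and long-tail content, Xiaohongshu developed and adopted a method that learns behavior from content: CB2CF (Content Based to Collaborative Filtering)."}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c3\/30\/c3c4b7b38fa9755b861316eca4572c30.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"The CB2CF model"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CB2CF uses a neural network that starts from the content itself and learns to predict users' collaborative-filtering interaction data. It then uses that prediction to judge whether items are similar. The model takes only content as input, so it can rely on its ability to generalize and still give good results for new content and for mid- and long-tail content. Its training target is the collaborative-filtering signal of all users, which lets it learn, statistically, the similarity that users perceive."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Online A\/B tests compared Xiaohongshu's SSD and CB2CF models with a state-of-the-art (SOTA) DPP model. User browsing time rose 0.42% and interaction rate rose 0.81%. ILAD (the average distance between the notes a user browses, a measure of exposure diversity) rose 0.32%, and MRT (the average number of categories a user reads, a measure of consumption diversity) rose 0.68%."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zhang Lei believes diversified content distribution will be a long-term trend. Whether a given company adopts it, however, depends on its specific business model. Some businesses need to create blockbusters and others need diversity, and different products and businesses favor different ways of distributing content."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In February last year, DAU for food content on Xiaohongshu briefly overtook DAU for beauty content, making food the community's largest vertical category. Content operations and the diversified distribution mechanism together also drove rapid growth in other mid- and long-tail content. Public figures show year-over-year growth of 400% for education, 500% for tech and digital, 1,140% for sports events, and 300% for fitness. Over the past year, the total number of notes published by users grew by more than 150% year over year. As of June 2020, Xiaohongshu had more than 100 million monthly active users."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Moving to a Multi-Cloud Architecture"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As its business keeps growing, Xiaohongshu has started moving from a single-cloud architecture to a multi-cloud architecture."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Xiaohongshu currently has three goals for its overall architecture. First, the architecture must support the continued growth in scale that comes with rapid business development, for example by reliably serving DAU in the hundreds of millions. Second, it must deliver high reliability and availability, mainly through cross-region disaster recovery and disaster-recovery designs that span cloud infrastructures. Third, it must be efficient, which means relatively low cost and high resource utilization."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These three goals are also what drive the move to multi-cloud. According to Gao Fei, Xiaohongshu's head of architecture, multi-cloud can support a larger business more flexibly. Different clouds have different technical strengths, so Xiaohongshu can deploy different technologies according to each provider's characteristics, for example co-locating offline and online workloads. Multi-cloud also needs less redundant capacity, which makes disaster recovery somewhat more efficient."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“An advanced architecture and philosophy can help a company that started late leapfrog the competition,” said Zhang Lei."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to Gao Fei, the team can now validate core services such as search and recommendation on another cloud in about two months. Many of Xiaohongshu's machine learning models have already been trained on at least four clouds."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, embracing multi-cloud brings a host of technical challenges."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, multi-cloud requires unified resource management. Resources spread across clouds must be as easy to manage as those on a single cloud, or coordinating and allocating them becomes very hard. Second, keeping data synchronized and consistent across clouds in a timely way is a problem, and a pressing one for businesses with strict consistency requirements. Third, making a multi-cloud architecture stable and highly available, and scheduling traffic well across clouds, is also a challenge. On top of this, Xiaohongshu has a requirement of its own. Its technology stack must be cloud-independent, meaning it is not tied to any particular cloud and its services run no matter which cloud they are deployed on."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To meet these challenges, Xiaohongshu develops its own technology in addition to using existing open-source projects."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For resource management, Xiaohongshu worked with Huawei, Industrial and Commercial Bank of China (ICBC), and China FAW Group to develop the open-source project Karmada. Karmada is a Kubernetes management system that runs cloud-native applications across multiple Kubernetes clusters and clouds without requiring changes to the applications. For data consistency, Xiaohongshu designs and builds its own architecture at the data storage and caching layers. This work is based on distributed consensus protocols and tailored to different business scenarios."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To make its multi-cloud architecture more stable, Xiaohongshu runs regular failure drills using chaos engineering. The drills ensure that if one data center fails, traffic can quickly switch to others, while less critical services are degraded. Cloud independence means breaking free of reliance on any single cloud provider, so some PaaS capabilities have to be built in-house. Xiaohongshu's KV storage, its control plane, and even its entire microservice architecture are developed in-house."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the management side, Xiaohongshu has set up technical committees of architects from each specialty. Each committee sets technical plans and standards for its domain to raise iteration efficiency and ensure product quality."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Looking ahead, Xiaohongshu's goals are clear. As the business grows, multi-cloud architecture is bound to take center stage. The plan to offer authentic, aspirational, and diverse content has not changed, and neither has the strategy of favoring the distribution of mid- and long-tail UGC. The technical team's main job is to deliver on these goals."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Everything is still in progress,” said Zhang Lei."}]}
]}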
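The article describes SSD (sliding windows over the sequence a user browses) and the ILAD exposure-diversity metric only at a high level. The Python sketch below is a rough, hypothetical illustration of both ideas. It is not Xiaohongshu's SSD implementation, and the published SSD objective may differ. Several choices here are assumptions made for illustration only:

- each candidate's score is its relevance plus `gamma` times the length of its embedding's component orthogonal to the last `window - 1` selected items;
- ILAD is computed with distance = 1 - cosine similarity;
- the names `sliding_window_rerank`, `ilad`, `window`, and `gamma` are invented for this sketch.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical types for illustration: item id -> embedding / relevance score.
Embeddings = Dict[str, Sequence[float]]
Scores = Dict[str, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _residual(v: Sequence[float], basis: List[List[float]]) -> List[float]:
    """Component of v orthogonal to the span of an orthonormal basis."""
    r = list(v)
    for q in basis:
        c = _dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def _orthonormal_basis(vectors: List[Sequence[float]]) -> List[List[float]]:
    """Gram-Schmidt; near-duplicate vectors contribute no new direction."""
    basis: List[List[float]] = []
    for v in vectors:
        r = _residual(v, basis)
        n = math.sqrt(_dot(r, r))
        if n > 1e-12:
            basis.append([x / n for x in r])
    return basis


def _novelty(v: Sequence[float], basis: List[List[float]]) -> float:
    """How much v enlarges the volume spanned by the window (residual norm)."""
    r = _residual(v, basis)
    return math.sqrt(_dot(r, r))


def sliding_window_rerank(scores: Scores, emb: Embeddings,
                          window: int, gamma: float, k: int) -> List[str]:
    """Greedy re-ranking: relevance + gamma * novelty inside a sliding window.

    Assumption (not the production SSD): only the last (window - 1) picks,
    i.e. what the user is currently looking at, count against a candidate.
    """
    selected: List[str] = []
    remaining = list(scores)  # input order makes ties deterministic
    while remaining and len(selected) < k:
        recent = selected[-(window - 1):] if window > 1 else []
        basis = _orthonormal_basis([emb[i] for i in recent])
        best = max(remaining,
                   key=lambda i: scores[i] + gamma * _novelty(emb[i], basis))
        selected.append(best)
        remaining.remove(best)
    return selected


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(_dot(a, a)), math.sqrt(_dot(b, b))
    return _dot(a, b) / (na * nb) if na and nb else 0.0


def ilad(items: List[str], emb: Embeddings) -> float:
    """Intra-list average distance, assuming distance = 1 - cosine similarity."""
    pairs = [(a, b) for n, a in enumerate(items) for b in items[n + 1:]]
    if not pairs:
        return 0.0
    return sum(1.0 - _cosine(emb[a], emb[b]) for a, b in pairs) / len(pairs)
```

Toy example: `a1` and `a2` share the embedding `[1, 0]`, `b` is `[0, 1]`, and the scores are 1.0, 0.9, and 0.8.

- With `gamma=0`, the list is in plain relevance order, `["a1", "a2", "b"]`.
- With `gamma=0.5` and `window=2`, the near-duplicate `a2` is pushed behind `b`, giving `["a1", "b", "a2"]`. The ILAD of the top two items rises from 0.0 to 1.0.
- The ILAD of the whole three-item list is identical for both orders. That mirrors the article's point that set-based measures such as DPP ignore the order in which items are shown.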