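Exclusive Interview with the Commander-in-Chief of Douyin's Spring Festival Gala Interaction: How Did They Deliver in 27 Days? | Top Tech Team Interviews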

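{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Behind every outstanding product is an outstanding team. The “Top Tech Team Interviews” series follows the IT teams of well-known Chinese companies to present their culture, thinking, and experience. In this issue, InfoQ visits the team behind Douyin's Spring Festival Gala red envelope campaign to learn about the technology platform and engineering team that carried 70.3 billion red envelope interactions and a cumulative 1.221 billion views of the Gala livestream."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On September 24, 2020, China Media Group announced Pinduoduo as the exclusive red envelope interaction partner for its 2021 Spring Festival Gala. In early 2021, a series of incidents pushed that company into the center of public controversy. At the same time, the Gala red envelope partnership changed hands for the first time in its history: Douyin took over and became the exclusive red envelope interaction partner for the 2021 Gala, with less than a month left before the broadcast."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"“About 27 days. That was all the time we had to prepare for the Gala red envelopes. We were still taking requirements right up to the 28th day of the twelfth lunar month, and the pandemic was not completely over either. There were too many uncertainties.”"},{"type":"text","text":" So said Gu Xiuming, commander-in-chief of Douyin's Gala red envelope campaign, in an interview with InfoQ."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As is widely known, the history of Gala red envelopes is practically a history of painful outages for internet companies. Facing an audience of more than 1 billion viewers at home and abroad, even the strongest system is on thin ice, and one that survives New Year's Eve may still buckle under the traffic surge when users cash out all at once. On top of that, the engineering team behind Douyin had the shortest preparation window on record. How did Douyin, after such a rushed start, withstand the traffic of 70.3 billion red envelope interactions? What kind of infrastructure architecture was behind it? In this article, InfoQ reconstructs the whole preparation process through Gu Xiuming's account."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Douyin's Gala Red Envelopes: Engineering Teams in Four Cities Working Together"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Shortly after New Year's Eve 2021, Douyin's official results showed 70.3 billion Gala red envelope interactions, a cumulative 1.221 billion views of the Gala livestream, a peak of 4.9846 million concurrent viewers in the livestream room, 81.3 billion total impressions for the new media campaign, 50.6 billion plays of users' “cloud New Year greeting” videos, and 6.2 billion likes on those videos. Volcano Engine, part of ByteDance, provided technical support for Douyin's Gala interaction and helped the campaign run to completion."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Behind these figures was the work of engineers at four Douyin sites: Beijing, Shanghai, Shenzhen, and Hangzhou. Gu Xiuming told InfoQ that they came mainly from Douyin, Volcano Engine, and other shared platform departments. They held a stand-up meeting every morning and otherwise stayed in real-time contact through Feishu and other channels."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Looking back on the whole process, Gu Xiuming believes the “rushed start” created many difficulties, and that the only real answer to them was what the team had built up in its day-to-day work."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"“In fact, when we took over we were still not sure what the specific requirements were. For engineers, fixed requirements with a fixed deadline are far better than uncertain requirements with a fixed deadline.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This was not Douyin's first collaboration with the Gala, but it was the first time it took on the red envelopes as the primary partner. After receiving the news, the Douyin engineering team quickly took stock of its situation:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Uncertain requirements. Douyin's product team learned that it had to prepare for the Gala red envelopes at almost the same time as the engineering team. The requirements themselves were highly variable and needed constant iteration, so a final version could not be settled quickly, and development time was extremely tight."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Complex campaign format. Douyin prepared more than 2 billion yuan in red envelopes in total: 1.2 billion yuan to be handed out on New Year's Eve, and more than 1 billion yuan for apps such as Douyin Huoshan Edition and Douyin Lite. In addition, starting on Little New Year (February 4), users could open Douyin to shoot family portrait videos, take New Year photos, and collect lanterns, completing tasks to grab red envelopes, with a top prize of an 8,888 yuan “lucky koi” red envelope. A longer campaign flow also meant more places where things could go wrong."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":3,"normalizeStart":3},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Douyin is primarily a short-video app and already consumes a large amount of CDN capacity, so even if internal resources were sufficient, clients could still run into trouble because of bandwidth limits. Douyin's daily and monthly active user counts were also already very high: Douyin officially stated in September 2020 that its daily active users had passed 600 million, and the amplifying effect of the Gala red envelopes would push that figure toward its ceiling. It also meant that many users already had the app installed, so everyone opening it at once on New Year's Eve would produce a concentrated burst of instantaneous traffic."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":4,"normalizeStart":4},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Fund security. Whenever large sums of money are being handed out, risk control has to be considered, and this was a very important concern."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with previous Gala red envelope campaigns, the pandemic added further uncertainty this time. “In January, the outbreak in Beijing was not completely over. We were worried that control measures might prevent us from working on site together on New Year's Eve, in which case the thousand-plus engineers at the Beijing site would have to coordinate entirely remotely. That was a risk in itself.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Having worked out where it stood, the engineering team set about solving the problems one by one."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"27 Days to Get the Infrastructure Ready"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When it comes to Gala red envelopes, each of the big companies has stumbled in its own way. In principle the work is simple: prepare the underlying resources, optimize the technical architecture, and carry out thorough load testing and drills. But with an enormous traffic peak and “picky” users, one wrong step can sink the whole effort, and the difficulty rises sharply."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"“The most urgent thing was to pin down the requirements so that the engineering team could get to work.” Given the uncertainty and constant iteration of the requirements, the engineering team ultimately decided to stop accepting new requests after the 28th day of the twelfth lunar month, with all requirements to be completed before that date. “Once we reached consensus on this, the requirements team quickly gave us a first version.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the requirements in hand, the whole team quickly split into four groups: campaign flows, core business, regular business, and underlying resource support. Campaign flows are the flows tied to the series of Gala red envelope activities, such as a user opening Douyin to grab a red envelope. Core business flows are Douyin scenarios such as the feed, video publishing, search, livestreaming, and e-commerce; publishing a video, for example, is a very long flow in which the backend has to handle bandwidth resources, video moderation, and so on. Regular business refers to products other than Douyin, whose resource allocation had to be tracked."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"“In fact, most of our technical capabilities came from what we had built up over time. It is just that this Gala red envelope campaign put them into use at large scale.”"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Online-Offline Colocation Platform"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"“We saw some developers discussing the Gala red envelopes, and the first thing everyone asked was whether there would be enough resources. For us that was actually manageable, mainly thanks to our ability to colocate online and offline workloads.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Douyin uses a fairly pure PaaS deployment model, supported by Volcano Engine's cloud-native team, which makes scaling out and in very convenient. Colocating online and offline workloads makes it easier to realize the benefits of this model. Engineers can see the resource usage of every Douyin service across the whole platform and have tight control over the machines. Suppose a service is allocated 100 machines but its actual CPU utilization is below 10%: the system can automatically and quickly identify the idle machines and assign them to other activities, without a service team even having to file a scale-out request, which also cuts costs substantially."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, when a sudden resource request needs an immediate response, capacity can be supplied quickly. Offline jobs carry priorities, so the resources of medium- and low-priority jobs can be handed first to the services that need them, which is very flexible."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Storage Capabilities"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After years of development, ByteDance has a complete internal portfolio of storage products, including graph storage, object storage, KV storage, and traditional relational databases. This let the Gala red envelope campaign choose a storage approach according to each type of data. The storage products also serve as disaster recovery backups for one another, so if one fails, traffic can be switched quickly to another. Keeping data consistent during such a switch is critical, and it depends on technical capability accumulated over time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ByteDance's own storage scale is also large: its KV storage can reach several billion QPS, and even smaller services can reach more than ten million QPS, so absorbing a very large volume of storage traffic during the Gala red envelope campaign came naturally."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Traffic Ingress Capabilities"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For a short-video app, day-to-day work mostly means distributing videos to a CDN, that is, sending video files to the location nearest the user and letting CDN nodes share the viewing traffic. This approach only suits static resources. Gala red envelopes are triggered at the moments the hosts announce them on air, so they clearly cannot be static; there is computation involved, which calls for a complete traffic ingress and scheduling system. Gu Xiuming said that Douyin uses Volcano Engine's cloud-native capabilities on the client side to schedule traffic and the network, and has strong edge computing capabilities that enable multi-layer scheduling, which addresses the following problems well:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Instantaneous traffic bursts. The moment a user opens the app, many requests are generated. Every service wants to initialize during startup, so the startup phase consumes a lot of bandwidth, and decoupling these requests is costly. If even opening the app is a problem, users will certainly not take part in the red envelope rain that follows. Multi-layer scheduling solves this problem;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":2,"normalizeStart":2},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Multi-layer scheduling also means strong disaster tolerance. Douyin has hundreds of edge data centers, so even if one has a problem, traffic can be rescheduled smoothly. The number of edge data centers also reflects, to some extent, the number of access points, which ensures that users on weak networks still get a good experience and connection quality."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the same time, to keep the system from avalanching, meaning that the failure of one system does not cause a large-scale outage, service governance and traffic isolation also have to be done well."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Round-the-Clock Load Testing and More Than Ten Rehearsals"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"“From about the eighth or ninth day of preparation, our load testing system was running around the clock.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The load testing system was kept running mainly because the requirements kept iterating: whenever a new requirement went live or the technology itself was iterated and optimized, load testing had to be repeated. Gu Xiuming said load testing was carried out by traffic level, by stage, and by scenario. Early on it was mostly segmented, for example testing only storage or only business services; later the team moved to full-link load testing. Many capabilities also had to be verified under load, such as disaster recovery and fund security, so the team needed to simulate fund allocation and disaster recovery under peak traffic."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Internally, Douyin ran more than ten rehearsals of the Gala red envelope campaign, divided mainly into campaign flows (the various activities set up for the Gala red envelopes) and regular business. Because regular business did not iterate much, it was included in load testing from the start, with traffic then raised step by step, for example to 130%, 150%, 200%, and so on. Rehearsals that combined campaign flows with regular business began on February 2, which was also the first rehearsal to fully simulate New Year's Eve."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Lynx, a Dynamic Engine Framework"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lynx is a client-side dynamic engine framework, similar to React Native, that allows dynamic iteration to continue smoothly even after a client release has shipped. This matters a great deal, because client releases follow a cycle and the Douyin Gala red envelope team had only 20-odd days to prepare; keeping to the client release cadence would certainly have been too slow. Lynx played a major role, and it is one of the things Douyin gained from taking part in the 2019 Gala red envelopes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2019, Douyin took part in the Gala red envelope campaign as a secondary partner. That year everything was built natively, which made requirement changes very expensive, so the need for a dynamic engine framework appeared in that year's retrospective document. The team quickly completed project approval, research, and development, and this year the framework was used at large scale for the first time, with good results."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Campaign Retrospective"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Gu Xiuming's view, from the user's side it is enough that there is no noticeable stutter, or that the experience is broadly smooth. For the Douyin Gala red envelope team, however, the question is whether the business met expectations. Taking the red envelope rain as an example, the number of red envelopes sent and the amount actually claimed, the send success rate, and stability are all business metrics, and an anomaly in any one of them counts as an incident."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For a project this significant, Gu Xiuming said, last-minute cramming alone cannot work; the engineering team needs a deep reserve of technical experience. This is widely accepted in the industry, as the saying goes: to hand out red envelopes at the Gala, a product's daily active users must first exceed 100 million. The reason is simple: with too few users, the technology will struggle to support Gala-level high-concurrency traffic. “It is like an exam. If you do not work hard day to day, you cannot get a good result by cramming at the last minute.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second factor is company culture. For an event this large it is hard to find one person who can think through every aspect, and such complex business requires many people working together. But everyone is an independent individual, and it is impossible for thinking to be copied and pasted identically from the top down, so a great deal depends on the team's drive and cohesion. As long as each person does the work they are responsible for as well as it can possibly be done, the Gala red envelope campaign can run smoothly. “Internally we have a very long operations manual that specifies the format in which people fill in information when coordinating. You can see that people considered every possible failure or anomaly, covering for each other rather than passing the buck, and that is extremely important.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"At the interviewee's request, Gu Xiuming is a pseudonym"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Related reading:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/minibook\/FvOLLZMfjLWyPvrZQkq9","title":"xxx","type":null},"content":[{"type":"text","text":"Interviews with China's Top Tech Teams (2021, Q1)"}]}]}]}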