Debut at the BAAI Conference: The Technical Implementation of ByteDance's In-House Simultaneous Interpretation System

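{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Background","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From June 1 to June 3, the 2021 BAAI Conference, hosted by the Beijing Academy of Artificial Intelligence (BAAI), was held at the Zhongguancun National Innovation Demonstration Zone Conference Center in Beijing, with tens of thousands of AI professionals from nearly 80 countries taking part.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The conference ran for three days and featured 13 keynotes and headline dialogues, 29 topical forums led by leading scholars in their fields, and 4 tutorials. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2018 Turing Award winner Yoshua Bengio, 2017 Turing Award winner David Patterson, Peter Dayan, director of the Max Planck Institute for Biological Cybernetics,","attrs":{}},{"type":"text","text":" and other leading figures spoke as guests, exploring innovative applications of AI with the audience.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/49/49cb065f8f565247b9b310c18d2c24a0.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"During the conference, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"「火山同傳」","attrs":{}},{"type":"text","text":", a simultaneous interpretation product from Volctrans (火山翻譯), provided ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"low-latency, high-quality","attrs":{}},{"type":"text","text":" simultaneous-interpretation subtitles for both the venue and the online livestream, helping the conference run smoothly. High-quality simultaneous interpretation hinges on ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"fast, accurate speech recognition and accurate, fluent translation","attrs":{}},{"type":"text","text":": hearing and understanding the speech correctly is the basis for faithfully reproducing the source, while learning more and responding faster is what gives the audience a better translation experience. How is this 'what you hear is what you get' experience achieved technically? ByteDance's engineers explain the technology behind it.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/88/88477b47ac2cd8e21ec1febaf73076a7.jpeg","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"ByteDance's In-House End-to-End Speech Recognition System","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Good simultaneous interpretation starts from accurate speech recognition. A traditional speech recognition system breaks the task into several separately modeled components (a pronunciation lexicon, an acoustic model, a language model, and so on), and building each one requires substantial phonetic and linguistic expertise. An end-to-end system merges these components into a single model that learns the speech-to-text mapping directly from training data: ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the modeling process requires no phonetic or linguistic knowledge, and it shows significant performance gains on large datasets","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The Recurrent Neural Network Transducer (RNN-T)","attrs":{}},{"type":"text","text":" is one of the most widely used end-to-end models in speech recognition today. It can be viewed as an extension of the Connectionist Temporal Classification (CTC) model. RNN-T drops CTC's assumption that per-frame output probabilities are conditionally independent: an additional autoregressive network ties each frame's probability to the preceding inputs and outputs, which gives the model greater modeling capacity. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The speech team at ByteDance AI Lab has experimented extensively with RNN-T. Through continued data accumulation and algorithm optimization, its in-house RNN-T-based end-to-end speech recognition system significantly outperforms CTC models and has been deployed in meeting transcription, simultaneous interpretation, and other products, with positive feedback","attrs":{}},{"type":"text","text":". The team has also made a number of improvements and optimizations for specific business scenarios and requirements, giving the model strong generalization ability.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/82/82626bc4fc5ebf9b7239f5b7e6ee1945.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 1: Comparison of CTC and RNN-T","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Model Robustness","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The RNN-T model is highly robust to accents and complex acoustic conditions","attrs":{}},{"type":"text","text":". Attendees at a meeting typically come from all over the country and speak with a wide range of accents, and their microphones and surroundings differ as well. Accurately recognizing every speaker under these conditions is very challenging. For training, the team collected several hundred thousand hours of data covering speakers of different regions and ages, including far-field, noisy, and accented speech. On top of this, the team used data simulation, such as adding reverberation and noise, to further increase data diversity, and applied SpecAugment data augmentation to the mel-spectrogram features. The team also designed a multi-stage training pipeline tailored to the RNN-T model so that it extracts and absorbs knowledge from the data more efficiently and generalizes better. The final model achieves excellent recognition results across a variety of scenarios.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2b/2bbd44783938bcfb93e104d9a8752954.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 2: Multiple data augmentation methods","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Mixed Chinese-English Speech","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The RNN-T model supports mixed Chinese-English speech: a single recognition model determines which language the speaker is using and transcribes the corresponding content","attrs":{}},{"type":"text","text":". As the first foreign language of most people in China, English is naturally interspersed in everyday conversation. Such mixed-language speech is relatively easy for humans to understand but not for machines. To recognize it accurately, a model needs both language identification and speech recognition capabilities. The team therefore filtered data specifically for this scenario and used speech synthesis to expand the mixed Chinese-English training data. At the model level, the team built separate modeling units for the two languages, so that one model can both classify the language and recognize the speech. As a result, a single model handles mixed Chinese-English speech, and in Chinese-only or English-only settings it performs on par with monolingual speech recognition systems.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Hot-Word Boosting","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The RNN-T model has enhanced recognition of specific terms (hot words)","attrs":{}},{"type":"text","text":". A general-purpose speech recognition system inevitably has bad cases, that is, specific terms it tends to get wrong, such as the names of attendees, companies, and products. Misrecognizing these terms not only hurts readability but also degrades downstream tasks such as machine translation. To address this, the team developed an efficient hot-word solution that relies only on the hot-word text: during decoding, it adjusts the probabilities of the modeling units associated with the hot words, so the system can accurately recognize attendee, company, and product names. The solution is also highly efficient: it takes effect within milliseconds and has no impact on the user experience.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Inference Acceleration","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The RNN-T system delivers lower latency and higher throughput","attrs":{}},{"type":"text","text":". Compared with a traditional CTC model, RNN-T integrates the language model and the acoustic model and optimizes them jointly, which makes it a theoretically more elegant architecture. However, the sharp increase in parameter count and the more complex decoding strategy give RNN-T higher latency and lower throughput than CTC. To address this, the team uses its in-house high-performance inference framework Panther, whose heavily optimized, customized operator implementations speed up network inference by 2x to 3x. The team also developed an efficient RNN-T decoding scheme that reduces time complexity and increases parallelism, making decoding tens of times faster with essentially the same accuracy.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Transformer-Based Neural Machine Translation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Accurate, fluent translation is the other key to high-quality simultaneous interpretation. Transformer-based neural machine translation is now widely used in major commercial machine translation systems. Unlike traditional statistical machine translation, which needs many sub-modules to build different features (a reordering model, a language model, and so on), it is a single end-to-end model that takes source-language text as input and directly outputs the translation. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The Volctrans team built its own neural machine translation framework and has explored and improved machine translation techniques extensively","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f8/f80687b23c75a72299a4990111404ce7.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 3: The Transformer model","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Large-Scale Data Mining and Domain Adaptation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Training neural network models depends on very large bilingual corpora. To improve learning efficiency, Volctrans built a corpus cleaning and mining system that integrates language identification, word alignment, a cross-lingual pretrained semantic matching model, and more. It can filter hundreds of millions of high-quality bilingual sentence pairs out of massive internet data, and the resulting models reach industry-leading quality in domains such as casual chat, news and current affairs, finance, and sports. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"In addition, in the corpus filtering and translation evaluation tasks of the WMT2020 international machine translation competition, systems developed by the Volctrans team took seven first places","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The BAAI Conference also covered a wide range of fields, demanded precise terminology, and brought together speakers of different languages, all of which made machine translation harder. To improve simultaneous interpretation on the conference's AI topics, the Volctrans team used cross-lingual text retrieval to select, at a finer granularity, the sentence pairs in its corpus closest to the conference themes, and then applied domain adaptation to the model. Beyond the industry-standard approach of fine-tuning a domain model, Volctrans ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"used a translation memory module in its production system for the first time: while translating, the model decides dynamically and in real time whether to consult similar sentences and phrases in the translation memory, and fuses them in to produce a more accurate translation","attrs":{}},{"type":"text","text":". This way, the model does not need frequent updates; updating the translation memory gives more efficient and convenient domain customization.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/93/93f9df9a0eebd13ac6100a69ceb64e88.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 4: Schematic of the neural machine translation system with an integrated translation memory module","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Deep Models and Pretraining","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Deeper neural networks have greater representational capacity","attrs":{}},{"type":"text","text":", yet the standard Transformer has only 6 layers. The Volctrans team therefore made the Transformer deeper and wider so that it learns to translate better from large-scale corpora. The team also ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"initializes the model from models such as BERT and GPT that are pretrained on even larger monolingual corpora","attrs":{}},{"type":"text","text":", which both eases the training of large, deep models and speeds up convergence, and transfers the understanding and generation abilities learned from monolingual data to the translation model, improving translation quality.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"LightSeq Inference Acceleration","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Deeper models are more expressive but greatly increase translation latency. Volctrans also has to serve many ByteDance products and businesses, so its online inference engine must respond quickly and handle high concurrency. To meet these needs, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Volctrans released LightSeq, its in-house high-performance sequence inference engine. With heavily optimized CUDA operator fusion and hierarchical decoding, it reaches industry-leading inference speed: about 1.4x faster than comparable engines and more than 10x faster than serving with TensorFlow Serving","attrs":{}},{"type":"text","text":". In addition, through dynamic GPU memory reuse, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"LightSeq can host up to 8 inference models on a single T4 GPU at the same time","attrs":{}},{"type":"text","text":", greatly improving GPU utilization for low-traffic or off-peak workloads.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The strong performance of ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"「火山同傳」","attrs":{}},{"type":"text","text":", the intelligent simultaneous interpretation product from Volctrans, rests on a powerful in-house end-to-end speech recognition system and industry-leading neural machine translation. Other R&D results, such as the high-performance sequence inference engine LightSeq, also help 火山同傳 better serve events of different domains and scales.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To date, 火山同傳 has supported many large online and offline events, including Takashi Murakami's first livestream in China, the 4th CTDC Chief Technology Officer Leadership Summit and Annual Technology Awards Ceremony, and the opening ceremony of the 4th MTPE Competition, bringing audiences a high-quality translation experience. The team behind it, Volctrans, offers a product lineup that includes ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"火山同傳, the 火山翻譯 API, 火山翻譯 Studio, the VolctransGlass AR smart translation glasses, and a browser translation assistant","attrs":{}},{"type":"text","text":", used in office, entertainment, news, and other scenarios, and provides high-quality translation services to more than 100 million users worldwide every day.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As more in-house products are launched and put into use, Volctrans will continue to deepen its technical work in more fields, with the aim of helping more users communicate across languages and contributing to the industry and to society as a whole.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article: ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/Xyc0w88Ha2SIAdylgveUBA","title":"","type":null},"content":[{"type":"text","text":"Debut at the BAAI Conference: The Technical Implementation of ByteDance's In-House Simultaneous Interpretation System","attrs":{}}]}]}]}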
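The hot-word section of the article describes adjusting the probabilities of hot-word modeling units during decoding, using only the hot-word text. The Python sketch below illustrates that general idea and is not the team's implementation. The names `boost_hotwords` and `greedy_decode`, the fixed `bonus` value, the toy vocabulary, and the one-token-per-step greedy decoding (no RNN-T blank symbol, no beam search) are all assumptions made for illustration.

```python
def boost_hotwords(scores, prefix, hotwords, bonus=2.0):
    """Return a copy of `scores` (token -> log-probability) with `bonus`
    added to each token that would start or continue a hot word.

    Hypothetical helper: `prefix` is the list of tokens decoded so far and
    each hot word is a sequence of tokens.
    """
    boosted = dict(scores)
    for word in hotwords:
        # Find the longest prefix of `word` that matches the end of the
        # tokens decoded so far; k == 0 means "start of the hot word".
        for k in range(min(len(prefix), len(word) - 1), -1, -1):
            if k == 0 or tuple(prefix[-k:]) == tuple(word[:k]):
                next_token = word[k]
                if next_token in boosted:
                    boosted[next_token] += bonus
                break
    return boosted


def greedy_decode(steps, hotwords, bonus=2.0):
    """Toy decoder: pick the best-scoring token at each step after boosting."""
    out = []
    for scores in steps:
        boosted = boost_hotwords(scores, out, hotwords, bonus)
        out.append(max(boosted, key=boosted.get))
    return out


# Toy data (assumed): the acoustically likelier tokens are the wrong ones.
hotwords = [("byte", "dance")]
steps = [
    {"bite": -0.5, "byte": -1.0, "dance": -3.0},
    {"dense": -0.6, "dance": -1.2, "byte": -4.0},
]

print(greedy_decode(steps, hotwords, bonus=0.0))  # ['bite', 'dense']
print(greedy_decode(steps, hotwords, bonus=2.0))  # ['byte', 'dance']
```

Because the boost needs only the tokenized hot-word list, the list can be changed without retraining the model, which is consistent with the article's claim that the approach relies only on hot-word text. A production system would more likely use a prefix tree and apply the bonus inside beam search.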