Google Introduces Translatotron 2, a Neural Direct Speech-to-Speech Translation Model Without Deepfake Potential

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Google has been working in the field of "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzU1NDA4NjU2MA==&mid=2247488107&idx=1&sn=a48b079d763455b0db819ddbde299141&chksm=fbe9aba4cc9e22b2a57942fde7c8edf6a7605a31fc89652d6856fb04ff43bbf37be17e41d04b&scene=27#wechat_redirect","title":"xxx","type":null},"content":[{"type":"text","text":"artificial intelligence"}]},{"type":"text","text":" for a long time and has achieved some remarkable results. One of them is Translatotron, the direct speech-to-speech translation system it released in 2019."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Translatotron is an AI system that translates speech directly into another language. It creates a synthesized translation of the original speech that keeps the speaker's original voice characteristics, so the translated speech sounds as if the speakers had said it themselves. But this strength came with a significant flaw: the system could also generate the speech in a different voice, which made it easy to misuse. A comparable example in the image domain is "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/LYNYr4Ke-5LZFwSX5VRd","title":"xxx","type":null},"content":[{"type":"text","text":"deepfakes"}]},{"type":"text","text":", that is, images forged with deep learning."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/88\/c7\/88bb4c3489e704437ca8cfayya11b7c7.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Source: "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2107.08661.pdf","title":"","type":null},"content":[{"type":"text","text":"https:\/\/arxiv.org\/pdf\/2107.08661.pdf"}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The new system: Translatotron 2"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Google now claims to have a solution in Translatotron 2. The new AI system addresses the misuse problem because it is restricted to retaining the source speaker's voice. It also improves "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzU1NDA4NjU2MA==&mid=2247501770&idx=2&sn=b6bfd9f508e41e7edee9a114c017b431&chksm=fbea7e05cc9df7131b4655f5be63851b96923a03ecfe137cd283e7d12bd726a70f8a85ae0aac&scene=27#wechat_redirect","title":"xxx","type":null},"content":[{"type":"text","text":"quality"}]},{"type":"text","text":" by reducing undesired artifacts, such as babbling and long pauses, and makes the speech sound more natural. On top of that, the new system performs better, outperforming the first version by a large margin."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"New components"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In their paper, the researchers describe several new components:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A source speech encoder"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A target phoneme decoder"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A synthesizer, connected through an attention module"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These components work together. The encoder and decoder process the data fed into the system, and the attention module determines how relevant each part of that data is at each step. The system then generates its output from the combined result."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In this process, the encoder creates a numerical representation of the input speech, and the decoder predicts the phonemes of the translated speech. Phonemes are the basic units of sound that distinguish one word from another in a given language. The synthesizer then takes the decoder's output, together with the context produced by the attention module, and synthesizes the translated speech."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/0c\/bd\/0c7f6ddd6ddfb243051f26e18d2d34bd.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Source: "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2107.08661.pdf","title":"","type":null},"content":[{"type":"text","text":"https:\/\/arxiv.org\/pdf\/2107.08661.pdf"}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Limiting the translator's deepfake potential"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To counter the use of deepfake techniques to generate forged speech, the researchers designed the system so that it can only retain the original speaker's voice. To achieve this, they developed a method that does not rely on explicit, given speaker IDs to identify the speaker (the older technique used in Translatotron). Google's researchers therefore claim that Translatotron 2 is better suited to generating translated speech, because it mitigates the potential for misuse."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The researchers also point out that voice conversion has become increasingly popular in recent years. Machine-generated speech has improved to the point where automated verification systems often cannot tell whether a voice is genuinely human or has been manipulated. Systems in this field should therefore be built to prevent misuse in the first place, and Translatotron 2 claims to do exactly that."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As media-generation technology keeps improving, Translatotron 2 is a breakthrough in researchers' efforts to counter deepfakes. If it succeeds, its future impact could be considerable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Paper: "},{"type":"link","attrs":{"href":"https:\/\/arxiv.org\/pdf\/2107.08661.pdf","title":"","type":null},"content":[{"type":"text","text":"https:\/\/arxiv.org\/pdf\/2107.08661.pdf"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Project samples: "},{"type":"link","attrs":{"href":"https:\/\/google-research.github.io\/lingvo-lab\/translatotron2\/","title":"","type":null},"content":[{"type":"text","text":"https:\/\/google-research.github.io\/lingvo-lab\/translatotron2\/"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.marktechpost.com\/2021\/08\/07\/google-ai-introduces-translatotron-2-a-neural-direct-speech-to-speech-translation-model-without-the-deepfake-potential","title":"","type":null},"content":[{"type":"text","text":"https:\/\/www.marktechpost.com\/2021\/08\/07\/google-ai-introduces-translatotron-2-a-neural-direct-speech-to-speech-translation-model-without-the-deepfake-potential"}]}]}]}
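The article describes two steps in the pipeline. First, the attention module weighs how relevant each part of the encoded speech is. Second, the synthesizer consumes the decoder output together with the attention context. The sketch below illustrates that data flow in Python under several assumptions:

- It uses generic scaled dot-product attention. The article does not say which attention variant Translatotron 2 uses.
- It uses tiny made-up vectors instead of real speech features.
- The helper names `attend` and `synthesizer_input` are hypothetical.

It is an illustration of the idea, not Google's implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, encoder_frames: List[Vector]) -> Tuple[List[float], Vector]:
    """Generic scaled dot-product attention (an assumption, not necessarily
    the variant Translatotron 2 uses).

    Returns one relevance weight per encoder frame and the context vector,
    i.e. the weighted sum of the encoder frames.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, frame) * scale for frame in encoder_frames]
    weights = softmax(scores)
    dim = len(encoder_frames[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, encoder_frames))
        for i in range(dim)
    ]
    return weights, context


def synthesizer_input(decoder_state: Vector, context: Vector) -> Vector:
    """Hypothetical: give the synthesizer the decoder output together with the
    attention context, here by simple concatenation."""
    return decoder_state + context


if __name__ == "__main__":
    # Toy "encoded speech": three frames of 2-D features (made-up numbers
    # standing in for the source speech encoder's output).
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Hypothetical decoder state for one phoneme-decoding step, used as the query.
    query = [1.0, 0.0]
    weights, context = attend(query, frames)
    print("attention weights:", [round(w, 3) for w in weights])
    print("synthesizer input:", synthesizer_input(query, context))
```

With this toy query, frames 0 and 2 receive equal weights, larger than frame 1's, and all weights sum to 1. The concatenation in `synthesizer_input` is only a placeholder for however the real synthesizer combines its inputs.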