Decoding AI Multilingual Technology Innovation: Cross-Language Communication Is Becoming a Reality

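{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Voice has become a key entry point for human-computer interaction in the era of connected devices, and it is now indispensable in smart homes, smart cars, and wearables. Behind the convenient voice applications we see every day are advances in speech recognition, semantic understanding, and speech synthesis. As globalization continues, AI multilingual intelligent language technology is being applied ever more widely across industries."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As a leader in the intelligent speech industry, iFLYTEK (科大訊飛) keeps innovating in multilingual intelligent language technology and putting it into practice, to meet the new challenges of a changing market. On July 15, "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/Rm6GoMJzxaN16ZXKHgeZ","title":"xxx","type":null},"content":[{"type":"text","text":"iFLYTEK"}]},{"type":"text","text":" used the Wuhan session of the “iFLYTEK Lexiang A.I. Technology Salon” (訊飛樂享 A.I. 技術沙龍) to give developers a comprehensive account of its research, practice, and exploration in AI+ multilingual intelligent language technology."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Three Major Challenges for Multilingual Language Technology"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f2\/f2252b44dcbf10e141373cd038091a60.webp","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Opening the event, Fang Xin (方昕), vice dean of the iFLYTEK AI Research Institute, gave a keynote titled “Progress and Applications of iFLYTEK's Multilingual Intelligent Language Technology.” In his view, investing in multilingual speech and language technology matters greatly today, whether one looks at pressing real-world use cases, national strategies such as the Belt and Road Initiative, or information security. Yet the technology still faces three major challenges: insufficient linguistic analysis and accumulated expert knowledge for many languages; scarce multilingual training data, which cannot support system development for a large number of languages; and cascaded error propagation across technologies, together with the difficulty of building many systems in bulk."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first challenge is insufficient linguistic analysis and expert knowledge. Multilingual intelligent language technology requires building a system for each language from accumulated knowledge of that language, and because languages differ so much, each must be modeled according to its own characteristics. There are more than a thousand languages in the world, but only a dozen or so have the largest speaker populations. For the remaining low-resource languages, the lack of accumulated linguistic analysis makes system building considerably harder. According to Fang Xin, Arabic is one of the hardest languages iFLYTEK has faced in its multilingual research."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second challenge is scarce multilingual training data, which cannot support system development for a large number of languages. General-purpose speech recognition accuracy now reaches 98%, but that rests on large volumes of training data, for which iFLYTEK has paid tens of millions of yuan. Some low-resource languages, by contrast, have only a few hundred hours of training data. Measured against the data accumulated for Chinese, the difficulty is easy to imagine."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The last challenge is "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/53W9bg9A0pPh6foD-zVG","title":"xxx","type":null},"content":[{"type":"text","text":"cascaded error propagation"}]},{"type":"text","text":" across technologies, together with the difficulty of building many systems in bulk. Take speech translation: the traditional approach first uses speech recognition to produce text, then uses machine translation to render that text in another language. In this pipeline, a single speech recognition error can send the translation far off course. As for building multilingual systems, Fang Xin explained that covering four categories of technology means building both cloud and on-device systems for 70 languages and then applying them to N domains, so the workload and cost behind it are enormous."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"iFLYTEK's Response"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To address these three challenges, iFLYTEK has built an innovation framework for multilingual intelligent language technology at three levels: data, algorithms, and platform. It comprises a multilingual data annotation platform based on human-machine collaboration, a unified end-to-end multilingual modeling framework, unsupervised \/ weakly supervised model training, multi-task joint optimization for speech \/ image translation, and a platform for automatic training and custom optimization of multilingual models."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the data level, besides collecting data itself at high cost, iFLYTEK has partnered with more than twenty mainstream data companies such as Appen (澳鵬) and Datatang (數據堂), and works closely with several leading foreign-language universities, including Beijing Foreign Studies University and Shanghai International Studies University, to ensure data quality."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So far, iFLYTEK has built a first version of an integrated multilingual system covering speech synthesis, speech recognition, image-text recognition, and machine translation for dozens of languages. Its solutions include voice assistants, smart home, AI subtitles, and content moderation, offered both as general-purpose solutions and as customized solutions for key domains."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take the audiovisual translation industry as an example. Against the backdrop of cultural confidence and a booming entertainment sector, the field has split into two new kinds of work: fast-turnaround jobs and high-end precision jobs. Fast-turnaround work means modest requirements on translation quality, short individual manuscripts, a large overall volume, and tight deadlines. The industry's new momentum depends on product solutions that raise overall efficiency and help teams adapt to its rapid pace. AI technology serves this need well: by replacing inefficient manual steps, it frees people to focus on translation quality and raises both capacity and service quality."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Jin Weilong (金煒龍), product manager at Wuhan Yiman Tianxia Technology Co., Ltd. (武漢譯滿天下科技有限公司), described three pain points in audiovisual translation. First, a human translator working on a title has to keep switching between applications to look up words and repeatedly watch the whole video, so translation is slow. Second, when the source video has no original-language subtitles, the translator must first watch it through, transcribe the dialogue by hand, and align the subtitles to the timeline. Third, subtitles have to be extracted from videos in which they are already burned in."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To address these pain points, and given its large user base, Yiman Tianxia built a one-click translation feature on iFLYTEK's machine translation service for fast subtitle translation and production. Using the speech transcription capability of the "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/NyIUBZTeUdr4zOO7RYAU","title":"xxx","type":null},"content":[{"type":"text","text":"iFLYTEK Open Platform"}]},{"type":"text","text":", it completes in one click the transcription and timeline alignment that used to be done by hand. Using the platform's printed-text recognition capability together with its own optimization algorithms, it tackles subtitle extraction and translation."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Next Stop: Going Overseas"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the 22 years since its founding, iFLYTEK has come to host the National Engineering Laboratory for Speech and Language and the State Key Laboratory of Cognitive Intelligence, and its core AI capabilities include speech recognition, voice wake-up, and semantic understanding (NLU). Next, iFLYTEK will put its weight behind overseas expansion."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to data from Baijing (白鯨出海), 7,415 Chinese companies were operating overseas in 2019, with games, social networking, short video and live streaming, phones and hardware, and e-commerce together accounting for more than 80% of them. iFLYTEK's technology is currently used mostly in phones and hardware, and it has many collaborations on phones and wearables with companies that are expanding strongly overseas, such as Huawei and Xiaomi."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6f\/6f7119b065d2d8c4d45863dbe1c0aea8.webp","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zhou Chuanfu (周傳福), general manager of iFLYTEK's AI multilingual business unit, said: “For now, our multilingual deployments still target Chinese companies going overseas. We first go abroad together with domestic companies and put speech technology to work in real applications. The next step is to target customers in the international market.” Since 2019, iFLYTEK's overseas push has measured itself mainly against Google, Amazon, and similar companies, aiming to surpass them in key areas and lead the overseas voice market."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Going overseas has nonetheless brought real difficulties for multilingual deployment: many scenarios, many devices, and a complex language environment. Zhou Chuanfu explained: “There are a great many scenarios: home, office, shopping malls, high-noise environments, and so on. There are also a great many devices, such as phones, in-car systems, smart speakers and large screens, and home appliances. As we go overseas we can keep optimizing quality, but the workload for overseas languages is simply too large. How to plan our coverage across many languages is the problem we face.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/56\/567bdde1d2a15fe26efe630f5ae51aaa.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"iFLYTEK has laid out a systematic plan. It sorts languages into three tiers (key languages, major languages, and other languages) and plans, within the next three years, to support dozens of mainstream languages worldwide as well as multiple Chinese dialects, including Cantonese and Sichuanese."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Its voice assistant solution targets phones, smart speakers, large screens, and other scenarios that need assistant functions, using training on massive data to optimize key verticals and deliver the most natural recognition possible. Verticals here means the functions a phone voice assistant invokes, such as contacts, weather, and audio and video. Building on the same data, the solution also supports mixed-language speech as well as speech recognition."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For AI subtitles, iFLYTEK has launched a short audio and video processing engine aimed at short-form content. It quickly returns results for audio files up to a few minutes long, including timestamps and bilingual Chinese-English subtitles, helping video creators add subtitles."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For content moderation, iFLYTEK's solution ingests text, images, audio, and video and, taking the local environment into account, performs image recognition and speaker identity recognition. Non-compliant content is automatically classified and archived, and content confirmed to fail review is taken down directly. At present the process relies mainly on people, with machines in a supporting role. iFLYTEK will keep refining the solution as it works with partners in live video streaming and content moderation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For its voice cloud, which draws on 11 years of experience, iFLYTEK currently has a deployment in Singapore and will later deploy in Europe. It can provide public cloud services and also meet private deployment requirements."}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Opening Up at the Algorithm Level: iFLYTEK's Ambition to Build an AI Ecosystem"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In recent years iFLYTEK has focused on a “platform + track” strategy, and the iFLYTEK Open Platform carries its ambition to build an AI ecosystem. Sun Lijian (孫力健), product director of the iFLYTEK Open Platform, said: “The iFLYTEK Open Platform takes iFLYTEK's research on speech technology, semantic understanding, and AIUI and opens it to the outside as APIs, and it also opens up scenario-based solutions that we have built up in vertical industries.” Since its launch in 2010, the iFLYTEK Open Platform has gained 3.3 million ecosystem partners, offers 433 AI capabilities and solutions, and connects 3.1 billion devices."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“Much of what we did before was to put iFLYTEK's speech technology, image technology, AIUI, semantic understanding, and translation on the open platform for all developers and companies to use. Now we are also doing more platform-level work. We are willing to open up some of the underlying algorithms and platform components so that algorithm researchers can join in, bring their better work to us, and see it applied and promoted more widely.” Sun Lijian stressed: “Whatever your role and whatever you need, you can contact us, and on our open platform you will find a solution for your current need.”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The iFLYTEK Open Platform has so far launched an end-to-end intelligent recruitment solution for the hiring industry and real-time presentation in multilingual settings for AI virtual digital humans. Next, iFLYTEK will invest in and incubate technology-focused ventures, hoping that ecosystem building will connect top AI capabilities with high-quality partners and move the whole industry a step forward."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Related reading, from the Chengdu session of the “iFLYTEK Lexiang A.I. Technology Salon”: "},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/article\/gcz68kyoTS2h8O4Oql9b","title":"xxx","type":null},"content":[{"type":"text","text":"How can the deployment challenges of multimodal interaction for AI virtual humans be solved? We found answers at the Chengdu stop of the Lexiang A.I. Technology Salon"}]}]}]}]}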