iFLYTEK's Breakthroughs in Foundational AI Technologies Drive Systemic Innovation

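{"type":"doc","content":[
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How iFLYTEK (科大訊飛) pursues breakthroughs in foundational AI technologies and fuses multiple technologies to drive systemic innovation."}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"At this year's 1024 Developer Festival, iFLYTEK senior vice president Hu Guoping (胡國平) said in his talk that systemic innovation has three key elements. The first is the ability to turn major systemic challenges into scientific problems. The second is achieving breakthroughs in the performance of individual core technologies so that they cross the threshold for practical use. The third is deeply integrating the key technologies along the innovation chain to achieve systemic innovation in the true sense."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"He then discussed research on end-to-end modeling, unsupervised training, multimodal fusion, and deep learning frameworks, explaining how these technologies are applied to speech recognition, translation, and other businesses, and what results they have delivered."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The following is an edited version of Hu Guoping's talk:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/05\/05b3ff31cbecf474acdf9c1f864fe61b.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"From a technical standpoint, in this era of artificial intelligence and within the deep learning framework, iFLYTEK has developed a range of AI technologies and applications, including speech recognition, speech synthesis, machine translation, and handwriting recognition. The growth of these technologies, of course, depends on strong breakthroughs in foundational technologies to nourish them."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Over the past few years, iFLYTEK has continued to invest in four foundational, low-level AI technologies. The first is "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"end-to-end modeling"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", which addresses the information loss caused by modeling in separate stages. The second is "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"unsupervised training"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", which aims to achieve better results with less supervised data. The third is "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"multimodal fusion"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", which makes full use of multi-dimensional, multi-source information. The fourth is "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"the integration of external knowledge"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":", that is, how to build human common sense and knowledge into algorithmic models."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#1c7231","name":"user"}},{"type":"strong"}],"text":"1. End-to-End Modeling"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Within the deep learning framework, end-to-end modeling effectively mitigates both the information loss caused by staged modeling and the cascading propagation of errors."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/db\/dbcc520195cf3e3e38b2017a7453296f.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"First, iFLYTEK applied end-to-end modeling to speech recognition in complex scenarios, building a speech recognition system that unifies the front end and the back end. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"The approach feeds the multi-channel signals from the front-end microphone array synchronously into the back-end acoustic module, so that the acoustic module receives fuller information and can model it at a finer granularity, which significantly improves recognition in complex scenarios. More importantly, under this unified modeling framework, the decoding results of the back-end acoustic module, such as endpoints and text, can in turn guide beamforming in the front-end microphone array, further improving noise reduction. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Take voice interaction in an extremely noisy setting such as a shopping mall: iFLYTEK's speech recognition performance rose from a practically unusable 35% to 88%, and the wake-up success rate rose from 40% to 90%, marking the first successful application of speech recognition in such extremely complex scenarios."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In addition, end-to-end modeling has been successfully applied to speech translation. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"iFLYTEK developed its CATT speech translation technology, which automatically recognizes speech input in one language and translates it into text output in another, modeling speech recognition and machine translation in one unified framework "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"and reducing the impact of speech recognition errors on translation quality. More importantly, this unified framework makes it easy to introduce a translation latency loss function, striking an effective balance between translation quality and latency and delivering a better speech translation experience."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Building on these innovations, iFLYTEK took first place in all three tracks of this year's IWSLT international spoken language machine translation competition. End-to-end modeling has also effectively supported continued progress in iFLYTEK's speech synthesis, essay scoring, speaker diarization, and related technologies."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#1c7231","name":"user"}},{"type":"strong"}],"text":"2. Unsupervised Training"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"This area also covers a series of innovations by the iFLYTEK team in weakly supervised and semi-supervised learning. As is widely known, for AI to advance it must break its dependence on large-scale supervised training data, a dependence that has become a key bottleneck in the progress of AI."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"On this foundational front of unsupervised training, iFLYTEK has achieved key breakthroughs in two respects. "},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"The first is automatically mining more compact feature representations from massive amounts of raw data without supervision. The second is making full use of other weak labels, so that weakly supervised data can better drive model optimization and training."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/af\/af773a2389a20f2747d13ecd1773f0a7.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In speech synthesis, to reduce the size of the recording corpus required, the R&D team proposed a perception-based quantized encoding method (聽感量化編碼). It draws on speech recognition data and on other speakers' synthesis corpora to train a mixed multi-speaker model. In other words, only a small amount of data from a target speaker is needed to achieve high-quality speech synthesis for that speaker."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"This year, iFLYTEK proposed a fully attribute-controllable speech synthesis method. It learns three attributes, spoken content, emotion, and timbre, from massive speech data without supervision, and uses information-constrained training to decouple the three attributes from one another, which allows attributes such as timbre and emotion to be controlled freely. For example, adjusting the timbre space gives free control over how sweet a voice sounds. Emotion can be adjusted in the same way, switching freely between sadness and happiness."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f6\/f65456c5f5dd18ed5e26b59e913baa24.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Unsupervised training has likewise been applied to speech recognition. For low-resource languages, supervised speech training data is very hard to obtain, whereas text data is relatively easy to come by. To make full use of massive text data, Hu Guoping's team proposed a semi-supervised speech recognition technique based on a unified representation space for speech and text."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"First, a Follow CPC speech synthesis system, which offers good noise robustness and a richer range of styles, was used to generate a large amount of simulated speech, which was combined with real speech training data to build the speech recognition model."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Second, a masked language model (MLM) task is trained jointly with the speech recognition task, with the decoder module shared between speech and text so that both are represented in the same feature space. This makes full use of massive unlabeled text data. In the end, 100 hours of supervised data plus a large amount of unlabeled text can match the performance of 10,000 hours of supervised speech data. This is a key technical breakthrough and a brand-new advance."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Another important application of unsupervised training is weakly supervised sentence-level semantic representation, which is critical for building cognitive intelligence tasks. Here iFLYTEK uses large-scale weakly supervised data to tackle the problem."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"First, the team mined a large number of related question-answer pairs from the internet and used syntactic analysis, knowledge enhancement, and back-translation models to construct hard training samples, giving the model better semantic discrimination."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Then Hu Guoping's team proposed a deep interactive contrastive learning framework that greatly deepens semantic understanding, making it possible to distinguish sentences that look very similar on the surface but mean the opposite, such as “I like you” and “I don't like you.” On a classification task built from such easily confused sentences, accuracy rose from 65% to 85%."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#1c7231","name":"user"}},{"type":"strong"}],"text":"3. Multimodal Fusion"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Human-computer interaction is a typical application scenario for multimodal fusion. In iFLYTEK's multimodal wake-word-free interaction system, information from two dimensions, speech and lip detection, is used to control false wake-ups. Semantic understanding and gaze detection are then used to tell whether the user is interacting with the machine or chatting with someone else. As a result, the false wake-up rate is kept below 0.01%, and the interaction response success rate rose from 88% to 93%. The system also moves from wake-word-dependent voice interaction to more natural wake-word-free interaction. In that sense, iFLYTEK's research redefines how humans and machines interact."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f7\/f7c9959fcd85626edee8a2c8e8428e8e.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Another example of multimodal fusion is structuring complex documents. An exam paper, for instance, contains all kinds of questions, tables, and figures, as well as students' handwritten answers. iFLYTEK's OCR technology, including handwritten mathematical formula recognition, has already made progress, but automatically deriving the semantic structure of such complex documents and layouts is essential for applications such as intelligent grading."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Here too iFLYTEK relies on multimodal information fusion, using not only the semantic information in the questions but also a variety of layout features, such as visual features like font and size and spatial features like indentation and centering. This has greatly improved document structuring accuracy across different scenarios. In education, for example, the accuracy of semantic structuring for supplementary study materials and homework rose from 92% to 98%. iFLYTEK has opened up these technologies to give developers better technical support for intelligent document processing in their own industries."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#1c7231","name":"user"}},{"type":"strong"}],"text":"4. Integrating External Knowledge into Existing Deep Learning Frameworks"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Most deep learning models today are trained on supervised data or on large amounts of unsupervised data. From the perspective of an intelligent system, however, human knowledge is a very important source of information, and iFLYTEK has made two key technical breakthroughs in this area."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The first is in voice interaction, where human common sense and knowledge are distilled into an event logic graph (事理圖譜) and built into the whole interaction system, enabling the machine to initiate interaction with people. In interaction with children, for example, a robot can reason over the event logic graph to keep a child company for longer and in more engaging ways."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The second successful case of external knowledge integration is a diagnostic reasoning system that incorporates massive medical literature knowledge together with medical records. Traditional diagnostic reasoning relies on deep semantic parsing and modeling of electronic medical records. Drawing on the vast body of existing medical literature to improve the accuracy of automated diagnosis is also critical. iFLYTEK has structured a wide range of medical literature into a medical knowledge graph and encodes the graph with a graph neural network. A deep, real-time reasoning network can then perform interactive attention learning over two views, knowledge and medical records, and combine them to produce the final diagnosis and treatment result."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Breakthroughs in individual technologies are a prerequisite for putting AI into practice, but a practical AI system is usually a complex system. Moving from innovation in individual technologies to complex systems that deeply integrate them requires overcoming three system-level technical challenges:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"First, the ability to decompose a technical architecture around a global objective;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Second, the ability to innovate by fusing multiple technologies across the entire chain;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Third, the capacity for self-evolution in complex human-machine coupled systems."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Hu Guoping used three concrete cases of building complex systems to present iFLYTEK's key progress and breakthroughs in multi-technology fusion."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The first case, a low-latency simultaneous speech interpretation system that fuses multiple technologies, illustrates architecture design around a global objective. Simultaneous interpretation faces one key technical challenge and global constraint: latency. As noted above, iFLYTEK has already achieved model-based, end-to-end automatic translation from speech to text. Simultaneous interpretation additionally requires converting a traditional sentence-level speech synthesis system into a streaming one, so that text fragments can be synthesized as they arrive in real time. It also requires timbre transfer from a single utterance, so that the synthesized voice preserves the original speaker's timbre and delivers a better interpretation experience."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f8\/f8f2ac1be6a49ba4ee295d6056b2eafd.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Meanwhile, to improve speech recognition and translation in settings such as conference talks, iFLYTEK also runs OCR on all the text in the presentation slides, technical terms in particular, and feeds it into the speech recognition system for real-time optimization. Only in this way can the system deliver low-latency simultaneous interpretation while preserving translation quality as far as possible. The average latency of iFLYTEK's latest translation system has dropped from 8 seconds to 4 seconds."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The second case is a multimodal virtual human interaction system. Virtual human interaction requires integrating many individual AI technologies, including speech recognition, dialogue understanding, dialogue generation, speech synthesis, and virtual human avatar generation."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"These technologies have to be connected comprehensively, end to end, to produce a more consistent and harmonious virtual human interaction system. In the emotional dimension, for example, the system needs emotion perception based on multimodal information, emotion-aware generation of reply text, and facial expressions and synthesized speech for the virtual human that convey the corresponding emotion. Only with global, systematic planning and design, and effective coordination among the individual technologies along the whole chain, is it possible to create a multimodal virtual human with emotion and personality."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f3\/f3613aa7d5304caff78cba66be179883.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"In fact, every system iFLYTEK builds is meant to be used by people, so each is a complex human-machine coupled system. With people as users, especially expert users in fields such as healthcare and education, choosing when humans and machines interact and designing the interaction interface are key areas of innovation for such systems. Equally, using the information left behind during human-machine interaction to give the system the capacity for self-evolution lies at the core of systemic innovation."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Take the Zhiyi Assistant (智醫助理) system developed by iFLYTEK. During diagnosis by primary care doctors, it directly provides core functions such as diagnostic suggestions, rational medication guidance, and follow-up questions to ask the patient, helping those doctors deliver better care. When the on-site doctor's diagnosis and the machine's diagnosis disagree, the system also forwards the medical record to a higher-level hospital for further diagnosis. The system continuously collects feedback from primary care doctors and expert doctors throughout the interaction and uses it to evolve in real time. The AI system and the doctors thus inspire and complement each other and improve together, achieving the continuous evolution of a complex human-machine coupled system."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#1c7231","name":"user"}},{"type":"strong"}],"text":"[Recommended Event]"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"The above is the content of Hu Guoping's talk. If you are interested in iFLYTEK's technology, look out for the talk by Wang Baoxin (王寶鑫), research lead at iFLYTEK Research, at the AICon AI conference in Beijing on November 26, where he will present technical practice in NLP for automatic Chinese text correction. Experts from Huawei, Microsoft, and Baidu will also speak on NLP-related topics. See the conference website: "},{"type":"text","text":"https:\/\/aicon.infoq.cn\/2021\/beijing\/schedule"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c7\/c7854a545a6f42d7e67650eb108c332b.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}
]}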