A Tour of Speech Recognition Technology: Stepping into the World of Speech Recognition

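{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Following in the footsteps of the greats, here comes Xiao Wang. Hi everyone, I'm Xiao Wang, your ever-curious senior classmate. Today I'll take you on a tour of speech recognition, one of the hottest technologies right now. It's easy to follow and packed with useful material, so be sure to read to the end!","attrs":{}}]},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" When you hear “speech recognition”, you may think of intelligent voice assistants: Apple's “Siri”, Huawei's “小E”, OPPO's “小歐”, Xiaomi's “小愛同學”. You have probably used at least one of them. There are also the currently popular smart speakers “小度小度” and 天貓精靈 (Tmall Genie), WeChat's voice-to-text feature, smart home appliances, and in-vehicle human-machine interaction systems. All of these rely on speech recognition technology.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/25/2543cc08a40e6fd5bb17f59bfc91f73d.png","alt":null,"title":"Application scenarios","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Most of the computers we use day to day run Microsoft Windows, whose voice assistant Cortana (小娜) is widely known. So what exactly is speech recognition technology?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. What Is Speech Recognition Technology?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Speech recognition is the technology that converts what a person says into text. It is also known as ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"automatic speech recognition","attrs":{}},{"type":"text","text":" (ASR). Put simply, it lets you communicate with a machine so that the machine understands what you are saying. In a broader sense, speech recognition covers every technique involved from the moment a person produces speech until the computer understands what was said.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" In more technical terms, it is the technology by which a machine converts a speech signal into the corresponding text or command through a process of recognition and understanding.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 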
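At this point some readers may ask how ","attrs":{}},{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"speech recognition and natural language processing","attrs":{}},{"type":"text","text":" (NLP) differ. Speech recognition can be regarded as a fairly foundational branch of natural language processing: in many cases you first have to let the machine know what you are saying before it can go on to understand it and respond appropriately. Other branches include machine translation, search, summarization, and question answering. In one sentence, speech recognition is one part, one branch, of natural language processing.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" OK, let's continue the tour. Now that we know the basic concept of speech recognition, let's take a brief look at its history.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. A History of Speech Recognition","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Ever since the computer was born (","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the 1950s","attrs":{}},{"type":"text","text":"), speech recognition has been a technology people have dreamed of. In older science fiction films, people gave computers instructions by voice. In the American film 2001: A Space Odyssey, released in 1968, the spaceship's onboard computer HAL 9000 communicates with the crew by voice. In the American TV series Star Trek, on air since 1966, the protagonists only need to ask the computer aloud to get data about the planet they are about to explore. Ever since the computer was invented, people have firmly believed that the era of driving computers by voice would eventually arrive.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Research on speech recognition formally began in ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the 1960s","attrs":{}},{"type":"text","text":", when researchers tried to extract rules relating speech spectrograms to phonemes. At the 1970 World Expo in Osaka, a prototype typewriter that worked from spectrograms was exhibited.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" In ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the 1970s","attrs":{}},{"type":"text","text":", researchers developed the dynamic programming (DP) matching method. It stretches and compresses the features of the input speech and of a sample utterance along the time axis so that the two can be matched. Building on this technique, people substantially improved the recognition speed for short sentences containing a small number of words.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" 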
Since the 1990s","attrs":{}},{"type":"text","text":", speech recognition based on statistical methods has been mainstream, and dictation software for ordinary users appeared on the market, converting input speech into text output.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3. How Speech Recognition Works","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Since the 1980s, speech recognition has adopted the basic framework of pattern recognition, which consists of five steps: data preparation, signal processing, feature extraction, model training, and testing and application. To make this easier to follow, I drew a flowchart, shown below:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f2/f2a0bb2378ae6c88efa6310847aee82e.jpeg","alt":null,"title":"Speech recognition processing flow","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" This diagram is meant to help you understand the overall speech recognition processing flow:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":" Step 1: Audio Signal Acquisition","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" First we need to acquire the speech signal, in other words record it: the microphone or voice capture module built into a phone, computer, or other electronic device stores the sound.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":" Step 2: Audio Signal Processing","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" As you probably know, sound is a wave. Common audio formats such as mp3 and wma are compressed and must be converted to an uncompressed, pure waveform file for processing, for example a Windows PCM file, commonly known as a wav file. Apart from a file header, a wav file stores the individual sample points of the sound waveform. The figure below shows an example waveform:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/98/98d8724fc59efbb43770176cb407dffb.png","alt":null,"title":"Sound waveform","style":[{"key":"width","value":"25%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
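Signal processing consists of two parts: noise reduction and preprocessing. The audio we capture contains a good deal of noise and useless frequency bands, so we first remove noise with a noise-reduction method such as spectral subtraction, keeping the useful speech signal. A simple before-and-after denoising comparison is shown below:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c3/c31056677cc849c588abbd7f8f2f920d.png","alt":null,"title":"Before denoising","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d6/d61d47dbfbe0762e9296d52cc914dee0.png","alt":null,"title":"After denoising","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Next, we apply pre-emphasis and other ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"preprocessing techniques","attrs":{}},{"type":"text","text":" to make the features of the speech we want to recognize more prominent. Preprocessing also includes framing and windowing, and endpoint detection; one of its aims is to remove the DC offset component and some low-frequency noise from the signal. For now it is enough to understand that all of this makes it possible to extract feature parameters more accurately in the next step. In the next article I will explain these technical terms in detail.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":" Step 3: Feature Extraction","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Feature extraction is the method and process by which a computer extracts the characteristic information in an audio signal. For example, suppose I say “我喜歡你” (wo xihuan ni, “I like you”). During recognition, the utterance is turned into an encoded form and split into units such as syllables and phonemes. To recognize the syllable wo, picking out w and o from the audio waveform is roughly what feature extraction amounts to.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/65/654ca4e73cb539f0a9115e97d9923ee9.jpeg","alt":null,"title":"Continuous speech recognition framework","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":" Step 4: Classification and Recognition","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" 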
Classification and recognition","attrs":{}},{"type":"text","text":" here means that speech recognition systems are categorized according to the constraints placed on the input speech.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" In terms of how the system relates to the speaker, recognition systems fall into three categories:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" (1) Speaker-dependent systems: recognize only the speech of one specific person;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" (2) Speaker-independent systems: recognition does not depend on who is speaking; such systems usually have to be trained on a large speech database from many different people;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" (3) Multi-speaker systems: can usually recognize the speech of a group of people, and are also called group-specific speech recognition systems; they only need to be trained on the speech of the group to be recognized.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Speech recognition techniques fall into three main categories","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The first is ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"template (model) matching methods","attrs":{}},{"type":"text","text":", including vector quantization (VQ) and dynamic time warping (DTW);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second is ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"probabilistic and statistical methods","attrs":{}},{"type":"text","text":", including Gaussian mixture models (GMM) and hidden Markov models (HMM);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The third is ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"discriminative classifier methods","attrs":{}},{"type":"text","text":", such as support vector machines (SVM), artificial neural networks (ANN), and deep neural networks (DNN), as well as various combined methods.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
When it comes to classification methods, there are traditional models such as HMM as well as today's popular machine learning and deep learning algorithms such as SVM. If you are interested in the algorithms, you can look them up yourself or leave me a comment, and I will walk you through the material in an easy-to-follow way!","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3a/3a0420a8218ac1865b4a8bc832eb3b78.png","alt":null,"title":"Speech encoding and decoding","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Finally, to sum up: speech recognition is essentially a process of encoding followed by decoding, where signal processing and feature extraction make up the encoding stage. In other words, it is pattern recognition based on speech feature parameters: through learning, the system classifies input speech according to certain patterns and then uses a decision criterion to find the best match.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  ","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. Major Online Development Platforms for Speech Recognition","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. iFLYTEK Voice (科大訊飛語音)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Baidu Speech (百度語音)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Microsoft Speech API","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Google Speech API","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. IBM ViaVoice","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6. Nuance NVP","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5. Learning Resources for Speech Recognition","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}},{"type":"strong","attrs":{}}],"text":"Books","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"《圖解語音識別》","attrs":{}},{"type":"text","text":" by 荒木雅弘 (author); translated by 陳舒揚 and 楊文剛","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This book is very beginner-friendly and quite basic; its illustrated format makes it easy to get started.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"《解析深度學習:語音識別實踐》","attrs":{}},{"type":"text","text":", by 俞棟 and 鄧力.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is one of the better tutorials available in Chinese. The content is very up to date and a large share of it covers deep learning; recommended for readers who enjoy algorithms.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"《Spoken Language Processing: A Guide to Theory, Algorithm and System 
Development》","attrs":{}},{"type":"text","text":", by 黃學東 et al.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This book is essentially a compendium of traditional ASR methods, with substantial coverage of both theory and engineering practice.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}},{"type":"strong","attrs":{}}],"text":"Courses","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"Readers with time and energy to spare","attrs":{}},{"type":"text","text":" can work through the following courses:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http://tts.speech.cs.cmu.edu/courses/11492/schedule.html","title":null,"type":null},"content":[{"type":"text","text":"http://tts.speech.cs.cmu.edu/courses/11492/schedule.html","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Speech Processing. This CMU course mainly covers three areas: ASR (Automatic Speech Recognition), TTS (Text To Speech), and SDS (Spoken Dialog Systems).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http://www.cs.cmu.edu/~awb/","title":null,"type":null},"content":[{"type":"text","text":"http://www.cs.cmu.edu/~awb/","attrs":{}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#555666","name":"user"}}],"text":" 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The homepage of a Scottish computer scientist and speech processing expert, with many tutorials on speech and NLP.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http://www.inf.ed.ac.uk/teaching/courses/asr/index.html","title":null,"type":null},"content":[{"type":"text","text":"http://www.inf.ed.ac.uk/teaching/courses/asr/index.html","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Automatic Speech Recognition. This course has been running since at least 2012 and is updated every year.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"About the Author","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"王凱 (Wang Kai) is a master's student in computer science with two years of experience learning and developing audio and video software. His main focus is audio and speech recognition, and he has research and practical experience in NLP, deep learning, neural networks, mathematical modeling, and audio/video codec technology.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]}]}