Next-Generation Kaldi Technical Details Revealed: k2 Is the Core Component

On November 15, 2020, the 5th Kaldi Technical Exchange Conference was held in Beijing. It was organized by Beijing Shell Shell Technology Co., Ltd., the Speech Dialogue and Auditory Processing Technical Committee of the China Computer Federation (CCF), and the AISHELL Foundation, and co-organized by Xiaomi, Duke Kunshan University, the Audio, Speech and Language Processing Group (ASLP) of Northwestern Polytechnical University, and the University of Science and Technology of China.

Because of the COVID-19 pandemic, the event combined a full-day online livestream with an in-person technical session in the afternoon at the Xiaomi Technology Park in Beijing. Notably, Dr. Daniel Povey, the creator of Kaldi, attended the in-person session for the first time and talked at length with developers from major Beijing internet companies and leading universities about the future of the next-generation Kaldi community.

In the year since he joined Xiaomi, Dr. Povey has designed and developed the next generation of Kaldi. It is split into three parts: the core algorithms, training data preparation, and a collection of example scripts.

Lhotse (the training data preparation part) will replace all of the data-preparation work in the original Kaldi, handling the metadata for many kinds of audio and text data. Lhotse is not limited to Kaldi and can be used in other applications as well. It is written in pure Python, which makes it convenient and easy to use.

Icefall (the example scripts part) will replace Kaldi's collection of example scripts and become a separate subproject. The example scripts are kept apart from the core algorithms because they are likely to grow very large and to change frequently.

The core of the next-generation Kaldi is called "k2". k2 makes it easy for developers to implement a variety of speech recognition algorithms in PyTorch/TensorFlow, such as CTC, LF-MMI, RNN-T, and second-pass language models, and it removes the mismatch between training and decoding that affected earlier speech recognition approaches.

k2 also makes it straightforward to implement multi-pass decoding in which confidence increases with each pass, something that was previously hard to do. Compared with some other speech recognition libraries, k2's advantages are that it is faster and more general: it can be used to model many different speech recognition algorithms.

At the event, Dr. Povey said that the core code of k2 is complete. It amounts to roughly 41,000 lines, mostly C++, and version 0.1 was released that same week.

Dr. Povey is currently Chief Speech Scientist at Xiaomi Group. Kaldi, which he developed and maintains, integrates many speech recognition models and is widely regarded as a cornerstone of speech recognition frameworks in industry. At the in-person session he emphasized: "Today many people build their businesses on Kaldi, and many people have been contributing to the Kaldi community all along. Kaldi will always remain open source."

Cui Baoqiu, Vice President of Xiaomi Group and Chairman of its Technical Committee, also attended. He noted that embracing open source is an important part of Xiaomi's engineering culture, adding: "We should work together to achieve a 'four-way win': a win for the Kaldi project and Daniel, a win for Xiaomi's speech team, a win for the global Kaldi community, and a win for every startup connected to Kaldi."

Finally, Dr. Povey thanked everyone for their contributions to the community. He welcomed more engineers from China and around the world to contribute code and to help advance Kaldi and the global speech industry together.