滴滴单通道语音分离与目标说话人提取和抑制技术进展

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"语音分离(Speech Separation),就是在一个有多个说话人同时说话的场景里,把不同说话人的声音分离出来。目标说话人提取(Target Speaker Extraction)则是根据给定的目标说话人信息,把混合语音当中属于目标说话人的声音抽取出来。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下图汇总了目前主流的语音分离和说话人提取技术在两个不同的数据集上的性能,一个是 WSJ0-2mix 纯净数据集,只有两个说话人同时说话,没有噪声和混响。WHAM是与之相对应的含噪数据集。可以看到,对于纯净数据集,近两年单通道分离技术在 SI-SDRi 指标上有明显的进步,图中已PSM方法为界,PSM之前的方法都是基于频域的语音分离技术,而PSM之后的绝大多数(除了deep CASA)都是基于时域的语音分离方法。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e6\/6e\/e6b8ab0d50f612f3c57b6a46fa41d06e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"噪声场景相对更贴近于真实的环境。目前,对于噪声场景下的分离技术性能的研究还不是特别完备,我们看到有一些在安静环境下表现比较好的方法,在噪声环境下性能下降比较明显,大多存在几个 dB 的落差。同时,与纯净数据集相比,噪声集合下各种方法的性能统计也不是很完备。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章