Choosing and Using Open-Source TTS (Text To Speech)

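TTS stands for Text To Speech. It is the part of human-computer dialogue that lets a machine speak.

TTS is one application of speech synthesis: it converts the contents of a file, or on-screen text such as application menus or web pages, into natural-sounding speech.

TTS not only helps people with visual impairments read information on a computer, it also makes text documents easier to consume.

I. Popular open-source TTS projects

The information below comes from: TTS open source project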

MARY
-- Text-to-Speech System
MARY is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It supports German, British and American English, Telugu, Turkish, and Russian.
SpeakRight Framework
-- Helps to build Speech Recognition Applications
SpeakRight is a Java framework for writing speech recognition applications in VoiceXML. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework. Although VoiceXML uses a web architecture similar to HTML's, the needs of a speech app are very different. SpeakRight lives in the application code layer, typically in a servlet. The SpeakRight runtime dynamically generates VoiceXML pages, one per HTTP request.
Festival
-- Speech Synthesis System
Festival offers a general framework for building speech synthesis systems and includes examples of various modules. It offers full text-to-speech through several APIs, including the shell and a Scheme command interpreter. It has native support for Apple OS. It supports English and Spanish.
FreeTTS
-- Speech Synthesizer in Java
FreeTTS is a speech synthesis system written entirely in Java. It is based upon Flite, a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University. FreeTTS supports a subset of the JSAPI 1.0 Java speech synthesis specification.
Festvox
-- Builds New Synthetic Voices
The Festvox project aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice. Festvox is the base for most of the speech synthesis libraries.
Kaldi
-- Speech Recognition Toolkit
Kaldi is a Speech recognition research toolkit. It is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend.
eSpeak
-- Text to Speech
eSpeak is a compact open source software speech synthesizer for English and other languages. eSpeak uses a formant synthesis method. This allows many languages to be provided in a small size. A SAPI5 version is available for Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. It can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Flite
-- Fast Run time Synthesis Engine
Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools.

 

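II. Choosing a project

Our requirements call for a C/C++ project, which leaves three main candidates:

(1) Festival

It offers a general framework for building speech synthesis systems and includes examples of various modules. It provides a complete text-to-speech API. It has native support for Apple OS and supports English and Spanish.

(2) eSpeak

A compact open-source speech synthesizer for English and many other languages. It uses formant synthesis, which keeps the data for each language small. A SAPI5 version is available for Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. It can translate text into phoneme codes, so it could serve as a front end for another speech synthesis engine.

(3) Flite

Festival-lite is a small, fast run-time synthesis engine developed at CMU, designed primarily for small embedded machines and/or large servers. It is an alternative to Festival for voices built with the FestVox suite of voice-building tools.

The sections below walk through using these projects. Environment: VMware Workstation + Lubuntu 16.04.2, 32-bit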

III. Using open-source TTS projects (1): eSpeak

1. Download

espeak.sourceforge.net

espeak depends on PortAudio for playback, so download that as well:

http://www.portaudio.com/download.html

2. Build

Build eSpeak:

cd src
make
make install

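Build PortAudio:

http://portaudio.com/docs/v19-doxydocs/compile_linux.html

The built library ends up in the lib/.libs/ directory. From the PortAudio source directory, create a symbolic link to it, using an absolute path so that the link still resolves from /usr/lib:

ln -s "$(pwd)/lib/.libs/libportaudio.so.2.0.0" /usr/lib/libportaudio.so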

3. Usage

espeak "hello world" -w hello.wav
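Beyond the basic call above, a few other command-line options are often useful. The following is a minimal sketch, assuming espeak is on the PATH; the voice name, speed value, and file names are illustrative choices rather than anything the original setup requires.

```shell
#!/bin/sh
# Sketch: write WAV files with a few common espeak options.
# Assumes espeak is installed; otherwise the script just reports that and stops.
if command -v espeak >/dev/null 2>&1; then
    # Default voice, output written to a WAV file instead of the sound card.
    espeak "hello world" -w hello.wav

    # -v selects a voice, -s sets the speed in words per minute.
    espeak -v en-us -s 140 "hello world" -w hello_slow.wav

    # -f reads the text to speak from a file.
    printf 'hello from a file\n' > input.txt
    espeak -f input.txt -w from_file.wav
else
    echo "espeak not found; skipping" >&2
fi
```

Since -w writes straight to a file, these calls can be tried even on a machine where audio playback is not working yet.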

4. Problems and solutions

(1) Build problem

g++  -o speak speak.o compiledict.o dictionary.o intonation.o readclause.o setlengths.o numbers.o synth_mbrola.o synthdata.o synthesize.o translate.o mbrowrap.o tr_languages.o voices.o wavegen.o phonemelist.o klatt.o sonic.o -lstdc++ -lportaudio -lpthread 
wavegen.o: In function ‘WavegenOpenSound() [clone .part.2]’:
wavegen.cpp:(.text+0x23a): undefined reference to ‘Pa_StreamActive’
wavegen.o: In function ‘WavegenCloseSound()’:
wavegen.cpp:(.text+0x552): undefined reference to ‘Pa_StreamActive’
collect2: error: ld returned 1 exit status
Makefile:105: recipe for target 'speak' failed
make: *** [speak] Error 1

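(1) Solution

Pa_StreamActive belongs to the PortAudio v18 API, and the portaudio.h shipped in eSpeak's src directory targets v18 by default, whereas the PortAudio library being linked is v19. Replace the header with the v19 version and rebuild:

cp portaudio19.h portaudio.h
make clean
make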

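5. Example application

#include "./speak_lib.h"  // eSpeak header
#include <stdio.h>
#include <string.h>

// Link against libespeak when building, e.g. with -lespeak.
int main(void)
{
    // Mandarin tongue twister: "eat grapes without spitting out the skins"
    char word[] = "吃葡萄不吐葡萄皮";

    // Returns the sample rate in Hz, or -1 on failure.
    if (espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, NULL, 0) < 0) {
        fprintf(stderr, "espeak_Initialize failed\n");
        return 1;
    }
    espeak_SetVoiceByName("zh+f2");  // Mandarin voice, female variant 2
    espeak_Synth(word, strlen(word) + 1, 0, POS_CHARACTER, 0,
                 espeakCHARS_UTF8, NULL, NULL);
    espeak_Synchronize();            // block until playback has finished
    espeak_Terminate();
    return 0;
}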

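To save the synthesized speech as a WAV file from code instead of playing it, you need to implement a synthesis callback (registered with espeak_SetSynthCallback) that receives the audio samples and writes them out.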

 

IV. Using open-source TTS projects (2): Flite

1. Download

http://www.speech.cs.cmu.edu/flite/index.html

2. Build

sudo su
./configure
make
make install

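3. Usage

flite -t hello
    Speaks the word "hello"
flite "hello world."
    Speaks "hello world" (an argument containing a space is treated as text)
flite hello
    Speaks the contents of the file "hello"
flite -f "hello world"
    Speaks the contents of the file named "hello world"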
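flite can also write its output to a WAV file with -o instead of playing it. The following is a minimal sketch, assuming flite is on the PATH; the file names are illustrative.

```shell
#!/bin/sh
# Sketch: have flite write WAV files rather than open an audio device.
# Assumes flite is installed; otherwise the script just reports that and stops.
if command -v flite >/dev/null 2>&1; then
    # -t gives the text explicitly, -o names the output WAV file.
    flite -t "hello world" -o hello.wav

    # -f reads the text to speak from a file.
    printf 'hello from a file\n' > hello.txt
    flite -f hello.txt -o hello_file.wav

    ls -l hello.wav hello_file.wav
else
    echo "flite not found; skipping" >&2
fi
```

Writing to a file is a convenient way to confirm that synthesis itself works, independently of whether the machine can play sound.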

4. Problems

(1) Problem

root@lubuntu:# flite "hello world"
oss_audio: failed to open audio device /dev/dsp

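(1) Solution

Running ls /dev/dsp shows that the device node does not exist. A search reveals that flite uses the OSS framework for audio playback.

root@lubuntu:# cat /proc/asound/version 
Advanced Linux Sound Architecture Driver Version k4.8.0-36-generic.

This shows that the system is using the ALSA audio driver framework. Attempts:

Method 1:
        Install padsp, which forwards OSS requests to PulseAudio (which in turn outputs through ALSA)
        apt install pulseaudio-utils
        padsp flite "hello world"
        Failed!

Method 2:
        sudo apt-get install pulseaudio
        sudo apt-get install libpulse-dev
        sudo apt-get install osspd
        Partly successful: /dev/dsp now exists, but flite still reports "failed to open"!

In the end, the fix was simply to connect the sound card device in VMware: the error went away and audio played normally.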
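A quick look at what the kernel reports can narrow down this kind of problem before installing anything. The sketch below is an assumption-laden helper rather than part of flite or ALSA: check_audio is a hypothetical function name, and it only inspects /dev/dsp and the /proc/asound files.

```shell
#!/bin/sh
# Sketch: report which audio interfaces are visible on this machine.
# check_audio is a hypothetical helper name used for illustration only.
check_audio() {
    # OSS applications such as flite's OSS backend open /dev/dsp.
    if [ -e /dev/dsp ]; then
        echo "OSS device: /dev/dsp present"
    else
        echo "OSS device: /dev/dsp missing"
    fi

    # ALSA exposes its driver version under /proc/asound.
    if [ -r /proc/asound/version ]; then
        echo "ALSA: $(cat /proc/asound/version)"
    else
        echo "ALSA: not detected"
    fi

    # With no sound card registered, the cards file says "no soundcards".
    if [ -r /proc/asound/cards ] && ! grep -q 'no soundcards' /proc/asound/cards; then
        echo "Sound cards: at least one registered"
    else
        echo "Sound cards: none registered (in a VM, check that the virtual sound card is connected)"
    fi
}

check_audio
```

If the last line reports no sound cards, no amount of OSS or PulseAudio configuration inside the guest will produce sound until a card is attached.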
