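Choosing and Using Open-Source TTS (Text To Speech)

TTS stands for Text To Speech. It is one part of human-machine dialogue: it lets a machine speak.

TTS is one application of speech synthesis. It converts text, such as file contents, application menus, or web pages, into natural-sounding speech output.

TTS not only helps visually impaired people read information on a computer, it also makes text documents more accessible to everyone.

I. Popular open-source TTS projects

The following information comes from: TTS open source project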

MARY -- Text-to-Speech System	MARY is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It supports German, British and American English, Telugu, Turkish, and Russian.
SpeakRight Framework -- Helps to build Speech Recognition Applications	SpeakRight is a Java framework for writing speech recognition applications in VoiceXML. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework. Although VoiceXML uses a web architecture similar to HTML's, the needs of a speech app are very different. SpeakRight lives in the application code layer, typically in a servlet. The SpeakRight runtime dynamically generates VoiceXML pages, one per HTTP request.
Festival -- Speech Synthesis System	Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. It offers full text to speech through several APIs: from the shell and through a Scheme command interpreter. It has native support for Apple OS. It supports English and Spanish languages.
FreeTTS -- Speech Synthesizer in Java	FreeTTS is a speech synthesis system written entirely in Java. It is based upon Flite, a small run-time speech synthesis engine developed at Carnegie Mellon University. Flite is derived from the Festival Speech Synthesis System from the University of Edinburgh and the FestVox project from Carnegie Mellon University. FreeTTS supports a subset of the JSAPI 1.0 Java speech synthesis specification.
Festvox -- Builds New Synthetic Voices	The Festvox project aims to make the building of new synthetic voices more systematic and better documented, making it possible for anyone to build a new voice. Festvox is the base for most of the Speech Synthesis libraries.
Kaldi -- Speech Recognition Toolkit	Kaldi is a Speech recognition research toolkit. It is similar in aims and scope to HTK. The goal is to have modern and flexible code, written in C++, that is easy to modify and extend.
eSpeak -- Text to Speech	eSpeak is a compact open source software speech synthesizer for English and other languages. eSpeak uses a formant synthesis method. This allows many languages to be provided in a small size. It supports SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface. It can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Flite -- Fast Run time Synthesis Engine	Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools.

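II. Choosing a project

Our requirements call for a C/C++ project, which leaves three main candidates:

(1) Festival

It provides a general framework for building speech synthesis systems and includes examples of various modules. It offers a complete text-to-speech API. It has native support for Apple OS and supports English and Spanish.

(2) eSpeak

It is an open-source speech synthesizer that supports English and many other languages. It uses formant synthesis, which is why the data for many languages fits in a small size. It provides a SAPI5 version for Windows, so it can also be used with screen readers and other programs that support the Windows SAPI5 interface. It can translate text into phoneme codes, so it can serve as a front end for another speech synthesis engine.

(3) Flite

Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU, designed mainly for small embedded machines or large servers. It is an alternative synthesis engine to Festival for voices built with the FestVox suite of voice-building tools.

The following sections describe how to use these projects. Environment: VMware Workstation + Lubuntu 16.04.2 (32-bit)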

III. Using open-source TTS projects (1): eSpeak

1. Download

espeak.sourceforge.net

espeak relies on portaudio for playback, so download that as well:

http://www.portaudio.com/download.html

2. Build

Building eSpeak:

cd src
make

make install

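Building portaudio:

http://portaudio.com/docs/v19-doxydocs/compile_linux.html

The build places the library under lib/.libs/. Create a symbolic link to it from the portaudio source directory, using an absolute path so that the link resolves from /usr/lib:

ln -s "$(pwd)/lib/.libs/libportaudio.so.2.0.0" /usr/lib/libportaudio.so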

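3. Usage

espeak "hello world" -w hello.wav

The -w option writes the speech to hello.wav instead of playing it.

4. Problems and solutions

(1) Build problem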

g++  -o speak speak.o compiledict.o dictionary.o intonation.o readclause.o setlengths.o numbers.o synth_mbrola.o synthdata.o synthesize.o translate.o mbrowrap.o tr_languages.o voices.o wavegen.o phonemelist.o klatt.o sonic.o -lstdc++ -lportaudio -lpthread 
wavegen.o: In function `WavegenOpenSound() [clone .part.2]':
wavegen.cpp:(.text+0x23a): undefined reference to `Pa_StreamActive'
wavegen.o: In function `WavegenCloseSound()':
wavegen.cpp:(.text+0x552): undefined reference to `Pa_StreamActive'
collect2: error: ld returned 1 exit status
Makefile:105: recipe for target 'speak' failed
make: *** [speak] Error 1

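(1) Solution

Pa_StreamActive is a PortAudio v18 name, so the default portaudio.h in the eSpeak src directory does not match the v19 library built above. Switch to the bundled v19 header and rebuild: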

cp portaudio19.h portaudio.h
make clean
make

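5. Example application

#include "./speak_lib.h"  // eSpeak header
#include <stdio.h>
#include <string.h>

int main(void)
{
    char word[] = "吃葡萄不吐葡萄皮";  // a Chinese tongue twister, UTF-8 encoded

    // espeak_Initialize returns the sample rate in Hz, or -1 on error.
    if (espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, NULL, 0) < 0) {
        fprintf(stderr, "espeak_Initialize failed\n");
        return 1;
    }
    espeak_SetVoiceByName("zh+f2");  // Mandarin voice, female variant 2
    // The size argument is the text length in bytes, including the final NUL.
    espeak_Synth(word, strlen(word) + 1, 0, POS_CHARACTER, 0,
                    espeakCHARS_UTF8, NULL, NULL);
    espeak_Synchronize();  // block until playback has finished
    espeak_Terminate();
    return 0;
}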

To save the synthesized speech as a WAV file instead of playing it, you need to implement a synthesis callback.
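As a rough sketch of what such a callback could feed into, the helper below writes 16-bit mono PCM samples to a WAV file. It is not eSpeak's own code: WavWriter, wav_open, wav_append, and wav_close are hypothetical names introduced here for illustration. The assumption is that eSpeak is initialized in a mode that hands audio back to the program (for example AUDIO_OUTPUT_SYNCHRONOUS), that a function registered with espeak_SetSynthCallback passes each sample buffer to wav_append, and that the sample rate is the value returned by espeak_Initialize.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: tracks the output file and how much audio was written. */
typedef struct {
    FILE *fp;
    uint32_t sample_rate;
    uint32_t data_bytes;
} WavWriter;

/* Write the low nbytes of value in little-endian order, as WAV requires. */
static void put_le(FILE *fp, uint32_t value, int nbytes) {
    for (int i = 0; i < nbytes; i++)
        fputc((int)((value >> (8 * i)) & 0xFF), fp);
}

/* Emit the 44-byte RIFF/WAVE header for 16-bit mono PCM. */
static void write_header(WavWriter *w) {
    fwrite("RIFF", 1, 4, w->fp);
    put_le(w->fp, 36 + w->data_bytes, 4);   /* file size minus 8 */
    fwrite("WAVE", 1, 4, w->fp);
    fwrite("fmt ", 1, 4, w->fp);
    put_le(w->fp, 16, 4);                   /* fmt chunk size */
    put_le(w->fp, 1, 2);                    /* format 1 = PCM */
    put_le(w->fp, 1, 2);                    /* channels: mono */
    put_le(w->fp, w->sample_rate, 4);
    put_le(w->fp, w->sample_rate * 2, 4);   /* bytes per second */
    put_le(w->fp, 2, 2);                    /* bytes per sample frame */
    put_le(w->fp, 16, 2);                   /* bits per sample */
    fwrite("data", 1, 4, w->fp);
    put_le(w->fp, w->data_bytes, 4);
}

/* Open the file and write a provisional header (sizes are fixed up on close). */
int wav_open(WavWriter *w, const char *path, uint32_t sample_rate) {
    w->fp = fopen(path, "wb");
    if (w->fp == NULL) return -1;
    w->sample_rate = sample_rate;
    w->data_bytes = 0;
    write_header(w);
    return ferror(w->fp) ? -1 : 0;
}

/* Append one buffer of samples, e.g. the buffer a synthesis callback receives. */
int wav_append(WavWriter *w, const short *samples, int numsamples) {
    for (int i = 0; i < numsamples; i++)
        put_le(w->fp, (uint16_t)samples[i], 2);
    w->data_bytes += (uint32_t)numsamples * 2;
    return ferror(w->fp) ? -1 : 0;
}

/* Rewrite the header with the final sizes, then close the file. */
int wav_close(WavWriter *w) {
    if (fseek(w->fp, 0, SEEK_SET) != 0) { fclose(w->fp); return -1; }
    write_header(w);
    int rc = fclose(w->fp);
    w->fp = NULL;
    return rc == 0 ? 0 : -1;
}

/* Hypothetical glue for eSpeak (not compiled here; it needs speak_lib.h):
 *
 *   static WavWriter g_wav;
 *   static int on_samples(short *wav, int numsamples, espeak_EVENT *events) {
 *       if (wav != NULL) wav_append(&g_wav, wav, numsamples);
 *       return 0;   // 0 tells eSpeak to continue synthesis
 *   }
 */
```

With this in place, the example program above would call wav_open before espeak_Synth and wav_close after synthesis completes, rather than waiting for playback.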

IV. Using open-source TTS projects (2): Flite

1. Download

http://www.speech.cs.cmu.edu/flite/index.html

2. Build

sudo su
./configure
make
make install

3. Usage

flite -t hello

    Speaks the text "hello"

flite "hello world."

    Speaks "hello world"

flite hello

    Speaks the contents of the file "hello"

flite -f "hello world"

    Speaks the contents of the file "hello world"

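4. Problems

(1) Problem

root@lubuntu:# flite "hello world"

oss_audio: failed to open audio device /dev/dsp

(1) Solution

Running ls /dev/dsp shows that the device node does not exist. A search revealed that flite uses the OSS framework for audio playback.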

root@lubuntu:# cat /proc/asound/version 
Advanced Linux Sound Architecture Driver Version k4.8.0-36-generic.

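This shows that the system uses the ALSA audio driver framework. Two attempts followed:

Approach 1:
        Install padsp, which redirects OSS requests to PulseAudio (which in turn plays through ALSA)
        apt install pulseaudio-utils
        padsp flite
        Failed!

Approach 2:
        sudo apt-get install pulseaudio
        sudo apt-get install libpulse-dev
        sudo apt-get install osspd
        Partly successful: /dev/dsp now exists, but flite still reports "failed to open"!

In the end it turned out that once the sound card device was connected in VMware, the error went away and audio played normally! ( *¯ㅿ¯*)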
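The manual checks above (looking for /dev/dsp and for /proc/asound/version) can be sketched as a small C helper. This is only an illustration under stated assumptions: AudioHint and audio_hint are hypothetical names, and the two paths are passed in as parameters so the function can be tried against any files. As the experience above shows, the presence of /dev/dsp does not guarantee that the device can be opened, so the result is a hint rather than a diagnosis.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical classification of what the filesystem suggests about audio. */
typedef enum {
    AUDIO_NONE,       /* neither path exists: no sound device visible */
    AUDIO_OSS,        /* an OSS-style device node such as /dev/dsp exists */
    AUDIO_ALSA_ONLY   /* ALSA is present but there is no OSS device node */
} AudioHint;

/* Check for an OSS device node first, then for ALSA's proc entry. */
AudioHint audio_hint(const char *oss_path, const char *alsa_path) {
    if (access(oss_path, F_OK) == 0)
        return AUDIO_OSS;
    if (access(alsa_path, F_OK) == 0)
        return AUDIO_ALSA_ONLY;
    return AUDIO_NONE;
}
```

On the system described here, audio_hint("/dev/dsp", "/proc/asound/version") would initially report AUDIO_ALSA_ONLY, which matches the "failed to open audio device /dev/dsp" error from flite.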

junixwu
