Android TTS: how TextToSpeech works at the source level, and how to build a custom TTS engine

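TextToSpeech is the text-to-speech service, a native API provided by the Android system. The stock TTS engine app detects the system language and lets the user download the voice data for that language, after which it can speak text in that language. All of this, however, assumes a Google services environment. Google services are disabled on Android devices used in mainland China, and what those devices mainly need is Chinese speech output. So how can that be achieved?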

TextToSpeech source code analysis

For how to browse the system source, see my earlier article, "How to view the Android system source code": https://blog.csdn.net/caizehui/article/details/103823057
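First, I like to start with the class comment. It says that TextToSpeech can synthesize text into speech for immediate playback or write it to an audio file, and that an instance can be used only after its initialization has completed. Completion is reported through the TextToSpeech.OnInitListener callback. When you are done with a TextToSpeech instance, remember to call shutdown() to release the native resources held by the engine.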

/**
 *
 * Synthesizes speech from text for immediate playback or to create a sound file.
 * <p>A TextToSpeech instance can only be used to synthesize text once it has completed its
 * initialization. Implement the {@link TextToSpeech.OnInitListener} to be
 * notified of the completion of the initialization.<br>
 * When you are done using the TextToSpeech instance, call the {@link #shutdown()} method
 * to release the native resources used by the TextToSpeech engine.
 */
public class TextToSpeech {

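Next, look at the initialization callback interface. Initialization has succeeded only when the status argument of onInit is SUCCESS. Everything else, such as setting parameters or calling the playback methods, must happen after that point; calls made earlier have no effect.
Google's commenting style is worth learning from here: the Javadoc lists every value the parameter can take, which makes the contract very clear.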

 /**
     * Interface definition of a callback to be invoked indicating the completion of the
     * TextToSpeech engine initialization.
     */
    public interface OnInitListener {
        /**
         * Called to signal the completion of the TextToSpeech engine initialization.
         *
         * @param status {@link TextToSpeech#SUCCESS} or {@link TextToSpeech#ERROR}.
         */
        void onInit(int status);
    }
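
The "nothing works before onInit reports SUCCESS" rule can be modeled outside Android with a small gate that holds work back until initialization succeeds. This is only a sketch under my own assumptions: InitGate and whenReady are hypothetical names that do not exist in the framework, and the two constants simply mirror the values of TextToSpeech.SUCCESS and TextToSpeech.ERROR.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical helper: buffers actions until an init callback reports success. */
final class InitGate {
    static final int SUCCESS = 0;  // same value as TextToSpeech.SUCCESS
    static final int ERROR = -1;   // same value as TextToSpeech.ERROR

    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean ready;
    private boolean failed;

    /** Runs the action immediately once initialized, otherwise keeps it for later. */
    synchronized void whenReady(Runnable action) {
        if (failed) {
            return;                // no engine to run against, so drop the request
        }
        if (ready) {
            action.run();
        } else {
            pending.add(action);
        }
    }

    /** Intended to be called from OnInitListener.onInit(status). */
    synchronized void onInit(int status) {
        if (status == SUCCESS) {
            ready = true;
            Runnable next;
            while ((next = pending.poll()) != null) {
                next.run();        // replay everything that arrived too early
            }
        } else {
            failed = true;
            pending.clear();
        }
    }
}
```

In an app, the speak call would be wrapped in gate.whenReady(...) and the activity's onInit would forward its status to gate.onInit(status), so a tap that arrives before initialization is replayed instead of being lost.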

Before going further, here is a fragment of a demo program that uses TextToSpeech.

TextToSpeech usage example

    // ... (code omitted) ...
        textToSpeech = new TextToSpeech(this, this); // arguments: Context, TextToSpeech.OnInitListener
    }

    /**
     * Called when the TextToSpeech engine has finished initializing.
     * status is SUCCESS or ERROR.
     * setLanguage selects the language.
     */
    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            int result = textToSpeech.setLanguage(Locale.CHINA);
            if (result == TextToSpeech.LANG_MISSING_DATA
                    || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Toast.makeText(this, "Language data missing or language not supported",
                        Toast.LENGTH_SHORT).show();
            }
        }
    }

    @Override
    public void onClick(View v) {
        if (textToSpeech != null && !textToSpeech.isSpeaking()) {
            textToSpeech.setPitch(1.0f); // set the pitch; 1.0f is the normal pitch
            // The text is Chinese because the language was set to Locale.CHINA.
            // This three-argument overload is deprecated since API level 21.
            textToSpeech.speak("我是要播放的文字",
                    TextToSpeech.QUEUE_FLUSH, null);
        }
    }

    @Override
    protected void onStop() {
        super.onStop();
        if (textToSpeech != null) {
            textToSpeech.stop();     // stop speaking
            textToSpeech.shutdown(); // shut down and release resources
        }
    }

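With this demo in hand we have a basic picture of how TextToSpeech is used, so the source analysis below follows the demo's call sequence.
The first step is creating the TextToSpeech object, so we look at its constructors. There are three, but only the first two are visible to us as application developers; the last one is used internally by the system. The first two differ in that the former uses the system's default TTS engine, while the latter lets you name a specific TTS engine by passing its package name as the String engine argument.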

    public TextToSpeech(Context context, OnInitListener listener) {
        this(context, listener, null);
    }

    public TextToSpeech(Context context, OnInitListener listener, String engine) {
        this(context, listener, engine, null, true);
    }

    public TextToSpeech(Context context, OnInitListener listener, String engine,
            String packageName, boolean useFallback) {
        mContext = context;
        mInitListener = listener;
        mRequestedEngine = engine;
        mUseFallback = useFallback;

        mEarcons = new HashMap<String, Uri>();
        mUtterances = new HashMap<CharSequence, Uri>();
        mUtteranceProgressListener = null;

        mEnginesHelper = new TtsEngines(mContext);
        initTts();
    }

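The constructors offered to us simply delegate; the real work is done by the internal constructor, and the important call in it is initTts.
initTts is a key method in TextToSpeech: it shows how the system chooses a TTS engine and connects to it. The code is fairly long, but it has to be quoted here.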

private int initTts() {
        // Step 1: Try connecting to the engine that was requested.
        if (mRequestedEngine != null) {
            if (mEnginesHelper.isEngineInstalled(mRequestedEngine)) {
                if (connectToEngine(mRequestedEngine)) {
                    mCurrentEngine = mRequestedEngine;
                    return SUCCESS;
                } else if (!mUseFallback) {
                    mCurrentEngine = null;
                    dispatchOnInit(ERROR);
                    return ERROR;
                }
            } else if (!mUseFallback) {
                Log.i(TAG, "Requested engine not installed: " + mRequestedEngine);
                mCurrentEngine = null;
                dispatchOnInit(ERROR);
                return ERROR;
            }
        }

        // Step 2: Try connecting to the user's default engine.
        final String defaultEngine = getDefaultEngine();
        if (defaultEngine != null && !defaultEngine.equals(mRequestedEngine)) {
            if (connectToEngine(defaultEngine)) {
                mCurrentEngine = defaultEngine;
                return SUCCESS;
            }
        }

        // Step 3: Try connecting to the highest ranked engine in the
        // system.
        final String highestRanked = mEnginesHelper.getHighestRankedEngineName();
        if (highestRanked != null && !highestRanked.equals(mRequestedEngine) &&
                !highestRanked.equals(defaultEngine)) {
            if (connectToEngine(highestRanked)) {
                mCurrentEngine = highestRanked;
                return SUCCESS;
            }
        }

        // NOTE: The API currently does not allow the caller to query whether
        // they are actually connected to any engine. This might fail for various
        // reasons like if the user disables all her TTS engines.

        mCurrentEngine = null;
        dispatchOnInit(ERROR);
        return ERROR;
    }

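Reading this code, the comments split it into three steps:
Step 1: Try connecting to the engine that was requested.
Step 2: Try connecting to the user's default engine.
Step 3: Try connecting to the highest ranked engine in the system.
That is: (1) try to connect to the requested engine; (2) try to connect to the user's default engine; (3) try to connect to the highest ranked engine.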
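
The fallback order can also be modeled outside Android as a small plain-Java sketch. Everything in it is hypothetical scaffolding of my own: the EngineSelector class, the isInstalled and canConnect predicates that stand in for isEngineInstalled and connectToEngine, and the way the default and highest ranked engine names are passed in. Only the order of the three steps and the useFallback behaviour follow the initTts listing.

```java
import java.util.function.Predicate;

/** Plain-Java model of the three-step lookup in initTts(); not Android code. */
final class EngineSelector {
    private EngineSelector() {}

    /**
     * Returns the package name of the first engine that can be connected, or
     * null when every step fails (the case where initTts() dispatches ERROR).
     */
    static String select(String requested, String defaultEngine, String highestRanked,
            boolean useFallback, Predicate<String> isInstalled,
            Predicate<String> canConnect) {
        // Step 1: the engine the caller asked for.
        if (requested != null) {
            if (isInstalled.test(requested) && canConnect.test(requested)) {
                return requested;
            }
            if (!useFallback) {
                return null;       // caller refused any substitute engine
            }
        }
        // Step 2: the user's default engine, unless it was already tried.
        if (defaultEngine != null && !defaultEngine.equals(requested)
                && canConnect.test(defaultEngine)) {
            return defaultEngine;
        }
        // Step 3: the highest ranked engine, unless it was already tried.
        if (highestRanked != null && !highestRanked.equals(requested)
                && !highestRanked.equals(defaultEngine)
                && canConnect.test(highestRanked)) {
            return highestRanked;
        }
        return null;
    }
}
```

The equals checks in steps 2 and 3 mirror the original: an engine that already failed in an earlier step is not tried a second time.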
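So which engine is the "requested" one? Looking back at the second and third TextToSpeech constructors, both accept a String engine argument. If that argument is not null, the system looks for that TTS engine and tries to connect to it.
The default engine is obtained through getDefaultEngine; the implementation shown below lives in the TtsEngines helper (mEnginesHelper).
As the comment explains, "default" here corresponds to the choice in system settings: when several engines are available, the one the user picked is the default engine. On a typical Chinese-market phone, for example, there may be the engine bundled with the system, the vendor's own engine (Xiaomi, Huawei and so on), and engines the user installed, such as the iFlytek voice input method, all of which can speak text; the user can set any of them as the default. On a stock system the only engine, and therefore the default, is the one named "com.svox.pico".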

   /**
     * @return the default TTS engine. If the user has set a default, and the engine
     *         is available on the device, the default is returned. Otherwise,
     *         the highest ranked engine is returned as per {@link EngineInfoComparator}.
     */
    public String getDefaultEngine() {
        String engine = getString(mContext.getContentResolver(),
                Settings.Secure.TTS_DEFAULT_SYNTH);
        return isEngineInstalled(engine) ? engine : getHighestRankedEngineName();
    }

Finally, step three connects to the highest ranked engine.
getHighestRankedEngineName in turn calls getEngines:

/**
     * Gets a list of all installed TTS engines.
     *
     * @return A list of engine info objects. The list can be empty, but never {@code null}.
     */
    @UnsupportedAppUsage
    public List<EngineInfo> getEngines() {
        PackageManager pm = mContext.getPackageManager();
        Intent intent = new Intent(Engine.INTENT_ACTION_TTS_SERVICE);
        List<ResolveInfo> resolveInfos =
                pm.queryIntentServices(intent, PackageManager.MATCH_DEFAULT_ONLY);
        if (resolveInfos == null) return Collections.emptyList();

        List<EngineInfo> engines = new ArrayList<EngineInfo>(resolveInfos.size());

        for (ResolveInfo resolveInfo : resolveInfos) {
            EngineInfo engine = getEngineInfo(resolveInfo, pm);
            if (engine != null) {
                engines.add(engine);
            }
        }
        Collections.sort(engines, EngineInfoComparator.INSTANCE);

        return engines;
    }

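Clearly, the system uses PackageManager to find every installed application whose service declares an intent filter matching Intent(Engine.INTENT_ACTION_TTS_SERVICE); only a TTS engine would declare this action.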
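
The listing ends by sorting the result with EngineInfoComparator, and that ordering is what makes one engine "highest ranked". The comparator itself is not quoted in this article, so the sketch below rests on an assumption: that system engines sort ahead of third-party ones and that, within each group, a higher priority wins. Check this against the TtsEngines source of your platform version. EngineEntry and EngineRanking are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the fields of EngineInfo that matter for ranking. */
final class EngineEntry {
    final String name;      // package name of the engine
    final boolean system;   // true if the engine ships in the system image
    final int priority;     // larger value means preferred

    EngineEntry(String name, boolean system, int priority) {
        this.name = name;
        this.system = system;
        this.priority = priority;
    }
}

final class EngineRanking {
    private EngineRanking() {}

    /** Assumed rule: system engines first, then descending priority. */
    static final Comparator<EngineEntry> BY_RANK = (lhs, rhs) -> {
        if (lhs.system != rhs.system) {
            return lhs.system ? -1 : 1;
        }
        return Integer.compare(rhs.priority, lhs.priority);
    };

    /** Returns the name of the best ranked engine, or null for an empty list. */
    static String highestRanked(List<EngineEntry> engines) {
        if (engines.isEmpty()) {
            return null;
        }
        List<EngineEntry> sorted = new ArrayList<>(engines); // leave the input untouched
        sorted.sort(BY_RANK);
        return sorted.get(0).name;
    }
}
```

Under this assumed rule a preinstalled engine always outranks a user-installed one, which fits the later point that a custom engine is easiest to ship as part of the system image.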
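After these three steps, as long as a TTS engine is installed on the system, a connectable engine will have been found and its name (the engine string) is known. TextToSpeech then binds to that service, again using Intent(Engine.INTENT_ACTION_TTS_SERVICE); this is ordinary service-binding code.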

private boolean connectToEngine(String engine) {
        Connection connection = new Connection();
        Intent intent = new Intent(Engine.INTENT_ACTION_TTS_SERVICE);
        intent.setPackage(engine);
        boolean bound = mContext.bindService(intent, connection, Context.BIND_AUTO_CREATE);
        if (!bound) {
            Log.e(TAG, "Failed to bind to " + engine);
            return false;
        } else {
            Log.i(TAG, "Sucessfully bound to " + engine);
            mConnectingServiceConnection = connection;
            return true;
        }
    }

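Here Connection is declared as private class Connection implements ServiceConnection: it implements the standard ServiceConnection interface and also implements some AIDL callback methods. It contains an inner class, SetupConnectionAsyncTask, which does a fair amount of work; this async task is what calls back the onInit method from our TextToSpeech demo, through dispatchOnInit(result). If the connection is lost, the user is called back with dispatchOnInit(ERROR). If bindService reports a successful connection, onServiceConnected executes mService = ITextToSpeechService.Stub.asInterface(service); this mService is the Binder interface of the TTS engine, and the engine's actual methods are called through it.
At this point, if the connection succeeded, the methods TextToSpeech offers, such as speak and stop, can be used normally.
After all this, the service being connected to is simply TextToSpeechService. It is part of the Android system source, and it is also the Service that the stock TTS engine extends.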

Analysis of the stock TTS engine

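We know that TextToSpeech gets its TTS capability by binding to TextToSpeechService. How is the TTS engine tied to it?
In the system source, /external/svox/pico/compat/src/com/android/tts/compat/CompatTtsService.java shows that this class extends the framework service: public abstract class CompatTtsService extends TextToSpeechService. It implements part of the interface itself, and this abstract class is in turn extended by the actual engine service: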
/external/svox/pico/src/com/svox/pico/PicoService.java
public class PicoService extends CompatTtsService
The interface work is actually done in CompatTtsService.
It holds private SynthProxy mNativeSynth = null; the SynthProxy class implements getLanguage, isLanguageAvailable, setLanguage, speak, stop, shutdown and so on, so SynthProxy is the next layer of the implementation.

/**
 * The SpeechSynthesis class provides a high-level api to create and play
 * synthesized speech. This class is used internally to talk to a native
 * TTS library that implements the interface defined in
 * frameworks/base/include/tts/TtsEngine.h
 *
 */
public class SynthProxy {

    static {
        System.loadLibrary("ttscompat");
    }

The comment and the static initializer show that the final implementation is in JNI, provided by the ttscompat shared library (.so).

 public int speak(SynthesisRequest request, SynthesisCallback callback) {
        return native_speak(mJniData, request.getText(), callback);
    }
    public void shutdown() {
        native_shutdown(mJniData);
        mJniData = 0;
    }

With that, the working principle of TextToSpeech and each of its modules have been covered. So how do we build a custom TTS engine?

Building a custom TTS engine

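The stock TextToSpeech setup does not provide Chinese speech output, and even where it does, it is hard to use on mainland China networks, so many vendors integrate their own speech engine into the system. How can we build such a custom TTS engine ourselves?
First, have a usable SDK from a TTS provider ready, check which capabilities it offers, and choose an approach accordingly. For example, some SDKs do not expose the synthesized audio data, in which case the first approach cannot be used. Decide based on your actual situation: there are offline and online engines from iFlytek, Alibaba, Baidu, Tencent, AISpeech, Unisound and others, and the choice depends on which products are available to you.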

Approach 1: extend the system's TextToSpeechService class and implement its methods.

The system source also provides an example:
/development/samples/TtsEngine/src/com/example/android/ttsengine/RobotSpeakTtsService.java
public class RobotSpeakTtsService extends TextToSpeechService
It has to implement the abstract methods of TextToSpeechService, which include:

    protected abstract int onIsLanguageAvailable(String lang, String country, String variant);
    protected abstract String[] onGetLanguage();
    protected abstract int onLoadLanguage(String lang, String country, String variant);
    protected abstract void onStop();
    /**
     * Tells the service to synthesize speech from the given text. This method should block until
     * the synthesis is finished. Called on the synthesis thread.
     *
     * @param request The synthesis request.
     * @param callback The callback that the engine must use to make data available for playback or
     *     for writing to a file.
     */
    protected abstract void onSynthesizeText(SynthesisRequest request, SynthesisCallback callback);

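The most important one is the synthesis method, shown with its comment: it generates audio from the supplied text and blocks until generation has finished. The speech parameters are read from the SynthesisRequest argument, and status is reported back to the system through the SynthesisCallback argument.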
Here is the implementation from the sample TTS engine mentioned above. My local source tree does not contain this class, so the code was taken from an online source browser.

    @Override
    protected synchronized void onSynthesizeText(SynthesisRequest request,
            SynthesisCallback callback) {
        // Note that we call onLoadLanguage here since there is no guarantee
        // that there would have been a prior call to this function.
        int load = onLoadLanguage(request.getLanguage(), request.getCountry(),
                request.getVariant());

        // We might get requests for a language we don't support - in which case
        // we error out early before wasting too much time.
        if (load == TextToSpeech.LANG_NOT_SUPPORTED) {
            callback.error();
            return;
        }

        // At this point, we have loaded the language we need for synthesis and
        // it is guaranteed that we support it so we proceed with synthesis.

        // We denote that we are ready to start sending audio across to the
        // framework. We use a fixed sampling rate (16khz), and send data across
        // in 16bit PCM mono.
        callback.start(SAMPLING_RATE_HZ,
                AudioFormat.ENCODING_PCM_16BIT, 1 /* Number of channels. */);

        // We then scan through each character of the request string and
        // generate audio for it.
        final String text = request.getText().toLowerCase();
        for (int i = 0; i < text.length(); ++i) {
            char value = normalize(text.charAt(i));
            // It is crucial to call either of callback.error() or callback.done() to ensure
            // that audio / other resources are released as soon as possible.
            if (!generateOneSecondOfAudio(value, callback)) {
                callback.error();
                return;
            }
        }

        // Alright, we're done with our synthesis - yay!
        callback.done();
    }

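Before the engine starts producing audio, it must call callback.start(SAMPLING_RATE_HZ, AudioFormat.ENCODING_PCM_16BIT, 1 /* Number of channels. */) to tell the system the sampling rate of the generated audio, that it is 16-bit PCM, and that it has a single channel. After this call the system waits to receive audio data and starts TTS playback.
generateOneSecondOfAudio produces a stretch of fake demo audio to imitate what a real engine would generate; once generation is complete, callback.done is called.
The advantage of this approach is that little has to be implemented and no per-Android-version handling is needed; every other interface keeps the system's stock implementation.
The drawback is that it demands more of the engine, and debugging is awkward: without the Android source for the target system, problems are hard to debug, because the system's internal logs are not printed and it is hard to locate where things went wrong.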
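
The start, audio, done sequence can be tried outside Android with a small model. AudioSink here is a hypothetical interface of my own that imitates only the handful of SynthesisCallback methods involved (its start method is simplified to two arguments), RecordingSink is a test double, the 8192-byte limit is arbitrary, and ToneEngine plays the role that generateOneSecondOfAudio plays in the sample. The one rule carried over from the real API is that a chunk handed to audioAvailable must not exceed getMaxBufferSize().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the few SynthesisCallback methods this sketch needs. */
interface AudioSink {
    int getMaxBufferSize();
    void start(int sampleRateHz, int channelCount);
    void audioAvailable(byte[] buffer, int offset, int length);
    void done();
}

/** Test double that records what a framework callback would receive. */
final class RecordingSink implements AudioSink {
    final List<Integer> chunkSizes = new ArrayList<>();
    boolean started;
    boolean finished;

    @Override public int getMaxBufferSize() { return 8192; } // arbitrary limit for the sketch
    @Override public void start(int sampleRateHz, int channelCount) { started = true; }
    @Override public void audioAvailable(byte[] buffer, int offset, int length) {
        chunkSizes.add(length);
    }
    @Override public void done() { finished = true; }
}

final class ToneEngine {
    static final int SAMPLE_RATE_HZ = 16000;
    static final int BYTES_PER_SAMPLE = 2; // 16-bit PCM, one channel

    private ToneEngine() {}

    /** Builds a sine tone as 16-bit little-endian mono PCM. */
    static byte[] tone(double frequencyHz, int durationMs) {
        int samples = SAMPLE_RATE_HZ * durationMs / 1000;
        byte[] pcm = new byte[samples * BYTES_PER_SAMPLE];
        for (int i = 0; i < samples; i++) {
            double phase = 2 * Math.PI * frequencyHz * i / SAMPLE_RATE_HZ;
            short value = (short) (Math.sin(phase) * Short.MAX_VALUE * 0.5); // half volume
            pcm[2 * i] = (byte) (value & 0xff);            // low byte first
            pcm[2 * i + 1] = (byte) ((value >> 8) & 0xff); // then high byte
        }
        return pcm;
    }

    /** Announces the format, delivers the audio in bounded chunks, then finishes. */
    static void synthesize(byte[] pcm, AudioSink sink) {
        int max = sink.getMaxBufferSize();
        if (max <= 0) {
            throw new IllegalStateException("sink reported a non-positive buffer size");
        }
        sink.start(SAMPLE_RATE_HZ, 1);
        for (int offset = 0; offset < pcm.length; offset += max) {
            sink.audioAvailable(pcm, offset, Math.min(max, pcm.length - offset));
        }
        sink.done();
    }
}
```

One second of 16 kHz, 16-bit mono audio is 16000 × 2 = 32000 bytes, so with the sketch's 8192-byte limit it goes out as three full chunks plus a 7424-byte remainder. A real engine built on a vendor SDK would follow the same shape, with the SDK's PCM output taking the place of tone().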

Approach 2: take the TextToSpeech AIDL interfaces of the target system and implement them directly.

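From the analysis above we know that TextToSpeech connects to the engine through bindService, and that the Service uses AIDL as its interface. We can therefore take the corresponding AIDL files and implement the server side in the custom engine while the client stays unchanged; the server-side AIDL interface must of course stay identical to the system's.
The interfaces to implement are: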
/frameworks/base/core/java/android/speech/tts/ITextToSpeechService.aidl
/frameworks/base/core/java/android/speech/tts/ITextToSpeechCallback.aidl
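How to implement an AIDL interface is not explained in detail here; readers with some background will already see the idea.
The advantage of this approach is a high degree of customization: every exposed interface method can be implemented to suit the actual situation.
The drawbacks are that there are more methods to implement and that the AIDL interface has been revised across Android versions, so the resulting engine will not be very portable.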

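Of course, both approaches assume that the stock TTS engine has been removed from the system. If you cannot get the system source, the only option is the one mentioned earlier: specify the engine by name.
Most important of all, for the system's TextToSpeech to discover the custom engine, the service entry in AndroidManifest.xml must declare the intent-filter discussed above; without it the application does not count as a TTS engine.
Here is the configuration of the system's PicoService: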

        <service android:name=".PicoService"
                  android:label="@string/app_name">
            <intent-filter>
                <action android:name="android.intent.action.TTS_SERVICE" />
                <category android:name="android.intent.category.DEFAULT" />
            </intent-filter>
            <meta-data android:name="android.speech.tts" android:resource="@xml/tts_engine" />
        </service>

That's the end of this article. If it helped you, remember to give it a like, and feel free to reply with any questions.
