Speech Recognition: Getting Started with CMUSphinx (Part 2): Building a Language Model

This chapter branches off from https://blog.csdn.net/xj853663557/article/details/84671223.

The original text of this chapter comes from https://cmusphinx.github.io/wiki/tutoriallm/

Contents

Keyword lists


The language model is an important component of the configuration which tells the decoder which sequences of words are possible to recognize.

There are several types of models: keyword lists, grammars, statistical language models, and phonetic language models. They have different capabilities and performance properties. You can choose any decoding mode according to your needs, and you can even switch between modes at runtime. See the Pocketsphinx tutorial for more details.

Keyword lists

Pocketsphinx supports a keyword spotting mode in which you specify a list of keywords to look for. The advantage of this mode is that you can set a threshold for each keyword, so that keywords can be detected in continuous speech. All other modes will try to detect words from the grammar even if you used words that are not in the grammar. A typical keyword list looks like this:

oh mighty computer /1e-40/
hello world /1e-30/
other phrase /1e-20/
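
To make the file format concrete, here is a minimal Python sketch that reads such a list into (keyphrase, threshold) pairs. It assumes each non-blank line is a keyphrase followed by its threshold between slashes, as in the sample above. The function name `parse_keyword_list` is made up for illustration and is not part of Pocketsphinx.

```python
import re

# Assumed line layout: a keyphrase, then its threshold between slashes.
KEYPHRASE_LINE = re.compile(r"^(?P<phrase>.+?)\s*/(?P<threshold>[^/]+)/\s*$")


def parse_keyword_list(text):
    """Return a list of (keyphrase, threshold) pairs from keyword-list text.

    Hypothetical helper for illustration; not a Pocketsphinx function.
    """
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = KEYPHRASE_LINE.match(line)
        if match is None:
            raise ValueError(f"line {line_number}: missing /threshold/: {line!r}")
        entries.append((match.group("phrase"), float(match.group("threshold"))))
    return entries


SAMPLE = """\
oh mighty computer /1e-40/
hello world /1e-30/
other phrase /1e-20/
"""

entries = parse_keyword_list(SAMPLE)
for phrase, threshold in entries:
    print(f"{phrase!r}: {threshold:g}")
```

A check like this is handy before a long tuning run, because a line without a threshold raises an error right away.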

A threshold must be specified for every keyphrase. For shorter keyphrases you can use thresholds such as 1e-1; for longer keyphrases the threshold needs a larger negative exponent, down to 1e-50. If your keyphrase is very long, more than 10 syllables, it is recommended to split it and spot the parts separately. The threshold must be tuned to balance false alarms against missed detections. The best way to do this is to use a prerecorded audio file. The common tuning process is as follows:

  1. Take a long recording with a few occurrences of your keywords and some other sounds. You can use a movie soundtrack or something similar. The audio should be approximately 1 hour long.
  2. Run keyword spotting on that file with different thresholds for every keyword, using the following command:
     pocketsphinx_continuous -infile <your_file.wav> -keyphrase <your keyphrase> \
      -kws_threshold <your_threshold> -time yes
    

    The command will print many lines; some of them are keywords with detection times and confidences. You can also send the extra log output to a file with the -logfn your_file.log option to avoid clutter.

  3. From your keyword spotting results count how many false alarms and missed detections you’ve encountered.
  4. Select the threshold with the smallest number of false alarms and missed detections.
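
Steps 2 to 4 can be sketched in Python as follows. This is only an illustration under several assumptions. The detection times are taken to be collected by hand from the command output, since the exact output format is not described here. A detection counts as correct if it falls within an assumed `tolerance` of a hand-labelled occurrence. "Smallest" is read as the smallest sum of false alarms and missed detections. All function names and sample numbers are invented.

```python
def sweep_commands(wav_path, keyphrase, thresholds):
    """Build one pocketsphinx_continuous argument list per threshold (step 2)."""
    return [
        ["pocketsphinx_continuous", "-infile", wav_path,
         "-keyphrase", keyphrase,
         "-kws_threshold", str(threshold), "-time", "yes"]
        for threshold in thresholds
    ]


def count_errors(reference_times, detected_times, tolerance=0.5):
    """Return (false_alarms, missed_detections) for one keyphrase (step 3).

    reference_times: seconds where the keyphrase is really spoken (hand-labelled).
    detected_times: seconds where the spotter reported the keyphrase.
    tolerance: assumed matching window in seconds (hypothetical value).
    """
    unmatched = sorted(reference_times)
    false_alarms = 0
    for detected in sorted(detected_times):
        # Closest reference occurrence that has not been matched yet
        closest = min(unmatched, key=lambda ref: abs(ref - detected), default=None)
        if closest is not None and abs(closest - detected) <= tolerance:
            unmatched.remove(closest)
        else:
            false_alarms += 1
    return false_alarms, len(unmatched)


def pick_threshold(reference_times, detections_by_threshold, tolerance=0.5):
    """Return the threshold with the fewest false alarms plus misses (step 4)."""
    def total_errors(threshold):
        detected = detections_by_threshold[threshold]
        return sum(count_errors(reference_times, detected, tolerance))
    return min(detections_by_threshold, key=total_errors)


# Made-up illustration data, not real pocketsphinx output
reference = [12.0, 95.5, 240.2]
detections = {
    1e-10: [12.1],                             # 0 false alarms, 2 misses
    1e-20: [12.1, 95.4, 240.0],                # 0 false alarms, 0 misses
    1e-30: [12.1, 50.0, 95.4, 130.7, 240.0],   # 2 false alarms, 0 misses
}

commands = sweep_commands("your_file.wav", "hello world", detections.keys())
best = pick_threshold(reference, detections)
print("best threshold:", best)
```

Weighting false alarms and misses equally is a design choice; if one kind of error is more costly in your application, replace the sum in `total_errors` with a weighted one.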

For the best accuracy it is better to have a keyphrase with 3-4 syllables. Phrases that are too short are easily confused, which leads to false alarms.

Keyword lists are only supported by pocketsphinx; sphinx4 cannot handle them.

To be continued...
