Installing Kaldi and Running the yesno Example

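Kaldi is a powerful speech recognition toolkit, developed and maintained mainly by Daniel Povey. It currently supports training and decoding with several kinds of models, including GMM-HMM, SGMM-HMM, and DNN-HMM. The neural network in a DNN-HMM system can be defined through a configuration file, and DNN, CNN, TDNN, LSTM, and bidirectional LSTM architectures are all supported.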

The project is still very active on GitHub. The code can be downloaded from https://github.com/kaldi-asr/kaldi and the documentation is at http://kaldi-asr.org/.

Download and installation

As with other open-source software, first clone the code from GitHub:

$ git clone https://github.com/kaldi-asr/kaldi

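After cloning, follow the instructions in the INSTALL file: first build and install everything under the tools directory, then build the contents of src. So go to the tools directory first:

$ cd kaldi/tools

The tools directory has its own INSTALL file, and we follow it step by step. First, run the extras/check_dependencies.sh script, which checks whether the required dependencies are present and correctly configured.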

$ extras/check_dependencies.sh
extras/check_dependencies.sh: automake is not installed.
extras/check_dependencies.sh: autoconf is not installed.
extras/check_dependencies.sh: neither libtoolize nor glibtoolize is installed
extras/check_dependencies.sh: subversion is not installed
extras/check_dependencies.sh: we recommend that you run (our best guess):
  sudo apt-get install automake autoconf libtool subversion
You should probably do: 
  sudo apt-get install libatlas3-base
/bin/sh is linked to dash, and currently some of the scripts will not run properly.  We recommend to run:
  sudo ln -s -f bash /bin/sh

The output differs between Linux distributions (mine is Ubuntu 16.04). Following the hints printed by check_dependencies.sh, install the missing packages and fix the environment:

$ sudo apt-get install automake autoconf libtool subversion
$ sudo apt-get install libatlas3-base
$ sudo ln -s -f bash /bin/sh

Then run check_dependencies.sh again:

$ extras/check_dependencies.sh
extras/check_dependencies.sh: all OK.

If you see the output above, you can continue with the build:

$ make -j 16

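Here -j 16 runs 16 jobs in parallel during compilation; choose the number to match the number of CPU cores. Once the build finishes without errors, switch to the src directory: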

$ cd ../src
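
Rather than hard-coding 16, the job count passed to make -j can be derived from the machine itself. The following is a minimal sketch, not part of Kaldi's own instructions. It assumes nproc (GNU coreutils) is available and falls back to getconf, then to 1. The variable name JOBS is only illustrative, and the script prints the make command instead of running it.

```shell
# Pick the number of parallel make jobs from the CPU count.
# Assumption: nproc exists on this system; otherwise try getconf, then use 1.
JOBS=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Print the command that would be run; replace echo with the real call if desired.
echo "make -j $JOBS"
```

On a machine with few cores or little memory, a smaller value than the core count may be the safer choice.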

This directory also contains an INSTALL file; follow its steps to build and install:

$ ./configure

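When it finishes, the last line of the output should say SUCCESS. If it does not, one of the earlier steps probably went wrong, so go back and check them for errors.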
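
The SUCCESS check can also be scripted. This is a rough sketch under stated assumptions: the output of ./configure has been saved to a file, and the word SUCCESS appears within the last few lines (the exact position may vary between Kaldi versions). The helper name configure_succeeded and the file name configure.log are hypothetical.

```shell
# Hypothetical helper: succeed if "SUCCESS" appears in the last lines of a saved log.
configure_succeeded() {
  # $1: path to a file holding the captured output of ./configure
  tail -n 5 "$1" | grep -q 'SUCCESS'
}

# Usage, assuming the output was captured first:
#   ./configure 2>&1 | tee configure.log
#   configure_succeeded configure.log && echo "configure OK" || echo "configure failed"
```

Checking the last few lines rather than only the final one makes the helper tolerant of trailing hints printed after the status message.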

$ make depend
$ make -j 16

Check the build output for errors. If there are none, make prints Done on the last line. Kaldi is now built and installed, and you can start training models.

Running the yesno example
The steps and results are as follows:
1. Run ./run.sh directly; run.sh downloads the data itself.
The output on Linux looks like this:
book@book-desktop:~/kaldi/egs/yesno/s5$ sudo ./run.sh
[sudo] password for book:
--2017-07-03 15:20:32--  http://www.openslr.org/resources/1/waves_yesno.tar.gz
Resolving www.openslr.org (www.openslr.org)... 35.184.122.207
Connecting to www.openslr.org (www.openslr.org)|35.184.122.207|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4703754 (4.5M) [application/x-gzip]
Saving to: ‘waves_yesno.tar.gz’

waves_yesno.tar.gz 100%[===================>] 4.49M 148KB/s in 45s
Data preparation succeeded
Dictionary preparation succeeded
Preparing train and test data
Preparing word lists etc.
fstaddselfloops 'echo 4 |' 'echo 4 |'
Preparing language models for test
arpa2fst -
\data\
Processing 1-grams
Connected 0 states without outgoing arcs.
fstisstochastic data/lang_test_tg/G.fst
1.20397 0
Succeeded in formatting data.
Succeeded creating MFCC features for train_yesno
Succeeded creating MFCC features for test_yesno
Computing cepstral mean and variance statistics
Initializing monophone system.
Compiling training graphs
Aligning data equally (pass 0)
Pass 1
Aligning data
Pass 2
Aligning data
Pass 3
Aligning data
Pass 4
Aligning data
Pass 5
Aligning data
Pass 6
Aligning data
Pass 7
Aligning data
Pass 8
Aligning data
Pass 9
Aligning data
Pass 10
Aligning data
Pass 11
Pass 12
Aligning data
Pass 13
Pass 14
Aligning data
Pass 15
Pass 16
Aligning data
Pass 17
Pass 18
Aligning data
Pass 19
Pass 20
Aligning data
Pass 21
Pass 22
Pass 23
Aligning data
Pass 24
Pass 25
Pass 26
Aligning data
Pass 27
Pass 28
Pass 29
Aligning data
Pass 30
Pass 31
Pass 32
Aligning data
Pass 33
Pass 34
Pass 35
Aligning data
Pass 36
Pass 37
Pass 38
Aligning data
Pass 39
1 warnings in exp/mono0a/log/update.3.log
1 warnings in exp/mono0a/log/update.7.log
Done
fstminimizeencoded
fstdeterminizestar --use-log=true
fsttablecompose data/lang_test_tg/L_disambig.fst data/lang_test_tg/G.fst
fstisstochastic data/lang_test_tg/tmp/LG.fst
1.20412 -2.34608e-05
warning: LG not stochastic.
fstcomposecontext --context-size=1 --central-position=0 --read-disambig-syms=data/lang_test_tg/tmp/disambig_phones.list --write-disambig-syms=data/lang_test_tg/tmp/disambig_ilabels_1_0.list data/lang_test_tg/tmp/ilabels_1_0
fstisstochastic data/lang_test_tg/tmp/CLG_1_0.fst
1.20412 -2.34608e-05
warning: CLG not stochastic.
make-h-transducer --disambig-syms-out=exp/mono0a/graph_tgpr/disambig_tid.list --transition-scale=1.0 data/lang_test_tg/tmp/ilabels_1_0 exp/mono0a/tree exp/mono0a/final.mdl
fstminimizeencoded
fsttablecompose exp/mono0a/graph_tgpr/Ha.fst data/lang_test_tg/tmp/CLG_1_0.fst
fstdeterminizestar --use-log=true
fstrmsymbols exp/mono0a/graph_tgpr/disambig_tid.list
fstrmepslocal
fstisstochastic exp/mono0a/graph_tgpr/HCLGa.fst
1.20412 -2.34608e-05
HCLGa is not stochastic
add-self-loops --self-loop-scale=0.1 --reorder=true exp/mono0a/final.mdl

%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] exp/mono0a/decode_test_yesno/wer_10
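
The %WER line reports the word error rate as (insertions + deletions + substitutions) divided by the number of reference words, here (0 + 0 + 0) / 232 = 0%. As an illustration, the sketch below recomputes that percentage from the counts on such a line. The function name wer_from_line is hypothetical, and the field positions assume exactly the format shown in the log above.

```shell
# Hypothetical helper: read a "%WER ..." line and recompute WER from its counts.
# Assumes the format shown in the log:
#   %WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] <path>
wer_from_line() {
  echo "$1" | awk '{
    total = $6; sub(/,/, "", total)   # "232," -> 232 (reference word count)
    errors = $7 + $9 + $11            # insertions + deletions + substitutions
    printf "%.2f\n", 100 * errors / total
  }'
}

# Example with the line from the run above; prints 0.00
wer_from_line '%WER 0.00 [ 0 / 232, 0 ins, 0 del, 0 sub ] exp/mono0a/decode_test_yesno/wer_10'
```

The helper does not guard against a reference count of zero, which does not occur in this log.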

Before running the experiment, it is worth reading the README.
