Installing and Testing Kaldi

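Kaldi is an open-source speech recognition toolkit written in C++. According to the official documentation, it can be compiled and run on UNIX and Windows. (As an aside, Kaldi's author has had a turbulent time lately and is reportedly planning to move to China; I hope to get the chance to meet him.) Whether for speech recognition or speech assessment, many companies build on this framework. It is also a cornerstone for getting into the speech recognition field, and it is quick to pick up. That said, the code is C++ and has some pitfalls still to dig into, such as thread-safety problems in certain modules (I forget exactly which ones and will add them when I remember), so it can be a bit hard for AI engineers who mostly work in Python. Honestly, C++ leaves far more room for speed optimization than Python.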

1. Download

Just get it from GitHub at https://github.com/kaldi-asr/kaldi. You can either git clone the repository or download the zip file to your machine.
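A minimal sketch of the clone route, wrapped so it can be rerun safely. The function name fetch_kaldi and the shallow-clone option (--depth 1) are choices made for this illustration, not something the original steps require.

```shell
# Clone Kaldi into the directory given as $1, unless that directory already exists.
# fetch_kaldi is a hypothetical helper name; --depth 1 (no history) is an assumption
# made here to keep the download small.
fetch_kaldi() {
    dest=$1
    if [ -d "$dest" ]; then
        echo "already present: $dest"
    else
        git clone --depth 1 https://github.com/kaldi-asr/kaldi "$dest"
    fi
}

# Usage (the destination path is only an example):
#   fetch_kaldi "$HOME/kaldi"
```

Drop --depth 1 if you want the full commit history.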

2. Packages to install beforehand

automake

autoconf

g++

sox

subversion

apt-get

zlib

Depending on your machine, the packages you are told to install may differ; see tools/extras/check_dependencies.sh for reference.
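As a quick pre-check before running that script, the following sketch reports which of the command-line tools listed above are not on PATH. The helper name missing_tools is made up for this example, and svn is assumed to be the executable that the subversion package provides. Libraries such as zlib are not executables, so they are not covered here; check_dependencies.sh remains the real check.

```shell
# Print each name from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper, not part of Kaldi.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names for the packages listed above (svn is the subversion client).
missing=$(missing_tools automake autoconf g++ sox svn)
if [ -n "$missing" ]; then
    echo "Not on PATH:" $missing
else
    echo "All listed tools were found."
fi
```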

Go to the tools directory and run make.

make

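Some tutorials have you run make -j 4 to build on 4 cores at once, but then missing packages may not be reported clearly. To be safe, just run plain make here. For example, if OpenFst was not installed successfully, running ./configure in the src directory will fail: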

$ ./configure
Configuring KALDI to use MKL.
Checking compiler g++ ...
Checking OpenFst library in  ...
***configure failed: Could not find file /include/fst/fst.h:
  you may not have installed OpenFst. See ../tools/INSTALL ***

Just install the corresponding packages with your system's package manager: brew (macOS), apt-get (Debian/Ubuntu) or yum (CentOS).
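Before rerunning ./configure, it can help to confirm that the header named in the error message now exists. The sketch below assumes the default layout in which OpenFst lives under tools/openfst inside the Kaldi checkout. The helper name has_openfst and the KALDI_ROOT variable are hypothetical.

```shell
# Succeed only if the OpenFst header that configure complained about is present.
# $1 is the root of the Kaldi checkout. has_openfst is a hypothetical helper.
has_openfst() {
    [ -f "$1/tools/openfst/include/fst/fst.h" ]
}

# KALDI_ROOT is an assumed variable for this sketch; it defaults to the current directory.
kaldi_root=${KALDI_ROOT:-.}
if has_openfst "$kaldi_root"; then
    echo "OpenFst headers found under $kaldi_root/tools/openfst."
else
    echo "OpenFst headers missing; rerun make in $kaldi_root/tools and read its output."
fi
```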

3. Install Kaldi

Then run ./configure again. It reports that MKL is not available and suggests ATLAS instead. We go with ATLAS, but there is still an error:

$ ./configure  --mathlib=ATLAS
Configuring KALDI to use ATLAS.
Backing up kaldi.mk to kaldi.mk.bak ...
Checking compiler g++ ...
Checking OpenFst library in /home/kkm/work/kaldi2/tools/openfst ...
Checking cub library in /home/kkm/work/kaldi2/tools/cub ...
Doing OS specific configurations ...
On Linux: Checking for linear algebra header files ...
Using ATLAS as the linear algebra library.
Could not find libatlas.a in any of the generic-Linux places, but we'll try other stuff...
** Failed to configure ATLAS libraries ***
**  ERROR   **
** Configure cannot proceed automatically.
**  If you know that you have ATLAS installed somewhere on your machine, you
** may be able to proceed by replacing [somewhere] in kaldi.mk with a directory.
**  If you have sudo (root) access you could install the ATLAS package on your
** machine, e.g. 'sudo apt-get install libatlas-dev libatlas-base-dev' or
** 'sudo yum install atlas.x86_64' or 'sudo zypper install libatlas3-devel',
** or on cygwin, install atlas from the installer GUI; and then run ./configure
** again.
**
**  Otherwise (or if you prefer OpenBLAS for speed), you could go the OpenBLAS
** route: cd to ../tools, type 'extras/install_openblas.sh', cd back to here,
** and type './configure  --openblas-root=../tools/OpenBLAS/install'

A quick search turned up a fix in https://github.com/kaldi-asr/kaldi/pull/3216:

$ sudo apt install -y libatlas-base-dev
. . .
$ ./configure  --mathlib=ATLAS
Configuring KALDI to use ATLAS.
Backing up kaldi.mk to kaldi.mk.bak ...
Checking compiler g++ ...
Checking OpenFst library in /data00/home/liuwenchuang/cpphere/kaldi/source/kaldi-master/tools/openfst-1.6.7 ...
Checking cub library in /data00/home/liuwenchuang/cpphere/kaldi/source/kaldi-master/tools/cub-1.8.0 ...
Doing OS specific configurations ...
On Linux: Checking for linear algebra header files ...
Using ATLAS as the linear algebra library.
Successfully configured ATLAS with ATLASLIBS=/usr/lib/libatlas.so.3 /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3 /usr/lib/liblapack_atlas.so.3
WARNING: CUDA will not be used! If you have already installed cuda drivers
         and CUDA toolkit, try using the --cudatk-dir= option. A GPU and CUDA
         are required to run neural net experiments in a realistic time.
INFO: Configuring Kaldi not to link with Speex. Don't worry, it's only needed if
      you intend to use 'compress-uncompress-speex', which is very unlikely.
Kaldi has been successfully configured. To compile:

  make -j clean depend; make -j <NCPU>

where <NCPU> is the number of parallel builds you can afford to do. If unsure,
use the smaller of the number of CPUs or the amount of RAM in GB divided by 2,
to stay within safe limits. 'make -j' without the numeric value may not limit
the number of parallel jobs at all, and overwhelm even a powerful workstation,
since Kaldi build is highly parallelized.


$ make -j clean depend; make -j 4
. . .
Done


$ sudo apt remove -y libatlas-base-dev --auto-remove
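The configure output above advises using the smaller of the CPU count and half the RAM in GB as the value for <NCPU>. A rough sketch of that rule follows. The helper name safe_jobs is invented here, and the detection part assumes nproc and /proc/meminfo are available (typical on Linux), falling back to conservative values otherwise.

```shell
# Smaller of the CPU count ($1) and RAM in GB divided by 2 ($2), never below 1.
# safe_jobs is a hypothetical helper name.
safe_jobs() {
    n=$1
    half_ram=$(( $2 / 2 ))
    if [ "$half_ram" -lt "$n" ]; then
        n=$half_ram
    fi
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Detect CPUs and RAM where possible; fall back to 1 CPU and 2 GB when detection fails.
cpus=$(nproc 2>/dev/null || echo 1)
ram_gb=$(awk '/^MemTotal:/ { print int($2 / 1048576) }' /proc/meminfo 2>/dev/null || true)
ram_gb=${ram_gb:-2}
echo "suggested: make -j $(safe_jobs "$cpus" "$ram_gb")"
```

For example, a machine with 8 CPUs and 8 GB of RAM gets 4 jobs, while one with 2 CPUs and 64 GB gets 2.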

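However, when building with make -j 4, errors are hard to make sense of in the output. If you then go to egs and run a test, it fails with a message that fstaddselfloops cannot be found. A search shows this means Kaldi was not built successfully: https://github.com/uhh-lt/kaldi-tuda-de/issues/10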
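One way to keep a parallel build and still find the first failure is to save the output to a file and search it afterwards. The sketch below is only an illustration: the helper name first_error, the log name build.log, and the choice of patterns are assumptions, and the patterns will not catch every kind of failure.

```shell
# Print the first line (with its line number) of a build log that looks like a
# compiler error, an unresolved symbol, or a failed make target.
# first_error is a hypothetical helper; the patterns are a guess at common failure lines.
first_error() {
    grep -n -E 'error:|undefined reference|\*\*\* \[' "$1" | head -n 1
}

# Usage (build.log is just an example file name):
#   make -j 4 2>&1 | tee build.log
#   first_error build.log
```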

Going back and running a plain sudo make (without -j) exposed the problem:

make[2]: Entering directory '/home/srinivas/Downloads/kaldi/src/online2'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/srinivas/Downloads/kaldi/src/online2'
make -C bin
make[2]: Entering directory '/home/srinivas/Downloads/kaldi/src/bin'
g++-4.9  -Wl,-rpath=/home/srinivas/Downloads/kaldi/tools/openfst/lib -rdynamic  align-equal.o ../decoder/kaldi-decoder.a ../lat/kaldi-lat.a ../lm/kaldi-lm.a ../fstext/kaldi-fstext.a ../hmm/kaldi-hmm.a ../transform/kaldi-transform.a ../gmm/kaldi-gmm.a ../tree/kaldi-tree.a ../util/kaldi-util.a ../thread/kaldi-thread.a ../matrix/kaldi-matrix.a ../base/kaldi-base.a   /home/srinivas/Downloads/kaldi/tools/openfst/lib/libfst.so /usr/lib/libatlas.so.3 /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3 /usr/lib/liblapack_atlas.so.3 -lm -lpthread -ldl -o align-equal
align-equal.o: In function `fst::internal::FstImpl<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::WriteFstHeader(fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&, std::ostream&, fst::FstWriteOptions const&, int, std::string const&, unsigned long, fst::FstHeader*)':
/home/srinivas/Downloads/kaldi/tools/openfst/include/fst/fst.h:745: undefined reference to `fst::FstHeader::Write(std::ostream&, std::string const&) const'
../decoder/kaldi-decoder.a(training-graph-compiler.o): In function `fst::internal::FstImpl<fst::ReverseArc<fst::ArcTpl<fst::TropicalWeightTpl<float> > > >::WriteFstHeader(fst::Fst<fst::ReverseArc<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > const&, std::ostream&, fst::FstWriteOptions const&, int, std::string const&, unsigned long, fst::FstHeader*)':
/home/srinivas/Downloads/kaldi/tools/openfst/include/fst/fst.h:745: undefined reference to `fst::FstHeader::Write(std::ostream&, std::string const&) const'
../decoder/kaldi-decoder.a(training-graph-compiler.o): In function `fst::internal::FstImpl<fst::ArcTpl<fst::LogWeightTpl<float> > >::WriteFstHeader(fst::Fst<fst::ArcTpl<fst::LogWeightTpl<float> > > const&, std::ostream&, fst::FstWriteOptions const&, int, std::string const&, unsigned long, fst::FstHeader*)':
/home/srinivas/Downloads/kaldi/tools/openfst/include/fst/fst.h:745: undefined reference to `fst::FstHeader::Write(std::ostream&, std::string const&) const'
../fstext/kaldi-fstext.a(kaldi-fst-io.o): In function `fst::ReadFstKaldi(std::string)':
/home/srinivas/Downloads/kaldi/src/fstext/kaldi-fst-io.cc:34: undefined reference to `fst::FstHeader::Read(std::istream&, std::string const&, bool)'
/home/srinivas/Downloads/kaldi/src/fstext/kaldi-fst-io.cc:37: undefined reference to `fst::FstReadOptions::FstReadOptions(std::string const&, fst::FstHeader const*, fst::SymbolTable const*, fst::SymbolTable const*)'
../fstext/kaldi-fstext.a(kaldi-fst-io.o): In function `fst::internal::FstImpl<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::ReadHeader(std::istream&, fst::FstReadOptions const&, int, fst::FstHeader*)':
/home/srinivas/Downloads/kaldi/tools/openfst/include/fst/fst.h:796: undefined reference to `fst::FstHeader::Read(std::istream&, std::string const&, bool)'
collect2: error: ld returned 1 exit status
<builtin>: recipe for target 'align-equal' failed
make[2]: *** [align-equal] Error 1
make[2]: Leaving directory '/home/srinivas/Downloads/kaldi/src/bin'
Makefile:142: recipe for target 'bin' failed
make[1]: *** [bin] Error 2
make[1]: Leaving directory '/home/srinivas/Downloads/kaldi/src'
Makefile:35: recipe for target 'all' failed
make: *** [all] Error 2

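No matter how many times I ran clean, align-equal simply would not build. After some searching, it turned out that the compile and link steps were using different g++ versions.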

https://www.oipapio.com/question-141226

I have two g++ versions installed locally. On Debian, typing g++ and pressing Tab lists all installed versions:

$ g++
g++      g++-4.9

On my machine, plain g++ is version 9.2.0.
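Tab completion works interactively; a script can get a similar list by scanning PATH. The helper name list_gxx is made up for this sketch, and it only looks for executables named g++ or g++-something.

```shell
# Print every executable named g++ or g++-* found in a PATH-style,
# colon-separated list of directories passed as $1.
# The body runs in a subshell so the IFS change does not leak to the caller.
list_gxx() (
    IFS=:
    for dir in $1; do
        for f in "$dir"/g++ "$dir"/g++-*; do
            if [ -f "$f" ] && [ -x "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
)

list_gxx "$PATH"
# To see what each one is, run it with --version, e.g.:
#   g++ --version
```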

Here I force CXX to g++ (setting it to g++-4.9 also works; I tested both, and Kaldi built successfully with either):

$ export CXX=g++

After that, rerunning the build commands finally printed Done.
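The compiler choice only helps if CXX actually reaches the environment of every build step. A small sketch of one way to make that explicit follows, assuming (as the fix above implies) that Kaldi's configure and Makefiles honour CXX. The wrapper name with_cxx is hypothetical.

```shell
# Run a command with CXX pinned to the compiler given as $1.
# with_cxx is a hypothetical wrapper, not part of Kaldi.
with_cxx() {
    cxx=$1
    shift
    CXX=$cxx "$@"
}

# Usage, run from the Kaldi checkout (commands taken from the steps above):
#   (cd tools && with_cxx g++ make)
#   (cd src && with_cxx g++ ./configure --mathlib=ATLAS && with_cxx g++ make -j 4)
```

Using the same compiler name for both tools and src is the point; which of the installed versions you pick matters less.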

4. A simple test

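1. Go back to the kaldi-master directory and into egs, which holds the sample recipes. We will use the yesno example for a quick look. Inside it, go to the s5 directory (I don't know why it is named s5; let's set that aside for now) and run: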

$ sudo ./run.sh

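It succeeded, which was a little exciting. Looking at the dataset, it consists entirely of one person's recordings of yes/no. If you read the output carefully, it warns that training on recordings from a single speaker does not work well.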

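2. For something more involved, you can follow http://xuxping.com/2019/06/16/20190616_ASR_from_begin_to_abandoned/ and try the Chinese thchs30 recipe. Note that you have to fetch the data yourself from http://www.openslr.org/18/ and extract it, then update the thchs variable in run.sh to point at the directory you extracted into. When you run it, you will still hit an error: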

creating data/{train,dev,test}
cleaning data/train
preparing scps and text in data/train
cleaning data/dev
preparing scps and text in data/dev
cleaning data/test
preparing scps and text in data/test
creating test_phone for phone decoding
steps/make_mfcc.sh --nj 8 --cmd queue.pl data/mfcc/train exp/make_mfcc/train mfcc/train
utils/validate_data_dir.sh: Successfully validated data-directory data/mfcc/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
queue.pl: Error submitting jobs to queue (return status was 32512)
queue log file is exp/make_mfcc/train/q/make_mfcc_train.log, command was qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* -o exp/make_mfcc/train/q/make_mfcc_train.log   -t 1:8 /data00/home/liuwenchuang/cpphere/kaldi/source/kaldi-master/egs/thchs30/s5/exp/make_mfcc/train/q/make_mfcc_train.sh >>exp/make_mfcc/train/q/make_mfcc_train.log 2>&1
Output of qsub was: sh: 1: qsub: not found

The fix is to modify cmd.sh as follows:

# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances 'queue.pl' to run.pl (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine).  queue.pl works with GridEngine (qsub).  slurm.pl works
# with slurm.  Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration.  Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

#export train_cmd=queue.pl
#export decode_cmd="queue.pl --mem 4G"
#export mkgraph_cmd="queue.pl --mem 8G"
#export cuda_cmd="queue.pl --gpu 1"
export train_cmd=run.pl
export decode_cmd=run.pl
export mkgraph_cmd="run.pl"
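If the same cmd.sh has to work both on a machine with GridEngine and on a plain workstation, the choice can be made at run time. This is only a sketch: the helper name pick_cmd is invented, and it drops the memory and GPU options from the commented-out queue.pl lines above, which you would want to restore on a real cluster.

```shell
# Print queue.pl when the scheduler client named in $1 is on PATH, otherwise run.pl.
# pick_cmd is a hypothetical helper, not part of Kaldi.
pick_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo queue.pl
    else
        echo run.pl
    fi
}

# qsub is the GridEngine client that queue.pl tried to call in the log above.
train_cmd=$(pick_cmd qsub)
decode_cmd=$train_cmd
mkgraph_cmd=$train_cmd
export train_cmd decode_cmd mkgraph_cmd
echo "train_cmd=$train_cmd"
```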

 
