Installing and Testing Kaldi

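Kaldi is an open-source speech recognition toolkit written in C++. According to the official documentation, it can be compiled and run on UNIX and Windows. (A quick aside: Kaldi's author has had a turbulent time lately and is said to be coming to China to work; I hope to get the chance to meet him.) Whether the task is speech recognition or speech assessment, most companies build on this framework, so it is a cornerstone for getting into the field, and it is quick to pick up. That said, the code is C++ and has some pitfalls you have to dig out yourself, such as thread-safety issues in certain modules (I have forgotten which ones; I will add them when I remember), so it can be a bit hard for AI engineers who mainly work in Python. To be honest, though, C++ leaves far more room for speed optimization than Python.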

1. Download

Just download it from GitHub: https://github.com/kaldi-asr/kaldi. You can git clone the repository or download the zip file to your machine.

2. Packages to install beforehand

automake

autoconf

g++

sox

subversion

apt-get

zlib

Depending on your machine, the packages you are prompted to install may differ; see tools/extras/check_dependencies.sh.
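Before running the real dependency script, a quick look at which command-line tools are on PATH can save a round trip. The following is only a minimal sketch: the helper name missing_tools is made up for illustration, and svn is assumed to be the command that the subversion package provides. Libraries such as zlib are not commands, so this check does not cover them; check_dependencies.sh remains the authoritative check.

```shell
#!/bin/sh
# Print the name of every tool from the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not part of Kaldi.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names taken from the list above; svn is the subversion client.
missing=$(missing_tools automake autoconf g++ sox svn)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All listed tools found."
fi
```

Anything reported here can be installed with the system package manager before moving on to the tools build.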

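Go to the tools directory and run make.

make

Some tutorials tell you to run make -j 4, which builds on four cores in parallel, but then the hints about missing packages may not be shown. To be safe, run plain make. If OpenFst, for example, did not install successfully, ./configure in the src directory will fail: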

$ ./configure
Configuring KALDI to use MKL.
Checking compiler g++ ...
Checking OpenFst library in  ...
***configure failed: Could not find file /include/fst/fst.h:
  you may not have installed OpenFst. See ../tools/INSTALL ***

Install the missing packages with your system's package manager: brew (macOS), apt-get (Debian/Ubuntu), or yum (CentOS).
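The configure error above comes down to one missing header, include/fst/fst.h under the OpenFst root. A small check before rerunning ./configure might look like the sketch below. The helper name has_openfst is hypothetical, and the path ../tools/openfst is an assumption about where the tools build leaves OpenFst (it matches the path printed in the later configure logs).

```shell
#!/bin/sh
# Succeed if the OpenFst header that ./configure looks for exists.
# $1 is the OpenFst root directory. (has_openfst is a hypothetical helper.)
has_openfst() {
    [ -f "$1/include/fst/fst.h" ]
}

# Assumed to be run from the src directory.
if has_openfst ../tools/openfst; then
    echo "OpenFst headers found; ./configure should get past this check."
else
    echo "OpenFst headers missing; rerun make in tools and read its output."
fi
```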

3. Install Kaldi

Run ./configure again. It now reports that MKL is missing and suggests ATLAS instead. We go with ATLAS, but there is still an error:

$ ./configure  --mathlib=ATLAS
Configuring KALDI to use ATLAS.
Backing up kaldi.mk to kaldi.mk.bak ...
Checking compiler g++ ...
Checking OpenFst library in /home/kkm/work/kaldi2/tools/openfst ...
Checking cub library in /home/kkm/work/kaldi2/tools/cub ...
Doing OS specific configurations ...
On Linux: Checking for linear algebra header files ...
Using ATLAS as the linear algebra library.
Could not find libatlas.a in any of the generic-Linux places, but we'll try other stuff...
** Failed to configure ATLAS libraries ***
**  ERROR   **
** Configure cannot proceed automatically.
**  If you know that you have ATLAS installed somewhere on your machine, you
** may be able to proceed by replacing [somewhere] in kaldi.mk with a directory.
**  If you have sudo (root) access you could install the ATLAS package on your
** machine, e.g. 'sudo apt-get install libatlas-dev libatlas-base-dev' or
** 'sudo yum install atlas.x86_64' or 'sudo zypper install libatlas3-devel',
** or on cygwin, install atlas from the installer GUI; and then run ./configure
** again.
**
**  Otherwise (or if you prefer OpenBLAS for speed), you could go the OpenBLAS
** route: cd to ../tools, type 'extras/install_openblas.sh', cd back to here,
** and type './configure  --openblas-root=../tools/OpenBLAS/install'

A search turned up one fix, from https://github.com/kaldi-asr/kaldi/pull/3216:

$ sudo apt install -y libatlas-base-dev
. . .
$ ./configure  --mathlib=ATLAS
Configuring KALDI to use ATLAS.
Backing up kaldi.mk to kaldi.mk.bak ...
Checking compiler g++ ...
Checking OpenFst library in /data00/home/liuwenchuang/cpphere/kaldi/source/kaldi-master/tools/openfst-1.6.7 ...
Checking cub library in /data00/home/liuwenchuang/cpphere/kaldi/source/kaldi-master/tools/cub-1.8.0 ...
Doing OS specific configurations ...
On Linux: Checking for linear algebra header files ...
Using ATLAS as the linear algebra library.
Successfully configured ATLAS with ATLASLIBS=/usr/lib/libatlas.so.3 /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3 /usr/lib/liblapack_atlas.so.3
WARNING: CUDA will not be used! If you have already installed cuda drivers
         and CUDA toolkit, try using the --cudatk-dir= option. A GPU and CUDA
         are required to run neural net experiments in a realistic time.
INFO: Configuring Kaldi not to link with Speex. Don't worry, it's only needed if
      you intend to use 'compress-uncompress-speex', which is very unlikely.
Kaldi has been successfully configured. To compile:

  make -j clean depend; make -j <NCPU>

where <NCPU> is the number of parallel builds you can afford to do. If unsure,
use the smaller of the number of CPUs or the amount of RAM in GB divided by 2,
to stay within safe limits. 'make -j' without the numeric value may not limit
the number of parallel jobs at all, and overwhelm even a powerful workstation,
since Kaldi build is highly parallelized.


$ make -j clean depend; make -j 4
. . .
Done


$ sudo apt remove -y libatlas-base-dev --auto-remove
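The configure message recommends choosing the job count as the smaller of the number of CPUs and the RAM in GB divided by 2. That rule of thumb can be sketched as follows. The helper name safe_jobs is made up for illustration, and the detection lines assume a Linux machine with nproc and /proc/meminfo; on other systems, pass the two numbers by hand.

```shell
#!/bin/sh
# safe_jobs CPUS RAM_GB
# Print min(CPUS, RAM_GB / 2), but never less than 1.
# (safe_jobs is a hypothetical helper, not part of Kaldi.)
safe_jobs() {
    n=$1
    half_ram=$(( $2 / 2 ))
    if [ "$half_ram" -lt "$n" ]; then n=$half_ram; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# Linux-only detection; falls back to conservative values elsewhere.
cpus=$(nproc 2>/dev/null || echo 1)
ram_gb=$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo 2>/dev/null || true)
echo "Suggested: make -j $(safe_jobs "$cpus" "${ram_gb:-2}")"
```

For example, a machine with 8 CPUs and 8 GB of RAM gets 4 jobs, while one with 4 CPUs and 32 GB stays at 4.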

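With make -j 4, however, errors get lost in the output and it is hard to tell what went wrong. Running a test under egs then fails because fstaddselfloops cannot be found; a search shows this means Kaldi did not build successfully: https://github.com/uhh-lt/kaldi-tuda-de/issues/10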
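One way to keep a parallel build without losing the error is to save the whole output to a log file and check the exit status afterwards. The sketch below is one possible approach, not something the Kaldi docs prescribe; run_logged and the log file name are hypothetical.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Run the command with stdout and stderr saved to LOGFILE, then return the
# command's own exit status so a failed build is not mistaken for success.
# (run_logged is a hypothetical helper, not part of Kaldi.)
run_logged() {
    log=$1
    shift
    status=0
    "$@" >"$log" 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
        echo "Command failed with status $status; first error lines in $log:" >&2
        grep -n -m 5 -i 'error' "$log" >&2 || true
    fi
    return "$status"
}

# Hypothetical usage from the src directory:
#   run_logged make.log make -j 4 || echo "build failed, see make.log"
```

Because the function returns the build's exit status, a broken build stops the script instead of surfacing later as a missing binary such as fstaddselfloops.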

Going back and running sudo make exposed the problem.

make[2]: Entering directory '/home/srinivas/Downloads/kaldi/src/online2'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/srinivas/Downloads/kaldi/src/online2'
make -C bin
make[2]: Entering directory '/home/srinivas/Downloads/kaldi/src/bin'
g++-4.9  -Wl,-rpath=/home/srinivas/Downloads/kaldi/tools/openfst/lib -rdynamic  align-equal.o ../decoder/kaldi-decoder.a ../lat/kaldi-lat.a ../lm/kaldi-lm.a ../fstext/kaldi-fstext.a ../hmm/kaldi-hmm.a ../transform/kaldi-transform.a ../gmm/kaldi-gmm.a ../tree/kaldi-tree.a ../util/kaldi-util.a ../thread/kaldi-thread.a ../matrix/kaldi-matrix.a ../base/kaldi-base.a   /home/srinivas/Downloads/kaldi/tools/openfst/lib/libfst.so /usr/lib/libatlas.so.3 /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3 /usr/lib/liblapack_atlas.so.3 -lm -lpthread -ldl -o align-equal
align-equal.o: In function `fst::internal::FstImpl<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::WriteFstHeader(fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&, std::ostream&, fst::FstWriteOptions const&, int, std::string const&, unsigned long, fst::FstHeader*)':
/home/srinivas/Downloads/kaldi/tools/openfst/include/fst/fst.h:745: undefined reference to `fst::FstHeader::Write(std::ostream&, std::string const&) const'
../decoder/kaldi-decoder.a(training-graph-compiler.o): In function `fst::internal::FstImpl<fst::ReverseArc<fst::ArcTpl<fst::TropicalWeightTpl<float> > > >::WriteFstHeader(fst::Fst<fst::ReverseArc<fst::ArcTpl<fst::TropicalWeightTpl<float> > > > const&, std::ostream&, fst::FstWriteOptions const&, int, std::string const&, unsigned long, fst::FstHeader*)':
/home/srinivas/Downloads/kaldi/tools/openfst/include/fst/fst.h:745: undefined reference to `fst::FstHeader::Write(std::ostream&, std::string const&) const'
../decoder/kaldi-decoder.a(training-graph-compiler.o): In function `fst::internal::FstImpl<fst::ArcTpl<fst::LogWeightTpl<float> > >::WriteFstHeader(fst::Fst<fst::ArcTpl<fst::LogWeightTpl<float> > > const&, std::ostream&, fst::FstWriteOptions const&, int, std::string const&, unsigned long, fst::FstHeader*)':
/home/srinivas/Downloads/kaldi/tools/openfst/include/fst/fst.h:745: undefined reference to `fst::FstHeader::Write(std::ostream&, std::string const&) const'
../fstext/kaldi-fstext.a(kaldi-fst-io.o): In function `fst::ReadFstKaldi(std::string)':
/home/srinivas/Downloads/kaldi/src/fstext/kaldi-fst-io.cc:34: undefined reference to `fst::FstHeader::Read(std::istream&, std::string const&, bool)'
/home/srinivas/Downloads/kaldi/src/fstext/kaldi-fst-io.cc:37: undefined reference to `fst::FstReadOptions::FstReadOptions(std::string const&, fst::FstHeader const*, fst::SymbolTable const*, fst::SymbolTable const*)'
../fstext/kaldi-fstext.a(kaldi-fst-io.o): In function `fst::internal::FstImpl<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::ReadHeader(std::istream&, fst::FstReadOptions const&, int, fst::FstHeader*)':
/home/srinivas/Downloads/kaldi/tools/openfst/include/fst/fst.h:796: undefined reference to `fst::FstHeader::Read(std::istream&, std::string const&, bool)'
collect2: error: ld returned 1 exit status
<builtin>: recipe for target 'align-equal' failed
make[2]: *** [align-equal] Error 1
make[2]: Leaving directory '/home/srinivas/Downloads/kaldi/src/bin'
Makefile:142: recipe for target 'bin' failed
make[1]: *** [bin] Error 2
make[1]: Leaving directory '/home/srinivas/Downloads/kaldi/src'
Makefile:35: recipe for target 'all' failed
make: *** [all] Error 2

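No matter how many times I ran clean, align-equal refused to build. After some searching, the cause turned out to be a mismatch between the g++ versions used for compiling and for linking.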

https://www.oipapio.com/question-141226

I have two g++ versions installed locally. On Debian, typing g++ and pressing Tab lists all installed versions.

$ g++
g++      g++-4.9

Here, plain g++ is version 9.2.0.

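Force CXX to g++ (g++-4.9 also works; I tested both, and each builds Kaldi successfully). Export the variable so that ./configure and make, which run as child processes, can see it.

$ export CXX=g++

Running the build commands again then, to my surprise, echoed Done.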
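To confirm which compiler the build will actually use, you can read it back from the kaldi.mk file that ./configure generates. This is a minimal sketch: kaldi_mk_cxx is a hypothetical helper, and it assumes kaldi.mk records the compiler on a line of the form "CXX = ...", so inspect the file by hand if it prints nothing.

```shell
#!/bin/sh
# kaldi_mk_cxx FILE
# Print the compiler recorded on the "CXX = ..." line of a kaldi.mk file.
# The line format is an assumption about what ./configure writes.
kaldi_mk_cxx() {
    sed -n 's/^CXX[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical usage from the src directory:
#   export CXX=g++               # or g++-4.9
#   ./configure --mathlib=ATLAS
#   kaldi_mk_cxx kaldi.mk        # should print the same compiler
```

If the printed compiler differs from the one you exported, the setting did not reach ./configure and the version mismatch is likely to come back.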

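4. A simple test

1. Go back to the kaldi-master directory and into egs, which holds the sample recipes. We use the yesno example for a quick check. Inside it, go to the s5 directory (I do not know why it is named s5; setting that aside for now) and run: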

$ sudo ./run.sh

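It succeeded, which was a little exciting. The dataset consists entirely of yes/no recordings from a single speaker, and if you read the output carefully it warns that training on one speaker's recordings does not work well.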

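2. For something more involved, follow http://xuxping.com/2019/06/16/20190616_ASR_from_begin_to_abandoned/ and run the Chinese thchs30 recipe. Note that you have to fetch the data yourself from http://www.openslr.org/18/ and extract it, then set the thchs variable in run.sh to the directory you extracted into. Even then, running it produces another error: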

creating data/{train,dev,test}
cleaning data/train
preparing scps and text in data/train
cleaning data/dev
preparing scps and text in data/dev
cleaning data/test
preparing scps and text in data/test
creating test_phone for phone decoding
steps/make_mfcc.sh --nj 8 --cmd queue.pl data/mfcc/train exp/make_mfcc/train mfcc/train
utils/validate_data_dir.sh: Successfully validated data-directory data/mfcc/train
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
queue.pl: Error submitting jobs to queue (return status was 32512)
queue log file is exp/make_mfcc/train/q/make_mfcc_train.log, command was qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* -o exp/make_mfcc/train/q/make_mfcc_train.log   -t 1:8 /data00/home/liuwenchuang/cpphere/kaldi/source/kaldi-master/egs/thchs30/s5/exp/make_mfcc/train/q/make_mfcc_train.sh >>exp/make_mfcc/train/q/make_mfcc_train.log 2>&1
Output of qsub was: sh: 1: qsub: not found

Edit cmd.sh as follows:

# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances 'queue.pl' to run.pl (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine).  queue.pl works with GridEngine (qsub).  slurm.pl works
# with slurm.  Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration.  Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

#export train_cmd=queue.pl
#export decode_cmd="queue.pl --mem 4G"
#export mkgraph_cmd="queue.pl --mem 8G"
#export cuda_cmd="queue.pl --gpu 1"
export train_cmd=run.pl
export decode_cmd=run.pl
export mkgraph_cmd="run.pl"
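Instead of hard-coding run.pl, cmd.sh could choose the wrapper based on whether a queue submit tool is installed. The following is only a sketch of that idea: pick_cmd is a hypothetical helper, and it drops the --mem options from the commented-out queue.pl lines, which you would want to restore on a real GridEngine cluster.

```shell
#!/bin/sh
# pick_cmd SUBMIT_TOOL
# Print queue.pl if the given submit tool (qsub for GridEngine) is on PATH,
# otherwise run.pl for local execution.
# (pick_cmd is a hypothetical helper, not part of Kaldi.)
pick_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo queue.pl
    else
        echo run.pl
    fi
}

train_cmd=$(pick_cmd qsub)
decode_cmd=$train_cmd
mkgraph_cmd=$train_cmd
export train_cmd decode_cmd mkgraph_cmd
echo "train_cmd=$train_cmd"
```

On a machine without qsub this sets all three variables to run.pl, which matches the manual edit above.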

 
