A Roundup of Some Tools Used in Speech Recognition with Kaldi nnet3!

In the previous article we got a first look at Kaldi ASR. Now let's see how to use Kaldi's nnet3 neural-network models to recognize speech in wav files~

Download the Chinese pretrained model:

[houwenbin@localhost ~]$ cd ~/kaldi-master/egs

[houwenbin@localhost egs]$ wget -T 10 -t 3 http://kaldi-asr.org/models/0002_cvte_chain_model.tar.gz

[houwenbin@localhost egs]$ tar xzf 0002_cvte_chain_model.tar.gz


You should now see a cvte directory. Create two symbolic links inside its s5 directory:

[houwenbin@localhost egs]$ ln -s ~/kaldi-master/egs/wsj/s5/steps ~/kaldi-master/egs/cvte/s5/steps

[houwenbin@localhost egs]$ ln -s ~/kaldi-master/egs/wsj/s5/utils ~/kaldi-master/egs/cvte/s5/utils


With that done, we can run run.sh:

#!/bin/bash

. ./cmd.sh
. ./path.sh

# step 1: generate fbank features
obj_dir=data/fbank

for x in test; do
  # rm fbank/$x
  mkdir -p fbank/$x

  # compute fbank without pitch
  steps/make_fbank.sh --nj 1 --cmd "run.pl" $obj_dir/$x exp/make_fbank/$x fbank/$x || exit 1;
  # compute cmvn
  steps/compute_cmvn_stats.sh $obj_dir/$x exp/fbank_cmvn/$x fbank/$x || exit 1;
done

# #step 2: offline-decoding
test_data=data/fbank/test
dir=exp/chain/tdnn

steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
  --nj 1 --num-threads 1 \
  --cmd "$decode_cmd" --iter final \
  --frames-per-chunk 50 \
  $dir/graph $test_data $dir/decode_test

# # note: the model is trained using "apply-cmvn-online",
# # so you can modify the corresponding code in steps/nnet3/decode.sh to obtain the best performance,
# # but if you use steps/nnet3/decode.sh directly,
# # the performance is still good, just slightly worse than with the "apply-cmvn-online" method.
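run.sh decodes whatever Kaldi data directory sits in data/fbank/test. To try your own recordings, one option is to build that directory yourself. The helper below is a hypothetical sketch and is not part of the CVTE recipe. It writes wav.scp, utt2spk and spk2utt (the standard Kaldi data-directory files), treats each wav file as its own speaker, and uses the file name as the utterance ID. It assumes file names without spaces. It also assumes your audio matches what conf/fbank.conf expects (steps/make_fbank.sh reads that file by default), so check the sample rate there.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a minimal Kaldi data dir from a folder of .wav files.
# Usage: make_test_data_dir WAV_DIR OUT_DIR
make_test_data_dir() {
  local wav_dir out_dir=$2 f utt
  wav_dir=$(cd "$1" && pwd) || return 1     # absolute path, so wav.scp works from anywhere
  mkdir -p "$out_dir"
  for f in "$wav_dir"/*.wav; do
    [ -e "$f" ] || continue                 # no .wav files: skip the unexpanded glob
    utt=$(basename "$f" .wav)               # file name (minus .wav) is the utterance ID
    echo "$utt $f"
  done | LC_ALL=C sort > "$out_dir/wav.scp" # Kaldi wants files sorted by key (C locale)
  awk '{ print $1, $1 }' "$out_dir/wav.scp" > "$out_dir/utt2spk"
  # One utterance per speaker, so spk2utt has the same lines as utt2spk.
  cp "$out_dir/utt2spk" "$out_dir/spk2utt"
}
```

For example, move the original data/fbank/test aside, run `make_test_data_dir ~/my_wavs data/fbank/test`, and then run run.sh again. Because every utterance is its own speaker, the CMVN statistics end up being computed per utterance.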

If all goes well, you will get the recognition results!

# nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |" "ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz" 
# Started at Fri Jun 16 15:38:02 CST 2017
#
nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz' 
lattice-scale --acoustic-scale=10.0 ark:- ark:- 
apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- 
LOG (nnet3-latgen-faster[5.1]:CheckAndFixConfigs():nnet-am-decodable-simple.cc:303) Increasing --frames-per-chunk from 50 to 51 to make it a multiple of --frame-subsampling-factor=3
CVTE201703_00030_165722_1175 據 樓主 老婆 說 樓主 昨天 家族 聚會 喝 多 了 回家 路上 大腦 和麪 跟 電線杆 表白 了 一個 多 小時 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165722_1175 is 1.90676 over 452 frames.
CVTE201703_00030_165740_2562 因爲 沒 撈 了 不少 我家 裏 經常 來往 的 人 也 都是 搞 煤礦 的 基本上 現在 都 轉行 了 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165740_2562 is 1.99356 over 379 frames.
CVTE201703_00030_165754_5069 爲啥 叫 皇上 呢 因爲 那時候 凡是 公司 聚餐 行政 都 要 問 我 想 吃 什麼 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165754_5069 is 2.00562 over 298 frames.
CVTE201703_00030_165809_2685 一旦 有 什麼 問題 手機 馬上 就會 報警 然後 系統 自動 停機 等 解決 故障 之後 再開 機 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165809_2685 is 2.31544 over 303 frames.
CVTE201703_00030_165830_5107 首先 你 說 沈 大人 是 這個 就 不符合 按 答 組 的 情況 只能 去 做 天 貓 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165830_5107 is 1.93123 over 260 frames.
CVTE201703_00030_165847_5561 還有 就是 幾年 同學 不 聯繫 微信 問 在 不在 就讓 你 幫忙 刷 好評 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165847_5561 is 2.14821 over 247 frames.
CVTE201703_00030_165907_3088 讀 碩 一般 只要 有 學校 錄取通知書 簽證 肯定 下來 申請 學校 還是 得 靠 你 自己 啊 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165907_3088 is 2.08663 over 307 frames.
CVTE201703_00030_165916_7980 我 認識 一個 叔叔 輩 從前 都是 老實巴交 的 好好先生 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165916_7980 is 1.94317 over 183 frames.
CVTE201703_00030_165929_3456 這樣 即使 有事 故 發生 冷卻 系統 停止 工作 斷電 這裏 仍然 會 保持 在 零下 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165929_3456 is 2.17643 over 290 frames.
LOG (apply-cmvn[5.1]:main():apply-cmvn.cc:146) Applied cepstral mean normalization to 10 utterances, errors on 0
CVTE201703_00030_165942_5013 關於 還款 匯率 希望 大家 不要 被 誤導 當然 這個 火雞 的 答案 並不 對 
LOG (nnet3-latgen-faster[5.1]:DecodeUtteranceLatticeFaster():decoder-wrappers.cc:300) Log-like per frame for utterance CVTE201703_00030_165942_5013 is 2.18779 over 226 frames.
LOG (nnet3-latgen-faster[5.1]:main():nnet3-latgen-faster.cc:256) Time taken 50.7098s: real-time factor assuming 100 frames/sec is 0.573965
LOG (nnet3-latgen-faster[5.1]:main():nnet3-latgen-faster.cc:259) Done 10 utterances, failed for 0
LOG (nnet3-latgen-faster[5.1]:main():nnet3-latgen-faster.cc:261) Overall log-likelihood per frame is 2.06153 over 2945 frames.
LOG (nnet3-latgen-faster[5.1]:~CachingOptimizingCompiler():nnet-optimize.cc:659) 0.0935 seconds taken in nnet3 compilation total (breakdown: 0.0446 compilation, 0.0358 optimization, 0 shortcut expansion, 0.00728 checking, 6.7e-05 computing indexes, 0.00579 misc.)
LOG (lattice-scale[5.1]:main():lattice-scale.cc:90) Done 10 lattices.
# Accounting: time=159 threads=1
# Ended (code 0) at Fri Jun 16 15:40:41 CST 2017, elapsed time 159 seconds
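Two numbers in this log can be checked by hand. The functions below have made-up names and are for illustration only.

The first rounds --frames-per-chunk up to a multiple of --frame-subsampling-factor. This reproduces the "50 to 51" adjustment logged by CheckAndFixConfigs().

The second recomputes the real-time factor:
- The 10 utterances add up to 2945 frames ("over 2945 frames"). These appear to be output frames after 3x subsampling.
- Assuming the timing instead counts input frames at 100 frames/sec, that is 2945 x 3 = 8835 frames, about 88.4 s of audio.
- 50.7098 s / 88.35 s then reproduces the logged 0.573965.

```bash
#!/usr/bin/env bash
# Round N up to the next multiple of K, as nnet3 does for --frames-per-chunk
# when it is not a multiple of --frame-subsampling-factor.
round_up_to_multiple() {
  local n=$1 k=$2
  echo $(( (n + k - 1) / k * k ))
}

# Real-time factor = decoding time / audio duration, with the duration taken
# as (output frames * subsampling factor) / 100 frames per second.
# Treating input frames as exactly 3x the output frames is an approximation.
real_time_factor() {
  local seconds=$1 output_frames=$2 subsampling=$3
  awk -v t="$seconds" -v f="$output_frames" -v s="$subsampling" \
    'BEGIN { printf "%.4f\n", t * 100 / (f * s) }'
}
```

`round_up_to_multiple 50 3` prints 51, and `real_time_factor 50.7098 2945 3` prints 0.5740. A factor below 1 means this single-job, single-thread decode ran faster than real time.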

------------------------------ By analyzing the log, we can see that the following tools were invoked: ------------------------------

# nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz'

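Note: the decoder actually invoked above is nnet3-latgen-faster (built in src/nnet3bin), but the help text below belongs to its GMM counterpart, gmm-latgen-faster. The two share the lattice-decoder options (--beam, --lattice-beam, --max-active, --min-active, --minimize, ...). The nnet3-specific options in the command above (--frame-subsampling-factor, --frames-per-chunk, --extra-left-context, ...) are not listed here; run nnet3-latgen-faster -h to see them.

[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-latgen-faster -h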
/home/houwenbin/kaldi-master/src/gmmbin/gmm-latgen-faster -h 

Generate lattices using GMM-based model.
Usage: gmm-latgen-faster [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ]

Options:
  --acoustic-scale            : Scaling factor for acoustic likelihoods (float, default = 0.1)
  --allow-partial             : If true, produce output even if end state was not reached. (bool, default = false)
  --beam                      : Decoding beam.  Larger->slower, more accurate. (float, default = 16)
  --beam-delta                : Increment used in decoding-- this parameter is obscure and relates to a speedup in the way the max-active constraint is applied.  Larger is more accurate. (float, default = 0.5)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : If true, determinize the lattice (lattice-determinization, keeping only best pdf-sequence for each word-sequence). (bool, default = true)
  --hash-ratio                : Setting used in decoder to control hash behavior (float, default = 2)
  --lattice-beam              : Lattice generation beam.  Larger->slower, and deeper lattices (float, default = 10)
  --max-active                : Decoder max active states.  Larger->slower; more accurate (int, default = 2147483647)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --min-active                : Decoder minimum #active states. (int, default = 200)
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --prune-interval            : Interval (in frames) at which to prune tokens (int, default = 25)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)
  --word-symbol-table         : Symbol table for words [for debug output] (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
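To get a feel for --beam and --max-active, here is a deliberately simplified sketch of how they limit the active hypotheses in one frame. The function is hypothetical, not Kaldi code. It keeps only tokens whose cost is within `beam` of the best one, and at most `max-active` of those. The real lattice-faster decoder differs in three ways:
- it also honors --min-active;
- it adjusts the beam adaptively when the max-active/min-active limits kick in (--beam-delta);
- it prunes the lattice separately with --lattice-beam.

```bash
#!/usr/bin/env bash
# Simplified beam + max-active pruning for one frame.
# Reads token costs (lower = better), one per line, on stdin; prints the
# survivors in ascending cost order.
prune_tokens() {
  local beam=$1 max_active=$2
  LC_ALL=C sort -n | awk -v beam="$beam" -v max="$max_active" '
    NR == 1 { best = $1 }                     # cheapest token sets the reference
    $1 <= best + beam && NR <= max { print }  # inside the beam and under the cap
  '
}
```

With costs 5, 1, 3, 20, 2, a beam of 3 and max-active 2, only 1 and 2 survive. Raising max-active to 100 lets 3 through as well, while 5 and 20 stay outside the beam.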

# lattice-scale --acoustic-scale=10.0 ark:- ark:- 

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/latbin/lattice-scale -h
/home/houwenbin/kaldi-master/src/latbin/lattice-scale -h 

Apply scaling to lattice weights
Usage: lattice-scale [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-scale --lm-scale=0.0 ark:1.lats ark:scaled.lats

Options:
  --acoustic-scale            : Scaling factor for acoustic likelihoods (float, default = 1)
  --acoustic2lm-scale         : Add this times original acoustic costs to LM costs (float, default = 0)
  --inv-acoustic-scale        : An alternative way of setting the acoustic scale: you can set its inverse. (float, default = 1)
  --lm-scale                  : Scaling factor for graph/lm costs (float, default = 1)
  --lm2acoustic-scale         : Add this times original LM costs to acoustic costs (float, default = 0)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
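The decoding command above runs the chain model with --acoustic-scale=1.0 (from --acwt 1.0 in run.sh). It then pipes the lattices through lattice-scale --acoustic-scale=10.0 (from --post-decode-acwt 10.0), so the stored lattices carry acoustic costs 10x larger. As I understand it, this lets Kaldi's scoring scripts work with LM weights around 10, as they would for GMM systems; those scripts typically rescale with --inv-acoustic-scale=LMWT.

The sketch below shows only the per-arc arithmetic described in the options above: graph costs are multiplied by --lm-scale and acoustic costs by --acoustic-scale. The function is hypothetical and the arc costs are made up. The cross terms --acoustic2lm-scale and --lm2acoustic-scale default to 0 and are left out.

```bash
#!/usr/bin/env bash
# Per-arc arithmetic of lattice-scale (cross terms omitted):
#   new_graph_cost    = lm_scale       * graph_cost
#   new_acoustic_cost = acoustic_scale * acoustic_cost
# --inv-acoustic-scale=X corresponds to acoustic_scale = 1/X.
# Usage: scale_arc GRAPH_COST ACOUSTIC_COST ACOUSTIC_SCALE [LM_SCALE]
scale_arc() {
  awk -v g="$1" -v a="$2" -v as="$3" -v ls="${4:-1}" \
    'BEGIN { printf "%g %g\n", ls * g, as * a }'
}
```

An arc with (graph, acoustic) costs (2, 30) becomes (2, 300) after --acoustic-scale=10.0. Rescaling that with --inv-acoustic-scale=10 (i.e. 0.1) gives back (2, 30), the costs the decoder saw with --acoustic-scale=1.0.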

  

Below are some of the tools used during training:

I. Monophone (mono) stage:

1. Initialize the monophone GMM
# gmm-init-mono --shared-phones=data/lang/phones/sets.int "--train-feats=ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/1/utt2spk scp:data/mfcc/train/split8/1/cmvn.scp scp:data/mfcc/train/split8/1/feats.scp ark:- | add-deltas ark:- ark:- | subset-feats --n=10 ark:- ark:-|" data/lang/topo 39 exp/mono/0.mdl exp/mono/tree

[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-init-mono -h    
/home/houwenbin/kaldi-master/src/gmmbin/gmm-init-mono -h 

Initialize monophone GMM.
Usage:  gmm-init-mono <topology-in> <dim> <model-out> <tree-out> 
e.g.: 
 gmm-init-mono topo 39 mono.mdl mono.tree

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --perturb-factor            : Perturb the means using this fraction of standard deviation. (float, default = 0)
  --shared-phones             : rxfilename containing, on each line, a list of phones whose pdfs should be shared. (string, default = "")
  --train-feats               : rspecifier for training features [used to set mean and variance] (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)


2. Compile the training graphs
# compile-train-graphs --read-disambig-syms=data/lang/phones/disambig.int exp/mono/tree exp/mono/0.mdl data/lang/L.fst "ark:sym2int.pl --map-oov 2 -f 2- data/lang/words.txt < data/mfcc/train/split8/2/text|" "ark:|gzip -c >exp/mono/fsts.2.gz"

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/compile-train-graphs -h
/home/houwenbin/kaldi-master/src/bin/compile-train-graphs -h 

Creates training graphs (without transition-probabilities, by default)

Usage:   compile-train-graphs [options] <tree-in> <model-in> <lexicon-fst-in> <transcriptions-rspecifier> <graphs-wspecifier>
e.g.: 
 compile-train-graphs tree 1.mdl lex.fst 'ark:sym2int.pl -f 2- words.txt text|' ark:graphs.fsts

Options:
  --batch-size                : Number of FSTs to compile at a time (more -> faster but uses more memory.  E.g. 500 (int, default = 250)
  --read-disambig-syms        : File containing list of disambiguation symbols in phone symbol table (string, default = "")
  --reorder                   : Reorder transition ids for greater decoding efficiency. (bool, default = true)
  --rm-eps                    : Remove [most] epsilons before minimization (only applicable if disambig symbols present) (bool, default = false)
  --self-loop-scale           : Scale of self-loop vs. non-self-loop probability mass  (float, default = 0)
  --transition-scale          : Scale of transition probabilities (excluding self-loops) (float, default = 0)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
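Before compile-train-graphs can build each utterance's graph, utils/sym2int.pl converts the transcripts to integer word IDs. `--map-oov 2` maps any word missing from words.txt to ID 2 (the OOV word's ID in this setup). `-f 2-` converts fields 2 onward and leaves the utterance ID alone. The awk function below is a rough, hypothetical equivalent meant only to show what that step does; the toy words.txt in the example is made up.

```bash
#!/usr/bin/env bash
# Rough equivalent of: sym2int.pl --map-oov OOV_ID -f 2- WORDS_TXT < TEXT
# Usage: sym2int_sketch WORDS_TXT OOV_ID TEXT
sym2int_sketch() {
  local words_txt=$1 oov_id=$2 text=$3
  awk -v oov="$oov_id" '
    NR == FNR { id[$1] = $2; next }          # first file: word -> integer ID
    {
      out = $1                               # field 1 (utterance ID) kept as-is
      for (i = 2; i <= NF; i++)              # fields 2.. mapped, unknown -> OOV ID
        out = out " " (($i in id) ? id[$i] : oov)
      print out
    }' "$words_txt" "$text"
}
```

With a words.txt containing `hello 3` and `world 4`, the line `utt1 hello kaldi` becomes `utt1 3 2`. Unlike the real script, this sketch does not warn about the OOV words it maps.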

3. Alignment
# align-equal-compiled "ark:gunzip -c exp/mono/fsts.3.gz|" "ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/3/utt2spk scp:data/mfcc/train/split8/3/cmvn.scp scp:data/mfcc/train/split8/3/feats.scp ark:- | add-deltas ark:- ark:- |" ark,t:- | gmm-acc-stats-ali --binary=true exp/mono/0.mdl "ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/3/utt2spk scp:data/mfcc/train/split8/3/cmvn.scp scp:data/mfcc/train/split8/3/feats.scp ark:- | add-deltas ark:- ark:- |" ark:- exp/mono/0.3.acc 

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/align-equal-compiled -h
/home/houwenbin/kaldi-master/src/bin/align-equal-compiled -h 

Write an equally spaced alignment (for getting training started)
Usage:  align-equal-compiled <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier>
e.g.: 
 align-equal-compiled 1.fsts scp:train.scp ark:equal.ali

Options:
  --binary                    : Write output in binary mode (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
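align-equal-compiled gives training a starting point before any trained model exists. As I understand Kaldi's EqualAlign, it picks a (random) path through each utterance's compiled training graph. It then spreads the frames roughly equally over the self-looping HMM states along that path. The hypothetical sketch below shows only that last part: dividing T frames as evenly as possible over N states in sequence.

```bash
#!/usr/bin/env bash
# Split FRAMES as evenly as possible over STATES consecutive HMM states.
# Usage: equal_split FRAMES STATES   (prints one length per state)
equal_split() {
  local frames=$1 states=$2
  awk -v T="$frames" -v N="$states" 'BEGIN {
    base = int(T / N); extra = T % N
    for (s = 1; s <= N; s++) {
      len = base + (s <= extra ? 1 : 0)    # first (T mod N) states get one more
      printf "%d%s", len, (s < N ? " " : "\n")
    }
  }'
}
```

`equal_split 10 3` prints `4 3 3`: 10 frames over 3 states, with the leftover frame going to the first state. gmm-acc-stats-ali then accumulates statistics from this crude alignment for the first update (0.*.acc into 1.mdl).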

# gmm-boost-silence --boost=1.25 1 exp/mono/1.mdl -

[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-boost-silence -h   
/home/houwenbin/kaldi-master/src/gmmbin/gmm-boost-silence -h 

Modify GMM-based model to boost (by a certain factor) all
probabilities associated with the specified phones (could be
all silence phones, or just the ones used for optional silence).
Note: this is done by modifying the GMM weights.  If the silence
model shares a GMM with other models, then it will modify the GMM
weights for all models that may correspond to silence.

Usage:  gmm-boost-silence [options] <silence-phones-list> <model-in> <model-out>
e.g.: gmm-boost-silence --boost=1.5 1:2:3 1.mdl 1_boostsil.mdl

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --boost                     : Factor by which to boost silence probs (float, default = 1.5)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
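The help text says gmm-boost-silence boosts silence probabilities by modifying GMM weights. Assume the scaled weights are not renormalized, which is what makes it a boost. Then every likelihood of an affected pdf is multiplied by the boost factor, so its log-likelihood rises by log(boost). With --boost=1.25 as in the command above, that is about 0.223. The toy check below evaluates a made-up one-dimensional, two-component GMM before and after scaling its weights; the function is hypothetical.

```bash
#!/usr/bin/env bash
# Log-likelihood of x under a toy 1-D GMM whose mixture weights are all
# multiplied by BOOST (weights 0.3/0.7, means -1/2, variances 1/0.5: made up).
# Usage: gmm_loglike X BOOST
gmm_loglike() {
  awk -v x="$1" -v b="$2" 'BEGIN {
    pi = atan2(0, -1)
    split("0.3 0.7", w, " "); split("-1 2", mu, " "); split("1 0.5", v, " ")
    p = 0
    for (i = 1; i <= 2; i++)                  # sum of weighted Gaussian densities
      p += b * w[i] * exp(-(x - mu[i]) ^ 2 / (2 * v[i])) / sqrt(2 * pi * v[i])
    printf "%.6f\n", log(p)
  }'
}
```

`gmm_loglike 0.5 1.25` minus `gmm_loglike 0.5 1` comes out at about 0.223 (= ln 1.25), whatever x you pick.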
  
4. Update (re-estimate the model)
# gmm-est --min-gaussian-occupancy=3 --mix-up=656 --power=0.25 exp/mono/0.mdl 'gmm-sum-accs - exp/mono/0.*.acc|' exp/mono/1.mdl

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-est -h          
/home/houwenbin/kaldi-master/src/gmmbin/gmm-est -h 

Do Maximum Likelihood re-estimation of GMM-based acoustic model
Usage:  gmm-est [options] <model-in> <stats-in> <model-out>
e.g.: gmm-est 1.mdl 1.acc 2.mdl

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --min-count                 : Minimum per-Gaussian count enforced while mixing up and down. (float, default = 20)
  --min-gaussian-occupancy    : MleDiagGmmOptions: Minimum occupancy to update a Gaussian. (float, default = 10)
  --min-gaussian-weight       : MleDiagGmmOptions: Min Gaussian weight before we remove it. (float, default = 1e-05)
  --min-variance              : MleDiagGmmOptions: Variance floor (absolute variance). (double, default = 0.001)
  --mix-down                  : If nonzero, merge mixture components to this target. (int, default = 0)
  --mix-up                    : Increase number of mixture components to this overall target. (int, default = 0)
  --perturb-factor            : While mixing up, perturb means by standard deviation times this factor. (float, default = 0.01)
  --power                     : If mixing up, power to allocate Gaussians to states. (float, default = 0.2)
  --remove-low-count-gaussians : MleDiagGmmOptions: If true, remove Gaussians that fall below the floors. (bool, default = true)
  --share-for-pdfs            : If true, share all transition parameters where the states have the same pdf. (bool, default = false)
  --transition-floor          : Floor for transition probabilities (float, default = 0.01)
  --transition-min-count      : Minimum count required to update transitions from a state (float, default = 5)
  --update-flags              : Which GMM parameters to update: subset of mvwt. (string, default = "mvwt")
  --write-occs                : File to write pdf occupation counts to. (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
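gmm-est's --mix-up and --power (gmm-mixup has the same options) decide how the Gaussian budget is shared among states; the monophone command above uses 656. Roughly, a state's share grows with its occupancy raised to --power. With power 0.25, a state seen 100x more often gets only about 3x (100^0.25 is about 3.16) as many Gaussians. The hypothetical sketch below does a plain proportional split with rounding. Kaldi's own allocation also enforces --min-count and hands Gaussians out incrementally, so its numbers can differ.

```bash
#!/usr/bin/env bash
# Split TOTAL Gaussians across states in proportion to occupancy^POWER.
# Usage: mixup_targets TOTAL POWER OCC1 OCC2 ...
mixup_targets() {
  local total=$1 power=$2
  shift 2
  printf '%s\n' "$@" | awk -v T="$total" -v p="$power" '
    { w[NR] = $1 ^ p; sum += w[NR] }         # occupancy^power per state
    END {
      for (i = 1; i <= NR; i++)              # rounded share of the budget
        printf "%d%s", int(T * w[i] / sum + 0.5), (i < NR ? " " : "\n")
    }'
}
```

`mixup_targets 10 0.25 100 10000` prints `2 8`. Power 1 (purely proportional) gives `0 10`, starving the rare state. The sketch can even round a state down to 0, which the real tool, since it only adds Gaussians when mixing up, would not do.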

# gmm-sum-accs - exp/mono/0.1.acc exp/mono/0.2.acc exp/mono/0.3.acc exp/mono/0.4.acc exp/mono/0.5.acc exp/mono/0.6.acc exp/mono/0.7.acc exp/mono/0.8.acc

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-sum-accs -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-sum-accs -h 

Sum multiple accumulated stats files for GMM training.
Usage: gmm-sum-accs [options] <stats-out> <stats-in1> <stats-in2> ...
E.g.: gmm-sum-accs 1.acc 1.1.acc 1.2.acc

Options:
  --binary                    : Write output in binary mode (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# gmm-acc-stats-ali --binary=true exp/mono/0.mdl 'ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/5/utt2spk scp:data/mfcc/train/split8/5/cmvn.scp scp:data/mfcc/train/split8/5/feats.scp ark:- | add-deltas ark:- ark:- |' ark:- exp/mono/0.5.acc

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-acc-stats-ali -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-acc-stats-ali -h 

Accumulate stats for GMM training.
Usage:  gmm-acc-stats-ali [options] <model-in> <feature-rspecifier> <alignments-rspecifier> <stats-out>
e.g.:
 gmm-acc-stats-ali 1.mdl scp:train.scp ark:1.ali 1.acc

Options:
  --binary                    : Write output in binary mode (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
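gmm-acc-stats-ali gathers, for each pdf, the statistics that gmm-est then turns into new parameters. For a single diagonal Gaussian these are the occupancy (number of aligned frames), the sum of the features and the sum of their squares. The maximum-likelihood update is then mean = sum/count and variance = sumsq/count - mean^2.

The hypothetical toy below does this for one feature dimension. Its input format (`pdf-id value` per line) is made up for illustration. gmm-est additionally applies floors such as --min-variance and --min-gaussian-occupancy.

```bash
#!/usr/bin/env bash
# Accumulate count / sum / sum-of-squares per pdf from "pdf-id value" lines on
# stdin, then print "pdf-id count mean variance" (ML estimates, no floors).
ml_update() {
  awk '
    { n[$1]++; s[$1] += $2; ss[$1] += $2 * $2 }   # accumulation step
    END {
      for (p in n) {                              # update step
        mean = s[p] / n[p]
        printf "%s %g %g %g\n", p, n[p], mean, ss[p] / n[p] - mean * mean
      }
    }' | LC_ALL=C sort -n
}
```

Frames aligned to pdf 0 with values 1 and 3 give count 2, mean 2 and variance 1.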

# add-deltas ark:- ark:-

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/add-deltas -h
/home/houwenbin/kaldi-master/src/featbin/add-deltas -h 

Add deltas (typically to raw mfcc or plp features
Usage: add-deltas [options] in-rspecifier out-wspecifier

Options:
  --delta-order               : Order of delta computation (int, default = 2)
  --delta-window              : Parameter controlling window for delta computation (actual window size for each delta order is 1 + 2*delta-window-size) (int, default = 2)
  --truncate                  : If nonzero, first truncate features to this dimension. (int, default = 0)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
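add-deltas appends time derivatives computed by a regression over neighbouring frames. With the default --delta-window=2, the first-order delta at frame t is:

d[t] = sum over n=1..2 of n*(c[t+n] - c[t-n]) / (2*(1^2 + 2^2))

The denominator is 10. Frames beyond the edges are replaced by the first or last frame.

With the default --delta-order=2, second-order terms are appended as well. So, for example, 13 static MFCCs become 13 x 3 = 39 dimensions, matching the 39 passed to gmm-init-mono earlier. The hypothetical sketch below computes only the first-order deltas for one dimension.

```bash
#!/usr/bin/env bash
# First-order deltas (window N=2) of one feature dimension.
# Input: one line of per-frame values; output: one line of deltas.
deltas() {
  awk -v N=2 '{
    norm = 0
    for (n = 1; n <= N; n++) norm += 2 * n * n   # 2 * (1 + 4) = 10
    for (t = 1; t <= NF; t++) {
      d = 0
      for (n = 1; n <= N; n++) {
        hi = (t + n > NF) ? NF : t + n           # clamp to the last frame
        lo = (t - n < 1) ? 1 : t - n             # clamp to the first frame
        d += n * ($hi - $lo)
      }
      printf "%g%s", d / norm, (t < NF ? " " : "\n")
    }
  }'
}
```

For the linear trajectory 1 2 3 4 5, the output is 0.5 0.8 1 0.8 0.5. That is the true slope of 1 in the middle, damped towards the edges by the frame replication.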
  
# apply-cmvn --utt2spk=ark:data/mfcc/train/split8/5/utt2spk scp:data/mfcc/train/split8/5/cmvn.scp scp:data/mfcc/train/split8/5/feats.scp ark:-

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/apply-cmvn -h      
/home/houwenbin/kaldi-master/src/featbin/apply-cmvn -h 

Apply cepstral mean and (optionally) variance normalization
Per-utterance by default, or per-speaker if utt2spk option provided
Usage: apply-cmvn [options] (<cmvn-stats-rspecifier>|<cmvn-stats-rxfilename>) <feats-rspecifier> <feats-wspecifier>
e.g.: apply-cmvn --utt2spk=ark:data/train/utt2spk scp:data/train/cmvn.scp scp:data/train/feats.scp ark:-
See also: modify-cmvn-stats, matrix-sum, compute-cmvn-stats

Options:
  --norm-means                : You can set this to false to turn off mean normalization.  Note, the same can be achieved by using 'fake' CMVN stats; see the --fake option to compute_cmvn_stats.sh (bool, default = true)
  --norm-vars                 : If true, normalize variances. (bool, default = false)
  --reverse                   : If true, apply CMVN in a reverse sense, so as to transform zero-mean, unit-variance input into data with the given mean and variance. (bool, default = false)
  --skip-dims                 : Dimensions for which to skip normalization: colon-separated list of integers, e.g. 13:14:15) (string, default = "")
  --utt2spk                   : rspecifier for utterance to speaker map (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
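The decoding command earlier runs apply-cmvn with --norm-means=true --norm-vars=false, i.e. mean normalization only. The CMVN statistics hold the sum of every feature dimension plus a frame count, and sums of squares for the optional variance normalization. As the help text above says, they apply per speaker when utt2spk is given and per utterance otherwise. Mean normalization subtracts sum/count from each frame. The sketch below does the mean-only case for one dimension; both the function and its input format are hypothetical.

```bash
#!/usr/bin/env bash
# Per-speaker cepstral mean normalization of one feature dimension.
# Input lines: "speaker value"; output: "speaker normalized-value", same order.
cmn() {
  awk '
    { spk[NR] = $1; x[NR] = $2; sum[$1] += $2; cnt[$1]++ }   # stats pass
    END {
      for (i = 1; i <= NR; i++)                              # apply pass
        printf "%s %g\n", spk[i], x[i] - sum[spk[i]] / cnt[spk[i]]
    }'
}
```

Speaker a's values 1, 2, 3 become -1, 0, 1, and speaker b's 10, 20 become -5, 5.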

5. analyze_alignments
# ali-to-phones --write-lengths=true exp/mono/final.mdl 'ark:gunzip -c exp/mono/ali.1.gz|' ark,t:-

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/ali-to-phones -h       
/home/houwenbin/kaldi-master/src/bin/ali-to-phones -h 

Convert model-level alignments to phone-sequences (in integer, not text, form)
Usage:  ali-to-phones  [options] <model> <alignments-rspecifier> <phone-transcript-wspecifier|ctm-wxfilename>
e.g.: 
 ali-to-phones 1.mdl ark:1.ali ark:-
or:
 ali-to-phones --ctm-output 1.mdl ark:1.ali 1.ctm
See also: show-alignments lattice-align-phones

Options:
  --ctm-output                : If true, output the alignments in ctm format (the confidences will be set to 1) (bool, default = false)
  --frame-shift               : frame shift used to control the times of the ctm output (float, default = 0.01)
  --per-frame                 : If true, write out the frame-level phone alignment (else phone sequence) (bool, default = false)
  --write-lengths             : If true, write the #frames for each phone (different format) (bool, default = false)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
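With --write-lengths=true, ali-to-phones writes each utterance in text form as the utterance ID followed by `phone length` pairs separated by semicolons, with lengths in frames. This is the command the analyze_alignments step above runs. As an illustration, the hypothetical awk helper below totals the frames per phone ID across utterances. Multiply by the --frame-shift (0.01 s by default) to get seconds.

```bash
#!/usr/bin/env bash
# Total frames per phone ID from lines like "utt-id 1 5 ; 23 8 ; 1 3" on stdin.
# Prints "phone-id total-frames", sorted by phone ID.
phone_frame_totals() {
  awk '{
    sub(/^[^ ]+ +/, "")                       # drop the utterance ID
    n = split($0, pairs, ";")
    for (i = 1; i <= n; i++)
      if (split(pairs[i], f, " ") == 2) total[f[1]] += f[2]
  }
  END { for (p in total) print p, total[p] }' | LC_ALL=C sort -n
}
```

Phone IDs map back to phone names through data/lang/phones.txt.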


II. Triphone (tri1) stage:

# sum-tree-stats exp/tri1/treeacc exp/tri1/1.treeacc exp/tri1/2.treeacc exp/tri1/3.treeacc exp/tri1/4.treeacc exp/tri1/5.treeacc exp/tri1/6.treeacc exp/tri1/7.treeacc exp/tri1/8.treeacc

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/sum-tree-stats -h
/home/houwenbin/kaldi-master/src/bin/sum-tree-stats -h 

Sum statistics for phonetic-context tree building.
Usage:  sum-tree-stats [options] tree-accs-out tree-accs-in1 tree-accs-in2 ...
e.g.: 
 sum-tree-stats treeacc 1.treeacc 2.treeacc 3.treeacc

Options:
  --binary                    : Write output in binary mode (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# cluster-phones exp/tri1/treeacc data/lang/phones/sets.int exp/tri1/questions.int

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/cluster-phones -h
/home/houwenbin/kaldi-master/src/bin/cluster-phones -h 

Cluster phones (or sets of phones) into sets for various purposes
Usage:  cluster-phones [options] <tree-stats-in> <phone-sets-in> <clustered-phones-out>
e.g.: 
 cluster-phones 1.tacc phonesets.txt questions.txt

Options:
  --central-position          : Central position in context window [must match acc-tree-stats] (int, default = 1)
  --context-width             : Does not have any effect-- included for scripting convenience. (int, default = 3)
  --mode                      : Mode of operation: "questions"->sets suitable for decision trees; "k-means"->k-means algorithm, output k classes (set num-classes options)
 (string, default = "questions")
  --num-classes               : For k-means mode, number of classes. (int, default = -1)
  --pdf-class-list            : Colon-separated list of HMM positions to consider [Default = 1: just central position for 3-state models]. (string, default = "1")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# gmm-mixup --mix-up=2000 exp/tri1/1.mdl exp/tri1/1.occs exp/tri1/1.mdl  

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-mixup -h  
/home/houwenbin/kaldi-master/src/gmmbin/gmm-mixup -h 

Does GMM mixing up (and Gaussian merging)
Usage:  gmm-mixup [options] <model-in> <state-occs-in> <model-out>
e.g. of mixing up:
 gmm-mixup --mix-up=4000 1.mdl 1.occs 2.mdl
e.g. of merging:
 gmm-mixup --merge=2000 1.mdl 1.occs 2.mdl

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --min-count                 : Minimum count enforced while mixing up. (float, default = 20)
  --mix-down                  : If nonzero, merge mixture components to this target. (int, default = 0)
  --mix-up                    : Increase number of mixture components to this overall target. (int, default = 0)
  --perturb-factor            : While mixing up, perturb means by standard deviation times this factor. (float, default = 0.01)
  --power                     : If mixing up, power to allocate Gaussians to states. (float, default = 0.2)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# gmm-init-model --write-occs=exp/tri1/1.occs exp/tri1/tree exp/tri1/treeacc data/lang/topo exp/tri1/1.mdl

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-init-model -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-init-model -h 

Initialize GMM from decision tree and tree stats
Usage:  gmm-init-model [options] <tree-in> <tree-stats-in> <topo-file> <model-out> [<old-tree> <old-model>]
e.g.: 
  gmm-init-model tree treeacc topo 1.mdl
or (initializing GMMs with old model):
  gmm-init-model tree treeacc topo 1.mdl prev/tree prev/30.mdl

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --var-floor                 : Variance floor used while initializing Gaussians (double, default = 0.01)
  --write-occs                : File to write state occupancies to. (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# convert-ali exp/mono_ali/final.mdl exp/tri1/1.mdl exp/tri1/tree "ark:gunzip -c exp/mono_ali/ali.2.gz|" "ark:|gzip -c >exp/tri1/ali.2.gz"

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/convert-ali -h   
/home/houwenbin/kaldi-master/src/bin/convert-ali -h 

Convert alignments from one decision-tree/model to another
Usage:  convert-ali  [options] <old-model> <new-model> <new-tree> <old-alignments-rspecifier> <new-alignments-wspecifier>
e.g.: 
 convert-ali old/final.mdl new/0.mdl new/tree ark:old/ali.1 ark:new/ali.1

Options:
  --frame-subsampling-factor  : Can be used in converting alignments to reduced frame rates. (int, default = 1)
  --phone-map                 : File name containing old->new phone mapping (each line is: old-integer-id new-integer-id) (string, default = "")
  --reorder                   : True if you want the converted alignments to be 'reordered' versus the way they appear in the HmmTopology object (bool, default = true)
  --repeat-frames             : Only relevant when frame-subsampling-factor != 1.  If true, repeat frames of alignment by 'frame-subsampling-factor' after alignment conversion, to keep the alignment the same length as the input alignment. (bool, default = false)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
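`--frame-subsampling-factor` is what lets a full-frame-rate GMM alignment be reused for a model that runs at a reduced frame rate (the chain decode in this post uses `--frame-subsampling-factor=3`). The sketch below shows only the frame-rate bookkeeping. It assumes we simply keep every N-th frame and, for `--repeat-frames`, stretch the result back to the original length. `subsample_alignment` is a hypothetical helper. The real convert-ali also maps transition-ids between the old and new model/tree, which the sketch does not attempt.

```python
# Hypothetical sketch of the frame-rate side of convert-ali; not Kaldi's implementation.
def subsample_alignment(ali, factor, repeat_frames=False):
    """Keep every `factor`-th frame of a per-frame alignment.

    With repeat_frames=True each kept frame is repeated `factor` times and
    the result is trimmed so it has the same length as the input alignment.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    reduced = ali[::factor]
    if not repeat_frames:
        return reduced
    expanded = [x for x in reduced for _ in range(factor)]
    return expanded[:len(ali)]


ali = [1, 1, 1, 2, 2, 2, 3]
print(subsample_alignment(ali, 3))                      # [1, 2, 3]
print(subsample_alignment(ali, 3, repeat_frames=True))  # [1, 1, 1, 2, 2, 2, 3]
```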

# compile-questions data/lang/topo exp/tri1/questions.int exp/tri1/questions.qst

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/compile-questions -h
/home/houwenbin/kaldi-master/src/bin/compile-questions -h 

Compile questions
Usage:  compile-questions [options] <topo> <questions-text-file> <questions-out>
e.g.: 
 compile-questions questions.txt questions.qst

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --central-position          : Central position in phone context window [must match acc-tree-stats] (int, default = 1)
  --context-width             : Context window size [must match acc-tree-stats]. (int, default = 3)
  --num-iters-refine          : Number of iters of refining questions at each node.  >0 --> questions not refined (int, default = 0)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# build-tree --verbose=1 --max-leaves=2000 --cluster-thresh=-1 exp/tri1/treeacc data/lang/phones/roots.int exp/tri1/questions.qst data/lang/topo exp/tri1/tree

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/build-tree -h       
/home/houwenbin/kaldi-master/src/bin/build-tree -h 

Train decision tree
Usage:  build-tree [options] <tree-stats-in> <roots-file> <questions-file> <topo-file> <tree-out>
e.g.: 
 build-tree treeacc roots.txt 1.qst topo tree

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --central-position          : Central position in context window [must match acc-tree-stats] (int, default = 1)
  --cluster-thresh            : Log-likelihood change threshold for clustering after tree-building.  0 means no clustering; -1 means use as a clustering threshold the likelihood change of the final split. (float, default = -1)
  --context-width             : Context window size [must match acc-tree-stats] (int, default = 3)
  --max-leaves                : Maximum number of leaves to be used in tree-buliding (if positive) (int, default = 0)
  --thresh                    : Log-likelihood change threshold for tree-building (float, default = 300)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
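Roughly speaking, build-tree grows the tree greedily. At each step it takes the split (a question from questions.qst applied to some leaf) with the largest log-likelihood gain. It stops once the best gain drops below `--thresh` or the leaf count reaches `--max-leaves`. Below is a heavily simplified sketch of that loop. It assumes 1-dimensional features, a single Gaussian per leaf, and questions given as plain sets of context labels. `build_tree` and `set_loglike` are hypothetical helpers. The per-root trees from roots.int and the `--cluster-thresh` post-clustering step are left out.

```python
import math

# Hypothetical, heavily simplified greedy tree building; not Kaldi's implementation.


def set_loglike(stats, var_floor=1e-3):
    """Log-likelihood of pooled 1-D stats (count, sum, sum_sq) under one ML Gaussian."""
    n = sum(s[0] for s in stats)
    if n == 0:
        return 0.0
    mean = sum(s[1] for s in stats) / n
    var = max(sum(s[2] for s in stats) / n - mean * mean, var_floor)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def build_tree(stats, questions, thresh=300.0, max_leaves=0):
    """Greedy splitter.

    stats: dict context -> (count, sum, sum_sq); questions: list of sets of contexts.
    Returns the leaves as a list of sets of contexts.
    """
    leaves = [set(stats)]
    while max_leaves <= 0 or len(leaves) < max_leaves:
        best = None  # (gain, leaf index, yes-set, no-set)
        for i, leaf in enumerate(leaves):
            base = set_loglike([stats[c] for c in leaf])
            for q in questions:
                yes, no = leaf & q, leaf - q
                if not yes or not no:
                    continue  # this question does not split this leaf
                gain = (set_loglike([stats[c] for c in yes])
                        + set_loglike([stats[c] for c in no]) - base)
                if best is None or gain > best[0]:
                    best = (gain, i, yes, no)
        if best is None or best[0] < thresh:
            break
        _, i, yes, no = best
        leaves[i:i + 1] = [yes, no]
    return leaves


# Contexts a, b have mean 0; c, d have mean 10 (100 frames each, variance 1).
stats = {"a": (100, 0.0, 100.0), "b": (100, 0.0, 100.0),
         "c": (100, 1000.0, 10100.0), "d": (100, 1000.0, 10100.0)}
questions = [{"a", "b"}, {"a", "c"}]
print(sorted(sorted(leaf) for leaf in build_tree(stats, questions)))
# [['a', 'b'], ['c', 'd']]
```

The first split gains about 200·ln 26 ≈ 652, which clears the default threshold of 300. No later split gains anything, so the loop stops at two leaves.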

  
  
2. tri2b

# transform-feats exp/tri2b/0.mat ark:- ark:-

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/transform-feats -h
/home/houwenbin/kaldi-master/src/featbin/transform-feats -h 

Apply transform (e.g. LDA; HLDA; fMLLR/CMLLR; MLLT/STC)
Linear transform if transform-num-cols == feature-dim, affine if
transform-num-cols == feature-dim+1 (->append 1.0 to features)
Per-utterance by default, or per-speaker if utt2spk option provided
Global if transform-rxfilename provided.
Usage: transform-feats [options] (<transform-rspecifier>|<transform-rxfilename>) <feats-rspecifier> <feats-wspecifier>
See also: transform-vec, copy-feats, compose-transforms

Options:
  --utt2spk                   : rspecifier for utterance to speaker map (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
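The linear-vs-affine rule in the help text is easy to state in code. If the matrix has as many columns as the features, it is applied as-is. If it has one extra column, a 1.0 is appended to each frame so that the last column acts as an offset. The sketch below assumes small in-memory lists rather than Kaldi archives. `transform_frame`, `transform_feats` and `transform_per_speaker` are hypothetical names, and the dictionary lookup in the last one stands in for `--utt2spk`.

```python
# Hypothetical sketch of transform-feats semantics; not Kaldi's implementation.
def transform_frame(matrix, frame):
    """Apply a linear (cols == dim) or affine (cols == dim + 1) transform to one frame."""
    cols = len(matrix[0])
    if cols == len(frame):
        x = list(frame)
    elif cols == len(frame) + 1:
        x = list(frame) + [1.0]  # affine: append 1.0 so the last column is an offset
    else:
        raise ValueError(f"transform has {cols} columns, feature dim is {len(frame)}")
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def transform_feats(matrix, feats):
    """Apply one transform to every frame of an utterance."""
    return [transform_frame(matrix, frame) for frame in feats]


def transform_per_speaker(transforms, utt2spk, utt_feats):
    """Per-speaker mode: look up each utterance's speaker, then that speaker's transform."""
    return {utt: transform_feats(transforms[utt2spk[utt]], feats)
            for utt, feats in utt_feats.items()}


print(transform_frame([[2, 0], [0, 3]], [1.0, 1.0]))         # linear: [2.0, 3.0]
print(transform_frame([[1, 0, 5], [0, 1, -1]], [1.0, 1.0]))  # affine: [6.0, 0.0]
```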

# splice-feats --left-context=3 --right-context=3 ark:- ark:-

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/splice-feats -h   
/home/houwenbin/kaldi-master/src/featbin/splice-feats -h 

Splice features with left and right context (e.g. prior to LDA)
Usage: splice-feats [options] <feature-rspecifier> <feature-wspecifier>
e.g.: splice-feats scp:feats.scp ark:-

Options:
  --left-context              : Number of frames of left context (int, default = 4)
  --right-context             : Number of frames of right context (int, default = 4)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)  
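Splicing stacks each frame with its neighbours. With `--left-context=3 --right-context=3`, as in the commands here, each output frame covers 7 input frames. For 13-dimensional MFCCs, for example, that gives 91 dimensions, which est-lda later projects down to `--dim=40`. Below is a minimal sketch. It assumes that frames past the start or end of the utterance are filled by repeating the first or last frame, which is my understanding of how Kaldi handles the edges. `splice_feats` here is a hypothetical Python function, not the binary.

```python
# Hypothetical sketch of frame splicing; not Kaldi's implementation.
def splice_feats(feats, left_context=4, right_context=4):
    """Concatenate each frame with `left_context` previous and `right_context` next frames.

    Output dimension is (left_context + 1 + right_context) * input dimension.
    Out-of-range neighbours are clamped to the first/last frame.
    """
    num_frames = len(feats)
    spliced = []
    for t in range(num_frames):
        row = []
        for offset in range(-left_context, right_context + 1):
            t2 = min(max(t + offset, 0), num_frames - 1)
            row.extend(feats[t2])
        spliced.append(row)
    return spliced


print(splice_feats([[1], [2], [3]], left_context=1, right_context=1))
# [[1, 1, 2], [1, 2, 3], [2, 3, 3]]
print(len(splice_feats([[0.0] * 13] * 5, 3, 3)[0]))  # 91
```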
  
# LDA
# weight-silence-post 0.0 1 exp/tri1_ali/final.mdl ark:- ark:-

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/weight-silence-post -h
/home/houwenbin/kaldi-master/src/bin/weight-silence-post -h 

Apply weight to silences in posts
Usage:  weight-silence-post [options] <silence-weight> <silence-phones> <model> <posteriors-rspecifier> <posteriors-wspecifier>
e.g.:
 weight-silence-post 0.0 1:2:3 1.mdl ark:1.post ark:nosil.post

Options:
  --distribute                : If true, rather than weighting the individual posteriors, apply the weighting to the whole frame: i.e. on time t, scale all posterior entries by p(sil)*silence-weight + p(non-sil)*1.0 (bool, default = false)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
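In the LDA step, the alignments are turned into posteriors and silence is then down-weighted (weight 0.0 in the command above), presumably so that silence frames do not dominate the LDA statistics. The sketch below mirrors the two behaviours described in the help text. It assumes posteriors are lists of (transition-id, weight) pairs per frame, and that the model's transition-id to phone mapping is available as a plain dictionary. `weight_silence_post`, `parse_phone_list` and `tid2phone` are hypothetical. The sketch drops entries whose weight becomes exactly zero. In `distribute` mode it takes p(sil) to be the silence share of the frame's total posterior mass.

```python
# Hypothetical sketch of weight-silence-post semantics; not Kaldi's implementation.
def parse_phone_list(text):
    """'1:2:3' -> {1, 2, 3}, the colon-separated format of <silence-phones>."""
    return {int(p) for p in text.split(":") if p}


def weight_silence_post(posts, silence_weight, silence_phones, tid2phone,
                        distribute=False):
    """Scale silence posteriors; posts is a list of frames of (tid, weight) pairs."""
    out = []
    for frame in posts:
        if distribute:
            # Scale the whole frame by p(sil) * silence_weight + p(non-sil) * 1.0.
            sil = sum(w for tid, w in frame if tid2phone[tid] in silence_phones)
            total = sum(w for _, w in frame)
            p_sil = sil / total if total > 0 else 0.0
            scale = p_sil * silence_weight + (1.0 - p_sil) * 1.0
            new_frame = [(tid, w * scale) for tid, w in frame] if scale != 0.0 else []
        else:
            # Scale only the individual silence entries.
            new_frame = []
            for tid, w in frame:
                scale = silence_weight if tid2phone[tid] in silence_phones else 1.0
                if scale != 0.0:
                    new_frame.append((tid, w * scale))
        out.append(new_frame)
    return out


tid2phone = {1: 1, 2: 1, 3: 5}  # hypothetical: tids 1 and 2 belong to silence phone 1
posts = [[(1, 1.0)], [(2, 0.5), (3, 0.5)]]
sil = parse_phone_list("1")
print(weight_silence_post(posts, 0.0, sil, tid2phone))
# [[], [(3, 0.5)]]
print(weight_silence_post(posts, 0.0, sil, tid2phone, distribute=True))
# [[], [(2, 0.25), (3, 0.25)]]
```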

# ali-to-post 'ark:gunzip -c exp/tri1_ali/ali.7.gz|' ark:-

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/ali-to-post -h        
/home/houwenbin/kaldi-master/src/bin/ali-to-post -h 

Convert alignments to posteriors.  This is simply a format change
from integer vectors to Posteriors, which are vectors of lists of
pairs (int, float) where the float represents the posterior.  The
floats would all be 1.0 in this case.
The posteriors will still be in terms of whatever integer index
the input contained, which will be transition-ids if they came
directly from decoding, or pdf-ids if they were processed by
ali-to-post.
Usage:  ali-to-post [options] <alignments-rspecifier> <posteriors-wspecifier>
e.g.:
 ali-to-post ark:1.ali ark:1.post
See also: ali-to-pdf, ali-to-phones, show-alignments, post-to-weights

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
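As the help says, ali-to-post only changes the container type. Each frame's single id becomes a one-element list of (id, 1.0) pairs, which is the form that weight-silence-post and acc-lda consume in this LDA pipeline. Here is a one-function sketch in Python; `ali_to_post` is a hypothetical name, not the binary.

```python
# Hypothetical sketch of ali-to-post: a pure format change.
def ali_to_post(alignment):
    """Turn a per-frame alignment into per-frame posteriors with weight 1.0."""
    return [[(idx, 1.0)] for idx in alignment]


post = ali_to_post([5, 5, 7])
print(post)  # [[(5, 1.0)], [(5, 1.0)], [(7, 1.0)]]
# Each frame's posterior mass is exactly 1.0.
print(all(sum(w for _, w in frame) == 1.0 for frame in post))  # True
```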

# acc-lda --rand-prune=4.0 exp/tri1_ali/final.mdl 'ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/7/utt2spk scp:data/mfcc/train/split8/7/cmvn.scp scp:data/mfcc/train/split8/7/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- |' ark,s,cs:- exp/tri2b/lda.7.acc

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/acc-lda -h    
/home/houwenbin/kaldi-master/src/bin/acc-lda -h 

Accumulate LDA statistics based on pdf-ids.
Usage:  acc-lda [options] <transition-gmm/model> <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>
Typical usage:
 ali-to-post ark:1.ali ark:- | lda-acc 1.mdl "ark:splice-feats scp:train.scp|"  ark:- ldaacc.1

Options:
  --binary                    : Write accumulators in binary mode. (bool, default = true)
  --rand-prune                : Randomized pruning threshold for posteriors (float, default = 0)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
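acc-lda gathers, for each class (pdf-id), the posterior-weighted count and feature sum, plus a global scatter matrix. The within-class covariance W and between-class covariance B follow from those. est-lda then keeps the `--dim` directions that maximise between-class relative to within-class variance (the generalised eigenproblem B v = λ W v). The pure-Python sketch below covers only the accumulation and the W/B computation. It assumes posteriors are already on class ids (the real tool maps transition-ids to pdf-ids through the model) and ignores `--rand-prune`. `acc_lda` and `lda_covariances` are hypothetical names.

```python
# Hypothetical sketch of LDA statistics; not Kaldi's implementation.
def acc_lda(feats, posts):
    """Accumulate LDA stats; posts[t] is a list of (class_id, weight) for frame t."""
    dim = len(feats[0])
    counts, sums = {}, {}
    total_count = 0.0
    scatter = [[0.0] * dim for _ in range(dim)]  # sum over frames of w * x x^T
    for x, frame_post in zip(feats, posts):
        for c, w in frame_post:
            counts[c] = counts.get(c, 0.0) + w
            s = sums.setdefault(c, [0.0] * dim)
            for i in range(dim):
                s[i] += w * x[i]
                for j in range(dim):
                    scatter[i][j] += w * x[i] * x[j]
            total_count += w
    return counts, sums, total_count, scatter


def lda_covariances(counts, sums, total_count, scatter):
    """Return (W, B): within-class and between-class covariance matrices."""
    dim = len(scatter)
    mean = [sum(s[i] for s in sums.values()) / total_count for i in range(dim)]
    total_cov = [[scatter[i][j] / total_count - mean[i] * mean[j]
                  for j in range(dim)] for i in range(dim)]
    between = [[0.0] * dim for _ in range(dim)]
    for c, n in counts.items():
        class_mean = [v / n for v in sums[c]]
        for i in range(dim):
            for j in range(dim):
                between[i][j] += ((n / total_count)
                                  * (class_mean[i] - mean[i]) * (class_mean[j] - mean[j]))
    # Total covariance = within-class + between-class, so W is the difference.
    within = [[total_cov[i][j] - between[i][j] for j in range(dim)] for i in range(dim)]
    return within, between


feats = [[0.0], [2.0], [10.0], [12.0]]
posts = [[(0, 1.0)], [(0, 1.0)], [(1, 1.0)], [(1, 1.0)]]
W, B = lda_covariances(*acc_lda(feats, posts))
print(W, B)  # [[1.0]] [[25.0]]
```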

# est-lda --write-full-matrix=exp/tri2b/full.mat --dim=40 exp/tri2b/0.mat exp/tri2b/lda.1.acc exp/tri2b/lda.2.acc exp/tri2b/lda.3.acc exp/tri2b/lda.4.acc exp/tri2b/lda.5.acc exp/tri2b/lda.6.acc exp/tri2b/lda.7.acc exp/tri2b/lda.8.acc

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/bin/est-lda -h
/home/houwenbin/kaldi-master/src/bin/est-lda -h 

Estimate LDA transform using stats obtained with acc-lda.
Usage:  est-lda [options] <lda-matrix-out> <lda-acc-1> <lda-acc-2> ...

Options:
  --allow-large-dim           : If true, allow an LDA dimension larger than the number of classes. (bool, default = false)
  --binary                    : Write matrix in binary mode. (bool, default = true)
  --dim                       : Dimension to project to with LDA (int, default = 40)
  --remove-offset             : If true, output an affine transform that makes the projected data mean equal to zero. (bool, default = false)
  --within-class-factor       : (Deprecated) If 1.0, do conventional LDA where the within-class variance will be unit in the projected space.  May be set to less than 1.0, which scales the features to have less variance, particularly for dimensions where between-class variance is small; this is a feature being experimented with for neural-net input. (float, default = 1)
  --write-full-matrix         : Write full LDA matrix to this location. (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# gmm-acc-mllt --rand-prune=4.0 exp/tri2b/2.mdl 'ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/1/utt2spk scp:data/mfcc/train/split8/1/cmvn.scp scp:data/mfcc/train/split8/1/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/tri2b/0.mat ark:- ark:- |' ark:- exp/tri2b/2.1.macc

[houwenbin@localhost online_demo]$  
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-acc-mllt -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-acc-mllt -h 

Accumulate MLLT (global STC) statistics
Usage:  gmm-acc-mllt [options] <model-in> <feature-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.: 
 gmm-acc-mllt 1.mdl scp:train.scp ark:1.post 1.macc

Options:
  --binary                    : Write output in binary mode (bool, default = true)
  --rand-prune                : Randomized pruning parameter to speed up accumulation (larger -> more pruning.  May exceed one). (float, default = 0.25)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# gmm-transform-means exp/tri2b/2.mat.new exp/tri2b/2.mdl exp/tri2b/2.mdl

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-transform-means -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-transform-means -h 

Transform GMM means with linear or affine transform
Usage:  gmm-transform-means <transform-matrix> <model-in> <model-out>
e.g.: gmm-transform-means 2.mat 2.mdl 3.mdl

Options:
  --binary                    : Write output in binary mode (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# gmm-est-fmllr --fmllr-update-type=full --spk2utt=ark:data/mfcc/train/split8/8/spk2utt exp/tri3b/12.mdl 'ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/8/utt2spk scp:data/mfcc/train/split8/8/cmvn.scp scp:data/mfcc/train/split8/8/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/tri2b_ali/final.mat ark:- ark:- | transform-feats --utt2spk=ark:data/mfcc/train/split8/8/utt2spk ark:exp/tri3b/trans.8 ark:- ark:- |' ark:- ark:exp/tri3b/tmp_trans.8

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-est-fmllr -h      
/home/houwenbin/kaldi-master/src/gmmbin/gmm-est-fmllr -h 

Estimate global fMLLR transforms, either per utterance or for the supplied
set of speakers (spk2utt option).  Reads posteriors (on transition-ids).  Writes
to a table of matrices.
Usage: gmm-est-fmllr [options] <model-in> <feature-rspecifier> <post-rspecifier> <transform-wspecifier>

Options:
  --fmllr-min-count           : Minimum count required to update fMLLR (float, default = 500)
  --fmllr-num-iters           : Number of iterations in fMLLR update phase. (int, default = 40)
  --fmllr-update-type         : Update type for fMLLR ("full"|"diag"|"offset"|"none") (string, default = "full")
  --spk2utt                   : rspecifier for speaker to utterance-list map (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)

# compose-transforms --b-is-affine=true ark:exp/tri3b/tmp_trans.8 ark:exp/tri3b/trans.8 ark:exp/tri3b/composed_trans.8

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/featbin/compose-transforms -h
/home/houwenbin/kaldi-master/src/featbin/compose-transforms -h 

Compose (affine or linear) feature transforms
Usage: compose-transforms [options] (<transform-A-rspecifier>|<transform-A-rxfilename>) (<transform-B-rspecifier>|<transform-B-rxfilename>) (<transform-out-wspecifier>|<transform-out-wxfilename>)
 Note: it does matrix multiplication (A B) so B is the transform that gets applied
  to the features first.  If b-is-affine = true, then assume last column of b corresponds to offset
 e.g.: compose-transforms 1.mat 2.mat 3.mat
   compose-transforms 1.mat ark:2.trans ark:3.trans
   compose-transforms ark:1.trans ark:2.trans ark:3.trans
 See also: transform-feats, transform-vec, extend-transform-dim, est-lda, est-pca

Options:
  --b-is-affine               : If true, treat last column of transform b as an offset term (only relevant if a is affine) (bool, default = false)
  --binary                    : Write in binary mode (only relevant if output is a wxfilename) (bool, default = true)
  --utt2spk                   : rspecifier for utterance to speaker map (if mixing utterance and speaker ids) (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
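Composition is a matrix product with some bookkeeping for offsets. B acts first, so C = A·B, and when A is affine its extra column has to be handled separately. In the fMLLR loop above, A would be the freshly estimated tmp_trans.8 and B the previous trans.8 (`--b-is-affine=true`). Below is a minimal sketch with transforms as lists of rows. `compose_transforms` and `matmul` are hypothetical helpers. The handling of the affine cases is my reading of the help text, not Kaldi's code.

```python
# Hypothetical sketch of transform composition; not Kaldi's implementation.
def matmul(a, b):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def compose_transforms(a, b, b_is_affine=False):
    """Return C such that applying C equals applying B first, then A."""
    a_cols, b_rows = len(a[0]), len(b)
    if a_cols == b_rows:
        # A is linear: a plain product works whether B is linear or affine.
        return matmul(a, b)
    if a_cols != b_rows + 1:
        raise ValueError("dimension mismatch between A and B")
    # A is affine: separate its linear part from its offset column.
    a_lin = [row[:-1] for row in a]
    a_off = [row[-1] for row in a]
    c = matmul(a_lin, b)
    for i, row in enumerate(c):
        if b_is_affine:
            row[-1] += a_off[i]   # B's offset (already mapped by A_lin) plus A's offset
        else:
            row.append(a_off[i])  # B linear: the result gains A's offset as a new column
    return c


# 1-D example: A(x) = 2x + 1 and B(x) = 3x + 4, so A(B(x)) = 6x + 9.
print(compose_transforms([[2, 1]], [[3, 4]], b_is_affine=True))  # [[6, 9]]
print(compose_transforms([[2]], [[3, 4]]))                       # [[6, 8]]
```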

# gmm-acc-stats-twofeats exp/tri3b/35.mdl 'ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/8/utt2spk scp:data/mfcc/train/split8/8/cmvn.scp scp:data/mfcc/train/split8/8/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/tri2b_ali/final.mat ark:- ark:- | transform-feats --utt2spk=ark:data/mfcc/train/split8/8/utt2spk ark:exp/tri3b/trans.8 ark:- ark:- |' 'ark,s,cs:apply-cmvn  --utt2spk=ark:data/mfcc/train/split8/8/utt2spk scp:data/mfcc/train/split8/8/cmvn.scp scp:data/mfcc/train/split8/8/feats.scp ark:- | splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats exp/tri2b_ali/final.mat ark:- ark:- |' ark,s,cs:- exp/tri3b/35.8.acc

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-acc-stats-twofeats -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-acc-stats-twofeats -h 

Accumulate stats for GMM training, computing posteriors with one set of features
but accumulating statistics with another.
First features are used to get posteriors, second to accumulate stats
Usage:  gmm-acc-stats-twofeats [options] <model-in> <feature1-rspecifier> <feature2-rspecifier> <posteriors-rspecifier> <stats-out>
e.g.: 
 gmm-acc-stats-twofeats 1.mdl 1.ali scp:train.scp scp:train_new.scp ark:1.ali 1.acc

Options:
  --binary                    : Write output in binary mode (bool, default = true)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
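In the command above, the first feature pipeline ends with the per-speaker fMLLR transform (trans.8) and the second stops before it. So the Gaussian-level posteriors come from the adapted features, while the statistics are collected on the unadapted ones. The sketch below shows that split for a single pdf modelled by a 1-D GMM, accumulating only occupancies and first-order sums. `acc_stats_twofeats` and the `(weight, mean, var)` component format are hypothetical. The real tool works with multivariate diagonal GMMs, transition-id posteriors and second-order statistics.

```python
import math

# Hypothetical sketch of "posteriors from one feature set, stats on another".


def gauss_loglike(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def acc_stats_twofeats(gmms, feats1, feats2, posts):
    """Component posteriors from feats1, statistics accumulated on feats2.

    gmms: pdf_id -> list of (weight, mean, var) components (1-D).
    posts: per frame, a list of (pdf_id, posterior) pairs.
    Returns pdf_id -> per-component [occupancy, sum of feats2].
    """
    stats = {}
    for x1, x2, frame_post in zip(feats1, feats2, posts):
        for pdf, post in frame_post:
            comps = gmms[pdf]
            logs = [math.log(w) + gauss_loglike(x1, m, v) for w, m, v in comps]
            top = max(logs)
            probs = [math.exp(ll - top) for ll in logs]  # subtract max for stability
            norm = sum(probs)
            acc = stats.setdefault(pdf, [[0.0, 0.0] for _ in comps])
            for k, p in enumerate(probs):
                gamma = post * p / norm
                acc[k][0] += gamma
                acc[k][1] += gamma * x2
    return stats


gmms = {0: [(0.5, -5.0, 1.0), (0.5, 5.0, 1.0)]}
stats = acc_stats_twofeats(gmms, feats1=[-5.0, 5.0], feats2=[100.0, 200.0],
                           posts=[[(0, 1.0)], [(0, 1.0)]])
print([[round(v, 6) for v in comp] for comp in stats[0]])
# [[1.0, 100.0], [1.0, 200.0]]
```

The component assignment is decided by feats1 (-5 goes to the first Gaussian, 5 to the second), but the sums that get accumulated are the feats2 values 100 and 200.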

# gmm-mixup --mix-down=20000 --mix-up=20000 exp/tri4b/tmp.mdl exp/tri4b/1.occs exp/tri4b/1.mdl

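3. Decoding:

Note: the command below runs nnet3-latgen-faster (from src/nnet3bin), but the help text printed after it is from gmm-latgen-faster (src/gmmbin). The two share the lattice-generation options listed there (--beam, --lattice-beam, --max-active, --min-active, --acoustic-scale, --allow-partial, --minimize, --word-symbol-table). nnet3-latgen-faster additionally takes nnet3-specific options such as --frames-per-chunk, --frame-subsampling-factor and the --extra-*-context options seen in the command.
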
# nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain/tdnn/graph/words.txt exp/chain/tdnn/final.mdl exp/chain/tdnn/graph/HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=true --norm-vars=false --utt2spk=ark:data/fbank/test/split1/1/utt2spk scp:data/fbank/test/split1/1/cmvn.scp scp:data/fbank/test/split1/1/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain/tdnn/decode_test/lat.1.gz'

[houwenbin@localhost online_demo]$ ~/kaldi-master/src/gmmbin/gmm-latgen-faster -h
/home/houwenbin/kaldi-master/src/gmmbin/gmm-latgen-faster -h 

Generate lattices using GMM-based model.
Usage: gmm-latgen-faster [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ]

Options:
  --acoustic-scale            : Scaling factor for acoustic likelihoods (float, default = 0.1)
  --allow-partial             : If true, produce output even if end state was not reached. (bool, default = false)
  --beam                      : Decoding beam.  Larger->slower, more accurate. (float, default = 16)
  --beam-delta                : Increment used in decoding-- this parameter is obscure and relates to a speedup in the way the max-active constraint is applied.  Larger is more accurate. (float, default = 0.5)
  --delta                     : Tolerance used in determinization (float, default = 0.000976562)
  --determinize-lattice       : If true, determinize the lattice (lattice-determinization, keeping only best pdf-sequence for each word-sequence). (bool, default = true)
  --hash-ratio                : Setting used in decoder to control hash behavior (float, default = 2)
  --lattice-beam              : Lattice generation beam.  Larger->slower, and deeper lattices (float, default = 10)
  --max-active                : Decoder max active states.  Larger->slower; more accurate (int, default = 2147483647)
  --max-mem                   : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
  --min-active                : Decoder minimum #active states. (int, default = 200)
  --minimize                  : If true, push and minimize after determinization. (bool, default = false)
  --phone-determinize         : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
  --prune-interval            : Interval (in frames) at which to prune tokens (int, default = 25)
  --word-determinize          : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)
  --word-symbol-table         : Symbol table for words [for debug output] (string, default = "")

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
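Two groups of the options above interact at every frame of decoding. `--acoustic-scale` weights the acoustic log-likelihood against the graph cost, and `--beam`, `--max-active` and `--min-active` decide which partial hypotheses survive. Below is a simplified sketch of both. It assumes costs are negated log-probabilities (lower is better) and that pruning is a plain sort. `arc_cost` and `prune_tokens` are hypothetical, and Kaldi's decoder computes its cutoff more cleverly (`--beam-delta` in the help text hints at one such refinement). Note that the chain-model decode in this post passes `--acoustic-scale=1.0` instead of the default 0.1.

```python
# Hypothetical sketch of acoustic scaling and beam pruning; not Kaldi's decoder.
def arc_cost(graph_cost, acoustic_loglike, acoustic_scale=0.1):
    """Cost of taking an arc: graph cost plus scaled negative acoustic log-likelihood."""
    return graph_cost - acoustic_scale * acoustic_loglike


def prune_tokens(costs, beam=16.0, max_active=2147483647, min_active=200):
    """Keep states within `beam` of the best, at most max_active, at least min_active.

    costs: dict state -> accumulated cost (lower is better).
    """
    if not costs:
        return {}
    ranked = sorted(costs.items(), key=lambda kv: kv[1])
    best = ranked[0][1]
    kept = [kv for kv in ranked if kv[1] < best + beam]
    if len(kept) > max_active:
        kept = kept[:max_active]
    if len(kept) < min_active:
        kept = ranked[:min_active]
    return dict(kept)


print(round(arc_cost(2.0, -30.0, acoustic_scale=0.1), 6))  # 5.0
costs = {"s1": 0.0, "s2": 5.0, "s3": 20.0}
print(sorted(prune_tokens(costs, beam=16.0, min_active=1)))                # ['s1', 's2']
print(sorted(prune_tokens(costs, beam=16.0, max_active=1, min_active=1)))  # ['s1']
print(sorted(prune_tokens(costs, beam=1.0, min_active=3)))                 # ['s1', 's2', 's3']
```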

# lattice-scale --acoustic-scale=10.0 ark:- ark:- 

[houwenbin@localhost online_demo]$ 
[houwenbin@localhost online_demo]$ ~/kaldi-master/src/latbin/lattice-scale -h
/home/houwenbin/kaldi-master/src/latbin/lattice-scale -h 

Apply scaling to lattice weights
Usage: lattice-scale [options] lattice-rspecifier lattice-wspecifier
 e.g.: lattice-scale --lm-scale=0.0 ark:1.lats ark:scaled.lats

Options:
  --acoustic-scale            : Scaling factor for acoustic likelihoods (float, default = 1)
  --acoustic2lm-scale         : Add this times original acoustic costs to LM costs (float, default = 0)
  --inv-acoustic-scale        : An alternative way of setting the acoustic scale: you can set its inverse. (float, default = 1)
  --lm-scale                  : Scaling factor for graph/lm costs (float, default = 1)
  --lm2acoustic-scale         : Add this times original LM costs to acoustic costs (float, default = 0)

Standard options:
  --config                    : Configuration file to read (this option may be repeated) (string, default = "")
  --help                      : Print out usage message (bool, default = false)
  --print-args                : Print the command line arguments (to stderr) (bool, default = true)
  --verbose                   : Verbose level (higher->more logging) (int, default = 0)
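The five scaling options can be read as a 2×2 matrix acting on each arc's (graph cost, acoustic cost) pair, which is how the option descriptions above fit together. Below is a minimal sketch under that reading. `scale_weight` and `scale_lattice` are hypothetical, a lattice is modelled as a plain list of (graph_cost, acoustic_cost) arcs, and this sketch refuses to combine `acoustic_scale` with `inv_acoustic_scale`. In the decode command of this post, `lattice-scale --acoustic-scale=10.0` corresponds to decode.sh's `--post-decode-acwt 10.0`. Decoding ran at acoustic scale 1.0, and multiplying the stored acoustic costs by 10 is, as I understand it, meant to bring chain-model lattices onto the scale that the usual scoring LM weights (around 10) expect.

```python
# Hypothetical sketch of lattice-scale's options as a 2x2 scaling matrix.
def scale_weight(graph_cost, acoustic_cost, acoustic_scale=1.0, lm_scale=1.0,
                 acoustic2lm_scale=0.0, lm2acoustic_scale=0.0,
                 inv_acoustic_scale=1.0):
    """Scale one (graph, acoustic) cost pair as described by lattice-scale's options."""
    if inv_acoustic_scale != 1.0:
        if acoustic_scale != 1.0:
            raise ValueError("set either acoustic_scale or inv_acoustic_scale, not both")
        acoustic_scale = 1.0 / inv_acoustic_scale
    new_graph = lm_scale * graph_cost + acoustic2lm_scale * acoustic_cost
    new_acoustic = lm2acoustic_scale * graph_cost + acoustic_scale * acoustic_cost
    return new_graph, new_acoustic


def scale_lattice(arcs, **options):
    """Apply scale_weight to every arc of a lattice given as (graph, acoustic) pairs."""
    return [scale_weight(g, a, **options) for g, a in arcs]


lattice = [(2.0, 3.0), (0.5, 1.5)]
print(scale_lattice(lattice, acoustic_scale=10.0))  # [(2.0, 30.0), (0.5, 15.0)]
print(scale_lattice(lattice, lm_scale=0.0))         # [(0.0, 3.0), (0.0, 1.5)]
```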

  

That's it for now. I'll add more as I remember them!!!
