Kaldi tutorial: translated notes

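Data preparation
I mostly skip this part because it is fairly simple.
Start with data/lang, which is generated by prepare_lang.sh.
The first files generated are words.txt and phones.txt. These are symbol tables in OpenFst format: each one maps strings (a string here is a unit of the transcript text) to integers. Below is part of the words.txt from a Japanese recognition setup: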
<eps> 0
a 1
b 2
ch 3
d 4
e 5
f 6
g 7
h 8
i 9
j 10
k 11
m 12
n 13
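To make the string-to-integer mapping concrete, here is a minimal Python sketch that loads lines of this shape into two dictionaries. The function name `read_symbol_table` and the sample lines are illustrative only, and the sample assumes the usual OpenFst convention that id 0 is named `<eps>`.

```python
def read_symbol_table(lines):
    """Parse OpenFst-style symbol table lines ("symbol id") into two dicts."""
    sym2id, id2sym = {}, {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol, idx = parts[0], int(parts[1])
        sym2id[symbol] = idx
        id2sym[idx] = symbol
    return sym2id, id2sym


# Hypothetical sample in the same layout as the listing above.
sample = ["<eps> 0", "a 1", "b 2", "ch 3", "d 4"]
sym2id, id2sym = read_symbol_table(sample)
print(sym2id["ch"], id2sym[4])
```

In practice you would pass an open file such as data/lang/words.txt instead of the sample list.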
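Next, look at the files with the .csl suffix in the phones directory: disambig.csl, nonsilence.csl, optional_silence.csl and silence.csl. Each holds a colon-separated list. They are occasionally needed as options on later command lines.
Now look at phones.txt under data/lang/. This is the phone symbol table, and it also contains the "disambiguation symbols" used in the standard FST recipe. These symbols are called #1, #2, #3 and so on. We also add a symbol #0, which replaces the epsilon transitions in the language model.
L.fst is the lexicon compiled into FST format. To see what it contains, run (from s5/):
fstprint --isymbols=data/lang/phones.txt --osymbols=data/lang/words.txt data/lang/L.fst | head

The corresponding output: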

0 1 <eps> <eps> 0.693147182
0 1 sil <eps> 0.693147182
1 2 a a 0.693147182
1 1 a a 0.693147182
1 1 b b 0.693147182
1 2 b b 0.693147182
1 1 ch ch 0.693147182
1 2 ch ch 0.693147182
1 1 d d 0.693147182
1 2 d d 0.693147182

G.fst is an FST that describes the grammar of the language. Its contents, printed with numeric symbol ids, look like this:

0 2 21 21 0.000315946876
0 1 29 0 8.03088665
1 29 25 25 12.1202555
1 28 6 6 8.34519005
1 27 15 15 6.83906889
1 26 28 28 5.35383272
1 25 19 19 3.59180236
1 24 16 16 5.82719803
1 23 26 26 4.43024254
1 22 17 17 4.63756752
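My reading is that these weights come from probabilities estimated from how often each word (or phone) occurs. The last column is a cost, which in these FSTs is a negative natural-log probability: for example, the 0.693147182 in the L.fst output is -ln(0.5).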
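The relation between the printed weights and probabilities can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and it assumes the weights are negative natural-log probabilities, as in the tropical semiring that fstprint shows by default.

```python
import math


def cost_to_prob(cost):
    """Convert an FST arc cost (negative natural log) back to a probability."""
    return math.exp(-cost)


def prob_to_cost(prob):
    """Convert a probability to the cost that would be stored on an arc."""
    return -math.log(prob)


# 0.693147182 is the weight on the first arcs of L.fst in the listing.
print(round(cost_to_prob(0.693147182), 6))
# A large cost such as 12.1202555 from G.fst corresponds to a rare event.
print(cost_to_prob(12.1202555))
```

A small cost therefore means a likely arc, and a large cost means an unlikely one.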

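Feature extraction
This step extracts the training features. Run the corresponding commands first, then look at the files they generate.
Look at exp/make_mfcc/train/make_mfcc.1.log. It starts with the command line that was run (Kaldi always prints the command line at the top of a log).
split_scp.pl splits one scp into several smaller scp files. (.ark and .scp are the two formats Kaldi uses to store data: an .ark holds the data itself, usually in binary, and an .scp records where the data sits in the corresponding ark.)
.ark files are usually large because they hold the actual data. You can view one with:
copy-feats ark:mfcc/raw_mfcc_train.1.ark ark,t:- | head

The corresponding output: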

NF089001 [
53.54222 -31.82449 -9.899872 -0.02364012 -5.681367 2.072489 -19.41396 -15.6856 14.83652 25.04876 -11.34208 1.64803 9.309975
49.06616 -28.41237 1.188962 -0.5514585 -14.60496 -6.065259 -12.19813 -17.75549 -9.185356 4.032361 -9.320414 1.339788 12.23572
48.10678 -24.78042 8.86155 -4.958602 -4.843619 1.443337 -8.813286 0.4328361 -3.807028 0.8784758 9.743609 7.107668 9.02508
63.17915 -21.53388 -22.33113 5.595533 -12.11316 -4.990936 -14.4953 -10.58425 2.666025 -0.3021607 -11.49867 -1.502062 3.861568
70.48519 -19.16981 -25.84126 10.23085 -15.72831 -5.344745 -22.62867 -12.71542 0.8277165 -4.167449 -19.62204 -5.533485 2.644755
52.99891 -16.45959 0.7519462 -4.386663 3.804989 -1.37611 -24.83507 5.490471 -3.33739 -8.404724 -17.6997 -0.2677126 5.236793
54.01795 -19.39126 -3.082492 -1.624617 -8.421985 -11.15252 -18.0968 -11.92423 -6.684193 -11.88862 -8.570399 -3.803415 5.675081
55.33753 -18.44497 -9.369541 -7.717715 -8.041488 -11.45842 -19.81938 -12.43418 -1.97697 -4.627994 -7.774594 4.451687 7.557387
55.72844 -20.32559 -12.32121 -9.614379 -2.77022 -8.572324 -14.91047 -6.382179 -7.155323 -7.767553 -17.01464 1.11917 -2.572359
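The text layout above (a key followed by `[`, then one row of numbers per frame, with `]` closing the last row) is simple enough to read by hand. The following Python sketch parses it; the function name and the sample data are illustrative, and it assumes the header line carries only the key and the opening bracket, as in the listing.

```python
def parse_text_matrices(lines):
    """Parse text-format matrices: 'key [' header, rows of floats, ']' at the end."""
    feats, key = {}, None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if key is None:
            key = tokens[0]  # header line: "<key> ["
            feats[key] = []
            continue
        closing = tokens[-1] == "]"
        if closing:
            tokens = tokens[:-1]
        if tokens:
            feats[key].append([float(t) for t in tokens])
        if closing:
            key = None  # the next non-empty line starts a new utterance
    return feats


# Hypothetical two-utterance sample in the same layout.
sample = ["utt1  [", "  1.0 2.0 3.0", "  4.0 5.0 6.0 ]", "utt2  [", "  7.0 8.0 9.0 ]"]
feats = parse_text_matrices(sample)
print({k: (len(v), len(v[0])) for k, v in feats.items()})
```

Each row is one frame; in the listing above every row has 13 values.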

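An archive (.ark) and a script (.scp) file with the same name represent the same data. Note that these command-line arguments carry the prefix "scp:" or "ark:". Kaldi does not try to work out whether a file is a script or an archive; the prefix is how we tell it. To the code, the two formats look the same.
Both formats embody the concept of a "table". A table is a collection of items, each indexed by an identifying string (such as an utterance id). A table is not a single C++ object, because there are separate C++ objects for each access pattern (writing, iterating, random access).
The .scp format is text-only. Each line has a key (usually the utterance id), a space, and then the location of that utterance's feature data.
The .ark format can be text or binary (binary is the default; add "t" to the specifier to write text). Its layout is: key (such as the utterance id), a space, then the data.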

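A few notes on scripts and archives:
A string that specifies how to read a table is called an "rspecifier", e.g. ark:gunzip -c my/dir/foo.ark.gz|
A string that specifies how to write a table is called a "wspecifier", e.g. ark,t:foo.ark
Archives can be concatenated into a larger archive that is still valid.
Code can read either format sequentially or by random access. User-level code only knows whether it is iterating or doing lookups; it does not know whether the underlying data is a script or an archive.
Reading an archive by random access is memory-inefficient. For efficient random access, write the ark together with a matching scp file and read through the scp.
More on this is in "Kaldi I/O mechanisms" on the official Kaldi site (search for kaldi).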
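As a rough illustration of how a specifier string breaks down, the Python sketch below splits one into its type, its options and its target. This is not Kaldi's own parser; the function name and the returned tuple are assumptions made for the example.

```python
def parse_specifier(spec):
    """Split a specifier such as 'ark,t:foo.ark' into (kinds, options, target)."""
    head, _, target = spec.partition(":")  # split at the first colon only
    fields = head.split(",")
    kinds = [f for f in fields if f in ("ark", "scp")]
    options = [f for f in fields if f not in ("ark", "scp")]
    if not kinds:
        raise ValueError("specifier must start with ark or scp: %r" % spec)
    return kinds, options, target


wspec = parse_specifier("ark,t:foo.ark")
rspec = parse_specifier("ark:gunzip -c my/dir/foo.ark.gz|")
print(wspec)
print(rspec, "reads from a pipe:", rspec[2].endswith("|"))
```

The trailing `|` in the second example marks the target as a command whose output is read, rather than a file name.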

Training a monophone system
Run:
gmm-copy --binary=false exp/mono/0.mdl - | less

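The corresponding output:

<TransitionModel>
<Topology>
<TopologyEntry>
<ForPhones>
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
1
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.5 <Transition> 1 0.5 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.5 <Transition> 2 0.5 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
</Topology>
<Triples> 84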
1 0 0
1 1 1
1 2 2
2 0 3
2 1 4
2 2 5
3 0 6
3 1 7
3 2 8
4 0 9
4 1 10
4 2 11
5 0 12
5 1 13
5 2 14
6 0 15
6 1 16
6 2 17
7 0 18
7 1 19
7 2 20
8 0 21
8 1 22
8 2 23
9 0 24
9 1 25
9 2 26
10 0 27
10 1 28
10 2 29
11 0 30
11 1 31
11 2 32
12 0 33
12 1 34
12 2 35
13 0 36
13 1 37
13 2 38
14 0 39
14 1 40
14 2 41
15 0 42
15 1 43
15 2 44
16 0 45
16 1 46
16 2 47
17 0 48
17 1 49
17 2 50
18 0 51
18 1 52
18 2 53
19 0 54
19 1 55
19 2 56
20 0 57
20 1 58
20 2 59
21 0 60
21 1 61
21 2 62
22 0 63
22 1 64
22 2 65
23 0 66
23 1 67
23 2 68
24 0 69
24 1 70
24 2 71
25 0 72
25 1 73
25 2 74
26 0 75
26 1 76
26 2 77
27 0 78
27 1 79
27 2 80
28 0 81
28 1 82
28 2 83
</Triples>
<LogProbs>
[ 0 -0.6931472 -0.6931472 -0.6931472 -0.6931472 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 -0.2876821 -1.386294 ]
</LogProbs>
</TransitionModel>
<DIMENSION> 39 <NUMPDFS> 84 <DiagGMM>
<GCONSTS> [ -94.49178 ]
<WEIGHTS> [ 1 ]
<MEANS_INVVARS> [
-0.005838452 -0.0107621 0.007483369 0.002269829 0.01010145 0.001220717 -0.002948278 0.004102771 -0.009732663 0.005548568 -0.00846673 0.003018271 0.002561719 -0.001072273 -0.0003676935 0.0009567018 0.0004904701 0.001004559 0.0006702438 0.002065411 0.001736847 -0.0004884294 -0.0001839283 0.000573744 -6.096664e-06 0.0008038587 0.000548786 0.0005939789 -0.001607142 -0.0008620437 0.0002163016 -0.0002253224 0.0009042169 0.0007718542 0.0001247094 -0.0003084296 -0.001637235 0.0004870822 0.002509772 ]
<INV_VARS> [
0.002302196 0.004163655 0.005391662 0.002574274 0.003217114 0.002863562 0.005842444 0.004294651 0.003149447 0.005013018 0.004209091 0.00450796 0.006656926 0.06917789 0.07472826 0.08657464 0.05778877 0.05123304 0.05053843 0.08058306 0.07275081 0.0605317 0.07780149 0.07870268 0.08504011 0.1153897 0.4811986 0.4609138 0.5185907 0.3938675 0.3216299 0.3013106 0.4305585 0.4167711 0.3496516 0.4075994 0.4405422 0.4886224 0.6218032 ]
</DiagGMM>
<DiagGMM>
<GCONSTS> [ -94.49178 ]
<WEIGHTS> [ 1 ]
<MEANS_INVVARS> [
-0.005838452 -0.0107621 0.007483369 0.002269829 0.01010145 0.001220717 -0.002948278 0.004102771 -0.009732663 0.005548568 -0.00846673 0.003018271 0.002561719 -0.001072273 -0.0003676935 0.0009567018 0.0004904701 0.001004559 0.0006702438 0.002065411 0.001736847 -0.0004884294 -0.0001839283 0.000573744 -6.096664e-06 0.0008038587 0.000548786 0.0005939789 -0.001607142 -0.0008620437 0.0002163016 -0.0002253224 0.0009042169 0.0007718542 0.0001247094 -0.0003084296 -0.001637235 0.0004870822 0.002509772 ]
<INV_VARS> [
0.002302196 0.004163655 0.005391662 0.002574274 0.003217114 0.002863562 0.005842444 0.004294651 0.003149447 0.005013018 0.004209091 0.00450796 0.006656926 0.06917789 0.07472826 0.08657464 0.05778877 0.05123304 0.05053843 0.08058306 0.07275081 0.0605317 0.07780149 0.07870268 0.08504011 0.1153897 0.4811986 0.4609138 0.5185907 0.3938675 0.3216299 0.3013106 0.4305585 0.4167711 0.3496516 0.4075994 0.4405422 0.4886224 0.6218032 ]
</DiagGMM>
<DiagGMM>
<GCONSTS> [ -94.49178 ]
<WEIGHTS> [ 1 ]

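The output starts with the topology (topo). One phone has a different topology from all the others; comparing with phones.txt shows that it is sil (silence). The convention in the topo is that the first state is the initial state (with probability 1) and the last state is the final state (with probability 1). Here state 0 is the initial state, states 0, 1 and 2 are the emitting HMM states, and state 3, which has no outgoing transitions, is the final state.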

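By convention an .mdl file contains two objects. The first is of type TransitionModel and contains the topology as a member variable of type HmmTopology. The second is the acoustic model itself (type AmGmm). A file of this kind is not a table; whether it is written in binary or text depends on the command-line option --binary=true or --binary=false. "Table" refers only to scripts and archives.
Looking at the data above, 0.mdl is the initial model, so all its parameters are still at their initial values. The model is trained for 40 iterations, so 40.mdl holds the final parameters.
For more, see "HMM topology and transition modeling" on the official site.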
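The long bracketed vector in the model dump can be tied back to the topology: its entries are the natural logs of the transition probabilities. The short Python check below is a sketch under that assumption; the dictionary `topology` simply restates the 0.75/0.25 and 0.5/0.5 values from the dump, and the names are illustrative.

```python
import math

# (self-loop, forward) probabilities for states 0, 1, 2, copied from the topology.
topology = {
    "sil": [(0.5, 0.5), (0.5, 0.5), (0.75, 0.25)],
    "other": [(0.75, 0.25), (0.75, 0.25), (0.75, 0.25)],
}

# The outgoing probabilities of every emitting state must sum to 1.
for name, states in topology.items():
    for probs in states:
        assert abs(sum(probs) - 1.0) < 1e-9, name

# Collect the distinct log-probabilities that the topology produces.
log_probs = sorted(
    {round(math.log(p), 7) for states in topology.values() for probs in states for p in probs}
)
print(log_probs)
```

The three values printed correspond to log(0.25), log(0.5) and log(0.75), which are the numbers repeated throughout the vector (after its leading 0).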

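One more important point: p.d.f.'s in Kaldi are represented by numeric identifiers starting from 0 (we call these pdf-ids). Unlike in HTK, they do not have names. The .mdl file does not contain enough information to map between context-dependent phones and pdf-ids. For that mapping, look at the tree file:
copy-tree --binary=false exp/mono/tree - | less

The output: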
ContextDependency 1 0 ToPdf TE 0 29 ( NULL TE -1 3 ( CE 0 CE 1 CE 2 )
TE -1 3 ( CE 3 CE 4 CE 5 )
TE -1 3 ( CE 6 CE 7 CE 8 )
TE -1 3 ( CE 9 CE 10 CE 11 )
TE -1 3 ( CE 12 CE 13 CE 14 )
TE -1 3 ( CE 15 CE 16 CE 17 )
TE -1 3 ( CE 18 CE 19 CE 20 )
TE -1 3 ( CE 21 CE 22 CE 23 )
TE -1 3 ( CE 24 CE 25 CE 26 )
TE -1 3 ( CE 27 CE 28 CE 29 )
TE -1 3 ( CE 30 CE 31 CE 32 )
TE -1 3 ( CE 33 CE 34 CE 35 )
TE -1 3 ( CE 36 CE 37 CE 38 )
TE -1 3 ( CE 39 CE 40 CE 41 )
TE -1 3 ( CE 42 CE 43 CE 44 )
TE -1 3 ( CE 45 CE 46 CE 47 )
TE -1 3 ( CE 48 CE 49 CE 50 )
TE -1 3 ( CE 51 CE 52 CE 53 )
TE -1 3 ( CE 54 CE 55 CE 56 )
TE -1 3 ( CE 57 CE 58 CE 59 )
TE -1 3 ( CE 60 CE 61 CE 62 )
TE -1 3 ( CE 63 CE 64 CE 65 )
TE -1 3 ( CE 66 CE 67 CE 68 )
TE -1 3 ( CE 69 CE 70 CE 71 )
TE -1 3 ( CE 72 CE 73 CE 74 )
TE -1 3 ( CE 75 CE 76 CE 77 )
TE -1 3 ( CE 78 CE 79 CE 80 )
TE -1 3 ( CE 81 CE 82 CE 83 )
)
EndContextDependency

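This is a monophone tree, so it is trivial: it has no splits. CE means constant eventmap (the leaves of the tree) and TE means table eventmap (a lookup table). There is no SE, meaning split eventmap (a branch of the tree), because this is monophone. "TE 0 29" starts a table eventmap that splits on key 0, which is position 0 in the phone-context vector; the vector has length 1 because this is monophone. Inside the parentheses that follow there are 29 event maps. The first is NULL, a null pointer to an eventmap, because phone-id 0 is reserved for epsilon. The string "TE -1 3 ( CE 75 CE 76 CE 77 )" is a table eventmap that splits on key -1. This key stands for the pdf-class mentioned in the topo, which here is the same as the HMM state. The phone has 3 states, so this key takes the values 0, 1 and 2. Inside the parentheses are three constant eventmaps, each a leaf of the tree.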
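For this particular tree the lookup reduces to simple arithmetic, which the Python sketch below spells out. It is not Kaldi's tree code: the function name is hypothetical, and the formula only holds for a monophone tree like this one, where every phone has three pdf-classes and the leaves are numbered in phone order.

```python
def monophone_pdf_id(phone_id, pdf_class, states_per_phone=3):
    """Mimic the two table lookups: first on the phone (key 0), then on the pdf-class (key -1)."""
    if phone_id < 1:
        raise ValueError("phone-id 0 is reserved for epsilon (the NULL entry)")
    if not 0 <= pdf_class < states_per_phone:
        raise ValueError("pdf-class out of range")
    return (phone_id - 1) * states_per_phone + pdf_class


# Phone 26 is the entry "TE -1 3 ( CE 75 CE 76 CE 77 )".
print([monophone_pdf_id(26, s) for s in range(3)])
# The last phone (28) ends at the largest pdf-id in the tree.
print(monophone_pdf_id(28, 2))
```

A context-dependent tree would replace this arithmetic with real splits on the neighbouring phones.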
Now look at exp/mono/ali.1.gz:
copy-int-vector "ark:gunzip -c exp/mono/ali.1.gz|" ark,t:- | head -n 2

The corresponding output:
NF089001 2 1 1 1 4 3 3 6 5 5 2 1 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6 5 5 5 5 5 62 61 61 61 64 63 63 63 66 65 65 56 58 57 57 60 14 13 13 16 15 18 17 17 17 140 142 141 141 141 141 141 141 144 80 79 82 84 80 79 79 79 79 82 81 81 81 84 83 86 85 85 88 87 90 89 89 89 89 56 55 55 55 55 55 55 58 57 57 57 60 59 68 67 67 70 72 32 31 34 33 33 36 35 35 35 35 35 80 79 79 79 79 79 79 79 79 79 79 79 82 84 50 49 52 51 51 54 53 53 53 8 7 7 7 7 7 10 9 12 11 11 11 11 2 4 3 3 3 6 5 5 5 56 55 55 55 55 55 55 55 55 55 58 60 32 34 36 35 35 35 35 35 35 35 35 80 79 82 81 81 84 83 8 7 7 7 7 7 7 7 10 9 9 9 9 12 11 56 55 55 55 58 57 57 57 60 59 59 2 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6
NF089002 2 4 3 6 5 5 5 5 5 5 2 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6 5 5 5 5 62 61 61 61 61 61 61 64 63 66 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 65 140 142 144 140 142 144 143 44 43 46 45 48 47 86 85 88 87 87 90 89 89 89 89 80 82 81 84 83 56 55 55 55 58 57 57 60 59 59 59 59 20 19 19 19 22 21 24 23 23 23 56 55 55 55 55 55 55 55 55 55 58 57 57 60 59 59 59 122 121 121 121 121 121 121 121 121 121 124 123 123 126 125 56 58 60 59 20 19 19 19 19 22 21 21 24 23 23 56 55 55 55 55 55 58 60 62 61 61 61 61 61 61 64 66 65 65 65 65 56 55 55 55 55 55 58 57 57 57 57 60 59 59 2 1 1 1 1 1 1 4 3 3 3 3 6 5 128 130 132 131 131 131 86 85 85 85 88 87 87 87 87 87 87 87 87 87 90 89 140 142 144 68 67 67 67 67 67 67 70 72 158 157 160 162 161 161 86 85 88 87 87 87 87 87 87 87 87 87 87 90 140 142 144 44 43 43 43 46 48 47 32 34 33 36 35 35 35 35 35 35 35 35 56 55 55 58 60 59 62 61 61 61 61 61 64 66 65 140 142 144 134 133 133 133 136 135 135 135 135 138 137 137 137 140 142 144 143 143 143 143 143 143 143 143 143 2 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6
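This is the Viterbi alignment of the training data, one utterance per line. Looking back at exp/mono/tree, the largest pdf-id is 83, yet the numbers here go well beyond that. The reason is that the alignment file does not contain pdf-ids. It uses a finer-grained identifier called a "transition-id", which also encodes the phone and the transition within that phone's prototype topology. To see what the transition-ids mean, run:
show-transitions data/lang/phones.txt exp/mono/0.mdl

The corresponding output: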
Transition-state 1: phone = sil hmm-state = 0 pdf = 0
Transition-id = 1 p = 0.5 [self-loop]
Transition-id = 2 p = 0.5 [0 -> 1]
Transition-state 2: phone = sil hmm-state = 1 pdf = 1
Transition-id = 3 p = 0.5 [self-loop]
Transition-id = 4 p = 0.5 [1 -> 2]
Transition-state 3: phone = sil hmm-state = 2 pdf = 2
Transition-id = 5 p = 0.75 [self-loop]
Transition-id = 6 p = 0.25 [2 -> 3]
Transition-state 4: phone = a hmm-state = 0 pdf = 3
Transition-id = 7 p = 0.75 [self-loop]
Transition-id = 8 p = 0.25 [0 -> 1]
Transition-state 5: phone = a hmm-state = 1 pdf = 4
Transition-id = 9 p = 0.75 [self-loop]
Transition-id = 10 p = 0.25 [1 -> 2]
Transition-state 6: phone = a hmm-state = 2 pdf = 5
Transition-id = 11 p = 0.75 [self-loop]
Transition-id = 12 p = 0.25 [2 -> 3]
Transition-state 7: phone = b hmm-state = 0 pdf = 6
Transition-id = 13 p = 0.75 [self-loop]
Transition-id = 14 p = 0.25 [0 -> 1]
Transition-state 8: phone = b hmm-state = 1 pdf = 7
Transition-id = 15 p = 0.75 [self-loop]
Transition-id = 16 p = 0.25 [1 -> 2]
Transition-state 9: phone = b hmm-state = 2 pdf = 8
Transition-id = 17 p = 0.75 [self-loop]
Transition-id = 18 p = 0.25 [2 -> 3]
Transition-state 10: phone = ch hmm-state = 0 pdf = 9
Transition-id = 19 p = 0.75 [self-loop]
Transition-id = 20 p = 0.25 [0 -> 1]
Transition-state 11: phone = ch hmm-state = 1 pdf = 10
Transition-id = 21 p = 0.75 [self-loop]
Transition-id = 22 p = 0.25 [1 -> 2]
Transition-state 12: phone = ch hmm-state = 2 pdf = 11
Transition-id = 23 p = 0.75 [self-loop]
Transition-id = 24 p = 0.25 [2 -> 3]

…..

Transition-state 82: phone = z hmm-state = 0 pdf = 81
Transition-id = 163 p = 0.75 [self-loop]
Transition-id = 164 p = 0.25 [0 -> 1]
Transition-state 83: phone = z hmm-state = 1 pdf = 82
Transition-id = 165 p = 0.75 [self-loop]
Transition-id = 166 p = 0.25 [1 -> 2]
Transition-state 84: phone = z hmm-state = 2 pdf = 83
Transition-id = 167 p = 0.75 [self-loop]
Transition-id = 168 p = 0.25 [2 -> 3]

Since this is 0.mdl, these are the initial values before training.
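The numbering in the show-transitions output is regular enough to decode by hand. The Python sketch below does so under assumptions read off that listing: every transition-state has exactly two transitions (self-loop first), every phone has three HMM states, and pdf-ids follow the same order. The function name is hypothetical, and none of this is guaranteed for other models.

```python
def decode_transition_id(tid):
    """Return (phone_id, hmm_state, pdf_id, is_self_loop) for this monophone model."""
    transition_state = (tid + 1) // 2  # 1-based, two transition-ids per state
    pdf_id = transition_state - 1
    phone_id = pdf_id // 3 + 1         # three HMM states per phone
    hmm_state = pdf_id % 3
    is_self_loop = tid % 2 == 1        # odd ids are the self-loops in the listing
    return phone_id, hmm_state, pdf_id, is_self_loop


# First ten transition-ids of utterance NF089001 in the alignment.
prefix = [2, 1, 1, 1, 4, 3, 3, 6, 5, 5]
pdf_ids = [decode_transition_id(t)[2] for t in prefix]
print(pdf_ids)
print(decode_transition_id(163))
```

The alignment prefix maps to the three silence pdfs in order, which is what you would expect at the start of an utterance.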
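To see the trained transition probabilities together with the occupation counts, run:
show-transitions data/lang/phones.txt exp/mono/40.mdl exp/mono/40.occs | less
(.occs files hold occupation counts.)
The corresponding output: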
Transition-state 1: phone = sil hmm-state = 0 pdf = 0
Transition-id = 1 p = 0.934807 count of pdf = 1.13866e+06 [self-loop]
Transition-id = 2 p = 0.0651934 count of pdf = 1.13866e+06 [0 -> 1]
Transition-state 2: phone = sil hmm-state = 1 pdf = 1
Transition-id = 3 p = 0.889584 count of pdf = 672302 [self-loop]
Transition-id = 4 p = 0.110416 count of pdf = 672302 [1 -> 2]
Transition-state 3: phone = sil hmm-state = 2 pdf = 2
Transition-id = 5 p = 0.7137 count of pdf = 259284 [self-loop]
Transition-id = 6 p = 0.2863 count of pdf = 259284 [2 -> 3]
Transition-state 4: phone = a hmm-state = 0 pdf = 3
Transition-id = 7 p = 0.713307 count of pdf = 390711 [self-loop]
Transition-id = 8 p = 0.286693 count of pdf = 390711 [0 -> 1]
Transition-state 5: phone = a hmm-state = 1 pdf = 4
Transition-id = 9 p = 0.594051 count of pdf = 275931 [self-loop]
Transition-id = 10 p = 0.405949 count of pdf = 275931 [1 -> 2]
Transition-state 6: phone = a hmm-state = 2 pdf = 5
Transition-id = 11 p = 0.594987 count of pdf = 276569 [self-loop]
Transition-id = 12 p = 0.405013 count of pdf = 276569 [2 -> 3]
Transition-state 7: phone = b hmm-state = 0 pdf = 6
Transition-id = 13 p = 0.590539 count of pdf = 19660 [self-loop]
Transition-id = 14 p = 0.409461 count of pdf = 19660 [0 -> 1]
Transition-state 8: phone = b hmm-state = 1 pdf = 7
Transition-id = 15 p = 0.417553 count of pdf = 13821 [self-loop]

This uses 40.mdl, so these are the transition probabilities of the final model.
For more on HMM topologies, transition-ids and transition models, see "HMM topology and transition modeling" on the official site.
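One way to read a self-loop probability is as an expected state duration. If a state's self-loop probability is p, the number of frames spent in it follows a geometric distribution with mean 1/(1-p). The Python sketch below applies that standard formula to values from the listings; the function name is made up for the example.

```python
def expected_frames(self_loop_prob):
    """Mean number of frames spent in an HMM state with the given self-loop probability."""
    if not 0.0 <= self_loop_prob < 1.0:
        raise ValueError("self-loop probability must be in [0, 1)")
    return 1.0 / (1.0 - self_loop_prob)


initial = expected_frames(0.75)        # untrained non-silence state
trained_sil = expected_frames(0.934807)  # sil hmm-state 0 in 40.mdl
trained_b = expected_frames(0.417553)    # b hmm-state 1 in 40.mdl
print(initial, trained_sil, trained_b)
```

After training, the first silence state is held much longer than the initial topology implied, while the middle state of b has become short.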

Next, see how training progressed:

grep Overall exp/mono/log/acc.{?.?,?.??,??.?,??.??}.log

Here is the last part of the output:
exp/mono/log/acc.35.10.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115)
exp/mono/log/acc.37.12.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115) Overall avg like per frame (Gaussian only) = -99.1242 over 595815 frames.
exp/mono/log/acc.38.10.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115) Overall avg like per frame (Gaussian only) = -99.0045 over 666753 frames.
exp/mono/log/acc.38.11.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115) Overall avg like per frame (Gaussian only) = -95.769 over 793715 frames.
exp/mono/log/acc.38.12.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115) Overall avg like per frame (Gaussian only) = -99.0953 over 595815 frames.
exp/mono/log/acc.39.10.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115) Overall avg like per frame (Gaussian only) = -98.9901 over 666753 frames.
exp/mono/log/acc.39.11.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115) Overall avg like per frame (Gaussian only) = -95.7472 over 793715 frames.
exp/mono/log/acc.39.12.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115) Overall avg like per frame (Gaussian only) = -99.0786 over 595815 frames.

This shows the average acoustic log-likelihood per frame at each iteration.
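To compare iterations directly, the per-job lines can be combined into one frame-weighted average per iteration. The Python sketch below does that for lines shaped like the grep output. It assumes the log names follow acc.<iteration>.<job>.log; the regular expression and function name are written for this example only.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r"acc\.(\d+)\.(\d+)\.log:.*= (-?[\d.]+) over (\d+) frames")


def average_like_per_iter(lines):
    """Return {iteration: frame-weighted average log-likelihood per frame}."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue  # e.g. truncated lines without a likelihood
        iteration = int(match.group(1))
        like, frames = float(match.group(3)), int(match.group(4))
        totals[iteration][0] += like * frames
        totals[iteration][1] += frames
    return {it: s / n for it, (s, n) in sorted(totals.items())}


# Rebuild the iteration 38 and 39 lines from the output above.
template = ("exp/mono/log/acc.{}.{}.log:LOG (gmm-acc-stats-ali:main():gmm-acc-stats-ali.cc:115) "
            "Overall avg like per frame (Gaussian only) = {} over {} frames.")
rows = [(38, 10, -99.0045, 666753), (38, 11, -95.769, 793715), (38, 12, -99.0953, 595815),
        (39, 10, -98.9901, 666753), (39, 11, -95.7472, 793715), (39, 12, -99.0786, 595815)]
result = average_like_per_iter(template.format(*r) for r in rows)
print(result)
```

The average rises slightly from iteration 38 to 39, which is the behaviour expected when training is converging.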
