This post assumes the reader is already familiar with how LDA works.
First, let us be clear about what sampling means:
A random variable represents a population, and sampling is the process of drawing samples according to the population's probability distribution (which dictates how likely each sample is to appear). The samples should faithfully reflect the population, i.e., they should share the population's statistical properties (mean, variance, etc.).
1. The derivation in 《LDA數學八卦》
We write the topic of the $i$-th word in the corpus as $z_i$, where $i=(m,n)$ is a two-dimensional index: the $i$-th word of the corpus is the $n$-th word of the $m$-th document. We use $\neg i$ to denote the exclusion of the word with index $i$.
Gibbs sampling requires the conditional distribution along each coordinate axis $i$, namely $p(z_i=k|\vec{Z}_{\neg i},\vec{W})$; note that this is sampling from a discrete distribution. By Bayes' rule (a conditional probability is proportional to the joint probability):

$$p(z_i=k|\vec{Z}_{\neg i},\vec{W})\propto p(z_i=k,w_i=t|\vec{Z}_{\neg i},\vec{W}_{\neg i})$$
Since $z_i=k,w_i=t$ involves only the $m$-th document and the $k$-th topic, only the following two Dirichlet–Multinomial conjugate structures are affected:

$$\vec{\alpha}\rightarrow\vec{\theta}_m\rightarrow\vec{z}_m$$

$$\vec{\beta}\rightarrow\vec{\varphi}_k\rightarrow\vec{w}_k$$
Removing the pair $(z_i,w_i)$ corresponding to the $i$-th word of the corpus does not change these two conjugate structures; it only decrements the corresponding counts. So the posteriors of $\vec{\theta}_m$ and $\vec{\varphi}_k$ are still Dirichlet distributions:

$$p(\vec{\theta}_m|\vec{Z}_{\neg i},\vec{W}_{\neg i})=\mathrm{Dir}(\vec{\theta}_m|\vec{n}_{m,\neg i}+\vec{\alpha})$$

$$p(\vec{\varphi}_k|\vec{Z}_{\neg i},\vec{W}_{\neg i})=\mathrm{Dir}(\vec{\varphi}_k|\vec{n}_{k,\neg i}+\vec{\beta})$$
Note that $\vec{n}_{m,\neg i}$ and $\vec{n}_{k,\neg i}$ are not double-subscripted. Here $\vec{n}_{m}=(n_m^{(1)},\cdots,n_m^{(K)})$, where $n_m^{(k)}$ is the number of words in document $m$ assigned to topic $k$; and $\vec{n}_k=(n_k^{(1)},\cdots,n_k^{(V)})$, where $n_k^{(t)}$ is the number of occurrences of dictionary word $t$ (dictionary index) assigned to topic $k$. $\neg i$ means removing the word with corpus index $i$: when accumulating the counts $\vec{n}_{m},\vec{n}_{k}$, we simply pretend document $m$ does not contain that word and leave it out of every count; $\vec{n}_{m,\neg i},\vec{n}_{k,\neg i}$ denote exactly these excluded counts.
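As an illustration, the counts $n_m^{(k)}$, $n_k^{(t)}$ and the $\neg i$ exclusion can be maintained with two arrays. A minimal sketch in Python; the toy corpus, topic assignments, and variable names here are invented for illustration, not taken from the text:

```python
import numpy as np

# Hypothetical toy corpus: docs[m][n] is the dictionary index of the n-th
# word of document m; z[m][n] is that word's current topic assignment.
docs = [[0, 2, 1], [1, 1, 3]]
z    = [[0, 1, 0], [1, 1, 0]]
M, K, V = len(docs), 2, 4

n_m = np.zeros((M, K), dtype=int)  # n_m[m, k]: words of doc m assigned to topic k
n_k = np.zeros((K, V), dtype=int)  # n_k[k, t]: occurrences of word t assigned to topic k
for m in range(M):
    for n, t in enumerate(docs[m]):
        n_m[m, z[m][n]] += 1
        n_k[z[m][n], t] += 1

# The "neg i" counts: excluding word i = (m, n) just decrements two entries.
m, n = 0, 1
t, k = docs[m][n], z[m][n]
n_m[m, k] -= 1
n_k[k, t] -= 1
```

During sampling, this decrement/increment pair is all the bookkeeping that each Gibbs step needs.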
Derivation of the Gibbs sampling formula:
$$
\begin{aligned}
p(z_i=k|\vec{Z}_{\neg i},\vec{W})\propto & p(z_i=k,w_i=t|\vec{Z}_{\neg i},\vec{W}_{\neg i})\\
=&\int p(z_i=k,w_i=t,\vec{\theta}_m,\vec{\varphi}_k|\vec{Z}_{\neg i},\vec{W}_{\neg i})\,\mathrm d\vec{\theta}_m\mathrm d\vec{\varphi}_k\\
=&\int p(z_i=k,\vec{\theta}_m|\vec{Z}_{\neg i},\vec{W}_{\neg i})\cdot p(w_i=t,\vec{\varphi}_k|\vec{Z}_{\neg i},\vec{W}_{\neg i})\,\mathrm d\vec{\theta}_m\mathrm d\vec{\varphi}_k\\
=&\int p(z_i=k|\vec{\theta}_m)p(\vec{\theta}_m|\vec{Z}_{\neg i},\vec{W}_{\neg i})\cdot p(w_i=t|\vec{\varphi}_k)p(\vec{\varphi}_k|\vec{Z}_{\neg i},\vec{W}_{\neg i})\,\mathrm d\vec{\theta}_m\mathrm d\vec{\varphi}_k\\
=&\int p(z_i=k|\vec{\theta}_m)\mathrm{Dir}(\vec{\theta}_m|\vec{n}_{m,\neg i}+\vec{\alpha})\,\mathrm d\vec{\theta}_m\cdot\int p(w_i=t|\vec{\varphi}_k)\mathrm{Dir}(\vec{\varphi}_k|\vec{n}_{k,\neg i}+\vec{\beta})\,\mathrm d\vec{\varphi}_k\\
=&\int \theta_{mk}\,\mathrm{Dir}(\vec{\theta}_m|\vec{n}_{m,\neg i}+\vec{\alpha})\,\mathrm d\vec{\theta}_m\cdot\int \varphi_{kt}\,\mathrm{Dir}(\vec{\varphi}_k|\vec{n}_{k,\neg i}+\vec{\beta})\,\mathrm d\vec{\varphi}_k\\
=&E(\theta_{mk})\cdot E(\varphi_{kt})\\
=&\hat{\theta}_{mk}\cdot\hat{\varphi}_{kt}
\end{aligned}
$$
By the formula for the expectation of a Dirichlet distribution:

$$\hat{\theta}_{mk}=\frac{n_{m,\neg i}^{(k)}+\alpha_k}{\sum_{k=1}^K\left(n_{m,\neg i}^{(k)}+\alpha_k\right)}\qquad
\hat{\varphi}_{kt}=\frac{n_{k,\neg i}^{(t)}+\beta_t}{\sum_{t=1}^V\left(n_{k,\neg i}^{(t)}+\beta_t\right)}$$
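The Dirichlet mean used here, $E(\theta_k)=\alpha_k/\sum_j\alpha_j$ applied to the posterior parameters, is easy to sanity-check numerically. A quick sketch; the concrete parameter vector is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 5.0, 3.0])  # arbitrary posterior parameters, e.g. n + alpha

# Closed form: E(theta_k) = alpha_k / sum_j alpha_j
expected = alpha / alpha.sum()

# Monte-Carlo estimate of the mean from Dirichlet samples
estimate = rng.dirichlet(alpha, size=200_000).mean(axis=0)
```

The two vectors agree to within Monte-Carlo noise.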
So the LDA sampling formula is:

$$p(z_i=k|\vec{Z}_{\neg i},\vec{W})\propto\frac{n_{m,\neg i}^{(k)}+\alpha_k}{\sum_{k=1}^K\left(n_{m,\neg i}^{(k)}+\alpha_k\right)}\cdot\frac{n_{k,\neg i}^{(t)}+\beta_t}{\sum_{t=1}^V\left(n_{k,\neg i}^{(t)}+\beta_t\right)}$$
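Putting the pieces together, one Gibbs step for a single word can be sketched as follows. The function and variable names are mine; `n_m`, `n_k` are the count matrices with the current word already excluded (the $\neg i$ counts):

```python
import numpy as np

def sample_topic(m, t, n_m, n_k, alpha, beta, rng):
    """Draw a new topic for the word occurrence (document m, dictionary word t).
    n_m, n_k must already have this word's current assignment removed."""
    # p(z_i=k|...) ∝ theta_hat_{mk} * phi_hat_{kt}
    doc_part  = (n_m[m] + alpha) / (n_m[m] + alpha).sum()          # theta_hat_{mk}
    word_part = (n_k[:, t] + beta[t]) / (n_k + beta).sum(axis=1)   # phi_hat_{kt}
    p = doc_part * word_part
    return rng.choice(len(p), p=p / p.sum())

# Toy usage with K = 2 topics, V = 4 dictionary words, one document
rng = np.random.default_rng(0)
alpha, beta = np.full(2, 0.1), np.full(4, 0.01)
n_m = np.array([[3, 1]])
n_k = np.array([[2, 1, 0, 0], [0, 0, 1, 1]])
k_new = sample_topic(0, 0, n_m, n_k, alpha, beta, rng)
```

A full sweep would loop over every word position, decrement its counts, call `sample_topic`, and increment the counts for the drawn topic.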
2. The derivation in 《Parameter estimation for text analysis》
We will use the Dirichlet integral of the first kind as a ready-made result, without proof:

$$\Delta(\vec{\alpha})=\int_{\sum x_i=1}\prod_{i=1}^Nx_i^{\alpha_i-1}\,\mathrm d\vec{x}$$
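Evaluating this integral gives the well-known closed form $\Delta(\vec\alpha)=\prod_i\Gamma(\alpha_i)/\Gamma\!\left(\sum_i\alpha_i\right)$, which is also the normalizer of the Dirichlet density. In code it is safest to compute it in log space to avoid overflow; a minimal sketch:

```python
from math import lgamma

def log_delta(alpha):
    """log Delta(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))
```

For example, $\Delta((2,1))=\Gamma(2)\Gamma(1)/\Gamma(3)=1/2$.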
Consider a unigram model with a Dirichlet prior (document generation depends on a single word distribution). There is then just one Dirichlet–Multinomial conjugate structure, and:

$$p(W|\vec{\alpha})=\int{p(W|\vec{p})\cdot p(\vec{p}|\vec{\alpha})\,\mathrm d\vec{p}}$$
Comparing with the law of total probability from elementary probability theory, this equation can be read as a continuous-variable version of that law.
Concretely:
$$
\begin{aligned}
p(W|\vec{\alpha})=&\int{\prod_{n=1}^N\mathrm{Mult}(w=w_n|\vec{p},1)\cdot \mathrm{Dir}(\vec{p}|\vec{\alpha})\,\mathrm d\vec{p}}\\
=&\int{\prod_{v=1}^Vp_v^{n^{(v)}}\cdot \frac{1}{\Delta(\vec\alpha)}\prod_{v=1}^Vp_v^{\alpha_v-1}\,\mathrm d\vec{p}}\\
=&\frac{1}{\Delta(\vec\alpha)}\int{\prod_{v=1}^Vp_v^{n^{(v)}+\alpha_v-1}\,\mathrm d\vec{p}} & | \mathrm{Dirichlet}\int\\
=&\frac{\Delta(\vec{n}+\vec\alpha)}{\Delta(\vec\alpha)},\quad\vec{n}=\{n^{(v)}\}_{v=1}^V
\end{aligned}
$$
Here $\mathrm{Mult}(w=w_n|\vec{p},1)$ denotes the observed outcome of a single trial out of the $N$ multinomial trials, and the last step uses the Dirichlet integral of the first kind.
The point of this derivation is that it integrates out the unknown multinomial parameter $\vec{p}$, expressing the probability of the observed corpus using only the word counts and the Dirichlet hyperparameters (pseudo-counts).
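The closed form $p(W|\vec\alpha)=\Delta(\vec n+\vec\alpha)/\Delta(\vec\alpha)$ can be checked against direct Monte-Carlo integration over $\vec p$. A sketch; the counts and hyperparameters below are arbitrary:

```python
import numpy as np
from math import lgamma, exp

def log_delta(alpha):
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))

alpha = np.array([1.0, 2.0, 1.5])
n = np.array([3, 1, 2])          # observed word counts n^(v)

# Closed form: p(W | alpha) = Delta(n + alpha) / Delta(alpha)
closed = exp(log_delta(n + alpha) - log_delta(alpha))

# Direct integration: p(W | alpha) = E_{p ~ Dir(alpha)}[prod_v p_v^{n^(v)}]
rng = np.random.default_rng(0)
p = rng.dirichlet(alpha, size=400_000)
mc = (p ** n).prod(axis=1).mean()
```

The two estimates agree to within sampling noise, confirming the integral identity.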
For the LDA model, there are accordingly $K+M$ Dirichlet–Multinomial conjugate structures.
First, for the $K$ topic-side conjugate structures:
$$
\begin{aligned}
p(\vec w|\vec z,\vec \beta)=&\int p(\vec w|\vec z,\Phi)p(\Phi|\vec \beta)\,\mathrm d\Phi\\
=&\prod_{k=1}^K\int\frac{1}{\Delta(\vec\beta)}\prod_{t=1}^V\varphi_{k,t}^{n_k^{(t)}+\beta_t-1}\,\mathrm d\vec\varphi_k\\
=&\prod_{k=1}^K\frac{\Delta(\vec n_k+\vec\beta)}{\Delta(\vec\beta)},\quad\vec n_k=\{n_k^{(t)}\}_{t=1}^V
\end{aligned}
$$
Next, for the $M$ document-side conjugate structures:
$$
\begin{aligned}
p(\vec z|\vec \alpha)=&\int p(\vec z|\Theta)p(\Theta|\vec \alpha)\,\mathrm d\Theta\\
=&\prod_{m=1}^M\int\frac{1}{\Delta(\vec\alpha)}\prod_{k=1}^K\theta_{m,k}^{n_m^{(k)}+\alpha_k-1}\,\mathrm d\vec\theta_m\\
=&\prod_{m=1}^M\frac{\Delta(\vec n_m+\vec\alpha)}{\Delta(\vec\alpha)},\quad\vec n_m=\{n_m^{(k)}\}_{k=1}^K
\end{aligned}
$$
The joint distribution of the model is therefore:

$$p(\vec z,\vec w|\vec\alpha,\vec\beta)=\prod_{k=1}^K\frac{\Delta(\vec n_k+\vec\beta)}{\Delta(\vec\beta)}\cdot\prod_{m=1}^M\frac{\Delta(\vec n_m+\vec\alpha)}{\Delta(\vec\alpha)}$$
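In log space this joint likelihood is a straightforward sum over the $K+M$ conjugate structures. A sketch, using a hypothetical `log_delta` helper for $\log\Delta(\vec a)=\sum_i\log\Gamma(a_i)-\log\Gamma(\sum_i a_i)$:

```python
import numpy as np
from math import lgamma

def log_delta(alpha):
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))

def log_joint(n_m, n_k, alpha, beta):
    """log p(z, w | alpha, beta): sum of log Delta ratios over topics and documents."""
    topic_term = sum(log_delta(row + beta)  - log_delta(beta)  for row in n_k)
    doc_term   = sum(log_delta(row + alpha) - log_delta(alpha) for row in n_m)
    return topic_term + doc_term
```

This quantity is handy for monitoring convergence of the sampler, since each Gibbs sweep should tend to increase it on average.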
Since Gibbs sampling samples from the conditional distributions, the LDA sampling formula is:
$$
\begin{aligned}
p(z_i=k|\vec{z}_{\neg i},\vec{w})=& \frac{p(\vec{w},\vec{z})}{p(\vec{w},\vec{z}_{\neg i})}=\frac{p(\vec{w}|\vec{z})}{p(\vec{w}_{\neg i}|\vec{z}_{\neg i})p(w_i)}\cdot\frac{p(\vec{z})}{p(\vec{z}_{\neg i})} & (1)\\
\propto & \frac{\Delta(\vec{n}_k+\vec\beta)}{\Delta(\vec{n}_{k,\neg i}+\vec\beta)}\cdot \frac{\Delta(\vec{n}_m+\vec\alpha)}{\Delta(\vec{n}_{m,\neg i}+\vec\alpha)} &(2)\\
=& \frac{\Gamma(n_k^{(t)}+\beta_t)\Gamma(\sum_{t=1}^Vn_{k,\neg i}^{(t)}+\beta_t)}{\Gamma(n_{k,\neg i}^{(t)}+\beta_t)\Gamma(\sum_{t=1}^Vn_k^{(t)}+\beta_t)}\cdot\frac{\Gamma(n_m^{(k)}+\alpha_k)\Gamma(\sum_{k=1}^Kn_{m,\neg i}^{(k)}+\alpha_k)}{\Gamma(n_{m,\neg i}^{(k)}+\alpha_k)\Gamma(\sum_{k=1}^Kn_m^{(k)}+\alpha_k)}&(3)\\
=&\frac{n_{k,\neg i}^{(t)}+\beta_t}{\sum_{t=1}^Vn_{k,\neg i}^{(t)}+\beta_t}\cdot\frac{n_{m,\neg i}^{(k)}+\alpha_k}{[\sum_{k=1}^Kn_m^{(k)}+\alpha_k]-1}&(4)\\
\propto & \frac{n_{k,\neg i}^{(t)}+\beta_t}{\sum_{t=1}^Vn_{k,\neg i}^{(t)}+\beta_t}\cdot(n_{m,\neg i}^{(k)}+\alpha_k)&(5)
\end{aligned}
$$
Five remarks on the derivation above:
(1) is proportional to (2) because $p(w_i)$ is a constant; since $\neg i$ affects only two conjugate structures, all identical $\Delta$ factors cancel between (1) and (2).
(2) to (3): again because word $i$ concerns only topic $k$ and document $m$, expanding the $\Delta$ functions into $\Gamma$ functions lets all identical $\Gamma$ factors cancel.
(3) to (4) cancels terms using the property of the $\Gamma$ function: $\Gamma(x+1)=x\Gamma(x)$.
The $-1$ in (4) appears because the count being removed from document $m$ has been separated out of the total.
(4) is proportional to (5) because the number of words in document $m$ is known, so the denominator of the second factor is a constant independent of $k$.
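Steps (2)–(4) can be verified numerically: when $\vec n_k$ differs from $\vec n_{k,\neg i}$ by one count at position $t$, the $\Delta$ ratio collapses exactly to the simple fraction. A sketch with arbitrary toy counts, using $\log\Delta(\vec a)=\sum_i\log\Gamma(a_i)-\log\Gamma(\sum_i a_i)$:

```python
from math import lgamma, exp

def log_delta(a):
    return sum(lgamma(x) for x in a) - lgamma(sum(a))

beta = [0.5, 0.5, 0.5]
t = 1                                # dictionary index of word i
n_k_neg = [2, 3, 1]                  # counts with word i removed
n_k_all = [2, 4, 1]                  # counts including word i

# Delta(n_k + beta) / Delta(n_{k, neg i} + beta) ...
ratio = exp(log_delta([n + b for n, b in zip(n_k_all, beta)])
            - log_delta([n + b for n, b in zip(n_k_neg, beta)]))
# ... equals (n_{k, neg i}^(t) + beta_t) / (sum_t n_{k, neg i}^(t) + beta_t)
simplified = (n_k_neg[t] + beta[t]) / (sum(n_k_neg) + sum(beta))
```

Both expressions evaluate to $3.5/7.5$, exactly as the $\Gamma(x+1)=x\Gamma(x)$ cancellation predicts.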
3. A question I once had about LDA, and its answer
Question: LDA does nothing with word semantics; it merely performs numeric counting. How can it discover topics with semantic meaning in documents?
LDA essentially clusters words (by some notion of similarity), and each cluster is a topic. In other words, words are clustered by topic.
But if no semantic work is done, how is word similarity determined?
The well-known LDA tutorial 《Parameter estimation for text analysis》 notes that latent topics arise from higher-order co-occurrence: if $t_1$ co-occurs with $t_2$ and $t_2$ co-occurs with $t_3$, then $t_1$ and $t_3$ are in a second-order co-occurrence, and so on. (Co-occurrence means appearing together in the same contexts; the more often two words co-occur across different contexts, the more similar they are.)
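The idea of higher-order co-occurrence can be made concrete with a toy adjacency matrix: powers of the co-occurrence matrix expose indirect relations. A sketch; the matrix below is invented purely for illustration:

```python
import numpy as np

# Hypothetical co-occurrence matrix over 4 words: C[i, j] = 1 if word i
# and word j ever appear together in some context.
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

# (C @ C)[i, j] > 0 means i and j share a common neighbour:
# a second-order co-occurrence, like t1-t3 related only through t2.
C2 = C @ C
```

Here words 0 and 2 never co-occur directly (`C[0, 2] == 0`), yet are linked at second order through word 1, which is the kind of relation the latent topics capture.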