統計學習方法 (Statistical Learning Methods): Solutions to the Chapter 4 Exercises

The Chapter 4 exercises are somewhat similar to Exercise 1.1, so the two chapters are best read together. Maximum likelihood estimation and Bayesian estimation were explained in my Chapter 1 exercise solutions; it may help to read those first.
In the Chapter 1 exercise, the Bayesian estimate for a Bernoulli trial used the Beta distribution. This chapter deals with experiments that have multiple outcomes, such as rolling a die or multi-class classification, so the Dirichlet distribution is needed instead.

Exercise 4.1

Problem: derive the probability estimates (4.8) and (4.9) of the naive Bayes method using maximum likelihood estimation.

Formula 4.8
$$P(Y=c_k)=\frac{\sum_{i=1}^{N} I(y_i=c_k)}{N}, \quad k=1,2,\ldots,K$$

Here $I$ is the indicator function: $I(y_i=c_k)$ equals 1 when $y_i=c_k$ and 0 otherwise (introduced on page 10 of the book).
Let $P(Y=c_k)=\theta$. Suppose $N$ trials are performed and $Y=c_k$ occurs in $n$ of them, i.e.
$$n=\sum_{i=1}^{N} I(y_i=c_k)$$

Each trial thus has probability $\theta$ of the outcome $Y=c_k$ and probability $1-\theta$ of $Y\neq c_k$.

The likelihood is therefore $L(\theta)=\theta^{n}(1-\theta)^{N-n}$.
It is customary to take logarithms, giving the log-likelihood $\ell(\theta)=n\log\theta+(N-n)\log(1-\theta)$.
Differentiating: $\ell'(\theta)=\dfrac{n}{\theta}-\dfrac{N-n}{1-\theta}$.
Setting $\ell'(\theta)=0$ gives $\theta=\dfrac{n}{N}=\dfrac{\sum_{i=1}^{N} I(y_i=c_k)}{N}$.
This proves (4.8).
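The MLE above is nothing more than a frequency count, which is easy to check numerically. A minimal Python sketch (the function name `prior_mle` is my own, not from the book):

```python
from collections import Counter

def prior_mle(y):
    """MLE of P(Y = c_k): the fraction of the N samples whose label is c_k."""
    counts = Counter(y)              # counts[c] = sum of I(y_i = c) over i
    N = len(y)
    return {c: counts[c] / N for c in counts}

# Toy labels: 4 of the 6 samples are in class 1, 2 are in class -1.
print(prior_mle([1, 1, -1, 1, -1, 1]))  # {1: 4/6, -1: 2/6}
```

The returned probabilities always sum to 1, since each sample falls in exactly one class.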

Formula 4.9
$$P(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^{N} I(y_i=c_k)}$$

The proof is analogous. Let $P(X^{(j)}=a_{jl}\mid Y=c_k)=\theta$. Among the $N$ trials, suppose $Y=c_k$ occurs $n$ times and $Y=c_k,\,X^{(j)}=a_{jl}$ occurs $m$ times:
$$n=\sum_{i=1}^{N} I(y_i=c_k), \qquad m=\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)$$
Restricting attention to the $n$ trials with $Y=c_k$, the likelihood is $L(\theta)=\theta^{m}(1-\theta)^{n-m}$.
Taking logarithms: $\ell(\theta)=m\log\theta+(n-m)\log(1-\theta)$.
Differentiating: $\ell'(\theta)=\dfrac{m}{\theta}-\dfrac{n-m}{1-\theta}$.
Setting $\ell'(\theta)=0$ gives $\theta=\dfrac{m}{n}=\dfrac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^{N} I(y_i=c_k)}$.
This proves (4.9).
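Formula (4.9) is likewise just a ratio of counts, restricted to the samples of one class. A sketch under the same conventions (`cond_mle` is a hypothetical helper, not from the book):

```python
def cond_mle(xj, y, a, c):
    """MLE of P(X^(j) = a | Y = c): joint count m over class count n."""
    m = sum(1 for xi, yi in zip(xj, y) if xi == a and yi == c)
    n = sum(1 for yi in y if yi == c)
    return m / n

# Values of a single feature j, paired with labels.
xj = ["S", "M", "M", "S", "S"]
y  = [  1,   1,  -1,   1,  -1]
print(cond_mle(xj, y, "S", 1))  # 2/3: of the 3 samples with Y=1, 2 have X=S
```

Note the denominator is the class count $n$, not $N$: the estimate conditions on $Y=c_k$.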

Exercise 4.2

Problem: derive the probability estimates (4.10) and (4.11) of the naive Bayes method using Bayesian estimation.
As in Exercise 4.1, suppose $N$ trials are performed; let $n_k$ be the number of trials with $Y=c_k$, and $m_l$ the number with $Y=c_k,\,X^{(j)}=a_{jl}$:
$$n_k=\sum_{i=1}^{N} I(y_i=c_k), \qquad m_l=\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)$$

Formula 4.11
$$P_\lambda(Y=c_k)=\frac{\sum_{i=1}^{N} I(y_i=c_k)+\lambda}{N+K\lambda}$$

Let $P_\lambda(Y=c_k)=\theta_k$, where $(\theta_1,\ldots,\theta_K)$ follows a Dirichlet distribution with parameters $(\alpha_1,\ldots,\alpha_K)$:
$$f(\theta_1,\cdots,\theta_K\mid\alpha_1,\ldots,\alpha_K)=\frac{1}{B(\alpha_1,\cdots,\alpha_K)}\prod_{k=1}^{K}\theta_k^{\alpha_k-1}$$
As in the maximum likelihood case, the likelihood of the observed counts (writing $N$ for the data) is
$$P(N\mid\theta_1,\cdots,\theta_K)=\theta_1^{n_1}\theta_2^{n_2}\cdots\theta_K^{n_K}=\prod_{k=1}^{K}\theta_k^{n_k}$$
By Bayes' rule,
$$P(\theta_1,\cdots,\theta_K\mid N)\propto P(N\mid\theta_1,\cdots,\theta_K)\,P(\theta_1,\cdots,\theta_K)\propto\prod_{k=1}^{K}\theta_k^{\alpha_k-1}\prod_{k=1}^{K}\theta_k^{n_k}=\prod_{k=1}^{K}\theta_k^{\alpha_k+n_k-1}$$
so the posterior $P(\theta_1,\cdots,\theta_K\mid N)$ is again a Dirichlet distribution, with parameters $(\alpha_k+n_k)$.
Take $P_\lambda(Y=c_k)$ to be the posterior mean of $\theta_k$. Since the mean of a Dirichlet distribution with parameters $\beta_k$ is $\beta_k/\sum_j\beta_j$, this gives $E(\theta_k)=\dfrac{n_k+\alpha_k}{N+\sum_{j=1}^{K}\alpha_j}$. Assuming a symmetric Dirichlet prior with parameter $\lambda$, i.e. $\alpha_1=\alpha_2=\cdots=\alpha_K=\lambda$, we obtain
$$E(\theta_k)=\frac{\sum_{i=1}^{N} I(y_i=c_k)+\lambda}{N+K\lambda}$$
This proves (4.11).
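Numerically, (4.11) is the MLE count with $\lambda$ added to each class's count and $K\lambda$ to the total; $\lambda=1$ gives Laplace smoothing. A sketch (names and the toy data are my own):

```python
def prior_bayes(y, classes, lam=1.0):
    """Dirichlet posterior-mean estimate of P(Y = c_k), lam = lambda."""
    N, K = len(y), len(classes)
    return {c: (sum(1 for yi in y if yi == c) + lam) / (N + K * lam)
            for c in classes}

# Even a class unseen in the data gets nonzero probability:
print(prior_bayes([1, 1, 1], classes=[1, -1], lam=1.0))  # {1: 4/5, -1: 1/5}
```

With $\lambda=0$ this reduces to the MLE of Exercise 4.1; the smoothed estimates still sum to 1 over the $K$ classes.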

Formula 4.10
$$P_\lambda(X^{(j)}=a_{jl}\mid Y=c_k)=\frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)+\lambda}{\sum_{i=1}^{N} I(y_i=c_k)+S_j\lambda}$$
where $S_j$ is the number of possible values of the $j$-th feature.

The proof is analogous; only the parameters change slightly. Fix a class $c_k$ and a feature $j$, and let $P(X^{(j)}=a_{jl}\mid Y=c_k)=\theta_l$, where $(\theta_1,\ldots,\theta_{S_j})$ follows a Dirichlet distribution with parameters $(\alpha_1,\ldots,\alpha_{S_j})$:
$$f(\theta_1,\cdots,\theta_{S_j}\mid\alpha_1,\ldots,\alpha_{S_j})=\frac{1}{B(\alpha_1,\cdots,\alpha_{S_j})}\prod_{l=1}^{S_j}\theta_l^{\alpha_l-1}$$
Similarly, with $n=n_k$ the class count and $m_l$ the joint counts defined above,
$$P(n\mid\theta_1,\cdots,\theta_{S_j})=\theta_1^{m_1}\theta_2^{m_2}\cdots\theta_{S_j}^{m_{S_j}}=\prod_{l=1}^{S_j}\theta_l^{m_l}$$
$$P(\theta_1,\cdots,\theta_{S_j}\mid n)\propto P(n\mid\theta_1,\cdots,\theta_{S_j})\,P(\theta_1,\cdots,\theta_{S_j})\propto\prod_{l=1}^{S_j}\theta_l^{\alpha_l-1}\prod_{l=1}^{S_j}\theta_l^{m_l}=\prod_{l=1}^{S_j}\theta_l^{\alpha_l+m_l-1}$$
so the posterior $P(\theta_1,\cdots,\theta_{S_j}\mid n)$ is again a Dirichlet distribution.
Take $P_\lambda(X^{(j)}=a_{jl}\mid Y=c_k)$ to be the posterior mean $E(\theta_l)=\dfrac{m_l+\alpha_l}{n+\sum_{r=1}^{S_j}\alpha_r}$. Assuming a symmetric Dirichlet prior, $\alpha_1=\alpha_2=\cdots=\alpha_{S_j}=\lambda$, we obtain
$$E(\theta_l)=\frac{\sum_{i=1}^{N} I(x_i^{(j)}=a_{jl},\,y_i=c_k)+\lambda}{\sum_{i=1}^{N} I(y_i=c_k)+S_j\lambda}$$
This proves (4.10).
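The same smoothing can be sketched for the conditional probabilities of (4.10): $\lambda$ is added to each of the $S_j$ joint counts, and $S_j\lambda$ to the class count, so unseen feature values still get nonzero probability. As before, the helper name and toy data are my own:

```python
def cond_bayes(xj, y, a_values, c, lam=1.0):
    """Smoothed estimates of P(X^(j) = a | Y = c) for every value a of feature j."""
    n = sum(1 for yi in y if yi == c)   # class count
    Sj = len(a_values)                  # number of possible values of feature j
    return {a: (sum(1 for xi, yi in zip(xj, y) if xi == a and yi == c) + lam)
               / (n + Sj * lam)
            for a in a_values}

xj = ["S", "M", "M", "S", "S"]
y  = [  1,   1,  -1,   1,  -1]
# "L" never co-occurs with Y=1, yet receives probability 1/6 rather than 0:
print(cond_bayes(xj, y, ["S", "M", "L"], 1))  # {'S': 3/6, 'M': 2/6, 'L': 1/6}
```

The $S_j$ estimates for a fixed class sum to 1, which is exactly why the denominator carries $S_j\lambda$.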

