Bayes & Python Applications

Bayes

The Bayes decision rule

To minimize the overall risk, it suffices to pick, on each sample, the class label that minimizes the conditional risk $R(c|x)$:
$$h^*(x)=\argmin\limits_{c\in\mathcal{Y}}R(c|x)\tag{1}$$
The resulting $h^*(x)$ is called the Bayes optimal classifier.
The conditional risk $R(c|x)$ is given by:
$$R(c_i|x)=\sum_{j=1}^{N}\lambda_{ij}P(c_j|x)\tag{2}$$
If the goal is to minimize the classification error rate, the misclassification loss $\lambda_{ij}$ is the 0/1 loss:
$$\lambda_{ij}=\begin{cases}0,\qquad &\text{if } i=j\\1,&\text{otherwise}\end{cases}\tag{3}$$
The conditional risk $R(c|x)$ then expands to:
$$\begin{aligned} R(c_i|x)&=1\cdot P(c_1|x)+\cdots+1\cdot P(c_{i-1}|x)+0\cdot P(c_i|x)\\ &\quad+1\cdot P(c_{i+1}|x)+\cdots+1\cdot P(c_N|x)\\ &=P(c_1|x)+\cdots+P(c_{i-1}|x)+P(c_{i+1}|x)+\cdots+P(c_N|x) \end{aligned}\tag{4}$$
Since $\sum_{j=1}^{N}P(c_j|x)=1$, we have:
$$R(c_i|x)=1-P(c_i|x)\tag{5}$$
Hence the Bayes optimal classifier that minimizes the error rate is:
$$h^*(x)=\argmin\limits_{c\in\mathcal{Y}}R(c|x)=\argmin\limits_{c\in\mathcal{Y}}(1-P(c|x))=\argmax\limits_{c\in\mathcal{Y}}P(c|x)\tag{6}$$
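Under the 0/1 loss, minimizing the conditional risk and maximizing the posterior pick the same class. A minimal sketch with made-up posterior values (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical posteriors P(c|x) for three classes on a single sample x
posterior = np.array([0.2, 0.5, 0.3])

# Conditional risk under 0/1 loss, Eq. (5): R(c_i|x) = 1 - P(c_i|x)
risk = 1.0 - posterior

# Eq. (6): the risk-minimizing class is exactly the posterior-maximizing class
assert np.argmin(risk) == np.argmax(posterior)
print("predicted class index:", np.argmax(posterior))
```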

Maximum likelihood estimation of multivariate normal distribution parameters

The log-likelihood function is:
$$LL(\theta_c)=\sum_{x\in D_c}\log P(x|\theta_c)\tag{7}$$
For convenience, take the logarithm to base $e$, so that:
$$LL(\theta_c)=\sum_{x\in D_c}\ln P(x|\theta_c)\tag{8}$$
Since $P(x|\theta_c)=P(x|c)\sim\mathcal{N}(\mu_c,\Sigma_c)$, we have:
$$P(x|\theta_c)=\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x-\mu_c)^T\Sigma_c^{-1}(x-\mu_c)\right)\tag{9}$$
where $d$ is the dimension of $x$, $\Sigma_c$ is the symmetric positive-definite covariance matrix (the multivariate counterpart of $\sigma_c^2$), and $|\Sigma_c|$ is its determinant. Substituting this into the log-likelihood gives:
$$LL(\theta_c)=\sum_{x\in D_c}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x-\mu_c)^T\Sigma_c^{-1}(x-\mu_c)\right)\right]\tag{10}$$
Let $|D_c|=N$; the log-likelihood then becomes:
$$\begin{aligned} LL(\theta_c)&=\sum_{i=1}^{N}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\\ &=\sum_{i=1}^{N}\ln\left[\cfrac{1}{\sqrt{(2\pi)^d}}\cdot\cfrac{1}{\sqrt{|\Sigma_c|}}\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\\ &=\sum_{i=1}^{N}\left\{\ln\cfrac{1}{\sqrt{(2\pi)^d}}+\ln\cfrac{1}{\sqrt{|\Sigma_c|}}+\ln\left[\exp\left(-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right)\right]\right\}\\ &=\sum_{i=1}^{N}\left\{-\cfrac{d}{2}\ln(2\pi)-\cfrac{1}{2}\ln|\Sigma_c|-\cfrac{1}{2}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right\}\\ &=-\cfrac{Nd}{2}\ln(2\pi)-\cfrac{N}{2}\ln|\Sigma_c|-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c) \end{aligned}\tag{11}$$
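As a sanity check of Eq. (11), the closed form can be compared against a direct sum of log densities (Eq. (8)); a small sketch with arbitrary made-up parameters, using `scipy.stats.multivariate_normal` as an independent reference:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
N, d = 50, 3
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])   # symmetric positive definite
X = rng.multivariate_normal(mu, Sigma, size=N)

# Closed form of Eq. (11)
diff = X - mu
quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
ll_closed = (-N * d / 2 * np.log(2 * np.pi)
             - N / 2 * np.log(np.linalg.det(Sigma))
             - quad.sum() / 2)

# Direct sum of per-sample log densities, Eq. (8)
ll_direct = multivariate_normal(mu, Sigma).logpdf(X).sum()
assert np.isclose(ll_closed, ll_direct)
```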
The maximum likelihood estimate $\hat{\theta}_c$ of the parameter $\theta_c$ is:
$$\hat{\theta}_c=\argmax\limits_{\theta_c}LL(\theta_c)\tag{12}$$
So it remains to find the $\hat{\mu}_c$ and $\hat{\Sigma}_c$ that maximize the log-likelihood $LL(\theta_c)$; together they give $\hat{\theta}_c$.
Taking the partial derivative of $LL(\theta_c)$ with respect to $\mu_c$:
$$\begin{aligned} \cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=\cfrac{\partial}{\partial\mu_c}\left[-\cfrac{Nd}{2}\ln(2\pi)-\cfrac{N}{2}\ln|\Sigma_c|-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\ &=\cfrac{\partial}{\partial\mu_c}\left[-\cfrac{1}{2}\sum_{i=1}^{N}(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\ &=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i-\mu_c)^T\Sigma_c^{-1}(x_i-\mu_c)\right]\\ &=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i^T-\mu_c^T)\Sigma_c^{-1}(x_i-\mu_c)\right]\\ &=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[(x_i^T-\mu_c^T)(\Sigma_c^{-1}x_i-\Sigma_c^{-1}\mu_c)\right]\\ &=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[x_i^T\Sigma_c^{-1}x_i-x_i^T\Sigma_c^{-1}\mu_c-\mu_c^T\Sigma_c^{-1}x_i+\mu_c^T\Sigma_c^{-1}\mu_c\right] \end{aligned}\tag{13}$$
Since $x_i^T\Sigma_c^{-1}\mu_c$ is a scalar, it equals its own transpose, and by the symmetry of $\Sigma_c$:
$$x_i^T\Sigma_c^{-1}\mu_c=(x_i^T\Sigma_c^{-1}\mu_c)^T=\mu_c^T(\Sigma_c^{-1})^Tx_i=\mu_c^T(\Sigma_c^T)^{-1}x_i=\mu_c^T\Sigma_c^{-1}x_i\tag{14}$$
So Eq. (13) simplifies to:
$$\cfrac{\partial LL(\theta_c)}{\partial\mu_c}=-\cfrac{1}{2}\sum_{i=1}^{N}\cfrac{\partial}{\partial\mu_c}\left[x_i^T\Sigma_c^{-1}x_i-2x_i^T\Sigma_c^{-1}\mu_c+\mu_c^T\Sigma_c^{-1}\mu_c\right]\tag{15}$$
By the matrix differentiation identities:
$$\cfrac{\partial a^Tx}{\partial x}=a,\qquad\cfrac{\partial x^T\beta x}{\partial x}=(\beta+\beta^T)x\tag{16}$$
we obtain:
$$\begin{aligned} \cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=-\cfrac{1}{2}\sum_{i=1}^{N}\left[0-(2x_i^T\Sigma_c^{-1})^T+(\Sigma_c^{-1}+(\Sigma_c^{-1})^T)\mu_c\right]\\ &=-\cfrac{1}{2}\sum_{i=1}^{N}\left[-2(\Sigma_c^{-1})^Tx_i+(\Sigma_c^{-1}+(\Sigma_c^{-1})^T)\mu_c\right]\\ &=-\cfrac{1}{2}\sum_{i=1}^{N}\left[-2\Sigma_c^{-1}x_i+2\Sigma_c^{-1}\mu_c\right]\\ &=\sum_{i=1}^{N}\Sigma_c^{-1}x_i-N\Sigma_c^{-1}\mu_c \end{aligned}\tag{17}$$
Setting the derivative to zero:
$$\begin{aligned} \cfrac{\partial LL(\theta_c)}{\partial\mu_c}&=\sum_{i=1}^{N}\Sigma_c^{-1}x_i-N\Sigma_c^{-1}\mu_c=0\\ &\Longrightarrow\sum_{i=1}^{N}\Sigma_c^{-1}x_i=N\Sigma_c^{-1}\mu_c\\ &\Longrightarrow\Sigma_c^{-1}\sum_{i=1}^{N}x_i=N\Sigma_c^{-1}\mu_c\\ &\Longrightarrow N\mu_c=\sum_{i=1}^{N}x_i\\ &\Longrightarrow\mu_c=\cfrac{1}{N}\sum_{i=1}^{N}x_i \end{aligned}\tag{18}$$
Similarly, taking the partial derivative of $LL(\theta_c)$ with respect to $\Sigma_c$ and setting it to zero gives:
$$\Sigma_c=\cfrac{1}{N}\sum_{i=1}^{N}(x_i-\mu_c)(x_i-\mu_c)^T\tag{19}$$
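The two estimators derived above are just the sample mean and the biased (divide-by-$N$) sample covariance; a quick numerical check with NumPy on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.4], [0.4, 2.0]], size=200)
N = len(X)

mu_hat = X.sum(axis=0) / N            # Eq. (18): sample mean
diff = X - mu_hat
Sigma_hat = diff.T @ diff / N         # Eq. (19): biased covariance (divide by N)

assert np.allclose(mu_hat, X.mean(axis=0))
assert np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))
```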
Recall that the Bayes optimal classifier minimizing the classification error rate is:
$$h^*(x)=\argmax\limits_{c\in\mathcal{Y}}P(c|x)\tag{20}$$
By Bayes' theorem:
$$P(c|x)=\cfrac{P(x,c)}{P(x)}=\cfrac{P(c)P(x|c)}{P(x)}\tag{21}$$
Therefore, since $P(x)$ is the same for every class and can be dropped from the maximization:
$$h^*(x)=\argmax\limits_{c\in\mathcal{Y}}\cfrac{P(c)P(x|c)}{P(x)}=\argmax\limits_{c\in\mathcal{Y}}P(c)P(x|c)\tag{22}$$
Under the attribute conditional independence assumption:
$$P(x|c)=P(x_1,x_2,\cdots,x_d|c)=\prod_{i=1}^{d}P(x_i|c)\tag{23}$$
so:
$$h^*(x)=\argmax\limits_{c\in\mathcal{Y}}P(c)\prod_{i=1}^{d}P(x_i|c)\tag{24}$$
This is the naive Bayes classifier.
The prior $P(c)$ is the proportion of each class in the sample space. By the law of large numbers, when the training set contains sufficiently many independent and identically distributed samples, $P(c)$ can be estimated by the class frequencies:
$$P(c)=\cfrac{|D_c|}{|D|}\tag{25}$$
where $D$ is the training set, $|D|$ is the number of training samples, $D_c$ is the subset of training samples belonging to class $c$, and $|D_c|$ is the number of samples in $D_c$.
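Putting the pieces together, here is a minimal from-scratch Gaussian naive Bayes sketch on toy two-class data: priors estimated by class frequencies, per-feature Gaussian likelihoods fitted by the maximum likelihood mean and variance, and prediction by the argmax rule. All data and helper names (`fit`, `predict`) are made up for illustration; it is not the library implementation used below.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy 2-class data: each class drawn from its own per-feature Gaussian
X0 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))
X1 = rng.normal([3.0, 3.0], 1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

def fit(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # ML mean and biased variance per feature, plus the frequency prior
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0), len(Xc) / len(X))
    return params

def predict(params, x):
    # argmax_c P(c) * prod_i P(x_i|c), computed in log space for stability
    scores = {}
    for c, (mu, var, prior) in params.items():
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var).sum()
        scores[c] = np.log(prior) + log_lik
    return max(scores, key=scores.get)

params = fit(X, y)
print(predict(params, np.array([0.2, -0.1])))  # a point near class 0's mean
print(predict(params, np.array([2.9, 3.2])))   # a point near class 1's mean
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause.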

Applying a Bayes classifier in Python

# Load the breast cancer dataset
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
# Print the dataset's keys
print(cancer.keys())
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
# Print the labeled tumor classes and the feature names
print("Tumor classes:", cancer['target_names'])
print("Tumor features:", cancer['feature_names'])
Tumor classes: ['malignant' 'benign']
Tumor features: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']

As shown, tumors are classified as either malignant or benign, and the dataset has many features.

# Assign the feature values and class targets to X and y
X, y = cancer.data, cancer.target
# Import the data-splitting utility
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=38)
# Inspect the data shapes
print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)
Training set shape: (426, 30)
Test set shape: (143, 30)
# Import Gaussian naive Bayes
from sklearn.naive_bayes import GaussianNB
# Fit the model to the training data
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Print the model's score on the test set
print("Model score: {:.3f}".format(gnb.score(X_test, y_test)))
Model score: 0.944
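Beyond the accuracy score, the fitted model can also produce class probabilities and labels for individual samples. A self-contained sketch that refits on the same split (the `random_state=38` split matches the example above) and inspects one prediction:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=38)
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Posterior class probabilities and the predicted label for one test sample
print(gnb.predict_proba(X_test[:1]))
print("Predicted:", cancer.target_names[gnb.predict(X_test[:1])][0])
```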