The two basic rules of probability theory:
Sum rule: $P(X)=\sum_Y P(X,Y)$
Product rule: $P(X,Y)=P(Y)P(X\mid Y)=P(X)P(Y\mid X)$
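A quick numerical check of both rules may help. The sketch below assumes a small, made-up joint distribution $P(X,Y)$ (the numbers are arbitrary and serve only as an illustration), stored as a NumPy array with rows indexed by $X$ and columns by $Y$:

```python
import numpy as np

# Hypothetical joint distribution P(X, Y): rows = values of X, columns = values of Y.
# The numbers are arbitrary; they only need to be non-negative and sum to 1.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

# Sum rule: marginalize out Y (sum across columns) or X (sum across rows).
p_x = joint.sum(axis=1)  # P(X)
p_y = joint.sum(axis=0)  # P(Y)

# Conditionals: divide the joint by the matching marginal.
p_x_given_y = joint / p_y           # P(X|Y): each column sums to 1
p_y_given_x = joint / p_x[:, None]  # P(Y|X): each row sums to 1

# Product rule: both factorizations rebuild the same joint table.
assert np.allclose(p_y * p_x_given_y, joint)           # P(Y) P(X|Y)
assert np.allclose(p_x[:, None] * p_y_given_x, joint)  # P(X) P(Y|X)
print(p_x, p_y)
```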
Dividing the product rule by $P(X)$ gives Bayes' theorem:
$$P(Y\mid X)=\frac{P(X\mid Y)P(Y)}{P(X)}$$
Posterior: $P(Y\mid X)$
Likelihood: $P(X\mid Y)$
Prior: $P(Y)$
Partition function (the normalizing constant, obtained by expanding $P(X)$ over all values of $Y$): $P(X)=\sum_Y P(X\mid Y)P(Y)$
So Bayes' theorem can also be written as:
$$P(Y\mid X)=\frac{P(X\mid Y)P(Y)}{\sum_Y P(X\mid Y)P(Y)}$$
or, indexing the possible values of $Y$ explicitly:
$$P(Y_i\mid X)=\frac{P(X\mid Y_i)P(Y_i)}{\sum_j P(X\mid Y_j)P(Y_j)}$$
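The indexed form maps directly onto a few lines of code. Below is a minimal sketch; `posterior` is a hypothetical helper name, and exact `Fraction` arithmetic is used only so the results print as clean ratios:

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Discrete Bayes' theorem: return P(Y_i | X) for every i.

    prior[i]: P(Y_i)
    likelihood[i]: P(X | Y_i), evaluated at the observed X
    """
    # Numerators P(X | Y_i) P(Y_i)
    joint = [lik * p for lik, p in zip(likelihood, prior)]
    # Denominator: the partition function P(X) = sum_j P(X | Y_j) P(Y_j)
    evidence = sum(joint)
    return [j / evidence for j in joint]

# Example with two hypotheses and exact fractions
print(posterior([Fraction(1, 2), Fraction(1, 2)], [Fraction(1), Fraction(1, 4)]))
# [Fraction(4, 5), Fraction(1, 5)]
```

Normalizing by the sum of the numerators, rather than computing $P(X)$ separately, guarantees the returned posteriors sum to 1.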
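Applying Bayes' theorem:
Hemophilia is an X-linked recessive disorder.
Consider an unaffected woman whose brother has hemophilia.
Let $\theta=1$ mean that she carries the disease gene and $\theta=0$ that she does not.
Her brother's affected X chromosome came from their mother, so the mother is a carrier; assuming the woman's father is unaffected, she inherited the mother's affected X with probability 1/2. So for this woman,
$$P(\theta=1)=P(\theta=0)=\frac{1}{2}$$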
She has two sons. Write $y_i=0$ for "son $i$ is unaffected". If both sons are unaffected, then, treating the sons as independent given $\theta$:
$$P(y_1=0,y_2=0\mid\theta=0)=1\times 1$$
$$P(y_1=0,y_2=0\mid\theta=1)=\frac{1}{2}\times\frac{1}{2}$$
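These two likelihoods are instances of one rule: each son gets his X chromosome from his mother, so a son of a carrier is unaffected with probability 1/2, and a son of a non-carrier is always unaffected. A minimal sketch, assuming (as the products above do) that the sons are independent given $\theta$; `likelihood_all_healthy` is a hypothetical name:

```python
from fractions import Fraction

def likelihood_all_healthy(n, theta):
    """P(y_1 = 0, ..., y_n = 0 | theta), assuming sons are independent given theta.

    theta = 0: non-carrier, each son is unaffected with probability 1.
    theta = 1: carrier, each son is unaffected with probability 1/2.
    """
    p_healthy_son = Fraction(1) if theta == 0 else Fraction(1, 2)
    return p_healthy_son ** n

print(likelihood_all_healthy(2, 0), likelihood_all_healthy(2, 1))  # 1 1/4
```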
Working backwards, the probability that she is a carrier is:
$$P(\theta=1\mid y_1=0,y_2=0)=\frac{P(y_1=0,y_2=0\mid\theta=1)P(\theta=1)}{P(y_1=0,y_2=0)}$$
From the partition function
$$P(X)=\sum_Y P(X\mid Y)P(Y)$$
we get
$$P(y)=\sum_{\theta}P(y\mid\theta)P(\theta)$$
that is,
$$P(y_1=0,y_2=0)=P(y_1=0,y_2=0\mid\theta=1)P(\theta=1)+P(y_1=0,y_2=0\mid\theta=0)P(\theta=0)$$
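Plugging in the prior and the two likelihoods gives the partition function for this observation. A short sketch with exact fractions (the dictionary layout is just an illustrative choice):

```python
from fractions import Fraction

prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}          # P(theta)
likelihood = {0: Fraction(1), 1: Fraction(1, 2) ** 2}   # P(y1=0, y2=0 | theta)

# Partition function: P(y1=0, y2=0) = sum over theta of P(y | theta) P(theta)
evidence = sum(likelihood[t] * prior[t] for t in (0, 1))
print(evidence)  # 5/8
```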
So her carrier probability becomes:
$$P(\theta=1\mid y_1=0,y_2=0)=\frac{P(y_1=0,y_2=0\mid\theta=1)P(\theta=1)}{P(y_1=0,y_2=0\mid\theta=1)P(\theta=1)+P(y_1=0,y_2=0\mid\theta=0)P(\theta=0)}$$
Given two healthy sons, her carrier probability works out to:
$$P(\theta=1\mid y_1=0,y_2=0)=\frac{\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}}{\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}+1\times 1\times\frac{1}{2}}=\frac{\frac{1}{8}}{\frac{5}{8}}=\frac{1}{5}$$
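The value 1/5 can also be checked empirically. The Monte Carlo sketch below (not part of the original derivation) simulates many women under the same assumptions: prior 1/2 of being a carrier, each son of a carrier independently affected with probability 1/2, sons of non-carriers never affected. It keeps only the women whose two sons are both unaffected and measures the fraction of carriers among them. The seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_women = 1_000_000
n_sons = 2

# Prior: each woman is a carrier (theta = 1) with probability 1/2.
carrier = rng.random(n_women) < 0.5

# A son is affected only if his mother is a carrier AND he inherits her affected X (prob 1/2).
inherits = rng.random((n_women, n_sons)) < 0.5
affected = carrier[:, None] & inherits

# Condition on the observation: all sons unaffected.
all_healthy = ~affected.any(axis=1)

# Share of carriers among those women approximates P(theta=1 | y1=0, y2=0).
estimate = carrier[all_healthy].mean()
print(estimate)  # close to 0.2
```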
Now suppose she has three healthy sons.
Since all the sons here are unaffected, we shorten the notation and write $y$ for the event that all $n$ sons are unaffected:
$$P(y)=P(y_1=0,y_2=0,\dots,y_n=0),\quad\text{here } P(y)=P(y_1=0,y_2=0,y_3=0)$$
Then:
$$P(y_1=0,y_2=0,y_3=0\mid\theta=0)=P(y\mid\theta=0)=1\times 1\times 1$$
$$P(y_1=0,y_2=0,y_3=0\mid\theta=1)=P(y\mid\theta=1)=\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}$$
Working back to her carrier probability:
$$P(\theta=1\mid y)=\frac{P(y\mid\theta=1)P(\theta=1)}{P(y)}=\frac{P(y\mid\theta=1)P(\theta=1)}{P(y\mid\theta=0)P(\theta=0)+P(y\mid\theta=1)P(\theta=1)}$$
$$=\frac{(\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2})\times\frac{1}{2}}{1\times 1\times 1\times\frac{1}{2}+(\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2})\times\frac{1}{2}}=\frac{\frac{1}{16}}{\frac{9}{16}}=\frac{1}{9}\approx 0.111111$$
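The same 1/9 is reached by updating one son at a time: the posterior after each son becomes the prior for the next. In particular, starting from the two-son result 1/5 and observing a third healthy son gives 1/9. A sketch, where `update` is a hypothetical helper encoding $P(\text{healthy son}\mid\theta=1)=\frac{1}{2}$ and $P(\text{healthy son}\mid\theta=0)=1$:

```python
from fractions import Fraction

def update(p_carrier):
    """One Bayesian update after observing one more unaffected son."""
    num = Fraction(1, 2) * p_carrier            # P(healthy | carrier) P(carrier)
    return num / (num + 1 * (1 - p_carrier))    # + P(healthy | non-carrier) P(non-carrier)

p = Fraction(1, 2)   # prior before any sons are observed
history = []
for _ in range(3):   # observe three unaffected sons, one at a time
    p = update(p)
    history.append(p)
print(history)  # [Fraction(1, 3), Fraction(1, 5), Fraction(1, 9)]
```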
Generalizing, suppose she has $n$ healthy sons:
$$P(y\mid\theta=0)=1^n$$
$$P(y\mid\theta=1)=\left(\frac{1}{2}\right)^n$$
Her carrier probability is then:
$$P(\theta=1\mid y)=\frac{P(y\mid\theta=1)P(\theta=1)}{P(y)}=\frac{P(y\mid\theta=1)P(\theta=1)}{P(y\mid\theta=0)P(\theta=0)+P(y\mid\theta=1)P(\theta=1)}=\frac{(\frac{1}{2})^{n+1}}{1^n\times\frac{1}{2}+(\frac{1}{2})^{n+1}}$$
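This shows that as $n\to\infty$, her carrier probability tends to 0.
In fact, by $n=10$ it is already tiny: plugging $n=10$ into the formula above gives $\frac{(\frac{1}{2})^{11}}{\frac{1}{2}+(\frac{1}{2})^{11}}=\frac{1}{1025}\approx 0.000976$, practically 0. Plotting it with matplotlib confirms this: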
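```python
import numpy as np
from matplotlib import pyplot as plt

# n = number of healthy sons, from 0 to 10
x = np.arange(0, 11)
# Posterior P(theta=1 | n healthy sons) = (1/2)^(n+1) / ((1/2)^(n+1) + 1/2)
y = (0.5 ** (x + 1)) / ((0.5 ** (x + 1)) + 0.5)

plt.xlim((0, 10))
plt.ylim((0, 0.5))
plt.title("Bayes")
# English labels: matplotlib's default font has no CJK glyphs
plt.xlabel("Number of healthy sons")
plt.ylabel("Probability that the woman is a carrier")
plt.plot(x, y, color="red")
plt.show()
```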
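Extending the x range to 20 (`np.arange(0, 21)` together with `plt.xlim((0, 20))`) shows the curve staying essentially flat at 0 from $n=10$ onward.
This confirms the claim above: after 10 healthy sons her carrier probability is below 0.1%, so for practical purposes she can be regarded as a non-carrier.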
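The extended range can also be inspected numerically instead of visually. The sketch below (`carrier_posterior` is a hypothetical name) evaluates the general Bayes expression exactly for $n=0,\dots,20$ and checks each value against the closed form $1/(2^n+1)$, which follows from multiplying numerator and denominator by $2^{n+1}$:

```python
from fractions import Fraction

def carrier_posterior(n):
    """P(theta = 1 | n unaffected sons) via Bayes' theorem with prior 1/2."""
    num = Fraction(1, 2) ** n * Fraction(1, 2)      # P(y | theta=1) P(theta=1)
    den = Fraction(1) ** n * Fraction(1, 2) + num   # + P(y | theta=0) P(theta=0)
    return num / den

for n in range(0, 21):
    p = carrier_posterior(n)
    assert p == Fraction(1, 2 ** n + 1)  # closed form 1 / (2^n + 1)
    print(f"n = {n:2d}  P = {p}  ~ {float(p):.6f}")
```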
A similar chart can also be drawn with pyecharts:
```python
from pyecharts.charts import Line
from pyecharts import options as opts
from pyecharts.globals import ThemeType
from pyecharts.render import make_snapshot
# make_snapshot needs the snapshot-selenium package and a working browser driver (e.g. chromedriver)
from snapshot_selenium import snapshot

# n = number of healthy sons, from 0 to 10
list_x = list(range(0, 11))
# Posterior P(theta=1 | n healthy sons) for each n
list_y = [(0.5 ** (n + 1)) / ((0.5 ** (n + 1)) + 0.5) for n in list_x]

line = (
    Line(init_opts=opts.InitOpts(theme=ThemeType.WALDEN))
    .add_xaxis(list_x)
    .add_yaxis("", list_y, is_smooth=True)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Bayes", pos_left="center"),
        yaxis_opts=opts.AxisOpts(name="Probability that the woman is a carrier"),
        xaxis_opts=opts.AxisOpts(name="Number of healthy sons"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
)
# render() writes render.html and returns its path; make_snapshot converts it to a PNG
make_snapshot(snapshot, line.render(), "Bayes.png")
```