I have recently been reading Computer Vision: Models, Learning, and Inference; Chapter 4 covers using maximum a posteriori (MAP) estimation to learn the parameters of a normal distribution.
The conjugate prior of a 1-D normal distribution is the normal inverse gamma distribution, and the conjugate prior of an N-D normal distribution is the normal inverse Wishart distribution. The data generated here follows a 1-D normal distribution.
The prior over the parameters of a 1-D normal distribution is:
$$\operatorname{Pr}\left(\mu, \sigma^{2}\right)=\frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}} \frac{\beta^{\alpha}}{\Gamma(\alpha)}\left(\frac{1}{\sigma^{2}}\right)^{\alpha+1} \exp \left[-\frac{2 \beta+\gamma(\delta-\mu)^{2}}{2 \sigma^{2}}\right]$$
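To make the formula concrete, here is a small helper (my own illustrative sketch, not part of the linked code) that evaluates this normal inverse gamma density directly, with the parameter names matching the formula above:

```cpp
#include <cmath>

// Density of the normal inverse gamma prior at (mu, sigma2),
// transcribed term by term from the formula above.
double norm_inv_gam_pdf(double mu, double sigma2, double alpha, double beta,
                        double gamma, double delta)
{
    double sigma = std::sqrt(sigma2);
    return std::sqrt(gamma) / (sigma * std::sqrt(2.0 * M_PI))
         * std::pow(beta, alpha) / std::tgamma(alpha)   // beta^alpha / Gamma(alpha)
         * std::pow(1.0 / sigma2, alpha + 1.0)
         * std::exp(-(2.0 * beta + gamma * (delta - mu) * (delta - mu))
                    / (2.0 * sigma2));
}
```

For example, with α = β = γ = 1 and δ = 0, the density at (μ = 0, σ² = 1) reduces to e⁻¹/√(2π).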
MAP estimation actually amounts to solving

$$\hat{\mu}, \hat{\sigma}^{2}=\underset{\mu, \sigma^{2}}{\operatorname{argmax}}\left[\prod_{i=1}^{I} \operatorname{Pr}\left(x_{i} \mid \mu, \sigma^{2}\right) \operatorname{Pr}\left(\mu, \sigma^{2}\right)\right]=\underset{\mu, \sigma^{2}}{\operatorname{argmax}}\left[\prod_{i=1}^{I} \operatorname{Norm}_{x_{i}}\left[\mu, \sigma^{2}\right] \operatorname{NormInvGam}_{\mu, \sigma^{2}}[\alpha, \beta, \gamma, \delta]\right]$$

The approach is the same as before (maximize the logarithm instead), but this formula is slightly more complicated:
$$\hat{\mu}, \hat{\sigma}^{2}=\underset{\mu, \sigma^{2}}{\operatorname{argmax}}\left[\sum_{i=1}^{I} \log \left[\operatorname{Norm}_{x_{i}}\left[\mu, \sigma^{2}\right]\right]+\log \left[\operatorname{NormInvGam}_{\mu, \sigma^{2}}[\alpha, \beta, \gamma, \delta]\right]\right]$$
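The closed forms below follow from setting the partial derivatives of this log posterior to zero. For the mean step, for example, only the likelihood terms and the exponent of the prior depend on μ:

```latex
\frac{\partial}{\partial \mu}\left[\sum_{i=1}^{I} \log \operatorname{Norm}_{x_{i}}\left[\mu, \sigma^{2}\right]+\log \operatorname{NormInvGam}_{\mu, \sigma^{2}}[\alpha, \beta, \gamma, \delta]\right]
= \sum_{i=1}^{I} \frac{x_{i}-\mu}{\sigma^{2}}+\frac{\gamma(\delta-\mu)}{\sigma^{2}} = 0
\;\Rightarrow\;
\hat{\mu}=\frac{\sum_{i=1}^{I} x_{i}+\gamma \delta}{I+\gamma}
```

The variance estimate comes out the same way, by differentiating with respect to σ² after substituting μ̂.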
The final result is as follows:
μ ^ = ∑ i = 1 I x i + γ δ I + γ \hat{\mu}=\frac{\sum_{i=1}^{I} x_{i}+\gamma \delta}{I+\gamma} μ ^ = I + γ ∑ i = 1 I x i + γ δ
$$\hat{\sigma}^{2}=\frac{\sum_{i=1}^{I}\left(x_{i}-\hat{\mu}\right)^{2}+2 \beta+\gamma(\delta-\hat{\mu})^{2}}{I+3+2 \alpha}$$
So the estimated parameters of the normal distribution depend on the hyperparameters $\alpha, \beta, \gamma, \delta$.
The algorithm is as follows:
Input : Training data {x_i}, i = 1..I, hyperparameters α, β, γ, δ
Output: MAP estimates of the parameters θ = {μ, σ²}
begin
    // Set mean parameter
    μ = (Σ_{i=1..I} x_i + γδ) / (I + γ)
    // Set variance
    σ² = (Σ_{i=1..I} (x_i − μ)² + 2β + γ(δ − μ)²) / (I + 3 + 2α)
end
The code for generating the data is at: the code link here.
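That helper is not reproduced in this post; below is a minimal sketch of what generate_normal_distribution_data might look like, using std::normal_distribution (the linked implementation may differ):

```cpp
#include <random>
#include <vector>

// Draw n samples from a normal distribution with the given mean and
// standard deviation. Hypothetical sketch of the helper used below.
template <typename T>
std::vector<T> generate_normal_distribution_data(T mean, T stddev, std::size_t n)
{
    std::mt19937 gen(std::random_device{}());
    std::normal_distribution<T> dist(mean, stddev);
    std::vector<T> data(n);
    for (auto &x : data)
        x = dist(gen);
    return data;
}
```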
The implementation is as follows:
void MAP_learning_univariate_normal_parameters()
{
    // Hyperparameters of the normal inverse gamma prior
    double alpha = 1;
    double beta = 1;
    double gamma = 1;
    double delta = -1;
    // Draw 100000 samples from N(0, 1)
    vector<double> data = generate_normal_distribution_data<double>(0, 1, 100000);
    double sum = 0.0, sum_sq_dev = 0.0;
    double mu_map, var_map;
    for (size_t i = 0; i < data.size(); i++)
    {
        sum += data[i];
    }
    // MAP estimate of the mean: (sum of x_i + gamma*delta) / (I + gamma)
    mu_map = (sum + gamma * delta) / (data.size() + gamma);
    for (size_t i = 0; i < data.size(); i++)
    {
        sum_sq_dev += (data[i] - mu_map) * (data[i] - mu_map);
    }
    // MAP estimate of the variance
    var_map = (sum_sq_dev + 2 * beta + gamma * (delta - mu_map) * (delta - mu_map))
              / (data.size() + 3 + 2 * alpha);
    cout << "mu: " << mu_map << endl;
    cout << "var: " << var_map << endl;
}
The accuracy depends on the hyperparameters, which I find a bit like black magic!!!
Generally the prior alone is not very accurate; once the data is incorporated, the computed posterior shows a clear improvement in precision. And when the amount of data is large, the likelihood dominates and the influence of the prior fades.
There is an example figure in the book that explains this very clearly.