I have recently been reading Computer Vision: Models, Learning, and Inference. The book covers dozens of algorithms, and its original code is implemented in MATLAB; I plan to re-implement them in C++. The author himself recommends implementing the code yourself, since doing so greatly helps in understanding the material.
This post shows how to generate normally distributed data with the C++ standard library and then learn the distribution's parameters by maximum likelihood. The C++ source code is provided.
A one-dimensional normal distribution has the following density:
$$\operatorname{Pr}(x)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left[-0.5(x-\mu)^{2} / \sigma^{2}\right]$$
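As a quick sanity check of this formula, here is a minimal C++ sketch that evaluates the density directly (the helper name normal_pdf is my own, not from the book):

#include <cmath>

// Evaluate the univariate normal density Pr(x) with mean mu and variance var.
double normal_pdf(double x, double mu, double var)
{
    const double pi = 3.14159265358979323846;
    double diff = x - mu;
    return std::exp(-0.5 * diff * diff / var) / std::sqrt(2.0 * pi * var);
}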
Assuming the I data points are drawn independently, the joint distribution is:
$$\begin{aligned} \operatorname{Pr}\left(x_{1 \ldots I} \mid \mu, \sigma^{2}\right) &=\prod_{i=1}^{I} \operatorname{Pr}\left(x_{i} \mid \mu, \sigma^{2}\right) \\ &=\prod_{i=1}^{I} \operatorname{Norm}_{x_{i}}\left[\mu, \sigma^{2}\right] \\ &=\frac{1}{\left(2 \pi \sigma^{2}\right)^{I / 2}} \exp \left[-0.5 \sum_{i=1}^{I} \frac{\left(x_{i}-\mu\right)^{2}}{\sigma^{2}}\right] \end{aligned}$$
Maximum likelihood estimation chooses the set of parameters that maximizes the joint probability of the observed data; that set of parameters is the one that best explains the observations.
The standard approach is to take the logarithm and set its derivatives to zero to find the maximum:
$$\begin{aligned} \hat{\mu}, \hat{\sigma}^{2} &=\underset{\mu, \sigma^{2}}{\operatorname{argmax}}\left[\sum_{i=1}^{I} \log \left[\operatorname{Norm}_{x_{i}}\left[\mu, \sigma^{2}\right]\right]\right] \\ &=\underset{\mu, \sigma^{2}}{\operatorname{argmax}}\left[-0.5 I \log [2 \pi]-0.5 I \log \sigma^{2}-0.5 \sum_{i=1}^{I} \frac{\left(x_{i}-\mu\right)^{2}}{\sigma^{2}}\right] \end{aligned}$$
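To make the objective concrete, here is a small sketch of the log-likelihood above as a C++ function (normal_log_likelihood is my own helper name; it is not part of the code presented later):

#include <cmath>
#include <vector>

// Log-likelihood of the data under a normal model with mean mu and variance var,
// following the expression inside the argmax above.
double normal_log_likelihood(const std::vector<double>& data, double mu, double var)
{
    const double pi = 3.14159265358979323846;
    double sum_sq = 0.0;
    for (double x : data) {
        sum_sq += (x - mu) * (x - mu);
    }
    double I = static_cast<double>(data.size());
    return -0.5 * I * std::log(2.0 * pi) - 0.5 * I * std::log(var) - 0.5 * sum_sq / var;
}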
Differentiating with respect to $\mu$ and setting the result to zero:
$$\begin{aligned} \frac{\partial L}{\partial \mu} &=\sum_{i=1}^{I} \frac{x_{i}-\mu}{\sigma^{2}} \\ &=\frac{\sum_{i=1}^{I} x_{i}}{\sigma^{2}}-\frac{I \mu}{\sigma^{2}}=0 \end{aligned}$$
$$\hat{\mu}=\frac{\sum_{i=1}^{I} x_{i}}{I}$$
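Similarly, differentiating the log-likelihood with respect to $\sigma^{2}$ and setting the result to zero:

$$\frac{\partial L}{\partial \sigma^{2}}=-\frac{I}{2 \sigma^{2}}+\frac{1}{2 \sigma^{4}} \sum_{i=1}^{I}\left(x_{i}-\mu\right)^{2}=0$$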
Substituting $\hat{\mu}$ and solving gives:

$$\hat{\sigma}^{2}=\frac{\sum_{i=1}^{I}\left(x_{i}-\hat{\mu}\right)^{2}}{I}$$
These two formulas are, in effect, the entire algorithm. The overall procedure is as follows:
Input: Training data $\{x_{i}\}_{i=1}^{I}$
Output: Maximum likelihood estimates of the parameters $\theta=\{\mu, \sigma^{2}\}$
begin
  // Set mean parameter
  $\hat{\mu}=\sum_{i=1}^{I} x_{i} / I$
  // Set variance
  $\hat{\sigma}^{2}=\sum_{i=1}^{I}\left(x_{i}-\hat{\mu}\right)^{2} / I$
end
The data-generation code is as follows:
#include <random>
#include <iostream>
#include <vector>

using std::vector;

// Generate `number` samples from a normal distribution with mean `mu`
// and standard deviation `sigma`.
template <typename T>
vector<T> generate_normal_distribution_data(T mu, T sigma, int number)
{
    vector<T> data;
    data.reserve(number);
    std::random_device rd{};
    std::mt19937 gen{rd()};
    // Note: std::normal_distribution takes the standard deviation as its
    // second parameter, not the variance.
    std::normal_distribution<T> d{mu, sigma};
    for (int n = 0; n < number; ++n) {
        data.push_back(d(gen));
    }
    return data;
}
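A quick usage sketch (the sample count of 10 is arbitrary):

vector<double> samples = generate_normal_distribution_data<double>(0.0, 1.0, 10);
for (double s : samples) {
    std::cout << s << " ";
}
std::cout << std::endl;

Seeding the std::mt19937 engine from std::random_device produces a different data set on every run; a fixed seed could be substituted if reproducible experiments are preferred.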
The learning code is as follows:
// Estimate the mean and variance of the generated data by maximum likelihood.
void learning_normal_distribution_parameters()
{
    vector<double> data = generate_normal_distribution_data<double>(0, 1, 100000);

    double mu = 0.0;
    double var = 0.0;
    // Maximum likelihood estimate of the mean: the sample average.
    for (size_t i = 0; i < data.size(); i++)
    {
        mu += data[i];
    }
    mu = mu / data.size();
    // Maximum likelihood estimate of the variance: mean squared deviation
    // from the estimated mean (note the divisor is I, not I - 1).
    for (size_t i = 0; i < data.size(); i++)
    {
        var += (data[i] - mu) * (data[i] - mu);
    }
    var = var / data.size();

    std::cout << "mu: " << mu << std::endl;
    std::cout << "var: " << var << std::endl;
}
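To run the experiment end to end, a minimal driver might look like this (just a sketch):

int main()
{
    learning_normal_distribution_parameters();
    return 0;
}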
For the experiment we generate 100,000 data points with mean 0 and variance 1.
The learned parameters are as follows: