I have been reading Computer Vision: Models, Learning, and Inference. Chapter 4 learns the probability parameters of the categorical distribution by maximum likelihood.
$$\operatorname{Pr}\left(x = k \mid \lambda_{1 \ldots K}\right) = \lambda_{k}$$
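As a concrete illustration (my own example, not from the book): for a fair six-sided die, $K = 6$ and every parameter is $1/6$:

$$\operatorname{Pr}\left(x = k \mid \lambda_{1 \ldots 6}\right) = \frac{1}{6}, \quad k = 1, \ldots, 6, \qquad \sum_{k=1}^{6} \lambda_{k} = 1$$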
Here I use the C++ standard library's Poisson distribution to generate data, and then estimate the distribution's parameters by maximum likelihood.
The Poisson distribution's mean (rate) parameter is set to 4.
The code is as follows:
vector<int> generate_categorical_distribution_data(int number)
{
    vector<int> data;
    std::random_device rd{};
    std::mt19937 gen{rd()};
    // Poisson distribution with mean 4
    std::poisson_distribution<> d(4);
    for (int i = 0; i < number; i++)
    {
        data.push_back(d(gen));
    }
    return data;
}
$$\begin{aligned} \hat{\lambda}_{1 \ldots K} &= \underset{\lambda_{1 \ldots K}}{\operatorname{argmax}}\left[\prod_{i=1}^{I} \operatorname{Pr}\left(x_{i} \mid \lambda_{1 \ldots K}\right)\right] & & \text{ s.t. } \sum_{k} \lambda_{k} = 1 \\ &= \underset{\lambda_{1 \ldots K}}{\operatorname{argmax}}\left[\prod_{i=1}^{I} \operatorname{Cat}_{x_{i}}\left[\lambda_{1 \ldots K}\right]\right] & & \text{ s.t. } \sum_{k} \lambda_{k} = 1 \\ &= \underset{\lambda_{1 \ldots K}}{\operatorname{argmax}}\left[\prod_{k=1}^{K} \lambda_{k}^{N_{k}}\right] & & \text{ s.t. } \sum_{k} \lambda_{k} = 1 \end{aligned}$$
The Poisson distribution produces non-negative integers starting from 0; every distinct value that appears in the data set becomes a category, and the estimator assigns each one a probability. In the formulas, $N_k$ denotes the number of observations equal to $k$, and $I$ is the total number of observations.
The derivation uses the usual maximum-likelihood technique: take the logarithm, add a Lagrange multiplier $\nu$ for the sum-to-one constraint, and differentiate:
$$L = \sum_{k=1}^{K} N_{k} \log \lambda_{k} + \nu \left( \sum_{k=1}^{K} \lambda_{k} - 1 \right)$$
The result is:
$$\hat{\lambda}_{k} = \frac{N_{k}}{\sum_{m=1}^{K} N_{m}}$$
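Filling in the intermediate step (standard Lagrange-multiplier algebra, consistent with the Lagrangian above): setting the derivative of $L$ with respect to each $\lambda_k$ to zero gives

$$\frac{\partial L}{\partial \lambda_{k}} = \frac{N_{k}}{\lambda_{k}} + \nu = 0 \quad \Longrightarrow \quad \lambda_{k} = -\frac{N_{k}}{\nu}$$

and substituting this into the constraint $\sum_{k=1}^{K} \lambda_{k} = 1$ yields $\nu = -\sum_{m=1}^{K} N_{m}$, which gives the closed-form estimate $\hat{\lambda}_{k} = N_{k} / \sum_{m=1}^{K} N_{m}$.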
The algorithm is as follows:
Input: multi-valued training data $\{x_i\}_{i=1}^{I}$
Output: ML estimates of the categorical parameters $\theta = \{\lambda_1 \ldots \lambda_K\}$
begin
    for $k = 1$ to $K$ do
        $\lambda_k = \sum_{i=1}^{I} \delta\left[x_i - k\right] / I$
    end
end
The learning code for this part is as follows:
void max_likelihood_categorical_distribution_parameters()
{
    vector<int> data = generate_categorical_distribution_data(1000);
    // Count N_k for every value k that occurs in the data
    std::map<int, double> hist{};
    for (size_t i = 0; i < data.size(); i++)
    {
        ++hist[data[i]];
    }
    // Normalize the counts: lambda_k = N_k / I.
    // Iterate over the map entries directly, since the observed values
    // are not guaranteed to be the contiguous keys 0, 1, ..., size()-1.
    double total_p = 0;
    for (auto& [k, p] : hist)
    {
        p /= data.size();
        total_p += p;
        std::cout << k << ": " << p << std::endl;
    }
    // The estimated probabilities must sum to 1
    std::cout << "total_p: " << total_p << std::endl;
}
The map hist now holds the maximum-likelihood estimate of the data's categorical distribution.