I have recently been reading Computer Vision: Models, Learning, and Inference and am reimplementing its code in C++. This post implements Bayesian learning of the parameters of a categorical distribution in C++.
$$
\begin{aligned}
\operatorname{Pr}\left(\lambda_{1 \ldots K} \mid x_{1 \ldots I}\right)
&= \frac{\prod_{i=1}^{I} \operatorname{Pr}\left(x_{i} \mid \lambda_{1 \ldots K}\right) \operatorname{Pr}\left(\lambda_{1 \ldots K}\right)}{\operatorname{Pr}\left(x_{1 \ldots I}\right)} \\
&= \frac{\prod_{i=1}^{I} \operatorname{Cat}_{x_{i}}\left[\lambda_{1 \ldots K}\right] \operatorname{Dir}_{\lambda_{1 \ldots K}}\left[\alpha_{1 \ldots K}\right]}{\operatorname{Pr}\left(x_{1 \ldots I}\right)} \\
&= \frac{\kappa\left(\alpha_{1 \ldots K}, x_{1 \ldots I}\right) \operatorname{Dir}_{\lambda_{1 \ldots K}}\left[\tilde{\alpha}_{1 \ldots K}\right]}{\operatorname{Pr}\left(x_{1 \ldots I}\right)} \\
&= \operatorname{Dir}_{\lambda_{1 \ldots K}}\left[\tilde{\alpha}_{1 \ldots K}\right]
\end{aligned}
$$

where $\tilde{\alpha}_k = \alpha_k + N_k$ and $N_k$ is the number of training observations with $x_i = k$.
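The conjugate update above just adds the observed count for each category to its prior pseudo-count. A minimal sketch of this step (the function name `dirichlet_posterior` is my own, not from the post):

```cpp
#include <cstddef>
#include <vector>

// Conjugate Dirichlet update: alpha_tilde_k = alpha_k + N_k,
// where counts[k] holds N_k, the number of observations in category k.
// (Function name is illustrative, not from the original post.)
std::vector<double> dirichlet_posterior(const std::vector<double>& alpha,
                                        const std::vector<int>& counts)
{
    std::vector<double> alpha_tilde(alpha.size());
    for (std::size_t k = 0; k < alpha.size(); ++k)
        alpha_tilde[k] = alpha[k] + counts[k];
    return alpha_tilde;
}
```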
Using the Bayesian approach for prediction:
$$
\begin{aligned}
\operatorname{Pr}\left(x^{*} \mid x_{1 \ldots I}\right)
&= \int \operatorname{Pr}\left(x^{*} \mid \lambda_{1 \ldots K}\right) \operatorname{Pr}\left(\lambda_{1 \ldots K} \mid x_{1 \ldots I}\right) d \lambda_{1 \ldots K} \\
&= \int \operatorname{Cat}_{x^{*}}\left[\lambda_{1 \ldots K}\right] \operatorname{Dir}_{\lambda_{1 \ldots K}}\left[\tilde{\alpha}_{1 \ldots K}\right] d \lambda_{1 \ldots K} \\
&= \int \kappa\left(x^{*}, \tilde{\alpha}_{1 \ldots K}\right) \operatorname{Dir}_{\lambda_{1 \ldots K}}\left[\breve{\alpha}_{1 \ldots K}\right] d \lambda_{1 \ldots K} \\
&= \kappa\left(x^{*}, \tilde{\alpha}_{1 \ldots K}\right)
\end{aligned}
$$
The result can be written as:
$$
\operatorname{Pr}\left(x^{*}=k \mid x_{1 \ldots I}\right)=\kappa\left(x^{*}, \tilde{\alpha}_{1 \ldots K}\right)=\frac{N_{k}+\alpha_{k}}{\sum_{j=1}^{K}\left(N_{j}+\alpha_{j}\right)}=\frac{\tilde{\alpha}_{k}}{\sum_{j=1}^{K} \tilde{\alpha}_{j}}
$$
The algorithm proceeds as follows:
Input : Categorical training data {x_i}, i = 1..I; hyperparameters {α_k}, k = 1..K
Output: Posterior parameters {α̃_k}, k = 1..K; predictive distribution Pr(x* | x_{1..I})
begin
    // Compute categorical posterior over λ
    for k = 1 to K do
        α̃_k = α_k + N_k, where N_k = Σ_i δ[x_i − k]
    end
    // Evaluate new datapoint under predictive distribution
    for k = 1 to K do
        Pr(x* = k | x_{1..I}) = α̃_k / (Σ_{m=1}^{K} α̃_m)
    end
end
The code is as follows:
void Bayesian_categorical_distribution_parameters()
{
    // Draw categorical training data x_{1..I}
    vector<int> data = generate_categorical_distribution_data(100000);

    // Histogram N_k: number of observations in each category k
    std::map<int, double> hist{};
    for (size_t i = 0; i < data.size(); i++)
    {
        ++hist[data[i]];
    }

    // Uniform Dirichlet prior: alpha_k = 1 for every category
    vector<double> alpha_v(hist.size(), 1.0);

    // Conjugate posterior parameters: alpha~_k = alpha_k + N_k
    // (assumes the categories are labeled 0..K-1)
    vector<double> alpha_v_post;
    for (size_t k = 0; k < hist.size(); k++)
    {
        alpha_v_post.push_back(alpha_v[k] + hist.at(k));
    }

    // Normalizer: sum_m alpha~_m
    double down = 0;
    for (size_t k = 0; k < hist.size(); k++)
    {
        down += alpha_v_post[k];
    }

    // Predictive distribution Pr(x* = k | x_{1..I}) = alpha~_k / sum_m alpha~_m
    double total_p = 0;
    for (size_t k = 0; k < hist.size(); k++)
    {
        hist.at(k) = alpha_v_post[k] / down;
        total_p += hist.at(k);
        std::cout << hist.at(k) << std::endl;
    }
    cout << "total_p: " << total_p << endl; // sanity check: should be 1
}
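The helper `generate_categorical_distribution_data` is not shown in this post. A hypothetical sketch, assuming it draws 0-based category labels from a fixed true distribution λ (the λ values and the fixed seed are my assumptions):

```cpp
#include <random>
#include <vector>

// Hypothetical sketch of the data generator used above: draws n samples from
// a fixed categorical distribution via std::discrete_distribution.
// The true lambda values and the seed are assumptions, not from the post.
std::vector<int> generate_categorical_distribution_data(int n)
{
    const std::vector<double> lambda{0.1, 0.2, 0.3, 0.15, 0.15, 0.1};
    std::mt19937 gen(42); // fixed seed so runs are reproducible
    std::discrete_distribution<int> cat(lambda.begin(), lambda.end());

    std::vector<int> data(n);
    for (int i = 0; i < n; ++i)
        data[i] = cat(gen);
    return data;
}
```

With enough samples, the posterior predictive computed above should land close to these true λ values.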
In the book, the author's code produces the following two figures as an illustration of the Bayesian approach.