Probabilistic Graphical Modeling: Study Notes

0. Learning materials

1. Introduction

A great many of the problems we face come down to modeling the real world with some kind of function (the most direct example is estimation and fitting problems, and further, our machine learning and deep learning algorithms mostly fit the real problem with a function).

  • But most of the time, the measurement involves a significant amount of uncertainty (in other words, "error"). As a result, our measurements actually follow a probability distribution. This introduces probability theory.
  • And the measurements depend on each other. Sometimes we cannot find the exact expression of these relationships, but we know they exist, and we can learn some properties of the relations from prior knowledge. This introduces graph modeling.

As a result, Probabilistic Graphical Modeling (PGM) was conceived to solve such kinds of questions.
There are three main elements in PGM:

  • Representation: How do we specify a model? Normally, we have the Bayesian network for a directed acyclic graph, and the Markov Random Field for an undirected graph representation. (And of course, there are other models.)
  • Inference: How do we ask the model questions? For example, marginal inference gives the probability of a given variable when we sum over every other variable, and maximum a posteriori (MAP) inference gives the most likely assignment of the variables. (A brute-force toy example of both queries follows this list.)
  • Learning: How do we fit a model to real-world data? Inference and learning are specially linked: inference is a key subroutine of learning.
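
As a toy illustration of these two kinds of queries, the sketch below answers a marginal query and a MAP query by brute-force enumeration over a hypothetical two-variable joint table (the variable names and numbers are invented for illustration):

```python
# Hypothetical joint distribution p(Rain, WetGrass) given as a table
# of invented probabilities that sum to one.
joint = {
    (0, 0): 0.40, (0, 1): 0.10,   # Rain = 0
    (1, 0): 0.05, (1, 1): 0.45,   # Rain = 1
}

# Marginal inference: p(Rain) = sum over WetGrass of p(Rain, WetGrass).
p_rain = {r: sum(joint[(r, w)] for w in (0, 1)) for r in (0, 1)}
print("p(Rain) =", p_rain)                    # {0: 0.5, 1: 0.5}

# MAP inference: the single most likely joint assignment.
print("argmax =", max(joint, key=joint.get))  # (1, 1)
```

Brute force works here only because the table is tiny; the point of the methods discussed later is to answer the same queries without enumerating every assignment.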

2. Representation

2.1 Bayesian network

A Bayesian network is a directed acyclic graph, and it can deal with variables that have causal, i.e. directed, relationships. By the chain rule of probability, any joint distribution can be written as

p(x_{1}, x_{2}, x_{3}, \dots, x_{n}) = p(x_{1})\, p(x_{2} \mid x_{1}) \cdots p(x_{n} \mid x_{n-1}, \dots, x_{2}, x_{1})

Based on these relationships, a directed graph can be built, and from it the probability expression can be formed: each variable only depends on some of its ancestors, $A_{i}$,

p(x_{i} \mid x_{i-1}, \dots, x_{2}, x_{1}) = p(x_{i} \mid x_{A_{i}})
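
As a minimal sketch of this factorization, the snippet below evaluates the joint probability of a hypothetical three-node chain Cloudy -> Rain -> WetGrass, where each node's CPD depends only on its single parent (all numbers are invented for illustration):

```python
# CPDs of a tiny hand-coded Bayesian network over binary variables,
# with structure Cloudy -> Rain -> WetGrass (illustrative numbers only).
p_cloudy = {1: 0.5, 0: 0.5}              # p(Cloudy)
p_rain_given_cloudy = {                  # p(Rain | Cloudy), keyed (rain, cloudy)
    (1, 1): 0.8, (0, 1): 0.2,
    (1, 0): 0.1, (0, 0): 0.9,
}
p_wet_given_rain = {                     # p(WetGrass | Rain), keyed (wet, rain)
    (1, 1): 0.9, (0, 1): 0.1,
    (1, 0): 0.2, (0, 0): 0.8,
}

def joint(c, r, w):
    """p(c, r, w) = p(c) * p(r | c) * p(w | r): each factor conditions
    only on the node's parents, not on all earlier variables."""
    return p_cloudy[c] * p_rain_given_cloudy[(r, c)] * p_wet_given_rain[(w, r)]

print(joint(1, 1, 1))   # 0.5 * 0.8 * 0.9 = 0.36
```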

Formal definition:
A Bayesian network is a directed acyclic graph $G=(V,E)$ together with:

  • A random variable $x_{i}$ for each node $i \in V$.
  • One conditional probability distribution (CPD) $p(x_{i} \mid x_{A_{i}})$ per node, specifying the probability of $x_{i}$ conditioned on the values of its parents (in other words, on its incoming edges).

End of definition

  • When $G$ contains cycles, its associated probability may not sum to one.
  • Certain variables can be independent of each other. (This helps to build more efficient inference.)
  • Common parent. $A \leftarrow B \to C$: if $B$ is observed, then $A \perp C \mid B$ (i.e. $p(A, C \mid B) = p(A \mid B)\,p(C \mid B)$); if $B$ is unobserved, then in general $p(A, C) \ne p(A)\,p(C)$.
  • Cascade. $A \to B \to C$: if $B$ is observed, then $A \perp C \mid B$ (i.e. $p(A, C \mid B) = p(A \mid B)\,p(C \mid B)$); if $B$ is unobserved, then in general $p(A, C) \ne p(A)\,p(C)$.
  • V-structure. $A \to B \leftarrow C$: if $B$ is unobserved, then $A \perp C$ (i.e. $p(A, C) = p(A)\,p(C)$); if $B$ is observed, then in general $p(A, C \mid B) \ne p(A \mid B)\,p(C \mid B)$, the "explaining away" effect. (A numeric check of this case follows the list.)
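
The V-structure case is the least intuitive one, so here is a brute-force numeric check on a hypothetical binary network $A \to B \leftarrow C$; the CPD values below are made up, and the only point is that $A$ and $C$ are independent marginally but become dependent once $B$ is observed:

```python
from itertools import product

# Hypothetical V-structure A -> B <- C over binary variables.
p_a = {1: 0.3, 0: 0.7}
p_c = {1: 0.6, 0: 0.4}
p_b1_given_ac = {(1, 1): 0.95, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.05}  # p(B=1 | A, C)

def joint(a, b, c):
    pb1 = p_b1_given_ac[(a, c)]
    return p_a[a] * p_c[c] * (pb1 if b == 1 else 1 - pb1)

# B unobserved: p(A=1, C=1) equals p(A=1) * p(C=1).
p_ac = sum(joint(1, b, 1) for b in (0, 1))
print(p_ac, p_a[1] * p_c[1])                            # both 0.18

# B observed (B = 1): the product form no longer holds.
p_b1 = sum(joint(a, 1, c) for a, c in product((0, 1), repeat=2))
p_a1_b1 = sum(joint(1, 1, c) for c in (0, 1)) / p_b1    # p(A=1 | B=1)
p_c1_b1 = sum(joint(a, 1, 1) for a in (0, 1)) / p_b1    # p(C=1 | B=1)
p_ac_b1 = joint(1, 1, 1) / p_b1                         # p(A=1, C=1 | B=1)
print(p_ac_b1, p_a1_b1 * p_c1_b1)                       # ~0.297 vs ~0.376
```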

2.2 Markov Random Fields

There are cases that a Bayesian network cannot describe, but that a Markov Random Field (an undirected graph) can. This representation is used more in the computer vision area.

For example, friendship should not be expressed by a directed edge, and it is also hard to express with a conditional probability. This makes it natural to introduce the undirected edges of Markov Random Fields.

  • An edge becomes more like an interaction that pushes the variables, like a force or a potential energy.
  • It requires less prior knowledge about the variables, as we do not need to know their exact dependence relationship. All we need to know is that an interaction exists.
  • As the factors are more like potential energies rather than probabilities, we should not forget to add a normalization term to our expression.

Formal definition:
A Markov Random Field (MRF) is a probability distribution $p$ over variables $x_{1}, x_{2}, \dots, x_{n}$ defined by an undirected graph $G$ in which nodes correspond to variables $x_{i}$. The probability $p$ has the form:
p(x_{1}, x_{2}, \dots, x_{n}) = \frac{1}{Z} \prod_{c \in C} \phi_{c}(x_{c})
where $C$ denotes the set of cliques (fully connected subgraphs) of $G$, and each factor $\phi_{c}$ is a nonnegative function over the variables in the clique. The partition function:
Z = \sum_{x_{1}, x_{2}, \dots, x_{n}} \prod_{c \in C} \phi_{c}(x_{c})
is a normalizing constant that ensures that the distribution sums to one.
End of definition
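
As a minimal sketch of this definition, the snippet below computes the partition function $Z$ by brute force for a tiny pairwise MRF over three binary variables on a chain $x_{1} - x_{2} - x_{3}$; the potential values are arbitrary nonnegative numbers chosen for illustration:

```python
from itertools import product

# Clique potentials for the undirected chain x1 - x2 - x3
# (here the cliques are just the two edges; values are arbitrary).
phi12 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
phi23 = {(0, 0): 1.5, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def score(x1, x2, x3):
    """Unnormalized product of clique potentials."""
    return phi12[(x1, x2)] * phi23[(x2, x3)]

# Partition function: sum of the unnormalized score over all assignments.
Z = sum(score(*x) for x in product((0, 1), repeat=3))

def p(x1, x2, x3):
    return score(x1, x2, x3) / Z

print(Z, p(1, 1, 1))                                   # 13.75  ~0.436
print(sum(p(*x) for x in product((0, 1), repeat=3)))   # 1.0 (sanity check)
```

This enumeration costs time exponential in the number of variables, which is exactly why $Z$ is the expensive part mentioned below.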

  • MRFs can express more non-directional associations.
  • But they take more computation than Bayesian nets, especially for the normalization term $Z$.
  • Bayesian nets are computationally easier.

A Bayesian network can always be converted into an undirected network with normalization constant one. The converse is also possible, but may be computationally intractable, and may produce a very large (e.g. fully connected) directed graph.
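
The usual name for the directed-to-undirected conversion is moralization: drop the edge directions and connect ("marry") every pair of parents that share a child, so that each CPD's scope becomes a clique. A minimal sketch, assuming the Bayesian network is given as a dictionary mapping each node to its list of parents:

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a Bayesian
    network given as {node: [its parents]}: keep every parent-child edge
    (direction dropped) and connect every pair of co-parents."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))   # parent-child edge, undirected
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))       # "marry" parents of the same child
    return edges

# V-structure A -> B <- C: moralization adds the extra undirected edge A - C.
print(moralize({"B": ["A", "C"], "A": [], "C": []}))
```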

2.3 Conditional Random Fields

It is a special case of Markov Random Fields, applied to model a conditional probability distribution.
A conditional random field results in an instantiation of a new Markov Random Field for each input $x$.
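
Concretely, using the same notation as the MRF definition above, a CRF defines

p(y \mid x) = \frac{1}{Z(x)} \prod_{c \in C} \phi_{c}(y_{c}, x)

with

Z(x) = \sum_{y} \prod_{c \in C} \phi_{c}(y_{c}, x)

so the partition function now depends on the input $x$, which is why every value of $x$ instantiates its own Markov Random Field over the output variables $y$.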

2.4 Factor Graph

A factor graph is a bipartite graph where one group is the variables in the distribution being modeled, and the other group is the factors defined on these variables. Edges go between factors and variables that those factors depend on.
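
As a minimal sketch, a factor graph can be stored as a mapping from each factor to the variables in its scope; the two node groups and the bipartite edges then follow directly (the factor and variable names here are purely illustrative):

```python
# Factor graph for p(x1, x2, x3) proportional to f12(x1, x2) * f23(x2, x3):
# one node group holds variables, the other holds factors, and each edge
# connects a factor to a variable it depends on.
factors = {
    "f12": ["x1", "x2"],
    "f23": ["x2", "x3"],
}

variables = sorted({v for scope in factors.values() for v in scope})
edges = [(f, v) for f, scope in factors.items() for v in scope]

print(variables)  # ['x1', 'x2', 'x3']
print(edges)      # [('f12', 'x1'), ('f12', 'x2'), ('f23', 'x2'), ('f23', 'x3')]
```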
