Probabilistic Graphical Modeling: Study Notes

0. Learning materials

1. Introduction

A great many of the problems we face come down to modeling the real world with some kind of function (the most direct example is estimation and fitting problems, and further, our machine learning and deep learning algorithms mostly fit the real problem with a function).

  • But most of the time, the measurement involves a significant amount of uncertainty (in other words, "error"). As a result, our measurements actually follow a probability distribution. This introduces probability theory.
  • And the measurements depend on each other. Sometimes we cannot find the exact expression of these relationships, but we know they exist, and we can learn some properties of the relations from prior knowledge. This introduces graph modeling.

As a result, Probabilistic Graphical Modeling (PGM) was conceived to solve such kinds of questions.
There are three main elements in PGM:

  • Representation: How do we specify a model? Normally, we have the Bayesian network for a directed acyclic graph, and the Markov Random Field for an undirected graph representation. (And of course, there are other models.)
  • Inference: How do we ask the model questions? For example, marginal inference gives the probability of a given variable when we sum over every other variable, and maximum a posteriori (MAP) inference gives the most likely assignment of the variables. (A brute-force toy example of both queries follows this list.)
  • Learning: How do we fit a model to real-world data? Inference and learning are specially linked: inference is a key subroutine of learning.
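
As a toy illustration of these two kinds of queries, the sketch below answers a marginal query and a MAP query by brute-force enumeration over a hypothetical two-variable joint table (the variable names and numbers are invented for illustration):

```python
# Hypothetical joint distribution p(Rain, WetGrass) given as a table
# of invented probabilities that sum to one.
joint = {
    (0, 0): 0.40, (0, 1): 0.10,   # Rain = 0
    (1, 0): 0.05, (1, 1): 0.45,   # Rain = 1
}

# Marginal inference: p(Rain) = sum over WetGrass of p(Rain, WetGrass).
p_rain = {r: sum(joint[(r, w)] for w in (0, 1)) for r in (0, 1)}
print("p(Rain) =", p_rain)                    # {0: 0.5, 1: 0.5}

# MAP inference: the single most likely joint assignment.
print("argmax =", max(joint, key=joint.get))  # (1, 1)
```

Brute force works here only because the table is tiny; the point of the methods discussed later is to answer the same queries without enumerating every assignment.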

2. Representation

2.1 Bayesian network

A Bayesian network is a directed acyclic graph, and it can deal with variables that have causal, i.e. directed, relationships. By the chain rule of probability, any joint distribution can be written as

p(x_{1}, x_{2}, x_{3}, \dots, x_{n}) = p(x_{1})\, p(x_{2} \mid x_{1}) \cdots p(x_{n} \mid x_{n-1}, \dots, x_{2}, x_{1})

Based on these relationships, a directed graph can be built, and from it the probability expression can be formed: each variable only depends on some of its ancestors, $A_{i}$,

p(x_{i} \mid x_{i-1}, \dots, x_{2}, x_{1}) = p(x_{i} \mid x_{A_{i}})
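
As a minimal sketch of this factorization, the snippet below evaluates the joint probability of a hypothetical three-node chain Cloudy -> Rain -> WetGrass, where each node's CPD depends only on its single parent (all numbers are invented for illustration):

```python
# CPDs of a tiny hand-coded Bayesian network over binary variables,
# with structure Cloudy -> Rain -> WetGrass (illustrative numbers only).
p_cloudy = {1: 0.5, 0: 0.5}              # p(Cloudy)
p_rain_given_cloudy = {                  # p(Rain | Cloudy), keyed (rain, cloudy)
    (1, 1): 0.8, (0, 1): 0.2,
    (1, 0): 0.1, (0, 0): 0.9,
}
p_wet_given_rain = {                     # p(WetGrass | Rain), keyed (wet, rain)
    (1, 1): 0.9, (0, 1): 0.1,
    (1, 0): 0.2, (0, 0): 0.8,
}

def joint(c, r, w):
    """p(c, r, w) = p(c) * p(r | c) * p(w | r): each factor conditions
    only on the node's parents, not on all earlier variables."""
    return p_cloudy[c] * p_rain_given_cloudy[(r, c)] * p_wet_given_rain[(w, r)]

print(joint(1, 1, 1))   # 0.5 * 0.8 * 0.9 = 0.36
```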

Formal definition:
A Bayesian network is a directed acyclic graph $G=(V,E)$ together with:

  • A random variable $x_{i}$ for each node $i \in V$.
  • One conditional probability distribution (CPD) $p(x_{i} \mid x_{A_{i}})$ per node, specifying the probability of $x_{i}$ conditioned on the values of its parents (in other words, on its incoming edges).

End of definition

  • When $G$ contains cycles, its associated probability may not sum to one.
  • Certain variables can be independent of each other. (This helps to build more efficient inference.)
  • Common parent. $A \leftarrow B \to C$: if $B$ is observed, then $A \perp C \mid B$ (i.e. $p(A, C \mid B) = p(A \mid B)\,p(C \mid B)$); if $B$ is unobserved, then in general $p(A, C) \ne p(A)\,p(C)$.
  • Cascade. $A \to B \to C$: if $B$ is observed, then $A \perp C \mid B$ (i.e. $p(A, C \mid B) = p(A \mid B)\,p(C \mid B)$); if $B$ is unobserved, then in general $p(A, C) \ne p(A)\,p(C)$.
  • V-structure. $A \to B \leftarrow C$: if $B$ is unobserved, then $A \perp C$ (i.e. $p(A, C) = p(A)\,p(C)$); if $B$ is observed, then in general $p(A, C \mid B) \ne p(A \mid B)\,p(C \mid B)$, the "explaining away" effect. (A numeric check of this case follows the list.)
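
The V-structure case is the least intuitive one, so here is a brute-force numeric check on a hypothetical binary network $A \to B \leftarrow C$; the CPD values below are made up, and the only point is that $A$ and $C$ are independent marginally but become dependent once $B$ is observed:

```python
from itertools import product

# Hypothetical V-structure A -> B <- C over binary variables.
p_a = {1: 0.3, 0: 0.7}
p_c = {1: 0.6, 0: 0.4}
p_b1_given_ac = {(1, 1): 0.95, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.05}  # p(B=1 | A, C)

def joint(a, b, c):
    pb1 = p_b1_given_ac[(a, c)]
    return p_a[a] * p_c[c] * (pb1 if b == 1 else 1 - pb1)

# B unobserved: p(A=1, C=1) equals p(A=1) * p(C=1).
p_ac = sum(joint(1, b, 1) for b in (0, 1))
print(p_ac, p_a[1] * p_c[1])                            # both 0.18

# B observed (B = 1): the product form no longer holds.
p_b1 = sum(joint(a, 1, c) for a, c in product((0, 1), repeat=2))
p_a1_b1 = sum(joint(1, 1, c) for c in (0, 1)) / p_b1    # p(A=1 | B=1)
p_c1_b1 = sum(joint(a, 1, 1) for a in (0, 1)) / p_b1    # p(C=1 | B=1)
p_ac_b1 = joint(1, 1, 1) / p_b1                         # p(A=1, C=1 | B=1)
print(p_ac_b1, p_a1_b1 * p_c1_b1)                       # ~0.297 vs ~0.376
```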

2.2 Markov Random Fields

There are cases that a Bayesian network cannot describe, but that a Markov Random Field (an undirected graph) can. This representation is used more in the computer vision area.

For example, friendship should not be expressed by a directed edge, and it is also hard to express with a conditional probability. This makes it natural to introduce the undirected edges of Markov Random Fields.

  • An edge becomes more like an interaction that pushes the variables, like a force or a potential energy.
  • It requires less prior knowledge about the variables, as we do not need to know their exact dependence relationship. All we need to know is that an interaction exists.
  • As the factors are more like potential energies rather than probabilities, we should not forget to add a normalization term to our expression.

Formal definition:
A Markov Random Field (MRF) is a probability distribution $p$ over variables $x_{1}, x_{2}, \dots, x_{n}$ defined by an undirected graph $G$ in which nodes correspond to variables $x_{i}$. The probability $p$ has the form:
p(x_{1}, x_{2}, \dots, x_{n}) = \frac{1}{Z} \prod_{c \in C} \phi_{c}(x_{c})
where $C$ denotes the set of cliques (fully connected subgraphs) of $G$, and each factor $\phi_{c}$ is a nonnegative function over the variables in the clique. The partition function:
Z = \sum_{x_{1}, x_{2}, \dots, x_{n}} \prod_{c \in C} \phi_{c}(x_{c})
is a normalizing constant that ensures that the distribution sums to one.
End of definition
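
As a minimal sketch of this definition, the snippet below computes the partition function $Z$ by brute force for a tiny pairwise MRF over three binary variables on a chain $x_{1} - x_{2} - x_{3}$; the potential values are arbitrary nonnegative numbers chosen for illustration:

```python
from itertools import product

# Clique potentials for the undirected chain x1 - x2 - x3
# (here the cliques are just the two edges; values are arbitrary).
phi12 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
phi23 = {(0, 0): 1.5, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def score(x1, x2, x3):
    """Unnormalized product of clique potentials."""
    return phi12[(x1, x2)] * phi23[(x2, x3)]

# Partition function: sum of the unnormalized score over all assignments.
Z = sum(score(*x) for x in product((0, 1), repeat=3))

def p(x1, x2, x3):
    return score(x1, x2, x3) / Z

print(Z, p(1, 1, 1))                                   # 13.75  ~0.436
print(sum(p(*x) for x in product((0, 1), repeat=3)))   # 1.0 (sanity check)
```

This enumeration costs time exponential in the number of variables, which is exactly why $Z$ is the expensive part mentioned below.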

  • MRFs can express more non-directional associations.
  • But they take more computation than Bayesian nets, especially for the normalization term $Z$.
  • Bayesian nets are computationally easier.

A Bayesian network can always be converted into an undirected network with normalization constant one. The converse is also possible, but may be computationally intractable, and may produce a very large (e.g. fully connected) directed graph.
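
The usual name for the directed-to-undirected conversion is moralization: drop the edge directions and connect ("marry") every pair of parents that share a child, so that each CPD's scope becomes a clique. A minimal sketch, assuming the Bayesian network is given as a dictionary mapping each node to its list of parents:

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a Bayesian
    network given as {node: [its parents]}: keep every parent-child edge
    (direction dropped) and connect every pair of co-parents."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))   # parent-child edge, undirected
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))       # "marry" parents of the same child
    return edges

# V-structure A -> B <- C: moralization adds the extra undirected edge A - C.
print(moralize({"B": ["A", "C"], "A": [], "C": []}))
```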

2.3 Conditional Random Fields

It is a special case of Markov Random Fields, applied to model a conditional probability distribution.
A conditional random field results in an instantiation of a new Markov Random Field for each input $x$.
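
Concretely, using the same notation as the MRF definition above, a CRF defines

p(y \mid x) = \frac{1}{Z(x)} \prod_{c \in C} \phi_{c}(y_{c}, x)

with

Z(x) = \sum_{y} \prod_{c \in C} \phi_{c}(y_{c}, x)

so the partition function now depends on the input $x$, which is why every value of $x$ instantiates its own Markov Random Field over the output variables $y$.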

2.4 Factor Graph

A factor graph is a bipartite graph where one group is the variables in the distribution being modeled, and the other group is the factors defined on these variables. Edges go between factors and variables that those factors depend on.
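
As a minimal sketch, a factor graph can be stored as a mapping from each factor to the variables in its scope; the two node groups and the bipartite edges then follow directly (the factor and variable names here are purely illustrative):

```python
# Factor graph for p(x1, x2, x3) proportional to f12(x1, x2) * f23(x2, x3):
# one node group holds variables, the other holds factors, and each edge
# connects a factor to a variable it depends on.
factors = {
    "f12": ["x1", "x2"],
    "f23": ["x2", "x3"],
}

variables = sorted({v for scope in factors.values() for v in scope})
edges = [(f, v) for f, scope in factors.items() for v in scope]

print(variables)  # ['x1', 'x2', 'x3']
print(edges)      # [('f12', 'x1'), ('f12', 'x2'), ('f23', 'x2'), ('f23', 'x3')]
```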
