- From the course Probabilistic Models and Inference Algorithms for Machine Learning, Prof. Dahua Lin
- All content here is from the course, together with my own understanding.
Basic Concepts
- The key idea behind graphical models is factorization.
- A graphical model generally refers to a family of joint distributions over multiple variables that factorize according to the structure of the underlying graph.
- Graphical models can be understood in two ways:
  - As a data structure that describes a joint distribution in a factorized manner.
  - As a compact representation of a family of distributions characterized by a set of conditional independencies.
- These two views are in fact equivalent.
Categories of Graphical Models:
- Bayesian Networks (Directed Acyclic Graphs)
- Markov Random Fields (Undirected Graphs)
- Chain Graphs (Directed acyclic graphs over undirected components)
- Factor Graphs
Directed Acyclic Graph
- A directed graph G is called a directed acyclic graph (DAG) if it has no directed cycles (i.e., no vertex has a directed path back to itself).
- Because edges in a directed graph are oriented, each directed edge distinguishes a parent (the source) from a child (the target).
- A vertex s is called an ancestor of t, and t a descendant of s, denoted s ≺ t, if there exists a directed path from s to t.
- Topological Ordering: A topological ordering of a directed graph G = (V, E) is a linear ordering of the vertices such that for each edge (s, t) ∈ E, s comes before t.
- A finite directed graph is acyclic if and only if it has a topological ordering.
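The "acyclic iff a topological ordering exists" characterization can be sketched with Kahn's algorithm: repeatedly emit a vertex with no remaining parents; if the graph has a directed cycle, some vertices are never emitted. The graph below is a made-up example.

```python
from collections import deque

def topological_order(vertices, edges):
    """Kahn's algorithm: return a topological ordering of a directed graph,
    or None if the graph contains a directed cycle (i.e., it is not a DAG)."""
    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for s, t in edges:
        children[s].append(t)
        indegree[t] += 1
    # Start from vertices with no parents.
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for t in children[s]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    # If some vertex was never emitted, a directed cycle exists.
    return order if len(order) == len(vertices) else None

# A small DAG: a -> b, a -> c, b -> d, c -> d
print(topological_order("abcd", [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
# -> ['a', 'b', 'c', 'd']
# A graph with a directed cycle has no topological ordering.
print(topological_order("ab", [("a", "b"), ("b", "a")]))  # -> None
```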
Bayesian Networks
- Given a DAG G = (V, E), we say a joint distribution over X_V factorizes according to G if its density p can be expressed as
  p(x_V) = ∏_{s ∈ V} p_s(x_s | x_{π(s)})
- Such a model is called a Bayesian Network over G.
- π(s) is the set of s's parents, which can be empty.
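The factorization can be illustrated with a tiny hypothetical two-node network Rain → WetGrass (the CPT numbers below are made up): the joint is just the product of one local conditional per vertex, and no normalizing constant is needed.

```python
# Hypothetical 2-node Bayesian network: Rain -> WetGrass (made-up CPTs).
# Joint factorizes as p(rain, wet) = p(rain) * p(wet | rain).
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {
    True:  {True: 0.9, False: 0.1},
    False: {True: 0.1, False: 0.9},
}

def joint(rain, wet):
    # Product of one local conditional per vertex: p_s(x_s | x_{pi(s)}).
    return p_rain[rain] * p_wet_given_rain[rain][wet]

print(joint(True, True))  # 0.2 * 0.9 = 0.18

# The joint sums to 1 without any explicit normalization,
# because each local factor is itself a (conditional) distribution.
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
print(total)  # 1.0
```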
Markov Random Fields
- Consider an undirected graph G = (V, E).
- Clique: a fully connected subset of vertices.
- A clique is called maximal if it is not properly contained in another clique (i.e., adding any other vertex would make it no longer a clique).
- C(G) denotes the set of all maximal cliques.
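On a tiny graph, maximal cliques can be found by brute force: enumerate every fully connected vertex subset, then keep those not properly contained in another clique. A minimal sketch on a hypothetical 4-vertex graph:

```python
from itertools import combinations

def is_clique(S, edges):
    # A subset is a clique if every pair of its vertices is an edge.
    return all(frozenset(p) in edges for p in combinations(S, 2))

def maximal_cliques(vertices, edges):
    """Brute-force maximal-clique enumeration (only sensible for tiny graphs)."""
    cliques = [set(S) for r in range(1, len(vertices) + 1)
               for S in combinations(vertices, r)
               if is_clique(S, edges)]
    # Keep only cliques not properly contained in another clique.
    maximal = [C for C in cliques if not any(C < D for D in cliques)]
    return sorted(tuple(sorted(C)) for C in maximal)

# Triangle a-b-c plus a pendant edge c-d.
V = "abcd"
E = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}
print(maximal_cliques(V, E))  # [('a', 'b', 'c'), ('c', 'd')]
```

Note that the edge {a, b} is a clique but not a maximal one, since it is properly contained in {a, b, c}.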
- Consider an undirected graph G = (V, E). We say a joint distribution of X_V factorizes according to G if its density p can be expressed as
  p(x_V) = (1/Z) ∏_{C ∈ C(G)} ψ_C(x_C)
- This is called a Markov Random Field over G.
- ψ_C : X_C → R_+ are called factors.
- The normalizing constant Z is usually needed to ensure the distribution is properly normalized:
  Z = ∫ ∏_{C ∈ C(G)} ψ_C(x_C) μ(dx)
- The ψ_C are also called compatibility functions; they need not correspond to marginal or conditional distributions.
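For discrete variables the integral becomes a sum. A minimal sketch on a hypothetical binary chain x1 - x2 - x3, whose maximal cliques are the two edges; the factor values are made up and reward agreement (they are not themselves distributions):

```python
from itertools import product

# Compatibility function for an edge clique: rewards agreeing neighbors.
def psi(a, b):
    return 2.0 if a == b else 1.0  # not a marginal or conditional distribution

states = [0, 1]

# Z sums the product of all clique factors over every configuration.
Z = sum(psi(x1, x2) * psi(x2, x3) for x1, x2, x3 in product(states, repeat=3))
print(Z)  # 18.0

def p(x1, x2, x3):
    # p(x_V) = (1/Z) * prod_{C in C(G)} psi_C(x_C)
    return psi(x1, x2) * psi(x2, x3) / Z

print(p(0, 0, 0))  # 4/18, the weight of an all-agreeing configuration
```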
Analysis of Conditional Independence
- The graphical structure also encodes a set of conditional independencies among the variables.
- Consider a joint distribution over (X, Y, Z). X and Y are called conditionally independent given Z, denoted by X ⊥ Y | Z, if
  Pr(X ∈ A, Y ∈ B | Z) = Pr(X ∈ A | Z) Pr(Y ∈ B | Z)
  or, more generally,
  E_{X,Y|Z}[f(X) g(Y)] = E_{X|Z}[f(X)] E_{Y|Z}[g(Y)]
- Suppose the conditional distributions X|Z and Y|Z have densities p_{X|z} and p_{Y|z}. Then X ⊥ Y | Z if the following equality holds almost surely:
  p_{(X,Y)|z}(x, y) = p_{X|z}(x) p_{Y|z}(y)
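For discrete variables the density condition can be checked directly from the joint table. A minimal sketch with a hypothetical joint over binary (X, Y, Z), built so that X ⊥ Y | Z holds by construction (all numbers made up):

```python
from itertools import product

# Joint constructed as p(x, y, z) = p(z) p(x|z) p(y|z), so X and Y are
# conditionally independent given Z by construction.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
p_y_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}

def joint(x, y, z):
    return p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]

def is_ci(joint, tol=1e-12):
    """Check p(x, y | z) == p(x | z) p(y | z) for every configuration."""
    for z in (0, 1):
        pz = sum(joint(x, y, z) for x, y in product((0, 1), repeat=2))
        for x, y in product((0, 1), repeat=2):
            p_xy = joint(x, y, z) / pz                       # p(x, y | z)
            p_x = sum(joint(x, yy, z) for yy in (0, 1)) / pz  # p(x | z)
            p_y = sum(joint(xx, y, z) for xx in (0, 1)) / pz  # p(y | z)
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True

print(is_ci(joint))  # True
```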
Factor Graphs
- An MRF does not always fully reveal the factorized structure of a distribution.
- A factor graph can sometimes give a more accurate characterization of a family of distributions.
- A factor graph is a bipartite graph with links between two types of nodes: variables and factors.
- A variable x and a factor f are linked in a factor graph if the factor involves x as an argument.
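A minimal sketch of this bipartite structure (all names hypothetical): variables and factors are the two node types, and a factor is linked to exactly the variables it takes as arguments. With the two distinct edge factors below, the factor graph records which factor touches which variables, information an MRF edge set alone would not distinguish.

```python
# Each factor: (linked variable nodes, compatibility function). Made-up values.
factors = {
    "f_a": (("x1", "x2"), lambda x1, x2: 2.0 if x1 == x2 else 1.0),
    "f_b": (("x2", "x3"), lambda x2, x3: 2.0 if x2 == x3 else 1.0),
}

# Bipartite adjacency: factor node -> the variable nodes it is linked to.
links = {name: args for name, (args, _) in factors.items()}
print(links)  # {'f_a': ('x1', 'x2'), 'f_b': ('x2', 'x3')}

def unnormalized(assignment):
    # Product of all factors, each evaluated at its linked variables only.
    result = 1.0
    for args, f in factors.values():
        result *= f(*(assignment[v] for v in args))
    return result

print(unnormalized({"x1": 0, "x2": 0, "x3": 1}))  # 2.0 * 1.0 = 2.0
```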