Probabilistic Graphical Models 11: Minimal I-Maps

Author: 孫相國

E-mail:[email protected]

1. Introduction

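As discussed earlier, in practical problems the number of atomic outcomes of a joint distribution is usually enormous. We cannot enumerate them all, and our data cannot cover them all either. This means it is hard to recover the true distribution in full. What we can do is use the available data to discover as large a subset as possible of the independencies that hold in the true distribution, and then build an I-map that satisfies this subset. The question for this section is: given a distribution $P$, to what extent can we construct a graph $G$ such that $G$ is an I-map for $P$? In general, a given set of independencies admits more than one such graph, so we would like to single out a special one.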

2. Review

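Theorem 1: Let $G$ be a Bayesian network structure over a set of variables $\mathcal{X}$, and let $P$ be a joint distribution over the same space. If $G$ is an I-map for $P$, then $P$ factorizes according to $G$.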

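Proof:

Assume that $X_1, X_2, \dots, X_n$ is a topological ordering of $G$.

By the chain rule of probability:

$$P(X_1,\dots,X_n)=P(X_1)\,P(X_2\mid X_1)\,P(X_3\mid X_1,X_2)\cdots P(X_n\mid X_1,\dots,X_{n-1})$$

Since $G$ is an I-map for $P$, it encodes the local independence assertions $I_\ell(G)=\{(X_i \perp \mathrm{NonDescendants}_{X_i} \mid \mathrm{Pa}_{X_i}) : X_i \in X_{1:n}\}$, and $I_\ell(G)\subseteq I(P)$.

Because $X_1, X_2, \dots, X_n$ is a topological ordering of $G$, for every term $P(X_i\mid X_1,\dots,X_{i-1})$ in the chain-rule expansion above, all parents of $X_i$ lie in $\{X_1,\dots,X_{i-1}\}$, and this set contains no descendant of $X_i$. That is, $\{X_1,\dots,X_{i-1}\}=\mathrm{Pa}_{X_i}\cup Z$ with $Z\subseteq \mathrm{NonDescendants}_{X_i}$. By the independence assertions $I_\ell(G)$ and the decomposition property of conditional independence, $P(X_i\mid X_1,\dots,X_{i-1})=P(X_i\mid \mathrm{Pa}_{X_i}\cup Z)=P(X_i\mid \mathrm{Pa}_{X_i})$, which yields the factorization $P(X_1,\dots,X_n)=\prod_{i=1}^{n}P(X_i\mid \mathrm{Pa}_{X_i})$.

Q.E.D.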
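As a small worked instance of this argument, consider the student network discussed later in this post, with the topological ordering D, I, S, G, L. The parent sets used here (I has no parents, S has parent I, G has parents I and D, L has parent G) are read off from the construction in Section 4; the derivation is only a sketch of how the chain rule collapses under those independencies.

```latex
\begin{aligned}
P(D,I,S,G,L)
  &= P(D)\,P(I\mid D)\,P(S\mid D,I)\,P(G\mid D,I,S)\,P(L\mid D,I,S,G) \\
  &= P(D)\,P(I)\,P(S\mid I)\,P(G\mid I,D)\,P(L\mid G)
\end{aligned}
% uses (I \perp D), (S \perp D \mid I), (G \perp S \mid I,D), (L \perp D,I,S \mid G)
```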

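Theorem 2: Let $G$ be a Bayesian network structure over a set of variables $\mathcal{X}$, and let $P$ be a joint distribution over the same space. If $P$ factorizes according to $G$, then $G$ is an I-map for $P$.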

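Proof: Let $P$ be a distribution that factorizes according to $G_{student}$. We need to show that $I_\ell(G_{student})$ holds in $P$. Consider the independence assumption for an arbitrary variable $X_k$, namely $(X_k \perp \mathrm{NonDescendants}_{X_k} \mid \mathrm{Pa}_{X_k})$. To show that it holds in $P$, we need to show:

$$P(X_k\mid \mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})=P(X_k\mid \mathrm{Pa}_{X_k}) \tag{1}$$

By definition,

$$P(X_k\mid \mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})=\frac{P(X_k,\mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})}{P(\mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})} \tag{2}$$

By the chain rule for Bayesian networks, after summing out the descendants of $X_k$ (each of their conditional distributions sums to 1 when they are eliminated in reverse topological order), the numerator is:

$$P(X_k,\mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})=\prod_{X_i\notin \mathrm{Descendants}_{X_k}}P(X_i\mid \mathrm{Pa}_{X_i}) \tag{3}$$

Marginalizing this joint over $X_k$ gives the denominator:

$$\begin{aligned}
P(\mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})
&=\sum_{X_k}P(X_k,\mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})\\
&=\sum_{X_k}\prod_{X_i\notin \mathrm{Descendants}_{X_k}}P(X_i\mid \mathrm{Pa}_{X_i})\\
&=\sum_{X_k}P(X_k\mid \mathrm{Pa}_{X_k})\prod_{X_i\notin \mathrm{Descendants}_{X_k},\,X_i\neq X_k}P(X_i\mid \mathrm{Pa}_{X_i})\\
&=\prod_{X_i\notin \mathrm{Descendants}_{X_k},\,X_i\neq X_k}P(X_i\mid \mathrm{Pa}_{X_i})\sum_{X_k}P(X_k\mid \mathrm{Pa}_{X_k})\\
&=\prod_{X_i\notin \mathrm{Descendants}_{X_k},\,X_i\neq X_k}P(X_i\mid \mathrm{Pa}_{X_i})
\end{aligned} \tag{4}$$

The product can be pulled out of the sum because no non-descendant of $X_k$ has $X_k$ as a parent. Thus (2) can be written as:

$$\begin{aligned}
P(X_k\mid \mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})
&=\frac{P(X_k,\mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})}{P(\mathrm{NonDescendants}_{X_k},\mathrm{Pa}_{X_k})}\\
&=\frac{\prod_{X_i\notin \mathrm{Descendants}_{X_k}}P(X_i\mid \mathrm{Pa}_{X_i})}{\prod_{X_i\notin \mathrm{Descendants}_{X_k},\,X_i\neq X_k}P(X_i\mid \mathrm{Pa}_{X_i})}\\
&=\frac{P(X_k\mid \mathrm{Pa}_{X_k})\prod_{X_i\notin \mathrm{Descendants}_{X_k},\,X_i\neq X_k}P(X_i\mid \mathrm{Pa}_{X_i})}{\prod_{X_i\notin \mathrm{Descendants}_{X_k},\,X_i\neq X_k}P(X_i\mid \mathrm{Pa}_{X_i})}\\
&=P(X_k\mid \mathrm{Pa}_{X_k})
\end{aligned}$$

Q.E.D.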
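The statement of Theorem 2 can be sanity-checked numerically on a toy case. The sketch below assumes a hypothetical chain A → B → C over binary variables with made-up CPD values (none of this comes from the original post). It builds the joint from the factorization and checks the local independence (C ⊥ A | B).

```python
import numpy as np

# Hypothetical CPDs for the chain A -> B -> C (illustrative values only).
p_a = np.array([0.6, 0.4])                    # P(A)
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])    # P(B | A), rows indexed by A
p_c_b = np.array([[0.9, 0.1], [0.5, 0.5]])    # P(C | B), rows indexed by B

# joint[a, b, c] = P(a) P(b | a) P(c | b): P factorizes over the chain.
joint = p_a[:, None, None] * p_b_a[:, :, None] * p_c_b[None, :, :]

# P(C | A, B): divide the joint by the marginal P(A, B).
p_ab = joint.sum(axis=2, keepdims=True)
p_c_given_ab = joint / p_ab

# P(C | B): marginalize A out first, then normalize over C.
p_bc = joint.sum(axis=0)
p_c_given_b = p_bc / p_bc.sum(axis=1, keepdims=True)

# Local independence for C: conditioning on A as well as B changes nothing.
holds = np.allclose(p_c_given_ab, p_c_given_b[None, :, :])
print("(C ⊥ A | B) holds:", holds)
```

A numerical check like this only confirms one instance; the proof above is what establishes the general claim.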

3. Minimal I-map

A graph $G$ is a minimal I-map for a set of independencies $\mathcal{I}$ if it is an I-map for $\mathcal{I}$, and if the removal of even a single edge from $G$ renders it not an I-map.

Theorems 1 and 2 from Section 2 give us the basis for finding a minimal I-map. We assume we are given a predetermined variable ordering, say, $\{X_1, \dots, X_n\}$. We now examine each variable $X_i$, $i = 1, \dots, n$ in turn. For each $X_i$, we pick some minimal subset $U$ of $\{X_1, \dots, X_{i-1}\}$ to be $X_i$'s parents in $G$. More precisely, we require that $U$ satisfy $(X_i \perp \{X_1, \dots, X_{i-1}\} - U \mid U)$, and that no node can be removed from $U$ without violating this property. We then set $U$ to be the parents of $X_i$.

The proof of theorem 1 tells us that, if each node $X_i$ is independent of $X_1, \dots, X_{i-1}$ given its parents in $G$, then $P$ factorizes over $G$. We can then conclude from theorem 2 that $G$ is an I-map for $P$. By construction, $G$ is minimal, so that $G$ is a minimal I-map for $P$.
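The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `is_independent` is a hypothetical oracle for conditional independence queries in P, and `chain_oracle` hard-codes the independencies of an assumed chain A → B → C. Candidate parent sets are tried from smallest to largest, so the first one that works cannot have a node removed.

```python
from itertools import combinations

def build_minimal_i_map(order, is_independent):
    """Return {node: tuple of parents} for the given variable ordering.

    is_independent(x, others, given) is a hypothetical oracle that says
    whether (x ⊥ others | given) holds in the distribution P.
    """
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for size in range(len(preds) + 1):
            found = None
            for u in combinations(preds, size):
                rest = [v for v in preds if v not in u]
                # U = all predecessors always works, since nothing is left over.
                if not rest or is_independent(x, rest, list(u)):
                    found = u
                    break
            if found is not None:
                parents[x] = found
                break
    return parents

def chain_oracle(x, others, given):
    # Assumed example: in the chain A -> B -> C the only conditional
    # independence is (A ⊥ C | B), in either direction.
    return {x, *others} == {"A", "C"} and set(given) == {"B"}

natural = build_minimal_i_map(["A", "B", "C"], chain_oracle)
shuffled = build_minimal_i_map(["A", "C", "B"], chain_oracle)
print(natural)
print(shuffled)
```

With the ordering A, B, C the sketch recovers the chain itself, while the ordering A, C, B forces B to take both A and C as parents. Enumerating subsets is exponential in the number of predecessors, so this is meant for small examples only.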


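In fact, given a topological ordering, the search for a minimal parent set $U$ of a node $X_i$ need not have a unique answer. For example, take three nodes $X_1, X_2, X_3$ where $X_1$ and $X_2$ are logically equivalent. We can then choose either $X_1$ or $X_2$ as the parent of $X_3$, but once one of them has been chosen, the other must not be added. Hence, the minimal parent set $U$ in our construction is not necessarily unique.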
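The non-uniqueness can be checked numerically. The sketch below assumes a hypothetical joint in which $X_2$ is a deterministic copy of $X_1$ and $X_3$ is a noisy function of $X_1$; the probability values are made up for illustration. Independence is tested through the product form $P(x,y,z)\,P(z)=P(x,z)\,P(y,z)$, which stays well defined when some entries of the joint are zero.

```python
import numpy as np

def marginal(joint, keep):
    """Marginal over the axes in `keep`, with summed-out axes kept as size 1."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop, keepdims=True) if drop else joint

def cond_indep(joint, x, y, z):
    """Check (x ⊥ y | z) for single axes x, y and a tuple of axes z."""
    z = set(z)
    lhs = marginal(joint, {x, y} | z) * marginal(joint, z)
    rhs = marginal(joint, {x} | z) * marginal(joint, {y} | z)
    return np.allclose(lhs, rhs)

# Axes: 0 = X1, 1 = X2, 2 = X3. Assumed values: X1 is uniform, X2 = X1.
p_x3_given_x1 = np.array([[0.9, 0.1], [0.2, 0.8]])
joint = np.zeros((2, 2, 2))
for v in (0, 1):
    joint[v, v, :] = 0.5 * p_x3_given_x1[v]

x1_suffices = cond_indep(joint, 2, 1, (0,))   # (X3 ⊥ X2 | X1)
x2_suffices = cond_indep(joint, 2, 0, (1,))   # (X3 ⊥ X1 | X2)
needs_parent = not cond_indep(joint, 2, 0, ())  # (X3 ⊥ X1) fails
print(x1_suffices, x2_suffices, needs_parent)
```

Both $\{X_1\}$ and $\{X_2\}$ pass the test while the empty set does not, so either singleton is a valid minimal parent set for $X_3$. Note that this joint assigns probability zero to every outcome with $X_1 \neq X_2$.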

However, one can show that, if the distribution is positive (see definition 2.5), that is, if for any instantiation ξ to all the network variables X we have that P(ξ) > 0, then the choice of parent set, given an ordering, is unique. Under this assumption, algorithm 3.2 can produce all minimal I-maps for P: Let G be any minimal I-map for P. If we call Build-Minimal-I-Map with an ordering ≺ that is topological for G, then, due to the uniqueness argument, the algorithm must return G.


At first glance, the minimal I-map seems to be a reasonable candidate for capturing the structure in the distribution: It seems that if G is a minimal I-map for a distribution P, then we should be able to “read off” all of the independencies in P directly from G. Unfortunately, this intuition is false.

A distribution P is said to be positive if for all events α ∈ S such that α ≠ ∅, we have that P(α) > 0.


4. Problems with Minimal I-Maps

Note that the graphs in figure 3.8b,c really are minimal I-maps for this distribution. However, they fail to capture some or all of the independencies that hold in the distribution. Thus, they show that the fact that G is a minimal I-map for P is far from a guarantee that G captures the independence structure in P.


Consider the distribution $P_{B^{student}}$, as defined in figure 3.4, and let us go through the process of constructing a minimal I-map for $P_{B^{student}}$. We note that the graph $G_{student}$ precisely reflects the independencies in this distribution $P_{B^{student}}$ (that is, $I(P_{B^{student}}) = I(G_{student})$), so that we can use $G_{student}$ to determine which independencies hold in $P_{B^{student}}$.

Our construction process starts with an arbitrary ordering on the nodes; we will go through this process for three different orderings. Throughout this process, it is important to remember that we are testing independencies relative to the distribution $P_{B^{student}}$. We can use $G_{student}$ (figure 3.4) to guide our intuition about which independencies hold in $P_{B^{student}}$, but we can always resort to testing these independencies in the joint distribution $P_{B^{student}}$.

The first ordering is a very natural one: D, I, S, G, L. We add one node at a time and see which of the possible edges from the preceding nodes are redundant. We start by adding D, then I. We can now remove the edge from D to I because this particular distribution satisfies (I ⊥ D), so I is independent of D given its other parents (the empty set). Continuing on, we add S, but we can remove the edge from D to S because our distribution satisfies (S ⊥ D | I). We then add G, but we can remove the edge from S to G, because the distribution satisfies (G ⊥ S | I, D). Finally, we add L, but we can remove all edges from D, I, S, because the distribution satisfies (L ⊥ D, I, S | G). Thus, our final output is the graph in figure 3.8a, which is precisely our original network for this distribution.

Now, consider a somewhat less natural ordering: L, S, G, I, D. In this case, the resulting I-map is not quite as natural or as sparse. To see this, let us consider the sequence of steps. We start by adding L to the graph. Since it is the first variable in the ordering, it must be a root. Next, we consider S. The decision is whether to have L as a parent of S. Clearly, we need an edge from L to S, because the quality of the student's letter is correlated with his SAT score in this distribution, and S has no other parents that help render it independent of L. Formally, we have that (S ⊥ L) does not hold in the distribution. In the next iteration of the algorithm, we introduce G. Now, all possible subsets of {L, S} are potential parent sets for G. Clearly, G is dependent on L. Moreover, although G is independent of S given I, it is not independent of S given L. Hence, we must add the edge from S to G. Carrying out the procedure, we end up with the graph shown in figure 3.8b.

Finally, consider the ordering: L, D, S, I, G. In this case, a similar analysis results in the graph shown in figure 3.8c, which is almost a complete graph, missing only the edge from S to G, which we can remove because G is independent of S given I.
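The three orderings can be replayed numerically. The sketch below is a rough illustration, not the book's algorithm listing: the CPD values are assumptions (figure 3.4 is not reproduced in this post), independence is tested directly on the joint table, and `minimal_i_map` is a hypothetical helper that tries parent sets from smallest to largest.

```python
import numpy as np
from itertools import combinations

names = ["D", "I", "G", "S", "L"]  # axis order of the joint table

# Assumed CPDs for a student-style network D -> G <- I -> S, G -> L.
p_d = np.array([0.6, 0.4])
p_i = np.array([0.7, 0.3])
p_g = np.array([[[0.3, 0.4, 0.3], [0.05, 0.25, 0.7]],
                [[0.9, 0.08, 0.02], [0.5, 0.3, 0.2]]])    # P(G | I, D) as [i, d, g]
p_s = np.array([[0.95, 0.05], [0.2, 0.8]])                # P(S | I) as [i, s]
p_l = np.array([[0.1, 0.9], [0.4, 0.6], [0.99, 0.01]])    # P(L | G) as [g, l]
joint = np.einsum("d,i,idg,is,gl->digsl", p_d, p_i, p_g, p_s, p_l)

def marginal(keep):
    """Marginal over the axes in `keep`, with summed-out axes kept as size 1."""
    drop = tuple(a for a in range(joint.ndim) if a not in keep)
    return joint.sum(axis=drop, keepdims=True) if drop else joint

def indep(x, ys, zs):
    """(x ⊥ ys | zs) via P(x, ys, zs) P(zs) == P(x, zs) P(ys, zs)."""
    x, ys, zs = {x}, set(ys), set(zs)
    return np.allclose(marginal(x | ys | zs) * marginal(zs),
                       marginal(x | zs) * marginal(ys | zs),
                       rtol=0, atol=1e-12)

def minimal_i_map(order):
    """Return the edge list (parent, child) built for the given ordering."""
    idx = [names.index(n) for n in order]
    edges = []
    for i, x in enumerate(idx):
        preds = idx[:i]
        for size in range(len(preds) + 1):
            ok = [u for u in combinations(preds, size)
                  if indep(x, set(preds) - set(u), u)]
            if ok:  # smallest working set, hence minimal
                edges += [(names[p], names[x]) for p in ok[0]]
                break
    return edges

edges_a = minimal_i_map(["D", "I", "S", "G", "L"])
edges_b = minimal_i_map(["L", "S", "G", "I", "D"])
edges_c = minimal_i_map(["L", "D", "S", "I", "G"])
for order, edges in (("DISGL", edges_a), ("LSGID", edges_b), ("LDSIG", edges_c)):
    print(order, len(edges), edges)
```

Under these assumed CPDs the first ordering should give back the sparse original structure, while the other two should produce noticeably denser graphs, in line with the discussion above. All three are minimal I-maps for the same distribution.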

To address this problem, the next concept we will introduce is the P-map; see the next post in this series.
