Probabilistic Graphical Models 11: Minimal I-Maps

Author: 孫相國

1. Introduction

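As discussed before, in real problems the number of atomic outcomes of the joint distribution over the variables is usually enormous. We cannot possibly enumerate all of them, and our data cannot cover them all either. This means it is hard to recover the true probability distribution completely. The best we can do is use the available data to uncover as much as possible of the set of independencies that hold in the true distribution, and then construct an I-map that satisfies this set of independencies. The question for this post is: given a distribution $P$, to what extent can we construct a graph $\mathcal{G}$ that is an I-map for $P$? In general, a given set of independencies admits many different I-maps, so we would like to single out a particular one.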

2. Review

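Theorem 1: Let $\mathcal{G}$ be a Bayesian network structure over a set of random variables $\mathcal{X}$, and let $P$ be a joint distribution over the same space. If $\mathcal{G}$ is an I-map for $P$, then $P$ factorizes according to $\mathcal{G}$.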

Proof:

Assume that $X_1, X_2, \dots, X_n$ is a topological ordering of $\mathcal{G}$.

By the chain rule of probability:

$$P(X_1, \dots, X_n) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \dots, X_{n-1})$$
由於 爲I-map,因此 中蘊含了如下的獨立性論斷：l()={(Xi⊥NonDescendantsXi|PaXi):Xi∈X1:n} .且l()⊆(P) 。
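Because $X_1, \dots, X_n$ is a topological ordering of $\mathcal{G}$, in any term $P(X_i \mid X_1, \dots, X_{i-1})$ of the chain-rule expansion above, all parents of $X_i$ lie in $\{X_1, \dots, X_{i-1}\}$, and this set contains no descendant of $X_i$. That is, $\{X_1, \dots, X_{i-1}\} = \text{Pa}_{X_i} \cup Z$ with $Z \subseteq \text{NonDescendants}_{X_i}$. By the assertions in $\mathcal{I}_\ell(\mathcal{G})$ and the decomposition property of conditional independence, $(X_i \perp Z \mid \text{Pa}_{X_i})$ holds in $P$, so $P(X_i \mid X_1, \dots, X_{i-1}) = P(X_i \mid \text{Pa}_{X_i}, Z) = P(X_i \mid \text{Pa}_{X_i})$. Substituting each term back into the chain rule gives the factorization
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \text{Pa}_{X_i}).$$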

This completes the proof.

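Theorem 2: Let $\mathcal{G}$ be a Bayesian network structure over a set of random variables $\mathcal{X}$, and let $P$ be a joint distribution over the same space. If $P$ factorizes according to $\mathcal{G}$, then $\mathcal{G}$ is an I-map for $P$.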

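Proof: Let $P$ be a distribution that factorizes according to $\mathcal{G}$. We need to show that $\mathcal{I}_\ell(\mathcal{G})$ holds in $P$. Consider the independence assertion $(X_k \perp \text{NonDescendants}_{X_k} \mid \text{Pa}_{X_k})$ for an arbitrary variable $X_k$. To show that it holds in $P$, we need to show that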

$$P(X_k \mid \text{NonDescendants}_{X_k}, \text{Pa}_{X_k}) = P(X_k \mid \text{Pa}_{X_k}) \tag{1}$$

By definition,

$$P(X_k \mid \text{NonDescendants}_{X_k}, \text{Pa}_{X_k}) = \frac{P(X_k, \text{NonDescendants}_{X_k}, \text{Pa}_{X_k})}{P(\text{NonDescendants}_{X_k}, \text{Pa}_{X_k})} \tag{2}$$

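By the chain rule for Bayesian networks, the numerator is given below (here $\text{Descendants}_{X_k}$ does not include $X_k$ itself). It is obtained by summing the descendants of $X_k$ out of the full factorization in reverse topological order: when a descendant is summed out, its children have already been removed, so its CPD is the only factor that mentions it, and that CPD sums to 1: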

$$P(X_k, \text{NonDescendants}_{X_k}, \text{Pa}_{X_k}) = \prod_{X_i \notin \text{Descendants}_{X_k}} P(X_i \mid \text{Pa}_{X_i}) \tag{3}$$

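Marginalizing $X_k$ out of the numerator gives the denominator. Every child of $X_k$ is a descendant of $X_k$, so no factor in the product other than $P(X_k \mid \text{Pa}_{X_k})$ mentions $X_k$, and the remaining factors can be pulled out of the sum: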

$$
\begin{aligned}
P(\text{NonDescendants}_{X_k}, \text{Pa}_{X_k})
&= \sum_{X_k} P(X_k, \text{NonDescendants}_{X_k}, \text{Pa}_{X_k}) \\
&= \sum_{X_k} \prod_{X_i \notin \text{Descendants}_{X_k}} P(X_i \mid \text{Pa}_{X_i}) \\
&= \sum_{X_k} P(X_k \mid \text{Pa}_{X_k}) \prod_{X_i \notin \text{Descendants}_{X_k},\, X_i \neq X_k} P(X_i \mid \text{Pa}_{X_i}) \\
&= \prod_{X_i \notin \text{Descendants}_{X_k},\, X_i \neq X_k} P(X_i \mid \text{Pa}_{X_i}) \sum_{X_k} P(X_k \mid \text{Pa}_{X_k}) \\
&= \prod_{X_i \notin \text{Descendants}_{X_k},\, X_i \neq X_k} P(X_i \mid \text{Pa}_{X_i})
\end{aligned}
\tag{4}
$$

Thus, (2) can be written as:

$$
\begin{aligned}
P(X_k \mid \text{NonDescendants}_{X_k}, \text{Pa}_{X_k})
&= \frac{P(X_k, \text{NonDescendants}_{X_k}, \text{Pa}_{X_k})}{P(\text{NonDescendants}_{X_k}, \text{Pa}_{X_k})} \\
&= \frac{\prod_{X_i \notin \text{Descendants}_{X_k}} P(X_i \mid \text{Pa}_{X_i})}{\prod_{X_i \notin \text{Descendants}_{X_k},\, X_i \neq X_k} P(X_i \mid \text{Pa}_{X_i})} \\
&= \frac{P(X_k \mid \text{Pa}_{X_k}) \prod_{X_i \notin \text{Descendants}_{X_k},\, X_i \neq X_k} P(X_i \mid \text{Pa}_{X_i})}{\prod_{X_i \notin \text{Descendants}_{X_k},\, X_i \neq X_k} P(X_i \mid \text{Pa}_{X_i})} \\
&= P(X_k \mid \text{Pa}_{X_k})
\end{aligned}
$$

This completes the proof.
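Theorem 2 can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it uses a hypothetical binary DAG shaped like the student network (D→G←I→S, G→L), draws made-up CPDs at random, builds the joint by multiplying $P(X_i \mid \text{Pa}_{X_i})$, and then tests each local assertion $(X_k \perp \text{NonDescendants}_{X_k} \mid \text{Pa}_{X_k})$ directly on the joint table. The helper names (`random_cpds`, `cond_indep`, and so on) are illustrative, not from any library.

```python
import itertools
import random

# Hypothetical DAG with the student-network shape; every variable is binary.
PARENTS = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}
ORDER = ["D", "I", "G", "S", "L"]  # a topological ordering of PARENTS

def random_cpds(parents, rng):
    """Draw P(X=1 | pa) uniformly from [0.05, 0.95] for each parent assignment."""
    return {(x, pa_val): rng.uniform(0.05, 0.95)
            for x, pa in parents.items()
            for pa_val in itertools.product([0, 1], repeat=len(pa))}

def joint_from_factorization(parents, order, cpds):
    """P(x_1..x_n) = prod_i P(x_i | pa_i), stored as {assignment tuple: prob}."""
    joint = {}
    for vals in itertools.product([0, 1], repeat=len(order)):
        a = dict(zip(order, vals))
        p = 1.0
        for x, pa in parents.items():
            p1 = cpds[(x, tuple(a[u] for u in pa))]
            p *= p1 if a[x] == 1 else 1.0 - p1
        joint[vals] = p
    return joint

def marginal(joint, order, vars_):
    """Sum the joint table down to the variables in vars_."""
    idx = [order.index(v) for v in vars_]
    out = {}
    for vals, p in joint.items():
        key = tuple(vals[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_indep(joint, order, x, ys, zs, tol=1e-9):
    """Test (x ⊥ ys | zs) via P(x,ys,zs) P(zs) == P(x,zs) P(ys,zs) for all values."""
    p_xyz = marginal(joint, order, [x] + ys + zs)
    p_z = marginal(joint, order, zs)
    p_xz = marginal(joint, order, [x] + zs)
    p_yz = marginal(joint, order, ys + zs)
    for key, p in p_xyz.items():
        xv, yv, zv = key[:1], key[1:1 + len(ys)], key[1 + len(ys):]
        if abs(p * p_z[zv] - p_xz[xv + zv] * p_yz[yv + zv]) > tol:
            return False
    return True

def descendants(parents, x):
    """All nodes reachable from x along directed edges (x itself excluded)."""
    children = {v: [c for c, pa in parents.items() if v in pa] for v in parents}
    seen, stack = set(), list(children[x])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children[v])
    return seen

rng = random.Random(0)
joint = joint_from_factorization(PARENTS, ORDER, random_cpds(PARENTS, rng))
for xk in ORDER:
    pa = PARENTS[xk]
    desc = descendants(PARENTS, xk)
    others = [v for v in ORDER if v != xk and v not in desc and v not in pa]
    # Theorem 2: (X_k ⊥ NonDescendants_{X_k} | Pa_{X_k}) holds in the factorized P.
    print(xk, others, pa, cond_indep(joint, ORDER, xk, others, pa))
```

Every printed line should end with `True`, whatever CPD values are drawn, because the check depends only on the factorized form of the joint.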

3. Minimal I-Maps

Definition: A graph $\mathcal{G}$ is a minimal I-map for a set of independencies $\mathcal{I}$ if it is an I-map for $\mathcal{I}$, and if the removal of even a single edge from $\mathcal{G}$ renders it not an I-map.

Theorems 1 and 2 in Section 2 give us the basis for constructing a minimal I-map. Assume we are given a predetermined variable ordering, say $\{X_1, \dots, X_n\}$. We examine each variable $X_i$, $i = 1, \dots, n$, in turn. For each $X_i$, we pick some minimal subset $U$ of $\{X_1, \dots, X_{i-1}\}$ to be $X_i$'s parents in $\mathcal{G}$. More precisely, we require that $U$ satisfy $(X_i \perp \{X_1, \dots, X_{i-1}\} - U \mid U)$, and that no node can be removed from $U$ without violating this property. We then set $U$ to be the parents of $X_i$.
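The construction above can be sketched in a few lines of Python. This is a minimal sketch under explicit assumptions: variables are binary, the distribution is given as a full joint table, and independence is tested numerically on that table (the helpers `cond_indep` and `build_minimal_imap` are hypothetical names). The example joint comes from a made-up chain A → B → C. Trying candidate sets in order of increasing size ensures the chosen $U$ is minimal: a strictly smaller valid subset would have been found first.

```python
import itertools

def marginal(joint, order, vars_):
    """Sum a joint table {assignment tuple: prob} down to the variables in vars_."""
    idx = [order.index(v) for v in vars_]
    out = {}
    for vals, p in joint.items():
        key = tuple(vals[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_indep(joint, order, x, ys, zs, tol=1e-9):
    """Numerically test (x ⊥ ys | zs): P(x,ys,zs) P(zs) == P(x,zs) P(ys,zs) everywhere."""
    p_xyz = marginal(joint, order, [x] + ys + zs)
    p_z = marginal(joint, order, zs)
    p_xz = marginal(joint, order, [x] + zs)
    p_yz = marginal(joint, order, ys + zs)
    for key, p in p_xyz.items():
        xv, yv, zv = key[:1], key[1:1 + len(ys)], key[1 + len(ys):]
        if abs(p * p_z[zv] - p_xz[xv + zv] * p_yz[yv + zv]) > tol:
            return False
    return True

def build_minimal_imap(order, indep):
    """For each X_i, choose a smallest U ⊆ predecessors with (X_i ⊥ preds - U | U)."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for size in range(len(preds) + 1):
            for u in itertools.combinations(preds, size):
                rest = [v for v in preds if v not in u]
                if indep(x, rest, list(u)):
                    parents[x] = list(u)
                    break
            if x in parents:
                break
    return parents

# Hypothetical chain A -> B -> C with binary variables and made-up CPDs.
VARS = ["A", "B", "C"]
P_A1 = 0.3
P_B1_GIVEN_A = {0: 0.2, 1: 0.9}
P_C1_GIVEN_B = {0: 0.1, 1: 0.7}
JOINT = {}
for a, b, c in itertools.product([0, 1], repeat=3):
    pa = P_A1 if a else 1 - P_A1
    pb = P_B1_GIVEN_A[a] if b else 1 - P_B1_GIVEN_A[a]
    pc = P_C1_GIVEN_B[b] if c else 1 - P_C1_GIVEN_B[b]
    JOINT[(a, b, c)] = pa * pb * pc

def indep(x, ys, zs):
    return cond_indep(JOINT, VARS, x, ys, zs)

print(build_minimal_imap(["A", "B", "C"], indep))  # {'A': [], 'B': ['A'], 'C': ['B']}
print(build_minimal_imap(["A", "C", "B"], indep))  # {'A': [], 'C': ['A'], 'B': ['A', 'C']}
```

The natural ordering recovers the chain, while the ordering A, C, B forces a denser graph. This previews the ordering sensitivity discussed in Section 4.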

The proof of Theorem 1 tells us that if each node $X_i$ is independent of the remaining variables among $X_1, \dots, X_{i-1}$ given its parents in $\mathcal{G}$, then $P$ factorizes over $\mathcal{G}$. We can then conclude from Theorem 2 that $\mathcal{G}$ is an I-map for $P$. By construction, $\mathcal{G}$ is minimal, so $\mathcal{G}$ is a minimal I-map for $P$.

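In fact, given an ordering, the minimal parent set $U$ for $X_i$ need not be unique. For example, suppose we have three nodes $X_1, X_2, X_3$, where $X_1$ and $X_2$ are logically equivalent (as in the figure below). Then either $X_1$ or $X_2$ can serve as the parent of $X_3$. Once one of them has been chosen, the other is no longer needed, because it carries no additional information about $X_3$. Hence, the minimal parent set $U$ in our construction is not necessarily unique.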

However, one can show that if the distribution is positive, that is, if $P(\xi) > 0$ for every instantiation $\xi$ of all the network variables $\mathcal{X}$, then the choice of parent set, given an ordering, is unique. Under this assumption, the construction above (Build-Minimal-I-Map) can produce all minimal I-maps for $P$. Let $\mathcal{G}$ be any minimal I-map for $P$. If we call Build-Minimal-I-Map with an ordering $\prec$ that is topological for $\mathcal{G}$, then, by the uniqueness argument, the algorithm must return $\mathcal{G}$.
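The $X_1, X_2, X_3$ example and the positivity claim can both be checked numerically. The sketch below is illustrative only, and all of its numbers are made up: $X_1 \sim \text{Bernoulli}(0.5)$, $P(X_3 = 1 \mid X_1)$ is 0.8 or 0.2, and a hypothetical parameter `copy_prob` sets how often $X_2$ equals $X_1$. It lists every parent set for $X_3$ from which no node can be removed, which is exactly the minimality condition in the construction. With `copy_prob = 1.0`, $X_2$ is a deterministic copy of $X_1$, so $P$ is not positive and both $\{X_1\}$ and $\{X_2\}$ qualify. With `copy_prob = 0.9`, $P$ is positive and only $\{X_1\}$ remains.

```python
import itertools

def marginal(joint, order, vars_):
    """Sum a joint table {assignment tuple: prob} down to the variables in vars_."""
    idx = [order.index(v) for v in vars_]
    out = {}
    for vals, p in joint.items():
        key = tuple(vals[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_indep(joint, order, x, ys, zs, tol=1e-9):
    """Numerically test (x ⊥ ys | zs) on a joint table."""
    p_xyz = marginal(joint, order, [x] + ys + zs)
    p_z = marginal(joint, order, zs)
    p_xz = marginal(joint, order, [x] + zs)
    p_yz = marginal(joint, order, ys + zs)
    for key, p in p_xyz.items():
        xv, yv, zv = key[:1], key[1:1 + len(ys)], key[1 + len(ys):]
        if abs(p * p_z[zv] - p_xz[xv + zv] * p_yz[yv + zv]) > tol:
            return False
    return True

def make_joint(copy_prob):
    """Joint over (X1, X2, X3): X1 ~ Bernoulli(0.5), X2 == X1 with prob copy_prob,
    and P(X3=1 | X1) = 0.8 if X1 == 1 else 0.2 (hypothetical numbers)."""
    joint = {}
    for x1, x2, x3 in itertools.product([0, 1], repeat=3):
        p2 = copy_prob if x2 == x1 else 1.0 - copy_prob
        p3 = (0.8 if x1 else 0.2) if x3 else (0.2 if x1 else 0.8)
        joint[(x1, x2, x3)] = 0.5 * p2 * p3
    return joint

def is_positive(joint):
    """True if every full instantiation has strictly positive probability."""
    return all(p > 0 for p in joint.values())

def minimal_parent_sets(joint, order, x, preds):
    """All U ⊆ preds with (x ⊥ preds - U | U) from which no node can be removed."""
    def valid(u):
        rest = [v for v in preds if v not in u]
        return cond_indep(joint, order, x, rest, list(u))
    result = []
    for size in range(len(preds) + 1):
        for u in itertools.combinations(preds, size):
            if valid(u) and all(not valid(tuple(w for w in u if w != v)) for v in u):
                result.append(list(u))
    return result

ORDER = ["X1", "X2", "X3"]
for copy_prob in (1.0, 0.9):
    joint = make_joint(copy_prob)
    print(copy_prob, is_positive(joint), minimal_parent_sets(joint, ORDER, "X3", ["X1", "X2"]))
# 1.0 False [['X1'], ['X2']]
# 0.9 True [['X1']]
```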

At first glance, the minimal I-map seems to be a reasonable candidate for capturing the structure in the distribution: it seems that if $\mathcal{G}$ is a minimal I-map for a distribution $P$, then we should be able to "read off" all of the independencies in $P$ directly from $\mathcal{G}$. Unfortunately, this intuition is false.

Definition (positive distribution): A distribution $P$ is said to be positive if for all events $\alpha \in \mathcal{S}$ such that $\alpha \neq \emptyset$, we have $P(\alpha) > 0$.

4. Problems with Minimal I-Maps

The following example shows that a minimal I-map can miss independencies. The graphs in figure 3.8b,c really are minimal I-maps for the distribution considered below, yet they fail to capture some or all of the independencies that hold in it. Thus, the fact that $\mathcal{G}$ is a minimal I-map for $P$ is far from a guarantee that $\mathcal{G}$ captures the independence structure in $P$.

Consider the distribution $P_{\mathcal{B}^{student}}$, as defined in figure 3.4, and let us go through the process of constructing a minimal I-map for $P_{\mathcal{B}^{student}}$. We note that the graph $\mathcal{G}_{student}$ precisely reflects the independencies in this distribution $P_{\mathcal{B}^{student}}$ (that is, $\mathcal{I}(P_{\mathcal{B}^{student}}) = \mathcal{I}(\mathcal{G}_{student})$), so we can use $\mathcal{G}_{student}$ to determine which independencies hold in $P_{\mathcal{B}^{student}}$.

Our construction process starts with an arbitrary ordering on the nodes; we will go through this process for three different orderings. Throughout this process, it is important to remember that we are testing independencies relative to the distribution $P_{\mathcal{B}^{student}}$. We can use $\mathcal{G}_{student}$ (figure 3.4) to guide our intuition about which independencies hold in $P_{\mathcal{B}^{student}}$, but we can always resort to testing these independencies in the joint distribution $P_{\mathcal{B}^{student}}$.

The first ordering is a very natural one: D, I, S, G, L. We add one node at a time and see which of the possible edges from the preceding nodes are redundant. We start by adding D, then I. We can now remove the edge from D to I because this particular distribution satisfies (I ⊥ D), so I is independent of D given its other parents (the empty set). Continuing on, we add S, but we can remove the edge from D to S because our distribution satisfies (S ⊥ D | I). We then add G, but we can remove the edge from S to G, because the distribution satisfies (G ⊥ S | I, D).

Finally, we add L, and we can remove all edges from D, I, and S, because the distribution satisfies (L ⊥ D, I, S | G). Thus, our final output is the graph in figure 3.8a, which is precisely our original network for this distribution.

Now consider a somewhat less natural ordering: L, S, G, I, D. In this case, the resulting I-map is neither as natural nor as sparse. To see this, let us consider the sequence of steps. We start by adding L to the graph. Since it is the first variable in the ordering, it must be a root. Next, we consider S. The decision is whether to have L as a parent of S. Clearly, we need an edge from L to S, because the quality of the student's letter is correlated with his SAT score in this distribution, and S has no other parents that help render it independent of L. Formally, (S ⊥ L) does not hold in the distribution. In the next iteration of the algorithm, we introduce G. Now, all subsets of {L, S} are potential parent sets for G. Clearly, G is dependent on L. Moreover, although G is independent of S given I, it is not independent of S given L. Hence, we must add the edge between S and G. Carrying the procedure through, we end up with the graph shown in figure 3.8b.

Finally, consider the ordering L, D, S, I, G. In this case, a similar analysis results in the graph shown in figure 3.8c, which is almost a complete graph: it is missing only the edge from S to G, which we can remove because G is independent of S given I.
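Since $\mathcal{I}(P_{\mathcal{B}^{student}}) = \mathcal{I}(\mathcal{G}_{student})$, d-separation in $\mathcal{G}_{student}$ can serve as the independence oracle, so the three walk-throughs can be replayed mechanically. This is a minimal sketch that assumes the student structure D→G, I→G, I→S, G→L (figure 3.4 is not reproduced here). It tests d-separation with the standard moralized-ancestral-graph criterion. `d_separated` and `build_minimal_imap` are hypothetical helper names, not functions from any library. Parent sets are tried in order of increasing size, so each chosen set is minimal.

```python
import itertools
from collections import deque

# Assumed structure of G_student (figure 3.4): D->G, I->G, I->S, G->L.
PARENTS = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}

def ancestors(nodes):
    """The given nodes together with all of their ancestors in G_student."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(xs, ys, zs):
    """Test (xs ⊥ ys | zs) in G_student: keep the ancestors of xs, ys, zs,
    connect co-parents, drop directions, delete zs, then check connectivity."""
    ys, zs = set(ys), set(zs)
    if not ys:
        return True
    keep = ancestors(set(xs) | ys | zs)
    adj = {v: set() for v in keep}
    for child in keep:
        pa = PARENTS[child]
        for p in pa:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in itertools.combinations(pa, 2):  # moralize: marry co-parents
            adj[p].add(q)
            adj[q].add(p)
    seen = set(xs)
    frontier = deque(xs)
    while frontier:
        v = frontier.popleft()
        if v in ys:
            return False
        for w in adj[v] - zs - seen:
            seen.add(w)
            frontier.append(w)
    return True

def build_minimal_imap(order):
    """Smallest U ⊆ predecessors with (X ⊥ preds - U | U) for each X in order."""
    parents = {}
    for i, x in enumerate(order):
        preds = order[:i]
        for size in range(len(preds) + 1):
            for u in itertools.combinations(preds, size):
                rest = [v for v in preds if v not in u]
                if d_separated([x], rest, list(u)):
                    parents[x] = list(u)
                    break
            if x in parents:
                break
    return parents

for order in (["D", "I", "S", "G", "L"], ["L", "S", "G", "I", "D"], ["L", "D", "S", "I", "G"]):
    parents = build_minimal_imap(order)
    edges = [f"{p}->{c}" for c in order for p in parents[c]]
    print(" ".join(order), "|", len(edges), "edges:", ", ".join(edges))
# D I S G L | 4 edges: I->S, D->G, I->G, G->L
# L S G I D | 7 edges: L->S, L->G, S->G, S->I, G->I, G->D, I->D
# L D S I G | 9 edges: L->D, L->S, D->S, L->I, D->I, S->I, L->G, D->G, I->G
```

Under these assumptions, the first ordering recovers the original four edges. The third yields nine of the ten possible edges and is missing only S→G, as described above.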

To address this problem, the next concept we will introduce is the P-map; see the next post in this series.
