Probabilistic Graphical Models

Statistical and Algorithmic Foundations of Deep Learning

Author: Eric Xing

01 An overview of DL components

Historical remarks: early days of neural networks

Recall how a biological neuron works: an upstream cell sends neurotransmitters along its axon to the dendrites of downstream cells. Artificial neurons (perceptrons) are modeled on this principle.

In the same way, biological neural networks inspire artificial neural networks:

![Biological neural network → artificial neural network](https://img-blog.csdnimg.cn/2020051209264072.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L05HVWV2ZXIxNQ==,size_16,color_FFFFFF,t_70)

Reverse-mode automatic differentiation (aka backpropagation)

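Next, let us look at the perceptron learning algorithm.

Suppose this is a regression problem $x \to y$ with $y = f(x) + \eta$, where $\eta$ is noise. The objective function is then the squared error between the targets and the unit's outputs.

To minimize it, we differentiate the objective with respect to the weights, which yields the update rule for $w$.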
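As a worked sketch of this derivation, assume a single sigmoid unit $o_d = \sigma(w^\top x_d)$ trained on pairs $(x_d, t_d)$ with squared error. The symbols $t_d$ (target), $o_d$ (output), and $\alpha$ (step size) are notation chosen here for illustration; $\alpha$ is used for the step size because $\eta$ already denotes the noise.

$$
E_D[w] = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2, \qquad o_d = \sigma(w^\top x_d)
$$

$$
\frac{\partial E_D[w]}{\partial w_i} = -\sum_{d \in D} (t_d - o_d)\, o_d (1 - o_d)\, x_{d,i}
$$

$$
w_i \leftarrow w_i + \alpha \sum_{d \in D} (t_d - o_d)\, o_d (1 - o_d)\, x_{d,i}
$$

The factor $o_d (1 - o_d)$ is the derivative of the sigmoid, and the sign flips in the last line because the update moves against the gradient.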

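Next, neural network models. The key difference is that hidden units have no targets.

An artificial neural network is nothing more than a complex composition of functions that can be represented by a computation graph.

Applying the chain rule with reverse accumulation yields the derivative of the output with respect to every node of the graph in a single backward pass.

This algorithm is usually called backpropagation. What if some of the functions are stochastic? Then use stochastic backpropagation! Modern software packages can do this automatically (more on this later).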
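To make reverse accumulation concrete, here is a minimal sketch of reverse-mode automatic differentiation on scalars. The `Node` class and the `backward` helper are hypothetical names made up for this illustration; real packages are far more general.

```python
import math


class Node:
    """A scalar value in a computation graph, with its local derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        # parents: tuple of (parent_node, d(self)/d(parent)) pairs
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x.value))
    return Node(s, ((x, s * (1.0 - s)),))


def backward(output):
    """Reverse accumulation: process nodes in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # appended only after all of its parents

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            # Chain rule: d(output)/d(parent) += d(output)/d(node) * d(node)/d(parent)
            parent.grad += node.grad * local


# y = sigmoid(w * x + b)
w, x, b = Node(0.5), Node(2.0), Node(-1.0)
y = sigmoid(w * x + b)
backward(y)
```

Here `y.value` is 0.5 and the gradients come out as `w.grad = 0.5`, `x.grad = 0.125`, and `b.grad = 0.25`, that is, the sigmoid derivative 0.25 at zero multiplied by the local derivatives of the inner expression.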

Modern building blocks: units, layers, activation functions, loss functions, etc.

Commonly used activation functions:

Linear and ReLU
Sigmoid and tanh
Etc.

Layers:

Fully connected
Convolutional & pooling
Recurrent
ResNets
Etc.
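In other words, the basic building blocks can be combined arbitrarily. With multiple loss functions, one can do multi-target prediction, transfer learning, and so on. Given enough data, deeper architectures just keep improving.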

Feature learning
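Successful learning of intermediate representations [Lee et al., ICML 2009; Lee et al., NIPS 2009]

Representation learning: the network learns increasingly abstract representations of the data that are "disentangled," i.e., amenable to linear separation.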

02 Similarities and differences between GMs and NNs

Graphical models vs. computational graphs

Graphical models:

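A representation for encoding meaningful knowledge and the associated uncertainty in graphical form
Learning and inference are based on a rich toolbox of well-studied (structure-dependent) techniques (e.g., EM, message passing, VI, MCMC)
The graph represents the model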

Utility of the graph
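A vehicle for synthesizing a global loss function from local structure (potential functions, feature functions, etc.)
A vehicle for designing sound and efficient inference algorithms (sum-product, mean-field, etc.)
A vehicle to inspire approximation and penalization (structured MF, tree approximation, etc.)
A vehicle for monitoring theoretical and empirical behavior and the accuracy of inference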

Utility of the loss function

A major measure of the quality of the learning algorithm and the model

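Deep neural networks:

Learn representations that facilitate computation and performance on the end metric (intermediate representations are not guaranteed to be meaningful)
Learning is predominantly based on gradient descent (aka backpropagation); inference is often trivial and done via a "forward pass"
The graph represents computation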

Utility of the network

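A vehicle to conceptually synthesize complex decision hypotheses (stage-wise projection and aggregation)
A vehicle for organizing computational operations (stage-wise update of latent states)
A vehicle for designing processing steps and computing modules (layer-wise parallelization)
No obvious utility in evaluating DL inference algorithms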

So far, graphical models are representations of probability distributions, whereas neural networks are function approximators (with no probabilistic meaning). Some neural networks, however, are in fact graphical models (i.e., their units/neurons represent random variables):

Boltzmann machines (Hinton & Sejnowski, 1983)
Restricted Boltzmann machines (Smolensky, 1986)
Learning and inference in sigmoid belief networks (Neal, 1992)
Fast learning in deep belief networks (Hinton, Osindero, Teh, 2006)
Deep Boltzmann machines (Salakhutdinov & Hinton, 2009)

We go through them in turn below.

I: Restricted Boltzmann Machines
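A restricted Boltzmann machine (RBM) is a Markov random field represented by a bipartite graph: every node in one layer/part of the graph is connected to every node in the other, and there are no connections within a layer.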

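The joint distribution, with weights $w_{ij}$, biases $b_i$ and $c_j$, and partition function $Z$, is:

$$\log p(v, h) = \sum_{i,j} w_{ij} v_i h_j + \sum_i b_i v_i + \sum_j c_j h_j - \log Z$$

The log-likelihood of a single data point (with the unobserved variables marginalized out):

$$\log p(v) = \log \sum_h \exp\Big(\sum_{i,j} w_{ij} v_i h_j + \sum_i b_i v_i + \sum_j c_j h_j\Big) - \log Z$$

The gradient of the log-likelihood with respect to the model parameters:

$$\frac{\partial}{\partial w_{ij}} \log p(v) = \sum_h p(h \mid v)\, v_i h_j - \sum_{v', h} p(v', h)\, v'_i h_j$$

The same gradient in an alternative form:

$$\frac{\partial}{\partial w_{ij}} \log p(v) = \mathbb{E}_{p(h \mid v)}[v_i h_j] - \mathbb{E}_{p(v, h)}[v_i h_j]$$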

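Both expectations can be approximated by sampling. Sampling from the posterior is exact (the RBM factorizes over $h$ given $v$). Sampling from the joint is done via MCMC (e.g., Gibbs sampling).

In the neural networks literature:

Computing the first term is called the clamped / wake / positive phase (the network is "awake" since it conditions on the visible variables)
Computing the second term is called the unclamped / sleep / free / negative phase (the network is "asleep" since it samples the visible variables from the joint; metaphorically, it dreams the visible inputs)

Learning is done by optimizing the log-likelihood of the model for the given data via stochastic gradient descent (SGD). Estimating the second term (negative phase) relies heavily on the mixing properties of the Markov chain, which often causes slow convergence and requires extra computation.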
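The two-phase gradient estimate can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation. The names `W`, `b`, `c`, `weight_gradient`, and the toy data are assumptions made for the example. Starting the Gibbs chain at the data point and running it for only `k` steps is the contrastive-divergence shortcut, which is swapped in here for a fully mixed chain. Only the weights are updated, to keep the sketch short.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def p_h_given_v(v, W, c):
    """p(h_j = 1 | v) for every hidden unit j (exact: the posterior factorizes)."""
    return [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(c))]


def p_v_given_h(h, W, b):
    """p(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def weight_gradient(v_data, W, b, c, k, rng):
    """Estimate d log p(v) / dW as positive-phase minus negative-phase statistics."""
    # Positive (clamped) phase: expectation under p(h | v_data), computed exactly.
    ph_data = p_h_given_v(v_data, W, c)
    # Negative (free) phase: k steps of block Gibbs sampling, started at the data.
    v = list(v_data)
    for _ in range(k):
        h = sample(p_h_given_v(v, W, c), rng)
        v = sample(p_v_given_h(h, W, b), rng)
    ph_model = p_h_given_v(v, W, c)
    return [[v_data[i] * ph_data[j] - v[i] * ph_model[j]
             for j in range(len(c))] for i in range(len(b))]


rng = random.Random(0)
n_visible, n_hidden = 4, 2
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # toy binary data, made up for the example

for epoch in range(200):
    for v_data in data:
        grad = weight_gradient(v_data, W, b, c, k=1, rng=rng)
        for i in range(n_visible):
            for j in range(n_hidden):
                # Stochastic gradient ascent on the log-likelihood.
                W[i][j] += 0.1 * grad[i][j]
```

A larger `k` gives a less biased negative-phase estimate at the price of more sampling per update, which is the trade-off behind the slow convergence noted above.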

II: Sigmoid Belief Networks

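A sigmoid belief net is a simple Bayesian network over binary variables whose conditional probabilities are given by sigmoid functions:

$$p(x_i = 1 \mid \pi(x_i)) = \sigma\Big(\sum_{x_j \in \pi(x_i)} w_{ij} x_j + b_i\Big)$$

where $\pi(x_i)$ denotes the parents of $x_i$.

Bayesian networks exhibit a phenomenon called "explaining away": if A correlates with C, then the chance of B correlating with C decreases ⇒ A and B become correlated given C.

Note that, because of explaining away, all hidden variables become dependent once we condition on the visible layer of a belief network.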
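Explaining away is easy to check numerically on a three-node network A → C ← B. The sketch below uses made-up probabilities and a noisy-OR conditional instead of a sigmoid, purely to keep the arithmetic readable. All names and numbers are assumptions for illustration.

```python
from itertools import product

# Hypothetical numbers: two independent causes A and B of a common effect C.
P_A = 0.1
P_B = 0.1


def p_c_given(a, b):
    """Noisy-OR: each active cause triggers C with probability 0.9; leak 0.01."""
    p_off = (1 - 0.01) * (1 - 0.9) ** a * (1 - 0.9) ** b
    return 1 - p_off


def joint(a, b, c):
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = p_c_given(a, b) if c else 1 - p_c_given(a, b)
    return pa * pb * pc


def prob_a(**evidence):
    """P(A = 1 | evidence) by brute-force enumeration of the joint."""
    num = den = 0.0
    for a, b, c in product((0, 1), repeat=3):
        assignment = {"a": a, "b": b, "c": c}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(a, b, c)
        den += p
        num += p * a
    return num / den


p_prior = prob_a()               # P(A=1)
p_given_c = prob_a(c=1)          # P(A=1 | C=1)
p_given_c_b = prob_a(c=1, b=1)   # P(A=1 | C=1, B=1)
```

With these numbers, observing C raises the probability of A from 0.1 to roughly 0.5. Additionally observing B brings it back down to roughly 0.11, because B already accounts for C. A and B are independent a priori but dependent once C is observed.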

Sigmoid Belief Networks as graphical models

Neal proposed Monte Carlo methods for learning and inference in these networks (Neal, 1992).

RBMs are infinite belief networks
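To perform gradient updates of the model parameters, we need to compute expectations via sampling.

We can sample exactly from the posterior in the first phase
We run block Gibbs sampling to approximately draw samples from the joint distribution

The conditional distributions $p(v \mid h)$ and $p(h \mid v)$ are given by sigmoids, so Gibbs sampling from the joint distribution of an RBM can be viewed as top-down propagation in an infinitely deep sigmoid belief network!

An RBM is equivalent to an infinitely deep belief network. When we train an RBM, we are really training an infinitely deep sigmoid belief net, only with the weights of all layers tied. If the weights are "untied" to some extent, we get a deep belief network.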

Deep Belief Networks and Boltzmann Machines

III: Deep Belief Nets

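DBNs are hybrid graphical models (chain graphs): the top two layers form an undirected RBM and the layers below are directed top-down. With three hidden layers, for example, the joint probability distribution factorizes as:

$$p(v, h^1, h^2, h^3) = p(h^2, h^3)\, p(h^1 \mid h^2)\, p(v \mid h^1)$$

The challenges:
Exact inference in DBNs is problematic because of the explaining away effect
Training is done in two stages:

Greedy pre-training + ad-hoc fine-tuning; no proper joint training
Approximate inference is feed-forward (bottom-up)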

Layer-wise pre-training

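Pre-train and freeze the first RBM
Stack another RBM on top and train it
The weights of layers 2 and above remain tied
We repeat this procedure: pre-train and untie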

Fine-tuning

Pre-training is quite ad-hoc and is unlikely to lead to a good probabilistic model per se
However, the layers of representations could perhaps be useful for some other downstream tasks!
We can further “fine-tune” a pre-trained DBN for some other task

Setting A: Unsupervised learning (DBN → autoencoder)

Pre-train a stack of RBMs in a greedy layer-wise fashion
“Unroll” the RBMs to create an autoencoder
Fine-tune the parameters by optimizing the reconstruction error

Setting B: Supervised learning (DBN → classifier)

Pre-train a stack of RBMs in a greedy layer-wise fashion
“Unroll” the RBMs to create a feedforward classifier
Fine-tune the parameters by optimizing the classification error

IV: Deep Boltzmann Machines

DBMs are fully undirected models (Markov random fields).
Can be trained similarly to RBMs via MCMC (Hinton & Sejnowski, 1983).
Use a variational approximation of the data distribution for faster training (Salakhutdinov & Hinton, 2009).
Similarly, can be used to initialize other networks for downstream tasks.

A few critical points to note about all these models:

The primary goal of deep generative models is to represent the distribution of the observable variables. Adding layers of hidden variables makes it possible to represent increasingly complex distributions.
Hidden variables are secondary (auxiliary) elements used to facilitate learning of complex dependencies between the observables.
Training of the model is ad-hoc, but what matters is the quality of learned hidden representations.
Representations are judged by their usefulness on a downstream task (the probabilistic meaning of the model is often discarded at the end).
In contrast, classical graphical models are often concerned with the correctness of learning and inference of all variables

Conclusion

DL & GM: the fields are similar in the beginning (structure, energy, etc.), and then diverge to their own signature pipelines
DL: most effort is directed to comparing different architectures and their components (models are driven by evaluating empirical performance on downstream tasks)
DL models are good at learning robust hierarchical representations from the data and suitable for simple reasoning (call it “low-level cognition”)
GM: the effort is directed towards improving inference accuracy and convergence speed
GMs are best for provably correct inference and suitable for high-level complex reasoning tasks (call it “high-level cognition”)
Convergence of both fields is very promising!

03 Combining DL methods and GMs

Using outputs of NNs as inputs to GMs

Combining sequential NNs and GMs
HMM: hidden Markov model

Hybrid NNs + conditional GMs

In a standard CRF (conditional random field), each of the factor cells is a parameter.
In a hybrid model, these values are computed by a neural network.
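A minimal sketch of this hybrid for a linear-chain CRF: the per-position (unary) factor values come from a tiny feed-forward network, the transition factors stay ordinary parameters, and decoding uses Viterbi. The function names, weights, and inputs are all hypothetical and chosen only for illustration. Training is left out.

```python
import math


def mlp_unary_scores(x, W1, W2):
    """Tiny feed-forward net: feature vector x -> one log-potential per label."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def viterbi(unary, transition):
    """Highest-scoring label sequence of a linear-chain CRF (max-sum)."""
    n_labels = len(transition)
    best = list(unary[0])  # best score of any path ending in each label
    backptr = []
    for scores in unary[1:]:
        step, new_best = [], []
        for cur in range(n_labels):
            prev = max(range(n_labels),
                       key=lambda p: best[p] + transition[p][cur])
            step.append(prev)
            new_best.append(best[prev] + transition[prev][cur] + scores[cur])
        backptr.append(step)
        best = new_best
    label = max(range(n_labels), key=lambda l: best[l])
    path = [label]
    for step in reversed(backptr):
        label = step[label]
        path.append(label)
    return path[::-1]


# Hypothetical weights and inputs, chosen only for illustration.
W1 = [[1.0, -1.0], [-1.0, 1.0]]          # 2 inputs -> 2 hidden units
W2 = [[2.0, 0.0], [0.0, 2.0]]            # 2 hidden units -> 2 labels
transition = [[0.5, -0.5], [-0.5, 0.5]]  # plain CRF parameters (favor staying)
sequence = [[1.0, 0.0], [0.6, 0.5], [0.0, 1.0]]

unary = [mlp_unary_scores(x, W1, W2) for x in sequence]
labels = viterbi(unary, transition)  # [0, 0, 1]
```

The middle position is nearly ambiguous for the network alone, and the transition factors pull it toward the label of its left neighbor. This is the kind of structured dependency the GM contributes on top of the network's outputs.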

GMs with potential functions represented by NNs; NNs with structured outputs

Using GMs as Prediction Explanations

How do we build a powerful predictive model whose predictions we can interpret in terms of semantically meaningful features?

Contextual Explanation Networks (CENs)

The final prediction is made by a linear GM.
Each coefficient assigns a weight to a meaningful attribute.
Allows us to judge predictions in terms of GMs produced by the context encoder.

CEN: Implementation Details

Workflow:

Maintain a (sparse) dictionary of GM parameters.
Process complex inputs (images, text, time series, etc.) using deep nets; use soft attention to either select or combine models from the dictionary.
Use constructed GMs (e.g., CRFs) to make predictions.
Inspect GM parameters to understand the reasoning behind predictions.
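The workflow can be sketched with the simplest possible ingredients: a dictionary of linear models over interpretable attributes, and a stand-in "context encoder" that outputs soft attention over that dictionary. Everything here (function names, weights, inputs, the logistic output) is a hypothetical illustration rather than the actual CEN implementation.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_encoder(context, W_enc):
    """Stand-in for a deep net: maps raw context to attention over the dictionary."""
    return softmax([sum(w * c for w, c in zip(row, context)) for row in W_enc])


def explain_and_predict(context, attributes, dictionary, W_enc):
    attention = context_encoder(context, W_enc)
    # Soft attention combines dictionary entries into one context-specific linear model.
    theta = [sum(a * model[k] for a, model in zip(attention, dictionary))
             for k in range(len(attributes))]
    # The final prediction is made by the linear model on the attributes.
    logit = sum(t * z for t, z in zip(theta, attributes))
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, theta, attention


# Hypothetical dictionary of two linear models over three interpretable attributes.
dictionary = [[2.0, 0.0, -1.0],
              [0.0, 1.5, 0.5]]
W_enc = [[4.0, -4.0],
         [-4.0, 4.0]]
context = [1.0, 0.0]           # e.g. features extracted from an image
attributes = [1.0, 1.0, 0.0]   # meaningful, human-readable inputs

probability, theta, attention = explain_and_predict(
    context, attributes, dictionary, W_enc)
```

Inspecting `theta` shows which attributes drove the prediction for this particular context. With the context above, nearly all attention goes to the first dictionary entry, so `theta` is close to `[2.0, 0.0, -1.0]`.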

Results: imagery as context

Based on the imagery, CEN learns to select different models for urban and rural areas.

Results: classical image & text datasets

CEN architectures for survival analysis

04 Bayesian Learning of NNs

Bayesian learning of NN parameters; deep kernel learning

A neural network as a probabilistic model:

Likelihood: $p(y \mid x, \theta)$
Categorical distribution for classification ⇒ cross-entropy loss
Gaussian distribution for regression ⇒ squared loss

Prior on parameters: $p(\theta)$
Gaussian prior ⇒ L2 regularization
Laplace prior ⇒ L1 regularization
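These correspondences follow from writing maximum a posteriori (MAP) estimation as a penalized loss. A short sketch, where the network output $f_\theta(x)$, the noise variance $\sigma^2$, the prior variance $\tau^2$, and the Laplace scale $s$ are notation introduced here for illustration:

$$
\hat{\theta}_{\text{MAP}} = \arg\max_\theta \Big[ \sum_n \log p(y_n \mid x_n, \theta) + \log p(\theta) \Big]
= \arg\min_\theta \Big[ -\sum_n \log p(y_n \mid x_n, \theta) - \log p(\theta) \Big]
$$

$$
p(y_n \mid x_n, \theta) = \mathcal{N}\big(y_n;\, f_\theta(x_n), \sigma^2\big)
\;\Rightarrow\;
-\log p(y_n \mid x_n, \theta) = \frac{1}{2\sigma^2} \big(y_n - f_\theta(x_n)\big)^2 + \text{const}
$$

$$
p(\theta) = \mathcal{N}(\theta;\, 0, \tau^2 I)
\;\Rightarrow\;
-\log p(\theta) = \frac{1}{2\tau^2} \lVert \theta \rVert_2^2 + \text{const}
$$

$$
p(\theta) \propto \exp\big(-\lVert \theta \rVert_1 / s\big)
\;\Rightarrow\;
-\log p(\theta) = \frac{1}{s} \lVert \theta \rVert_1 + \text{const}
$$

The first term of the objective is the usual training loss and the second is the regularizer, with the regularization strength set by the ratio of noise to prior scale.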

Bayesian learning [MacKay 1992, Neal 1996, de Freitas 2003]
