Manifold Interpretation of PCA and Linear Auto-Encoders
Tags (space-separated): Deep Learning, Personal Interest
Look for a projection of $x$ into a subspace that preserves as much information as possible about $x$.

Let the encoder be
$$h = f(x) = W(x - \mu),$$
where $h$ is a low-dimensional representation of $x$, and let the decoder be
$$\hat{x} = g(h) = b + Vh.$$
With a linear encoder and decoder, minimizing the reconstruction error $E[\|x - \hat{x}\|^2]$ means that $V = W^\top$, $\mu = b = E[x]$, and the rows of $W$ form an orthonormal basis spanning the same subspace as the principal eigenvectors of the covariance matrix $C = E[(x - \mu)(x - \mu)^\top]$.

In the case of PCA, the rows of $W$ are these eigenvectors, ordered by the magnitude of the corresponding eigenvalues, and the optimal reconstruction error is
$$\min E[\|x - \hat{x}\|^2] = \sum_{i=d+1}^{D} \lambda_i,$$
where $D$ is the dimension of $x$, $d$ is the dimension of $h$, and the $\lambda_i$ are the eigenvalues of the covariance. If the covariance has rank $d$, the eigenvalues $\lambda_{d+1}, \dots, \lambda_D$ are zero and the reconstruction error is 0.
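This identity is easy to check numerically. Below is a minimal NumPy sketch (not from the original notes; the data and dimensions are made up): it builds $W$ from the top-$d$ eigenvectors of the empirical covariance and verifies that the mean squared reconstruction error equals $\sum_{i=d+1}^{D}\lambda_i$.

```python
import numpy as np

# Minimal sketch (synthetic data, made-up dimensions): check numerically that the
# optimal linear auto-encoder / PCA reconstruction error equals the sum of the
# discarded covariance eigenvalues, sum_{i=d+1}^{D} lambda_i.
rng = np.random.default_rng(0)
D, d, n = 5, 2, 10_000

X = rng.normal(size=(n, D)) @ rng.normal(size=(D, D))   # correlated synthetic data
mu = X.mean(axis=0)
C = np.cov(X, rowvar=False, bias=True)                   # empirical covariance

eigvals, eigvecs = np.linalg.eigh(C)                     # ascending eigenvalues
order = np.argsort(eigvals)[::-1]                        # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

W = eigvecs[:, :d].T             # rows of W: the d principal eigenvectors
H = (X - mu) @ W.T               # encoder:  h = W (x - mu)
X_hat = mu + H @ W               # decoder:  x_hat = b + V h, with V = W^T, b = mu

print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))  # mean squared reconstruction error
print(eigvals[d:].sum())                          # sum of discarded eigenvalues
```

The two printed numbers should agree up to floating-point error.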
ICA
Independent Component Analysis
(Herault and Ans, 1984; Jutten and Herault, 1991; Comon, 1994; Hyvärinen, 1999; Hyvärinen et al., 2001)
Like probabilistic PCA and factor analysis, it also fits the linear factor model:
- sample the real-valued factors $h \sim P(h)$
- sample the real-valued observable variables $x = Wh + b + \text{noise}$
What is particular about ICA is that unlike PCA and factor analysis it does not assume that the prior is Gaussian. It only assumes that it is factorized, i.e.
$$P(h) = \prod_i P(h_i).$$
In this case, if we assume that the latent variables are non-Gaussian, then we can recover them, and this is what ICA is trying to achieve.
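As a concrete illustration, here is a minimal NumPy sketch of sampling from this linear factor model; the Laplace distribution is just one possible choice of factorized, non-Gaussian prior, and all sizes are made up.

```python
import numpy as np

# Minimal sketch (made-up sizes; Laplace is just one example of a factorized,
# non-Gaussian prior) of the linear factor model underlying ICA.
rng = np.random.default_rng(0)
d, D, n = 3, 5, 1_000

W = rng.normal(size=(D, d))
b = rng.normal(size=D)

# Sample real-valued factors h ~ P(h) = prod_i P(h_i), here h_i ~ Laplace(0, 1).
h = rng.laplace(loc=0.0, scale=1.0, size=(n, d))

# Sample the real-valued observable variables x = W h + b + noise.
x = h @ W.T + b + 0.1 * rng.normal(size=(n, D))
```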
Sparse Coding as a Generative Model
A particularly interesting form of non-Gaussianity arises with distributions that are sparse.
For example, the Student-t prior is
$$P(h_i) \propto \left(1 + \frac{h_i^2}{\nu}\right)^{-\frac{\nu+1}{2}}.$$
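A quick numerical check (a sketch with made-up parameters, not from the original notes) shows what "sparse" means here: compared with a Gaussian of the same variance, a Student-t prior puts more mass both very close to zero and out in the tails.

```python
import numpy as np

# Minimal sketch (made-up parameters): sample from a Student-t prior and from a
# variance-matched Gaussian, and compare the mass near zero and in the tails;
# the sharp peak at zero is what makes the Student-t prior favour sparse codes.
rng = np.random.default_rng(0)
nu, n = 3.0, 100_000

h_t = rng.standard_t(df=nu, size=n)
h_gauss = rng.normal(scale=np.sqrt(nu / (nu - 2)), size=n)  # same variance as the t samples

for name, h in [("student-t", h_t), ("gaussian", h_gauss)]:
    print(f"{name:10s}  P(|h|<0.1) ~ {np.mean(np.abs(h) < 0.1):.3f}"
          f"   P(|h|>5) ~ {np.mean(np.abs(h) > 5):.5f}")
```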
Greedy Layerwise Unsupervised Pre-Training
Greedy
- the different layers are not jointly trained with respect to a global training objective, which could make the procedure sub-optimal

Layerwise
- it proceeds one layer at a time, training the k-th layer while keeping the previous ones fixed

Unsupervised
- each layer is trained with an unsupervised representation learning algorithm

Pre-Training
- it should be only a first step, before a joint training algorithm is applied to fine-tune all the layers together with respect to a criterion of interest
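Below is a minimal sketch of the procedure using one-hidden-layer auto-encoders in PyTorch; the architecture, data and hyper-parameters are made up, and the supervised fine-tuning step is only indicated by a comment.

```python
import torch
import torch.nn as nn

# Minimal sketch (hypothetical architecture and hyper-parameters) of greedy
# layerwise unsupervised pre-training: each layer k is trained as an auto-encoder
# on the output of the already-trained (frozen) layers below it.
layer_sizes = [784, 256, 64]                # input dim followed by hidden dims
data = torch.randn(512, layer_sizes[0])     # stand-in for real training data

trained_layers = []
inputs = data
for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    encoder = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
    decoder = nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

    # Unsupervised training of this layer only; previously trained layers stay fixed.
    for _ in range(100):
        recon = decoder(encoder(inputs))
        loss = nn.functional.mse_loss(recon, inputs)
        opt.zero_grad()
        loss.backward()
        opt.step()

    trained_layers.append(encoder)
    with torch.no_grad():
        inputs = encoder(inputs)            # representation fed to the next layer

# Stack the pre-trained encoders; a supervised head would be added here and the
# whole network fine-tuned jointly on the criterion of interest.
pretrained_stack = nn.Sequential(*trained_layers)
```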
Transfer Learning and Domain Adaptation
The objective is to take advantage of data from a first setting to extract information that may be useful when learning, or even when directly making predictions, in a second setting. For example, reviews in different domains (movies, music, books) differ in their specifics but share a lot of structure, which is why adapting a model across them is called domain adaptation.
Two examples
Extreme forms of transfer learning:
- one-shot learning
- zero-shot learning (also called zero-data learning)
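As a rough illustration of the basic recipe described above, the sketch below (hypothetical dimensions and stand-in data) freezes a feature extractor assumed to have been trained in the first setting and trains only a small task-specific head on data from the second setting.

```python
import torch
import torch.nn as nn

# Minimal sketch (made-up sizes and data) of reusing a representation across settings:
# the feature extractor is assumed to have been trained in setting A and is kept fixed,
# while a new head is trained on the (small) dataset from setting B.
feature_extractor = nn.Sequential(nn.Linear(784, 64), nn.ReLU())
# ... assume feature_extractor has already been trained on data from setting A ...

for p in feature_extractor.parameters():
    p.requires_grad = False              # keep the shared representation fixed

head = nn.Linear(64, 2)                  # new task in setting B (e.g. 2 classes)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

x_b = torch.randn(128, 784)              # stand-in for the setting-B inputs
y_b = torch.randint(0, 2, (128,))        # stand-in for the setting-B labels

for _ in range(100):
    logits = head(feature_extractor(x_b))
    loss = nn.functional.cross_entropy(logits, y_b)
    opt.zero_grad()
    loss.backward()
    opt.step()
```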