Independent Component Analysis: How the FastICA Algorithm Works

Original

2020-04-21 08:07

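Start with a d-dimensional random variable $\displaystyle \mathbf{x} \in R^{d\times 1}$ . We assume it is generated from mutually independent sources $\displaystyle \mathbf{s} \in R^{d\times 1}$ that are linearly combined by a mixing matrix $\displaystyle A\in R^{d\times d}$ :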

$\mathbf{x} =\mathbf{As}$
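To make the generative model concrete, here is a minimal NumPy sketch that draws two independent non-Gaussian sources and mixes them. The particular source distributions, the sample count and the mixing matrix `A` are illustrative assumptions, not values from any reference.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 10_000                       # d sources, n samples (illustrative values)

# Independent, non-Gaussian sources with zero mean and unit variance.
s = np.vstack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), n),   # uniform (sub-Gaussian)
    rng.laplace(0.0, 1 / np.sqrt(2), n),       # Laplace (super-Gaussian)
])

A = np.array([[1.0, 0.5],
              [0.3, 1.0]])             # hypothetical mixing matrix
x = A @ s                              # observed mixtures x = A s, shape (d, n)
```

Each column of `x` is one observation. ICA only ever sees `x` and has to estimate both `A` and `s`.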

Whitening

The first step in solving ICA is to whiten x so that its components become mutually uncorrelated, $\displaystyle \tilde{\mathbf{x}} =\mathbf{ED}^{-1/2}\mathbf{E}^{T}\mathbf{x}$ . This step is usually done with PCA:

$\tilde{\mathbf{x}} =\mathbf{ED}^{-1/2}\mathbf{E}^{T}\mathbf{As} =\tilde{\mathbf{A}}\mathbf{s}$

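Here $\displaystyle \mathbf{EDE}^{T}$ is the eigenvalue decomposition of the covariance of x, i.e. $\displaystyle E\left(\mathbf{xx}^{T}\right) =\mathbf{AA}^{T} =\mathbf{EDE}^{T}$ (taking the sources to be zero-mean with unit variance, so that $\displaystyle E\left(\mathbf{ss}^{T}\right) =I$ ). An important effect of whitening is that the new mixing matrix $\displaystyle \tilde{\mathbf{A}}$ is orthogonal. This goes together with $\displaystyle E\left(\tilde{\mathbf{x}}\tilde{\mathbf{x}}^{T}\right) =I$ , which says that the dimensions are mutually uncorrelated with unit variance (uncorrelated, which is weaker than independent):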

$E\left(\tilde{\mathbf{x}}\tilde{\mathbf{x}}^{T}\right) =\mathbf{ED}^{-1/2}\mathbf{E}^{T}\mathbf{A} E\left(\mathbf{ss}^{T}\right)\mathbf{A}^{T}\mathbf{ED}^{-1/2}\mathbf{E}^{T} =\mathbf{ED}^{-1/2}\mathbf{E}^{T}\mathbf{EDE}^{T}\mathbf{ED}^{-1/2}\mathbf{E}^{T} =I$

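Whitening roughly halves the number of parameters we have to learn. Why? An $\displaystyle n\times n$ orthogonal matrix has only $\displaystyle \frac{n( n-1)}{2}$ degrees of freedom, whereas a general $\displaystyle n\times n$ matrix has $\displaystyle n^{2}$ . From here on, x is assumed to be data that has already been whitened.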
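The whitening step can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the data is stored as a `(d, n)` array and that the covariance has no zero eigenvalues. The helper name `whiten` and the example mixing matrix are made up for this sketch.

```python
import numpy as np

def whiten(x):
    """Whiten data x of shape (d, n) via eigendecomposition of its covariance."""
    x = x - x.mean(axis=1, keepdims=True)       # center each dimension
    cov = x @ x.T / x.shape[1]                  # sample covariance, shape (d, d)
    eigvals, E = np.linalg.eigh(cov)            # cov = E diag(eigvals) E^T
    V = E @ np.diag(eigvals ** -0.5) @ E.T      # whitening matrix E D^{-1/2} E^T
    return V @ x, V

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))   # unit-variance sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])          # hypothetical mixing matrix
x_tilde, V = whiten(A @ s)

cov_tilde = x_tilde @ x_tilde.T / x_tilde.shape[1]   # identity up to rounding
A_tilde = V @ A                                      # approximately orthogonal
```

`cov_tilde` equals the identity up to floating-point error, because the same sample covariance is used to build `V`. `A_tilde @ A_tilde.T` is only approximately the identity, since the sample covariance of the sources is only approximately `I`.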

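The FastICA iteration

Now ICA proper begins. If s is Gaussian, the story ends here: we cannot recover a unique s, because every direction is equivalent. If s is non-Gaussian, what ICA has to do is find an optimal direction $\displaystyle \mathbf{w}$ along which the projection of the data is as non-Gaussian as possible: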

$\max J\ \left(\mathbf{w}^{T}\mathbf{x}\right)$

So this is just an optimization problem. How do we measure non-Gaussianity? One option is negentropy:

$J(\mathbf{y} )=H(\mathbf{y}_{gauss}) -H(\mathbf{y} )$

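Here $\displaystyle \mathbf{y}_{gauss}$ is a Gaussian random variable with the same covariance matrix as $\displaystyle \mathbf{y}$ . Among all distributions with a given covariance the Gaussian has the largest entropy, so J is 0 when y is Gaussian and positive otherwise: the larger J is, the more non-Gaussian y is. Negentropy is hard to compute directly, though, so there are many approximations. A classical one (for zero-mean, unit-variance y) is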

$J(y)\approx \frac{1}{12} E\left\{y^{3}\right\}^{2} +\frac{1}{48}\operatorname{kurt} (y)^{2}$
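As a rough numerical illustration of this moment-based approximation, the sketch below estimates it from samples. The function name and the test distributions are assumptions made for this example, and the sample is standardized first because the formula is stated for zero-mean, unit-variance y.

```python
import numpy as np

def negentropy_moment_approx(y):
    """Approximate J(y) by E{y^3}^2 / 12 + kurt(y)^2 / 48 for a 1-D sample."""
    y = (y - y.mean()) / y.std()                # standardize: zero mean, unit variance
    skew_term = np.mean(y ** 3) ** 2 / 12
    kurt = np.mean(y ** 4) - 3                  # kurtosis of the standardized sample
    return skew_term + kurt ** 2 / 48

rng = np.random.default_rng(0)
n = 100_000
j_gauss = negentropy_moment_approx(rng.normal(size=n))
j_laplace = negentropy_moment_approx(rng.laplace(size=n))
j_uniform = negentropy_moment_approx(rng.uniform(-1, 1, size=n))
```

For the population distributions the formula gives 0 for a Gaussian, 9/48 (about 0.19) for a Laplace variable (kurtosis 3) and 1.44/48 = 0.03 for a uniform variable (kurtosis -1.2). The sample estimates fluctuate around these values, and the fourth-moment estimate is sensitive to outliers.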

FastICA instead uses:

$J_{G} (\mathbf{w} )=\left[ E\left\{G\left(\mathbf{w}^{T}\mathbf{x}\right)\right\} -E\{G(\nu )\}\right]^{2}$

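Here G is a non-quadratic function, $\displaystyle \nu$ is a Gaussian variable with zero mean and unit variance, and $\displaystyle \mathbf{w}$ is constrained so that $\displaystyle E\left(\left(\mathbf{w}^{T}\mathbf{x}\right)^{2}\right) =\mathbf{w}^{T} E\left(\mathbf{xx}^{T}\right)\mathbf{w} =\mathbf{w}^{T}\mathbf{w} =\| \mathbf{w} \| ^{2} =1$ , which keeps the objective from growing without bound. So how do we find w? Look at the derivative under this constraint: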

$\frac{\partial \left[ J_{G} (\mathbf{w} )-\beta \left( \| \mathbf{w} \| ^{2} -1\right)\right]}{\partial \mathbf{w}} =2E\left\{\mathbf{x} G'\left(\mathbf{w}^{T}\mathbf{x}\right)\right\} -2\beta \mathbf{w} =0$

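The optimal w is where this derivative equals 0. (Strictly, the chain rule puts the scalar factor $\displaystyle E\left\{G\left(\mathbf{w}^{T}\mathbf{x}\right)\right\} -E\{G(\nu )\}$ in front of the first term. Because it is only a scalar, it can be absorbed into $\displaystyle \beta$ .) The value of $\displaystyle \beta$ can be recovered from the optimal w. Suppose $\displaystyle \mathbf{w}_{0}$ is optimal. Then, left-multiplying by $\displaystyle \mathbf{w}^{T}_{0}$ and using $\displaystyle \| \mathbf{w}_{0} \| =1$ , we must have $\displaystyle E\left\{\mathbf{x} G'\left(\mathbf{w}^{T}_{0}\mathbf{x}\right)\right\} -\beta \mathbf{w}_{0} =0\Longrightarrow \beta =E\left\{\mathbf{w}^{T}_{0}\mathbf{x} G'\left(\mathbf{w}^{T}_{0}\mathbf{x}\right)\right\}$ . So this is really a root-finding problem, which we can solve with Newton's method: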

$x_{n+1} =x_{n} -\frac{f( x_{n})}{f^{\prime }( x_{n})}$

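A quick refresher: a root of an equation is a point where its curve crosses the x-axis. If we approximate the curve by its tangent line, we can "pretend" that the point where the tangent crosses the x-axis is the root. If it is not, we repeat the procedure from that new point until it converges.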
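The tangent-line idea for a scalar equation can be sketched as follows. The helper `newton`, its tolerance and the example equation $x^{2} -2=0$ are choices made for this illustration only.

```python
def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Find a root of f by Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / f_prime(x)     # where the tangent at x crosses the x-axis
        x = x - step
        if abs(step) < tol:          # stop once the update is negligible
            break
    return x

# Example: the positive root of x^2 - 2 = 0, i.e. the square root of 2.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```

In the vector case the division by $f'(x_{n})$ becomes multiplication by the inverse of the Jacobian matrix.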

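In our problem the function whose root we want is $\displaystyle f(\mathbf{w} )=E\left\{\mathbf{x} G'\left(\mathbf{w}^{T}\mathbf{x}\right)\right\} -\beta \mathbf{w}$ , and its derivative with respect to $\displaystyle \mathbf{w}$ is the Jacobian matrix

$\frac{\partial \left[ E\left\{\mathbf{x} G'\left(\mathbf{w}^{T}\mathbf{x}\right)\right\} -\beta \mathbf{w}\right]}{\partial \mathbf{w}} =E\left\{\mathbf{xx}^{T} G''\left(\mathbf{w}^{T}\mathbf{x}\right)\right\} -\beta \mathbf{I}$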

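Here we use the approximation $\displaystyle E\left\{\mathbf{xx}^{T} G''\left(\mathbf{w}^{T}\mathbf{x}\right)\right\} \approx E\left\{\mathbf{xx}^{T}\right\} E\left\{G''\left(\mathbf{w}^{T}\mathbf{x}\right)\right\} =E\left\{G''\left(\mathbf{w}^{T}\mathbf{x}\right)\right\}\mathbf{I}$ , where the last step holds because the data is whitened. The Jacobian then becomes a scalar multiple of the identity, so inverting it is just a division by a scalar, and Newton's formula gives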

$\mathbf{w}_{n+1} =\mathbf{w}_{n} -\frac{E\left\{\mathbf{x} G'\left(\mathbf{w}_{n}^{T}\mathbf{x}\right)\right\} -\beta \mathbf{w}_{n}}{E\left\{G''\left(\mathbf{w}^{T}_{n}\mathbf{x}\right)\right\} -\beta }$

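Approximating $\displaystyle \beta$ with the current iterate $\displaystyle \mathbf{w}_{n}$ instead of the unknown optimum $\displaystyle \mathbf{w}_{0}$ , and putting everything over a common denominator, we get the recursion: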

$\mathbf{w}_{n+1} =\frac{E\left\{G''\left(\mathbf{w}^{T}_{n}\mathbf{x}\right)\right\}\mathbf{w}_{n} -E\left\{\mathbf{x} G'\left(\mathbf{w}_{n}^{T}\mathbf{x}\right)\right\}}{E\left\{G''\left(\mathbf{w}^{T}_{n}\mathbf{x}\right)\right\} -\beta }$

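Finally, the paper [2] multiplies both sides by the scalar $\displaystyle \beta -E\left\{G''\left(\mathbf{w}^{T}_{n}\mathbf{x}\right)\right\}$ to simplify further. This is harmless because $\displaystyle \mathbf{w}_{n+1}$ is renormalized to unit length right afterwards: multiplying by a nonzero scalar changes only the length (and possibly the sign) of the vector, not the direction it spans, and ICA determines each component only up to sign anyway. The iteration then becomes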

$\begin{aligned} \mathbf{w}_{n+1} & \leftarrow E\left\{\mathbf{x} G'\left(\mathbf{w}_{n}^{T}\mathbf{x}\right)\right\} -E\left\{G''\left(\mathbf{w}^{T}_{n}\mathbf{x}\right)\right\}\mathbf{w}_{n} \\ \mathbf{w}_{n+1} & \leftarrow \mathbf{w}_{n+1} /\| \mathbf{w}_{n+1} \| \end{aligned}$

This is the iteration that FastICA uses.
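The fixed-point update can be sketched in NumPy for a single component. This is a minimal illustration under stated assumptions, not a reference implementation: it takes $G(u)=\log\cosh u$ (so $G'=\tanh$ and $G''=1-\tanh^{2}$), replaces expectations with sample means, and assumes the input is already whitened. The function name, the stopping rule and the toy data are all made up for this example.

```python
import numpy as np

def fastica_one_unit(x, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA on whitened data x of shape (d, n), with G = log cosh."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=x.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ x                                # projections w^T x, shape (n,)
        g = np.tanh(u)                           # G'(u)
        g_prime = 1.0 - g ** 2                   # G''(u)
        # w <- E{x G'(w^T x)} - E{G''(w^T x)} w, then renormalize.
        w_new = (x * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        # Converged when w_new is parallel to w (the sign may flip).
        converged = abs(abs(w_new @ w) - 1.0) < tol
        w = w_new
        if converged:
            break
    return w

# Toy data: two unit-variance non-Gaussian sources, mixed and then whitened.
rng = np.random.default_rng(1)
n = 20_000
s = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), n),
               rng.laplace(0.0, 1 / np.sqrt(2), n)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])           # hypothetical mixing matrix
x = A @ s
x = x - x.mean(axis=1, keepdims=True)
eigvals, E = np.linalg.eigh(x @ x.T / n)
x_white = E @ np.diag(eigvals ** -0.5) @ E.T @ x

w = fastica_one_unit(x_white)
y = w @ x_white                                  # estimated component
corr = [abs(np.corrcoef(y, s_i)[0, 1]) for s_i in s]
```

If the iteration has found a component, one entry of `corr` should be close to 1 and the other close to 0. Which source is recovered, and with which sign, depends on the random starting vector. Estimating several components would additionally need a decorrelation step between the vectors, which is not shown here.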

Finally, what exactly is G? Commonly used choices are:

$\begin{aligned} G _ { 1 } ( u ) & = \frac { 1 } { a _ { 1 } } \log \cosh \left( a _ { 1 } u \right) \\ G' _ { 1 } ( u ) & = \tanh \left( a _ { 1 } u \right) \\ G _ { 2 } ( u ) & = - \frac { 1 } { a _ { 2 } } \exp \left( - a _ { 2 } u ^ { 2 } / 2 \right) \\ G' _ { 2 } ( u ) & = u \exp \left( - a _ { 2 } u ^ { 2 } / 2 \right) \\ G _ { 3 } ( u ) & = \frac { 1 } { 4 } u ^ { 4 } \\ G' _ { 3 } ( u ) & = u ^ { 3 } \end{aligned}$

where $1\le a_1 \le 2$ and $a_2\approx 1$ . As a rule of thumb:

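$G_1$ is a good general-purpose choice
If the independent components s are highly super-Gaussian, $G_2$ may work better
If the independent components are sub-Gaussian and there are no outliers, use $G_3$
If the computational cost has to be reduced, piecewise linear approximations of $G'_1$ and $G'_2$ can be used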
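The FastICA update needs the first and second derivatives of G, so it can help to have the three choices side by side as code. The sketch below is illustrative only: the function names are made up, and the second derivatives are worked out here from the formulas above rather than quoted from a reference.

```python
import numpy as np

def g1(u, a1=1.0):
    """Derivatives of G1(u) = log cosh(a1 u) / a1: returns (G1', G1'')."""
    t = np.tanh(a1 * u)
    return t, a1 * (1.0 - t ** 2)

def g2(u, a2=1.0):
    """Derivatives of G2(u) = -exp(-a2 u^2 / 2) / a2: returns (G2', G2'')."""
    e = np.exp(-a2 * u ** 2 / 2)
    return u * e, (1.0 - a2 * u ** 2) * e

def g3(u):
    """Derivatives of G3(u) = u^4 / 4: returns (G3', G3'')."""
    return u ** 3, 3 * u ** 2
```

Any of these can stand in for the nonlinearity in the update $\mathbf{w} \leftarrow E\{\mathbf{x} G'(\mathbf{w}^{T}\mathbf{x})\} -E\{G''(\mathbf{w}^{T}\mathbf{x})\}\mathbf{w}$ . Note that `g3` grows cubically, which is why it is only suggested when there are no outliers.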

References

[1] HYVÄRINEN A, OJA E. Independent component analysis: algorithms and applications[J]. Neural networks, Elsevier, 2000, 13(4–5): 411–430.

[2] HYVÄRINEN A. Fast and robust fixed-point algorithms for independent component analysis[J]. IEEE Transactions on Neural Networks, IEEE, 1999, 10(3): 626–634.
