Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

1. Goal

This paper mainly deals with sparse principal component analysis (PCA) using a subspace method: it estimates the projection onto the principal subspace rather than the individual loading vectors.

2. Theory

2.1 How to get their formulation

Notation: the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of $\Sigma$ are in decreasing order.
From Ky Fan's maximum principle [1], we know that

$$\sum_{i=1}^d \lambda_i(\Sigma) = \operatorname{tr}(V^\top \Sigma V) = \max_{V^\top V = I_d} \operatorname{tr}(V^\top \Sigma V) = \max_{V^\top V = I_d} \langle \Sigma, V V^\top \rangle$$

If we regard the last expression as a function of $VV^\top$, it is linear. Since a linear function attains its maximum over the convex hull of a set at the set itself, replacing the constraint set by its convex hull does not change the optimization problem. The convex hull is identified by the less well-known observation that

$$\mathcal{F}^d_p = \{H : \operatorname{tr}(H) = d,\ 0 \preceq H \preceq I\} = \operatorname{conv}\big(\{VV^\top : V^\top V = I_d\}\big)$$

Combining these facts, we get

$$\sum_{i=1}^d \lambda_i(\Sigma) = \langle \Sigma, \Pi \rangle = \max_{H \in \mathcal{F}^d_p} \langle \Sigma, H \rangle,$$

where $\Pi = VV^\top$ is the projection onto the top-$d$ eigenspace of $\Sigma$.
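As a quick numerical sanity check of this variational characterization, the following sketch (assuming NumPy; the random covariance is an arbitrary example, not from the paper) verifies that the sum of the top-$d$ eigenvalues equals $\langle \Sigma, \Pi \rangle$:

```python
import numpy as np

# Check sum_{i<=d} lambda_i(Sigma) = <Sigma, Pi> for the top-d eigenprojector.
rng = np.random.default_rng(0)
p, d = 10, 3
A = rng.standard_normal((p, p))
Sigma = A @ A.T                      # random symmetric PSD matrix

# eigh returns eigenvalues in increasing order; take the top d.
evals, evecs = np.linalg.eigh(Sigma)
V = evecs[:, -d:]                    # top-d eigenvectors
Pi = V @ V.T                         # projection onto the top-d eigenspace

lhs = evals[-d:].sum()               # sum of the d largest eigenvalues
rhs = np.trace(Sigma @ Pi)           # <Sigma, Pi> = tr(Sigma @ Pi)
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```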

How should sparsity be introduced, and which norm is suitable? Since the goal of this paper is to obtain sparse PCs, we should choose a penalty that makes $V \in \mathbb{R}^{p \times d}$ sparse. For a matrix, there are two ways to impose sparsity:

  • columnwise sparsity: each column of the matrix $A$ is sparse, i.e. only a few elements of each column $A_{\cdot k}$ are nonzero.
  • row sparsity: only a few rows of $A$ are nonzero, which produces group sparsity.

For sparse PCA, to select the important features, this paper uses row sparsity. An intuitive penalty is $\|V\|_{2,0}$, but in the high-dimensional setting the $\ell_0$ norm is NP-hard to optimize. A common trick is to replace $\ell_0$ with $\ell_1$, so the penalty becomes $\|V\|_{2,1}$. But our model is a function of $H = VV^\top$, so we need a penalty on $H$ that approximates $\|V\|_{2,1}$ well. Note that

$$\|H\|_{1,1} = \sum_{i,j}\Big|\sum_k V_{ik} V_{jk}\Big| \le \sum_{i,j}\sum_k |V_{ik}|\,|V_{jk}| \le \sum_{i,j}\Big(\sum_k V_{ik}^2\Big)^{\frac12}\Big(\sum_k V_{jk}^2\Big)^{\frac12} = \|V\|_{2,1}^2$$

where the first inequality is the triangle inequality and the second uses the Cauchy–Schwarz inequality. The final model is

$$\hat{H} = \operatorname*{argmax}_{H \in \mathcal{F}^d_p}\ \langle S, H \rangle - \rho \|H\|_{1,1},$$

where $S$ is an estimate of $\Sigma$ (e.g. the sample covariance matrix).
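To see that the $\|\cdot\|_{1,1}$ penalty on $H$ really is controlled by the row sparsity of $V$, here is a small numerical check of the inequality $\|VV^\top\|_{1,1} \le \|V\|_{2,1}^2$ above (a sketch assuming NumPy; the random $V$ is an arbitrary example):

```python
import numpy as np

# Check ||V V^T||_{1,1} <= ||V||_{2,1}^2 on a random example.
rng = np.random.default_rng(1)
p, d = 8, 2
V = rng.standard_normal((p, d))

H = V @ V.T
lhs = np.abs(H).sum()                  # ||H||_{1,1}
row_norms = np.linalg.norm(V, axis=1)  # ||V_{i.}||_2 for each row
rhs = row_norms.sum() ** 2             # ||V||_{2,1}^2
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```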

2.2 Consistency

We make the following assumptions:

  • assumption 1 (symmetry): $S$ and $\Sigma$ are symmetric matrices.
  • assumption 2 (identifiability): the eigengap $\delta = \lambda_d(\Sigma) - \lambda_{d+1}(\Sigma) > 0$.
  • assumption 3 (sparsity): $\|\Pi\|_{2,0} = s \le p$, i.e. $\Pi$ has $s$ nonzero rows.

Theorem ($\ell_2$ consistency): $\hat{H}$ is $\ell_2$-consistent for $\Pi$ under suitable regularity conditions. In more detail, if $\rho \ge \|S - \Sigma\|_{\infty,\infty}$, then

$$\|\hat{H} - \Pi\|_F \le \frac{4 s \rho}{\delta}.$$

Proof: The main tool is the curvature lemma [2]. Let $\Delta = \hat{H} - \Pi$ and $W = S - \Sigma$. From the curvature lemma, we have

$$\frac{\delta}{2}\|\Delta\|_F^2 \le -\langle \Sigma, \Delta \rangle.$$
Together with the optimality of $\hat{H}$,

$$\langle S, \Delta \rangle - \rho\big(\|\hat{H}\|_{1,1} - \|\Pi\|_{1,1}\big) \ge 0,$$

we have

$$\begin{aligned}
\frac{\delta}{2}\|\Delta\|_F^2 &\le \langle W, \Delta\rangle - \rho\big(\|\hat{H}\|_{1,1} - \|\Pi\|_{1,1}\big)\\
&\le \|W\|_{\infty,\infty}\|\Delta\|_{1,1} - \rho\big(\|\Pi + \Delta\|_{1,1} - \|\Pi\|_{1,1}\big)\\
&\le \rho\|\Delta\|_{1,1} - \rho\big(\|\Pi_{\mathcal S} + \Delta_{\mathcal S}\|_{1,1} - \|\Pi_{\mathcal S}\|_{1,1} + \|\Delta_{\mathcal S^c}\|_{1,1}\big)\\
&= \rho\big(\|\Delta\|_{1,1} - \|\Pi_{\mathcal S} + \Delta_{\mathcal S}\|_{1,1} + \|\Pi_{\mathcal S}\|_{1,1} - \|\Delta_{\mathcal S^c}\|_{1,1}\big)\\
&= \rho\big(\|\Delta_{\mathcal S}\|_{1,1} - \|\Pi_{\mathcal S} + \Delta_{\mathcal S}\|_{1,1} + \|\Pi_{\mathcal S}\|_{1,1}\big)\\
&\le \rho\big(\|\Delta_{\mathcal S}\|_{1,1} - \|\Pi_{\mathcal S}\|_{1,1} + \|\Delta_{\mathcal S}\|_{1,1} + \|\Pi_{\mathcal S}\|_{1,1}\big)\\
&= 2\rho\|\Delta_{\mathcal S}\|_{1,1} \le 2 s \rho \|\Delta\|_F,
\end{aligned}$$

where $\mathcal{S}$ denotes the support of $\Pi$ (an $s \times s$ block of entries, written $\mathcal{S}$ to avoid clashing with the matrix $S$), $\mathcal{S}^c$ its complement, and the last step uses Cauchy–Schwarz on the at most $s^2$ nonzero entries of $\Delta_{\mathcal S}$. So

$$\|\Delta\|_F \le \frac{4 s \rho}{\delta}. \qquad \square$$

3. Algorithm

The chief difficulty in solving this problem is the interaction between the penalty and the Fantope constraint; without either of these features, the optimization problem would be much easier. ADMM can exploit this fact:

  • Rewrite the model as
    $$\max_{H,Z}\ \Big\{ -1_{\mathcal{F}^d_p}(H) + \langle S, H \rangle - \rho\|Z\|_{1,1} - \frac{\lambda}{2}\|H - Z\|_F^2 \Big\} \quad \text{s.t.} \quad H - Z = 0,$$
    where $1_{\mathcal{F}^d_p}$ is the indicator function of the Fantope ($0$ on $\mathcal{F}^d_p$, $+\infty$ outside); the quadratic term vanishes on the constraint set, so nothing changes.
  • Augmented Lagrangian:
    $$\begin{aligned}
    L(H, Z, U) &= -1_{\mathcal{F}^d_p}(H) + \langle S, H \rangle - \rho\|Z\|_{1,1} - \frac{\lambda}{2}\|H - Z\|_F^2 + \langle U, H - Z \rangle\\
    &= -1_{\mathcal{F}^d_p}(H) + \langle S, H \rangle - \rho\|Z\|_{1,1} - \frac{\lambda}{2}\Big(\big\|H - Z - \tfrac{1}{\lambda}U\big\|_F^2 - \big\|\tfrac{1}{\lambda}U\big\|_F^2\Big)\\
    &= -1_{\mathcal{F}^d_p}(H) + \langle S, H \rangle - \rho\|Z\|_{1,1} - \frac{\lambda}{2}\Big(\|H - Z - U\|_F^2 - \|U\|_F^2\Big),
    \end{aligned}$$

where in the last line $U$ has been rescaled as $U \leftarrow \frac{1}{\lambda}U$. ADMM then updates $(H, Z, U)$ iteratively [3]:

$$\begin{aligned}
H^+ &= \operatorname*{argmax}_{H} L(H, Z, U)\\
Z^+ &= \operatorname*{argmax}_{Z} L(H^+, Z, U)\\
U^+ &= U - (H^+ - Z^+)
\end{aligned}$$

An R package implementing this algorithm, with its core written in C++, is available at https://github.com/vqv/fps.
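Below is a minimal sketch of these updates in Python/NumPy, written from the formulas above rather than taken from the fps package; the helper names (`fantope_projection`, `soft_threshold`, `fps_admm`), the fixed iteration count, and the bisection tolerance are all my own assumptions. The $H$-update is the Fantope projection of $Z + U + S/\lambda$ (footnote 3), and the $Z$-update is entrywise soft-thresholding at level $\rho/\lambda$:

```python
import numpy as np

def fantope_projection(A, d, tol=1e-10):
    """Project a symmetric matrix A onto the Fantope
    {H : 0 <= H <= I, tr(H) = d}: shift-and-clip the eigenvalues,
    finding the shift theta by bisection so the trace equals d."""
    evals, evecs = np.linalg.eigh(A)
    lo, hi = evals.min() - 1.0, evals.max()
    while hi - lo > tol:
        theta = (lo + hi) / 2.0
        if np.clip(evals - theta, 0.0, 1.0).sum() > d:
            lo = theta  # trace too large: shift eigenvalues down more
        else:
            hi = theta
    gamma = np.clip(evals - (lo + hi) / 2.0, 0.0, 1.0)
    return (evecs * gamma) @ evecs.T

def soft_threshold(A, t):
    """Entrywise soft-thresholding operator."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def fps_admm(S, d, rho, lam=1.0, n_iter=200):
    """ADMM sketch for max_{H in Fantope} <S, H> - rho * ||H||_{1,1}."""
    p = S.shape[0]
    Z = np.zeros((p, p))
    U = np.zeros((p, p))
    for _ in range(n_iter):
        H = fantope_projection(Z + U + S / lam, d)  # H-update
        Z = soft_threshold(H - U, rho / lam)        # Z-update
        U = U - (H - Z)                             # scaled dual update
    return Z
```

Returning $Z$ rather than $H$ hands back the iterate on which the soft-thresholding acts, so the output is exactly sparse; at convergence the two coincide.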

  1. Ky Fan's maximum principle:
    $$\sum_{i=1}^d \lambda_i(\Sigma) = \max_{V^\top V = I_d} \operatorname{tr}(V^\top \Sigma V), \qquad \sum_{i=p-d+1}^p \lambda_i(\Sigma) = \min_{V^\top V = I_d} \operatorname{tr}(V^\top \Sigma V).$$
  2. Curvature lemma: Let $A$ be a symmetric matrix and $E$ the projection onto the subspace spanned by the eigenvectors of $A$ corresponding to its $d$ largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$. If $\delta_A = \lambda_d - \lambda_{d+1} > 0$, then
    $$\langle A, E - F \rangle \ge \frac{\delta_A}{2}\|E - F\|_F^2$$

    for all $F \in \mathcal{F}^d_p$. (A numerical check of this lemma appears after these footnotes.)
  3. The $H$-update in closed form:
    $$H^+ = \operatorname*{argmax}_{H} L(H, Z, U) = \operatorname*{argmax}_{H \in \mathcal{F}^d_p} \langle S, H \rangle - \frac{\lambda}{2}\|H - Z - U\|_F^2 = \operatorname*{argmin}_{H \in \mathcal{F}^d_p} \Big\|H - \Big(Z + U + \frac{S}{\lambda}\Big)\Big\|_F^2 = P_{\mathcal{F}^d_p}\Big(Z + U + \frac{S}{\lambda}\Big).$$
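Following up on the curvature lemma in footnote 2, here is the promised numerical check (a sketch assuming NumPy; the random symmetric matrix and the Fantope point built as a convex combination of rank-$d$ projections are arbitrary examples):

```python
import numpy as np

# Numerical check of the curvature lemma:
# <A, E - F> >= (delta_A / 2) * ||E - F||_F^2 for F in the Fantope.
rng = np.random.default_rng(2)
p, d = 6, 2
B = rng.standard_normal((p, p))
A = (B + B.T) / 2                    # random symmetric matrix

evals, evecs = np.linalg.eigh(A)     # increasing eigenvalue order
E = evecs[:, -d:] @ evecs[:, -d:].T  # top-d eigenprojector
delta_A = evals[-d] - evals[-d - 1]  # eigengap lambda_d - lambda_{d+1}

# A random Fantope point: convex combination of rank-d projections.
F = np.zeros((p, p))
for w in rng.dirichlet(np.ones(5)):
    Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    F += w * (Q @ Q.T)

lhs = np.trace(A @ (E - F))
rhs = 0.5 * delta_A * np.linalg.norm(E - F, 'fro') ** 2
assert lhs >= rhs - 1e-10
print(lhs, rhs)
```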