Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

1. Goal

This paper mainly deals with sparse principal component analysis (PCA) using a subspace method: it estimates the projection onto the principal subspace rather than the individual loading vectors.

2. Theory

2.1 How to get their formulation

Notation: the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ of $\Sigma$ are in decreasing order.
From Ky Fan's maximum principle [1], we know that

$$\sum_{i=1}^d \lambda_i(\Sigma) = \operatorname{tr}(V^\top \Sigma V) = \max_{V^\top V = I_d} \operatorname{tr}(V^\top \Sigma V) = \max_{V^\top V = I_d} \langle \Sigma, V V^\top \rangle$$

If we regard the last expression as a function of $VV^\top$, it is linear. Since a linear function attains its maximum over the convex hull of a set at the set itself, replacing the constraint set by its convex hull does not change the optimization problem. The convex hull is identified by the less well-known observation that

$$\mathcal{F}^d_p = \{H : \operatorname{tr}(H) = d,\ 0 \preceq H \preceq I\} = \operatorname{conv}\big(\{VV^\top : V^\top V = I_d\}\big)$$

Combining these facts, we get

$$\sum_{i=1}^d \lambda_i(\Sigma) = \langle \Sigma, \Pi \rangle = \max_{H \in \mathcal{F}^d_p} \langle \Sigma, H \rangle,$$

where $\Pi = VV^\top$ is the projection onto the top-$d$ eigenspace of $\Sigma$.
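As a quick numerical sanity check of this variational characterization, the following sketch (assuming NumPy; the random covariance is an arbitrary example, not from the paper) verifies that the sum of the top-$d$ eigenvalues equals $\langle \Sigma, \Pi \rangle$:

```python
import numpy as np

# Check sum_{i<=d} lambda_i(Sigma) = <Sigma, Pi> for the top-d eigenprojector.
rng = np.random.default_rng(0)
p, d = 10, 3
A = rng.standard_normal((p, p))
Sigma = A @ A.T                      # random symmetric PSD matrix

# eigh returns eigenvalues in increasing order; take the top d.
evals, evecs = np.linalg.eigh(Sigma)
V = evecs[:, -d:]                    # top-d eigenvectors
Pi = V @ V.T                         # projection onto the top-d eigenspace

lhs = evals[-d:].sum()               # sum of the d largest eigenvalues
rhs = np.trace(Sigma @ Pi)           # <Sigma, Pi> = tr(Sigma @ Pi)
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```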

How should sparsity be introduced, and which norm is suitable? Since the goal of this paper is to obtain sparse PCs, we should choose a penalty that makes $V \in \mathbb{R}^{p \times d}$ sparse. For a matrix, there are two ways to impose sparsity:

  • columnwise sparsity: each column of the matrix $A$ is sparse, i.e. only a few elements of each column $A_{\cdot k}$ are nonzero.
  • row sparsity: only a few rows of $A$ are nonzero, which produces group sparsity.

For sparse PCA, to select the important features, this paper uses row sparsity. An intuitive penalty is $\|V\|_{2,0}$, but in the high-dimensional setting the $\ell_0$ norm is NP-hard to optimize. A common trick is to replace $\ell_0$ with $\ell_1$, so the penalty becomes $\|V\|_{2,1}$. But our model is a function of $H = VV^\top$, so we need a penalty on $H$ that approximates $\|V\|_{2,1}$ well. Note that

$$\|H\|_{1,1} = \sum_{i,j}\Big|\sum_k V_{ik} V_{jk}\Big| \le \sum_{i,j}\sum_k |V_{ik}|\,|V_{jk}| \le \sum_{i,j}\Big(\sum_k V_{ik}^2\Big)^{\frac12}\Big(\sum_k V_{jk}^2\Big)^{\frac12} = \|V\|_{2,1}^2$$

where the first inequality is the triangle inequality and the second uses the Cauchy–Schwarz inequality. The final model is

$$\hat{H} = \operatorname*{argmax}_{H \in \mathcal{F}^d_p}\ \langle S, H \rangle - \rho \|H\|_{1,1},$$

where $S$ is an estimate of $\Sigma$ (e.g. the sample covariance matrix).
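To see that the $\|\cdot\|_{1,1}$ penalty on $H$ really is controlled by the row sparsity of $V$, here is a small numerical check of the inequality $\|VV^\top\|_{1,1} \le \|V\|_{2,1}^2$ above (a sketch assuming NumPy; the random $V$ is an arbitrary example):

```python
import numpy as np

# Check ||V V^T||_{1,1} <= ||V||_{2,1}^2 on a random example.
rng = np.random.default_rng(1)
p, d = 8, 2
V = rng.standard_normal((p, d))

H = V @ V.T
lhs = np.abs(H).sum()                  # ||H||_{1,1}
row_norms = np.linalg.norm(V, axis=1)  # ||V_{i.}||_2 for each row
rhs = row_norms.sum() ** 2             # ||V||_{2,1}^2
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```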

2.2 Consistency

We make the following assumptions:

  • assumption 1 (symmetry): $S$ and $\Sigma$ are symmetric matrices.
  • assumption 2 (identifiability): the eigengap $\delta = \lambda_d(\Sigma) - \lambda_{d+1}(\Sigma) > 0$.
  • assumption 3 (sparsity): $\|\Pi\|_{2,0} = s \le p$, i.e. $\Pi$ has $s$ nonzero rows.

Theorem ($\ell_2$ consistency): $\hat{H}$ is $\ell_2$-consistent for $\Pi$ under suitable regularity conditions. In more detail, if $\rho \ge \|S - \Sigma\|_{\infty,\infty}$, then

$$\|\hat{H} - \Pi\|_F \le \frac{4 s \rho}{\delta}.$$

Proof: The main tool is the curvature lemma [2]. Let $\Delta = \hat{H} - \Pi$ and $W = S - \Sigma$. From the curvature lemma, we have

$$\frac{\delta}{2}\|\Delta\|_F^2 \le -\langle \Sigma, \Delta \rangle.$$
Together with the optimality of $\hat{H}$,

$$\langle S, \Delta \rangle - \rho\big(\|\hat{H}\|_{1,1} - \|\Pi\|_{1,1}\big) \ge 0,$$

we have

$$\begin{aligned}
\frac{\delta}{2}\|\Delta\|_F^2 &\le \langle W, \Delta\rangle - \rho\big(\|\hat{H}\|_{1,1} - \|\Pi\|_{1,1}\big)\\
&\le \|W\|_{\infty,\infty}\|\Delta\|_{1,1} - \rho\big(\|\Pi + \Delta\|_{1,1} - \|\Pi\|_{1,1}\big)\\
&\le \rho\|\Delta\|_{1,1} - \rho\big(\|\Pi_{\mathcal S} + \Delta_{\mathcal S}\|_{1,1} - \|\Pi_{\mathcal S}\|_{1,1} + \|\Delta_{\mathcal S^c}\|_{1,1}\big)\\
&= \rho\big(\|\Delta\|_{1,1} - \|\Pi_{\mathcal S} + \Delta_{\mathcal S}\|_{1,1} + \|\Pi_{\mathcal S}\|_{1,1} - \|\Delta_{\mathcal S^c}\|_{1,1}\big)\\
&= \rho\big(\|\Delta_{\mathcal S}\|_{1,1} - \|\Pi_{\mathcal S} + \Delta_{\mathcal S}\|_{1,1} + \|\Pi_{\mathcal S}\|_{1,1}\big)\\
&\le \rho\big(\|\Delta_{\mathcal S}\|_{1,1} - \|\Pi_{\mathcal S}\|_{1,1} + \|\Delta_{\mathcal S}\|_{1,1} + \|\Pi_{\mathcal S}\|_{1,1}\big)\\
&= 2\rho\|\Delta_{\mathcal S}\|_{1,1} \le 2 s \rho \|\Delta\|_F,
\end{aligned}$$

where $\mathcal{S}$ denotes the support of $\Pi$ (an $s \times s$ block of entries, written $\mathcal{S}$ to avoid clashing with the matrix $S$), $\mathcal{S}^c$ its complement, and the last step uses Cauchy–Schwarz on the at most $s^2$ nonzero entries of $\Delta_{\mathcal S}$. So

$$\|\Delta\|_F \le \frac{4 s \rho}{\delta}. \qquad \square$$

3. Algorithm

The chief difficulty in solving this problem is the interaction between the penalty and the Fantope constraint; without either of these features, the optimization problem would be much easier. ADMM can exploit this fact:

  • Rewrite the model as
    $$\max_{H,Z}\ \Big\{ -1_{\mathcal{F}^d_p}(H) + \langle S, H \rangle - \rho\|Z\|_{1,1} - \frac{\lambda}{2}\|H - Z\|_F^2 \Big\} \quad \text{s.t.} \quad H - Z = 0,$$
    where $1_{\mathcal{F}^d_p}$ is the indicator function of the Fantope ($0$ on $\mathcal{F}^d_p$, $+\infty$ outside); the quadratic term vanishes on the constraint set, so nothing changes.
  • Augmented Lagrangian:
    $$\begin{aligned}
    L(H, Z, U) &= -1_{\mathcal{F}^d_p}(H) + \langle S, H \rangle - \rho\|Z\|_{1,1} - \frac{\lambda}{2}\|H - Z\|_F^2 + \langle U, H - Z \rangle\\
    &= -1_{\mathcal{F}^d_p}(H) + \langle S, H \rangle - \rho\|Z\|_{1,1} - \frac{\lambda}{2}\Big(\big\|H - Z - \tfrac{1}{\lambda}U\big\|_F^2 - \big\|\tfrac{1}{\lambda}U\big\|_F^2\Big)\\
    &= -1_{\mathcal{F}^d_p}(H) + \langle S, H \rangle - \rho\|Z\|_{1,1} - \frac{\lambda}{2}\Big(\|H - Z - U\|_F^2 - \|U\|_F^2\Big),
    \end{aligned}$$

where in the last line $U$ has been rescaled as $U \leftarrow \frac{1}{\lambda}U$. ADMM then updates $(H, Z, U)$ iteratively [3]:

$$\begin{aligned}
H^+ &= \operatorname*{argmax}_{H} L(H, Z, U)\\
Z^+ &= \operatorname*{argmax}_{Z} L(H^+, Z, U)\\
U^+ &= U - (H^+ - Z^+)
\end{aligned}$$

An R package implementing this algorithm, with its core written in C++, is available at https://github.com/vqv/fps.
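Below is a minimal sketch of these updates in Python/NumPy, written from the formulas above rather than taken from the fps package; the helper names (`fantope_projection`, `soft_threshold`, `fps_admm`), the fixed iteration count, and the bisection tolerance are all my own assumptions. The $H$-update is the Fantope projection of $Z + U + S/\lambda$ (footnote 3), and the $Z$-update is entrywise soft-thresholding at level $\rho/\lambda$:

```python
import numpy as np

def fantope_projection(A, d, tol=1e-10):
    """Project a symmetric matrix A onto the Fantope
    {H : 0 <= H <= I, tr(H) = d}: shift-and-clip the eigenvalues,
    finding the shift theta by bisection so the trace equals d."""
    evals, evecs = np.linalg.eigh(A)
    lo, hi = evals.min() - 1.0, evals.max()
    while hi - lo > tol:
        theta = (lo + hi) / 2.0
        if np.clip(evals - theta, 0.0, 1.0).sum() > d:
            lo = theta  # trace too large: shift eigenvalues down more
        else:
            hi = theta
    gamma = np.clip(evals - (lo + hi) / 2.0, 0.0, 1.0)
    return (evecs * gamma) @ evecs.T

def soft_threshold(A, t):
    """Entrywise soft-thresholding operator."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def fps_admm(S, d, rho, lam=1.0, n_iter=200):
    """ADMM sketch for max_{H in Fantope} <S, H> - rho * ||H||_{1,1}."""
    p = S.shape[0]
    Z = np.zeros((p, p))
    U = np.zeros((p, p))
    for _ in range(n_iter):
        H = fantope_projection(Z + U + S / lam, d)  # H-update
        Z = soft_threshold(H - U, rho / lam)        # Z-update
        U = U - (H - Z)                             # scaled dual update
    return Z
```

Returning $Z$ rather than $H$ hands back the iterate on which the soft-thresholding acts, so the output is exactly sparse; at convergence the two coincide.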

  1. Ky Fan's maximum principle:
    $$\sum_{i=1}^d \lambda_i(\Sigma) = \max_{V^\top V = I_d} \operatorname{tr}(V^\top \Sigma V), \qquad \sum_{i=p-d+1}^p \lambda_i(\Sigma) = \min_{V^\top V = I_d} \operatorname{tr}(V^\top \Sigma V).$$
  2. Curvature lemma: Let $A$ be a symmetric matrix and $E$ the projection onto the subspace spanned by the eigenvectors of $A$ corresponding to its $d$ largest eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$. If $\delta_A = \lambda_d - \lambda_{d+1} > 0$, then
    $$\langle A, E - F \rangle \ge \frac{\delta_A}{2}\|E - F\|_F^2$$

    for all $F \in \mathcal{F}^d_p$. (A numerical check of this lemma appears after these footnotes.)
  3. The $H$-update in closed form:
    $$H^+ = \operatorname*{argmax}_{H} L(H, Z, U) = \operatorname*{argmax}_{H \in \mathcal{F}^d_p} \langle S, H \rangle - \frac{\lambda}{2}\|H - Z - U\|_F^2 = \operatorname*{argmin}_{H \in \mathcal{F}^d_p} \Big\|H - \Big(Z + U + \frac{S}{\lambda}\Big)\Big\|_F^2 = P_{\mathcal{F}^d_p}\Big(Z + U + \frac{S}{\lambda}\Big).$$
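Following up on the curvature lemma in footnote 2, here is the promised numerical check (a sketch assuming NumPy; the random symmetric matrix and the Fantope point built as a convex combination of rank-$d$ projections are arbitrary examples):

```python
import numpy as np

# Numerical check of the curvature lemma:
# <A, E - F> >= (delta_A / 2) * ||E - F||_F^2 for F in the Fantope.
rng = np.random.default_rng(2)
p, d = 6, 2
B = rng.standard_normal((p, p))
A = (B + B.T) / 2                    # random symmetric matrix

evals, evecs = np.linalg.eigh(A)     # increasing eigenvalue order
E = evecs[:, -d:] @ evecs[:, -d:].T  # top-d eigenprojector
delta_A = evals[-d] - evals[-d - 1]  # eigengap lambda_d - lambda_{d+1}

# A random Fantope point: convex combination of rank-d projections.
F = np.zeros((p, p))
for w in rng.dirichlet(np.ones(5)):
    Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    F += w * (Q @ Q.T)

lhs = np.trace(A @ (E - F))
rhs = 0.5 * delta_A * np.linalg.norm(E - F, 'fro') ** 2
assert lhs >= rhs - 1e-10
print(lhs, rhs)
```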