1. Goal
This paper mainly deals with sparse principal component analysis (PCA) using a subspace method.
2. Theory
2.1 How to get their formulation
Notation: $S$ is the sample covariance matrix, $\Sigma$ the population covariance matrix with eigenvalues $\lambda_1(\Sigma)\ge\cdots\ge\lambda_p(\Sigma)$, $\Pi$ the projection onto the subspace spanned by the $d$ leading eigenvectors of $\Sigma$, and
$$F^d_p=\{H: 0\preceq H\preceq I_p,\ \operatorname{tr}(H)=d\}$$
the Fantope, i.e. the convex hull of the rank-$d$ projection matrices.
From Ky Fan's maximum principle¹, we know that
$$\sum_{i=1}^d\lambda_i(\Sigma)=\max_{V'V=I_d}\operatorname{tr}(V'\Sigma V)=\max_{V'V=I_d}\langle\Sigma, VV'\rangle.$$
If we regard the last formula as a function of $H=VV'$, the objective $\langle\Sigma,H\rangle$ is linear in $H$, and the nonconvex feasible set $\{VV': V'V=I_d\}$ is exactly the set of extreme points of $F^d_p$; a linear objective attains its maximum over a convex body at an extreme point, so replacing the feasible set by the Fantope does not change the optimal value.
From all the analysis, we get
$$\Pi=\operatorname*{argmax}_{H\in F^d_p}\langle\Sigma, H\rangle$$
whenever $\lambda_d(\Sigma)>\lambda_{d+1}(\Sigma)$, so the principal subspace is the solution of a convex program.
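As a quick numerical sanity check of Ky Fan's principle, the sketch below (my own illustration; the random matrix `Sigma` and the dimensions are arbitrary choices, not from the paper) confirms that the top-$d$ eigenvectors attain the maximum trace:

```python
import numpy as np

rng = np.random.default_rng(0)
p, d = 6, 2
B = rng.standard_normal((p, p))
Sigma = B @ B.T                      # a random symmetric PSD "covariance"

lam, V = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
Vd = V[:, -d:]                       # eigenvectors of the d largest eigenvalues

# tr(V' Sigma V) at the top-d eigenvectors equals the sum of the d largest eigenvalues
assert np.isclose(np.trace(Vd.T @ Sigma @ Vd), lam[-d:].sum())

# any other orthonormal V gives a trace that is no larger
Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
assert np.trace(Q.T @ Sigma @ Q) <= lam[-d:].sum() + 1e-9
```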
How do we introduce sparsity, and which norm is suitable? The goal of this paper is to obtain sparse PCs, so we should choose a penalty that induces one of the following sparsity patterns:
- column-wise sparsity: for a matrix $A$, each of its columns is sparse, i.e. only a few elements of $A_{*i}$ are nonzero.
- row sparsity: for a matrix $A$, only a few of its rows are nonzero, which produces group sparsity.
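The two patterns can be told apart with simple norm computations; here is a minimal illustration on a toy matrix `A` (my own example, not from the paper):

```python
import numpy as np

# A toy matrix: rows 0 and 2 are nonzero, row 1 is entirely zero,
# so A is row-sparse with ||A||_{2,0} = 2 even though 3 columns are nonzero.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 0.0],
              [3.0, 0.0, 4.0]])

l11 = np.abs(A).sum()                                      # ||A||_{1,1}: sum of |entries|
row_support = np.count_nonzero(np.linalg.norm(A, axis=1))  # ||A||_{2,0}: number of nonzero rows
col_support = np.count_nonzero(np.linalg.norm(A, axis=0))  # number of nonzero columns

print(l11, row_support, col_support)   # 10.0 2 3
```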
For sparse PCA, to select the important features, this paper uses row sparsity. An intuitive penalty is the row-wise $\ell_0$ norm
$$\|H\|_{2,0}=\#\{i:\|H_{i*}\|_2\ne 0\},$$
but it is nonconvex. Relaxing it to the element-wise $\ell_1$ norm gives the estimator
$$\hat H=\operatorname*{argmax}_{H\in F^d_p}\ \langle S,H\rangle-\rho\|H\|_{1,1},$$
where $\|H\|_{1,1}=\sum_{i,j}|H_{ij}|$ and $\rho\ge 0$ is a tuning parameter that controls the amount of sparsity.
2.2 Consistency
Make the following assumptions:
- Assumption 1 (Symmetry): $S$ and $\Sigma$ are symmetric matrices.
- Assumption 2 (Identifiability): $\delta=\lambda_d(\Sigma)-\lambda_{d+1}(\Sigma)>0$.
- Assumption 3 (Sparsity): $\|\Pi\|_{2,0}=s\ll p$, i.e. $\Pi$ has only $s$ nonzero rows.
Theorem (consistency). Under Assumptions 1–3, if $\rho\ge\|S-\Sigma\|_{\infty,\infty}:=\max_{i,j}|S_{ij}-\Sigma_{ij}|$, then
$$\|\hat H-\Pi\|_F\le\frac{4s\rho}{\delta}.$$
proof: The main tool is the curvature lemma². Let $\Delta=\hat H-\Pi$ and let $J$ index the support of $\Pi$, which touches at most $s$ rows and columns and hence at most $s^2$ entries. Applying the lemma with $A=\Sigma$, $E=\Pi$, $F=\hat H$ gives
$$\frac{\delta}{2}\|\Delta\|_F^2\le\langle\Sigma,\Pi-\hat H\rangle=\langle\Sigma-S,\Pi-\hat H\rangle+\langle S,\Pi-\hat H\rangle.$$
Together with the optimality of $\hat H$,
$$\langle S,\hat H\rangle-\rho\|\hat H\|_{1,1}\ge\langle S,\Pi\rangle-\rho\|\Pi\|_{1,1}
\quad\Longrightarrow\quad
\langle S,\Pi-\hat H\rangle\le\rho\big(\|\Pi\|_{1,1}-\|\hat H\|_{1,1}\big),$$
we have, using Hölder's inequality $\langle\Sigma-S,\Pi-\hat H\rangle\le\rho\|\Delta\|_{1,1}$, the support bound $\|\Pi\|_{1,1}-\|\hat H\|_{1,1}\le\|\Delta_J\|_{1,1}-\|\Delta_{J^c}\|_{1,1}$, and Cauchy–Schwarz on the at most $s^2$ entries of $\Delta_J$,
$$\frac{\delta}{2}\|\Delta\|_F^2\le\rho\|\Delta\|_{1,1}+\rho\big(\|\Delta_J\|_{1,1}-\|\Delta_{J^c}\|_{1,1}\big)=2\rho\|\Delta_J\|_{1,1}\le 2\rho s\|\Delta\|_F,$$
so
$$\|\hat H-\Pi\|_F\le\frac{4s\rho}{\delta}.\qquad\blacksquare$$
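The curvature lemma itself can be checked numerically: rank-$d$ projectors lie in the Fantope, so the inequality must hold for randomly drawn projectors. The sketch below is my own illustration (random symmetric `A`, arbitrary dimensions), not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
p, d = 8, 2

# A symmetric matrix with eigengap delta_A = lambda_d - lambda_{d+1}
B = rng.standard_normal((p, p))
A = (B + B.T) / 2.0
lam, V = np.linalg.eigh(A)          # eigenvalues in ascending order
delta_A = lam[-d] - lam[-d - 1]     # gap between d-th and (d+1)-th largest
E = V[:, -d:] @ V[:, -d:].T         # projector onto the top-d eigenspace

# Random rank-d projectors F = QQ' are extreme points of the Fantope F^d_p
for _ in range(100):
    Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    F = Q @ Q.T
    lhs = np.sum(A * (E - F))                                 # <A, E - F>
    rhs = 0.5 * delta_A * np.linalg.norm(E - F, 'fro') ** 2
    assert lhs >= rhs - 1e-9                                  # curvature lemma
```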
3. Algorithm
The chief difficulty in solving this problem is the interaction between the penalty and the Fantope constraint. Without either of these features, the optimization problem would be much easier. ADMM can exploit this fact:
- rewrite the model as
$$\max_{H,Z}\ \big\{-\mathbb{1}_{F^d_p}(H)+\langle S,H\rangle-\rho\|Z\|_{1,1}\big\}\quad\text{s.t.}\quad H-Z=0,$$
where $\mathbb{1}_{F^d_p}(H)$ is the convex indicator of the Fantope ($0$ if $H\in F^d_p$, $+\infty$ otherwise), so that the constraint and the penalty act on two different variables.
- Lagrange function (augmented; since we maximize, the quadratic penalty enters with a minus sign):
$$
\begin{aligned}
L(H,Z,U)&=-\mathbb{1}_{F^d_p}(H)+\langle S,H\rangle-\rho\|Z\|_{1,1}+\langle U,H-Z\rangle-\frac{\lambda}{2}\|H-Z\|_F^2\\
&=-\mathbb{1}_{F^d_p}(H)+\langle S,H\rangle-\rho\|Z\|_{1,1}-\frac{\lambda}{2}\left(\left\|H-Z-\frac{1}{\lambda}U\right\|_F^2-\left\|\frac{1}{\lambda}U\right\|_F^2\right)\\
&=-\mathbb{1}_{F^d_p}(H)+\langle S,H\rangle-\rho\|Z\|_{1,1}-\frac{\lambda}{2}\left(\|H-Z-U'\|_F^2-\|U'\|_F^2\right),\qquad U'=\frac{U}{\lambda}.
\end{aligned}
$$
An R package implementing this algorithm, with the core written in C++, can be found at:
https://github.com/vqv/fps
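To make the updates concrete, here is a minimal NumPy sketch of the full ADMM loop under the notation above. This is my own illustration, not the `fps` package: the function names, the default penalty parameter `lam`, and the toy spiked-covariance check are all assumptions. The $Z$-update is the standard soft-thresholding proximal step for the $\ell_1$ penalty, and the dual update is the usual scaled-dual step.

```python
import numpy as np

def fantope_projection(A, d, tol=1e-10):
    """Euclidean projection of a symmetric matrix A onto the Fantope
    {H : 0 <= H <= I, tr(H) = d}: keep the eigenvectors and project the
    eigenvalues onto the capped simplex via bisection on a shift theta."""
    gamma, V = np.linalg.eigh(A)
    lo, hi = gamma.min() - 1.0, gamma.max()
    while hi - lo > tol:
        theta = (lo + hi) / 2.0
        if np.clip(gamma - theta, 0.0, 1.0).sum() > d:
            lo = theta          # total mass too large: shift eigenvalues down more
        else:
            hi = theta
    w = np.clip(gamma - (lo + hi) / 2.0, 0.0, 1.0)
    return (V * w) @ V.T

def soft_threshold(X, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_{1,1}."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def fps_admm(S, d, rho, lam=1.0, n_iter=500):
    """ADMM for  max_{H in Fantope} <S, H> - rho * ||H||_{1,1}.
    H carries the Fantope constraint, Z carries the l1 penalty,
    U_s = U / lam is the scaled dual, lam is the ADMM penalty parameter."""
    p = S.shape[0]
    Z = np.zeros((p, p))
    U_s = np.zeros((p, p))
    for _ in range(n_iter):
        H = fantope_projection(Z + U_s + S / lam, d)   # closed-form H-update
        Z = soft_threshold(H - U_s, rho / lam)         # Z-update
        U_s = U_s - (H - Z)                            # scaled dual update
    return Z                                           # Z is exactly sparse

# Toy check: a spiked covariance whose leading eigenvector is 2-sparse.
p = 10
v = np.zeros(p)
v[:2] = 1.0 / np.sqrt(2.0)
Sigma = np.eye(p) + 5.0 * np.outer(v, v)
H_hat = fps_admm(Sigma, d=1, rho=0.1)
```

Returning $Z$ rather than $H$ yields an iterate that is exactly sparse, since the soft-thresholding step zeroes out small entries; at convergence $H$ and $Z$ coincide. On the toy example, the recovered $\hat H$ concentrates its mass on the true $2\times 2$ support block.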
Footnotes:
1. Ky Fan's maximum principle:
$$\sum_{i=1}^d\lambda_i(\Sigma)=\max_{V'V=I_d}\operatorname{tr}(V'\Sigma V),\qquad
\sum_{i=p-d+1}^{p}\lambda_i(\Sigma)=\min_{V'V=I_d}\operatorname{tr}(V'\Sigma V).$$
2. Curvature lemma: Let $A$ be a symmetric matrix and $E$ be the projection onto the subspace spanned by the eigenvectors of $A$ corresponding to its $d$ largest eigenvalues $\lambda_1\ge\lambda_2\ge\cdots$. If $\delta_A=\lambda_d-\lambda_{d+1}>0$, then
$$\langle A,E-F\rangle\ge\frac{\delta_A}{2}\|E-F\|_F^2$$
for all $F\in F^d_p$.
3. The $H$-update of ADMM is a Fantope projection: with $U'=U^*/\lambda$,
$$
\hat H=\operatorname*{argmax}_{H}L(H,Z,U^*)
=\operatorname*{argmax}_{H\in F^d_p}\ \langle S,H\rangle-\frac{\lambda}{2}\|H-Z-U'\|_F^2
=\operatorname*{argmin}_{H\in F^d_p}\left\|H-\left(Z+U'+\frac{S}{\lambda}\right)\right\|_F^2
=P_{F^d_p}\!\left(Z+U'+\frac{S}{\lambda}\right).
$$