Linear Discriminant Analysis (LDA)
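PCA looks for projection vectors along which the projected feature points have large variance, that is, are spread out as much as possible. The projection vectors that LDA looks for are different and have the following two properties:
After projection, the centers (means) of different classes lie far apart.
After projection, the data within each class has small variance (is tightly concentrated).
In this respect LDA resembles cluster analysis, but it is a supervised learning algorithm, whereas PCA is an unsupervised one.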
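Plotting the principal axis of LDA together with the principal axis of PCA gives the following:
The data is visibly more separable when projected onto the LDA axis.
The following mathematical notation is used in the rest of the text: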
$$
\begin{aligned}
L &: \text{number of classes} \\
n_i &: \text{number of samples in class } i \\
n &: \text{number of all samples} \\
x_\ell^{(i)} &: \text{the } \ell\text{-th sample in class } i \\
P_i &: \text{the prior probability of class } i
\end{aligned}
$$
Between-class Scatter Matrix
The projections of all sample points $x_\ell^{(i)}$ onto the direction $e$ are:
$$
\left\{ e^T x_1^{(1)}, \ldots, e^T x_{n_1}^{(1)}, e^T x_1^{(2)}, \ldots, e^T x_{n_2}^{(2)}, \ldots, e^T x_\ell^{(i)}, \ldots, e^T x_1^{(L)}, \ldots, e^T x_{n_L}^{(L)} \right\}
$$
The center of class $i$ after projection is:
$$
m_i = \frac{1}{n_i} \sum_{\ell=1}^{n_i} \boldsymbol{e}^T \boldsymbol{x}_\ell^{(i)} = \boldsymbol{e}^T \left\{ \frac{1}{n_i} \sum_{\ell=1}^{n_i} \boldsymbol{x}_\ell^{(i)} \right\} = \boldsymbol{e}^T \boldsymbol{m}_i
$$
Here $\boldsymbol{m}_i = \frac{1}{n_i} \sum_{\ell=1}^{n_i} \boldsymbol{x}_\ell^{(i)}$ is simply the class center before projection, so the center after projection is the projection of the original class center.
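A quick numerical check may make this concrete. The sketch below uses made-up Gaussian samples for a single class and an arbitrary direction `e` (both are illustrative assumptions, not data from the text) and compares the mean of the projections with the projection of the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
X_i = rng.normal(size=(50, 3))        # 50 hypothetical samples of one class, d = 3
e = np.array([1.0, 2.0, -1.0])
e /= np.linalg.norm(e)                # an arbitrary unit direction

mean_of_projections = (X_i @ e).mean()       # (1/n_i) * sum_l e^T x_l
projection_of_mean = e @ X_i.mean(axis=0)    # e^T m_i

print(np.isclose(mean_of_projections, projection_of_mean))
```

The two numbers agree up to floating-point rounding because projection is a linear map, so it commutes with averaging.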
The prior-weighted sum of squared (Euclidean) distances between the projected centers of different classes is then:
$$
\begin{aligned}
& \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \left\| m_i - m_j \right\|^2 \\
= & \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j (m_i - m_j)(m_i - m_j)^T \\
= & \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j (e^T \boldsymbol{m}_i - e^T \boldsymbol{m}_j)(e^T \boldsymbol{m}_i - e^T \boldsymbol{m}_j)^T \\
= & \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \, e^T (\boldsymbol{m}_i - \boldsymbol{m}_j)(\boldsymbol{m}_i - \boldsymbol{m}_j)^T e \\
= & e^T \underbrace{\left( \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \left( \boldsymbol{m}_i - \boldsymbol{m}_j \right) \left( \boldsymbol{m}_i - \boldsymbol{m}_j \right)^T \right)}_{S_b^{LDA}} e
\end{aligned}
$$
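Here $S_b^{LDA}$ is the between-class scatter matrix; the subscript $b$ stands for "between".
The class priors $P_i P_j$ enter as weights because how much the separation between two class centers should count is taken to depend on the share of the data that those two classes make up.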
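The pairwise definition can be sketched directly in NumPy. The class means, priors and direction below are invented for illustration, and `between_scatter_pairwise` is a hypothetical helper name; the check at the end compares the weighted sum of squared distances between projected centers with the quadratic form $e^T S_b^{LDA} e$.

```python
import numpy as np

def between_scatter_pairwise(means, priors):
    """S_b = 1/2 * sum_i sum_j P_i P_j (m_i - m_j)(m_i - m_j)^T."""
    d = means.shape[1]
    S_b = np.zeros((d, d))
    for P_i, m_i in zip(priors, means):
        for P_j, m_j in zip(priors, means):
            diff = m_i - m_j
            S_b += 0.5 * P_i * P_j * np.outer(diff, diff)
    return S_b

# Illustrative values only: three class means in 2-D and their priors.
means = np.array([[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]])
priors = np.array([0.5, 0.3, 0.2])
e = np.array([0.6, 0.8])

proj = means @ e                      # projected class centers e^T m_i
lhs = 0.5 * sum(
    priors[i] * priors[j] * (proj[i] - proj[j]) ** 2
    for i in range(len(priors))
    for j in range(len(priors))
)
rhs = e @ between_scatter_pairwise(means, priors) @ e

print(np.isclose(lhs, rhs))
```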
We now prove that:
$$
S_b^{LDA} = \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \left( \boldsymbol{m}_i - \boldsymbol{m}_j \right) \left( \boldsymbol{m}_i - \boldsymbol{m}_j \right)^T = \sum_{i=1}^{L} P_i \left( \boldsymbol{m}_i - \boldsymbol{m} \right) \left( \boldsymbol{m}_i - \boldsymbol{m} \right)^T
$$
where $\boldsymbol{m}$ is the center (mean) of all sample points:
$$
\boldsymbol{m} = \frac{1}{n} \sum_{i=1}^{L} \sum_{\ell=1}^{n_i} \boldsymbol{x}_\ell^{(i)}
$$
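Transforming one side directly into the other is not easy, so we simplify each side separately and show that both reduce to the same expression.
We start with the left-hand side: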
$$
\begin{aligned}
S_b^{LDA} & = \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \left( \boldsymbol{m}_i - \boldsymbol{m}_j \right) \left( \boldsymbol{m}_i - \boldsymbol{m}_j \right)^T \\
& = \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \left( \boldsymbol{m}_i \boldsymbol{m}_i^T - \boldsymbol{m}_i \boldsymbol{m}_j^T - \boldsymbol{m}_j \boldsymbol{m}_i^T + \boldsymbol{m}_j \boldsymbol{m}_j^T \right) \\
& = \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \boldsymbol{m}_i \boldsymbol{m}_i^T - \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \boldsymbol{m}_i \boldsymbol{m}_j^T - \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \boldsymbol{m}_j \boldsymbol{m}_i^T + \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \boldsymbol{m}_j \boldsymbol{m}_j^T \\
& = \frac{1}{2} \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}_i^T \left( \sum_{j=1}^{L} P_j \right)
- \frac{1}{2} \left( \sum_{i=1}^{L} P_i \boldsymbol{m}_i \right) \left( \sum_{j=1}^{L} P_j \boldsymbol{m}_j^T \right)
- \frac{1}{2} \left( \sum_{j=1}^{L} P_j \boldsymbol{m}_j \right) \left( \sum_{i=1}^{L} P_i \boldsymbol{m}_i^T \right)
+ \frac{1}{2} \left( \sum_{i=1}^{L} P_i \right) \sum_{j=1}^{L} P_j \boldsymbol{m}_j \boldsymbol{m}_j^T
\end{aligned}
$$
Here all the prior probabilities sum to one:
$$
\left( \sum_{j=1}^{L} P_j \right) = \left( \sum_{i=1}^{L} P_i \right) = 1
$$
At the same time, with the prior taken as the class frequency $P_i = n_i / n$:
$$
\begin{aligned}
P_i \boldsymbol{m}_i & = \frac{n_i}{n} \cdot \frac{x_1^{(i)} + x_2^{(i)} + \cdots + x_{n_i}^{(i)}}{n_i} \\
& = \frac{1}{n} \left\{ x_1^{(i)} + x_2^{(i)} + \cdots + x_{n_i}^{(i)} \right\}
\end{aligned}
$$
Therefore:
$$
\begin{aligned}
\sum_{i=1}^{L} P_i \boldsymbol{m}_i & = \sum_{i=1}^{L} \frac{1}{n} \left\{ x_1^{(i)} + x_2^{(i)} + \cdots + x_{n_i}^{(i)} \right\} \\
& = \frac{1}{n} \left\{ x_1^{(1)} + \cdots + x_{n_1}^{(1)} + x_1^{(2)} + \cdots + x_{n_2}^{(2)} + \cdots + x_1^{(L)} + \cdots + x_{n_L}^{(L)} \right\} \\
& = \frac{1}{n} \sum_{i=1}^{L} \sum_{\ell=1}^{n_i} \boldsymbol{x}_\ell^{(i)} = \boldsymbol{m}
\end{aligned}
$$
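The fact that the prior-weighted class means add up to the overall mean can be checked on toy data. The sketch below assumes the priors are the class frequencies $n_i / n$ and uses three invented Gaussian classes of different sizes.

```python
import numpy as np

rng = np.random.default_rng(2)
# Three hypothetical classes with different sizes and locations (illustrative only).
classes = [
    rng.normal(loc=0.0, size=(5, 2)),
    rng.normal(loc=3.0, size=(8, 2)),
    rng.normal(loc=-2.0, size=(12, 2)),
]

n = sum(len(X) for X in classes)
priors = np.array([len(X) / n for X in classes])           # P_i = n_i / n
class_means = np.array([X.mean(axis=0) for X in classes])  # m_i

weighted_mean = priors @ class_means            # sum_i P_i m_i
overall_mean = np.vstack(classes).mean(axis=0)  # m

print(np.allclose(weighted_mean, overall_mean))
```

With priors that are not the class frequencies the two quantities generally differ, so the identity relies on that assumption.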
Substituting these two results, the left-hand side can be rewritten as:
$$
\begin{aligned}
S_b^{LDA}
& = \frac{1}{2} \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}_i^T \left( \sum_{j=1}^{L} P_j \right)
- \frac{1}{2} \left( \sum_{i=1}^{L} P_i \boldsymbol{m}_i \right) \left( \sum_{j=1}^{L} P_j \boldsymbol{m}_j^T \right)
- \frac{1}{2} \left( \sum_{j=1}^{L} P_j \boldsymbol{m}_j \right) \left( \sum_{i=1}^{L} P_i \boldsymbol{m}_i^T \right)
+ \frac{1}{2} \left( \sum_{i=1}^{L} P_i \right) \sum_{j=1}^{L} P_j \boldsymbol{m}_j \boldsymbol{m}_j^T \\
& = \frac{1}{2} \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}_i^T
- \frac{1}{2} \boldsymbol{m} \boldsymbol{m}^T - \frac{1}{2} \boldsymbol{m} \boldsymbol{m}^T
+ \frac{1}{2} \sum_{j=1}^{L} P_j \boldsymbol{m}_j \boldsymbol{m}_j^T \\
& = \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}_i^T - \boldsymbol{m} \boldsymbol{m}^T
\end{aligned}
$$
Now we simplify the right-hand side:
$$
\begin{aligned}
& \sum_{i=1}^{L} P_i \left( \boldsymbol{m}_i - \boldsymbol{m} \right) \left( \boldsymbol{m}_i - \boldsymbol{m} \right)^T \\
= & \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}_i^T - \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}^T - \sum_{i=1}^{L} P_i \boldsymbol{m} \boldsymbol{m}_i^T + \sum_{i=1}^{L} P_i \boldsymbol{m} \boldsymbol{m}^T \\
= & \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}_i^T - \left( \sum_{i=1}^{L} P_i \boldsymbol{m}_i \right) \boldsymbol{m}^T - \boldsymbol{m} \left( \sum_{i=1}^{L} P_i \boldsymbol{m}_i^T \right) + \left( \sum_{i=1}^{L} P_i \right) \boldsymbol{m} \boldsymbol{m}^T \\
= & \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}_i^T - \boldsymbol{m} \boldsymbol{m}^T - \boldsymbol{m} \boldsymbol{m}^T + \boldsymbol{m} \boldsymbol{m}^T \\
= & \sum_{i=1}^{L} P_i \boldsymbol{m}_i \boldsymbol{m}_i^T - \boldsymbol{m} \boldsymbol{m}^T
\end{aligned}
$$
Both sides reduce to the same expression, so they are equal:
$$
S_b^{LDA} = \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \left( \boldsymbol{m}_i - \boldsymbol{m}_j \right) \left( \boldsymbol{m}_i - \boldsymbol{m}_j \right)^T = \sum_{i=1}^{L} P_i \left( \boldsymbol{m}_i - \boldsymbol{m} \right) \left( \boldsymbol{m}_i - \boldsymbol{m} \right)^T
$$
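The identity can also be confirmed numerically. The following sketch builds the pairwise form, the centered form and the intermediate form $\sum_i P_i \boldsymbol{m}_i \boldsymbol{m}_i^T - \boldsymbol{m}\boldsymbol{m}^T$ from random class means and invented class counts, and checks that all three matrices coincide. The overall mean is computed as $\sum_i P_i \boldsymbol{m}_i$, which assumes priors equal to class frequencies.

```python
import numpy as np

rng = np.random.default_rng(1)
L, d = 4, 3
means = rng.normal(size=(L, d))          # hypothetical class means m_i
counts = np.array([10, 20, 30, 40])      # hypothetical class sizes n_i
priors = counts / counts.sum()           # P_i = n_i / n

m = priors @ means                       # overall mean, m = sum_i P_i m_i

# Pairwise form: 1/2 * sum_ij P_i P_j (m_i - m_j)(m_i - m_j)^T
diffs = means[:, None, :] - means[None, :, :]        # shape (L, L, d)
weights = priors[:, None] * priors[None, :]          # shape (L, L)
S_b_pairwise = 0.5 * np.einsum("ij,ijk,ijl->kl", weights, diffs, diffs)

# Centered form: sum_i P_i (m_i - m)(m_i - m)^T
centered = means - m
S_b_centered = np.einsum("i,ik,il->kl", priors, centered, centered)

# Intermediate form reached by both derivations: sum_i P_i m_i m_i^T - m m^T
S_b_moment = np.einsum("i,ik,il->kl", priors, means, means) - np.outer(m, m)

print(np.allclose(S_b_pairwise, S_b_centered))
print(np.allclose(S_b_centered, S_b_moment))
```

In practice the centered form is the cheaper one to evaluate, since it needs a single pass over the $L$ classes instead of all $L^2$ pairs.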
Within-class Scatter Matrix
The prior-weighted sum of the within-class scatter (variance) of all classes after projection is:
$$
\begin{aligned}
& \sum_{i=1}^{L} P_i \sum_{\ell=1}^{n_i} \left( e^T x_\ell^{(i)} - m_i \right)^2 \\
= & \sum_{i=1}^{L} P_i \sum_{\ell=1}^{n_i} \left( e^T x_\ell^{(i)} - e^T \boldsymbol{m}_i \right)^2 \\
= & \sum_{i=1}^{L} P_i \sum_{\ell=1}^{n_i} \left( e^T x_\ell^{(i)} - e^T \boldsymbol{m}_i \right) \left( e^T x_\ell^{(i)} - e^T \boldsymbol{m}_i \right)^T \\
= & e^T \left\{ \underbrace{\sum_{i=1}^{L} P_i \sum_{\ell=1}^{n_i} \left( x_\ell^{(i)} - \boldsymbol{m}_i \right) \left( x_\ell^{(i)} - \boldsymbol{m}_i \right)^T}_{\boldsymbol{S}_w^{LDA}} \right\} e
\end{aligned}
$$
Here $\boldsymbol{S}_w^{LDA}$ is the within-class scatter matrix; the subscript $w$ stands for "within".
$$
\boldsymbol{S}_w^{LDA} = \sum_{i=1}^{L} P_i \sum_{\ell=1}^{n_i} \left( \boldsymbol{x}_\ell^{(i)} - \boldsymbol{m}_i \right) \left( \boldsymbol{x}_\ell^{(i)} - \boldsymbol{m}_i \right)^T
$$
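A minimal sketch of this definition in NumPy follows. It assumes the priors are the class frequencies $n_i / n$, uses invented two-class data, and `within_scatter` is a hypothetical helper name. The final check compares the projected within-class scatter with the quadratic form $e^T \boldsymbol{S}_w^{LDA} e$.

```python
import numpy as np

def within_scatter(classes):
    """S_w = sum_i P_i * sum_l (x_l - m_i)(x_l - m_i)^T, with P_i = n_i / n (assumed)."""
    n = sum(len(X) for X in classes)
    d = classes[0].shape[1]
    S_w = np.zeros((d, d))
    for X in classes:
        P_i = len(X) / n
        centered = X - X.mean(axis=0)           # rows are x_l - m_i
        S_w += P_i * (centered.T @ centered)    # sum of outer products
    return S_w

rng = np.random.default_rng(3)
classes = [rng.normal(size=(30, 2)), rng.normal(loc=4.0, size=(70, 2))]
n = sum(len(X) for X in classes)
e = np.array([0.6, 0.8])

S_w = within_scatter(classes)
projected_scatter = sum(
    (len(X) / n) * np.sum((X @ e - X.mean(axis=0) @ e) ** 2) for X in classes
)

print(np.isclose(projected_scatter, e @ S_w @ e))
```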
We want the between-class (between-group) distance to be as large as possible:
$$
\frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} P_i P_j \left( m_i - m_j \right)^2 = e^T \left( \sum_{i=1}^{L} P_i \left( \boldsymbol{m}_i - \boldsymbol{m} \right) \left( \boldsymbol{m}_i - \boldsymbol{m} \right)^T \right) e = e^T \boldsymbol{S}_b^{LDA} e
$$
We want the within-class (within-group) scatter to be as small as possible:
$$
\sum_{i=1}^{L} P_i \sum_{\ell=1}^{n_i} \left( e^T x_\ell^{(i)} - m_i \right)^2 = e^T \left( \sum_{i=1}^{L} P_i \sum_{\ell=1}^{n_i} \left( x_\ell^{(i)} - \boldsymbol{m}_i \right) \left( x_\ell^{(i)} - \boldsymbol{m}_i \right)^T \right) e = e^T \boldsymbol{S}_w^{LDA} e
$$
The two goals cannot be treated separately, so the objective function is chosen as:
$$
e = \arg\max_{e \in R^d} \frac{e^T S_b^{LDA} e}{e^T S_w^{LDA} e}
$$
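This makes the numerator as large as possible and the denominator as small as possible; in other words, the ratio of between-class distance to within-class distance should be as large as possible.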
This ratio is known as the (generalized) Rayleigh quotient:
$$
r(e) = \frac{e^T S_b^{LDA} e}{e^T S_w^{LDA} e}
$$
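As a small illustration, the quotient is easy to evaluate directly. The matrices below are arbitrary symmetric examples, not scatter matrices computed from data, and `rayleigh_quotient` is a hypothetical helper name. The sketch also shows that rescaling `e` leaves the value unchanged, so only the direction of `e` matters.

```python
import numpy as np

def rayleigh_quotient(e, S_b, S_w):
    """r(e) = (e^T S_b e) / (e^T S_w e)."""
    return (e @ S_b @ e) / (e @ S_w @ e)

# Arbitrary symmetric example matrices (S_w positive definite).
S_b = np.array([[2.0, 0.5], [0.5, 1.0]])
S_w = np.array([[1.0, 0.2], [0.2, 3.0]])
e = np.array([1.0, -1.0])

r_original = rayleigh_quotient(e, S_b, S_w)
r_rescaled = rayleigh_quotient(5.0 * e, S_b, S_w)

print(np.isclose(r_original, r_rescaled))
```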
The maximizer must be a stationary point of $r(e)$, so we set the gradient to zero:
$$
\begin{aligned}
\nabla r(\boldsymbol{e}) & = \frac{1}{\left( \boldsymbol{e}^T \boldsymbol{S}_w^{LDA} \boldsymbol{e} \right)^2} \left\{ \left( \boldsymbol{e}^T \boldsymbol{S}_w^{LDA} \boldsymbol{e} \right) \left( 2 \boldsymbol{S}_b^{LDA} \boldsymbol{e} \right) - \left( \boldsymbol{e}^T \boldsymbol{S}_b^{LDA} \boldsymbol{e} \right) \left( 2 \boldsymbol{S}_w^{LDA} \boldsymbol{e} \right) \right\} \\
& = \frac{1}{e^T S_w^{LDA} e} \left\{ \underbrace{1 \cdot 2 S_b^{LDA} e - \frac{e^T S_b^{LDA} e}{e^T S_w^{LDA} e} \, 2 S_w^{LDA} e}_{0} \right\} = 0
\end{aligned}
$$
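The gradient formula can be sanity-checked against central finite differences. This is only a sketch: the matrices are arbitrary symmetric examples (the formula $\nabla(e^T S e) = 2 S e$ relies on symmetry), and the function names are hypothetical.

```python
import numpy as np

def rayleigh_quotient(e, S_b, S_w):
    return (e @ S_b @ e) / (e @ S_w @ e)

def rayleigh_gradient(e, S_b, S_w):
    """Analytic gradient from the quotient rule, valid for symmetric S_b and S_w."""
    num = e @ S_b @ e
    den = e @ S_w @ e
    return (den * (2 * S_b @ e) - num * (2 * S_w @ e)) / den**2

def numerical_gradient(f, e, h=1e-6):
    """Central finite-difference approximation of the gradient of f at e."""
    grad = np.zeros_like(e)
    for k in range(e.size):
        step = np.zeros_like(e)
        step[k] = h
        grad[k] = (f(e + step) - f(e - step)) / (2 * h)
    return grad

S_b = np.array([[2.0, 0.5], [0.5, 1.0]])
S_w = np.array([[1.0, 0.2], [0.2, 3.0]])
e = np.array([0.7, -1.3])

analytic = rayleigh_gradient(e, S_b, S_w)
numeric = numerical_gradient(lambda v: rayleigh_quotient(v, S_b, S_w), e)

print(np.allclose(analytic, numeric, atol=1e-6))
```

The analytic gradient is always orthogonal to `e`, which reflects the scale invariance of the quotient.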
That is:
$$
S_b^{LDA} e = \frac{e^T S_b^{LDA} e}{e^T S_w^{LDA} e} S_w^{LDA} e = r(e) \, S_w^{LDA} e
$$
This is a generalized eigenvalue problem, namely:
$$
\boldsymbol{S}_b^{LDA} \boldsymbol{e} = \lambda \boldsymbol{S}_w^{LDA} \boldsymbol{e}
$$
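Here the right-hand side is no longer the eigenvector alone but the eigenvector multiplied by a matrix, $\boldsymbol{S}_w^{LDA}$, so $\lambda$ is a generalized eigenvalue of $\boldsymbol{S}_b^{LDA}$ with respect to $\boldsymbol{S}_w^{LDA}$.
In engineering practice, the problem is solved by converting it as follows: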
$$
\text{pinv}\left( \boldsymbol{S}_w^{LDA} \right) \boldsymbol{S}_b^{LDA} \boldsymbol{e} = \lambda \boldsymbol{e}
$$
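and then applying a standard eigenvalue solver. Since $\lambda = r(e)$, the eigenvector belonging to the largest eigenvalue is the direction that maximizes the objective.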
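Putting the pieces together, the pseudo-inverse approach can be sketched with `np.linalg.pinv` and `np.linalg.eig`. The two Gaussian classes, the class-frequency priors and the helper name `lda_direction` are assumptions made for illustration; this is one possible implementation, not a reference one.

```python
import numpy as np

def lda_direction(classes):
    """Return (e, lam, S_b, S_w) for a list of (n_i, d) sample arrays."""
    n = sum(len(X) for X in classes)
    m = np.vstack(classes).mean(axis=0)           # overall mean
    d = m.shape[0]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for X in classes:
        P_i = len(X) / n                          # assumed prior: class frequency
        m_i = X.mean(axis=0)
        S_b += P_i * np.outer(m_i - m, m_i - m)   # between-class scatter
        centered = X - m_i
        S_w += P_i * (centered.T @ centered)      # within-class scatter
    # Ordinary eigenproblem: pinv(S_w) S_b e = lambda e
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    top = np.argmax(eigvals.real)                 # largest eigenvalue
    e = eigvecs[:, top].real
    return e / np.linalg.norm(e), eigvals[top].real, S_b, S_w

rng = np.random.default_rng(0)
cov = [[3.0, 2.5], [2.5, 3.0]]                    # elongated, correlated classes
classes = [
    rng.multivariate_normal([0.0, 0.0], cov, size=100),
    rng.multivariate_normal([2.0, -2.0], cov, size=100),
]

e, lam, S_b, S_w = lda_direction(classes)
r = (e @ S_b @ e) / (e @ S_w @ e)                 # Rayleigh quotient at the solution
print("direction:", e)
print("lambda:", lam, "r(e):", r)
```

The pseudo-inverse keeps the computation defined when $\boldsymbol{S}_w^{LDA}$ is singular, for example when there are fewer samples than dimensions. When $\boldsymbol{S}_w^{LDA}$ is invertible, as it is here, the result satisfies the generalized eigenvalue equation directly.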