Why Do We Need the Kalman Filter?
As shown in the figure above, this is a typical measurement model: let $y$ be the observed value and $x$ the hidden variable. For example, suppose $x$ is the temperature of a rocket's fuel. Unfortunately, the fuel's interior is far too hot to measure directly, so we can only measure the temperature $y$ at the rocket's outer shell, and every measurement carries random error. How to infer the true $x$ using only the observed data $y$ is exactly what the Kalman filter does.
State Space Model
This graphical model involves two kinds of probabilities. The first is $p(y_t|x_t)$, called the measurement probability or emission probability; the second is $p(x_t|x_{t-1})$, called the transition probability. Together with an initial probability $p(x_1)$, these fully specify the state space model. Choosing different forms for these probabilities yields different models:
HMM: (i) $p(x_t|x_{t-1}) = A_{x_{t-1},x_t}$, where $A$ is a discrete transition matrix and $x$ is discrete; (ii) $p(y_t|x_t)$ is arbitrary; (iii) $p(x_1) = \pi$.
Linear-Gaussian SSM: (i) $p(x_t|x_{t-1}) = \mathcal{N}(Ax_{t-1}+B,\,Q)$; (ii) $p(y_t|x_t) = \mathcal{N}(Hx_t+C,\,R)$; (iii) $p(x_1) = \mathcal{N}(\mu_0, \Sigma_0)$.
For a dynamic model of this kind, i.e. a State Space Model (SSM), there are four main tasks:
Evaluation: $p(y_1, y_2, \ldots, y_t)$
Parameter learning: $\arg\max_{\theta}\log p(y_1, y_2, \ldots, y_t|\theta)$
State decoding: $p(x_1, x_2, \ldots, x_t|y_1, y_2, \ldots, y_t)$
Filtering: $p(x_t|y_1, y_2, \ldots, y_t)$
Both the HMM and the linear-Gaussian SSM can in principle perform all four tasks, but HMMs tend to be used more for the first two, while linear-Gaussian SSMs are mostly used for filtering. The Kalman filter we discuss here solves the fourth task, filtering.
Now let us write down the linear-Gaussian SSM formally:
$$
\begin{aligned}
x_t &= Ax_{t-1} + B + w, \quad w \sim \mathcal{N}(0, Q)\\
y_t &= Hx_t + C + v, \quad v \sim \mathcal{N}(0, R)\\
\mathrm{cov}&(x_{t-1}, w) = 0, \quad \mathrm{cov}(x_t, v) = 0, \quad \mathrm{cov}(w, v) = 0
\end{aligned}
$$
Thus $p(x_t|x_{t-1}) = \mathcal{N}(Ax_{t-1}+B,\,Q)$ and $p(y_t|x_t) = \mathcal{N}(Hx_t+C,\,R)$. For convenience, we drop $B$ and $C$ in the derivation that follows.
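To make this concrete, here is a minimal simulation sketch of the model above (with $B$ and $C$ dropped). Everything concrete in it, the 2-D state, the particular `A`, `H`, `Q`, `R`, and the horizon, is an illustrative assumption, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # state transition (e.g. position/velocity)
H = np.array([[1.0, 0.0]])   # we observe only the first state component
Q = 0.01 * np.eye(2)         # process noise covariance, w ~ N(0, Q)
R = np.array([[1.0]])        # measurement noise covariance, v ~ N(0, R)

T = 50
x = np.zeros((T, 2))
y = np.zeros((T, 1))
x[0] = rng.multivariate_normal(np.zeros(2), np.eye(2))  # x_1 ~ N(mu_0, Sigma_0)
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(2), Q)  # x_t = A x_{t-1} + w
for t in range(T):
    y[t] = H @ x[t] + rng.multivariate_normal(np.zeros(1), R)      # y_t = H x_t + v
```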
Mathematical Derivation of the Kalman Filter
Now let us see how to do filtering. Consider the filtering distribution:
$$
\begin{aligned}
\underbrace{p(x_t|y_1,y_2,\ldots,y_t)}_{\text{filtering at } t}
&= \frac{p(x_t,y_1,y_2,\ldots,y_t)}{p(y_1,y_2,\ldots,y_t)}\\
&= \frac{p(y_t|x_t,y_1,y_2,\ldots,y_{t-1})\, p(x_t,y_1,y_2,\ldots,y_{t-1})}{p(y_1,y_2,\ldots,y_t)}\\
&= \frac{p(y_t|x_t)\, p(x_t|y_1,y_2,\ldots,y_{t-1})\, p(y_1,y_2,\ldots,y_{t-1})}{p(y_1,y_2,\ldots,y_t)}\\
&\propto p(y_t|x_t)\underbrace{p(x_t|y_1,y_2,\ldots,y_{t-1})}_{\text{prediction at } t}
\end{aligned}
$$
The proportionality holds because the observations $y$ are given, so $p(y_1,\ldots,y_t)$ can be treated as a constant; the third line also uses the conditional independence $p(y_t|x_t,y_1,\ldots,y_{t-1}) = p(y_t|x_t)$ implied by the graphical model. We thus see that filtering is the product of a prediction term and an emission probability. Expanding the prediction term:
$$
\underbrace{p(x_t|y_1,y_2,\ldots,y_{t-1})}_{\text{prediction at } t}
= \int p(x_t|x_{t-1})\underbrace{p(x_{t-1}|y_1,y_2,\ldots,y_{t-1})}_{\text{filtering at } t-1}\, dx_{t-1}
= \mathcal{N}\!\left(A\,E(x_{t-1}|y_1,\ldots,y_{t-1}),\; A\hat{\Sigma}_{t-1}A^T + Q\right)
$$
Something remarkable has happened: the prediction at time $t$ can be computed exactly from the filtering result at time $t-1$. This forms a recursion, and as long as we iterate, the entire filtering sequence can be computed! The basic routine is:
Compute the prediction at time $t$, which uses the filtering result at time $t-1$;
Compute the filtering at time $t$, which in turn uses the prediction at time $t$.
The iteration then starts from $t=1$ and proceeds forward:
$$
\begin{aligned}
t=1:\quad &(\text{Filter}) & p(x_1|y_1) &\sim \mathcal{N}(\hat{\mu}_1, \hat{\Sigma}_1)\\
t=2:\quad &(\text{Predict}) & p(x_2|y_1) &\sim \mathcal{N}(\overline{\mu}_2, \overline{\Sigma}_2)\\
&(\text{Filter}) & p(x_2|y_1,y_2) &\sim \mathcal{N}(\hat{\mu}_2, \hat{\Sigma}_2)\\
t=3:\quad &(\text{Predict}) & p(x_3|y_1,y_2) &\sim \mathcal{N}(\overline{\mu}_3, \overline{\Sigma}_3)\\
&(\text{Filter}) & p(x_3|y_1,y_2,y_3) &\sim \mathcal{N}(\hat{\mu}_3, \hat{\Sigma}_3)
\end{aligned}
$$
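To preview where this schedule is heading, here is a minimal scalar (1-D) sketch of exactly this alternation. The numbers `a`, `h`, `q`, `r` and the fake observations are assumptions for illustration, and the update formulas used are the ones derived in the rest of this section:

```python
import numpy as np

a, h, q, r = 1.0, 1.0, 0.1, 1.0
mu_hat, sigma_hat = 0.0, 1.0      # filter at t=1: N(mu_hat_1, sigma_hat_1)
ys = [1.1, 0.9, 1.3]              # fake observations y_2, y_3, y_4

for y in ys:
    # Predict at t: N(mu_bar_t, sigma_bar_t) from the filter at t-1
    mu_bar = a * mu_hat
    sigma_bar = a * sigma_hat * a + q
    # Filter at t: condition the joint Gaussian of (x_t, y_t) on y_t
    s = h * sigma_bar * h + r     # innovation variance
    k = sigma_bar * h / s         # scalar Kalman gain
    mu_hat = mu_bar + k * (y - h * mu_bar)
    sigma_hat = sigma_bar - k * h * sigma_bar
    print(mu_hat, sigma_hat)
```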
First, remember a fact: if a set of random variables is jointly Gaussian, then all of their marginal and conditional distributions are Gaussian as well. Here, since the SSM is linear-Gaussian, all of the conditional distributions above are Gaussian. Remember one more fact: when the joint distribution is Gaussian, the conditional distribution takes the following form:
$$
\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}
\sim \mathcal{N}\!\left(\begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix},
\begin{pmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{pmatrix}\right)
$$
$$
p(\mathbf{x}_1|\mathbf{x}_2) = \mathcal{N}\!\left(\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\;
\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\right)
$$
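As a sketch, this fact translates directly into code. The function below (the name and the split parameter `d1` are my own) conditions a jointly Gaussian vector, split after index `d1`, on its second block:

```python
import numpy as np

def condition_gaussian(mu, Sigma, d1, x2):
    """Return mean and covariance of x1 | x2 for a joint Gaussian split at d1."""
    mu1, mu2 = mu[:d1], mu[d1:]
    S11 = Sigma[:d1, :d1]
    S12 = Sigma[:d1, d1:]
    S21 = Sigma[d1:, :d1]
    S22 = Sigma[d1:, d1:]
    gain = S12 @ np.linalg.inv(S22)       # Sigma_12 Sigma_22^{-1}
    mu_cond = mu1 + gain @ (x2 - mu2)     # conditional mean
    Sigma_cond = S11 - gain @ S21         # conditional covariance
    return mu_cond, Sigma_cond
```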
Good. Now recall that our goal is to derive the concrete form of the following two distributions:
$$
\begin{aligned}
\text{prediction:}\quad & p(x_t|y_1,y_2,\ldots,y_{t-1})\\
\text{filtering:}\quad & p(x_t|y_1,y_2,\ldots,y_t)
\end{aligned}
$$
Consider the prediction first. To predict the true state $x_t$ at time $t$, the definition of the SSM tells us we need the true value $x_{t-1}$ at time $t-1$, which is given by the filtering at time $t-1$. So what does the time-$(t-1)$ filter look like? Let it be $p(x_{t-1}|y_1,y_2,\ldots,y_{t-1}) \sim \mathcal{N}(\hat{\mu}_{t-1}, \hat{\Sigma}_{t-1})$, and reparameterize it:
$$
x_{t-1}|y_1,y_2,\ldots,y_{t-1} = E(x_{t-1}|y_1,\ldots,y_{t-1}) + \Delta x_{t-1},
\quad \Delta x_{t-1} \sim \mathcal{N}(0, \hat{\Sigma}_{t-1})
$$
Because this distribution is Gaussian, we can write it as its mean plus a zero-mean Gaussian variable. The prediction at time $t$ can then be computed as follows:
$$
\begin{aligned}
x_t|y_1,y_2,\ldots,y_{t-1} &= Ax_{t-1} + w\\
&= A\big(E(x_{t-1}|y_1,\ldots,y_{t-1}) + \Delta x_{t-1}\big) + w\\
&= \underbrace{A\,E(x_{t-1}|y_1,\ldots,y_{t-1})}_{E(x_t|y_1,\ldots,y_{t-1})} + \underbrace{A\Delta x_{t-1} + w}_{\Delta x_t}
\end{aligned}
$$
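The prediction step is now mechanical. A minimal sketch, assuming `mu_hat` and `Sigma_hat` hold the time-$(t-1)$ filter $\hat{\mu}_{t-1}, \hat{\Sigma}_{t-1}$ (names are mine):

```python
import numpy as np

def predict(mu_hat, Sigma_hat, A, Q):
    mu_bar = A @ mu_hat                  # E(x_t | y_1..y_{t-1}) = A E(x_{t-1} | ...)
    Sigma_bar = A @ Sigma_hat @ A.T + Q  # E(dx_t dx_t^T) = A Sigma_hat A^T + Q
    return mu_bar, Sigma_bar
```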
Now we have the prediction at time $t$; how do we obtain the filtering at time $t$? The joint-Gaussian conditioning trick introduced above comes into play: if we can write out the joint distribution
$$
p(x_t, y_t|y_1,y_2,\ldots,y_{t-1})
$$
then the formula immediately gives us the form of $p(x_t|y_1,y_2,\ldots,y_{t-1},y_t)$. To build this joint distribution we also need the distribution $p(y_t|y_1,y_2,\ldots,y_{t-1})$, so we follow the same routine:
$$
\begin{aligned}
y_t|y_1,y_2,\ldots,y_{t-1} &= Hx_t + v\\
&= H(Ax_{t-1} + w) + v\\
&= H\big(A\,E(x_{t-1}|y_1,\ldots,y_{t-1}) + A\Delta x_{t-1} + w\big) + v\\
&= \underbrace{HA\,E(x_{t-1}|y_1,\ldots,y_{t-1})}_{E(y_t|y_1,y_2,\ldots,y_{t-1})} + \underbrace{HA\Delta x_{t-1} + Hw + v}_{\Delta y_t}
\end{aligned}
$$
We now know the forms of two of the Gaussians:
$$
\begin{aligned}
p(x_t|y_1,y_2,\ldots,y_{t-1}) &\sim \mathcal{N}\!\Big(A\,E(x_{t-1}|y_1,\ldots,y_{t-1}),\; \underbrace{E\left(\Delta x_t \Delta x_t^T\right)}_{\overline{\Sigma}_t}\Big)\\
p(y_t|y_1,y_2,\ldots,y_{t-1}) &\sim \mathcal{N}\!\Big(HA\,E(x_{t-1}|y_1,\ldots,y_{t-1}),\; \underbrace{E\left(\Delta y_t \Delta y_t^T\right)}_{S_t}\Big)
\end{aligned}
$$
(We write the covariance of $\Delta y_t$ as $S_t$, reserving $\hat{\Sigma}_t$ for the filtered covariance at time $t$.)
Therefore,
$$
p(x_t, y_t|y_1,y_2,\ldots,y_{t-1}) = \mathcal{N}\!\left(
\begin{pmatrix} A\,E(x_{t-1}|y_1,\ldots,y_{t-1})\\ HA\,E(x_{t-1}|y_1,\ldots,y_{t-1}) \end{pmatrix},
\begin{pmatrix} E\left(\Delta x_t \Delta x_t^T\right) & E\left(\Delta x_t \Delta y_t^T\right)\\
E\left(\Delta y_t \Delta x_t^T\right) & E\left(\Delta y_t \Delta y_t^T\right) \end{pmatrix}
\right)
$$
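Given this joint Gaussian, the filtering update is just the conditioning formula from before. A sketch (function and variable names are mine; the two covariance blocks use the closed forms worked out below):

```python
import numpy as np

def update(mu_bar, Sigma_bar, y_t, H, R):
    S = H @ Sigma_bar @ H.T + R       # E(dy_t dy_t^T), derived below
    C = Sigma_bar @ H.T               # E(dx_t dy_t^T), derived below
    gain = C @ np.linalg.inv(S)       # Sigma_12 Sigma_22^{-1} in the formula
    mu_hat = mu_bar + gain @ (y_t - H @ mu_bar)   # conditional mean
    Sigma_hat = Sigma_bar - gain @ C.T            # conditional covariance
    return mu_hat, Sigma_hat
```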
Now we can finally compute $p(x_t|y_1,y_2,\ldots,y_t)$. The remaining question is what its covariance blocks are, and we can simplify them as follows:
$$
\begin{aligned}
\Delta x_t \Delta x_t^T &= (A\Delta x_{t-1} + w)(A\Delta x_{t-1} + w)^T\\
&= (A\Delta x_{t-1} + w)\left(\Delta x_{t-1}^T A^T + w^T\right)\\
&= A\Delta x_{t-1}\Delta x_{t-1}^T A^T + \underbrace{A\Delta x_{t-1} w^T + w\Delta x_{t-1}^T A^T}_{\mathrm{cov}=0} + ww^T\\
E\left(\Delta x_t \Delta x_t^T\right) &= A\,E\left(\Delta x_{t-1}\Delta x_{t-1}^T\right)A^T + E\left(ww^T\right)\\
&= A\hat{\Sigma}_{t-1}A^T + Q
\end{aligned}
$$
Similarly, for $y$:
$$
\begin{aligned}
\Delta y_t \Delta y_t^T &= (HA\Delta x_{t-1} + Hw + v)(HA\Delta x_{t-1} + Hw + v)^T\\
&= (HA\Delta x_{t-1} + Hw + v)\left(\Delta x_{t-1}^T A^T H^T + w^T H^T + v^T\right)\\
&= HA\Delta x_{t-1}\Delta x_{t-1}^T A^T H^T + Hww^T H^T + vv^T + \underbrace{\cdots}_{\mathrm{cov}=0}\\
S_t = E\left(\Delta y_t \Delta y_t^T\right) &= HA\,E\left(\Delta x_{t-1}\Delta x_{t-1}^T\right)A^T H^T + H\,E\left(ww^T\right)H^T + E\left(vv^T\right)\\
&= HA\hat{\Sigma}_{t-1}A^T H^T + HQH^T + R\\
&= H\overline{\Sigma}_t H^T + R
\end{aligned}
$$
For the cross term:
$$
\begin{aligned}
\Delta x_t \Delta y_t^T &= (A\Delta x_{t-1} + w)(HA\Delta x_{t-1} + Hw + v)^T\\
&= (A\Delta x_{t-1} + w)\left(\Delta x_{t-1}^T A^T H^T + w^T H^T + v^T\right)\\
&= A\Delta x_{t-1}\Delta x_{t-1}^T A^T H^T + ww^T H^T + \underbrace{\cdots}_{\mathrm{cov}=0}\\
E\left(\Delta x_t \Delta y_t^T\right) &= A\,E\left(\Delta x_{t-1}\Delta x_{t-1}^T\right)A^T H^T + E\left(ww^T\right)H^T\\
&= \left(A\hat{\Sigma}_{t-1}A^T + Q\right)H^T\\
&= \overline{\Sigma}_t H^T
\end{aligned}
$$
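Substituting these blocks into the Gaussian conditioning formula yields the familiar textbook form of the update; this compact form is standard, though not spelled out above. Writing $\overline{\mu}_t := A\,E(x_{t-1}|y_1,\ldots,y_{t-1})$ and defining the Kalman gain $K_t := \overline{\Sigma}_t H^T S_t^{-1}$,

$$
\begin{aligned}
\hat{\mu}_t &= \overline{\mu}_t + K_t\,(y_t - H\overline{\mu}_t)\\
\hat{\Sigma}_t &= \overline{\Sigma}_t - K_t H \overline{\Sigma}_t,
\qquad K_t = \overline{\Sigma}_t H^T \left(H\overline{\Sigma}_t H^T + R\right)^{-1}
\end{aligned}
$$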
This completes the derivation: we can now iterate the Kalman filter indefinitely! To summarize, the overall flow is $\text{filtering}_1 \rightarrow \text{prediction}_2 \rightarrow \text{filtering}_2 \rightarrow \text{prediction}_3 \rightarrow \cdots$
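Putting everything together, here is an end-to-end sketch of that flow on simulated data. The model matrices reuse the illustrative values from the earlier simulation sketch; none of them come from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[1.0]])

# Simulate the SSM
T = 50
x = np.zeros((T, 2))
y = np.zeros((T, 1))
x[0] = rng.multivariate_normal(np.zeros(2), np.eye(2))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(2), Q)
for t in range(T):
    y[t] = H @ x[t] + rng.multivariate_normal(np.zeros(1), R)

# Run the recursion: filtering_1 -> prediction_2 -> filtering_2 -> ...
mu, Sigma = np.zeros(2), np.eye(2)        # prior on x_1
est = np.zeros((T, 2))
for t in range(T):
    if t > 0:                             # prediction at t (none at t=1)
        mu, Sigma = A @ mu, A @ Sigma @ A.T + Q
    S = H @ Sigma @ H.T + R               # innovation covariance S_t
    K = Sigma @ H.T @ np.linalg.inv(S)    # Kalman gain
    mu = mu + K @ (y[t] - H @ mu)         # filtering at t
    Sigma = Sigma - K @ H @ Sigma
    est[t] = mu

print("filter RMSE:", np.sqrt(np.mean((est[:, 0] - x[:, 0]) ** 2)))
print("raw obs RMSE:", np.sqrt(np.mean((y[:, 0] - x[:, 0]) ** 2)))
```

On this toy model the filtered estimate should track the hidden state noticeably better than the raw observations, which is the whole point of the recursion.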
Appendix: The Multivariate Gaussian
Suppose $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2)$ is jointly Gaussian with parameters
$$
\boldsymbol{\mu} = \begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix},\quad
\boldsymbol{\Sigma} = \begin{pmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{pmatrix},\quad
\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1} = \begin{pmatrix}\boldsymbol{\Lambda}_{11} & \boldsymbol{\Lambda}_{12}\\ \boldsymbol{\Lambda}_{21} & \boldsymbol{\Lambda}_{22}\end{pmatrix}
$$
Then the marginal distributions are
$$
\begin{aligned}
p(\mathbf{x}_1) &= \mathcal{N}(\mathbf{x}_1|\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})\\
p(\mathbf{x}_2) &= \mathcal{N}(\mathbf{x}_2|\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})
\end{aligned}
$$
and the conditional distribution is:
$$
\begin{aligned}
p(\mathbf{x}_1|\mathbf{x}_2) &= \mathcal{N}(\mathbf{x}_1|\boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2})\\
\boldsymbol{\mu}_{1|2} &= \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)\\
&= \boldsymbol{\mu}_1 - \boldsymbol{\Lambda}_{11}^{-1}\boldsymbol{\Lambda}_{12}(\mathbf{x}_2 - \boldsymbol{\mu}_2)\\
&= \boldsymbol{\Sigma}_{1|2}\big(\boldsymbol{\Lambda}_{11}\boldsymbol{\mu}_1 - \boldsymbol{\Lambda}_{12}(\mathbf{x}_2 - \boldsymbol{\mu}_2)\big)\\
\boldsymbol{\Sigma}_{1|2} &= \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}
\end{aligned}
$$
This conditional distribution is extremely important!
How is it derived? The derivation is fairly direct: use the factorization
$$
p(\mathbf{x}_1, \mathbf{x}_2) = p(\mathbf{x}_1|\mathbf{x}_2)\, p(\mathbf{x}_2)
$$
As long as we write out both $p(\mathbf{x}_1, \mathbf{x}_2)$ and $p(\mathbf{x}_2)$, the form of $p(\mathbf{x}_1|\mathbf{x}_2)$ follows. First, the exponent of $p(\mathbf{x}_1, \mathbf{x}_2)$ is:
$$
E = \exp\left\{-\frac{1}{2}\begin{pmatrix}\mathbf{x}_1 - \boldsymbol{\mu}_1\\ \mathbf{x}_2 - \boldsymbol{\mu}_2\end{pmatrix}^T
\begin{pmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{pmatrix}^{-1}
\begin{pmatrix}\mathbf{x}_1 - \boldsymbol{\mu}_1\\ \mathbf{x}_2 - \boldsymbol{\mu}_2\end{pmatrix}\right\}
$$
Expanding the inverse with the block (Schur complement) factorization gives:
$$
\begin{aligned}
E = {} & \exp\Bigg\{-\frac{1}{2}\begin{pmatrix}\mathbf{x}_1 - \boldsymbol{\mu}_1\\ \mathbf{x}_2 - \boldsymbol{\mu}_2\end{pmatrix}^T
\begin{pmatrix}\mathbf{I} & \mathbf{0}\\ -\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21} & \mathbf{I}\end{pmatrix}
\begin{pmatrix}(\boldsymbol{\Sigma}/\boldsymbol{\Sigma}_{22})^{-1} & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}_{22}^{-1}\end{pmatrix}
\begin{pmatrix}\mathbf{I} & -\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\\ \mathbf{0} & \mathbf{I}\end{pmatrix}
\begin{pmatrix}\mathbf{x}_1 - \boldsymbol{\mu}_1\\ \mathbf{x}_2 - \boldsymbol{\mu}_2\end{pmatrix}\Bigg\}\\
= {} & \exp\Bigg\{-\frac{1}{2}\big(\underbrace{\mathbf{x}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)}_{\mathbf{x}_1 - \boldsymbol{\mu}_{1|2}}\big)^T
\big(\underbrace{\boldsymbol{\Sigma}/\boldsymbol{\Sigma}_{22}}_{\boldsymbol{\Sigma}_{1|2}}\big)^{-1}
\big(\underbrace{\mathbf{x}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)}_{\mathbf{x}_1 - \boldsymbol{\mu}_{1|2}}\big)\Bigg\}\\
& \times \exp\left\{-\frac{1}{2}(\mathbf{x}_2 - \boldsymbol{\mu}_2)^T\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)\right\}\\
= {} & p(\mathbf{x}_1|\mathbf{x}_2)\, p(\mathbf{x}_2)
\end{aligned}
$$
In the end we find that the mean and covariance of the conditional Gaussian are exactly
$$
\begin{aligned}
\boldsymbol{\mu}_{1|2} &= \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)\\
\boldsymbol{\Sigma}_{1|2} &= \boldsymbol{\Sigma}/\boldsymbol{\Sigma}_{22} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}
\end{aligned}
$$
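As a quick sanity check, these identities, including the precision-matrix forms $\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Lambda}_{11}^{-1}$ and $\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 - \boldsymbol{\Lambda}_{11}^{-1}\boldsymbol{\Lambda}_{12}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$, can be verified numerically on an arbitrary made-up joint:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
Sigma = M @ M.T + 3.0 * np.eye(3)   # random symmetric positive-definite cov
mu = np.array([0.5, -1.0, 2.0])
d1 = 2                              # x1 = first two coordinates, x2 = the rest
x2 = np.array([1.5])

S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]
mu_12 = mu[:d1] + S12 @ np.linalg.solve(S22, x2 - mu[d1:])
Sigma_12 = S11 - S12 @ np.linalg.solve(S22, S21)

# Cross-check against the precision-matrix forms above
Lam = np.linalg.inv(Sigma)
assert np.allclose(Sigma_12, np.linalg.inv(Lam[:d1, :d1]))
assert np.allclose(mu_12, mu[:d1] - np.linalg.solve(Lam[:d1, :d1],
                                                    Lam[:d1, d1:] @ (x2 - mu[d1:])))
print("conditional-Gaussian identities verified")
```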
References
The approach of this article mainly follows Yida Xu's (徐亦達) course: 徐亦達 卡爾曼濾波
https://en.wikipedia.org/wiki/Kalman_filter