The Kalman filter is a classic state estimation technique that has found application in many areas. In this simple tutorial, I will try to explain the Kalman filter in an intuitive way. This is the most basic introduction to the Kalman filter and essentially how I learned it. Before getting to the Kalman filter, I will first review some basic material that we need.


Prerequisites

Let $x_i$ be a random variable with probability density function $p_i(x)$ whose mean and variance are $\mu_i$ and $\sigma_i^2$, respectively. We write $x_i \sim p_i(\mu_i, \sigma_i^2)$.

Assuming a set of pairwise uncorrelated random variables $x_1 \sim p_1(\mu_1, \sigma_1^2), \ldots, x_n \sim p_n(\mu_n, \sigma_n^2)$, if $y$ is a random variable where $y = \sum_{i=1}^n \alpha_i x_i$, then the mean and variance of $y$ are

$$\mu_y = \sum_{i=1}^n \alpha_i \mu_i, \qquad \sigma_y^2 = \sum_{i=1}^n \alpha_i^2 \sigma_i^2$$
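
As a quick sanity check, here is a small Monte Carlo sketch in Python (using NumPy; the particular means, variances, and weights are arbitrary choices for illustration) that compares the empirical mean and variance of such a linear combination against the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three uncorrelated (here: independent) random variables and their weights.
mus = np.array([1.0, -2.0, 0.5])
sigmas = np.array([0.3, 1.2, 0.8])
alphas = np.array([0.2, 0.5, 0.3])

# Draw many samples and form y = sum_i alpha_i * x_i for each sample.
samples = rng.normal(mus, sigmas, size=(1_000_000, 3))
y = samples @ alphas

# Compare the empirical moments with the closed-form expressions.
print("mean:", y.mean(), "expected:", alphas @ mus)
print("var :", y.var(), "expected:", np.sum(alphas**2 * sigmas**2))
```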


Fusing two variables

Now, imagine that we want to measure a quantity $y$ and we have two totally different devices that use different methods: one is based on an old method, say, and its results are reported as $x_1 \sim p_1(\mu_1, \sigma_1^2)$, and the other uses a new method and its results are reported as $x_2 \sim p_2(\mu_2, \sigma_2^2)$. Now the question is how to combine these two different measurements to create an optimal estimator for $y$. The simplest way is to combine the results linearly as $y = \alpha x_1 + \beta x_2$. A reasonable requirement is that if the two estimates $x_1$ and $x_2$ give the same result, then this linear combination should return that same result. This implies that $\alpha + \beta = 1$. So far, our linear estimator becomes

$$y_\alpha(x_1, x_2) = \alpha x_1 + (1-\alpha) x_2$$

But what value should we pick for $\alpha$? One reasonable criterion is to choose the value of $\alpha$ that minimizes the variance of $y_\alpha$. The variance of $y_\alpha$ is

$$\sigma_y^2 = \alpha^2 \sigma_1^2 + (1-\alpha)^2 \sigma_2^2$$
$$\frac{d}{d\alpha}\sigma_y^2 = 2\alpha\sigma_1^2 - 2(1-\alpha)\sigma_2^2 = 0 \quad\Rightarrow\quad \alpha = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}$$

Since the second derivative is positive, this value of $\alpha$ minimizes the variance. The estimator then becomes

$$y(x_1,x_2) = \frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2}\,x_1 + \frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\,x_2$$
$$y(x_1,x_2) = \frac{1/\sigma_1^2}{1/\sigma_1^2+1/\sigma_2^2}\,x_1 + \frac{1/\sigma_2^2}{1/\sigma_1^2+1/\sigma_2^2}\,x_2, \qquad \sigma_y^2 = \frac{1}{1/\sigma_1^2+1/\sigma_2^2}$$
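
A minimal sketch of this two-sensor fusion rule (the function name `fuse_two` and the numbers are only illustrative):

```python
def fuse_two(x1, var1, x2, var2):
    """Fuse two uncorrelated scalar estimates by inverse-variance weighting."""
    alpha = var2 / (var1 + var2)              # weight on x1
    y = alpha * x1 + (1.0 - alpha) * x2       # fused estimate
    var_y = 1.0 / (1.0 / var1 + 1.0 / var2)   # fused variance, never larger than either input
    return y, var_y

# Example: an old, noisy device and a new, more precise one measuring the same quantity.
print(fuse_two(10.2, 4.0, 9.8, 1.0))  # the fused value lies closer to the more precise sensor
```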


Fusing multiple variables

The above argument can be extended to multiple scalar estimates. Let $x_i \sim p_i(\mu_i,\sigma_i^2)$ be a set of pairwise uncorrelated random variables, and consider the unbiased linear estimator $y = \sum_{i=1}^n \alpha_i x_i$ with $\sum_{i=1}^n \alpha_i = 1$. Using Lagrange multipliers, we minimize

$$f(\alpha_1,\ldots,\alpha_n) = \sum_{i=1}^n \alpha_i^2\sigma_i^2 + \lambda\left(\sum_{i=1}^n \alpha_i - 1\right)$$

where $\lambda$ is the Lagrange multiplier. Taking the derivative with respect to each $\alpha_j$, we find that $\alpha_1\sigma_1^2 = \alpha_2\sigma_2^2 = \cdots = \alpha_n\sigma_n^2 = -\lambda/2$. Since $\sum_{i=1}^n \alpha_i = 1$, we find that

$$\alpha_i = \frac{1/\sigma_i^2}{\sum_{j=1}^n 1/\sigma_j^2}$$

and the resulting variance $\sigma_y^2$ is

$$\sigma_y^2 = \frac{1}{\sum_{i=1}^n 1/\sigma_i^2}$$
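
The same rule in code: a sketch of general inverse-variance weighting for $n$ uncorrelated scalar estimates (the helper name `fuse_scalars` and the sample values are mine):

```python
import numpy as np

def fuse_scalars(xs, variances):
    """Fuse n uncorrelated scalar estimates with weights alpha_i proportional to 1/sigma_i^2."""
    xs = np.asarray(xs, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = (1.0 / variances) / np.sum(1.0 / variances)  # alpha_i
    y = np.sum(weights * xs)
    var_y = 1.0 / np.sum(1.0 / variances)
    return y, var_y

print(fuse_scalars([10.2, 9.8, 10.5], [4.0, 1.0, 2.0]))
```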


Vector estimates

Now let’s extend the same result to vectors of random variables. Let $x_1 \sim p_1(\mu_1,\Sigma_1), \ldots, x_n \sim p_n(\mu_n,\Sigma_n)$ be a set of pairwise uncorrelated random vectors of length $m$. If the random vector $y$ is a linear combination of these, $y = \sum_{i=1}^n A_i x_i$, then the mean and covariance of $y$ are obtained as

$$\mu_y = \sum_{i=1}^n A_i\mu_i, \qquad \Sigma_{yy} = \sum_{i=1}^n A_i\Sigma_i A_i^\top$$
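
The covariance rule $\Sigma_{yy} = \sum_i A_i\Sigma_iA_i^\top$ is worth internalizing, since it reappears in the Kalman prediction step. Here is a short Monte Carlo check (the matrices are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)

A1 = np.array([[1.0, 0.5], [0.0, 1.0]])
A2 = 0.5 * np.eye(2)
S1 = np.diag([0.2, 0.1])
S2 = np.diag([0.4, 0.3])

# Sample two independent 2-vectors many times and combine them linearly.
x1 = rng.multivariate_normal([0.0, 0.0], S1, size=200_000)
x2 = rng.multivariate_normal([0.0, 0.0], S2, size=200_000)
y = x1 @ A1.T + x2 @ A2.T

print(np.cov(y.T))                      # empirical covariance of y
print(A1 @ S1 @ A1.T + A2 @ S2 @ A2.T)  # A1 Sigma1 A1^T + A2 Sigma2 A2^T
```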


Fusing multiple vector estimates

Consider the unbiased linear estimator

$$y(x_1,\ldots,x_n) = \sum_{i=1}^n A_i x_i, \qquad \sum_{i=1}^n A_i = I$$

Similarly, we intend to minimize $\mathbb{E}\left[(y-\mu_y)^\top(y-\mu_y)\right]$. We define the following optimization problem using Lagrange multipliers

$$f(A_1,\ldots,A_n) = \mathbb{E}\left[\sum_{i=1}^n (x_i-\mu_i)^\top A_i^\top A_i (x_i-\mu_i)\right] + \left\langle \Lambda,\ \sum_{i=1}^n A_i - I \right\rangle$$

where the second term contains the Lagrange multipliers and $\left\langle \Lambda,\ \sum_i A_i - I \right\rangle = \operatorname{tr}\!\left[\Lambda^\top\left(\sum_i A_i - I\right)\right]$. Taking the derivative of $f$ with respect to each $A_i$ and setting it to zero to find the optimal values of $A_i$ gives us

$$\mathbb{E}\left[2A_i(x_i-\mu_i)(x_i-\mu_i)^\top + \Lambda\right] = 0$$
$$2A_i\Sigma_i + \Lambda = 0 \quad\Rightarrow\quad A_1\Sigma_1 = A_2\Sigma_2 = \cdots = A_n\Sigma_n = -\frac{\Lambda}{2}$$

Using the fact that $\sum_{i=1}^n A_i = I$,

$$A_i = \left(\sum_{j=1}^n \Sigma_j^{-1}\right)^{-1}\Sigma_i^{-1}$$

Therefore the optimal estimator becomes

$$y = \left(\sum_{j=1}^n \Sigma_j^{-1}\right)^{-1}\sum_{i=1}^n \Sigma_i^{-1} x_i, \qquad \Sigma_{yy} = \left(\sum_{j=1}^n \Sigma_j^{-1}\right)^{-1}$$
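
Here is a sketch of this vector fusion rule, assuming each estimate comes with its own covariance matrix (the helper name `fuse_vectors` and the toy inputs are mine):

```python
import numpy as np

def fuse_vectors(estimates, covariances):
    """Fuse uncorrelated vector estimates: y = (sum_j Sigma_j^-1)^-1 * sum_i Sigma_i^-1 x_i."""
    info = sum(np.linalg.inv(S) for S in covariances)        # total information matrix
    weighted = sum(np.linalg.inv(S) @ x
                   for x, S in zip(estimates, covariances))  # information-weighted sum
    Sigma_yy = np.linalg.inv(info)
    return Sigma_yy @ weighted, Sigma_yy

x1, S1 = np.array([1.0, 2.0]), np.diag([0.5, 0.5])
x2, S2 = np.array([1.2, 1.8]), np.diag([0.1, 1.0])
y, Sigma = fuse_vectors([x1, x2], [S1, S2])
print(y)
print(Sigma)
```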


Special case of n=2

Let $x_1 \sim p_1(\mu_1,\Sigma_1)$ and $x_2 \sim p_2(\mu_2,\Sigma_2)$; then we have

$$K = \Sigma_1(\Sigma_1+\Sigma_2)^{-1}$$
$$y = x_1 + K(x_2 - x_1), \qquad \Sigma_{yy} = (I-K)\Sigma_1$$

To prove the above relation, we start from the expression obtained above, i.e.

$$y = (\Sigma_1^{-1}+\Sigma_2^{-1})^{-1}(\Sigma_1^{-1}x_1 + \Sigma_2^{-1}x_2)$$

Note that the following matrix identity holds: $(A^{-1}+B^{-1})^{-1} = A(A+B)^{-1}B = B(A+B)^{-1}A$. Applying it to each term gives

$$y = \Sigma_2(\Sigma_1+\Sigma_2)^{-1}\Sigma_1\Sigma_1^{-1}x_1 + \Sigma_1(\Sigma_1+\Sigma_2)^{-1}\Sigma_2\Sigma_2^{-1}x_2$$
$$y = \Sigma_2(\Sigma_1+\Sigma_2)^{-1}x_1 + \Sigma_1(\Sigma_1+\Sigma_2)^{-1}x_2$$

We add and subtract $\Sigma_1(\Sigma_1+\Sigma_2)^{-1}x_1$ in the above equation to obtain

$$y = \Sigma_2(\Sigma_1+\Sigma_2)^{-1}x_1 + \Sigma_1(\Sigma_1+\Sigma_2)^{-1}x_2 + \Sigma_1(\Sigma_1+\Sigma_2)^{-1}x_1 - \Sigma_1(\Sigma_1+\Sigma_2)^{-1}x_1$$
$$y = x_1 + \Sigma_1(\Sigma_1+\Sigma_2)^{-1}(x_2 - x_1)$$

Similarly for the covariance matrix we have

$$\Sigma_{yy} = (\Sigma_1^{-1}+\Sigma_2^{-1})^{-1} = \Sigma_1(\Sigma_1+\Sigma_2)^{-1}\Sigma_2$$

We add and subtract the term $\Sigma_1(\Sigma_1+\Sigma_2)^{-1}\Sigma_1$ in the above equation to obtain

$$\Sigma_{yy} = \Sigma_1(\Sigma_1+\Sigma_2)^{-1}\Sigma_2 + \Sigma_1(\Sigma_1+\Sigma_2)^{-1}\Sigma_1 - \Sigma_1(\Sigma_1+\Sigma_2)^{-1}\Sigma_1$$
$$\Sigma_{yy} = \Sigma_1 - \Sigma_1(\Sigma_1+\Sigma_2)^{-1}\Sigma_1$$
$$\Sigma_{yy} = \left(I - \Sigma_1(\Sigma_1+\Sigma_2)^{-1}\right)\Sigma_1 = (I-K)\Sigma_1$$
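
The gain form above and the information form we started from give the same answer; here is a quick numerical check of that equivalence (the covariance matrices are arbitrary but symmetric positive definite):

```python
import numpy as np

x1, S1 = np.array([1.0, 2.0]), np.array([[0.5, 0.1], [0.1, 0.4]])
x2, S2 = np.array([1.3, 1.7]), np.array([[0.2, 0.0], [0.0, 0.9]])

# Gain form: y = x1 + K (x2 - x1), Sigma_yy = (I - K) Sigma_1
K = S1 @ np.linalg.inv(S1 + S2)
y_gain = x1 + K @ (x2 - x1)
Sigma_gain = (np.eye(2) - K) @ S1

# Information form: y = (S1^-1 + S2^-1)^-1 (S1^-1 x1 + S2^-1 x2)
info = np.linalg.inv(S1) + np.linalg.inv(S2)
y_info = np.linalg.solve(info, np.linalg.solve(S1, x1) + np.linalg.solve(S2, x2))

print(np.allclose(y_gain, y_info))                   # True
print(np.allclose(Sigma_gain, np.linalg.inv(info)))  # True
```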


Best Linear Unbiased Estimator

Let $\begin{pmatrix} x \\ y \end{pmatrix} \sim p\!\left(\begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}\right)$. The best linear unbiased estimator $\hat{y} = Ax + b$ for estimating $y$ given $x$ has

$$A = \Sigma_{yx}\Sigma_{xx}^{-1}, \qquad b = \mu_y - A\mu_x$$
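
A small sketch of this best linear unbiased estimator, computing $A$ and $b$ from the joint moments (the toy moments below are my own example, not from the text):

```python
import numpy as np

# Joint moments of (x, y): means, marginal covariance of x, and cross-covariance of y with x.
mu_x, mu_y = np.array([0.0, 1.0]), np.array([2.0])
Sigma_xx = np.array([[1.0, 0.3], [0.3, 0.5]])
Sigma_yx = np.array([[0.4, 0.2]])  # Cov(y, x), shape (1, 2)

# A = Sigma_yx Sigma_xx^-1,  b = mu_y - A mu_x
A = Sigma_yx @ np.linalg.inv(Sigma_xx)
b = mu_y - A @ mu_x

x_obs = np.array([0.5, 1.2])
print("y_hat =", A @ x_obs + b)  # best linear prediction of y given x_obs
```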

Kalman Filter for a linear system

Now that we know all the ingredients, we can discuss the Kalman filter. Assume a linear dynamical system where

$$x_k = F_k x_{k-1} + B_k u_k + w_k$$

where $F_k$ is the state transition model applied to the previous state $x_{k-1}$, $B_k$ is the control input model applied to the control vector $u_k$, and $w_k$ is the process noise, assumed to be drawn from a multivariate normal distribution $\mathcal{N}(0, Q_k)$ where $Q_k$ is the covariance matrix. At time $k$, we make an observation (or measurement) $z_k$ of the true state $x_k$ according to

$$z_k = H_k x_k + v_k$$

where $H_k$ is the observation model and $v_k$ is the observation noise, drawn from the Gaussian $\mathcal{N}(0, R_k)$ where $R_k$ is the covariance matrix.

First, let’s assume that $H_k = I$, i.e., we fully observe the state. Given the estimate $\hat{x}_{t-1|t-1}$ at time $t-1$ based on all the observations so far, we make a prediction $\hat{x}_{t|t-1}$ based on the dynamical system equation as

$$\hat{x}_{t|t-1} = F_t\hat{x}_{t-1|t-1} + B_t u_t$$

Next, the covariance of the prediction can also be obtained as

$$\Sigma_{t|t-1} = F_t\Sigma_{t-1|t-1}F_t^\top + Q_t$$

Given these predictions for the state at time $t$, we also make an observation $z_t$ of $x_t$, whose noise covariance matrix is $R_t$.

Now our goal is to combine these results to correct our estimate $\hat{x}_{t|t}$. Using the derivation above (the special case of $n=2$) to fuse the two estimates based on their covariance matrices such that the resulting covariance is minimized, we have

$$K_t = \Sigma_{t|t-1}(\Sigma_{t|t-1}+R_t)^{-1}$$
$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t(z_t - \hat{x}_{t|t-1})$$
$$\Sigma_{t|t} = (I - K_t)\Sigma_{t|t-1}$$
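
Putting the prediction and the correction together for the fully observed case ($H_t = I$), here is a minimal sketch of one filter step (the model matrices and the measurement below are placeholders for illustration):

```python
import numpy as np

def kalman_step_full_obs(x_prev, Sigma_prev, F, B, u, Q, z, R):
    """One predict/correct cycle when the full state is observed (H = I)."""
    # Predict: propagate the estimate and its covariance through the dynamics.
    x_pred = F @ x_prev + B @ u
    Sigma_pred = F @ Sigma_prev @ F.T + Q
    # Correct: fuse the prediction with the measurement z (the n = 2 special case above).
    K = Sigma_pred @ np.linalg.inv(Sigma_pred + R)
    x_new = x_pred + K @ (z - x_pred)
    Sigma_new = (np.eye(len(x_prev)) - K) @ Sigma_pred
    return x_new, Sigma_new

# Toy constant-velocity example: state = [position, velocity], no control input.
F = np.array([[1.0, 0.1], [0.0, 1.0]])
B, u = np.zeros((2, 1)), np.zeros(1)
x, Sigma = np.array([0.0, 1.0]), np.eye(2)
x, Sigma = kalman_step_full_obs(x, Sigma, F, B, u, 0.01 * np.eye(2),
                                z=np.array([0.12, 1.05]), R=0.1 * np.eye(2))
print(x)
print(Sigma)
```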

Now let’s imagine what happens if we only make a partial observation of the state, i.e., $H_k \neq I$. In this case, we do the prediction as before, but in the step where we combine the results to correct the prediction, we need to make some changes, since we only observe part of $x_t$. Here we use the best linear unbiased estimator introduced earlier to reconstruct the unobserved part of $x_t$ and then use it to update the prediction.

Fusing the predicted observable $H_t\hat{x}_{t|t-1}$ (with covariance $H_t\Sigma_{t|t-1}H_t^\top$) with the measurement $z_t$ (with covariance $R_t$) gives the update for the observed part of the state:

$$H_t\hat{x}_{t|t} = H_t\hat{x}_{t|t-1} + H_t\Sigma_{t|t-1}H_t^\top\left(H_t\Sigma_{t|t-1}H_t^\top + R_t\right)^{-1}(z_t - H_t\hat{x}_{t|t-1})$$

We can define

$$K_t = \Sigma_{t|t-1}H_t^\top\left(H_t\Sigma_{t|t-1}H_t^\top + R_t\right)^{-1}$$

and the update of the observable part simplifies to

$$H_t\hat{x}_{t|t} = H_t\hat{x}_{t|t-1} + H_tK_t(z_t - H_t\hat{x}_{t|t-1})$$

The rest of the state (the hidden part) can be obtained using $C_t\hat{x}_{t|t-1}$, where $C_t$ is chosen such that the stacked matrix $\begin{pmatrix} H_t \\ C_t \end{pmatrix}$ is invertible. The simplest example is to choose $C_t$ so that this stacked matrix equals the identity. The cross-covariance between $C_t\hat{x}_{t|t-1}$ and the observable $H_t\hat{x}_{t|t-1}$ is $C_t\Sigma_{t|t-1}H_t^\top$. Using the best linear unbiased estimator, we can find the estimate of the hidden portion as

$$C_t\hat{x}_{t|t} = C_t\hat{x}_{t|t-1} + \left(C_t\Sigma_{t|t-1}H_t^\top\right)\left(H_t\Sigma_{t|t-1}H_t^\top\right)^{-1}H_tK_t(z_t - H_t\hat{x}_{t|t-1})$$
$$C_t\hat{x}_{t|t} = C_t\hat{x}_{t|t-1} + C_tK_t(z_t - H_t\hat{x}_{t|t-1})$$

where the second line follows because $H_tK_t = H_t\Sigma_{t|t-1}H_t^\top\left(H_t\Sigma_{t|t-1}H_t^\top + R_t\right)^{-1}$, so the factor $\left(C_t\Sigma_{t|t-1}H_t^\top\right)\left(H_t\Sigma_{t|t-1}H_t^\top\right)^{-1}H_tK_t$ collapses to $C_tK_t$.

Combining the above two results we find that

$$\begin{pmatrix} H_t \\ C_t \end{pmatrix}\hat{x}_{t|t} = \begin{pmatrix} H_t \\ C_t \end{pmatrix}\hat{x}_{t|t-1} + \begin{pmatrix} H_t \\ C_t \end{pmatrix}K_t(z_t - H_t\hat{x}_{t|t-1})$$

Since $\begin{pmatrix} H_t \\ C_t \end{pmatrix}$ is an invertible matrix, it can be removed from both sides, and we obtain

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t(z_t - H_t\hat{x}_{t|t-1})$$

Note that the updated covariance matrix can be obtained from the above equation as

$$\Sigma_{t|t} = (I - K_tH_t)\Sigma_{t|t-1}(I - K_tH_t)^\top + K_tR_tK_t^\top$$
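
Finally, here is a compact sketch of the general predict/update cycle with a partial observation model $H_t$, using the gain and the Joseph-form covariance update above (the demo matrices are placeholders for illustration):

```python
import numpy as np

def kalman_predict(x, Sigma, F, B, u, Q):
    """Prediction: x_{t|t-1} = F x + B u,  Sigma_{t|t-1} = F Sigma F^T + Q."""
    return F @ x + B @ u, F @ Sigma @ F.T + Q

def kalman_update(x_pred, Sigma_pred, z, H, R):
    """Correction with gain K = Sigma H^T (H Sigma H^T + R)^-1 and Joseph-form covariance."""
    S = H @ Sigma_pred @ H.T + R                 # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    I_KH = np.eye(len(x_pred)) - K @ H
    Sigma_new = I_KH @ Sigma_pred @ I_KH.T + K @ R @ K.T
    return x_new, Sigma_new

# Demo: track [position, velocity] while measuring only position.
F = np.array([[1.0, 0.1], [0.0, 1.0]])           # constant-velocity dynamics, dt = 0.1
H = np.array([[1.0, 0.0]])                       # partial observation: position only
x, Sigma = np.array([0.0, 1.0]), np.eye(2)
x, Sigma = kalman_predict(x, Sigma, F, np.zeros((2, 1)), np.zeros(1), 0.01 * np.eye(2))
x, Sigma = kalman_update(x, Sigma, z=np.array([0.12]), H=H, R=np.array([[0.05]]))
print(x)
print(Sigma)
```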