The Kalman filter is a classic state estimation technique that has found application in many fields. In this short tutorial, I will try to explain the Kalman filter in an intuitive way; this is the most basic introduction to the Kalman filter, and essentially how I learned it. Before getting to the Kalman filter itself, I will first review some basic material that we need.
Prerequisites
Let $x$ be a random variable that has a probability density function with mean $\mu$ and variance $\sigma^2$. We write $x \sim (\mu, \sigma^2)$.
Assume a set of pairwise uncorrelated random variables $x_1, x_2, \dots, x_n$ with $x_i \sim (\mu_i, \sigma_i^2)$. If $y$ is a random variable where $y = \sum_{i=1}^{n} a_i x_i$, then the mean and variance of $y$ are

$$\mu_y = \sum_{i=1}^{n} a_i \mu_i, \qquad \sigma_y^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2.$$
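To make this concrete, here is a quick Monte Carlo sanity check of these two formulas (a minimal NumPy sketch; the coefficients $a_i$ and moments below are arbitrary numbers I made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# y = sum_i a_i x_i for pairwise uncorrelated (here independent) x_i.
a = np.array([0.5, -1.0, 2.0])        # coefficients a_i
mu = np.array([1.0, 2.0, 0.5])        # means mu_i
sigma2 = np.array([0.3, 1.2, 0.7])    # variances sigma_i^2

x = rng.normal(mu, np.sqrt(sigma2), size=(100_000, 3))
y = x @ a

print(y.mean(), a @ mu)           # sample mean vs. sum_i a_i mu_i
print(y.var(), a**2 @ sigma2)     # sample variance vs. sum_i a_i^2 sigma_i^2
```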
Fusing two variables
Now, imagine that we want to measure a variable $x$, and we have two different devices that use entirely different methods: one is based on an old method, say, and its results are reported as $x_1 \sim (\mu_1, \sigma_1^2)$, and the other uses a new method and its results are reported as $x_2 \sim (\mu_2, \sigma_2^2)$. Now the question is how to combine these two measurements to create an optimal estimator $\hat{x}$ for $x$. The simplest way is to combine the results linearly as $\hat{x} = a_1 x_1 + a_2 x_2$. A reasonable requirement is that if the two estimates $x_1$ and $x_2$ are giving the same result, then this linear combination should give out that same result. This implies that $a_1 + a_2 = 1$. So our linear estimator so far becomes

$$\hat{x} = a x_1 + (1 - a) x_2.$$
But what value should we pick for $a$? One reasonable way is to say that the optimal value of $a$ minimizes the variance of $\hat{x}$. The variance of $\hat{x}$ is

$$\sigma_{\hat{x}}^2 = a^2 \sigma_1^2 + (1 - a)^2 \sigma_2^2.$$

Setting the derivative with respect to $a$ to zero,

$$2 a \sigma_1^2 - 2 (1 - a) \sigma_2^2 = 0 \quad \Rightarrow \quad a = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}.$$
Since the second derivative $2(\sigma_1^2 + \sigma_2^2)$ is positive, this value of $a$ minimizes the variance. The estimator then becomes

$$\hat{x} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} x_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} x_2, \qquad \sigma_{\hat{x}}^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}.$$
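In code, this two-measurement fusion is only a few lines. Here is a minimal Python sketch; the function name `fuse_two` and the example readings are my own, just for illustration:

```python
def fuse_two(mu1, var1, mu2, var2):
    """Minimum-variance fusion of two uncorrelated scalar measurements."""
    a = var2 / (var1 + var2)              # optimal weight on the first measurement
    mu = a * mu1 + (1 - a) * mu2          # fused estimate
    var = var1 * var2 / (var1 + var2)     # fused variance, <= min(var1, var2)
    return mu, var

# Old device: 10.2 with variance 4.0; new device: 9.8 with variance 1.0.
print(fuse_two(10.2, 4.0, 9.8, 1.0))      # (9.88, 0.8): closer to the more precise reading
```

Note that the fused variance $\sigma_1^2 \sigma_2^2 / (\sigma_1^2 + \sigma_2^2)$ is always smaller than either individual variance, so fusing a second measurement never hurts.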
Fusing multiple variables
The above argument can be extended to multiple scalar estimates. Let $x_1, \dots, x_n$ with $x_i \sim (\mu, \sigma_i^2)$ be a set of pairwise uncorrelated random variables. Consider the unbiased linear estimator $\hat{x} = \sum_{i=1}^{n} a_i x_i$ with $\sum_{i=1}^{n} a_i = 1$. Using Lagrange multipliers, we have

$$\mathcal{L} = \sum_{i=1}^{n} a_i^2 \sigma_i^2 - \lambda \left( \sum_{i=1}^{n} a_i - 1 \right),$$
where $\lambda$ is the Lagrange multiplier. Taking the derivative with respect to $a_i$ we find that $a_i = \lambda / (2 \sigma_i^2)$. Since $\sum_{i=1}^{n} a_i = 1$, then we can find that

$$a_i = \frac{1 / \sigma_i^2}{\sum_{j=1}^{n} 1 / \sigma_j^2}, \qquad \hat{x} = \frac{\sum_{i=1}^{n} x_i / \sigma_i^2}{\sum_{j=1}^{n} 1 / \sigma_j^2},$$
where the variance is

$$\sigma_{\hat{x}}^2 = \frac{1}{\sum_{i=1}^{n} 1 / \sigma_i^2}.$$
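This is just inverse-variance weighting, which is a one-liner in NumPy (again a sketch; `fuse_scalars` and the sample values are illustrative):

```python
import numpy as np

def fuse_scalars(x, var):
    """Inverse-variance weighted fusion of pairwise uncorrelated scalar estimates."""
    x, var = np.asarray(x, float), np.asarray(var, float)
    w = 1.0 / var                          # weights a_i proportional to 1/sigma_i^2
    return np.sum(w * x) / np.sum(w), 1.0 / np.sum(w)

print(fuse_scalars([9.9, 10.3, 10.0], [1.0, 4.0, 0.5]))  # most weight on the 0.5-variance estimate
```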
Vector estimates
Now let’s extend the same result to vectors of random variables. Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ with $\mathbf{x}_i \sim (\boldsymbol{\mu}_i, \Sigma_i)$ be a set of pairwise uncorrelated random vectors of length $m$. If the random vector $\mathbf{y}$ is a linear combination of these random vectors, $\mathbf{y} = \sum_{i=1}^{n} A_i \mathbf{x}_i$, then the mean and covariance of $\mathbf{y}$ are obtained as

$$\boldsymbol{\mu}_y = \sum_{i=1}^{n} A_i \boldsymbol{\mu}_i, \qquad \Sigma_y = \sum_{i=1}^{n} A_i \Sigma_i A_i^\top.$$
Fusing multiple vector estimates
Imagine the linear estimator as

$$\hat{\mathbf{x}} = \sum_{i=1}^{n} A_i \mathbf{x}_i, \qquad \sum_{i=1}^{n} A_i = I,$$

where the constraint $\sum_i A_i = I$ makes the estimator unbiased.
Similarly, we intend to minimize $\operatorname{tr}(\Sigma_{\hat{x}})$. We define the following optimization problem using Lagrange multipliers

$$\mathcal{L} = \operatorname{tr}\left( \sum_{i=1}^{n} A_i \Sigma_i A_i^\top \right) - \operatorname{tr}\left( \Lambda^\top \left( \sum_{i=1}^{n} A_i - I \right) \right),$$
where the second term involves the matrix of Lagrange multipliers $\Lambda$. Taking the derivative of $\mathcal{L}$ with respect to each $A_i$ and setting it to zero to find the optimal values of $A_i$ gives us

$$2 A_i \Sigma_i - \Lambda = 0 \quad \Rightarrow \quad A_i = \frac{1}{2} \Lambda \Sigma_i^{-1}.$$
Using the fact that $\sum_{i=1}^{n} A_i = I$,

$$\frac{1}{2} \Lambda \sum_{i=1}^{n} \Sigma_i^{-1} = I \quad \Rightarrow \quad \Lambda = 2 \left( \sum_{i=1}^{n} \Sigma_i^{-1} \right)^{-1}.$$
Therefore the optimal estimator becomes

$$\hat{\mathbf{x}} = \left( \sum_{j=1}^{n} \Sigma_j^{-1} \right)^{-1} \sum_{i=1}^{n} \Sigma_i^{-1} \mathbf{x}_i, \qquad \Sigma_{\hat{x}} = \left( \sum_{i=1}^{n} \Sigma_i^{-1} \right)^{-1}.$$
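Here is a direct NumPy transcription of this estimator (a sketch under the assumption that every $\Sigma_i$ is invertible; the helper name `fuse_vectors` and the example covariances are made up):

```python
import numpy as np

def fuse_vectors(xs, Sigmas):
    """Minimum-variance fusion of pairwise uncorrelated vector estimates."""
    infos = [np.linalg.inv(S) for S in Sigmas]   # information matrices Sigma_i^{-1}
    Sigma_hat = np.linalg.inv(sum(infos))        # (sum_i Sigma_i^{-1})^{-1}
    x_hat = Sigma_hat @ sum(J @ x for J, x in zip(infos, xs))
    return x_hat, Sigma_hat

x1, S1 = np.array([1.0, 2.0]), np.diag([1.0, 4.0])
x2, S2 = np.array([1.2, 1.8]), np.diag([2.0, 1.0])
print(fuse_vectors([x1, x2], [S1, S2]))  # each component is weighted by its own precision
```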
Special case of $n = 2$
Let $n = 2$, and define the gain $K = \Sigma_1 (\Sigma_1 + \Sigma_2)^{-1}$; then we have

$$\hat{\mathbf{x}} = \mathbf{x}_1 + K (\mathbf{x}_2 - \mathbf{x}_1), \qquad \Sigma_{\hat{x}} = (I - K) \Sigma_1.$$
In order to prove the above relation, we start from the relation we obtained above, i.e.

$$\hat{\mathbf{x}} = \left( \Sigma_1^{-1} + \Sigma_2^{-1} \right)^{-1} \left( \Sigma_1^{-1} \mathbf{x}_1 + \Sigma_2^{-1} \mathbf{x}_2 \right).$$
Note that the following matrix identity holds true

$$\left( \Sigma_1^{-1} + \Sigma_2^{-1} \right)^{-1} = \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_2 = \Sigma_2 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_1,$$

which follows from writing $\Sigma_1^{-1} + \Sigma_2^{-1} = \Sigma_1^{-1} (\Sigma_1 + \Sigma_2) \Sigma_2^{-1}$ and inverting.
We add and subtract the term $\Sigma_2^{-1} \mathbf{x}_1$ in the above equation to obtain

$$\hat{\mathbf{x}} = \left( \Sigma_1^{-1} + \Sigma_2^{-1} \right)^{-1} \left( \left( \Sigma_1^{-1} + \Sigma_2^{-1} \right) \mathbf{x}_1 + \Sigma_2^{-1} (\mathbf{x}_2 - \mathbf{x}_1) \right) = \mathbf{x}_1 + \Sigma_1 (\Sigma_1 + \Sigma_2)^{-1} (\mathbf{x}_2 - \mathbf{x}_1) = \mathbf{x}_1 + K (\mathbf{x}_2 - \mathbf{x}_1),$$

where the second equality uses the matrix identity above in the form $\left( \Sigma_1^{-1} + \Sigma_2^{-1} \right)^{-1} \Sigma_2^{-1} = \Sigma_1 (\Sigma_1 + \Sigma_2)^{-1}$.
Similarly, for the covariance matrix we have

$$\Sigma_{\hat{x}} = \left( \Sigma_1^{-1} + \Sigma_2^{-1} \right)^{-1} = \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_2.$$
We add and subtract the term $\Sigma_1$ (writing $\Sigma_2 = (\Sigma_1 + \Sigma_2) - \Sigma_1$) in the above equation to obtain

$$\Sigma_{\hat{x}} = \Sigma_1 (\Sigma_1 + \Sigma_2)^{-1} \left( (\Sigma_1 + \Sigma_2) - \Sigma_1 \right) = \Sigma_1 - \Sigma_1 (\Sigma_1 + \Sigma_2)^{-1} \Sigma_1 = (I - K) \Sigma_1.$$
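Since these manipulations are easy to get wrong, here is a quick numerical check that the gain form $(I - K)\Sigma_1$ agrees with the information form $(\Sigma_1^{-1} + \Sigma_2^{-1})^{-1}$ (a sketch using randomly generated symmetric positive definite matrices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random symmetric positive definite covariance matrices.
A = rng.normal(size=(3, 3)); S1 = A @ A.T + 3 * np.eye(3)
B = rng.normal(size=(3, 3)); S2 = B @ B.T + 3 * np.eye(3)

K = S1 @ np.linalg.inv(S1 + S2)                                   # gain
info_form = np.linalg.inv(np.linalg.inv(S1) + np.linalg.inv(S2))
gain_form = (np.eye(3) - K) @ S1

print(np.allclose(info_form, gain_form))                          # True
```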
Best Linear Unbiased Estimator
Let $\mathbf{x} \sim (\boldsymbol{\mu}_x, \Sigma_x)$ and $\mathbf{y} \sim (\boldsymbol{\mu}_y, \Sigma_y)$ be two random vectors with cross-covariance $\Sigma_{xy} = \operatorname{cov}(\mathbf{x}, \mathbf{y})$. The estimator for estimating the value of $\mathbf{x}$ for a given $\mathbf{y}$ is

$$\hat{\mathbf{x}} = \boldsymbol{\mu}_x + \Sigma_{xy} \Sigma_y^{-1} (\mathbf{y} - \boldsymbol{\mu}_y).$$
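To see the BLUE in action, here is a small Monte Carlo sketch where $\mathbf{y}$ is a noisy observation of the first component of $\mathbf{x}$ (all matrices and values below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

mu_x = np.array([1.0, -1.0])
Sigma_x = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
C = np.array([[1.0, 0.0]])              # y = C x + noise: only the first component is seen
R = np.array([[0.3]])

x = rng.multivariate_normal(mu_x, Sigma_x, size=100_000)
y = x @ C.T + rng.multivariate_normal([0.0], R, size=100_000)

Sigma_xy = Sigma_x @ C.T                # cross-covariance of x and y
Sigma_y = C @ Sigma_x @ C.T + R         # covariance of y
mu_y = C @ mu_x

x_hat = mu_x + (y - mu_y) @ (Sigma_xy @ np.linalg.inv(Sigma_y)).T
print(np.mean((x_hat - x) ** 2, axis=0))  # per-component error, below the prior diag(Sigma_x)
```

Note that even the unobserved second component improves, because the cross-covariance lets the estimator propagate information from $\mathbf{y}$ into it.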
Kalman Filter for a linear system
Now that we know all the ingredients, we can discuss the Kalman filter. Assume a linear dynamical system where

$$\mathbf{x}_k = F_k \mathbf{x}_{k-1} + B_k \mathbf{u}_k + \mathbf{w}_k,$$
where $F_k$ is the state transition model applied to the previous state $\mathbf{x}_{k-1}$, $B_k$ is the control-input model applied to the control vector $\mathbf{u}_k$, and $\mathbf{w}_k$ is the process noise, assumed to be drawn from a multivariate normal distribution $\mathbf{w}_k \sim \mathcal{N}(0, Q_k)$ where $Q_k$ is its covariance matrix. At time $k$, we make an observation (or measurement) $\mathbf{z}_k$ of the true state $\mathbf{x}_k$ according to

$$\mathbf{z}_k = H_k \mathbf{x}_k + \mathbf{v}_k,$$
where $H_k$ is the observation model, and $\mathbf{v}_k$ is the observation noise drawn from a Gaussian $\mathbf{v}_k \sim \mathcal{N}(0, R_k)$ where $R_k$ is its covariance matrix.
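As a running example, here is a sketch that simulates such a system for a constant-velocity model (the state is position and velocity; $F$, $B$, $Q$, $H$, $R$, and all the values below are arbitrary choices of mine, not canonical ones):

```python
import numpy as np

rng = np.random.default_rng(42)

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])              # state transition model (dt = 1)
B = np.array([[0.5],
              [1.0]])                   # control-input model (acceleration input)
Q = 0.01 * np.eye(2)                    # process noise covariance
H = np.eye(2)                           # full observation, for now
R = 0.25 * np.eye(2)                    # observation noise covariance

x = np.array([0.0, 1.0])                # true initial state
u = np.array([0.0])                     # no control input in this run
xs, zs = [], []
for k in range(50):
    x = F @ x + B @ u + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x + rng.multivariate_normal(np.zeros(2), R)
    xs.append(x); zs.append(z)
```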
First let’s assume that $H_k = I$, i.e., we fully observe the state. Given an estimate $\hat{\mathbf{x}}_{k-1|k-1}$ with covariance $P_{k-1|k-1}$ that we have at time $k-1$ based on all the observations we had as $\mathbf{z}_1, \dots, \mathbf{z}_{k-1}$, we make a prediction for $\mathbf{x}_k$ based on the dynamical system equation as

$$\hat{\mathbf{x}}_{k|k-1} = F_k \hat{\mathbf{x}}_{k-1|k-1} + B_k \mathbf{u}_k.$$
Next, the covariance of this prediction can also be estimated as

$$P_{k|k-1} = F_k P_{k-1|k-1} F_k^\top + Q_k.$$
Given this prediction for the state at time $k$, we also make an observation $\mathbf{z}_k$ whose covariance matrix is $R_k$.
Now our goal is to combine these results to correct our estimate of $\mathbf{x}_k$. We use the two-estimate fusion that we derived above to combine the prediction and the observation based on their covariance matrices such that the resulting covariance is minimized. With the gain $K_k = P_{k|k-1} (P_{k|k-1} + R_k)^{-1}$, we have

$$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + K_k (\mathbf{z}_k - \hat{\mathbf{x}}_{k|k-1}), \qquad P_{k|k} = (I - K_k) P_{k|k-1}.$$
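Putting the predict and correct steps together for this fully observed case, here is a sketch (the function name `kalman_step_full` is mine; the usage loop reuses `F`, `B`, `Q`, `R`, `u`, `xs`, and `zs` from the simulation sketch above):

```python
import numpy as np

def kalman_step_full(x_est, P, z, F, B, u, Q, R):
    """One predict/correct cycle of the Kalman filter when H = I."""
    # Predict.
    x_pred = F @ x_est + B @ u
    P_pred = F @ P @ F.T + Q
    # Correct: fuse the prediction (covariance P_pred) with the observation (covariance R).
    K = P_pred @ np.linalg.inv(P_pred + R)
    x_new = x_pred + K @ (z - x_pred)
    P_new = (np.eye(len(x_est)) - K) @ P_pred
    return x_new, P_new

x_est, P = np.zeros(2), np.eye(2)
for z in zs:
    x_est, P = kalman_step_full(x_est, P, z, F, B, u, Q, R)
print(x_est, xs[-1])                     # estimate vs. true final state
```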
Now let’s imagine what happens if we only make a partial observation of the state, i.e., $H_k \neq I$. In this case, we do the prediction as before, but in the step where we combine the results to correct the prediction, we need to make some changes, since we only observe part of $\mathbf{x}_k$. In such a case, we use the best linear unbiased estimator that we introduced earlier to construct an estimate of the full state and then use that to update the prediction.
With the partial observation, fusing the predicted observable portion $H_k \hat{\mathbf{x}}_{k|k-1}$ (covariance $H_k P_{k|k-1} H_k^\top$) with the measurement $\mathbf{z}_k$ (covariance $R_k$) as before, the estimate of the observable portion becomes

$$\widehat{H_k \mathbf{x}}_{k|k} = H_k \hat{\mathbf{x}}_{k|k-1} + H_k P_{k|k-1} H_k^\top \left( H_k P_{k|k-1} H_k^\top + R_k \right)^{-1} \left( \mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1} \right).$$
We can define the Kalman gain

$$K_k = P_{k|k-1} H_k^\top \left( H_k P_{k|k-1} H_k^\top + R_k \right)^{-1},$$

and the observable portion simplifies to

$$\widehat{H_k \mathbf{x}}_{k|k} = H_k \hat{\mathbf{x}}_{k|k-1} + H_k K_k \left( \mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1} \right).$$
The rest of the variables (hidden states) can be obtained using $M_k \mathbf{x}_k$, where $M_k$ is chosen such that

$$\begin{bmatrix} H_k \\ M_k \end{bmatrix}$$

becomes an invertible matrix; the simplest example is to have it be equal to the identity matrix. The covariance between the hidden portion $M_k \hat{\mathbf{x}}_{k|k-1}$ and the observable $H_k \hat{\mathbf{x}}_{k|k-1}$ is $M_k P_{k|k-1} H_k^\top$. Using the best linear unbiased estimator, we can find the hidden portion estimate as

$$\widehat{M_k \mathbf{x}}_{k|k} = M_k \hat{\mathbf{x}}_{k|k-1} + M_k P_{k|k-1} H_k^\top \left( H_k P_{k|k-1} H_k^\top + R_k \right)^{-1} \left( \mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1} \right) = M_k \hat{\mathbf{x}}_{k|k-1} + M_k K_k \left( \mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1} \right).$$
Combining the above two results we find that

$$\begin{bmatrix} H_k \\ M_k \end{bmatrix} \hat{\mathbf{x}}_{k|k} = \begin{bmatrix} H_k \\ M_k \end{bmatrix} \left( \hat{\mathbf{x}}_{k|k-1} + K_k \left( \mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1} \right) \right).$$

Since

$$\begin{bmatrix} H_k \\ M_k \end{bmatrix}$$

is an invertible matrix, it can be removed from both sides, and we obtain

$$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + K_k \left( \mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1} \right).$$
Note that the covariance matrix can be obtained using the above equation as

$$P_{k|k} = \operatorname{cov}\left( \mathbf{x}_k - \hat{\mathbf{x}}_{k|k} \right) = (I - K_k H_k) P_{k|k-1}.$$
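Here is the full filter with a partial observation as a self-contained sketch (I observe only the position of the constant-velocity model from before; again, all names and values are my own illustrative choices):

```python
import numpy as np

def kalman_step(x_est, P, z, F, B, u, Q, H, R):
    """One predict/correct cycle of the Kalman filter with z = H x + v."""
    # Predict.
    x_pred = F @ x_est + B @ u
    P_pred = F @ P @ F.T + Q
    # Correct.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_est)) - K @ H) @ P_pred
    return x_new, P_new

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B, u = np.zeros((2, 1)), np.zeros(1)         # no control input
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])                   # observe position only
R = np.array([[0.25]])

rng = np.random.default_rng(0)
x_true, x_est, P = np.array([0.0, 1.0]), np.zeros(2), np.eye(2)
for k in range(50):
    x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.multivariate_normal(np.zeros(1), R)
    x_est, P = kalman_step(x_est, P, z, F, B, u, Q, H, R)
print(x_est, x_true)                         # velocity is recovered without being observed
```

Even though only the position is measured, the filter infers the velocity through the dynamics, which is exactly what the hidden-state argument above promises.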