Kalman Filter
The Kalman filter is a classic state estimation technique that has found application in many areas. In this short tutorial, I will try to explain the Kalman filter in an intuitive way; this is the most basic introduction to the filter and essentially how I learned it. Before getting to the Kalman filter itself, I will first review some basic material that we need.
Prerequisite
Let \(x_i\) be a random variable that has a probability density function \(p_i(x)\) whose mean and variance are \(\mu_i\) and \(\sigma_i^2\). We write \(x_i \sim p_i(\mu_i,\sigma_i^2)\).
Given a set of pairwise uncorrelated random variables \(x_1 \sim p_1(\mu_1,\sigma_1^2), \cdots, x_n \sim p_n(\mu_n,\sigma_n^2)\), if \(y = \sum_{i=1}^n \alpha_i x_i\) is a linear combination of them, then the mean and variance of \(y\) are
\[\mu_y = \sum_{i=1}^n \alpha_i \mu_i\] \[\sigma_y^2 = \sum_{i=1}^n \alpha_i^2 \sigma_i^2\]
Fusing two variables
Now imagine that we want to measure a quantity \(y\) and we have two completely different devices that use different methods: one is based on an old method, say, and reports its result as \(x_1 \sim p_1(\mu_1,\sigma_1^2)\), and the other uses a new method and reports its result as \(x_2 \sim p_2(\mu_2,\sigma_2^2)\). The question is how to combine these two measurements into an optimal estimator for \(y\). The simplest way is to combine them linearly as \(y = \alpha x_1 + \beta x_2\). A reasonable requirement is that if the two estimates \(x_1\) and \(x_2\) give the same result, then the linear combination should return that same result. This implies \(\alpha + \beta = 1\), so our linear estimator becomes
\[y_\alpha(x_1,x_2) = \alpha x_1 + (1-\alpha)x_2\]But what value should we pick for \(\alpha\)? One reasonable way is to say that the optimal value of \(\alpha\) minimizes the variance of \(y_\alpha\). The variance of \(y_\alpha\) is
\[\sigma_y^2 = \alpha^2 \sigma_1^2 + (1-\alpha)^2 \sigma_2^2\] \[\frac{d}{d \alpha} \sigma_y^2 = 2\alpha \sigma_1^2 -2 (1-\alpha)\sigma_2^2 = 0 \to \alpha = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\]Since the second derivative is positive, this value of \(\alpha\) minimizes the variance. The estimator then becomes
\[y(x_1,x_2) = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} x_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} x_2\] \[y(x_1,x_2) = \frac{1/\sigma_1^2}{1/\sigma_1^2 + 1/\sigma_2^2} x_1 + \frac{1/\sigma_2^2}{1/\sigma_1^2 + 1/\sigma_2^2} x_2, \quad \sigma_y^2 = \frac{1}{1/\sigma_1^2 + 1/\sigma_2^2}\]
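As a quick sanity check, here is a minimal Python sketch of this two-measurement fusion; the function and variable names (fuse_two, mu1, var1, ...) are my own and only for illustration.

```python
import numpy as np

def fuse_two(mu1, var1, mu2, var2):
    """Inverse-variance weighted fusion of two scalar estimates."""
    alpha = var2 / (var1 + var2)           # optimal weight on the first estimate
    mu = alpha * mu1 + (1 - alpha) * mu2   # fused mean
    var = 1.0 / (1.0 / var1 + 1.0 / var2)  # fused variance, never larger than either input
    return mu, var

# Example: an old, noisy device and a new, more precise one
print(fuse_two(10.2, 4.0, 9.8, 1.0))  # the fused estimate sits closer to the precise device
```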
Fusing multiple variables
The above argument can be extended to multiple scalar estimates. Let \(x_i \sim p_i(\mu_i,\sigma_i^2)\) be a set of pairwise uncorrelated random variables and consider the unbiased linear estimator \(y = \sum_{i=1}^n \alpha_i x_i\) with \(\sum_{i=1}^n \alpha_i = 1\). Minimizing the variance subject to this constraint with a Lagrange multiplier, we have
\[f(\alpha_1, \cdots, \alpha_n) = \sum_{i=1}^n \alpha_i^2 \sigma_i^2 + \lambda \left( \sum_{i=1}^n \alpha_i -1 \right)\]where \(\lambda\) is the Lagrange multiplier. Taking the derivative with respect to \(\alpha_j\) we find that \(\alpha_1 \sigma_1^2 = \alpha_2\sigma_2^2 = \cdots = -\lambda/2\). Since \(\sum \alpha_i = 1\), then we can find that
\[\alpha_i = \frac{\frac{1}{\sigma_i^2}}{\sum_{j=1}^n \frac{1}{\sigma_j^2}}\]and the resulting variance \(\sigma_y^2\) is
\[\sigma_y^2 = \frac{1}{\sum_{i=1}^n \frac{1}{\sigma_i^2}}\]
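The same weights fall out of a few lines of Python; this is a small illustrative sketch (the function name fuse_scalars is mine), not code from any library.

```python
import numpy as np

def fuse_scalars(mus, variances):
    """Fuse pairwise uncorrelated scalar estimates by inverse-variance weighting."""
    mus = np.asarray(mus, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)  # 1 / sigma_i^2
    weights = precisions / precisions.sum()                 # alpha_i
    fused_mean = weights @ mus
    fused_var = 1.0 / precisions.sum()
    return fused_mean, fused_var

# Three estimates of the same quantity with different precisions
print(fuse_scalars([10.2, 9.8, 10.5], [4.0, 1.0, 9.0]))
```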
Vector estimates
Now let’s extend the same result to vectors of random variables. Let \(\mathbf{x}_1 \sim p_1( \mathbf{\mu}_1,\Sigma_1), \cdots, \mathbf{x}_n \sim p_n( \mathbf{\mu}_n,\Sigma_n)\) be a set of pairwise uncorrelated random vectors of length \(m\). If the random vector \(\mathbf{y}\) is a linear combination of these, \(\mathbf{y} = \sum_{i=1}^n \mathbf{A}_i \mathbf{x}_i\), then the mean and covariance of \(\mathbf{y}\) are obtained as
\[\mathbf{\mu}_\mathbf{y} = \sum_{i=1}^n \mathbf{A}_i \mathbf{\mu}_i\] \[\Sigma_\mathbf{yy} = \sum_{i=1}^n \mathbf{A}_i \Sigma_i\mathbf{A}^\top_i\]
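Here is a quick Monte Carlo check of this covariance propagation rule, in a short numpy sketch of my own (the specific matrices are arbitrary and only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_samples = 2, 200_000

# Two uncorrelated random vectors with known covariances
Sigma1 = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma2 = np.array([[0.5, 0.0], [0.0, 1.5]])
A1 = np.array([[0.7, 0.1], [0.0, 0.4]])
A2 = np.eye(m) - A1

x1 = rng.multivariate_normal(np.zeros(m), Sigma1, n_samples)
x2 = rng.multivariate_normal(np.zeros(m), Sigma2, n_samples)
y = x1 @ A1.T + x2 @ A2.T

print(np.cov(y.T))                              # empirical covariance of y
print(A1 @ Sigma1 @ A1.T + A2 @ Sigma2 @ A2.T)  # value predicted by the formula above
```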
Fusing multiple vector estimates
Imagine the linear estimator as
\[\mathbf{y}(\mathbf{x}_1,\cdots,\mathbf{x}_n) = \sum_{i=1}^n \mathbf{A}_i \mathbf{x}_i, \quad \sum \mathbf{A}_i = \mathbb{I}\]Similarly, we intend to minimize \(\mathbb{E}[ (\mathbf{y}-\mu_\mathbf{y})^\top (\mathbf{y}-\mu_\mathbf{y})]\). We define the following optimization problem using a Lagrange multiplier
\[f(\mathbf{A}_1, \cdots, \mathbf{A}_n) = \mathbb{E} \left[\sum_{i=1}^n (\mathbf{x}_i-\mu_i)^\top \mathbf{A}^\top_i \mathbf{A}_i (\mathbf{x}_i - \mu_i) \right] + \left\langle \Lambda, \sum_{i=1}^n \mathbf{A}_i-\mathbb{I}\right\rangle\]where the second term enforces the constraint through the Lagrange multiplier matrix \(\Lambda\) and \(\left\langle \Lambda, \sum_i \mathbf{A}_i-\mathbb{I}\right\rangle = \text{tr}\left[\Lambda^\top\left( \sum_i \mathbf{A}_i-\mathbb{I}\right)\right]\). Taking the derivative of \(f\) with respect to each \(\mathbf{A}_i\) and setting it to zero to find the optimal values of \(\mathbf{A}_i\) gives us
\[\mathbb{E} \left[2\mathbf{A}_i (\mathbf{x}_i-\mu_i) (\mathbf{x}_i - \mu_i)^\top + \Lambda \right]=0\] \[2\mathbf{A}_i \Sigma_i + \Lambda = 0\to \mathbf{A}_1 \Sigma_1 = \mathbf{A}_2 \Sigma_2 = \cdots = \mathbf{A}_n \Sigma_n = \frac{-\Lambda}{2}\]Using the fact that \(\sum \mathbf{A}_i = \mathbb{I}\),
\[\mathbf{A}_i = \left( \sum_{j=1}^n \Sigma_j^{-1}\right)^{-1} \Sigma_i^{-1}\]Therefore the optimal estimator becomes
\[\mathbf{y} = \left( \sum_{j=1}^n \Sigma_j^{-1}\right)^{-1}\sum_{i=1}^n \Sigma_i^{-1} \mathbf{x}_i, \qquad \Sigma_{\mathbf{y}\mathbf{y}} = \left( \sum_{j=1}^n \Sigma_j^{-1}\right)^{-1}\]
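In code, fusing \(n\) vector estimates is just a precision-weighted average; below is a small sketch under the same uncorrelatedness assumption (the function name fuse_vectors is mine).

```python
import numpy as np

def fuse_vectors(means, covariances):
    """Fuse pairwise uncorrelated vector estimates with inverse-covariance weights."""
    precisions = [np.linalg.inv(S) for S in covariances]  # Sigma_i^{-1}
    fused_cov = np.linalg.inv(sum(precisions))             # (sum_j Sigma_j^{-1})^{-1}
    fused_mean = fused_cov @ sum(P @ m for P, m in zip(precisions, means))
    return fused_mean, fused_cov

x1, S1 = np.array([1.0, 2.0]), np.diag([4.0, 1.0])
x2, S2 = np.array([1.5, 1.8]), np.diag([1.0, 2.0])
print(fuse_vectors([x1, x2], [S1, S2]))
```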
Special case of \(n=2\)
Let \(\mathbf{x}_1 \sim p_1(\mu_1, \Sigma_1)\) and \(\mathbf{x}_2 \sim p_2 (\mu_2,\Sigma_2)\); then we have
\[\mathbf{K} = \Sigma_1 \left(\Sigma_1 + \Sigma_2 \right)^{-1}\] \[\mathbf{y} = \mathbf{x}_1 + \mathbf{K} (\mathbf{x}_2-\mathbf{x}_1), \quad \Sigma_{\mathbf{y}\mathbf{y}} = (\mathbf{I}-\mathbf{K})\Sigma_1\]To prove this, we start from the general fusion result obtained above, i.e.
\[\mathbf{y} = \left( \Sigma_1^{-1} + \Sigma_2^{-1} \right)^{-1} \left( \Sigma_1^{-1} \mathbf{x}_1+ \Sigma_2^{-1}\mathbf{x}_2 \right)\]Note that the following matrix identity holds true \((\mathbf{A} ^{-1} + \mathbf{B}^{-1})^{-1} = \mathbf{A} (\mathbf{A} + \mathbf{B})^{-1} \mathbf{B} = \mathbf{B} (\mathbf{A} + \mathbf{B})^{-1} \mathbf{A}\)
\[\mathbf{y} = \Sigma_2 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_1 \Sigma_1^{-1} \mathbf{x}_1 + \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_2 \Sigma_2^{-1}\mathbf{x}_2\] \[\mathbf{y} = \Sigma_2 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \mathbf{x}_1 + \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \mathbf{x}_2\]We add and subtract \(\Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \mathbf{x}_1\) to the above equation to obtain
\[\mathbf{y} = \Sigma_2 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \mathbf{x}_1 + \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \mathbf{x}_2 + \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \mathbf{x}_1 - \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \mathbf{x}_1\] \[\boxed{\mathbf{y} = \mathbf{x}_1 + \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \left( \mathbf{x}_2 - \mathbf{x}_1\right) }\]Similarly for the covariance matrix we have
\[\Sigma_{\mathbf{y}\mathbf{y}} = \left( \Sigma_1^{-1} + \Sigma_2^{-1} \right)^{-1} = \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_2\]We add and subtract the term \(\Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_1\) in the above equation to obtain
\[\Sigma_{\mathbf{y}\mathbf{y}} = \Sigma_1 \left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_2 + \Sigma_1\left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_1 - \Sigma_1\left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_1\] \[\Sigma_{\mathbf{y}\mathbf{y}} = \Sigma_1 - \Sigma_1\left( \Sigma_1 + \Sigma_2 \right)^{-1} \Sigma_1\] \[\Sigma_{\mathbf{y}\mathbf{y}} = \left( \mathbf{I} - \Sigma_1\left( \Sigma_1 + \Sigma_2 \right)^{-1}\right) \Sigma_1 = \left( \mathbf{I} - \mathbf{K} \right) \Sigma_1\]
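It is easy to verify numerically that the gain form and the inverse-covariance form agree; here is a small numpy check of my own (the matrices are arbitrary examples):

```python
import numpy as np

S1 = np.array([[2.0, 0.4], [0.4, 1.0]])   # covariance of the first estimate
S2 = np.array([[1.0, 0.0], [0.0, 3.0]])   # covariance of the second estimate
x1 = np.array([1.0, 2.0])
x2 = np.array([1.4, 1.7])

K = S1 @ np.linalg.inv(S1 + S2)

y_gain = x1 + K @ (x2 - x1)                               # gain form
P_gain = (np.eye(2) - K) @ S1

J = np.linalg.inv(np.linalg.inv(S1) + np.linalg.inv(S2))  # inverse-covariance form
y_info = J @ (np.linalg.inv(S1) @ x1 + np.linalg.inv(S2) @ x2)

print(np.allclose(y_gain, y_info), np.allclose(P_gain, J))  # True True
```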
Best Linear Unbiased Estimator
Let \(\left( \begin{matrix} \mathbf{x} \\ \mathbf{y}\end{matrix}\right) \sim p\left(\left( \begin{matrix} \mu_\mathbf{x} \\ \mu_\mathbf{y} \end{matrix}\right), \left( \begin{matrix} \Sigma_\mathbf{xx}& \Sigma_\mathbf{xy} \\ \Sigma_\mathbf{yx} & \Sigma_\mathbf{yy}\end{matrix}\right) \right)\). The best linear unbiased estimator \(\hat{\mathbf{y}} = \mathbf{A}\mathbf{x} + \mathbf{b}\) of \(\mathbf{y}\) for a given \(\mathbf{x}\) is obtained with
\[\mathbf{A} = \Sigma_\mathbf{yx} \Sigma_\mathbf{xx}^{-1}\] \[\mathbf{b} = \mu_\mathbf{y} - \mathbf{A}\mu_\mathbf{x}\]
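As a concrete illustration (my own toy numbers, not from the original derivation), the coefficients of this estimator can be computed directly from the joint statistics:

```python
import numpy as np

# Joint statistics of (x, y): here x is 2-dimensional and y is 1-dimensional
mu_x, mu_y = np.array([1.0, 0.5]), np.array([2.0])
Sigma_xx = np.array([[1.0, 0.2], [0.2, 0.5]])
Sigma_yx = np.array([[0.4, 0.1]])          # Cov(y, x), shape (1, 2)

A = Sigma_yx @ np.linalg.inv(Sigma_xx)     # A = Sigma_yx Sigma_xx^{-1}
b = mu_y - A @ mu_x                        # b = mu_y - A mu_x

x_observed = np.array([1.3, 0.2])
y_hat = A @ x_observed + b                 # best linear unbiased estimate of y
print(A, b, y_hat)
```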
Kalman Filter for a linear system
Now that we have all the ingredients, we can discuss the Kalman filter. Assume a linear dynamical system where
\[\mathbf{x}_k = \mathbf{F}_k\mathbf{x}_{k-1} + \mathbf{B}_k \mathbf{u}_k + \mathbf{w}_k\]where \(\mathbf{F}_k\) is the state transition model applied to the previous state \(\mathbf{x}_{k-1}\), \(\mathbf{B}_k\) is the control input model applied to the control vector \(\mathbf{u}_k\), and \(\mathbf{w}_k\) is the process noise, assumed to be drawn from a multivariate normal distribution \(\mathcal{N}(0,\mathbf{Q}_k)\) with covariance matrix \(\mathbf{Q}_k\). At time \(k\), we make an observation (or measurement) \(\mathbf{z}_k\) of the true state \(\mathbf{x}_k\) according to
\[\mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{v}_k\]where \(\mathbf{H}_k\) is the observation model, and \(\mathbf{v}_k\) is the observation noise drawn from the Gaussian distribution \(\mathcal{N}(0,\mathbf{R}_k)\) where \(\mathbf{R}_k\) is the covariance matrix.
First, let’s assume that \(\mathbf{H}_k= \mathbf{I}\), i.e. we fully observe the state. Given the estimate \(\hat{\mathbf{x}}_{t-1|t-1}\) at time \(t-1\), based on all the observations up to that time, we make a prediction \(\hat{\mathbf{x}}_{t|t-1}\) using the dynamical system equation as
\[\hat{\mathbf{x}}_{t|t-1} = \mathbf{F}_t\hat{\mathbf{x}}_{t-1|t-1} + \mathbf{B}_t \mathbf{u}_t\]Next, the covariance of this prediction can also be propagated as
\[\Sigma_{t|t-1} = \mathbf{F}_t \Sigma_{t-1|t-1}\mathbf{F}_t^\top + \mathbf{Q}_t\]Given these predictions for the state at time \(t\), we also make an observation \(\mathbf{z}_t = \mathbf{x}_t + \mathbf{v}_t\) whose noise covariance matrix is \(\mathbf{R}_t\).
Now our goal is to combine these results to obtain the corrected estimate \(\hat{\mathbf{x}}_{t|t}\). Using the two-estimate fusion derived above, which weights the prediction and the observation by their covariance matrices so that the resulting covariance is minimized, we have
\[\boxed{\mathbf{K}_t= \Sigma_{t|t-1} \left( \Sigma_{t|t-1} + \mathbf{R}_t \right)^{-1}}\] \[\boxed{\hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_t \left( \mathbf{z}_t - \hat{\mathbf{x}}_{t|t-1}\right)}\] \[\boxed{\Sigma_{t|t} = \left( \mathbf{I} - \mathbf{K}_t\right)\Sigma_{t|t-1}}\]
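To make the predict/update cycle concrete, here is a minimal Python sketch for this fully observed case (\(\mathbf{H} = \mathbf{I}\)); the random-walk scenario and all names are my own illustration, not part of the derivation above.

```python
import numpy as np

# 1-D random-walk state, fully observed: x_t = x_{t-1} + w_t, z_t = x_t + v_t
F, Q, R = np.array([[1.0]]), np.array([[0.01]]), np.array([[0.25]])

rng = np.random.default_rng(1)
x_hat = np.array([0.0])          # initial state estimate
P = np.array([[1.0]])            # initial covariance
true_x = 0.0

for _ in range(50):
    true_x += rng.normal(0, np.sqrt(Q[0, 0]))      # simulate the system
    z = true_x + rng.normal(0, np.sqrt(R[0, 0]))   # noisy full observation

    # Predict
    x_hat = F @ x_hat                              # no control input in this example
    P = F @ P @ F.T + Q

    # Update (H = I)
    K = P @ np.linalg.inv(P + R)
    x_hat = x_hat + K @ (np.array([z]) - x_hat)
    P = (np.eye(1) - K) @ P

print(true_x, x_hat[0], P[0, 0])
```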
Now let’s imagine what happens if we only make a partial observation of the state, i.e. \(\mathbf{H}_t\neq \mathbf{I}\). In this case, we make the prediction as before, but in the step where we combine the results to correct the prediction we need some changes, since the measurement only covers part of \(\mathbf{x}_t\). We first fuse the measurement \(\mathbf{z}_t\) (covariance \(\mathbf{R}_t\)) with the observable part of the prediction, \(\mathbf{H}_t\hat{\mathbf{x}}_{t|t-1}\) (covariance \(\mathbf{H}_t\Sigma_{t|t-1}\mathbf{H}_t^\top\)), and then use the best linear unbiased estimator introduced earlier to update the hidden part of \(\mathbf{x}_t\). The estimate of the observable part becomes
\[\mathbf{H}_t\hat{\mathbf{x}}_{t|t} = \mathbf{H}_t\hat{\mathbf{x}}_{t|t-1} + \mathbf{H}_t \Sigma_{t|t-1}\mathbf{H}_t^\top \left( \mathbf{H}_t\Sigma_{t|t-1}\mathbf{H}_t^\top + \mathbf{R}_t \right)^{-1} \left( \mathbf{z}_t - \mathbf{H}_t\hat{\mathbf{x}}_{t|t-1}\right)\]We can define
\[\boxed{\mathbf{K}_t = \Sigma_{t|t-1}\mathbf{H}_t^\top \left( \mathbf{H}_t\Sigma_{t|t-1}\mathbf{H}_t^\top + \mathbf{R}_t \right)^{-1}}\]and the observable part simplifies to
\[\mathbf{H}_t\hat{\mathbf{x}}_{t|t} = \mathbf{H}_t\hat{\mathbf{x}}_{t|t-1} + \mathbf{H}_t \mathbf{K}_t\left( \mathbf{z}_t - \mathbf{H}_t\hat{\mathbf{x}}_{t|t-1}\right)\]The rest of the variables (the hidden states) can be described by \(\mathbf{C}_t\hat{\mathbf{x}}_{t|t-1}\), where \(\mathbf{C}_t\) is chosen so that \(\left(\begin{matrix} \mathbf{H}_t \\ \mathbf{C}_t\end{matrix} \right)\) is an invertible matrix; the simplest choice makes this stacked matrix the identity. The covariance between \(\mathbf{C}_t\hat{\mathbf{x}}_{t|t-1}\) and the observable part \(\mathbf{H}_t\hat{\mathbf{x}}_{t|t-1}\) is \(\mathbf{C}_t \Sigma_{t|t-1} \mathbf{H}_t^\top\). Using the best linear unbiased estimator, we can find the estimate of the hidden portion as
\[\mathbf{C}_t\hat{\mathbf{x}}_{t|t} = \mathbf{C}_t\hat{\mathbf{x}}_{t|t-1} + \left(\mathbf{C}_t \Sigma_{t|t-1} \mathbf{H}_t^\top \right) \left( \mathbf{H}_t \Sigma_{t|t-1} \mathbf{H}_t^\top \right)^{-1} \mathbf{H}_t \mathbf{K}_t\left( \mathbf{z}_t - \mathbf{H}_t\hat{\mathbf{x}}_{t|t-1}\right)\] \[\mathbf{C}_t\hat{\mathbf{x}}_{t|t} = \mathbf{C}_t\hat{\mathbf{x}}_{t|t-1} + \mathbf{C}_t \mathbf{K}_t\left( \mathbf{z}_t - \mathbf{H}_t\hat{\mathbf{x}}_{t|t-1}\right)\]Combining the above two results we find that
\[\left(\begin{matrix} \mathbf{H}_t \\ \mathbf{C}_t\end{matrix} \right) \hat{\mathbf{x}}_{t|t} = \left(\begin{matrix} \mathbf{H}_t \\ \mathbf{C}_t\end{matrix} \right) \hat{\mathbf{x}}_{t|t-1} + \left(\begin{matrix} \mathbf{H}_t \\ \mathbf{C}_t\end{matrix} \right) \mathbf{K}_t\left( \mathbf{z}_t - \mathbf{H}_t\hat{\mathbf{x}}_{t|t-1}\right)\]Since \(\left(\begin{matrix} \mathbf{H}_t \\ \mathbf{C}_t\end{matrix} \right)\) is an invertible matrix, it can be removed from both sides, and we obtain
\[\boxed{ \hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_t\left( \mathbf{z}_t - \mathbf{H}_t\hat{\mathbf{x}}_{t|t-1}\right)}\]Note that the covariance matrix can be obtained using the above equation as
\[\boxed{\Sigma_{t|t} = \left( \mathbf{I} - \mathbf{K}_t \mathbf{H}_t\right)\Sigma_{t|t-1} \left( \mathbf{I} - \mathbf{K}_t \mathbf{H}_t\right)^\top + \mathbf{K}_t \mathbf{R}_t \mathbf{K}_t^\top}\]
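Putting the boxed equations together, here is a compact Python sketch of the full filter with a partial observation (\(\mathbf{H}_t \neq \mathbf{I}\)); the constant-velocity tracking scenario and all variable names are my own illustration.

```python
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity state transition
H = np.array([[1.0, 0.0]])              # we only observe position
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[0.5]])                   # measurement noise covariance

def kalman_step(x_hat, P, z, u=None, B=None):
    """One predict/update cycle using the boxed equations above."""
    # Predict
    x_pred = F @ x_hat + (B @ u if B is not None else 0.0)
    P_pred = F @ P @ F.T + Q

    # Update
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    I_KH = np.eye(len(x_hat)) - K @ H
    P_new = I_KH @ P_pred @ I_KH.T + K @ R @ K.T  # Joseph form, numerically safer
    return x_new, P_new

# Track a target moving at constant velocity from noisy position readings
rng = np.random.default_rng(2)
x_true = np.array([0.0, 1.0])
x_hat, P = np.zeros(2), 10.0 * np.eye(2)
for _ in range(30):
    x_true = F @ x_true
    z = H @ x_true + rng.normal(0, np.sqrt(R[0, 0]), size=1)
    x_hat, P = kalman_step(x_hat, P, z)

print(x_true, x_hat)   # the estimated position and velocity approach the truth
```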