Backpropagation in Neural Networks
Backpropagation of Gradients
The values of layer \( l \) in the neural network after applying the activation function are stored in a column vector \( a^l \), where the superscript denotes the layer number. The connections into the layer are stored in a weight matrix \( W^l \), and the bias is the column vector \( b^l \). The forward propagation is then obtained as

$$ a^l = \sigma\!\left( W^l a^{l-1} + b^l \right), $$

where \( \sigma \) is the activation function applied element-wise.
We introduce a new vector for the pre-activation values,

$$ z^l = W^l a^{l-1} + b^l , $$

so that \( a^l = \sigma\!\left( z^l \right) \).
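As a concrete illustration, here is a minimal NumPy sketch of the forward rule for a single layer; the sigmoid activation and the function names are assumptions for illustration, not part of the derivation itself.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic activation (assumed for illustration)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(W, b, a_prev):
    """Compute z^l = W^l a^{l-1} + b^l and a^l = sigma(z^l)."""
    z = W @ a_prev + b   # pre-activation column vector z^l
    a = sigmoid(z)       # activation column vector a^l
    return a, z
```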
To simplify the procedure, we assume a simple three-layer network \( a^0 \rightarrow a^1 \rightarrow a^2 \).

In this network, we have
- Input: the column vector \( a^0 \)
- Layer1: \( z^1 = W^1 a^0 + b^1 \) and \( a^1 = \sigma(z^1) \)
- Layer2: \( z^2 = W^2 a^1 + b^2 \) and \( a^2 = \sigma(z^2) \)

where the dimensions are

$$ a^0 \in \mathbb{R}^{n_0 \times 1}, \quad W^1 \in \mathbb{R}^{n_1 \times n_0}, \quad b^1, z^1, a^1 \in \mathbb{R}^{n_1 \times 1}, \quad W^2 \in \mathbb{R}^{n_2 \times n_1}, \quad b^2, z^2, a^2 \in \mathbb{R}^{n_2 \times 1}, $$

with \( n_0 \), \( n_1 \), and \( n_2 \) the number of units in the input, Layer1, and Layer2, respectively.
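To make these sizes concrete, here is a small NumPy sketch of the same network with assumed layer sizes \( n_0 = 4 \), \( n_1 = 3 \), \( n_2 = 2 \) and an assumed sigmoid activation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n0, n1, n2 = 4, 3, 2                    # assumed layer sizes for illustration

a0 = rng.standard_normal((n0, 1))       # input column vector
W1 = rng.standard_normal((n1, n0)); b1 = rng.standard_normal((n1, 1))
W2 = rng.standard_normal((n2, n1)); b2 = rng.standard_normal((n2, 1))

z1 = W1 @ a0 + b1; a1 = sigmoid(z1)     # Layer1
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)     # Layer2

print(a0.shape, z1.shape, a1.shape, z2.shape, a2.shape)
# (4, 1) (3, 1) (3, 1) (2, 1) (2, 1)
```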
We start by finding the derivative of the cost \( C \) with respect to the weights of the last layer, \( W^2 \). By the chain rule,

$$ \frac{\partial C}{\partial z^2} = \frac{\partial C}{\partial a^2} \odot \sigma'(z^2), \qquad \frac{\partial C}{\partial W^2} = \frac{\partial C}{\partial z^2} \, (a^1)^T , $$

where \( \odot \) denotes the element-wise product. Notice that the dimensions work out perfectly as

$$ \underbrace{\frac{\partial C}{\partial z^2}}_{n_2 \times 1} \, \underbrace{(a^1)^T}_{1 \times n_1} \in \mathbb{R}^{n_2 \times n_1}, $$

which is exactly the shape of \( W^2 \).
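As a concrete check of this step, the following sketch computes \( \partial C / \partial W^2 \) for the small network above, assuming a squared-error cost \( C = \tfrac{1}{2}\lVert a^2 - y\rVert^2 \) against a hypothetical target \( y \); any differentiable cost works the same way.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n0, n1, n2 = 4, 3, 2                          # assumed layer sizes
a0 = rng.standard_normal((n0, 1))
W1, b1 = rng.standard_normal((n1, n0)), rng.standard_normal((n1, 1))
W2, b2 = rng.standard_normal((n2, n1)), rng.standard_normal((n2, 1))
y = rng.standard_normal((n2, 1))              # hypothetical target

z1 = W1 @ a0 + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

dC_da2 = a2 - y                               # from the assumed cost 0.5*||a2 - y||^2
dC_dz2 = dC_da2 * a2 * (1 - a2)               # sigma'(z2) = a2 * (1 - a2) for the sigmoid
dC_dW2 = dC_dz2 @ a1.T                        # (n2, 1) @ (1, n1) -> (n2, n1)

assert dC_dW2.shape == W2.shape               # gradient matches the weight matrix
```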
Furthermore, we need to calculate the derivative with respect to \( a^1 \), since the first layer only affects the cost through it:

$$ \frac{\partial C}{\partial a^1} = (W^2)^T \, \frac{\partial C}{\partial z^2} . $$

Now, we take the derivative with respect to \( W^1 \):

$$ \frac{\partial C}{\partial z^1} = \frac{\partial C}{\partial a^1} \odot \sigma'(z^1), \qquad \frac{\partial C}{\partial W^1} = \frac{\partial C}{\partial z^1} \, (a^0)^T . $$
To summarize, we have

$$ \frac{\partial C}{\partial W^2} = \frac{\partial C}{\partial z^2} \, (a^1)^T, \qquad \frac{\partial C}{\partial W^1} = \frac{\partial C}{\partial z^1} \, (a^0)^T, $$

with

$$ \frac{\partial C}{\partial z^2} = \frac{\partial C}{\partial a^2} \odot \sigma'(z^2), \qquad \frac{\partial C}{\partial z^1} = \left( (W^2)^T \frac{\partial C}{\partial z^2} \right) \odot \sigma'(z^1). $$

In terms of dimensions we have

$$ \frac{\partial C}{\partial W^2} \in \mathbb{R}^{n_2 \times n_1}, \qquad \frac{\partial C}{\partial W^1} \in \mathbb{R}^{n_1 \times n_0}, $$

i.e. each gradient has exactly the same shape as the weight matrix it belongs to.
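A sketch of the full backward sweep for the weight gradients, under the same assumed sigmoid activation and squared-error cost, could look like this.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n0, n1, n2 = 4, 3, 2                          # assumed layer sizes
a0 = rng.standard_normal((n0, 1))
W1, b1 = rng.standard_normal((n1, n0)), rng.standard_normal((n1, 1))
W2, b2 = rng.standard_normal((n2, n1)), rng.standard_normal((n2, 1))
y = rng.standard_normal((n2, 1))              # hypothetical target

# forward pass
z1 = W1 @ a0 + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# backward pass for the weights
dC_dz2 = (a2 - y) * a2 * (1 - a2)             # dC/da2 (x) sigma'(z2)
dC_dW2 = dC_dz2 @ a1.T                        # (n2, n1), same shape as W2
dC_da1 = W2.T @ dC_dz2                        # propagate back through W2
dC_dz1 = dC_da1 * a1 * (1 - a1)               # dC/da1 (x) sigma'(z1)
dC_dW1 = dC_dz1 @ a0.T                        # (n1, n0), same shape as W1
```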
Similarly, taking the derivatives w.r.t. the bias vectors gives

$$ \frac{\partial C}{\partial b^2} = \frac{\partial C}{\partial z^2}, \qquad \frac{\partial C}{\partial b^1} = \frac{\partial C}{\partial z^1} , $$

since \( \partial z^l / \partial b^l \) is the identity. To summarize, we obtained the following result:

$$ \frac{\partial C}{\partial W^l} = \frac{\partial C}{\partial z^l} \, (a^{l-1})^T, \qquad \frac{\partial C}{\partial b^l} = \frac{\partial C}{\partial z^l}, \qquad \frac{\partial C}{\partial a^{l-1}} = (W^l)^T \, \frac{\partial C}{\partial z^l} . $$
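As a sanity check of these formulas, the sketch below compares one analytic bias gradient against a central finite difference of the cost; the sigmoid activation, squared-error cost, and layer sizes are the same assumptions as before.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n0, n1, n2 = 4, 3, 2
a0, y = rng.standard_normal((n0, 1)), rng.standard_normal((n2, 1))
W1, b1 = rng.standard_normal((n1, n0)), rng.standard_normal((n1, 1))
W2, b2 = rng.standard_normal((n2, n1)), rng.standard_normal((n2, 1))

def cost(W1, b1, W2, b2):
    a1 = sigmoid(W1 @ a0 + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return 0.5 * np.sum((a2 - y) ** 2)        # assumed squared-error cost

# analytic gradients from the summary above
a1 = sigmoid(W1 @ a0 + b1)
a2 = sigmoid(W2 @ a1 + b2)
dC_dz2 = (a2 - y) * a2 * (1 - a2)
dC_db2 = dC_dz2
dC_dz1 = (W2.T @ dC_dz2) * a1 * (1 - a1)
dC_db1 = dC_dz1

# central finite difference on one entry of b1
eps = 1e-6
b1p, b1m = b1.copy(), b1.copy()
b1p[0, 0] += eps; b1m[0, 0] -= eps
numeric = (cost(W1, b1p, W2, b2) - cost(W1, b1m, W2, b2)) / (2 * eps)
print(np.isclose(numeric, dC_db1[0, 0]))      # expected: True
```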
Note that if we increase the number of layers, a similar pattern shows up.
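For example, a minimal sketch of the resulting loop over an arbitrary number of layers (again assuming a sigmoid activation and a squared-error cost) could look like this.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(weights, biases, a0, y):
    """Gradients w.r.t. every W^l and b^l for a stack of sigmoid layers,
    assuming the cost C = 0.5 * ||a^L - y||^2."""
    # forward pass, storing every activation a^l
    activations = [a0]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(W @ activations[-1] + b))

    # backward pass: the same pattern repeats at every layer
    dC_da = activations[-1] - y
    grads_W, grads_b = [], []
    for W, a_prev, a in zip(reversed(weights), reversed(activations[:-1]), reversed(activations[1:])):
        dC_dz = dC_da * a * (1 - a)           # dC/da^l (x) sigma'(z^l)
        grads_W.append(dC_dz @ a_prev.T)      # dC/dW^l = dC/dz^l (a^{l-1})^T
        grads_b.append(dC_dz)                 # dC/db^l = dC/dz^l
        dC_da = W.T @ dC_dz                   # dC/da^{l-1} = (W^l)^T dC/dz^l
    return grads_W[::-1], grads_b[::-1]

# example with three weight layers of assumed sizes
rng = np.random.default_rng(3)
sizes = [4, 5, 3, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((m, 1)) for m in sizes[1:]]
gW, gb = backprop(weights, biases,
                  rng.standard_normal((sizes[0], 1)),
                  rng.standard_normal((sizes[-1], 1)))
print([g.shape for g in gW])   # [(5, 4), (3, 5), (2, 3)], same shapes as the weights
```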