Training neural networks with backpropagation
Calculating the activation of a neuron, the forward part, or what we call feed-forward propagation, is quite straightforward to process. The complexity we encounter now is training the errors back through the network. When we train the network now, we start at the last output layer and determine the total error, just as we did with a single perceptron, but now we need to sum up all errors across the output layer. Then we need to use this value to backpropagate the error back through the network, updating each of the weights based on their contribution to the total error. Understanding the contribution of a single weight in a network with thousands or millions of weights could be quite complicated, except thankfully for the help of differentiation and the chain rule. Before we get to the complicated math, we first need to discuss the Cost function and how we calculate errors in the next section.