Java Deep Learning Projects

Forward and backward passes

In the forward pass, a number of operations are performed on the input to obtain predictions or scores. As these operations execute, a graph is created that connects all dependent operations in a top-to-bottom fashion. The network's error is then computed as the difference between the predicted output and the actual output.
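
The following is a minimal sketch of such a forward pass, assuming a single dense layer with a sigmoid activation and a mean-squared-error measure; the class and method names are illustrative, not a specific library API:

```java
// Sketch only: forward pass of one dense layer with a sigmoid activation,
// followed by the network error (mean squared error) on a single sample.
public class ForwardPassSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Computes the layer output y = sigmoid(W * x + b) for one input vector x.
    static double[] forward(double[][] W, double[] b, double[] x) {
        double[] y = new double[W.length];
        for (int i = 0; i < W.length; i++) {
            double z = b[i];
            for (int j = 0; j < x.length; j++) {
                z += W[i][j] * x[j];
            }
            y[i] = sigmoid(z);
        }
        return y;
    }

    // Network error: mean squared difference between prediction and target.
    static double mse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return sum / predicted.length;
    }
}
```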

The backward pass, on the other hand, is mainly concerned with mathematical operations: computing derivatives for every differentiable operation in the graph (that is, automatic differentiation), traversing the graph from top to bottom, and combining those derivatives through the chain rule so that the loss can be used to update the network weights.

In this pass, for each layer, starting at the output layer and moving back to the input layer, the layer's output is compared with the desired output (the error function), and the weights of the current layer are then adjusted to minimize that error. This is backpropagation's optimization step, sketched in the example after the following list. Incidentally, there are two modes of automatic differentiation:

  1. Reverse mode: derivatives of a single output with respect to all inputs
  2. Forward mode: derivatives of all outputs with respect to a single input
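
The sketch below continues the forward-pass example above and shows the chain rule and weight update for that single sigmoid layer under a mean-squared-error loss; it is a hand-written illustration of the idea, not a library implementation:

```java
// Sketch only: backward pass for the single dense layer above.
// Gradients flow from the loss back through the sigmoid and the weights
// via the chain rule, and the weights are adjusted by gradient descent.
public class BackwardPassSketch {

    // Computes dLoss/dW and dLoss/db for one sample and updates W and b in place,
    // assuming loss = mean((y - t)^2) and y = sigmoid(W * x + b).
    static void backwardAndUpdate(double[][] W, double[] b,
                                  double[] x, double[] y, double[] t,
                                  double learningRate) {
        int outputs = W.length;
        int inputs = x.length;
        for (int i = 0; i < outputs; i++) {
            // Chain rule: dL/dz_i = dL/dy_i * dy_i/dz_i
            double dLdy = 2.0 * (y[i] - t[i]) / outputs; // derivative of the MSE term
            double dydz = y[i] * (1.0 - y[i]);           // derivative of the sigmoid
            double dLdz = dLdy * dydz;
            // Gradient descent step: move each weight against its gradient.
            for (int j = 0; j < inputs; j++) {
                W[i][j] -= learningRate * dLdz * x[j];   // dL/dW_ij = dL/dz_i * x_j
            }
            b[i] -= learningRate * dLdz;                 // dL/db_i = dL/dz_i
        }
    }
}
```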

The backpropagation algorithm processes the information in such a way that the network decreases the global error during the learning iterations; however, this does not guarantee that the global minimum is reached. The presence of hidden units and the nonlinearity of the output function mean that the behavior of the error is very complex and has many local minima.

This backpropagation step is typically performed thousands or millions of times, using many training batches, until the model parameters converge to values that minimize the cost function. The training process ends when the error on the validation set begins to increase, because this could mark the beginning of a phase of overfitting.
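
A minimal sketch of such an early-stopping training loop is shown below; the Model interface and its trainOneEpoch and validationError methods are hypothetical placeholders for whatever runs the forward and backward passes and scores the model on held-out data:

```java
// Sketch only: stop training once the validation error stops improving,
// since a rising validation error can mark the onset of overfitting.
public class EarlyStoppingSketch {

    interface Model {
        void trainOneEpoch();     // forward + backward passes over all training batches
        double validationError(); // error on the held-out validation set
    }

    static void fit(Model model, int maxEpochs, int patience) {
        double bestError = Double.MAX_VALUE;
        int epochsWithoutImprovement = 0;

        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            model.trainOneEpoch();
            double error = model.validationError();

            if (error < bestError) {
                bestError = error;
                epochsWithoutImprovement = 0;      // still improving, keep training
            } else if (++epochsWithoutImprovement >= patience) {
                break;                             // validation error rising: stop early
            }
        }
    }
}
```

The patience parameter simply tolerates a few noisy epochs before stopping, so that a single fluctuation in validation error does not end training prematurely.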