Key Notes

Training

Minimize the mean squared error (MSE) using gradient descent until convergence.

How it is written in the book:

Or, in more familiar notation:

And after applying the bias trick (appending a constant 1 to each input vector so that the bias becomes just another weight):


We write

Where is the ground truth (label).
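To make the bias trick concrete, here is a small plain-Python sketch. It assumes a linear classifier with a weight matrix `W` (one row per class), a bias vector `w0` and an input `x`. These names, the numbers and the helper functions are hypothetical and not taken from the book. The sketch checks that `W x + w0` equals `[W | w0] [x; 1]`. It also includes an MSE helper that assumes the form ½ Σ_k (g_k − t_k)ᵀ(g_k − t_k), where t_k is the ground truth. The book may scale the error differently (for example by 1/N), which only rescales the gradient.

```python
def augment(x):
    """Bias trick: append a constant 1 to the input vector x."""
    return list(x) + [1.0]

def matvec(W, x):
    """Matrix-vector product; W is a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def linear_with_bias(W, w0, x):
    """Explicit form: W x + w0."""
    return [z + b for z, b in zip(matvec(W, x), w0)]

def linear_bias_trick(W, w0, x):
    """Same output via the bias trick: [W | w0] [x; 1]."""
    W_aug = [list(row) + [b] for row, b in zip(W, w0)]
    return matvec(W_aug, augment(x))

def mse(outputs, targets):
    """Assumed form: 1/2 * sum_k (g_k - t_k)^T (g_k - t_k)."""
    return 0.5 * sum(
        (g - t) ** 2
        for g_k, t_k in zip(outputs, targets)
        for g, t in zip(g_k, t_k)
    )

# Hypothetical example: 2 classes, 2 features.
W = [[1.0, -2.0], [0.5, 3.0]]
w0 = [0.25, -1.0]
x = [2.0, 1.0]
print(linear_with_bias(W, w0, x))       # [0.25, 3.0]
print(linear_bias_trick(W, w0, x))      # [0.25, 3.0]
print(mse([[0.5, 0.5]], [[1.0, 0.0]]))  # 0.25
```

Folding the bias into `W` means the training code only has to handle one parameter matrix instead of a matrix plus a vector.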

We also need to squash the outputs into the interval between 0 and 1. For this we use the sigmoid, which is an activation function:
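Here is a minimal sketch of the sigmoid σ(z) = 1 / (1 + e^(−z)), applied elementwise to the classifier output. The branch on the sign of `z` only prevents `math.exp` from overflowing for large negative inputs. The derivative σ'(z) = σ(z)(1 − σ(z)) is included because it appears in the gradient of the MSE. The function names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)); mathematically maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # so math.exp never overflows for large |z|.
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_vec(z):
    """Apply the sigmoid elementwise to a vector (list) z."""
    return [sigmoid(v) for v in z]

def sigmoid_derivative(z):
    """sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0))                     # 0.5
print(sigmoid_vec([-1000.0, 1000.0]))   # [0.0, 1.0]
print(sigmoid_derivative(0.0))          # 0.25
```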


Because of the sigmoid, the minimum of the MSE has no explicit (closed-form) solution. We therefore use an iterative, gradient-based technique called gradient descent.
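Gradient descent repeatedly steps against the gradient: w ← w − α ∇f(w). The sketch below applies it to a toy quadratic rather than to the MSE, so it shows only the mechanics. The stopping rule (stop once every component of the step is smaller than `tol`) is an assumed convergence criterion. The notes do not say which criterion the book uses.

```python
def gradient_descent(grad, w0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Generic gradient descent on a vector w (list of floats).

    Repeats w <- w - alpha * grad(w) until every component of the step
    is below tol (assumed convergence test) or max_iter steps are done.
    Returns the final w and the number of iterations used.
    """
    w = list(w0)
    for m in range(1, max_iter + 1):
        step = [alpha * g for g in grad(w)]
        w = [wi - si for wi, si in zip(w, step)]
        if max(abs(s) for s in step) < tol:
            return w, m
    return w, max_iter

def grad_f(w):
    """Gradient of the toy function f(w) = (w1 - 3)^2 + (w2 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w_min, iters = gradient_descent(grad_f, [0.0, 0.0])
print([round(v, 4) for v in w_min])  # [3.0, -1.0]
```

The step factor α matters. If it is too small, convergence is slow. If it is too large, the iterates overshoot and can diverge. For this toy function, each step shrinks the distance to the minimum by a factor of 1 − 2α.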

From page 79 of the book, we know that

Where

Note that means elementwise multiplication.
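The equations themselves are missing here. What follows is a reconstruction of the standard chain-rule derivation under assumed notation, not necessarily the book's:

- Inputs x_k are augmented with a trailing 1 (bias trick), and W has one row per class.
- z_k = W x_k, and g_k = σ(z_k) is applied elementwise.
- t_k is the ground truth.
- MSE = ½ Σ_k (g_k − t_k)ᵀ(g_k − t_k).
- ⊙ denotes elementwise multiplication, and 1 is a vector of ones.

```latex
\begin{aligned}
\text{MSE} &= \frac{1}{2}\sum_{k=1}^{N} (g_k - t_k)^\top (g_k - t_k),
\qquad g_k = \sigma(z_k), \quad z_k = W x_k \\
\frac{\partial\,\text{MSE}}{\partial g_k} &= g_k - t_k \\
\frac{\partial g_k}{\partial z_k} &= \operatorname{diag}\big(g_k \odot (1 - g_k)\big)
\qquad \text{since } \sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big) \\
\frac{\partial\,\text{MSE}}{\partial z_k} &= (g_k - t_k) \odot g_k \odot (1 - g_k) \\
\nabla_W \text{MSE} &= \sum_{k=1}^{N} \big[(g_k - t_k) \odot g_k \odot (1 - g_k)\big]\, x_k^\top
\end{aligned}
```

The last step uses z_k = W x_k. The derivative of a scalar loss with respect to W is therefore the outer product of ∂MSE/∂z_k with x_k.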

Combining these results with gradient descent, we get the update rule

Where

  • is the ground truth
  • is the iteration number
  • is the step factor
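Putting it together, here is a plain-Python sketch of the training loop. It assumes the update W(m) = W(m − 1) − α ∇_W MSE, with the gradient Σ_k [(g_k − t_k) ⊙ g_k ⊙ (1 − g_k)] x_kᵀ. All of the following are assumptions for illustration, not values from the book:

- the toy data and the one-hot targets
- α = 0.5
- zero initialization
- a fixed number of iterations, used in place of a real convergence test

```python
import math

def sigmoid(z):
    # Numerically stable logistic sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(W, x_aug):
    """g = sigmoid(W x) for one augmented sample x_aug = [x, 1]."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x_aug))) for row in W]

def mse(W, X_aug, T):
    """Assumed form: 1/2 * sum_k ||g_k - t_k||^2."""
    total = 0.0
    for x, t in zip(X_aug, T):
        g = forward(W, x)
        total += sum((gi - ti) ** 2 for gi, ti in zip(g, t))
    return 0.5 * total

def mse_gradient(W, X_aug, T):
    """sum_k [(g_k - t_k) * g_k * (1 - g_k)] x_k^T, where * is elementwise."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, t in zip(X_aug, T):
        g = forward(W, x)
        delta = [(gi - ti) * gi * (1.0 - gi) for gi, ti in zip(g, t)]
        for c, d in enumerate(delta):  # accumulate the outer product delta x^T
            for j, xj in enumerate(x):
                grad[c][j] += d * xj
    return grad

def train(X, T, alpha=0.5, iterations=2000):
    """W(m) = W(m-1) - alpha * grad_W MSE for a fixed number of steps."""
    X_aug = [list(x) + [1.0] for x in X]  # bias trick
    W = [[0.0] * len(X_aug[0]) for _ in range(len(T[0]))]
    history = [mse(W, X_aug, T)]
    for _ in range(iterations):
        grad = mse_gradient(W, X_aug, T)
        W = [[w - alpha * g for w, g in zip(row_w, row_g)]
             for row_w, row_g in zip(W, grad)]
        history.append(mse(W, X_aug, T))
    return W, history

def classify(W, x):
    """Pick the class whose sigmoid output is largest."""
    g = forward(W, list(x) + [1.0])
    return max(range(len(g)), key=g.__getitem__)

# Made-up two-class data with one-hot ground truth t_k.
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
T = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
W, history = train(X, T)
print(history[0], history[-1])
print([classify(W, x) for x in X])
```

A fixed iteration count keeps the sketch simple. A real implementation would more likely stop once the MSE, or the change in W, stops decreasing noticeably.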