Gradient Descent

Mini-Batch Stochastic Gradient Descent:

  • Helps escape local minima, is computationally efficient, and gives smoother convergence than single-sample SGD.
  • Done by randomly splitting the dataset into mini-batches.
    • The weights are then updated after each mini-batch (see the sketch after this list).
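A minimal sketch of one epoch of mini-batch SGD, assuming PyTorch; `model`, `dataset`, and `loss_fn` are illustrative placeholders, not names from the notes:

```python
import torch
from torch.utils.data import DataLoader

def train_one_epoch(model, dataset, loss_fn, lr=0.01, batch_size=64):
    # shuffle=True re-splits the data into random mini-batches every epoch
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    losses = []
    for inputs, targets in loader:
        optimizer.zero_grad()                    # clear gradients from the previous batch
        loss = loss_fn(model(inputs), targets)   # average loss over this mini-batch
        loss.backward()                          # compute gradients
        optimizer.step()                         # one weight update per mini-batch
        losses.append(loss.item())
    return losses                                # per-batch losses, handy for the loss plot
```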

Remember to plot your training with LOSS PLOTS

  • y-axis: average loss
  • x-axis: number of training steps or number of batches
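A sketch of such a loss plot with matplotlib, assuming `losses` is the list of per-batch losses returned by the training loop above:

```python
import matplotlib.pyplot as plt

def plot_loss(losses):
    plt.plot(range(1, len(losses) + 1), losses)
    plt.xlabel("Training step (batch number)")   # x-axis: number of training steps / batches
    plt.ylabel("Average loss")                   # y-axis: average loss per batch
    plt.title("Training loss")
    plt.show()
```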

Task 3: Softmax multi-class classification

  • Now classify all 10 classes
    • 28 * 28 = 784 input nodes, 10 output nodes (one per class)
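A minimal model sketch for this task, assuming PyTorch; everything beyond the input/output sizes is an assumption:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),            # 28 x 28 image -> vector of 28 * 28 = 784 values
    nn.Linear(28 * 28, 10),  # 784 input nodes -> 10 output nodes (one per class)
)
# CrossEntropyLoss applies softmax internally, so the model outputs raw logits
loss_fn = nn.CrossEntropyLoss()
```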

Softmax activation function for multi-class classification
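Softmax turns the 10 output values into class probabilities; a small numpy sketch:

```python
import numpy as np

def softmax(z):
    # softmax(z)_i = exp(z_i) / sum_j exp(z_j)
    # Subtracting the max is for numerical stability and does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()   # non-negative outputs that sum to 1
```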

Dataset splitting

  1. Training
  2. Validation (used during training)
  3. Test (used after training)
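One way to perform the split, assuming PyTorch and an illustrative 80/10/10 ratio (the ratio is not given in the notes; `dataset` is a placeholder):

```python
from torch.utils.data import random_split

n = len(dataset)
n_train = int(0.8 * n)
n_val = int(0.1 * n)
n_test = n - n_train - n_val   # remainder goes to the test split

train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n_test])
```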

Overfitting signs

  • Large gap between training loss and validation loss
  • How to prevent it? Methods:
    • Early stopping (see the sketch below)
    • ..
    • ..
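A minimal early-stopping sketch, reusing the `train_one_epoch` sketch from above and a hypothetical `evaluate_on_validation(model, val_set, loss_fn)` helper that returns the average validation loss:

```python
def train_with_early_stopping(model, train_set, val_set, loss_fn, patience=5, max_epochs=100):
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_set, loss_fn)              # one pass over the training data
        val_loss = evaluate_on_validation(model, val_set, loss_fn)  # hypothetical helper
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break   # validation loss stopped improving -> stop training early
    return model
```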

Task 4: L2 regularization
  • Prevents overfitting on the training data by penalizing large weights, which reduces the model's effective complexity
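In PyTorch, an L2 penalty can be added through the optimizer's weight_decay argument (a sketch; the value 1e-4 is illustrative):

```python
import torch

# weight_decay adds an L2 penalty on the weights, shrinking large weights
# and thereby reducing the effective complexity of the model.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```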