Gradient Descent
Mini-Batch Stochastic Gradient Descent:
- Helps escape local minima, is computationally efficient, and gives smoother convergence than single-sample SGD.
- Done by randomly splitting the dataset into mini-batches.
- The weights are then updated once per mini-batch rather than once per individual sample (see the sketch below).
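A minimal sketch of a mini-batch SGD training loop, assuming a PyTorch setup. The toy data, model, batch size, and learning rate are illustrative placeholders, not values from the notes.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data standing in for the real dataset (shapes are illustrative).
X = torch.randn(1000, 784)
y = torch.randint(0, 2, (1000,)).float()

model = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# shuffle=True reshuffles the data each epoch before splitting it into mini-batches of 64.
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

for epoch in range(5):
    for x_batch, y_batch in loader:
        optimizer.zero_grad()                              # clear gradients from the previous step
        loss = loss_fn(model(x_batch).squeeze(1), y_batch)
        loss.backward()                                    # gradient is averaged over the mini-batch
        optimizer.step()                                   # one weight update per mini-batch
```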
Remember to plot your training progress with LOSS PLOTS:
- y-axis: average loss
- x-axis: number of training steps or number of batches
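A sketch of such a loss plot with matplotlib. `batch_losses` is assumed to be collected inside the training loop above, e.g. by appending `loss.item()` after each `optimizer.step()`.

```python
import matplotlib.pyplot as plt

# Assumed to be filled during training; left empty here as a placeholder.
batch_losses = []

plt.plot(range(1, len(batch_losses) + 1), batch_losses)
plt.xlabel("training step (mini-batch number)")
plt.ylabel("average loss per mini-batch")
plt.show()
```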
Task 3: Softmax multi-class classification
- Now classify all 10 classes
- 28 * 28 = 784 input nodes, 10 output nodes
Softmax activation function for multi-class classification (see the sketch below)
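A minimal sketch of the 784-to-10 softmax classifier, assuming PyTorch. Note that `nn.CrossEntropyLoss` applies (log-)softmax internally, so the model itself only outputs logits; the explicit softmax is only needed to read out probabilities.

```python
import torch
from torch import nn

# 28 * 28 = 784 input nodes, 10 output nodes (one per class).
model = nn.Linear(784, 10)
loss_fn = nn.CrossEntropyLoss()          # applies softmax + log-likelihood internally

x = torch.randn(32, 784)                 # a dummy mini-batch of flattened images
logits = model(x)
probs = torch.softmax(logits, dim=1)     # softmax: exp(z_i) / sum_j exp(z_j); each row sums to 1
pred = probs.argmax(dim=1)               # predicted class per sample
```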
Dataset splitting
- Training
- Validation (used during training to monitor generalization)
- Test (used only after training for the final evaluation)
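One way to do the split, assuming PyTorch's `random_split`. The 80/10/10 ratio is an assumption for illustration, not from the notes.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Dummy dataset standing in for the real one.
dataset = TensorDataset(torch.randn(1000, 784), torch.randint(0, 10, (1000,)))

n_train = int(0.8 * len(dataset))
n_val = int(0.1 * len(dataset))
n_test = len(dataset) - n_train - n_val
train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n_test])
```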
Overfitting signs
- Large gap between training loss and validation loss
- How to prevent it? Methods:
- Early stopping (see the sketch after this list)
- ..
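A common way to implement early stopping: monitor the validation loss and stop once it has not improved for a set number of epochs. The `validate()` helper and the `patience` value are hypothetical.

```python
import random

def validate():
    # Placeholder: in a real run this would return the average loss on the validation set.
    return random.random()

best_val_loss = float("inf")
patience = 5                       # epochs without improvement to tolerate (illustrative)
epochs_without_improvement = 0

for epoch in range(100):
    # ... one epoch of mini-batch training would go here ...
    val_loss = validate()
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                  # stop early: validation loss is no longer improving
```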
Task 4: L2 regularization
- Prevents overfitting on the training data by penalizing large weights, which reduces the effective model complexity.
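A sketch of two equivalent ways to add an L2 penalty in PyTorch: the optimizer's `weight_decay` parameter, or an explicit penalty term added to the loss. The regularization strength `lam` is an illustrative value.

```python
import torch
from torch import nn

model = nn.Linear(784, 10)

# Option 1: built-in weight decay (L2 penalty) in the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

# Option 2: add the penalty explicitly: loss = data_loss + lam * sum of squared weights.
lam = 1e-4                                           # regularization strength (illustrative)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
data_loss = nn.CrossEntropyLoss()(model(x), y)
l2_penalty = sum((p ** 2).sum() for p in model.parameters())
loss = data_loss + lam * l2_penalty
```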