The Iris Task

Key Notes

The problem is almost linearly separable
- We can design an (almost) error-free linear classifier

Minimize the mean square error (MSE) with gradient descent until convergence

How it is written in the book:

$g = [W \; w_{0}] [x^{T} \; 1]^{T}$

Or, in the more familiar form:

$z = W x + b$

And applying the bias trick (with $b = w_{0}$):

$z = [W \; b] [x^{T} \; 1]^{T}$
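As a quick sanity check, the bias trick can be verified numerically. This is only a sketch: the sizes (3 classes, 4 features) and the random values are assumptions chosen for illustration, and the names `W_aug` and `x_aug` are not from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 3 classes, 4 features.
C, D = 3, 4
W = rng.normal(size=(C, D))   # weight matrix
b = rng.normal(size=C)        # bias vector (w_0 in the book's notation)
x = rng.normal(size=D)        # one feature vector

# Familiar form: z = W x + b
z_plain = W @ x + b

# Bias trick: append b as an extra column of W and 1 as an extra entry of x.
W_aug = np.hstack([W, b[:, None]])   # shape (C, D + 1)
x_aug = np.append(x, 1.0)            # shape (D + 1,)
z_trick = W_aug @ x_aug

# Both forms should give the same z up to floating point rounding.
same = np.allclose(z_plain, z_trick)
```

The benefit is that the bias is learned as part of one matrix, so only one gradient has to be derived.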

We write

$MSE = \frac{1}{2} \sum_{k = 1}^{N} (g_{k} - t_{k})^{T} (g_{k} - t_{k})$

Where $t_{k}$ is the ground truth (label).

We also need to squash the outputs into the range $0$ to $1$, so we apply the sigmoid activation function, $g_{k} = \mathrm{sigmoid}(z_{k}) = \frac{1}{1 + e^{-z_{k}}}$ (applied elementwise).
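The activation and the loss can be sketched in a few lines of NumPy. The function names `sigmoid` and `mse`, the row-per-sample layout, and the example values are assumptions made for illustration, not something taken from the book.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def mse(G, T):
    """Half the summed squared error over all samples.

    G and T have shape (N, C): one row per sample, one column per class.
    """
    diff = G - T
    return 0.5 * np.sum(diff * diff)

# Hypothetical example: two samples, three classes, one-hot targets.
Z = np.array([[2.0, -1.0, 0.0],
              [0.0, 0.0, 3.0]])
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
G = sigmoid(Z)      # every entry now lies strictly between 0 and 1
loss = mse(G, T)
```

Summing `diff * diff` over all entries is the same as summing $(g_{k} - t_{k})^{T} (g_{k} - t_{k})$ over the samples.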

No closed-form solution for the minimum of the MSE exists. We therefore use a gradient-based technique called gradient descent.

From page 79 we know that

$\nabla_{W} MSE = \sum_{k = 1}^{N} \nabla_{g_{k}} MSE \; \nabla_{z_{k}} g_{k} \; \nabla_{W} z_{k}$

Where

$\nabla_{g_{k}} MSE = g_{k} - t_{k}$

$\nabla_{z_{k}} g_{k} = g_{k} \circ (1 - g_{k})$

$\nabla_{W} z_{k} = x_{k}^{T}$

Note that $\circ$ means elementwise multiplication.
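One way to gain confidence in the chain rule factors is to compare the resulting gradient for a single sample against central finite differences. The sketch below assumes the augmented (bias trick) form; the helper names `sample_loss` and `sample_grad`, the sizes, and the random values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_loss(W_aug, x_aug, t):
    """Half squared error for one sample: 0.5 * (g - t)^T (g - t)."""
    g = sigmoid(W_aug @ x_aug)
    return 0.5 * np.dot(g - t, g - t)

def sample_grad(W_aug, x_aug, t):
    """Chain rule result: [(g - t) o g o (1 - g)] x^T, an outer product."""
    g = sigmoid(W_aug @ x_aug)
    delta = (g - t) * g * (1.0 - g)   # elementwise products
    return np.outer(delta, x_aug)     # same shape as W_aug

rng = np.random.default_rng(1)
W_aug = rng.normal(size=(3, 5))              # assumed: 3 classes, 4 features + bias
x_aug = np.append(rng.normal(size=4), 1.0)   # feature vector with trailing 1
t = np.array([0.0, 1.0, 0.0])                # one-hot target

analytic = sample_grad(W_aug, x_aug, t)

# Central finite differences, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(W_aug)
for i in range(W_aug.shape[0]):
    for j in range(W_aug.shape[1]):
        step = np.zeros_like(W_aug)
        step[i, j] = eps
        numeric[i, j] = (sample_loss(W_aug + step, x_aug, t)
                         - sample_loss(W_aug - step, x_aug, t)) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
```

If the derivation is right, `max_abs_diff` should be tiny compared with the gradient entries themselves.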

Combining these results with Gradient Descent, we get

$W(m) = W(m - 1) - \alpha \nabla_{W} MSE$

Where

$\nabla_{W} MSE = \sum_{k = 1}^{N} [(g_{k} - t_{k}) \circ g_{k} \circ (1 - g_{k})] x_{k}^{T}$
- $t_{k}$ is the ground truth
- $z_{k} = [W \; w_{0}] [x_{k}^{T} \; 1]^{T}$
- $g_{k} = \mathrm{sigmoid}(z_{k})$
- $m$ is the iteration number
- $\alpha$ is the step factor
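Putting the pieces together, the update rule can be sketched as a small batch gradient descent loop. This is a minimal sketch under stated assumptions, not the book's implementation: it uses three synthetic Gaussian clusters as a stand-in for the Iris data, zero initialization, a fixed number of iterations instead of a convergence test, and arbitrarily chosen values for `alpha` and `iterations`. The names `train` and `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, T, alpha=0.01, iterations=1000):
    """Batch gradient descent on the MSE for a linear classifier.

    X: (N, D) features, T: (N, C) one-hot targets.
    Returns the augmented weight matrix [W w_0] of shape (C, D + 1).
    """
    N, D = X.shape
    C = T.shape[1]
    X_aug = np.hstack([X, np.ones((N, 1))])   # bias trick: append a 1 to every x_k
    W = np.zeros((C, D + 1))                  # assumed initialization
    for m in range(iterations):
        G = sigmoid(X_aug @ W.T)              # (N, C), row k is g_k
        delta = (G - T) * G * (1.0 - G)       # (g_k - t_k) o g_k o (1 - g_k)
        grad = delta.T @ X_aug                # sum over k of delta_k x_k^T
        W = W - alpha * grad                  # W(m) = W(m - 1) - alpha * grad
    return W

def predict(W, X):
    """Pick the class with the largest output g for each row of X."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(sigmoid(X_aug @ W.T), axis=1)

# Synthetic stand-in for Iris: three well separated clusters in 2-D, 50 samples each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [-4.0, -3.0], [4.0, -3.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + 0.5 * rng.normal(size=(150, 2))
T = np.eye(3)[labels]                         # one-hot targets t_k

W = train(X, T)
accuracy = np.mean(predict(W, X) == labels)
```

The matrix product `delta.T @ X_aug` computes the whole sum over $k$ at once, which avoids an explicit Python loop over the samples. On real data the step factor usually needs tuning: too large and the updates oscillate, too small and convergence is slow.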