We do not train at all, we simply group a bunch of train data, and when we see a new data point, we look at the “distance” between our train data and the new data point. However, this requires a ton of processing. Therefore, one can use a subset of the training data (even though this is not always representative).