
Ensemble

Different classifiers make up for one another's weaknesses.

The core idea is to combine a set of predictors instead of relying on a single one.

When ensembling, you want to bring together many dissimilar models: it doesn't help much to build a parliament of very like-minded people, since it would end up voting unanimously most of the time.

If you ensemble neural networks trained from different random initializations, you will get little performance increase, because model diversity is low:

  • the hypothesis space is the same for every member
  • if SGD is good enough, each run will reach the same global minimum.

Ensembling is most interesting when it combines different model architectures, which vary in their outputs and their inductive biases.

Bagging (Bootstrap aggregating)

  1. sample a subset $P$ of the dataset $S$ uniformly with replacement (bootstrapping)
  2. train a predictor on $P$
  3. go back to 1. until $N$ predictors have been trained
  4. merge the predictions (aggregating)

Bagging reduces the variance of the final prediction, thanks to the averaging over the individual predictors.

Individual predictors can be trained in parallel!
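
Below is a minimal sketch of this loop. It assumes scikit-learn is available and uses decision trees as the base predictor; the base learner and the function names `bagging_fit` / `bagging_predict` are illustrative choices, not prescribed by the method:

```python
# Minimal bagging sketch. Assumes scikit-learn and integer class labels;
# any base learner could replace the decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_predictors=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_predictors):
        # 1. bootstrap: sample n indices of S uniformly with replacement
        idx = rng.integers(0, n, size=n)
        # 2. train one predictor on the bootstrap subset P
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # 4. aggregate: majority vote over the individual predictions
    preds = np.stack([m.predict(X) for m in models])
    vote = lambda col: np.bincount(col).argmax()
    return np.apply_along_axis(vote, 0, preds)
```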

Boosting

Iterative process:

  1. assign equal weights to all samples in the dataset $S$ (each sample has a weight reflecting the importance given to that sample)
  2. train a predictor on $S$ and save it
  3. evaluate the predictor on $S$ and compute its weighted error rate $e$
  4. if a sample has been wrongly classified, increase the weight of that sample
  5. train the next predictor with the new weights
  6. repeat until $N$ predictors have been trained
  7. make a weighted average of the individual predictions; the weight of each predictor is derived from its error rate $e$ on the weighted training set (e.g. $\alpha = \frac{1}{2}\ln\frac{1-e}{e}$ in AdaBoost), as in the sketch below

Boosting aims at reducing the bias of a large number of "small" models that have low variance but high bias: so-called "weak" learners. Unlike bagging, boosting is sequential: each predictor depends on the errors of the previous ones.
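
Below is a minimal AdaBoost-style sketch of this procedure, assuming binary labels in $\{-1, +1\}$ and scikit-learn decision stumps as the weak models (the function names are illustrative):

```python
# Minimal AdaBoost-style sketch. Assumes binary labels y in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_predictors=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                 # 1. equal weights for all samples
    models, alphas = [], []
    for _ in range(n_predictors):           # 6. repeat N times
        # 2. train a weak predictor (a decision stump) on the weighted set
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # 3. weighted error rate e of this predictor
        e = w[pred != y].sum() / w.sum()
        alpha = 0.5 * np.log((1 - e) / (e + 1e-10))   # predictor weight
        # 4. increase the weights of wrongly classified samples
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # 7. weighted average of the individual predictions
    scores = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.sign(scores)
```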

TTA: Test Time Augmentation

For each individual model, compute the predictions across various augmented versions of the input images and average them.
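
A minimal sketch, assuming a model exposing a `predict` method that returns class probabilities and a list of augmentation functions (all names here are hypothetical placeholders):

```python
# Minimal TTA sketch; `model.predict` and the augmentation functions
# are hypothetical placeholders.
import numpy as np

def tta_predict(model, images, augmentations):
    # `augmentations`: list of functions mapping a batch of images to an
    # augmented batch, e.g. [identity, horizontal_flip, small_rotation]
    probs = [model.predict(aug(images)) for aug in augmentations]
    # average the predictions over all augmented versions of the input
    return np.mean(probs, axis=0)
```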
