* Use diversified classifiers so that each predictor's result is independent of the others.
Voting
* Combine a few algorithms and predict the class that gets the most votes.
* Law of large numbers
** As the number of predictors increases, the aggregated result gets closer and closer to the probability it is supposed to have.
* Soft Voting vs Hard Voting (a minimal sketch follows this list)
** Example: three classifiers predict the positive class with probabilities 0.3, 0.3, and 0.9.
** Hard Voting: No (two of the three classifiers vote No)
** Soft Voting: Yes (the average probability is 0.5)
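A minimal sketch of hard vs soft voting using scikit-learn's VotingClassifier; the moons dataset and the choice of base classifiers are illustrative assumptions, not from the notes above:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # probability=True so soft voting can average probabilities
    ],
    voting="soft",  # "hard" counts class votes; "soft" averages predicted probabilities
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```

Switching voting to "hard" makes the ensemble count class votes instead of averaging the predicted probabilities.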
Bagging and Pasting
* Instead of using different algorithms, use the same algorithm for every predictor,
* but train each one on a different random subset of the training set.
* Bagging samples the subsets with replacement, so the same instance can appear more than once.
* Pasting samples the subsets without replacement.
* Training can be parallelized across different CPU cores.
* It is popular because it scales well.
* It also supports sampling of features (see the sketch after this list).
** Sampling both features and instances is called Random Patches.
** Sampling features only is called Random Subspaces.
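A minimal bagging/pasting sketch with scikit-learn's BaggingClassifier; the dataset and hyperparameter values are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=200,
    max_samples=100,     # each tree sees 100 training instances
    bootstrap=True,      # True = bagging (with replacement); False = pasting
    max_features=1.0,    # lower this (and/or set bootstrap_features=True) for Random Subspaces / Random Patches
    n_jobs=-1,           # train the predictors in parallel on all CPU cores
    random_state=42,
)
bag_clf.fit(X_train, y_train)
print(bag_clf.score(X_test, y_test))
```

Setting bootstrap=False turns this into pasting; sampling features via max_features and bootstrap_features gives the Random Subspaces and Random Patches variants.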
Random Forest
* A bagging-style ensemble of decision trees.
* Introduces extra randomness by searching for the best feature among a random subset of features at each split.
* Extremely Randomized Trees (Extra-Trees) go further by also choosing a random threshold for each feature.
* Compare a normal Random Forest and Extra-Trees to see which one performs better (see the sketch after this list).
* Feature importance: a Random Forest also makes it easy to measure the relative importance of each feature.
** Measured by how much the feature reduces impurity on average.
** Each node is weighted by the number of training samples associated with it.
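A minimal sketch comparing a Random Forest with Extra-Trees and reading feature importances; the iris dataset and 500 trees are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
ext_clf = ExtraTreesClassifier(n_estimators=500, random_state=42)

# Compare the two with cross-validation to see which performs better on this data.
for name, clf in [("RandomForest", rnd_clf), ("ExtraTrees", ext_clf)]:
    print(name, cross_val_score(clf, iris.data, iris.target, cv=5).mean())

# Feature importance: how much each feature reduces impurity on average,
# weighted by the number of training samples reaching each node.
rnd_clf.fit(iris.data, iris.target)
for feature, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(feature, score)
```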
Boosting
* Combine several weak predictors into a stronger predictor by training them sequentially.
Adaptive Boosting
* Train a predictor -> compute its error rate -> increase the weights of the misclassified instances -> train the next predictor -> rinse and repeat until the requested number of predictors is reached or a perfect predictor is found.
* Predictions are made by a weighted vote, using the predictor weights computed from their error rates.
* Error rate -> predictor weight -> instance weights updated -> retrain (a minimal sketch follows this list).
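A minimal AdaBoost sketch; the depth-1 decision stumps, 200 estimators, and moons dataset are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=200,                     # stop after 200 predictors (or earlier if a perfect one is found)
    learning_rate=0.5,
    random_state=42,
)
ada_clf.fit(X_train, y_train)
print(ada_clf.score(X_test, y_test))
```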
Gradient Boosting
* Instead of reweighting instances, each new predictor is fit to the residual errors of the previous one.
* Combine it with early stopping to find the optimal number of trees.
* Instead of the full training set, each tree can be trained on a random subset; this is called Stochastic Gradient Boosting (see the sketch after this list).
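A minimal gradient boosting sketch with subsampling (Stochastic Gradient Boosting) and early stopping via staged_predict; the quadratic toy data and hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = rng.rand(200, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(200)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

gbrt = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=200,
    subsample=0.25,   # each tree trains on 25% of the instances -> Stochastic Gradient Boosting
    random_state=42,
)
gbrt.fit(X_train, y_train)

# Early stopping: find the number of trees that minimizes validation error, then retrain with it.
errors = [mean_squared_error(y_val, y_pred) for y_pred in gbrt.staged_predict(X_val)]
best_n = int(np.argmin(errors)) + 1
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=best_n, random_state=42)
gbrt_best.fit(X_train, y_train)
print(best_n, mean_squared_error(y_val, gbrt_best.predict(X_val)))
```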
Stacking
* Use another predictor (a blender, or meta-learner) to aggregate the predictions of all the base predictors.
* A common strategy is to use a hold-out set (blending):
** Split the training set into one subset for the base predictors and one hold-out subset.
** Get each base predictor's prediction on the hold-out instances.
** These predictions become the new input features for the hold-out set.
** Train the blender on this new training set (see the sketch after this list).
* The approach can be repeated with multiple splits to build several blending layers.
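A minimal blending sketch following the hold-out strategy above; the base models, the 50/50 split, and the logistic-regression blender are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=1000, noise=0.30, random_state=42)
# One subset trains the base predictors; the hold-out subset trains the blender.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=42)

base_predictors = [
    RandomForestClassifier(random_state=42),
    SVC(probability=True, random_state=42),
]
for clf in base_predictors:
    clf.fit(X_train, y_train)

# Base predictions on the hold-out set become the new input features.
X_blend = np.column_stack([clf.predict_proba(X_hold)[:, 1] for clf in base_predictors])
blender = LogisticRegression()
blender.fit(X_blend, y_hold)
```

scikit-learn also ships a ready-made StackingClassifier, which uses cross-validated predictions of the base predictors instead of a single hold-out split.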