* Use diversified classifiers so that each predictor's result is independent of the others.
Voting
* Combine a few algorithms and predict the class that gets the most votes.
* Law of large numbers
** As the number of predictors increases, the aggregated result gets closer and closer to the probability it is supposed to have.
* Soft Voting vs Hard Voting (a minimal sketch follows this list)
** Example: three classifiers predict the positive class with probabilities 0.3, 0.3, and 0.9.
** Hard Voting: No (two of the three classifiers vote No)
** Soft Voting: Yes (the average probability is 0.5)
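A minimal sketch of hard vs soft voting using scikit-learn's VotingClassifier; the moons dataset and the choice of base classifiers are illustrative assumptions, not from the notes above:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # probability=True so soft voting can average probabilities
    ],
    voting="soft",  # "hard" counts class votes; "soft" averages predicted probabilities
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```

Switching voting to "hard" makes the ensemble count class votes instead of averaging the predicted probabilities.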
Bagging and Pasting
* Instead of using different algorithms, use the same algorithm for every predictor,
* but train each one on a different random subset of the training set.
* Bagging samples the subsets with replacement, so the same instance can appear more than once.
* Pasting samples the subsets without replacement.
* Training can be parallelized across different CPU cores.
* It is popular because it scales well.
* It also supports sampling of features (see the sketch after this list).
** Sampling both features and instances is called Random Patches.
** Sampling features only is called Random Subspaces.
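A minimal bagging/pasting sketch with scikit-learn's BaggingClassifier; the dataset and hyperparameter values are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=200,
    max_samples=100,     # each tree sees 100 training instances
    bootstrap=True,      # True = bagging (with replacement); False = pasting
    max_features=1.0,    # lower this (and/or set bootstrap_features=True) for Random Subspaces / Random Patches
    n_jobs=-1,           # train the predictors in parallel on all CPU cores
    random_state=42,
)
bag_clf.fit(X_train, y_train)
print(bag_clf.score(X_test, y_test))
```

Setting bootstrap=False turns this into pasting; sampling features via max_features and bootstrap_features gives the Random Subspaces and Random Patches variants.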
Random Forest
* A bagging-style ensemble of decision trees.
* Introduces extra randomness by searching for the best feature among a random subset of features at each split.
* Extremely Randomized Trees (Extra-Trees) go further by also choosing a random threshold for each feature.
* Compare a normal Random Forest and Extra-Trees to see which one performs better (see the sketch after this list).
* Feature importance: a Random Forest also makes it easy to measure the relative importance of each feature.
** Measured by how much the feature reduces impurity on average.
** Each node is weighted by the number of training samples associated with it.
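A minimal sketch comparing a Random Forest with Extra-Trees and reading feature importances; the iris dataset and 500 trees are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
ext_clf = ExtraTreesClassifier(n_estimators=500, random_state=42)

# Compare the two with cross-validation to see which performs better on this data.
for name, clf in [("RandomForest", rnd_clf), ("ExtraTrees", ext_clf)]:
    print(name, cross_val_score(clf, iris.data, iris.target, cv=5).mean())

# Feature importance: how much each feature reduces impurity on average,
# weighted by the number of training samples reaching each node.
rnd_clf.fit(iris.data, iris.target)
for feature, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(feature, score)
```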
Boosting
* Combine several weak predictors into a stronger predictor by training them sequentially.
Adaptive Boosting
* Train a predictor -> compute its error rate -> increase the weights of the misclassified instances -> train the next predictor -> rinse and repeat until the requested number of predictors is reached or a perfect predictor is found.
* Predictions are made by a weighted vote, using the predictor weights computed from their error rates.
* Error rate -> predictor weight -> instance weights updated -> retrain (a minimal sketch follows this list).
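A minimal AdaBoost sketch; the depth-1 decision stumps, 200 estimators, and moons dataset are illustrative assumptions:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=200,                     # stop after 200 predictors (or earlier if a perfect one is found)
    learning_rate=0.5,
    random_state=42,
)
ada_clf.fit(X_train, y_train)
print(ada_clf.score(X_test, y_test))
```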
Gradient Boosting
* Instead of reweighting instances, each new predictor is fit to the residual errors of the previous one.
* Combine it with early stopping to find the optimal number of trees.
* Instead of the full training set, each tree can be trained on a random subset; this is called Stochastic Gradient Boosting (see the sketch after this list).
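A minimal gradient boosting sketch with subsampling (Stochastic Gradient Boosting) and early stopping via staged_predict; the quadratic toy data and hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = rng.rand(200, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(200)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

gbrt = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=200,
    subsample=0.25,   # each tree trains on 25% of the instances -> Stochastic Gradient Boosting
    random_state=42,
)
gbrt.fit(X_train, y_train)

# Early stopping: find the number of trees that minimizes validation error, then retrain with it.
errors = [mean_squared_error(y_val, y_pred) for y_pred in gbrt.staged_predict(X_val)]
best_n = int(np.argmin(errors)) + 1
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=best_n, random_state=42)
gbrt_best.fit(X_train, y_train)
print(best_n, mean_squared_error(y_val, gbrt_best.predict(X_val)))
```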
Stacking
* Use another predictor (a blender, or meta-learner) to aggregate the predictions of all the base predictors.
* A common strategy is to use a hold-out set (blending):
** Split the training set into one subset for the base predictors and one hold-out subset.
** Get each base predictor's prediction on the hold-out instances.
** These predictions become the new input features for the hold-out set.
** Train the blender on this new training set (see the sketch after this list).
* The approach can be repeated with multiple splits to build several blending layers.
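A minimal blending sketch following the hold-out strategy above; the base models, the 50/50 split, and the logistic-regression blender are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=1000, noise=0.30, random_state=42)
# One subset trains the base predictors; the hold-out subset trains the blender.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=42)

base_predictors = [
    RandomForestClassifier(random_state=42),
    SVC(probability=True, random_state=42),
]
for clf in base_predictors:
    clf.fit(X_train, y_train)

# Base predictions on the hold-out set become the new input features.
X_blend = np.column_stack([clf.predict_proba(X_hold)[:, 1] for clf in base_predictors])
blender = LogisticRegression()
blender.fit(X_blend, y_hold)
```

scikit-learn also ships a ready-made StackingClassifier, which uses cross-validated predictions of the base predictors instead of a single hold-out split.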