* A pure node, one whose samples all belong to the same class, has zero impurity.
* The Classification and Regression Tree (CART) algorithm is used to grow the tree. At each node it picks the split that minimizes a cost function.
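* Classifier cost function for a candidate split:
** J = (m_left/m) * G_left + (m_right/m) * G_right
** G_left and G_right are the impurities of the left and right subsets; m_left and m_right are their sample counts, and m = m_left + m_right.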
* Entropy is sometimes used as the impurity measure instead of Gini.
** Gini tends to isolate the most frequent class in its own branch.
** Entropy tends to produce slightly more balanced trees.
** Gini is faster to compute because it needs no logarithm.
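* Regression cost function:
** J = (m_left/m) * MSE_left + (m_right/m) * MSE_right
** MSE_left and MSE_right are the mean squared errors of each subset around its own mean target value.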
* A decision tree is a nonparametric model: the number of parameters is not fixed before training, so the model structure is free to stick closely to the data.
* A parametric model has a predetermined number of parameters.
* Nonparametric models tend to overfit the data, while parametric models tend to underfit.
* Decision trees also favor orthogonal decision boundaries, since every split is perpendicular to a feature axis. This is a problem for data whose natural boundary is not axis-aligned, for example after the dataset has been rotated.
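
The two cost functions in the notes above can be sketched in a few lines of plain Python. This is only an illustration: the function names (`gini`, `entropy`, `mse`, `split_cost`) are hypothetical, and the Gini and entropy formulas are the standard textbook definitions, which the notes assume but do not spell out.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). Assumed standard definition."""
    m = len(labels)
    if m == 0:
        return 0.0
    return 1.0 - sum((count / m) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy impurity: -sum(p_k * log2(p_k)). Assumed standard definition."""
    m = len(labels)
    if m == 0:
        return 0.0
    return -sum((count / m) * math.log2(count / m)
                for count in Counter(labels).values())


def mse(values):
    """Mean squared error of the targets around their own mean."""
    m = len(values)
    if m == 0:
        return 0.0
    mean = sum(values) / m
    return sum((v - mean) ** 2 for v in values) / m


def split_cost(left, right, impurity):
    """Weighted cost: (m_left/m) * I_left + (m_right/m) * I_right."""
    m = len(left) + len(right)
    return len(left) / m * impurity(left) + len(right) / m * impurity(right)


# Classification: a split that separates the classes perfectly costs nothing.
perfect = split_cost(["a", "a"], ["b", "b"], gini)
# Same split scored with entropy instead of Gini.
perfect_entropy = split_cost(["a", "a"], ["b", "b"], entropy)
# Regression: same weighting, with MSE in place of impurity.
regression = split_cost([1.0, 3.0], [10.0], mse)
```

A tree-growing loop would evaluate `split_cost` for every candidate feature and threshold and keep the cheapest one; swapping `gini` for `entropy` changes only the impurity measure, not the weighting.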