Machine learning study notes covering the math behind the mainstream tree-based machine learning models: basic decision tree models (ID3, C4.5, CART), boosted models (GBM, AdaBoost, XGBoost, LightGBM), and bagging models (Bagging Tree, Random Forest, ExtraTrees).
Extremely Randomized Trees (ExtraTrees)
One Sentence Summary:
Train multiple strong base learners in parallel, each on the whole dataset, and aggregate their results as the final prediction; at each split inside each base learner, only a random subset of features may be used, and the candidate split points are drawn randomly.
a. Difference between Random Forest and Extremely Randomized Trees
The main difference between Random Forest and Extremely Randomized Trees is that Extremely Randomized Trees uses random search to choose the split point for each candidate feature at every node of a base learner, whereas Random Forest uses the exact greedy method.

| Aspects | Bagging Tree | Random Forest | ExtraTrees |
| --- | --- | --- | --- |
| Subset of samples | Yes | Yes | No (but can) |
| Subset of features | No | Yes | Yes |
| Exact greedy method | Yes | Yes | No (random search instead) |
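To see this difference in practice, here is a minimal sketch using scikit-learn (linked in the references below); the class names and parameters follow the scikit-learn API, and the toy dataset is only for illustration. Note that `ExtraTreesClassifier` keeps `bootstrap=False` by default, matching the "No (but can)" entry for subsets of samples.

```python
# Minimal sketch: Random Forest vs. ExtraTrees on a toy dataset (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Random Forest: bootstrap samples by default, exact greedy search over a
# random subset of features at each split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

# ExtraTrees: no bootstrap by default (each tree sees the whole dataset),
# random cut-points drawn for each candidate feature instead of greedy search.
et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt",
                          bootstrap=False, random_state=0)

print("RF accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("ET accuracy:", cross_val_score(et, X, y, cv=5).mean())
```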
b. The Extremely Randomized Trees Classification Algorithm
Suppose we also introduce subsets of samples.

Model Input:
- Dataset with M features: $D = \{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)\}$, where $x_i \in \mathbb{R}^M$ and $y_i \in \{1, 2, ..., K\}$
- Base learner: a decision tree $g_t(x)$
- Number of base learners: T
- Number of samples in each data subset: n
- Number of features allowed in each split inside a base learner: m
Model Output: Final classifier: G(x)
Steps:
- For t = 1, 2, 3, ..., T:
  - Select a random subset $D_t$ from the dataset $D$ with replacement; each random subset $D_t$ contains n samples.
  - Based on the random subset $D_t$, train a base learner $g_t(x)$. Specifically, at each split of the base learner, select m attributes $\{a_1, a_2, ..., a_m\}$. Then for each attribute $a_j$, draw a random cut-point $c_j$ uniformly in $[a_j^{min}, a_j^{max}]$, where $a_j^{min}$ and $a_j^{max}$ are the minimal and maximal values of attribute $a_j$ in $D_t$. This yields m candidate splits $\{(a_1, c_1), (a_2, c_2), ..., (a_m, c_m)\}$, one per attribute. Among these m candidates, select the split that reduces the Gini impurity or entropy the most at its cut-point (a from-scratch sketch of this step follows the algorithm below).
- Output the final model $G(x) = \arg\max_{k} \sum_{t=1}^{T} I(g_t(x) = k)$ using majority vote.
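The random-split step above can be made concrete with a short sketch. The following is a minimal NumPy illustration (not the reference implementation; the helper names `gini` and `pick_extra_trees_split` are made up for this example) of drawing one uniform cut-point per candidate attribute and keeping the split with the largest Gini reduction.

```python
# Minimal sketch of ExtraTrees split selection at a single node.
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def pick_extra_trees_split(X, y, m, rng):
    """Among m randomly chosen attributes, each with a uniformly drawn
    cut-point, return the (feature index, cut-point) with the largest
    Gini reduction."""
    n_samples, n_features = X.shape
    parent = gini(y)
    best = (None, None, -np.inf)
    for j in rng.choice(n_features, size=m, replace=False):
        lo, hi = X[:, j].min(), X[:, j].max()
        if lo == hi:                    # attribute is constant in this node
            continue
        c = rng.uniform(lo, hi)         # random cut-point in [min, max]
        left, right = y[X[:, j] <= c], y[X[:, j] > c]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n_samples
        gain = parent - child
        if gain > best[2]:
            best = (j, c, gain)
    return best[:2]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 2] > 0.3).astype(int)
print(pick_extra_trees_split(X, y, m=3, rng=rng))
```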
c. The Extremely Randomized Trees Regression Algorithm
Suppose we also introduce subsets of samples.

Model Input:
- Dataset with M features: $D = \{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)\}$, where $x_i \in \mathbb{R}^M$ and $y_i \in \mathbb{R}$
- Base learner: a regression tree $g_t(x)$
- Number of base learners: T
- Number of samples in each data subset: n
- Number of features allowed in each split inside a base learner: m
Model Output: Final regressor: G(x)
Steps:
- For t = 1, 2, 3, ..., T:
  - Select a random subset $D_t$ from the dataset $D$ with replacement; each random subset $D_t$ contains n samples.
  - Based on the random subset $D_t$, train a base learner $g_t(x)$. Specifically, at each split of the base learner, select m attributes $\{a_1, a_2, ..., a_m\}$. Then for each attribute $a_j$, draw a random cut-point $c_j$ uniformly in $[a_j^{min}, a_j^{max}]$, where $a_j^{min}$ and $a_j^{max}$ are the minimal and maximal values of attribute $a_j$ in $D_t$. This yields m candidate splits $\{(a_1, c_1), (a_2, c_2), ..., (a_m, c_m)\}$, one per attribute. Among these m candidates, select the split that reduces the MSE the most at its cut-point.
- Output the final model $G(x) = \frac{1}{T}\sum_{t=1}^{T} g_t(x)$ by taking the average prediction of the base learners.
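For the regression variant, scikit-learn's `ExtraTreesRegressor` (see the references) covers both the optional sample subsets and the averaging step. The sketch below assumes a toy regression dataset and checks that the ensemble prediction is exactly the average of the individual tree predictions.

```python
# Minimal sketch of the ExtraTrees regression algorithm with sample subsets.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

reg = ExtraTreesRegressor(
    n_estimators=100,        # T base learners
    max_features="sqrt",     # m attributes considered at each split
    bootstrap=True,          # draw a random subset of samples per tree
    max_samples=0.8,         # fraction of samples in each subset (n / N)
    random_state=0,
).fit(X, y)

# G(x) is the average of the individual tree predictions.
avg_by_hand = np.mean([tree.predict(X[:5]) for tree in reg.estimators_], axis=0)
print(reg.predict(X[:5]))
print(avg_by_hand)   # matches the ensemble prediction
```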
Reference
- Geurts, Pierre, Damien Ernst, and Louis Wehenkel. "Extremely randomized trees." Machine learning 63.1 (2006): 3-42.
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
- https://scikit-learn.org/stable/modules/ensemble.html