Machine learning study notes covering the math behind the mainstream tree-based machine learning models: basic decision tree models (ID3, C4.5, CART), boosted models (GBM, AdaBoost, XGBoost, LightGBM), and bagging models (Bagging Tree, Random Forest, ExtraTrees).
Extremely Randomized Trees (ExtraTrees)
One Sentence Summary:
Train multiple strong base learners in parallel, each on the whole dataset, and aggregate their results as the final prediction; at each split inside each base learner, only a random subset of features may be used, and the candidate split points are drawn randomly.
a. Difference between Random Forest and Extremely Randomized Trees
The main difference between Random Forest and Extremely Randomized Trees is that Extremely Randomized Trees uses random search to choose the split point for each candidate feature at every node of a base learner, whereas Random Forest uses the exact greedy method.

| Aspects | Bagging Tree | Random Forest | ExtraTrees |
| --- | --- | --- | --- |
| Subset of samples | Yes | Yes | No (but can) |
| Subset of features | No | Yes | Yes |
| Exact greedy method | Yes | Yes | No (random search instead) |
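To see this difference in practice, here is a minimal sketch using scikit-learn (linked in the references below); the class names and parameters follow the scikit-learn API, and the toy dataset is only for illustration. Note that `ExtraTreesClassifier` keeps `bootstrap=False` by default, matching the "No (but can)" entry for subsets of samples.

```python
# Minimal sketch: Random Forest vs. ExtraTrees on a toy dataset (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Random Forest: bootstrap samples by default, exact greedy search over a
# random subset of features at each split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

# ExtraTrees: no bootstrap by default (each tree sees the whole dataset),
# random cut-points drawn for each candidate feature instead of greedy search.
et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt",
                          bootstrap=False, random_state=0)

print("RF accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("ET accuracy:", cross_val_score(et, X, y, cv=5).mean())
```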
b. The Extremely Randomized Trees Classification Algorithm
Suppose we also introduce subsets of samples.

Model Input:
- Dataset with M features: $D = \{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)\}$, where $x_i \in \mathbb{R}^M$ and $y_i \in \{1, 2, ..., K\}$
- Base learner: a decision tree $g_t(x)$
- Number of base learners: T
- Number of samples in each data subset: n
- Number of features allowed in each split inside a base learner: m
Model Output: Final classifier: G(x)
Steps:
- For t = 1, 2, 3, ..., T:
  - Select a random subset $D_t$ from the dataset $D$ with replacement; each random subset $D_t$ contains n samples.
  - Based on the random subset $D_t$, train a base learner $g_t(x)$. Specifically, at each split of the base learner, select m attributes $\{a_1, a_2, ..., a_m\}$. Then for each attribute $a_j$, draw a random cut-point $c_j$ uniformly in $[a_j^{min}, a_j^{max}]$, where $a_j^{min}$ and $a_j^{max}$ are the minimal and maximal values of attribute $a_j$ in $D_t$. This yields m candidate splits $\{(a_1, c_1), (a_2, c_2), ..., (a_m, c_m)\}$, one per attribute. Among these m candidates, select the split that reduces the Gini impurity or entropy the most at its cut-point (a from-scratch sketch of this step follows the algorithm below).
- Output the final model $G(x) = \arg\max_{k} \sum_{t=1}^{T} I(g_t(x) = k)$ using majority vote.
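The random-split step above can be made concrete with a short sketch. The following is a minimal NumPy illustration (not the reference implementation; the helper names `gini` and `pick_extra_trees_split` are made up for this example) of drawing one uniform cut-point per candidate attribute and keeping the split with the largest Gini reduction.

```python
# Minimal sketch of ExtraTrees split selection at a single node.
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def pick_extra_trees_split(X, y, m, rng):
    """Among m randomly chosen attributes, each with a uniformly drawn
    cut-point, return the (feature index, cut-point) with the largest
    Gini reduction."""
    n_samples, n_features = X.shape
    parent = gini(y)
    best = (None, None, -np.inf)
    for j in rng.choice(n_features, size=m, replace=False):
        lo, hi = X[:, j].min(), X[:, j].max()
        if lo == hi:                    # attribute is constant in this node
            continue
        c = rng.uniform(lo, hi)         # random cut-point in [min, max]
        left, right = y[X[:, j] <= c], y[X[:, j] > c]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n_samples
        gain = parent - child
        if gain > best[2]:
            best = (j, c, gain)
    return best[:2]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 2] > 0.3).astype(int)
print(pick_extra_trees_split(X, y, m=3, rng=rng))
```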
c. The Extremely Randomized Trees Regression Algorithm
Suppose we also introduce subsets of samples.

Model Input:
- Dataset with M features: $D = \{(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)\}$, where $x_i \in \mathbb{R}^M$ and $y_i \in \mathbb{R}$
- Base learner: a regression tree $g_t(x)$
- Number of base learners: T
- Number of samples in each data subset: n
- Number of features allowed in each split inside a base learner: m
Model Output: Final regressor: G(x)
Steps:
- For t = 1, 2, 3, ..., T:
  - Select a random subset $D_t$ from the dataset $D$ with replacement; each random subset $D_t$ contains n samples.
  - Based on the random subset $D_t$, train a base learner $g_t(x)$. Specifically, at each split of the base learner, select m attributes $\{a_1, a_2, ..., a_m\}$. Then for each attribute $a_j$, draw a random cut-point $c_j$ uniformly in $[a_j^{min}, a_j^{max}]$, where $a_j^{min}$ and $a_j^{max}$ are the minimal and maximal values of attribute $a_j$ in $D_t$. This yields m candidate splits $\{(a_1, c_1), (a_2, c_2), ..., (a_m, c_m)\}$, one per attribute. Among these m candidates, select the split that reduces the MSE the most at its cut-point.
- Output the final model $G(x) = \frac{1}{T}\sum_{t=1}^{T} g_t(x)$ by taking the average prediction of the base learners.
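For the regression variant, scikit-learn's `ExtraTreesRegressor` (see the references) covers both the optional sample subsets and the averaging step. The sketch below assumes a toy regression dataset and checks that the ensemble prediction is exactly the average of the individual tree predictions.

```python
# Minimal sketch of the ExtraTrees regression algorithm with sample subsets.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

reg = ExtraTreesRegressor(
    n_estimators=100,        # T base learners
    max_features="sqrt",     # m attributes considered at each split
    bootstrap=True,          # draw a random subset of samples per tree
    max_samples=0.8,         # fraction of samples in each subset (n / N)
    random_state=0,
).fit(X, y)

# G(x) is the average of the individual tree predictions.
avg_by_hand = np.mean([tree.predict(X[:5]) for tree in reg.estimators_], axis=0)
print(reg.predict(X[:5]))
print(avg_by_hand)   # matches the ensemble prediction
```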
Reference
- Geurts, Pierre, Damien Ernst, and Louis Wehenkel. "Extremely randomized trees." Machine learning 63.1 (2006): 3-42.
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
- https://scikit-learn.org/stable/modules/ensemble.html