Tree-Math

Machine learning study notes covering the math behind the mainstream tree-based machine learning models: basic decision tree models (ID3, C4.5, CART), boosted models (GBM, AdaBoost, XGBoost, LightGBM), and bagging models (Bagging Tree, Random Forest, ExtraTrees).

Bagging Tree Models

Bagging Tree

One Sentence Summary:
Train multiple strong base learners in parallel on different bootstrap subsets of the dataset, then take the average (regression) or the majority vote (classification) as the final prediction.

  • a. Difference between Bagging and Boosting
    The logic behind boosting is to add weak base learners one after another, so that each new learner corrects the mistakes of the previous ones and the ensemble gradually becomes a strong learner. The core idea behind bagging, in contrast, is to train multiple strong base learners in parallel on different bootstrap subsets and aggregate them to reduce overfitting. A minimal sketch of the parallel training is given right after this comparison.

    | Aspects | Boosting | Bagging |
    |---|---|---|
    | Ensemble category | Sequential ensembling: weak base learners are generated one after another | Parallel ensembling: strong base learners are generated in parallel |
    | Overall target | Reduce bias | Reduce variance |
    | Target of individual base learners | Correct the errors made by the previous weak learners | Reduce the overfitting of each strong base learner |
    | Parallel computing | Only within a single tree (e.g. XGBoost) | Both within a single tree and across different trees |
  • b. The Bagging Tree Classification Algorithm
    Model Input:

    • Dataset: $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$
    • Base learner: $G_t(x)$ (e.g. a decision tree)
    • Number of base learners: $T$
    • Number of samples in each data subset: $n$

    Model Output: Final classifier: $G(x)$

    Steps (a minimal code sketch follows the steps):

    • For $t = 1, 2, 3, \dots, T$:
      • Draw a random subset $D_t$ from the dataset $D$ by sampling with replacement; each subset $D_t$ contains $n$ samples.
      • Train a base learner $G_t(x)$ on $D_t$.
    • Output the final model by majority vote: $G(x) = \arg\max_{y} \sum_{t=1}^{T} I\left(G_t(x) = y\right)$.
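A minimal from-scratch sketch of these steps, assuming decision trees as the strong base learners and a hypothetical toy dataset; the names `learners` and `predict_majority` and all parameter values are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

T = 25          # number of base learners
n = len(X)      # size of each bootstrap subset (commonly equal to |D|)
rng = np.random.default_rng(0)

learners = []
for t in range(T):
    # Draw a bootstrap subset D_t from D by sampling indices with replacement.
    idx = rng.integers(0, len(X), size=n)
    # Train a strong (fully grown) base learner G_t on D_t.
    learners.append(DecisionTreeClassifier(random_state=t).fit(X[idx], y[idx]))

def predict_majority(X_new):
    # Collect every base learner's prediction, then take the per-sample majority vote.
    votes = np.stack([g.predict(X_new) for g in learners]).astype(int)  # shape (T, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), axis=0, arr=votes)

print(predict_majority(X[:5]), y[:5])
```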
  • c. The Bagging Tree Regression Algorithm
    Model Input:

    • Dataset: $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$
    • Base learner: $G_t(x)$ (e.g. a regression tree)
    • Number of base learners: $T$
    • Number of samples in each data subset: $n$

    Model Output: Final regressor: $G(x)$

    Steps (a minimal code sketch follows the steps):

    • For $t = 1, 2, 3, \dots, T$:
      • Draw a random subset $D_t$ from the dataset $D$ by sampling with replacement; each subset $D_t$ contains $n$ samples.
      • Train a base learner $G_t(x)$ on $D_t$.
    • Output the final model by averaging the base learners' predictions: $G(x) = \frac{1}{T} \sum_{t=1}^{T} G_t(x)$.
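The regression variant differs from the classification sketch above only in the aggregation step. A minimal sketch, again with a hypothetical toy dataset and illustrative parameter values:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Toy data purely for illustration.
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

T, n = 25, len(X)                  # number of base learners, bootstrap subset size
rng = np.random.default_rng(0)

learners = []
for t in range(T):
    idx = rng.integers(0, len(X), size=n)   # bootstrap subset D_t (with replacement)
    learners.append(DecisionTreeRegressor(random_state=t).fit(X[idx], y[idx]))

def predict_average(X_new):
    # G(x) = (1/T) * sum_t G_t(x): average the T base learners' predictions.
    return np.mean([g.predict(X_new) for g in learners], axis=0)

print(predict_average(X[:3]))
```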

Reference

  1. Breiman, Leo. "Bagging predictors." Machine learning 24.2 (1996): 123-140.
  2. Zhou, Zhihua. Machine Learning. Tsinghua University Press, 2018. (In Chinese)
  3. https://towardsdatascience.com/decision-tree-ensembles-bagging-and-boosting-266a8ba60fd9
  4. https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/
  5. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html