The StackingEnsemble class builds multi-layer stacking and blending models for ensemble learning and is particularly suited to regression tasks. It supports two ensemble strategies: stacking (with K-fold out-of-fold predictions) and blending (with a hold-out validation set). It also performs extensive input validation and error handling to guide the user when inputs are incorrect or when fitting or predicting fails.
- layers : list of lists
  - A list of lists, where each inner list contains the models (i.e., estimators) for a particular layer in the ensemble.
  - Each model should be a scikit-learn compatible model, meaning it must implement the fit() and predict() methods.
  - Example: [[Model1, Model2], [Model3, Model4]] would define two layers, with two models in each layer.
  - Note: The order of layers matters. Models in later layers use predictions from models in earlier layers as input features.
- meta_model : estimator
  - A single scikit-learn compatible model that combines the predictions from the final layer into a final prediction.
  - This model typically performs regression or classification on the predictions from the previous layer's models (depending on the task).
  - Example: A LinearRegression() or RandomForestRegressor() might serve as a good meta-model for regression tasks.
- n_folds : int, default=5
  - Specifies the number of folds for K-fold cross-validation, which is used to generate out-of-fold predictions during stacking.
  - The default is 5, but users can choose any integer greater than or equal to 2.
  - Note: Only used when blending=False.
- blending : bool, default=False
  - If True, the model uses a hold-out validation set for blending instead of K-fold cross-validation.
  - In blending mode, a portion of the training data (specified by blend_size) is reserved as a hold-out set. The base models are trained on the remaining data, and their predictions on the hold-out set are used to train the meta-model.
  - Default: False (indicating stacking mode).
- blend_size : float, default=0.2
  - Specifies the proportion of the training data to hold out for blending (i.e., used as a validation set in blending mode).
  - The value must be between 0 and 1, where a value of 0.2 means 20% of the data is used as the hold-out set.
  - Note: Only used when blending=True.
- random_state : int, default=None
  - A seed value for controlling the randomness in splitting the dataset (for cross-validation in stacking or the train/hold-out split in blending).
  - Default: None (which means the random state is not fixed).
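The difference between the two modes can be illustrated for a single base model. The sketch below is not the class's actual implementation. It assumes that stacking builds its meta-features with scikit-learn's KFold and that blending uses train_test_split. The helper names oof_predictions and holdout_predictions are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split


def oof_predictions(model, X, y, n_folds=5, random_state=None):
    """Stacking: every sample is predicted by a model that never saw it."""
    oof = np.zeros(len(y))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=random_state)
    for train_idx, val_idx in kf.split(X):
        fold_model = clone(model).fit(X[train_idx], y[train_idx])
        oof[val_idx] = fold_model.predict(X[val_idx])
    return oof


def holdout_predictions(model, X, y, blend_size=0.2, random_state=None):
    """Blending: fit on the training part, predict only the hold-out part."""
    X_tr, X_hold, y_tr, y_hold = train_test_split(
        X, y, test_size=blend_size, random_state=random_state
    )
    fitted = clone(model).fit(X_tr, y_tr)
    return fitted.predict(X_hold), y_hold


# Synthetic data, for illustration only
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)
oof = oof_predictions(LinearRegression(), X, y, n_folds=5, random_state=0)
hold_pred, y_hold = holdout_predictions(
    LinearRegression(), X, y, blend_size=0.2, random_state=0
)
print(oof.shape, hold_pred.shape)  # (100,) (20,)
```

With stacking, the meta-model sees one row per training sample. With blending, it sees only the blend_size fraction that was held out, which is cheaper but uses less data.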
layer_models_ : list
- A list that stores the fitted models for each layer after the fit() method is called.
- This includes the base models from each layer and their predictions used as inputs for the subsequent layers.
This method initializes the ensemble class and validates input parameters.
layers, meta_model, n_folds, blending, blend_size, random_state (See Parameters section above for detailed descriptions.)
- ValueError: If layers is not a non-empty list of non-empty lists, or if any model in layers doesn't have fit or predict methods.
- ValueError: If meta_model doesn't have fit or predict methods.
- ValueError: If n_folds is less than 2 or not an integer.
- ValueError: If blend_size is not between 0 and 1 when blending=True.
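The checks listed above could be written roughly as follows. This is a sketch, not the class's actual code. The function name validate_ensemble_args is hypothetical, and treating the blend_size bounds as exclusive is an assumption.

```python
from sklearn.linear_model import LinearRegression


def validate_ensemble_args(layers, meta_model, n_folds=5, blending=False, blend_size=0.2):
    """Raise ValueError for the invalid-argument cases described above."""

    def is_estimator(obj):
        # Duck-typing check: both fit and predict must be callable.
        return callable(getattr(obj, "fit", None)) and callable(getattr(obj, "predict", None))

    if not isinstance(layers, list) or not layers:
        raise ValueError("layers must be a non-empty list of non-empty lists")
    for layer in layers:
        if not isinstance(layer, list) or not layer:
            raise ValueError("each layer must be a non-empty list of models")
        for model in layer:
            if not is_estimator(model):
                raise ValueError(f"{type(model).__name__} lacks fit() or predict()")
    if not is_estimator(meta_model):
        raise ValueError("meta_model must implement fit() and predict()")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n_folds, bool) or not isinstance(n_folds, int) or n_folds < 2:
        raise ValueError("n_folds must be an integer >= 2")
    # Assumption: the bounds are exclusive (0 < blend_size < 1).
    if blending and not (0 < blend_size < 1):
        raise ValueError("blend_size must be between 0 and 1 when blending=True")


# A valid configuration passes silently
validate_ensemble_args([[LinearRegression()]], LinearRegression())

# An empty inner list is rejected
try:
    validate_ensemble_args([[]], LinearRegression())
except ValueError as exc:
    print(f"Rejected: {exc}")
```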
Fits the stacking ensemble model to the provided training data (X, y). This method processes each layer of models and trains them accordingly using either stacking (K-fold CV) or blending (hold-out set).
X : pandas.DataFrame or numpy.ndarray
- The feature matrix containing training data.
y : pandas.Series, numpy.ndarray, or list
- The target vector containing the labels or outputs for each sample.
self : object
- The fitted StackingEnsemble object.
- TypeError: If X is not a pandas DataFrame or numpy array, or if y is not a pandas Series, numpy array, or list.
- ValueError: If the number of samples in X and y does not match.
- RuntimeError: If an error occurs during the fitting process, such as a failure to split the data correctly or an error in model training or prediction.
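As a rough illustration of these error conditions, the sketch below shows one way the type and length checks, and the wrapping of training failures in RuntimeError, might look. The names check_inputs and fit_with_context are hypothetical and are not part of the documented API.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def check_inputs(X, y):
    """Validate the types and sample counts of X and y, returning numpy arrays."""
    if not isinstance(X, (pd.DataFrame, np.ndarray)):
        raise TypeError("X must be a pandas DataFrame or numpy array")
    if not isinstance(y, (pd.Series, np.ndarray, list)):
        raise TypeError("y must be a pandas Series, numpy array, or list")
    X_arr, y_arr = np.asarray(X), np.asarray(y)
    if X_arr.shape[0] != y_arr.shape[0]:
        raise ValueError(
            f"X has {X_arr.shape[0]} samples but y has {y_arr.shape[0]}"
        )
    return X_arr, y_arr


def fit_with_context(model, X, y):
    """Fit a model, re-raising any failure as RuntimeError with the cause attached."""
    try:
        return model.fit(X, y)
    except Exception as exc:
        raise RuntimeError(f"Error while fitting {type(model).__name__}: {exc}") from exc


X_arr, y_arr = check_inputs(pd.DataFrame({"a": [1.0, 2.0, 3.0]}), [2.0, 4.0, 6.0])
fitted = fit_with_context(LinearRegression(), X_arr, y_arr)
```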
- Input Validation: Ensures X and y are of correct types and dimensions.
- Layer-wise Training:
  - For each layer, the method either generates out-of-fold predictions using K-fold cross-validation (stacking) or trains models on the training set and generates predictions on a hold-out set (blending).
- Final Model Training: The meta-model is trained using the predictions from the final layer as input.
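The steps above can be sketched for stacking mode as follows. This is a minimal illustration under stated assumptions, not the class's actual implementation: the function name fit_stack is hypothetical, each later layer is assumed to receive only the previous layer's predictions (not the original features), and each base model is assumed to be refit on all rows after its out-of-fold predictions are generated.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict


def fit_stack(layers, meta_model, X, y, n_folds=5, random_state=None):
    """Fit each layer on the previous layer's out-of-fold predictions."""
    features = X
    fitted_layers = []
    cv = KFold(n_splits=n_folds, shuffle=True, random_state=random_state)
    for layer in layers:
        # Out-of-fold predictions become the next layer's input features.
        oof = np.column_stack(
            [cross_val_predict(clone(m), features, y, cv=cv) for m in layer]
        )
        # Refit every model on all rows so it can be used at predict time.
        fitted_layers.append([clone(m).fit(features, y) for m in layer])
        features = oof
    # The meta-model is trained on the final layer's predictions.
    fitted_meta = clone(meta_model).fit(features, y)
    return fitted_layers, fitted_meta


# Synthetic data, for illustration only
X, y = make_regression(n_samples=120, n_features=6, noise=0.1, random_state=0)
layers = [
    [LinearRegression(), RandomForestRegressor(n_estimators=10, random_state=0)],
    [Ridge(alpha=1.0)],
]
fitted_layers, fitted_meta = fit_stack(layers, LinearRegression(), X, y, random_state=0)
print([len(layer) for layer in fitted_layers])  # [2, 1]
```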
Uses the fitted ensemble model to make predictions on new data (X).
X : pandas.DataFrame or numpy.ndarray
- The feature matrix for which predictions are needed.
y_pred : numpy.ndarray
- The predicted values based on the ensemble model.
- TypeError: If X is not a pandas DataFrame or numpy array.
- RuntimeError: If an error occurs during prediction (e.g., failure in model predictions).
- Layer-wise Prediction: For each layer, predictions are made using the models from that layer.
- Meta-Model Prediction: The final predictions are obtained by passing the predictions from the last layer through the meta-model.
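The two prediction steps can be sketched as a simple forward pass. The function name predict_stack is hypothetical, and the small stack below is fitted by hand without cross-validation purely to exercise the function. As an assumption, each layer receives only the previous layer's predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge


def predict_stack(fitted_layers, fitted_meta, X):
    """Pass X through each fitted layer, then through the meta-model."""
    features = np.asarray(X)
    for layer in fitted_layers:
        # One column of predictions per model in the layer.
        features = np.column_stack([m.predict(features) for m in layer])
    return fitted_meta.predict(features)


# Synthetic data and a hand-built single-layer stack, for illustration only
X, y = make_regression(n_samples=60, n_features=4, noise=0.1, random_state=0)
layer_1 = [LinearRegression().fit(X, y), Ridge(alpha=1.0).fit(X, y)]
layer_1_out = np.column_stack([m.predict(X) for m in layer_1])
meta = LinearRegression().fit(layer_1_out, y)

y_pred = predict_stack([layer_1], meta, X[:5])
print(y_pred.shape)  # (5,)
```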
Prints the entire structure of the ensemble model, including each layer, the models within each layer, and the meta-model, in a detailed tree format.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
# StackingEnsemble must also be imported from wherever this class is defined
# Define models for the ensemble layers
layer_1_models = [LinearRegression(), RandomForestRegressor(n_estimators=50)]
layer_2_models = [SVR(kernel='rbf', C=1.0, epsilon=0.1)]
# Meta model
meta_model = LinearRegression()
# Create an instance of the StackingEnsemble
ensemble = StackingEnsemble(layers=[layer_1_models, layer_2_models], meta_model=meta_model)
# Example training data (synthetic, for illustration only; substitute your own X and y)
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the ensemble
ensemble.fit(X_train, y_train)
# Make predictions
y_pred = ensemble.predict(X_test)
# Print the ensemble structure
ensemble.print_structure()
Example Output:
Stacking Model Structure:
Meta Model: LinearRegression
  - Parameters: {'fit_intercept': True, 'normalize': False}
Layer 1:
  - Model 1: LinearRegression
    - Parameters: {'fit_intercept': True, 'normalize': False}
  - Model 2: RandomForestRegressor
    - Parameters: {'n_estimators': 50}
Layer 2:
  - Model 1: SVR
    - Parameters: {'kernel': 'rbf', 'C': 1.0, 'epsilon': 0.1}
Blending Enabled: False
Returns only the parameters that were explicitly changed by the user for a given model.
- Parameters:
  - model: The model whose parameters you want to check.
- Returns:
  - A dictionary of changed parameters or "No changes (using defaults)" if no changes were made.
model : sklearn model
- The model to inspect for changed parameters.
dict or str : A dictionary of changed parameters (key-value pairs), or a string indicating that no changes were made (i.e., using the default parameters).
{
'n_estimators': 50
}
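One plausible way to detect changed parameters is to compare a model's get_params() output against a freshly constructed instance of the same class. The sketch below shows that approach; it is not necessarily how the class does it, and the function name changed_params is hypothetical. It assumes the estimator can be constructed with no arguments and that its parameter values compare cleanly with !=, which is not true of every scikit-learn estimator.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def changed_params(model):
    """Return the parameters that differ from the class defaults."""
    # Assumption: the estimator class can be built with no arguments.
    defaults = type(model)().get_params(deep=False)
    current = model.get_params(deep=False)
    changed = {k: v for k, v in current.items() if defaults[k] != v}
    return changed or "No changes (using defaults)"


print(changed_params(RandomForestRegressor(n_estimators=50)))  # {'n_estimators': 50}
print(changed_params(LinearRegression()))  # No changes (using defaults)
```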