Ensembles Guide
Endgame provides a full suite of ensemble methods, from classic stacking and blending
to advanced techniques like hill climbing, optimal weight search, and knowledge
distillation. All ensemble classes follow the sklearn interface (fit, predict,
predict_proba).
SuperLearner
SuperLearner combines arbitrary base learners using non-negative least squares (NNLS)
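weighting, producing a convex combination (non-negative weights that sum to 1). Because
the weights are fitted on out-of-fold predictions, the combination is expected to match or
beat the best individual model in cross-validated error; this is not a guarantee on unseen data.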
from endgame.ensemble import SuperLearner
from endgame.models import LGBMWrapper, XGBWrapper, CatBoostWrapper
sl = SuperLearner(
    base_estimators=[
        ("lgbm", LGBMWrapper(preset='endgame')),
        ("xgb", XGBWrapper(preset='endgame')),
        ("cb", CatBoostWrapper(preset='endgame')),
    ],
    meta_learner="nnls",  # non-negative least squares
    cv=5,                 # inner cross-validation folds for meta-features
)
sl.fit(X_train, y_train)
proba = sl.predict_proba(X_test)
preds = sl.predict(X_test)
# Inspect learned weights
print(sl.coef_) # non-negative, sum to 1
The meta-features are out-of-fold predictions from each base learner. The NNLS solver finds the weight vector that minimises squared error on those meta-features, guaranteeing non-negative weights without requiring regularisation.
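The weighting step can be illustrated outside the library with a minimal sketch. It assumes synthetic out-of-fold predictions (the `oof` matrix below is made up for illustration) and uses `scipy.optimize.nnls`. The final normalisation to a sum of 1 is an assumption about how the convex combination is obtained, not a statement about SuperLearner's internals.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n = 500
y = rng.normal(size=n)

# Hypothetical out-of-fold predictions from three base learners:
# two informative models with different noise levels and one uninformative model.
oof = np.column_stack([
    y + rng.normal(scale=0.3, size=n),
    y + rng.normal(scale=0.5, size=n),
    rng.normal(size=n),
])

# NNLS solves min ||oof @ w - y||_2 subject to w >= 0.
raw_weights, residual_norm = nnls(oof, y)

# Assumed post-processing step: rescale so the weights sum to 1.
weights = raw_weights / raw_weights.sum()
print(weights)
```

In this toy setup the least noisy model should receive the largest weight and the uninformative one a weight near zero.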
HillClimbingEnsemble
HillClimbingEnsemble uses greedy forward selection to build an ensemble that
directly optimises an arbitrary metric. At each step it adds the candidate model
(with repetition allowed) that most improves the ensemble score on the out-of-fold
predictions. This mirrors the approach used in many competition-winning solutions.
from endgame.ensemble import HillClimbingEnsemble
from sklearn.metrics import roc_auc_score
hc = HillClimbingEnsemble(
    metric=roc_auc_score,
    maximize=True,
    n_iterations=100,  # maximum greedy steps
    random_state=42,
)
# Pass a list of OOF prediction arrays
oof_preds = [lgbm_oof, xgb_oof, cb_oof, ft_oof]
hc.fit(oof_preds, y_train)
# Apply the discovered weights to test predictions
test_preds = [lgbm_test, xgb_test, cb_test, ft_test]
final = hc.predict(test_preds)
print(hc.weights_) # float weights that sum to 1
print(hc.best_score_) # best metric achieved on OOF
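The greedy selection described above can be sketched in plain NumPy. This is an illustrative sketch rather than the library's implementation: the `hill_climb` helper, the `rmse` metric, and the synthetic predictions are all hypothetical. It assumes the ensemble prediction is the simple average of the selected models, so a model picked k times receives a weight proportional to k.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def hill_climb(oof_preds, y_true, metric, maximize=True, n_iterations=100):
    """Greedy forward selection with replacement over OOF prediction arrays."""
    preds = [np.asarray(p, dtype=float) for p in oof_preds]
    sign = 1.0 if maximize else -1.0          # always maximise sign * metric
    counts = np.zeros(len(preds), dtype=int)  # how often each model was picked
    running_sum = np.zeros_like(preds[0])
    best_score = -np.inf
    for _ in range(n_iterations):
        n_selected = counts.sum()
        # Score the ensemble that would result from adding each candidate.
        scores = [
            sign * metric(y_true, (running_sum + p) / (n_selected + 1))
            for p in preds
        ]
        best = int(np.argmax(scores))
        if scores[best] <= best_score:
            break                             # no candidate improves the score
        best_score = scores[best]
        counts[best] += 1
        running_sum += preds[best]
    weights = counts / counts.sum()
    return weights, sign * best_score

rng = np.random.default_rng(0)
y = rng.normal(size=400)
# Hypothetical OOF predictions with increasing noise.
oof_list = [y + rng.normal(scale=s, size=y.size) for s in (0.3, 0.4, 0.8)]

weights, score = hill_climb(oof_list, y, metric=rmse, maximize=False, n_iterations=50)
print(weights, score)
```

Because the first step always picks the best single model and later steps are accepted only if they improve the score, the result on the OOF data is never worse than the best individual model.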
StackingEnsemble
StackingEnsemble trains base learners and a meta-learner in a single fit call.
Base learner out-of-fold predictions become features for the meta-learner.
from endgame.ensemble import StackingEnsemble
from endgame.models import LGBMWrapper, XGBWrapper
from sklearn.linear_model import LogisticRegression
stack = StackingEnsemble(
    estimators=[
        ('lgbm', LGBMWrapper(preset='endgame')),
        ('xgb', XGBWrapper(preset='endgame')),
    ],
    meta_learner=LogisticRegression(),
    cv=5,
    passthrough=True,  # also pass original features to meta-learner
    use_proba=True,    # use predict_proba outputs as meta-features
)
stack.fit(X_train, y_train)
preds = stack.predict(X_test)
proba = stack.predict_proba(X_test)
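To make the data flow concrete, here is a minimal sketch of stacking built from scikit-learn pieces only. The base models, the synthetic dataset, and the choice to refit the base models on the full training set for test-time meta-features are assumptions made for illustration. They are not taken from StackingEnsemble itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, y_train, X_test = X[:300], y[:300], X[300:]

base_models = [DecisionTreeClassifier(max_depth=3, random_state=0), GaussianNB()]

# Out-of-fold positive-class probabilities: each row is predicted by a model
# that never saw that row during training.
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Equivalent of passthrough=True: append the original features.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(np.hstack([oof, X_train]), y_train)

# Refit base models on all training data to build test-time meta-features.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
stacked_proba = meta_model.predict_proba(np.hstack([test_meta, X_test]))[:, 1]
```

Training the meta-learner on out-of-fold rather than in-sample predictions keeps it from rewarding base learners that simply overfit the training data.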
BlendingEnsemble
BlendingEnsemble uses a fixed hold-out split rather than cross-validation to
generate meta-features. This is faster but uses less data for training base learners.
from endgame.ensemble import BlendingEnsemble
from endgame.models import LGBMWrapper, XGBWrapper, CatBoostWrapper
blend = BlendingEnsemble(
    estimators=[
        ('lgbm', LGBMWrapper()),
        ('xgb', XGBWrapper()),
        ('cb', CatBoostWrapper()),
    ],
    meta_learner=LGBMWrapper(n_estimators=200),
    holdout_size=0.2,
    random_state=42,
)
blend.fit(X_train, y_train)
preds = blend.predict(X_test)
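The hold-out variant differs from stacking only in where the meta-features come from. The sketch below uses scikit-learn models and synthetic data as stand-ins. The helper `meta_features` and the model choices are hypothetical, and the split simply mirrors `holdout_size=0.2`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, y_train, X_test = X[:400], y[:400], X[400:]

# Single fixed split: base learners see 80% of the training data,
# the meta-learner is fitted on the remaining 20%.
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
)

base_models = [
    DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_fit, y_fit),
    GaussianNB().fit(X_fit, y_fit),
]

def meta_features(models, X):
    """Stack each model's positive-class probability as one column."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

meta_model = LogisticRegression().fit(meta_features(base_models, X_hold), y_hold)
blend_proba = meta_model.predict_proba(meta_features(base_models, X_test))[:, 1]
```

Each base model is trained once instead of once per fold, which is where the speed-up comes from. The cost is that the meta-learner sees only the hold-out rows.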
OptimizedBlender
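OptimizedBlender finds continuous blend weights by optimising a metric over the
provided out-of-fold predictions using scipy optimisation (L-BFGS-B with a simplex
constraint). Note that L-BFGS-B natively supports only per-weight bounds, so the
sum-to-one part of the simplex constraint has to be enforced separately, for example
by renormalising the weights.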
from endgame.ensemble import OptimizedBlender
from sklearn.metrics import log_loss
blender = OptimizedBlender(
    metric=log_loss,
    maximize=False,     # log_loss should be minimised
    bounds=(0.0, 1.0),  # weight bounds per model
)
blender.fit(oof_preds_matrix, y_train) # shape (n_samples, n_models)
final = blender.predict(test_preds_matrix)
print(blender.weights_)
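A continuous weight search of this kind can be sketched with `scipy.optimize.minimize`. The example below deliberately uses SLSQP rather than L-BFGS-B, because SLSQP accepts the sum-to-one equality constraint directly. It is an illustration under that substitution, not OptimizedBlender's actual code. The `binary_log_loss` and `noisy_proba` helpers and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def binary_log_loss(y_true, p, eps=1e-12):
    """Mean binary cross-entropy with clipping for numerical safety."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)

def noisy_proba(noise):
    """Hypothetical OOF probabilities: true signal plus Gaussian logit noise."""
    logits = 3.0 * (y - 0.5) + rng.normal(scale=noise, size=n)
    return 1.0 / (1.0 + np.exp(-logits))

oof = np.column_stack([noisy_proba(1.0), noisy_proba(2.0), noisy_proba(4.0)])
n_models = oof.shape[1]
equal_weights = np.full(n_models, 1.0 / n_models)

result = minimize(
    lambda w: binary_log_loss(y, oof @ w),
    x0=equal_weights,                      # start from a plain average
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n_models,        # per-model weight bounds
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
)
weights = result.x
print(weights, binary_log_loss(y, oof @ weights))
```

Since the blended prediction is linear in the weights and log loss is convex in the predicted probability, this particular objective is convex over the simplex. The starting point therefore matters little here, which would not hold for non-smooth metrics such as AUC.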
RankAverageBlender
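RankAverageBlender converts each model's predictions to ranks before averaging.
This makes the blend insensitive to scale differences between models, so it often
outperforms simple averaging when models produce predictions on different scales.
The output is a ranking score rather than a calibrated probability, so it is best
suited to rank-based metrics such as AUC.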
from endgame.ensemble import RankAverageBlender
blender = RankAverageBlender(weights=[0.4, 0.35, 0.25])
final = blender.predict(test_preds_matrix) # shape (n_samples, n_models)
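The rank transformation itself is short enough to show in full. The sketch below uses `scipy.stats.rankdata`. The `rank_average` helper and the scaling of ranks to (0, 1] are assumptions made for illustration and may differ from what RankAverageBlender does internally.

```python
import numpy as np
from scipy.stats import rankdata

def rank_average(pred_matrix, weights=None):
    """Weighted average of per-model ranks, scaled to the interval (0, 1]."""
    pred_matrix = np.asarray(pred_matrix, dtype=float)
    n_samples, n_models = pred_matrix.shape
    # Rank each column independently (ties get their average rank).
    ranks = rankdata(pred_matrix, axis=0) / n_samples
    if weights is None:
        weights = np.full(n_models, 1.0 / n_models)
    weights = np.asarray(weights, dtype=float)
    return ranks @ (weights / weights.sum())

# Two hypothetical models that agree on ordering but use very different scales.
preds = np.array([
    [0.10, 10.0],
    [0.40, 30.0],
    [0.35, 20.0],
    [0.80, 90.0],
])
blended = rank_average(preds, weights=[0.5, 0.5])
print(blended)  # [0.25 0.75 0.5  1.  ]
```

A plain average of these two columns would be dominated by the second model's larger values. After ranking, both models contribute equally.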
ThresholdOptimizer
ThresholdOptimizer finds the optimal classification threshold by searching over
a grid of cutoffs and maximising a target metric on out-of-fold predictions. This
is particularly useful for imbalanced datasets where the default 0.5 threshold is
suboptimal.
from endgame.ensemble import ThresholdOptimizer
from sklearn.metrics import f1_score
optimizer = ThresholdOptimizer(
    metric=f1_score,
    maximize=True,
    thresholds=100,  # number of candidate thresholds to evaluate
)
optimizer.fit(oof_probabilities, y_train)
print(f"Optimal threshold: {optimizer.threshold_:.4f}")
hard_preds = optimizer.predict(test_probabilities)
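The grid search described above amounts to a few lines of NumPy. In this sketch the `f1` and `best_threshold` helpers, the evenly spaced grid, and the synthetic imbalanced data are all hypothetical. The library's grid construction may differ.

```python
import numpy as np

def f1(y_true, y_pred):
    """Binary F1 score computed from confusion-matrix counts."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(proba, y_true, metric, n_thresholds=100):
    """Evaluate evenly spaced cutoffs in (0, 1) and return the best one."""
    grid = np.linspace(0.0, 1.0, n_thresholds + 2)[1:-1]  # drop 0 and 1
    scores = [metric(y_true, (proba >= t).astype(int)) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), float(scores[best])

rng = np.random.default_rng(0)
n = 2000
y = (rng.random(n) < 0.1).astype(int)  # roughly 10% positives
# Hypothetical OOF probabilities: positives score higher but mostly below 0.5.
proba = np.clip(0.15 + 0.30 * y + rng.normal(scale=0.1, size=n), 0.0, 1.0)

threshold, score = best_threshold(proba, y, metric=f1)
default_score = f1(y, (proba >= 0.5).astype(int))
print(threshold, score, default_score)
```

In this setup the selected cutoff falls well below 0.5, which is the typical pattern when the positive class is rare and the model's scores for it are compressed.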
Knowledge Distillation
Endgame supports training a lightweight student model to mimic a heavier teacher model. This is useful when you need a fast inference model that approximates an expensive ensemble.
from endgame.ensemble import KnowledgeDistiller
from endgame.models import LGBMWrapper
from endgame.models.baselines import LinearClassifier
teacher = LGBMWrapper(preset='endgame')
teacher.fit(X_train, y_train)
student = LinearClassifier()
kd = KnowledgeDistiller(
    teacher=teacher,
    student=student,
    temperature=3.0,  # softens teacher's probability distribution
    alpha=0.5,        # blend of hard labels vs soft labels
)
kd.fit(X_train, y_train)
preds = kd.student_.predict(X_test)
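The temperature parameter controls how much the teacher's soft probabilities are
smoothed before being used as targets. Higher temperatures produce softer targets that
expose more of the teacher's relative confidence across the non-predicted classes,
although very high values flatten the distribution toward uniform and wash out that
signal. alpha controls the trade-off between learning from the hard ground-truth
labels and the soft teacher labels.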
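The effect of temperature is easiest to see on a single probability vector. The sketch below applies the usual temperature-scaled softmax to the teacher's log-probabilities. The `soften` helper is hypothetical, and the convention that alpha weights the hard labels is an assumption, since the direction is not stated above.

```python
import numpy as np

def soften(proba, temperature):
    """Temperature-scale probabilities: softmax(log(p) / T)."""
    logits = np.log(np.clip(proba, 1e-12, 1.0)) / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

teacher_proba = np.array([[0.90, 0.09, 0.01]])  # hypothetical teacher output
hard_label = np.array([[1.0, 0.0, 0.0]])        # one-hot ground truth

soft = soften(teacher_proba, temperature=3.0)

# Assumed convention: alpha is the weight on the hard labels.
alpha = 0.5
target = alpha * hard_label + (1 - alpha) * soft
print(soft, target)
```

With a temperature of 1 the teacher's probabilities are returned unchanged. As the temperature grows, the class ordering is preserved while the gaps between classes shrink.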
Choosing an Ensemble Strategy
| Strategy | When to use |
|---|---|
| SuperLearner | Strong diverse base learners; want theoretically-grounded weighting |
| HillClimbingEnsemble | Have OOF predictions; want to directly optimise a target metric |
| StackingEnsemble | Standard competition workflow; enough data for CV-based stacking |
| BlendingEnsemble | Limited time; large datasets where full CV is expensive |
| OptimizedBlender | OOF predictions already computed; want continuous weight optimisation |
| RankAverageBlender | Models have incompatible prediction scales |
| ThresholdOptimizer | Binary classification with imbalanced classes or custom metric |
| KnowledgeDistiller | Need fast inference; ensemble too slow for production |
See Also
Models Guide for base learner options
Calibration Guide for post-hoc probability calibration