Quickstart
This guide walks through the core Endgame workflow from data loading to model evaluation. After completing it you will have seen the main API entry points and know where to look for deeper documentation on each topic.
Import Convention
import endgame as eg
Heavy sub-modules (models, vision, nlp, audio, benchmark, kaggle, quick) are lazy-loaded, so the import is fast even though the library is large.
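Endgame's own lazy-loading mechanism is not shown in this guide. As a rough illustration of the general technique, the sketch below uses a module-level `__getattr__` hook (PEP 562) so that a sub-module is imported only on first attribute access. The `make_lazy_package` helper and the `demo` package name are hypothetical, and standard-library modules stand in for heavy sub-modules.

```python
import importlib
import types


def make_lazy_package(name, submodules):
    """Build a module whose listed attributes are imported on first access.

    `submodules` maps an attribute name to the real module path to import.
    Illustrative helper only; it is not part of Endgame.
    """
    package = types.ModuleType(name)

    def lazy_getattr(attr):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        if attr in submodules:
            module = importlib.import_module(submodules[attr])
            setattr(package, attr, module)  # cache so the hook is not hit again
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    package.__getattr__ = lazy_getattr
    return package


# Standard-library modules stand in for heavy sub-modules.
demo = make_lazy_package("demo", {"stats": "statistics", "js": "json"})

loaded_before = "stats" in vars(demo)    # nothing attached to the package yet
mean_value = demo.stats.mean([1, 2, 3])  # first access triggers the import
loaded_after = "stats" in vars(demo)     # now cached on the package
```

Creating the package costs almost nothing because no sub-module is imported until something actually touches it.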
End-to-End Example
The example below uses scikit-learn’s breast cancer dataset and covers the complete workflow: split, train, evaluate, and visualize.
import endgame as eg
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
# Load the dataset once and reuse the returned Bunch for the names
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names.tolist()
class_names = data.target_names.tolist()
# Split: train / test (stratified to preserve the class balance)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# Train with competition-winning defaults
model = eg.models.LGBMWrapper(preset="endgame")
model.fit(X_train, y_train)
# Evaluate
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC-AUC: {auc:.4f}")
# Visualize feature importances
from endgame.visualization import BarChartVisualizer
bar = BarChartVisualizer.from_importances(model, feature_names=feature_names)
bar.save("feature_importances.html")
All Endgame estimators implement the standard scikit-learn interface: fit, predict, predict_proba (classifiers), and transform (transformers). They drop into any sklearn Pipeline without modification.
Model Families
GBDTs
LGBMWrapper, XGBWrapper, and CatBoostWrapper expose a unified interface with competition-tuned hyperparameter presets.
import endgame as eg
# LightGBM with competition defaults
lgbm = eg.models.LGBMWrapper(preset="endgame")
lgbm.fit(X_train, y_train)
proba_lgbm = lgbm.predict_proba(X_test)
# XGBoost
xgb = eg.models.XGBWrapper(preset="endgame")
xgb.fit(X_train, y_train)
# CatBoost (handles categoricals natively)
cat = eg.models.CatBoostWrapper(preset="endgame")
cat.fit(X_train, y_train)
Available presets: "fast", "endgame" (competition defaults).
See models guide for the full list of supported parameters.
Deep Tabular Models
Deep tabular models are imported directly from their submodule to keep the top-level namespace lean.
from endgame.models.tabular import FTTransformerClassifier
ft = FTTransformerClassifier(
    n_blocks=3,
    d_token=192,
    n_heads=8,
    n_epochs=100,
)
ft.fit(X_train, y_train)
proba_ft = ft.predict_proba(X_test)
Other deep tabular classifiers follow the same import pattern:
from endgame.models.tabular import SAINTClassifier, NODEClassifier, GANDALFClassifier
Interpretable Models
Endgame provides first-class support for interpretable models that match or approach GBDT accuracy while remaining explainable.
from endgame.models import EBMClassifier
ebm = EBMClassifier()
ebm.fit(X_train, y_train)
proba_ebm = ebm.predict_proba(X_test)
# EBM exposes per-feature contributions
print(ebm.feature_importances_)
Other interpretable options:
from endgame.models.rules import RuleFitClassifier
from endgame.models.symbolic import SymbolicRegressor # PySR backend
from endgame.models.baselines import LinearClassifier
Quick API
The Quick API provides one-line training and multi-model comparison. It runs stratified cross-validation internally and returns out-of-fold predictions alongside a fitted model.
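To make "stratified cross-validation with out-of-fold predictions" concrete, here is a minimal standard-library sketch. It is not Endgame's implementation: `stratified_folds`, `fit_threshold`, and `out_of_fold_predictions` are hypothetical helpers, and a toy one-dimensional threshold model stands in for a real estimator.

```python
from collections import defaultdict


def stratified_folds(y, n_folds):
    """Assign each sample a fold id so every fold keeps a similar class balance."""
    fold_of = [None] * len(y)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    for indices in by_class.values():
        # Deal the samples of each class round-robin across the folds.
        for position, i in enumerate(indices):
            fold_of[i] = position % n_folds
    return fold_of


def fit_threshold(X_part, y_part):
    """Toy 1-D model: a threshold halfway between the two class means."""
    mean0 = sum(x for x, label in zip(X_part, y_part) if label == 0) / y_part.count(0)
    mean1 = sum(x for x, label in zip(X_part, y_part) if label == 1) / y_part.count(1)
    return (mean0 + mean1) / 2


def out_of_fold_predictions(X, y, n_folds=3):
    """Predict every sample with a model that never saw it during fitting."""
    fold_of = stratified_folds(y, n_folds)
    oof = [None] * len(y)
    for fold in range(n_folds):
        train_idx = [i for i, f in enumerate(fold_of) if f != fold]
        threshold = fit_threshold([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, f in enumerate(fold_of):
            if f == fold:
                oof[i] = int(X[i] > threshold)
    return oof


toy_X = [1, 2, 3, 7, 8, 9]
toy_y = [0, 0, 0, 1, 1, 1]
folds = stratified_folds(toy_y, 3)
oof = out_of_fold_predictions(toy_X, toy_y, 3)
```

Because each prediction comes from a model fitted without that sample, a score computed on the out-of-fold predictions is an honest estimate of generalization.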
Single Model
import endgame as eg
result = eg.quick.classify(X, y, preset="default", metric="roc_auc")
print(result) # QuickResult(cv_score=0.9912, metric='roc_auc')
# Access the fitted model and OOF predictions
model = result.model
oof_preds = result.oof_predictions
importances = result.feature_importances
Available presets:
| Preset | Models included | CV folds | Typical runtime |
|---|---|---|---|
| | LightGBM, Linear | 3 | ~1 min |
| | LightGBM, XGBoost, CatBoost, Linear | 5 | ~5 min |
| | GBDT trio + KNN + ELM + RotationForest | 5 | ~30 min |
| | Linear, EBM, NAM | 5 | ~5 min |
Model Comparison
comparison = eg.quick.compare(X, y, task="classification", preset="default")
print(comparison)
# ComparisonResult:
# 1. lgbm: 0.9934
# 2. xgb: 0.9921
# 3. catboost: 0.9908
# 4. linear: 0.9743
best_model = comparison.best_model
leaderboard = comparison.leaderboard # List[Dict[str, Any]]
See quick API reference for regression support (eg.quick.regress) and additional options.
Ensemble Methods
Once you have several models trained, combine them with the ensemble module.
Super Learner
The Super Learner finds the optimal convex combination of base models using cross-validated out-of-fold predictions (NNLS by default). It is asymptotically at least as good as the single best base learner.
import endgame as eg
from endgame.ensemble import SuperLearner
from endgame.models import EBMClassifier
from endgame.models.baselines import LinearClassifier
sl = SuperLearner(
    base_estimators=[
        ("lgbm", eg.models.LGBMWrapper(preset="endgame")),
        ("ebm", EBMClassifier()),
        ("lr", LinearClassifier()),
    ],
    meta_learner="nnls",  # non-negative least squares
    cv=5,
)
sl.fit(X_train, y_train)
proba_sl = sl.predict_proba(X_test)
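The non-negative least-squares step can be illustrated without any dependencies. The sketch below is an assumption-laden stand-in, not Endgame's solver: the hypothetical `nnls_blend_weights` helper uses plain projected gradient descent and then rescales the weights to sum to 1. The toy targets are constructed as 0.75 * preds_a + 0.25 * preds_b so that the expected weights are known in advance.

```python
def nnls_blend_weights(oof_preds, y_true, n_steps=2000):
    """Non-negative least-squares blend weights via projected gradient descent.

    oof_preds holds one list of out-of-fold predictions per base model.
    Returns weights that are >= 0 and sum to 1.
    """
    n_models = len(oof_preds)
    n_samples = len(y_true)
    # 1 / trace(P^T P) never exceeds 1 / lambda_max, so the descent is stable.
    step = 1.0 / sum(p * p for preds in oof_preds for p in preds)
    weights = [0.0] * n_models
    for _ in range(n_steps):
        blend = [
            sum(weights[m] * oof_preds[m][i] for m in range(n_models))
            for i in range(n_samples)
        ]
        residual = [b - t for b, t in zip(blend, y_true)]
        for m in range(n_models):
            gradient = sum(r * p for r, p in zip(residual, oof_preds[m]))
            # Gradient step followed by projection onto the constraint w >= 0.
            weights[m] = max(0.0, weights[m] - step * gradient)
    total = sum(weights)
    if total == 0:
        return [1.0 / n_models] * n_models  # degenerate case: fall back to uniform
    return [w / total for w in weights]


preds_a = [0.2, 0.8, 0.6, 0.4, 1.0, 0.0]
preds_b = [0.6, 0.4, 1.0, 0.0, 0.2, 0.8]
targets = [0.3, 0.7, 0.7, 0.3, 0.8, 0.2]  # 0.75 * preds_a + 0.25 * preds_b

weights = nnls_blend_weights([preds_a, preds_b], targets)
```

Fitting the weights on out-of-fold predictions rather than in-sample predictions is what keeps the meta-learner from rewarding base models that merely overfit.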
Hill Climbing Ensemble
Forward-selection ensemble that greedily adds models while a metric improves.
from endgame.ensemble import HillClimbingEnsemble
hc = HillClimbingEnsemble(metric="roc_auc", n_iterations=20)
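# oof_preds_list and test_preds_list are not defined in this guide: each is assumed
# to hold one prediction array per base model (out-of-fold and test, respectively).
hc.fit(oof_preds_list, y_train) # list of OOF prediction arrays
final_proba = hc.predict(test_preds_list)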
See ensemble guide for StackingEnsemble,
BlendingEnsemble, RankAverageBlender, and ThresholdOptimizer.
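The greedy forward selection behind a hill-climbing ensemble can be sketched in a few lines. This is an illustrative version with selection with replacement and a simple pairwise ROC-AUC. The `roc_auc` and `hill_climb` helpers are hypothetical, and HillClimbingEnsemble may differ in its details.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for label, s in zip(y_true, scores) if label == 1]
    negatives = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def hill_climb(oof_preds, y_true, n_iterations=20):
    """Greedy forward selection with replacement; returns (weights, best score)."""
    picks = []                          # chosen model indices, repeats allowed
    running_sum = [0.0] * len(y_true)   # summed predictions of the picked models
    best_score = float("-inf")
    for _ in range(n_iterations):
        best_index = None
        for index, preds in enumerate(oof_preds):
            # Average of the current picks plus this candidate.
            blend = [(s + p) / (len(picks) + 1) for s, p in zip(running_sum, preds)]
            score = roc_auc(y_true, blend)
            if score > best_score:
                best_score, best_index = score, index
        if best_index is None:
            break  # no candidate improves the metric any further
        picks.append(best_index)
        running_sum = [s + p for s, p in zip(running_sum, oof_preds[best_index])]
    weights = [picks.count(i) / len(picks) for i in range(len(oof_preds))]
    return weights, best_score


labels = [0, 0, 1, 1, 0, 1]
model_a = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9]
model_b = [0.3, 0.2, 0.4, 0.8, 0.5, 0.6]

weights, score = hill_climb([model_a, model_b], labels)
```

On this toy data each model alone misranks one positive/negative pair, while their average ranks every pair correctly, so the search stops after picking each model once.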
Conformal Prediction
Conformal prediction wraps any classifier and produces prediction sets with a guaranteed marginal coverage probability. At alpha=0.1, the returned set contains the true label with probability at least 90% for a new example, provided the calibration and new data are exchangeable.
from sklearn.model_selection import train_test_split
from endgame.calibration import ConformalClassifier
import endgame as eg
# Split off a calibration set (separate from test)
X_tr, X_temp, y_tr, y_temp = train_test_split(
    X_train, y_train, test_size=0.3, stratify=y_train, random_state=42
)
X_cal, X_val, y_cal, y_val = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)
base_model = eg.models.LGBMWrapper(preset="endgame")
cp = ConformalClassifier(base_model, method="aps", alpha=0.1)
cp.fit(X_tr, y_tr, X_cal, y_cal) # calibrate on held-out calibration set
# Returns a list of sets — each set contains the plausible class labels
prediction_sets = cp.predict(X_val)
print(prediction_sets[:5])
# [{0}, {1}, {0, 1}, {1}, {0}]
# Check empirical coverage on held-out data
coverage = cp.coverage_score(X_val, y_val)
print(f"Empirical coverage: {coverage:.3f}") # close to or above 0.90; small samples can dip slightly below
Passing X_cal=None causes ConformalClassifier to automatically split a
cal_size fraction from the training data for calibration.
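The calibration step itself is short. The sketch below shows split conformal prediction with the simplest nonconformity score, 1 minus the probability assigned to the true class. Note that this is a swapped-in technique: it is not the APS method selected in the call above, and `conformal_threshold` and `prediction_set` are hypothetical helpers rather than Endgame functions.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate the score threshold from held-out (probabilities, label) pairs."""
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected quantile rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every label is kept
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Keep every class whose nonconformity score is within the threshold."""
    return {label for label, p in enumerate(probs) if 1.0 - p <= threshold}


# Toy calibration data: predicted [P(class 0), P(class 1)] and the true label.
cal_probs = [
    [0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.3, 0.7], [0.85, 0.15],
    [0.4, 0.6], [0.9, 0.1], [0.25, 0.75], [0.7, 0.3],
]
cal_labels = [0, 1, 0, 1, 0, 1, 0, 1, 1]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
new_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
prediction_sets = [prediction_set(probs, threshold) for probs in new_probs]
```

Confident predictions yield single-label sets, while the ambiguous middle example keeps both labels; that widening is how the method pays for its coverage guarantee.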
See calibration guide for ConformalRegressor,
ConformizedQuantileRegressor, and VennABERS.
Adversarial Validation
Adversarial validation detects distribution shift between train and test data before you submit a model. A classifier is trained to tell the two splits apart: an AUC near 0.5 means they are indistinguishable, while a high AUC means they are easy to separate, which suggests that your CV score may not track the leaderboard.
import endgame as eg
av = eg.validation.AdversarialValidator(threshold=0.6)
result = av.check_drift(X_train, X_test)
print(f"Drift AUC: {result.auc_score:.3f}")
print(f"Severity: {result.drift_severity}") # 'none', 'mild', or 'severe'
print(f"Top drifting features: {result.drifted_features[:5]}")
if result.drift_severity == "severe":
    # Drop the most drifting features
    drop_cols = result.drifted_features[:5]
    print(f"Consider dropping: {drop_cols}")
The default classifier is LightGBM when available, RandomForest otherwise.
You can supply any sklearn-compatible classifier via the estimator parameter.
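For intuition about the drift AUC, the sketch below uses a deliberately simplified, univariate stand-in: instead of training a classifier on all features jointly, it treats each raw feature value as a score for "this row comes from the test set" and computes its AUC. The `separation_auc` and `drifting_features` helpers are hypothetical, and this variant cannot detect shifts that only appear in feature interactions.

```python
def separation_auc(train_values, test_values):
    """AUC of one feature for telling test rows (positive) from train rows."""
    wins = sum(
        (t > r) + 0.5 * (t == r)
        for t in test_values
        for r in train_values
    )
    return wins / (len(test_values) * len(train_values))


def drifting_features(train_rows, test_rows, threshold=0.6):
    """Return (per-feature folded AUCs, indices whose AUC exceeds the threshold)."""
    n_features = len(train_rows[0])
    aucs = []
    for j in range(n_features):
        auc = separation_auc(
            [row[j] for row in train_rows],
            [row[j] for row in test_rows],
        )
        # Fold around 0.5 so that shifts in either direction count as drift.
        aucs.append(max(auc, 1.0 - auc))
    flagged = [j for j, auc in enumerate(aucs) if auc > threshold]
    return aucs, flagged


# Feature 0 overlaps between the splits; feature 1 is shifted upward in test.
toy_train = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0]]
toy_test = [[0.5, 20.0], [1.5, 21.0], [3.5, 22.0], [4.5, 23.0]]

aucs, flagged = drifting_features(toy_train, toy_test, threshold=0.6)
```

A feature whose folded AUC sits near 0.5 carries no information about which split a row came from; a value near 1.0 means the feature alone separates the splits.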
See validation guide for PurgedTimeSeriesSplit,
StratifiedGroupKFold, and CombinatorialPurgedKFold.
Interactive Visualization
All Endgame visualizations produce self-contained HTML files with no external CDN dependencies. Open them in any browser.
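How Endgame builds its HTML output is not covered here. As a minimal sketch of what "self-contained" means in practice, the hypothetical `bar_chart_html` helper below renders a bar chart as a single HTML string with inline SVG and CSS, so the page references no external scripts, stylesheets, or fonts. The feature names and values are made-up illustration data.

```python
from html import escape


def bar_chart_html(title, values):
    """Render a horizontal bar chart as one HTML string with inline SVG and CSS."""
    largest = max(values.values())
    rows = []
    for i, (name, value) in enumerate(values.items()):
        width = 300 * value / largest  # scale the longest bar to 300 px
        y = 10 + i * 30
        rows.append(
            f'<text x="0" y="{y + 15}">{escape(name)}</text>'
            f'<rect x="150" y="{y}" width="{width:.1f}" height="20" fill="steelblue"/>'
        )
    height = 20 + 30 * len(values)
    body = "".join(rows)
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{escape(title)}</title>"
        "<style>text{font:12px sans-serif}</style></head><body>"
        f"<svg width='460' height='{height}'>{body}</svg>"
        "</body></html>"
    )


# Made-up values, for illustration only.
page = bar_chart_html("Feature importances", {"mean radius": 0.42, "worst area": 0.31})
```

Writing `page` to disk (for example with `pathlib.Path("chart.html").write_text(page)`) gives a file that renders offline in any browser.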
Decision Tree
from sklearn.tree import DecisionTreeClassifier
from endgame.visualization import TreeVisualizer
clf = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
viz = TreeVisualizer(
    clf,
    feature_names=feature_names,
    class_names=class_names,
    title="Breast Cancer Decision Tree",
)
viz.save("tree.html") # click nodes to expand/collapse
ROC Curve
from endgame.visualization import ROCCurveVisualizer
roc = ROCCurveVisualizer.from_predictions(y_test, proba_lgbm, label="LightGBM")
roc.add_predictions(y_test, proba_ft, label="FT-Transformer")
roc.save("roc_curves.html")
Classification Report
The ClassificationReport bundles confusion matrix, ROC/PR curves, calibration plot, and lift chart into a single interactive HTML page.
from endgame.visualization import ClassificationReport
report = ClassificationReport(
    y_true=y_test,
    y_proba=proba_lgbm,
    feature_names=feature_names,
    class_names=class_names,
)
report.save("classification_report.html")
See the visualization guide for the complete chart catalogue (42 chart types including PDP, waterfall / SHAP, parallel coordinates, and calibration plots).
Next Steps
| Topic | Guide |
|---|---|
| Preprocessing (encoding, feature engineering, balancing) | |
| Full model catalogue (100+ estimators) | |
| Hyperparameter tuning with Optuna | |
| SHAP, LIME, counterfactuals | |
| Fairness metrics and mitigation | |
| Anomaly detection | |
| Time series forecasting and classification | |
| Signal processing | |
| Visualization catalogue | |
| MCP server (AI assistant integration) | |