Quickstart¶

This guide walks through the core Endgame workflow from data loading to model evaluation. After completing it you will have seen the main API entry points and know where to look for deeper documentation on each topic.

Import Convention¶

import endgame as eg

Heavy sub-modules (models, vision, nlp, audio, benchmark, kaggle, quick) are lazy-loaded, so the import is fast even though the library is large.
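
How the lazy loading is implemented is not described in this guide. One common mechanism in Python is a module-level `__getattr__` hook (PEP 562). The sketch below builds a throwaway module named `fakepkg` (a hypothetical name, as is the `_LAZY` mapping) that imports a submodule only on first attribute access. It illustrates the idea and is not taken from Endgame's source.

```python
import importlib
import sys
import types

# Hypothetical attribute -> module mapping, for illustration only.
_LAZY = {"stats": "statistics", "js": "json"}

pkg = types.ModuleType("fakepkg")
loaded = []  # records which attributes have triggered an import


def _lazy_getattr(name):
    """Import the target module on first access and cache it on the package."""
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name])
        setattr(pkg, name, module)  # cached, so the hook is not called again
        loaded.append(name)
        return module
    raise AttributeError(f"module 'fakepkg' has no attribute {name!r}")


pkg.__getattr__ = _lazy_getattr  # PEP 562 hook, looked up in the module dict
sys.modules["fakepkg"] = pkg

import fakepkg

print(loaded)                          # nothing has been imported yet
print(fakepkg.stats.mean([1, 2, 3]))   # first access imports `statistics`
print(loaded)
```

The import statement itself stays cheap because nothing in `_LAZY` is touched until an attribute is actually used.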

End-to-End Example¶

The example below uses scikit-learn’s breast cancer dataset and covers the complete workflow: split, train, evaluate, and visualize.

import endgame as eg
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Load data
data = load_breast_cancer()  # load once and reuse the returned Bunch
X, y = data.data, data.target
feature_names = data.feature_names.tolist()
class_names = data.target_names.tolist()

# Split: train / test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train with competition-winning defaults
model = eg.models.LGBMWrapper(preset="endgame")
model.fit(X_train, y_train)

# Evaluate
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC-AUC: {auc:.4f}")

# Visualize feature importances
from endgame.visualization import BarChartVisualizer

bar = BarChartVisualizer.from_importances(model, feature_names=feature_names)
bar.save("feature_importances.html")

All Endgame estimators implement the standard scikit-learn interface: fit, predict, predict_proba (classifiers), and transform (transformers). They drop into any sklearn Pipeline without modification.
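
Since the interface is the standard scikit-learn one, composition with `Pipeline` and `cross_val_score` should look the same as for any other estimator. The sketch below uses scikit-learn's `LogisticRegression` as a stand-in; that an Endgame estimator such as `eg.models.LGBMWrapper(preset="endgame")` can be swapped into the `"clf"` step is assumed from the statement above, not verified here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Stand-in estimator: any object with fit / predict_proba fits this slot.
    ("clf", LogisticRegression(max_iter=1000)),
])

# The scaler is refit inside each fold, so no test-fold statistics leak in.
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"CV ROC-AUC: {scores.mean():.4f} +/- {scores.std():.4f}")
```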

Model Families¶

GBDTs¶

LGBMWrapper, XGBWrapper, and CatBoostWrapper expose a unified interface with competition-tuned hyperparameter presets.

import endgame as eg

# LightGBM with competition defaults
lgbm = eg.models.LGBMWrapper(preset="endgame")
lgbm.fit(X_train, y_train)
proba_lgbm = lgbm.predict_proba(X_test)

# XGBoost
xgb = eg.models.XGBWrapper(preset="endgame")
xgb.fit(X_train, y_train)

# CatBoost (handles categoricals natively)
cat = eg.models.CatBoostWrapper(preset="endgame")
cat.fit(X_train, y_train)

Available presets: "fast", "endgame" (competition defaults).

See models guide for the full list of supported parameters.

Deep Tabular Models¶

Deep tabular models are imported directly from their submodule to keep the top-level namespace lean.

from endgame.models.tabular import FTTransformerClassifier

ft = FTTransformerClassifier(
    n_blocks=3,
    d_token=192,
    n_heads=8,
    n_epochs=100,
)
ft.fit(X_train, y_train)
proba_ft = ft.predict_proba(X_test)

Other deep tabular classifiers follow the same import pattern:

from endgame.models.tabular import SAINTClassifier, NODEClassifier, GANDALFClassifier

Interpretable Models¶

Endgame treats interpretable models as first-class citizens: they match or approach GBDT accuracy while remaining explainable.

from endgame.models import EBMClassifier

ebm = EBMClassifier()
ebm.fit(X_train, y_train)
proba_ebm = ebm.predict_proba(X_test)

# EBM exposes per-feature contributions
print(ebm.feature_importances_)

Other interpretable options:

from endgame.models.rules import RuleFitClassifier
from endgame.models.symbolic import SymbolicRegressor  # PySR backend
from endgame.models.baselines import LinearClassifier

Quick API¶

The Quick API provides one-line training and multi-model comparison. It runs stratified cross-validation internally and returns out-of-fold predictions alongside a fitted model.
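
For readers unfamiliar with out-of-fold (OOF) predictions, the pattern can be sketched with plain scikit-learn. This is an illustration of the general technique, not the Quick API's internals: the stand-in `RandomForestClassifier`, the fold settings, and scoring the pooled OOF vector (rather than averaging per-fold scores) are all assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each row is predicted by a model that never saw it during training.
oof = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
cv_score = roc_auc_score(y, oof)

# Refit on all rows to obtain the final model handed back to the caller.
final_model = clf.fit(X, y)
print(f"OOF ROC-AUC: {cv_score:.4f}")
```

OOF predictions are what make later stacking and blending honest: they have the same shape as the training target but carry no training-set leakage.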

Single Model¶

import endgame as eg

result = eg.quick.classify(X, y, preset="default", metric="roc_auc")
print(result)  # QuickResult(cv_score=0.9912, metric='roc_auc')

# Access the fitted model and OOF predictions
model = result.model
oof_preds = result.oof_predictions
importances = result.feature_importances

Available presets:

Preset	Models included	CV folds	Typical runtime
`"fast"`	LightGBM, Linear	3	~1 min
`"default"`	LightGBM, XGBoost, CatBoost, Linear	5	~5 min
`"competition"`	GBDT trio + KNN + ELM + RotationForest	5	~30 min
`"interpretable"`	Linear, EBM, NAM	5	~5 min

Model Comparison¶

comparison = eg.quick.compare(X, y, task="classification", preset="default")
print(comparison)
# ComparisonResult:
#   1. lgbm:     0.9934
#   2. xgb:      0.9921
#   3. catboost: 0.9908
#   4. linear:   0.9743

best_model = comparison.best_model
leaderboard = comparison.leaderboard  # List[Dict[str, Any]]

See quick API reference for regression support (eg.quick.regress) and additional options.

Ensemble Methods¶

Once you have several models trained, combine them with the ensemble module.

Super Learner¶

The Super Learner finds the optimal convex combination of base models by fitting weights on their cross-validated out-of-fold predictions (non-negative least squares, NNLS, by default). Asymptotically it performs at least as well as the single best base learner.

import endgame as eg
from endgame.ensemble import SuperLearner
from endgame.models import EBMClassifier
from endgame.models.baselines import LinearClassifier

sl = SuperLearner(
    base_estimators=[
        ("lgbm", eg.models.LGBMWrapper(preset="endgame")),
        ("ebm", EBMClassifier()),
        ("lr", LinearClassifier()),
    ],
    meta_learner="nnls",  # non-negative least squares
    cv=5,
)
sl.fit(X_train, y_train)
proba_sl = sl.predict_proba(X_test)
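
The weighting step can be sketched outside Endgame with `scipy.optimize.nnls`. The base learners below are scikit-learn stand-ins, and normalising the NNLS solution so the weights sum to 1 is the usual way to obtain a convex combination; whether `SuperLearner` does exactly this internally is an assumption.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Stand-in base learners (not the Endgame estimators used above).
base = {
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Column j holds base learner j's out-of-fold P(y=1).
Z = np.column_stack([
    cross_val_predict(est, X, y, cv=cv, method="predict_proba")[:, 1]
    for est in base.values()
])

# Non-negative least squares, then normalise so the weights sum to 1.
raw, _ = nnls(Z, y.astype(float))
weights = raw / raw.sum()

for name, w in zip(base, weights):
    print(f"{name}: {w:.3f}")
```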

Hill Climbing Ensemble¶

HillClimbingEnsemble is a forward-selection ensemble: it greedily adds models for as long as the chosen metric keeps improving.

from endgame.ensemble import HillClimbingEnsemble

hc = HillClimbingEnsemble(metric="roc_auc", n_iterations=20)
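# oof_preds_list / test_preds_list are assumed to exist already: one prediction
# array per base model (out-of-fold on train, and on the test set, respectively)
hc.fit(oof_preds_list, y_train)
final_proba = hc.predict(test_preds_list)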
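
The greedy loop can be sketched in a few lines of NumPy. The function name `hill_climb`, selection with replacement (so repeated picks act as integer weights), and the synthetic predictions are assumptions made for illustration; the guide does not say how `HillClimbingEnsemble` handles these details.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def hill_climb(oof_preds, y, n_iterations=20):
    """Greedy forward selection (with replacement) over OOF prediction arrays.

    Returns the selected model indices and the ensemble's final ROC-AUC.
    """
    selected = []
    running_sum = np.zeros_like(oof_preds[0], dtype=float)
    best_score = -np.inf
    for _ in range(n_iterations):
        # Score the averaged ensemble that results from adding each candidate.
        scores = [
            roc_auc_score(y, (running_sum + p) / (len(selected) + 1))
            for p in oof_preds
        ]
        idx = int(np.argmax(scores))
        if scores[idx] <= best_score:
            break  # no candidate improves the metric any further
        best_score = scores[idx]
        selected.append(idx)
        running_sum += oof_preds[idx]
    return selected, best_score


# Synthetic OOF predictions: three increasingly noisy views of the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
oof_preds = [y + rng.normal(0, s, size=500) for s in (0.8, 1.0, 1.5)]

selected, score = hill_climb(oof_preds, y)
print("selected indices:", selected)
print(f"ensemble ROC-AUC: {score:.4f}")
```

Because the first pick is always the best single model, the final score can never fall below that model's score on the OOF data.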

See ensemble guide for StackingEnsemble, BlendingEnsemble, RankAverageBlender, and ThresholdOptimizer.

Conformal Prediction¶

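Conformal prediction wraps any classifier and produces prediction sets with a guaranteed marginal coverage level. At alpha=0.1, the returned set contains the true label with probability at least 90% for a new example, provided the calibration and new data are exchangeable. The guarantee holds on average over examples, not separately for each class or subgroup.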

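from sklearn.model_selection import train_test_split
from endgame.calibration import ConformalClassifier
import endgame as eg

# Split off a calibration set and a validation set (both separate from test)
X_tr, X_temp, y_tr, y_temp = train_test_split(
    X_train, y_train, test_size=0.3, stratify=y_train, random_state=42
)
X_cal, X_val, y_cal, y_val = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)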

base_model = eg.models.LGBMWrapper(preset="endgame")

cp = ConformalClassifier(base_model, method="aps", alpha=0.1)
cp.fit(X_tr, y_tr, X_cal, y_cal)   # calibrate on held-out calibration set

# Returns a list of sets — each set contains the plausible class labels
prediction_sets = cp.predict(X_val)
print(prediction_sets[:5])
# [{0}, {1}, {0, 1}, {1}, {0}]

# Verify empirical coverage
coverage = cp.coverage_score(X_val, y_val)
print(f"Empirical coverage: {coverage:.3f}")  # expected to be close to or above 0.90

Passing X_cal=None causes ConformalClassifier to automatically split a cal_size fraction from the training data for calibration.
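
To make the coverage guarantee concrete, here is a minimal split-conformal sketch in plain NumPy and scikit-learn. It uses the simple `1 - P(true class)` score rather than the APS score selected above, and a `LogisticRegression` stand-in; it shows the general recipe and is not `ConformalClassifier`'s implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_new, y_cal, y_new = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

alpha = 0.1
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_fit, y_fit)

# Nonconformity score on the calibration set: 1 - P(true class).
p_cal = clf.predict_proba(X_cal)
scores = 1.0 - p_cal[np.arange(len(y_cal)), y_cal]

# Finite-sample corrected quantile level: ceil((n + 1)(1 - alpha)) / n.
n = len(scores)
level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
q_hat = np.quantile(scores, level, method="higher")

# A class enters the set when its score does not exceed the threshold.
p_new = clf.predict_proba(X_new)
in_set = (1.0 - p_new) <= q_hat  # boolean array, shape (n_new, n_classes)
prediction_sets = [set(np.flatnonzero(row).tolist()) for row in in_set]

coverage = in_set[np.arange(len(y_new)), y_new].mean()
print(f"q_hat = {q_hat:.3f}, empirical coverage = {coverage:.3f}")
```

On a single finite split the empirical coverage fluctuates around the 1 - alpha target, so a value slightly below 0.90 does not by itself indicate a bug.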

See calibration guide for ConformalRegressor, ConformizedQuantileRegressor, and VennABERS.

Adversarial Validation¶

Adversarial validation detects distribution shift between train and test data before you submit a model. It trains a classifier to tell the two apart: an AUC near 0.5 means they are indistinguishable, while a high AUC means the classifier separates them easily, which suggests that your CV scores will not correlate well with the leaderboard.

import endgame as eg

av = eg.validation.AdversarialValidator(threshold=0.6)
result = av.check_drift(X_train, X_test)

print(f"Drift AUC: {result.auc_score:.3f}")
print(f"Severity:  {result.drift_severity}")   # 'none', 'mild', or 'severe'
print(f"Top drifting features: {result.drifted_features[:5]}")

if result.drift_severity == "severe":
    # Drop the most drifting features
    drop_cols = result.drifted_features[:5]
    print(f"Consider dropping: {drop_cols}")

The default classifier is LightGBM when available, RandomForest otherwise. You can supply any sklearn-compatible classifier via the estimator parameter.
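
The underlying technique is short enough to sketch with scikit-learn alone. The helper name `drift_auc`, the synthetic data, and the use of impurity-based feature importances to rank drifting features are assumptions for illustration; `AdversarialValidator` may rank features differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(0)
X_a = rng.normal(0.0, 1.0, size=(300, 5))  # plays the role of the train set
X_b = rng.normal(0.0, 1.0, size=(300, 5))  # plays the role of the test set
X_b[:, 2] += 1.5                            # inject a shift into feature 2


def drift_auc(X_first, X_second):
    """AUC of a classifier trained to tell the two samples apart."""
    X_all = np.vstack([X_first, X_second])
    origin = np.r_[np.zeros(len(X_first), dtype=int), np.ones(len(X_second), dtype=int)]
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    proba = cross_val_predict(clf, X_all, origin, cv=cv, method="predict_proba")[:, 1]
    # Fit once on everything to rank features by how much they reveal the origin.
    ranking = np.argsort(clf.fit(X_all, origin).feature_importances_)[::-1]
    return roc_auc_score(origin, proba), ranking


auc, ranking = drift_auc(X_a, X_b)
print(f"Drift AUC: {auc:.3f}, most drifting feature index: {ranking[0]}")
```

Scoring with out-of-fold probabilities matters here: an AUC computed on the rows the classifier was trained on would look high even when no shift exists.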

See validation guide for PurgedTimeSeriesSplit, StratifiedGroupKFold, and CombinatorialPurgedKFold.

Interactive Visualization¶

All Endgame visualizations produce self-contained HTML files with no external CDN dependencies. Open them in any browser.

Decision Tree¶

from sklearn.tree import DecisionTreeClassifier
from endgame.visualization import TreeVisualizer

clf = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
viz = TreeVisualizer(
    clf,
    feature_names=feature_names,
    class_names=class_names,
    title="Breast Cancer Decision Tree",
)
viz.save("tree.html")   # click nodes to expand/collapse

ROC Curve¶

from endgame.visualization import ROCCurveVisualizer

roc = ROCCurveVisualizer.from_predictions(y_test, proba_lgbm, label="LightGBM")
roc.add_predictions(y_test, proba_ft, label="FT-Transformer")
roc.save("roc_curves.html")

Classification Report¶

The ClassificationReport bundles confusion matrix, ROC/PR curves, calibration plot, and lift chart into a single interactive HTML page.

from endgame.visualization import ClassificationReport

report = ClassificationReport(
    y_true=y_test,
    y_proba=proba_lgbm,
    feature_names=feature_names,
    class_names=class_names,
)
report.save("classification_report.html")

See the visualization guide for the complete chart catalogue (42 chart types including PDP, waterfall / SHAP, parallel coordinates, and calibration plots).

Next Steps¶

Topic	Guide
Preprocessing (encoding, feature engineering, balancing)	preprocessing guide
Full model catalogue (100+ estimators)	models guide
Hyperparameter tuning with Optuna	tune API
SHAP, LIME, counterfactuals	explain API
Fairness metrics and mitigation	fairness API
Anomaly detection	anomaly API
Time series forecasting and classification	timeseries API
Signal processing	signal API
Visualization catalogue	visualization guide
MCP server (AI assistant integration)	MCP server guide