Quickstart

This guide walks through the core Endgame workflow, from data loading to model evaluation. After completing it, you will have seen the main API entry points and know where to look for deeper documentation on each topic.

Import Convention

import endgame as eg

Heavy sub-modules (models, vision, nlp, audio, benchmark, kaggle, quick) are lazy-loaded, so the import is fast even though the library is large.
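This guide does not show how Endgame implements its lazy loading. For reference, lazy sub-module loading in Python packages is commonly built on the module-level `__getattr__` hook from PEP 562, and the sketch below assumes that pattern. The package name `demo_pkg`, the helper `make_lazy_package`, and the attribute mapping are hypothetical; standard-library modules stand in for heavy sub-modules so the example runs anywhere.

```python
import importlib
import types

# Hypothetical mapping: public attribute name -> module imported on first access.
# Stdlib modules stand in for heavy sub-modules such as models or vision.
LAZY_SUBMODULES = {"json": "json", "fractions": "fractions"}


def make_lazy_package(name: str) -> types.ModuleType:
    """Build a module object whose listed attributes are imported on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr: str):
        # PEP 562: called only when normal attribute lookup on the module fails.
        if attr in LAZY_SUBMODULES:
            module = importlib.import_module(LAZY_SUBMODULES[attr])
            setattr(pkg, attr, module)  # cache it so later lookups skip this hook
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    pkg.__getattr__ = __getattr__  # stored in the module __dict__, where PEP 562 looks
    return pkg


pkg = make_lazy_package("demo_pkg")
print("json" in vars(pkg))       # False: nothing imported yet
print(pkg.json.dumps({"a": 1}))  # first access triggers the import
print("json" in vars(pkg))       # True: cached after first access
```

In a real package the same `__getattr__` function is simply defined at the top level of the package's `__init__.py`.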


End-to-End Example

The example below uses scikit-learn’s breast cancer dataset and covers the complete workflow: split, train, evaluate, and visualize.

import endgame as eg
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Load data
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names.tolist()
class_names = data.target_names.tolist()

# Split: stratified 80/20 train / test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train with competition-winning defaults
model = eg.models.LGBMWrapper(preset="endgame")
model.fit(X_train, y_train)

# Evaluate
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC-AUC: {auc:.4f}")

# Visualize feature importances
from endgame.visualization import BarChartVisualizer

bar = BarChartVisualizer.from_importances(model, feature_names=feature_names)
bar.save("feature_importances.html")

All Endgame estimators implement the standard scikit-learn interface: fit, predict, predict_proba (classifiers), and transform (transformers). They drop into any sklearn Pipeline without modification.
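To make that contract concrete, here is a toy classifier that follows the scikit-learn conventions: `fit` returns `self`, learned state lives in attributes with a trailing underscore, `classes_` fixes the column order of `predict_proba`, and `predict` returns the most probable class. `PriorClassifier` is a hypothetical class-prior baseline, not an Endgame estimator, and it is written without scikit-learn so it runs standalone. A real estimator would also subclass `sklearn.base.BaseEstimator` and `ClassifierMixin` to get `get_params`/`set_params` and cloning support inside a Pipeline.

```python
from collections import Counter


class PriorClassifier:
    """Toy classifier that predicts the class frequencies seen during fit (hypothetical)."""

    def fit(self, X, y):
        counts = Counter(y)
        self.classes_ = sorted(counts)  # fixed column order for predict_proba
        total = len(y)
        self.class_prior_ = [counts[c] / total for c in self.classes_]
        return self  # returning self enables model.fit(X, y).predict(X)

    def predict_proba(self, X):
        # One row per sample, one column per class, in self.classes_ order.
        return [list(self.class_prior_) for _ in X]

    def predict(self, X):
        best = max(range(len(self.classes_)), key=lambda k: self.class_prior_[k])
        return [self.classes_[best] for _ in X]


X_demo = [[0.1], [0.2], [0.3], [0.4]]
y_demo = [0, 1, 1, 1]
prior_clf = PriorClassifier().fit(X_demo, y_demo)
print(prior_clf.classes_)                  # [0, 1]
print(prior_clf.predict_proba(X_demo[:1]))  # [[0.25, 0.75]]
print(prior_clf.predict(X_demo[:2]))       # [1, 1]
```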


Model Families

GBDTs

LGBMWrapper, XGBWrapper, and CatBoostWrapper expose a unified interface with competition-tuned hyperparameter presets.

import endgame as eg

# LightGBM with competition defaults
lgbm = eg.models.LGBMWrapper(preset="endgame")
lgbm.fit(X_train, y_train)
proba_lgbm = lgbm.predict_proba(X_test)

# XGBoost
xgb = eg.models.XGBWrapper(preset="endgame")
xgb.fit(X_train, y_train)

# CatBoost (handles categoricals natively)
cat = eg.models.CatBoostWrapper(preset="endgame")
cat.fit(X_train, y_train)

Available presets: "fast", "endgame" (competition defaults).

See the models guide for the full list of supported parameters.

Deep Tabular Models

Deep tabular models are imported directly from their submodule to keep the top-level namespace lean.

from endgame.models.tabular import FTTransformerClassifier

ft = FTTransformerClassifier(
    n_blocks=3,
    d_token=192,
    n_heads=8,
    n_epochs=100,
)
ft.fit(X_train, y_train)
proba_ft = ft.predict_proba(X_test)

Other deep tabular classifiers follow the same import pattern:

from endgame.models.tabular import SAINTClassifier, NODEClassifier, GANDALFClassifier
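The `d_token` and `n_heads` arguments of `FTTransformerClassifier` are easier to read with the FT-Transformer's feature tokenizer in mind. In the original FT-Transformer paper (Gorishniy et al., 2021), each numeric feature j is mapped to its own d_token-dimensional embedding x_j * W_j + b_j, a learned [CLS] token is added, and the transformer blocks operate on that sequence of n_features + 1 tokens. The sketch below reproduces only this tokenization step for numeric features, with random untrained weights and NumPy; it is not Endgame's implementation, and `tokenize_numeric` is a hypothetical name. One practical consequence: multi-head attention splits each token across heads, so d_token should be divisible by n_heads (192 / 8 = 24 dimensions per head in the example above).

```python
import numpy as np


def tokenize_numeric(X: np.ndarray, d_token: int, seed: int = 0) -> np.ndarray:
    """Sketch of an FT-Transformer-style numeric feature tokenizer (untrained weights)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.normal(size=(n_features, d_token))  # one embedding direction per feature
    b = rng.normal(size=(n_features, d_token))  # one bias vector per feature
    # tokens[i, j, :] = X[i, j] * W[j] + b[j]
    tokens = X[:, :, None] * W[None, :, :] + b[None, :, :]
    cls = rng.normal(size=(1, 1, d_token))      # shared [CLS] embedding
    cls = np.broadcast_to(cls, (n_samples, 1, d_token))
    # Result shape: (n_samples, n_features + 1, d_token)
    return np.concatenate([tokens, cls], axis=1)


n_heads, d_token = 8, 192
assert d_token % n_heads == 0  # each head works on d_token // n_heads = 24 dims

X_demo = np.random.default_rng(1).normal(size=(4, 30))  # 4 rows, 30 features like breast cancer
tokens = tokenize_numeric(X_demo, d_token)
print(tokens.shape)  # (4, 31, 192)
```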

Interpretable Models

Endgame provides first-class support for interpretable models, which can match or approach GBDT accuracy on many tabular problems while remaining explainable.

from endgame.models import EBMClassifier

ebm = EBMClassifier()
ebm.fit(X_train, y_train)
proba_ebm = ebm.predict_proba(X_test)

# Global importance score for each input feature
print(ebm.feature_importances_)
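EBMs are generalized additive models: for binary classification the log-odds are an intercept plus one learned shape function per feature, logit(p) = b0 + f_1(x_1) + ... + f_k(x_k), so every prediction decomposes exactly into per-feature contributions. In the InterpretML library, a term's global importance is by default the mean absolute contribution over the data; whether Endgame's `feature_importances_` uses the same definition is an assumption here. The sketch below uses made-up binned shape functions (hypothetical values, not a trained model) and hypothetical helper names to show the local and global views side by side.

```python
import math

# Hypothetical shape functions: (upper bin edge, contribution to the log-odds).
SHAPES = {
    "radius": [(12.0, -1.2), (16.0, 0.3), (math.inf, 1.8)],
    "texture": [(18.0, -0.4), (math.inf, 0.5)],
}
INTERCEPT = 0.2


def contribution(feature, value):
    """Look up the additive log-odds contribution of one feature value."""
    for upper, score in SHAPES[feature]:
        if value <= upper:
            return score
    raise ValueError("value outside all bins")


def explain(row):
    """Return P(class 1) and the per-feature contributions for one sample."""
    contribs = {f: contribution(f, row[f]) for f in SHAPES}
    logit = INTERCEPT + sum(contribs.values())
    return 1.0 / (1.0 + math.exp(-logit)), contribs


def global_importance(rows):
    """Mean absolute contribution of each feature over a dataset."""
    return {f: sum(abs(contribution(f, r[f])) for r in rows) / len(rows) for f in SHAPES}


demo_rows = [{"radius": 11.0, "texture": 20.0}, {"radius": 18.0, "texture": 15.0}]
p_row, contribs = explain(demo_rows[1])
print(contribs)                      # {'radius': 1.8, 'texture': -0.4}
print(global_importance(demo_rows))  # radius: mean of 1.2 and 1.8; texture: mean of 0.5 and 0.4
```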

Other interpretable options:

from endgame.models.rules import RuleFitClassifier
from endgame.models.symbolic import SymbolicRegressor  # PySR backend
from endgame.models.baselines import LinearClassifier

Quick API

The Quick API provides one-line training and multi-model comparison. It runs stratified cross-validation internally and returns out-of-fold predictions alongside a fitted model.

Single Model

import endgame as eg

result = eg.quick.classify(X, y, preset="default", metric="roc_auc")
print(result)  # QuickResult(cv_score=0.9912, metric='roc_auc')

# Access the fitted model and OOF predictions
model = result.model
oof_preds = result.oof_predictions
importances = result.feature_importances
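Out-of-fold predictions of the kind `eg.quick.classify` returns follow the usual cross-validation recipe: the data is split into K folds with roughly equal class proportions, each fold is predicted by a model trained on the other K - 1 folds, and the pieces are stitched back together so that every row gets exactly one prediction from a model that never saw it. The sketch below shows that bookkeeping in plain Python with a deliberately trivial stand-in model (the positive-class rate of the training folds). The function names are hypothetical, and Endgame's internal fold assignment may differ.

```python
from collections import defaultdict


def stratified_folds(y, n_splits):
    """Assign each row to a fold, dealing the rows of each class out round-robin."""
    fold_of = [0] * len(y)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    for indices in by_class.values():
        for k, i in enumerate(indices):
            fold_of[i] = k % n_splits
    return fold_of


def oof_predict(X, y, n_splits, fit_predict):
    """Fill one out-of-fold prediction per row using models fit on the other folds."""
    fold_of = stratified_folds(y, n_splits)
    oof = [None] * len(y)
    for fold in range(n_splits):
        train_idx = [i for i in range(len(y)) if fold_of[i] != fold]
        valid_idx = [i for i in range(len(y)) if fold_of[i] == fold]
        preds = fit_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in valid_idx]
        )
        for i, p in zip(valid_idx, preds):
            oof[i] = p
    return oof


def positive_rate_model(X_train, y_train, X_valid):
    """Trivial stand-in model: predict the training positive-class rate for every row."""
    rate = sum(y_train) / len(y_train)
    return [rate] * len(X_valid)


X_demo = [[i] for i in range(6)]
y_demo = [0, 0, 0, 0, 1, 1]
print(stratified_folds(y_demo, 3))  # [0, 1, 2, 0, 0, 1]
print(oof_predict(X_demo, y_demo, 3, positive_rate_model))
```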

Available presets:

Preset            Models included                           CV folds   Typical runtime
"fast"            LightGBM, Linear                          3          ~1 min
"default"         LightGBM, XGBoost, CatBoost, Linear       5          ~5 min
"competition"     GBDT trio + KNN + ELM + RotationForest    5          ~30 min
"interpretable"   Linear, EBM, NAM                          5          ~5 min

Model Comparison

comparison = eg.quick.compare(X, y, task="classification", preset="default")
print(comparison)
# ComparisonResult:
#   1. lgbm:     0.9934
#   2. xgb:      0.9921
#   3. catboost: 0.9908
#   4. linear:   0.9743

best_model = comparison.best_model
leaderboard = comparison.leaderboard  # List[Dict[str, Any]]

See the quick API reference for regression support (eg.quick.regress) and additional options.


Ensemble Methods

Once you have several models trained, combine them with the ensemble module.

Super Learner

The Super Learner finds the optimal convex combination of base models using cross-validated out-of-fold predictions (NNLS by default). Under standard regularity conditions, it is asymptotically at least as good as the best single base learner in the library.

import endgame as eg
from endgame.ensemble import SuperLearner
from endgame.models import EBMClassifier
from endgame.models.baselines import LinearClassifier

sl = SuperLearner(
    base_estimators=[
        ("lgbm", eg.models.LGBMWrapper(preset="endgame")),
        ("ebm", EBMClassifier()),
        ("lr", LinearClassifier()),
    ],
    meta_learner="nnls",  # non-negative least squares
    cv=5,
)
sl.fit(X_train, y_train)
proba_sl = sl.predict_proba(X_test)
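The NNLS meta-learning step can be sketched in a few lines: stack the base models' out-of-fold predictions as columns, fit y on those columns by non-negative least squares, then rescale the weights to sum to one so the ensemble is a convex combination. This mirrors the classic Super Learner recipe (as in the NNLS method of the R `SuperLearner` package); the cyclic coordinate-descent solver and the function names below are illustrative assumptions, not Endgame's code.

```python
def nnls_weights(columns, y, n_sweeps=500):
    """Non-negative least squares by cyclic coordinate descent, normalized to sum to 1.

    columns: one list of out-of-fold predictions per base model.
    """
    m = len(columns)
    # Gram matrix H = P^T P and vector P^T y, where the columns of P are the base models.
    H = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(m)] for i in range(m)]
    Py = [sum(a * t for a, t in zip(columns[i], y)) for i in range(m)]
    w = [0.0] * m
    for _ in range(n_sweeps):
        for j in range(m):
            if H[j][j] == 0.0:
                continue  # an all-zero column carries no information
            grad = sum(H[j][k] * w[k] for k in range(m)) - Py[j]
            w[j] = max(0.0, w[j] - grad / H[j][j])  # exact 1-D minimizer, clipped at 0
    total = sum(w)
    if total == 0.0:
        return [1.0 / m] * m  # degenerate case: fall back to a simple average
    return [wj / total for wj in w]


def blend(columns, weights):
    """Convex combination of per-model prediction columns."""
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(len(columns[0]))]


y_demo = [0, 1, 0, 1, 1, 0]
oof_a = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0]  # a perfect model
oof_b = [0.5] * 6                       # an uninformative model
weights = nnls_weights([oof_a, oof_b], y_demo)
print(weights)  # [1.0, 0.0]
```

Because the weights are non-negative and sum to one, the blended probabilities stay inside [0, 1].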

Hill Climbing Ensemble

A forward-selection ensemble that greedily adds models as long as the metric on the out-of-fold predictions improves.

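from endgame.ensemble import HillClimbingEnsemble

# oof_preds_list: one out-of-fold prediction array per base model, aligned with y_train
# test_preds_list: the matching test-set predictions, in the same model order
hc = HillClimbingEnsemble(metric="roc_auc", n_iterations=20)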
hc.fit(oof_preds_list, y_train)   # list of OOF prediction arrays
final_proba = hc.predict(test_preds_list)
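Greedy forward selection of this kind is commonly implemented along the lines of Caruana et al.'s ensemble selection: start from an empty ensemble, repeatedly add (with replacement) whichever model most improves the metric of the averaged out-of-fold predictions, and stop when no addition helps. The sketch below assumes that recipe and uses a plain-Python ROC-AUC; selecting with replacement is what turns the selection counts into weights. The function names are hypothetical, and Endgame's stopping rule may differ.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hill_climb(preds_list, y_true, n_iterations=20):
    """Greedy ensemble selection with replacement; returns (weights, best score).

    Assumes n_iterations >= 1 so that at least one model gets selected.
    """
    counts = [0] * len(preds_list)
    running_sum = [0.0] * len(y_true)  # sum of the predictions selected so far
    best_score = float("-inf")
    for _ in range(n_iterations):
        n_selected = sum(counts)
        best_j = None
        for j, preds in enumerate(preds_list):
            trial = [(s + p) / (n_selected + 1) for s, p in zip(running_sum, preds)]
            score = roc_auc(y_true, trial)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break  # no candidate improves the metric: stop early
        counts[best_j] += 1
        running_sum = [s + p for s, p in zip(running_sum, preds_list[best_j])]
    return [c / sum(counts) for c in counts], best_score


def apply_weights(weights, preds_list):
    """Weighted average of per-model predictions (e.g. on the test set)."""
    return [sum(w * preds[i] for w, preds in zip(weights, preds_list))
            for i in range(len(preds_list[0]))]


y_demo = [0, 0, 1, 1]
m1 = [0.1, 0.4, 0.35, 0.8]
m2 = [0.2, 0.3, 0.6, 0.4]
m3 = [0.5, 0.5, 0.5, 0.5]
weights, score = hill_climb([m1, m2, m3], y_demo)
print(weights, score)  # [0.0, 1.0, 0.0] 1.0
```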

See the ensemble guide for StackingEnsemble, BlendingEnsemble, RankAverageBlender, and ThresholdOptimizer.


Conformal Prediction

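Conformal prediction wraps any classifier and produces prediction sets with a guaranteed coverage probability. At alpha=0.1, the returned set contains the true label with probability at least 90%. This is a marginal guarantee, averaged over draws of the calibration and test data, and it assumes those data are exchangeable; it does not promise 90% coverage within every subgroup or on every finite sample.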

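from sklearn.model_selection import train_test_split
from endgame.calibration import ConformalClassifier
import endgame as eg

# Carve proper-training, calibration, and validation sets out of the training data (test stays untouched)
X_tr, X_temp, y_tr, y_temp = train_test_split(
    X_train, y_train, test_size=0.3, stratify=y_train, random_state=42
)
X_cal, X_val, y_cal, y_val = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)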

base_model = eg.models.LGBMWrapper(preset="endgame")

cp = ConformalClassifier(base_model, method="aps", alpha=0.1)
cp.fit(X_tr, y_tr, X_cal, y_cal)   # calibrate on held-out calibration set

# Returns a list of sets — each set contains the plausible class labels
prediction_sets = cp.predict(X_val)
print(prediction_sets[:5])
# [{0}, {1}, {0, 1}, {1}, {0}]

# Verify empirical coverage
coverage = cp.coverage_score(X_val, y_val)
print(f"Empirical coverage: {coverage:.3f}")  # ~0.90 or higher in expectation; a small validation set can dip below

Passing X_cal=None causes ConformalClassifier to automatically split a cal_size fraction from the training data for calibration.
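Split conformal classification needs only the calibration set. The sketch below shows a non-randomized variant of APS (adaptive prediction sets), assuming a classifier that outputs class probabilities: score each calibration example by the total probability of the classes ranked at or above its true label, take the ceil((n + 1)(1 - alpha))-th smallest score as the threshold, and for a new example return the smallest set of top-ranked classes whose cumulative probability reaches that threshold. Endgame's `method="aps"` may add the randomization step from the original APS method; the function names and toy probabilities below are illustrative.

```python
import math


def aps_score(probs, label):
    """Total probability of the classes ranked at or above the true label."""
    order = sorted(range(len(probs)), key=lambda k: -probs[k])
    total = 0.0
    for k in order:
        total += probs[k]
        if k == label:
            return total
    raise ValueError("label out of range")


def calibrate(cal_probs, cal_labels, alpha):
    """Conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    scores = sorted(aps_score(p, y) for p, y in zip(cal_probs, cal_labels))
    rank = math.ceil((len(scores) + 1) * (1 - alpha))
    if rank > len(scores):
        return math.inf  # too few calibration points for this alpha: return full sets
    return scores[rank - 1]


def predict_set(probs, qhat):
    """Smallest set of top-ranked classes whose cumulative probability reaches qhat."""
    order = sorted(range(len(probs)), key=lambda k: -probs[k])
    chosen, total = set(), 0.0
    for k in order:
        chosen.add(k)
        total += probs[k]
        if total >= qhat:
            break
    return chosen


def empirical_coverage(sets, labels):
    """Fraction of examples whose true label falls inside the prediction set."""
    return sum(y in s for s, y in zip(sets, labels)) / len(labels)


cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8],
             [0.1, 0.9], [0.25, 0.75], [0.6, 0.4], [0.85, 0.15]]
cal_labels = [0, 0, 0, 1, 1, 1, 1, 1, 0]
qhat = calibrate(cal_probs, cal_labels, alpha=0.25)
print(qhat)                             # 0.9
print(predict_set([0.95, 0.05], qhat))  # {0}: confident, singleton set
print(predict_set([0.55, 0.45], qhat))  # {0, 1}: ambiguous, both labels kept
```

Ambiguous inputs get larger sets, which is what makes APS sets "adaptive".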

See the calibration guide for ConformalRegressor, ConformizedQuantileRegressor, and VennABERS.


Adversarial Validation

Adversarial validation detects distribution shift between train and test data before you ever submit a model. A classifier is trained to tell training rows from test rows: a high AUC (well above 0.5) means it can easily distinguish the two splits, which suggests that your CV score may not track the leaderboard.

import endgame as eg

av = eg.validation.AdversarialValidator(threshold=0.6)
result = av.check_drift(X_train, X_test)

print(f"Drift AUC: {result.auc_score:.3f}")
print(f"Severity:  {result.drift_severity}")   # 'none', 'mild', or 'severe'
print(f"Top drifting features: {result.drifted_features[:5]}")

if result.drift_severity == "severe":
    # Inspect the most drifting features before deciding what to drop
    drop_cols = result.drifted_features[:5]
    print(f"Consider dropping: {drop_cols}")

The default classifier is LightGBM when available, RandomForest otherwise. You can supply any sklearn-compatible classifier via the estimator parameter.
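Before training a full adversarial classifier, a cheap per-feature variant of the same idea can flag obvious shifts: label training rows 0 and test rows 1, then compute, for each feature on its own, the ROC-AUC of using the raw feature value to tell the two apart. An AUC near 0.5 means the feature looks the same in both splits; values far from 0.5 in either direction point to drift. This univariate check is a simplification substituted here for illustration (it misses shifts that only appear in feature interactions, which a model-based validator can pick up), and the function names are hypothetical.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_drifting_features(train_rows, test_rows):
    """Return (feature index, AUC) pairs, most separable feature first."""
    labels = [0] * len(train_rows) + [1] * len(test_rows)  # 0 = train row, 1 = test row
    rows = list(train_rows) + list(test_rows)
    results = []
    for j in range(len(rows[0])):
        results.append((j, roc_auc(labels, [row[j] for row in rows])))
    # Distance from 0.5 measures separability regardless of the shift direction.
    return sorted(results, key=lambda item: abs(item[1] - 0.5), reverse=True)


train_rows = [[1, 5], [2, 6], [3, 7], [4, 8]]
test_rows = [[5, 5], [6, 6], [7, 7], [8, 8]]
print(rank_drifting_features(train_rows, test_rows))  # [(0, 1.0), (1, 0.5)]
```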

See the validation guide for PurgedTimeSeriesSplit, StratifiedGroupKFold, and CombinatorialPurgedKFold.


Interactive Visualization

All Endgame visualizations produce self-contained HTML files with no external CDN dependencies. Open them in any browser.
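One way to get a chart that opens offline is to inline everything, markup and data alike, into a single file instead of loading scripts from a CDN. The sketch below builds a tiny horizontal bar chart as inline SVG using only the standard library; it illustrates the self-contained-file idea and is not how Endgame's visualizers are built internally. The function name, layout constants, and sample importance values are made up for illustration.

```python
import html


def bar_chart_html(values, title="Feature importances", width=400, bar_height=20):
    """Render a horizontal bar chart as one self-contained HTML string (no external assets)."""
    top = max(values.values()) or 1.0  # avoid dividing by zero for all-zero inputs
    bars = []
    for i, (name, value) in enumerate(sorted(values.items(), key=lambda kv: -kv[1])):
        y = i * (bar_height + 4)
        length = width * value / top
        bars.append(
            f'<rect x="150" y="{y}" width="{length:.1f}" height="{bar_height}" fill="steelblue"/>'
            f'<text x="145" y="{y + bar_height - 5}" text-anchor="end">{html.escape(name)}</text>'
        )
    svg_height = len(values) * (bar_height + 4)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head><body>"
        f"<h1>{html.escape(title)}</h1>"
        f'<svg width="{150 + width}" height="{svg_height}">{"".join(bars)}</svg>'
        "</body></html>\n"
    )


page = bar_chart_html({"worst radius": 0.31, "mean texture": 0.12, "mean area": 0.05})
```

Writing `page` to disk with `open("importances_demo.html", "w", encoding="utf-8")` produces one file that a browser can open without any network access.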

Decision Tree

from sklearn.tree import DecisionTreeClassifier
from endgame.visualization import TreeVisualizer

clf = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
viz = TreeVisualizer(
    clf,
    feature_names=feature_names,
    class_names=class_names,
    title="Breast Cancer Decision Tree",
)
viz.save("tree.html")   # click nodes to expand/collapse

ROC Curve

from endgame.visualization import ROCCurveVisualizer

roc = ROCCurveVisualizer.from_predictions(y_test, proba_lgbm, label="LightGBM")
roc.add_predictions(y_test, proba_ft, label="FT-Transformer")
roc.save("roc_curves.html")

Classification Report

The ClassificationReport bundles confusion matrix, ROC/PR curves, calibration plot, and lift chart into a single interactive HTML page.

from endgame.visualization import ClassificationReport

report = ClassificationReport(
    y_true=y_test,
    y_proba=proba_lgbm,
    feature_names=feature_names,
    class_names=class_names,
)
report.save("classification_report.html")
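The calibration plot in a report like this is typically a reliability diagram: predicted probabilities for the positive class are grouped into bins, and for each bin the mean predicted probability is plotted against the observed fraction of positives, so a well-calibrated model lies on the diagonal. Below is a minimal sketch of that binning step, assuming equal-width bins; the function name is hypothetical and `ClassificationReport` may bin differently.

```python
def reliability_bins(y_true, proba, n_bins=10):
    """Equal-width bins over [0, 1]: (mean predicted prob, fraction positive, count) per non-empty bin."""
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, proba):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        sums[b] += p
        positives[b] += y
        counts[b] += 1
    return [
        (sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


y_demo = [0, 0, 1, 0, 1, 1]
proba_demo = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
for mean_p, frac_pos, n in reliability_bins(y_demo, proba_demo, n_bins=2):
    print(f"mean predicted {mean_p:.2f}  observed {frac_pos:.2f}  (n={n})")
```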

See the visualization guide for the complete chart catalogue (42 chart types including PDP, waterfall / SHAP, parallel coordinates, and calibration plots).


Next Steps

Topic                                                      Guide
Preprocessing (encoding, feature engineering, balancing)   preprocessing guide
Full model catalogue (100+ estimators)                     models guide
Hyperparameter tuning with Optuna                          tune API
SHAP, LIME, counterfactuals                                explain API
Fairness metrics and mitigation                            fairness API
Anomaly detection                                          anomaly API
Time series forecasting and classification                 timeseries API
Signal processing                                          signal API
Visualization catalogue                                    visualization guide
MCP server (AI assistant integration)                      MCP server guide