# Quickstart

This guide walks through the core Endgame workflow from data loading to model evaluation. After completing it you will have seen the main API entry points and know where to look for deeper documentation on each topic.

## Import Convention

```python
import endgame as eg
```

Heavy sub-modules (models, vision, nlp, audio, benchmark, kaggle, quick) are lazy-loaded, so the import is fast even though the library is large.

---

## End-to-End Example

The example below uses scikit-learn's breast cancer dataset and covers the complete workflow: split, train, evaluate, and visualize.

```python
import endgame as eg
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Load data (load the dataset once and reuse it)
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names.tolist()
class_names = data.target_names.tolist()

# Split: train / test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train with competition-winning defaults
model = eg.models.LGBMWrapper(preset="endgame")
model.fit(X_train, y_train)

# Evaluate
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC-AUC: {auc:.4f}")

# Visualize feature importances
from endgame.visualization import BarChartVisualizer

bar = BarChartVisualizer.from_importances(model, feature_names=feature_names)
bar.save("feature_importances.html")
```

All Endgame estimators implement the standard scikit-learn interface: `fit`, `predict`, `predict_proba` (classifiers), and `transform` (transformers). They drop into any sklearn `Pipeline` without modification.

---

## Model Families

### GBDTs

`LGBMWrapper`, `XGBWrapper`, and `CatBoostWrapper` expose a unified interface with competition-tuned hyperparameter presets.

```python
import endgame as eg

# LightGBM with competition defaults
lgbm = eg.models.LGBMWrapper(preset="endgame")
lgbm.fit(X_train, y_train)
proba_lgbm = lgbm.predict_proba(X_test)

# XGBoost
xgb = eg.models.XGBWrapper(preset="endgame")
xgb.fit(X_train, y_train)

# CatBoost (handles categoricals natively)
cat = eg.models.CatBoostWrapper(preset="endgame")
cat.fit(X_train, y_train)
```

Available presets: `"fast"`, `"endgame"` (competition defaults). See [models guide](models.md) for the full list of supported parameters.

### Deep Tabular Models

Deep tabular models are imported directly from their submodule to keep the top-level namespace lean.

```python
from endgame.models.tabular import FTTransformerClassifier

ft = FTTransformerClassifier(
    n_blocks=3,
    d_token=192,
    n_heads=8,
    n_epochs=100,
)
ft.fit(X_train, y_train)
proba_ft = ft.predict_proba(X_test)
```

Other deep tabular classifiers follow the same import pattern:

```python
from endgame.models.tabular import SAINTClassifier, NODEClassifier, GANDALFClassifier
```

### Interpretable Models

Endgame provides first-class support for interpretable models that match or approach GBDT accuracy while remaining explainable.

```python
from endgame.models import EBMClassifier

ebm = EBMClassifier()
ebm.fit(X_train, y_train)
proba_ebm = ebm.predict_proba(X_test)

# EBM exposes per-feature contributions
print(ebm.feature_importances_)
```

Other interpretable options:

```python
from endgame.models.rules import RuleFitClassifier
from endgame.models.symbolic import SymbolicRegressor  # PySR backend
from endgame.models.baselines import LinearClassifier
```

---

## Quick API

The Quick API provides one-line training and multi-model comparison. It runs stratified cross-validation internally and returns out-of-fold predictions alongside a fitted model.

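To make "out-of-fold predictions" concrete, here is a minimal sketch using plain scikit-learn rather than Endgame. It does not show how `eg.quick` is implemented; the `LogisticRegression` pipeline, the five folds, and the variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Illustrative estimator (an assumption); any sklearn-compatible classifier works.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Each row is predicted by a model that did not see that row during training.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
oof_proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

# The CV score is computed on the out-of-fold predictions.
cv_score = roc_auc_score(y, oof_proba)
print(f"OOF ROC-AUC: {cv_score:.4f}")

# Refit on all rows to obtain a final fitted model.
clf.fit(X, y)
```

Because every out-of-fold prediction comes from a model that never saw that row, the resulting score is an honest estimate of generalization, and the same predictions can later feed an ensemble.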
### Single Model

```python
import endgame as eg

result = eg.quick.classify(X, y, preset="default", metric="roc_auc")
print(result)
# QuickResult(cv_score=0.9912, metric='roc_auc')

# Access the fitted model and OOF predictions
model = result.model
oof_preds = result.oof_predictions
importances = result.feature_importances
```

Available presets:

| Preset | Models included | CV folds | Typical runtime |
|---|---|---|---|
| `"fast"` | LightGBM, Linear | 3 | ~1 min |
| `"default"` | LightGBM, XGBoost, CatBoost, Linear | 5 | ~5 min |
| `"competition"` | GBDT trio + KNN + ELM + RotationForest | 5 | ~30 min |
| `"interpretable"` | Linear, EBM, NAM | 5 | ~5 min |

### Model Comparison

```python
comparison = eg.quick.compare(X, y, task="classification", preset="default")
print(comparison)
# ComparisonResult:
#   1. lgbm:     0.9934
#   2. xgb:      0.9921
#   3. catboost: 0.9908
#   4. linear:   0.9743

best_model = comparison.best_model
leaderboard = comparison.leaderboard  # List[Dict[str, Any]]
```

See [quick API reference](../api/quick.rst) for regression support (`eg.quick.regress`) and additional options.

---

## Ensemble Methods

Once you have several models trained, combine them with the ensemble module.

### Super Learner

The Super Learner finds the optimal convex combination of base models using cross-validated out-of-fold predictions (NNLS by default). It is asymptotically at least as good as the single best base learner.

```python
import endgame as eg
from endgame.ensemble import SuperLearner
from endgame.models import EBMClassifier
from endgame.models.baselines import LinearClassifier

sl = SuperLearner(
    base_estimators=[
        ("lgbm", eg.models.LGBMWrapper(preset="endgame")),
        ("ebm", EBMClassifier()),
        ("lr", LinearClassifier()),
    ],
    meta_learner="nnls",  # non-negative least squares
    cv=5,
)
sl.fit(X_train, y_train)
proba_sl = sl.predict_proba(X_test)
```

### Hill Climbing Ensemble

Forward-selection ensemble that greedily adds models while a metric improves.

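The greedy procedure can be sketched in a few lines of NumPy. This is an illustrative version of forward selection with replacement, not Endgame's implementation; the `hill_climb` function, its return values, and the synthetic predictions are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def hill_climb(oof_preds, y_true, n_iterations=20):
    """Greedy forward selection with replacement (hypothetical helper).

    Returns per-model blend weights and the best blended AUC.
    """
    preds = np.asarray(oof_preds, dtype=float)  # shape (n_models, n_samples)
    counts = np.zeros(len(preds), dtype=int)
    running_sum = np.zeros(preds.shape[1])
    best_score = -np.inf

    for _ in range(n_iterations):
        n_selected = counts.sum()
        # Score the blend obtained by adding each candidate one more time.
        scores = [
            roc_auc_score(y_true, (running_sum + p) / (n_selected + 1))
            for p in preds
        ]
        best = int(np.argmax(scores))
        if scores[best] <= best_score:
            break  # no candidate improves the metric, so stop
        best_score = scores[best]
        counts[best] += 1
        running_sum += preds[best]

    return counts / counts.sum(), best_score


# Synthetic stand-in for OOF scores from three models of varying quality.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
oof_preds_list = [y + rng.normal(0, noise, size=500) for noise in (0.6, 0.8, 1.5)]

weights, score = hill_climb(oof_preds_list, y)
print("weights:", weights.round(2), "blended AUC:", round(score, 4))
```

Selecting with replacement lets a strong model be picked several times, which is how the procedure arrives at unequal blend weights. The first pick is always the best single model, so the blend can never score worse than that model on the out-of-fold data.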
```python
from endgame.ensemble import HillClimbingEnsemble

# oof_preds_list and test_preds_list hold one prediction array per base model
hc = HillClimbingEnsemble(metric="roc_auc", n_iterations=20)
hc.fit(oof_preds_list, y_train)  # list of OOF prediction arrays
final_proba = hc.predict(test_preds_list)
```

See [ensemble guide](../api/ensemble.rst) for `StackingEnsemble`, `BlendingEnsemble`, `RankAverageBlender`, and `ThresholdOptimizer`.

---

## Conformal Prediction

Conformal prediction wraps any classifier and produces prediction *sets* with a guaranteed coverage probability. At `alpha=0.1`, the returned set contains the true label with probability at least 90% for a new example (a marginal guarantee that assumes exchangeability).

```python
from sklearn.model_selection import train_test_split

from endgame.calibration import ConformalClassifier
import endgame as eg

# Split off a calibration set (separate from test)
X_tr, X_temp, y_tr, y_temp = train_test_split(X_train, y_train, test_size=0.3)
X_cal, X_val, y_cal, y_val = train_test_split(X_temp, y_temp, test_size=0.5)

base_model = eg.models.LGBMWrapper(preset="endgame")
cp = ConformalClassifier(base_model, method="aps", alpha=0.1)
cp.fit(X_tr, y_tr, X_cal, y_cal)  # calibrate on held-out calibration set

# Returns a list of sets; each set contains the plausible class labels
prediction_sets = cp.predict(X_val)
print(prediction_sets[:5])
# [{0}, {1}, {0, 1}, {1}, {0}]

# Verify empirical coverage
coverage = cp.coverage_score(X_val, y_val)
print(f"Empirical coverage: {coverage:.3f}")  # should be close to or above 0.90
```

Passing `X_cal=None` causes `ConformalClassifier` to automatically split a `cal_size` fraction from the training data for calibration.

See [calibration guide](../api/calibration.rst) for `ConformalRegressor`, `ConformizedQuantileRegressor`, and `VennABERS`.

---

## Adversarial Validation

Adversarial validation detects distribution shift between train and test data before you ever submit a model. A high AUC means a classifier can easily distinguish the two splits, which suggests that your CV scores may not correlate with the leaderboard.

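The underlying idea can be reproduced with plain scikit-learn: label training rows 0 and test rows 1, then measure with cross-validated AUC how well a classifier separates them. The sketch below is not Endgame's implementation; the `adversarial_auc` helper, the `RandomForestClassifier` choice, and the synthetic shift are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def adversarial_auc(X_a, X_b, random_state=0):
    """Cross-validated AUC of a classifier separating X_a (0) from X_b (1)."""
    X_all = np.vstack([X_a, X_b])
    origin = np.r_[np.zeros(len(X_a), dtype=int), np.ones(len(X_b), dtype=int)]
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    proba = cross_val_predict(
        clf, X_all, origin, cv=cv, method="predict_proba"
    )[:, 1]
    return roc_auc_score(origin, proba)


# Synthetic data: one test set matches the training distribution, one does not.
rng = np.random.default_rng(0)
train = rng.normal(size=(400, 5))
test_same = rng.normal(size=(200, 5))
test_shifted = test_same.copy()
test_shifted[:, 0] += 2.0  # shift a single feature by two standard deviations

auc_same = adversarial_auc(train, test_same)
auc_shifted = adversarial_auc(train, test_shifted)
print(f"no shift: {auc_same:.3f}, shifted: {auc_shifted:.3f}")
```

An AUC near 0.5 means the classifier cannot tell the two sets apart, while values approaching 1.0 indicate strong shift. Inspecting the feature importances of the adversarial classifier is one way to find which features drive the separation.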
```python
import endgame as eg

av = eg.validation.AdversarialValidator(threshold=0.6)
result = av.check_drift(X_train, X_test)

print(f"Drift AUC: {result.auc_score:.3f}")
print(f"Severity: {result.drift_severity}")  # 'none', 'mild', or 'severe'
print(f"Top drifting features: {result.drifted_features[:5]}")

if result.drift_severity == "severe":
    # Drop the most drifting features
    drop_cols = result.drifted_features[:5]
    print(f"Consider dropping: {drop_cols}")
```

The default classifier is LightGBM when available, RandomForest otherwise. You can supply any sklearn-compatible classifier via the `estimator` parameter.

See [validation guide](../api/validation.rst) for `PurgedTimeSeriesSplit`, `StratifiedGroupKFold`, and `CombinatorialPurgedKFold`.

---

## Interactive Visualization

All Endgame visualizations produce self-contained HTML files with no external CDN dependencies. Open them in any browser.

### Decision Tree

```python
from sklearn.tree import DecisionTreeClassifier
from endgame.visualization import TreeVisualizer

clf = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)

viz = TreeVisualizer(
    clf,
    feature_names=feature_names,
    class_names=class_names,
    title="Breast Cancer Decision Tree",
)
viz.save("tree.html")  # click nodes to expand/collapse
```

### ROC Curve

```python
from endgame.visualization import ROCCurveVisualizer

roc = ROCCurveVisualizer.from_predictions(y_test, proba_lgbm, label="LightGBM")
roc.add_predictions(y_test, proba_ft, label="FT-Transformer")
roc.save("roc_curves.html")
```

### Classification Report

The `ClassificationReport` bundles confusion matrix, ROC/PR curves, calibration plot, and lift chart into a single interactive HTML page.

```python
from endgame.visualization import ClassificationReport

report = ClassificationReport(
    y_true=y_test,
    y_proba=proba_lgbm,
    feature_names=feature_names,
    class_names=class_names,
)
report.save("classification_report.html")
```

See the [visualization guide](visualization.md) for the complete chart catalogue (42 chart types including PDP, waterfall / SHAP, parallel coordinates, and calibration plots).

---

## Next Steps

| Topic | Guide |
|---|---|
| Preprocessing (encoding, feature engineering, balancing) | [preprocessing guide](preprocessing.md) |
| Full model catalogue (100+ estimators) | [models guide](models.md) |
| Hyperparameter tuning with Optuna | [tune API](../api/tune) |
| SHAP, LIME, counterfactuals | [explain API](../api/explain) |
| Fairness metrics and mitigation | [fairness API](../api/fairness) |
| Anomaly detection | [anomaly API](../api/anomaly) |
| Time series forecasting and classification | [timeseries API](../api/timeseries) |
| Signal processing | [signal API](../api/signal) |
| Visualization catalogue | [visualization guide](visualization.md) |
| MCP server (AI assistant integration) | [MCP server guide](mcp_server.md) |