Models Guide¶
Endgame provides 100+ estimators organized into families. All models follow the
scikit-learn interface: fit, predict, and predict_proba (classifiers) or
transform (transformers). Every estimator is pipeline-compatible and accepts
sample_weight where applicable.
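To make the interface contract concrete, here is a minimal sketch of a scikit-learn-style classifier. WeightedPriorClassifier is a hypothetical toy written for this illustration, not an Endgame class; it only shows the shape of fit, predict, predict_proba, and sample_weight handling.

```python
import numpy as np


class WeightedPriorClassifier:
    """Toy classifier (hypothetical, not part of Endgame): predicts class priors."""

    def fit(self, X, y, sample_weight=None):
        y = np.asarray(y)
        if sample_weight is None:
            sample_weight = np.ones(len(y))
        sample_weight = np.asarray(sample_weight, dtype=float)
        # Learned attributes end in an underscore, per scikit-learn convention.
        self.classes_ = np.unique(y)
        totals = np.array([sample_weight[y == c].sum() for c in self.classes_])
        self.class_prior_ = totals / totals.sum()
        return self  # returning self allows chaining: clf.fit(X, y).predict(X)

    def predict_proba(self, X):
        # Same prior for every row: shape (n_samples, n_classes).
        return np.tile(self.class_prior_, (len(X), 1))

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


X = np.zeros((4, 2))
y = np.array([0, 0, 0, 1])
# The single class-1 row carries more weight than the three class-0 rows combined.
clf = WeightedPriorClassifier().fit(X, y, sample_weight=[1, 1, 1, 5])
proba = clf.predict_proba(X)
pred = clf.predict(X)
```

Any object exposing these methods with these shapes can be dropped into the same pipelines and ensembles as the estimators described in this guide.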
Model Family Overview¶
| Family | Key Classes | Best For |
|---|---|---|
| GBDTs | | General tabular, competitions |
| Deep Tabular | | Large datasets, categorical embeddings |
| Custom Trees | | Structured data, diverse ensembles |
| Rules | | Interpretable rule extraction |
| Bayesian | | Probabilistic, small data |
| Kernel | | Small to medium datasets |
| Interpretable | | Regulatory compliance, auditability |
| Neural | | Custom architectures, entity embeddings |
| Probabilistic | | Uncertainty quantification |
| Baselines | | Benchmarking, ensemble diversity |
Preset System¶
The preset parameter loads a competition-tuned hyperparameter configuration.
All GBDT wrappers accept four preset values:
- 'endgame': competition-tuned defaults (low learning rate, many trees, early stopping). This is the default.
- 'fast': higher learning rate, fewer trees. Useful for rapid iteration.
- 'overfit': aggressively deep trees, no regularization. Use only for ensembling experiments.
- 'custom': no preset applied; pass all hyperparameters explicitly.
from endgame.models import LGBMWrapper
# Competition-ready defaults
model = LGBMWrapper(preset='endgame')
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
# Fast iteration during feature engineering
quick_model = LGBMWrapper(preset='fast')
quick_model.fit(X_train, y_train)
# Override specific parameters within a preset
model = LGBMWrapper(preset='endgame', num_leaves=63, min_child_samples=50)
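The override in the last line above can be pictured as a dictionary merge in which explicit keyword arguments win over preset defaults. The sketch below is an assumption about that behavior, not Endgame's implementation: PRESETS and resolve_params are hypothetical names, and only the 'endgame' learning_rate and n_estimators values come from this guide.

```python
# Hypothetical preset table. Only the 'endgame' learning_rate and n_estimators
# are taken from this guide; every other value is made up for illustration.
PRESETS = {
    "endgame": {"learning_rate": 0.01, "n_estimators": 10000},
    "fast": {"learning_rate": 0.1, "n_estimators": 500},
    "custom": {},
}


def resolve_params(preset="endgame", **overrides):
    """Return preset defaults with explicit keyword arguments layered on top."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    params = dict(PRESETS[preset])  # copy so the table is never mutated
    params.update(overrides)        # explicit arguments win over the preset
    return params


params = resolve_params("endgame", num_leaves=63, min_child_samples=50)
custom_params = resolve_params("custom", learning_rate=0.05)
```

Copying the preset before updating it keeps one call's overrides from leaking into the next.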
The 'endgame' preset sets learning_rate=0.01, n_estimators=10000, and
relies on early stopping to find the optimal number of rounds. Always pass a
validation set when using this preset.
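Why the validation set matters can be seen in a small patience-based early-stopping sketch. The function name, the patience value, and the loss sequence are all hypothetical; the underlying libraries implement their own variants of this logic.

```python
def best_round(val_losses, patience=3):
    """Return (best_index, stopped_at) for per-round validation losses."""
    best_idx, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss = i, loss
        elif i - best_idx >= patience:
            return best_idx, i  # no improvement for `patience` rounds
    return best_idx, len(val_losses) - 1


# Made-up validation losses, one per boosting round.
losses = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.49, 0.44]
best_idx, stopped_at = best_round(losses, patience=3)
```

Without validation losses there is nothing to monitor, so a model configured with a very large n_estimators would simply train every round.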
GBDTs¶
Gradient boosted decision trees are the default choice for tabular competitions.
All three wrappers share the same interface via GBDTWrapper.
from endgame.models import LGBMWrapper, XGBWrapper, CatBoostWrapper
# LightGBM: typically the fastest of the three to train
lgbm = LGBMWrapper(preset='endgame')
lgbm.fit(X_train, y_train, eval_set=[(X_val, y_val)])
proba = lgbm.predict_proba(X_test)
# XGBoost: strong GPU support, wide ecosystem integration
xgb = XGBWrapper(preset='endgame', use_gpu=True)
xgb.fit(X_train, y_train, eval_set=[(X_val, y_val)])  # 'endgame' relies on early stopping
# CatBoost: native categorical feature handling, strong with little tuning
catboost = CatBoostWrapper(preset='endgame', categorical_features=['city', 'product'])
catboost.fit(X_train, y_train, eval_set=[(X_val, y_val)])
Feature importances are available via model.feature_importances_ after fitting.
Deep Tabular Models¶
Deep learning models for tabular data. These require PyTorch and are imported
from endgame.models.tabular. They tend to shine on datasets with many
categorical features or when pre-trained representations are available.
FT-Transformer¶
Feature Tokenizer + Transformer. Strong general-purpose deep tabular model.
from endgame.models.tabular import FTTransformerClassifier
ft = FTTransformerClassifier(
    d_token=192,
    n_blocks=3,
    attention_dropout=0.2,
    n_epochs=100,
    batch_size=512,
)
ft.fit(X_train, y_train)
proba = ft.predict_proba(X_test)
SAINT¶
Self-Attention and Intersample Attention Transformer. Captures both feature-level and sample-level interactions.
from endgame.models.tabular import SAINTClassifier
saint = SAINTClassifier(depth=6, heads=8, n_epochs=50)
saint.fit(X_train, y_train)
NODE¶
Neural Oblivious Decision Ensembles. Builds ensembles of differentiable oblivious decision trees trained end to end by gradient descent; can be competitive with GBDTs on structured data.
from endgame.models.tabular import NODEClassifier
node = NODEClassifier(num_trees=2048, tree_depth=6, n_epochs=50)
node.fit(X_train, y_train)
NAM¶
Neural Additive Models. Each feature is modeled by its own small neural network and the per-feature outputs are summed, so every learned shape function can be plotted and inspected on its own.
from endgame.models.tabular import NAMClassifier
nam = NAMClassifier(hidden_units=[64, 64], n_epochs=100)
nam.fit(X_train, y_train)
# Access per-feature shape functions
contributions = nam.feature_contributions(X_test)
GANDALF¶
Gated Adaptive Network for Deep Automated Learning of Features. Requires the
pytorch-tabular package and should be imported directly.
from endgame.models.tabular.gandalf import GANDALFClassifier
gandalf = GANDALFClassifier(gflu_stages=6, n_epochs=100)
gandalf.fit(X_train, y_train)
TabularResNet¶
Residual network architecture adapted for tabular data: stacked blocks with normalization and skip connections. A simple, dependable deep baseline.
from endgame.models.tabular import TabularResNetClassifier
resnet = TabularResNetClassifier(
    hidden_dim=256,
    n_layers=4,
    dropout=0.1,
    n_epochs=100,
)
resnet.fit(X_train, y_train)
Custom Trees¶
Rotation Forest¶
Applies PCA rotations to random feature subsets before building decision trees. Increases diversity substantially over standard random forests.
from endgame.models import RotationForestClassifier
rf = RotationForestClassifier(n_estimators=100, n_features_per_subset=3)
rf.fit(X_train, y_train)
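The rotation step can be sketched with NumPy alone. This is a simplified illustration, not Endgame's implementation: rotation_matrix is a hypothetical helper, and the per-subset sample bootstrapping used by the original Rotation Forest algorithm is omitted.

```python
import numpy as np


def rotation_matrix(X, n_features_per_subset, rng):
    """Build a block-diagonal PCA rotation over random disjoint feature subsets."""
    n_features = X.shape[1]
    order = rng.permutation(n_features)
    R = np.zeros((n_features, n_features))
    for start in range(0, n_features, n_features_per_subset):
        idx = order[start:start + n_features_per_subset]
        Xs = X[:, idx] - X[:, idx].mean(axis=0)
        # Eigenvectors of the subset covariance are its principal axes.
        cov = np.atleast_2d(np.cov(Xs, rowvar=False))
        _, vecs = np.linalg.eigh(cov)
        R[np.ix_(idx, idx)] = vecs
    return R


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
R = rotation_matrix(X, n_features_per_subset=3, rng=rng)
X_rotated = X @ R  # each tree in the ensemble would get its own R
```

Because every tree sees the data through a different orthogonal rotation, its axis-aligned splits correspond to different oblique directions in the original feature space, which is where the extra diversity comes from.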
C5.0¶
The classic C5.0 decision tree algorithm. Includes rule extraction, pruning, and boosting.
from endgame.models import C50Classifier
c50 = C50Classifier(n_trials=10, pruning=True)
c50.fit(X_train, y_train)
rules = c50.get_rules() # Human-readable rule set
Oblique Random Forest¶
Uses linear combinations of features at each split, rather than axis-aligned splits. Captures diagonal decision boundaries.
from endgame.models import ObliqueRandomForestClassifier
orf = ObliqueRandomForestClassifier(n_estimators=100, max_depth=10)
orf.fit(X_train, y_train)
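A small synthetic experiment shows why oblique splits help on diagonal boundaries. The data and helper functions below are hypothetical and independent of Endgame; the oblique direction is supplied by hand rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # diagonal class boundary


def best_axis_aligned_accuracy(X, y):
    """Best accuracy of any single-feature threshold split (either orientation)."""
    best = 0.0
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            acc = np.mean((X[:, j] > t).astype(int) == y)
            best = max(best, acc, 1.0 - acc)
    return best


def oblique_accuracy(X, y, w, t=0.0):
    """Accuracy of the split w . x > t, a linear combination of features."""
    return np.mean((X @ w > t).astype(int) == y)


axis_acc = best_axis_aligned_accuracy(X, y)
oblique_acc = oblique_accuracy(X, y, w=np.array([1.0, 1.0]))
```

A single oblique split recovers the boundary exactly, while an axis-aligned tree needs a staircase of many splits to approximate it.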
Quantile Regressor Forest¶
Provides prediction intervals via quantile regression. Each leaf stores the full empirical distribution of training targets.
from endgame.models import QuantileRegressorForest
qrf = QuantileRegressorForest(n_estimators=200)
qrf.fit(X_train, y_train)
lower, median, upper = qrf.predict_quantiles(X_test, quantiles=[0.1, 0.5, 0.9])
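The idea of storing the full target distribution per leaf can be illustrated for a single leaf. The values are made up, and a real quantile forest pools targets across all trees, typically with weights, before taking quantiles.

```python
import numpy as np

# Hypothetical training targets that ended up in one leaf of one tree.
leaf_targets = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# A mean-predicting forest would keep only this single number per leaf.
leaf_mean = leaf_targets.mean()

# Keeping every target lets any quantile be read off at prediction time.
lower, median, upper = np.quantile(leaf_targets, [0.1, 0.5, 0.9])
```

Here the 10%, 50%, and 90% quantiles come out as 1.4, 3.0, and 4.6, giving an 80% prediction interval around the median.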
Evolutionary Tree¶
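Optimizes the whole tree structure with an evolutionary algorithm rather than greedy, split-by-split induction. This can find better overall trees than greedy search, with no guarantee of a global optimum, at the cost of longer training time.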
from endgame.models.trees.evtree import EvolutionaryTreeClassifier
evt = EvolutionaryTreeClassifier(population_size=100, n_generations=50)
evt.fit(X_train, y_train)
Rule-Based Models¶
RuleFit¶
Extracts decision rules from an ensemble of trees, then fits a sparse linear model over those rules. The result is a human-readable list of weighted conditions.
from endgame.models import RuleFitClassifier
rulefit = RuleFitClassifier(tree_size=4, max_rules=2000)
rulefit.fit(X_train, y_train)
rules_df = rulefit.get_rules()
print(rules_df[rules_df['importance'] > 0.01])
FURIA¶
Fuzzy Unordered Rule Induction Algorithm. Produces fuzzy rule sets that handle overlapping class regions gracefully.
from endgame.models import FURIAClassifier
furia = FURIAClassifier(n_rules=20)
furia.fit(X_train, y_train)
rule_list = furia.rules_ # List of FuzzyRule objects
Bayesian Network Classifiers¶
Bayesian classifiers are well-suited for small datasets where probabilistic structure is meaningful and calibrated probabilities are important.
TAN (Tree Augmented Naive Bayes)¶
Extends Naive Bayes by allowing each feature to have one additional parent besides the class, so the features form a single dependency tree.
from endgame.models import TANClassifier
tan = TANClassifier()
tan.fit(X_train, y_train)
proba = tan.predict_proba(X_test)
KDB (k-Dependence Bayesian)¶
Generalizes TAN by allowing each feature to depend on up to k other features.
Higher k captures more complex dependencies at the cost of data requirements.
from endgame.models import KDBClassifier
kdb = KDBClassifier(k=2)
kdb.fit(X_train, y_train)
ESKDB (Ensemble Smoothed KDB)¶
Ensemble of KDB classifiers with Laplace smoothing and random structure perturbation for improved accuracy.
from endgame.models import ESKDBClassifier
eskdb = ESKDBClassifier(k=2, n_estimators=50)
eskdb.fit(X_train, y_train)
Kernel Methods¶
Gaussian Process Classifier¶
Provides well-calibrated probabilistic predictions with uncertainty estimates. Exact GP inference scales as O(n^3) in the number of training samples, so keep it to datasets below ~5,000 samples.
from endgame.models import GPClassifier
from sklearn.gaussian_process.kernels import RBF, Matern
gp = GPClassifier(kernel=Matern(nu=2.5), n_restarts_optimizer=5)
gp.fit(X_train, y_train)
proba = gp.predict_proba(X_test) # Well-calibrated probabilities
SVM Classifier¶
Support Vector Machine with kernel selection. Competitive on medium-sized datasets with fewer than ~50,000 samples.
from endgame.models import SVMClassifier
svm = SVMClassifier(kernel='rbf', C=10.0, probability=True)
svm.fit(X_train, y_train)
Interpretable Models¶
These models are suitable for regulated industries where predictions must be auditable or explained to non-technical stakeholders.
EBM (Explainable Boosting Machine)¶
EBMs are generalized additive models trained with gradient boosting. They achieve near-GBDT accuracy while remaining fully interpretable via shape functions for each feature and pairwise interaction.
from endgame.models import EBMClassifier
ebm = EBMClassifier(interactions=15, max_bins=256)
ebm.fit(X_train, y_train)
# Inspect global explanation
ebm.explain_global()
# Local explanation for a single prediction
ebm.explain_local(X_test[:5])
For regression tasks, use EBMRegressor, which exposes the same interface.
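How an additive model of this kind produces a prediction can be sketched as a sum of per-feature lookup tables. The bin edges, scores, and intercept below are invented for illustration, and pairwise interaction terms are left out.

```python
import numpy as np

# Hypothetical learned pieces for a two-feature additive model.
intercept = -0.5
bin_edges = [np.array([30.0, 50.0]), np.array([1000.0])]  # per-feature cut points
shape_values = [np.array([-0.8, 0.1, 0.9]), np.array([-0.3, 0.4])]  # one score per bin


def additive_logit(X):
    """Sum the per-feature shape-function scores on top of the intercept."""
    logit = np.full(X.shape[0], intercept)
    for j, (edges, values) in enumerate(zip(bin_edges, shape_values)):
        bins = np.digitize(X[:, j], edges)  # bin index for each row
        logit += values[bins]
    return logit


X = np.array([[25.0, 500.0], [40.0, 2000.0], [60.0, 2000.0]])
logits = additive_logit(X)
proba = 1.0 / (1.0 + np.exp(-logits))  # sigmoid for binary classification
```

Each feature's contribution is just the score of the bin its value falls into, which is why the per-feature shape functions fully explain the prediction.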
MARS (Multivariate Adaptive Regression Splines)¶
Fits piecewise linear splines with automatic knot selection. Produces explicit mathematical expressions for each prediction.
from endgame.models import MARSClassifier
mars = MARSClassifier(max_degree=2, max_terms=20)
mars.fit(X_train, y_train)
print(mars.summary()) # Equation with hinge functions
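The hinge functions that make up a MARS equation are simple to evaluate by hand. The knots and coefficients in this sketch are hypothetical and do not come from a fitted model.

```python
import numpy as np


def hinge(x, knot, direction=+1):
    """MARS basis function: max(0, x - knot) or, mirrored, max(0, knot - x)."""
    return np.maximum(0.0, direction * (x - knot))


def mars_predict(x1, x2):
    # Hypothetical fitted model with made-up knots and coefficients.
    return (
        2.0
        + 1.5 * hinge(x1, 3.0)                    # active only when x1 > 3
        - 0.5 * hinge(x1, 3.0, direction=-1)      # active only when x1 < 3
        + 0.25 * hinge(x1, 3.0) * hinge(x2, 1.0)  # degree-2 interaction term
    )


x1 = np.array([1.0, 3.0, 5.0])
x2 = np.array([0.0, 2.0, 3.0])
y_hat = mars_predict(x1, x2)
```

The product term is what max_degree=2 permits: interactions between hinge functions of two different features.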
Symbolic Regression¶
Discovers explicit mathematical formulas via genetic programming. Best for scientific applications where the functional form matters.
from endgame.models import SymbolicRegressor
sr = SymbolicRegressor(
    population_size=1000,
    generations=20,
    function_set=['add', 'mul', 'sqrt', 'log'],
)
sr.fit(X_train, y_train)
print(sr.best_program_) # e.g., "0.42 * x1 + sqrt(x2) - 1.7"
Neural Models¶
ELM (Extreme Learning Machine)¶
Single hidden-layer network where input weights are randomly assigned and only the output layer is trained. Extremely fast, useful as a cheap ensemble member.
from endgame.models import ELMClassifier
elm = ELMClassifier(n_hidden=1000, activation='relu')
elm.fit(X_train, y_train)
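The training procedure described above fits in a few lines of NumPy. This is a bare-bones sketch with hypothetical function names and synthetic data; it omits regularization and is not Endgame's ELMClassifier.

```python
import numpy as np


def elm_fit(X, y, n_hidden, rng):
    """Random fixed hidden layer plus a least-squares output layer."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # never updated after this line
    b = rng.normal(size=n_hidden)
    H = np.maximum(0.0, X @ W + b)                # ReLU hidden activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # the only trained parameters
    return W, b, beta


def elm_predict(X, W, b, beta):
    return np.maximum(0.0, X @ W + b) @ beta


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)  # XOR-like target, not linearly separable
W, b, beta = elm_fit(X, y, n_hidden=200, rng=rng)
train_acc = np.mean((elm_predict(X, W, b, beta) > 0.5) == (y > 0.5))
```

Training reduces to one linear least-squares solve, which is why ELMs are cheap enough to add to an ensemble almost for free.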
Embedding MLP¶
MLP with learned entity embeddings for categorical features. Effective when categorical cardinality is high (cities, products, user IDs).
from endgame.models.neural import EmbeddingMLPClassifier
mlp = EmbeddingMLPClassifier(
    cat_features=['city', 'product'],
    hidden_layers=[256, 128, 64],
    dropout=0.3,
    n_epochs=100,
)
mlp.fit(X_train, y_train)
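An entity embedding is a lookup table with one trainable row per category. The sketch below shows only the lookup and concatenation step; the vocabulary and dimensions are hypothetical, and the table is random here rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Map each category string to an integer index (vocabulary is hypothetical).
city_vocab = {"paris": 0, "tokyo": 1, "lima": 2}
embedding_dim = 4

# One trainable row per category; random here, learned by backprop in practice.
city_embeddings = rng.normal(size=(len(city_vocab), embedding_dim))

cities = ["tokyo", "paris", "tokyo"]
numeric = np.array([[0.5], [1.2], [-0.3]])  # one numeric feature per row

idx = np.array([city_vocab[c] for c in cities])
city_vectors = city_embeddings[idx]             # lookup: shape (3, 4)
mlp_input = np.hstack([city_vectors, numeric])  # shape (3, 5), fed to the MLP
```

Unlike one-hot encoding, the input width grows with embedding_dim rather than with the number of categories, which is what makes this practical for high-cardinality features.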
TabNet¶
Attention-based neural network using sequential attention to select features at each decision step. Provides built-in feature importance.
from endgame.models.neural import TabNetClassifier
tabnet = TabNetClassifier(n_steps=5, gamma=1.5, n_epochs=100)
tabnet.fit(X_train, y_train)
importances = tabnet.feature_importances_
Probabilistic Models¶
NGBoost¶
Natural Gradient Boosting outputs full probability distributions rather than point estimates. Use when calibrated uncertainty is required.
from endgame.models import NGBoostClassifier
ngb = NGBoostClassifier(n_estimators=500, learning_rate=0.01)
ngb.fit(X_train, y_train)
# Returns probability distributions, not just point estimates
distributions = ngb.pred_dist(X_test)
proba = ngb.predict_proba(X_test)
BART (Bayesian Additive Regression Trees)¶
Fully Bayesian nonparametric model providing posterior distributions over
predictions. Requires pymc and pymc-bart.
from endgame.models import BARTClassifier
bart = BARTClassifier(m=50, n_samples=1000, tune=500)
bart.fit(X_train, y_train)
proba = bart.predict_proba(X_test)
credible_intervals = bart.predict_interval(X_test, hdi_prob=0.94)
Foundation Models¶
TabPFN¶
TabPFN is a prior-fitted network pre-trained on millions of synthetic tabular datasets. It performs in-context learning: fit runs no gradient updates on your data, and the training set is instead passed to the network as context at prediction time.
from endgame.models.tabular import TabPFNClassifier
# No training loop — model uses the dataset as context directly
tabpfn = TabPFNClassifier(n_ensemble_configurations=32)
tabpfn.fit(X_train, y_train) # Stores context, no gradient updates
proba = tabpfn.predict_proba(X_test)
TabPFN works best on datasets with fewer than 10,000 samples and fewer than 100 features. For larger datasets, use TabPFNv2 or TabPFN25:
from endgame.models.tabular import TabPFNv2Classifier, TabPFN25Classifier
# v2 — extended context window, improved accuracy
tabpfn_v2 = TabPFNv2Classifier()
tabpfn_v2.fit(X_train, y_train)
Because TabPFN has large optional dependencies, import it directly from the
submodule rather than from endgame.models.
Baseline Models¶
Lightweight models useful for ensemble diversity and benchmarking.
from endgame.models import (
    NaiveBayesClassifier,
    LDAClassifier,
    QDAClassifier,
    RDAClassifier,
    KNNClassifier,
    LinearClassifier,
)
# Linear discriminant analysis — fast, good baseline for linearly separable data
lda = LDAClassifier(solver='svd')
lda.fit(X_train, y_train)
# Regularized discriminant analysis — blend of LDA and QDA
rda = RDAClassifier(alpha=0.5)
rda.fit(X_train, y_train)
# KNN: strong baseline; a lazy learner, so fit only stores the training data
knn = KNNClassifier(n_neighbors=15, weights='distance')
knn.fit(X_train, y_train)
Model Selection Guidance¶
Use the following heuristics as a starting point:
| Situation | Recommended approach |
|---|---|
| Small dataset (< 1,000 samples) | |
| Medium dataset (1K–100K samples) | |
| Large dataset (> 100K samples) | |
| High-cardinality categoricals | |
| Interpretability required | |
| Regulatory compliance | |
| Calibrated uncertainty | |
| No training time budget | |
| Ensembling diversity | Mix families: GBDT + rotation forest + ELM + KNN |
| Time series classification | See |
A practical workflow for competitions:
1. Start with LGBMWrapper(preset='endgame') as your baseline.
2. Run eg.benchmark or eg.quick.compare() to survey model families.
3. Build a diverse set of out-of-fold predictions from multiple families.
4. Use eg.ensemble.HillClimbingEnsemble or eg.ensemble.StackingEnsemble to combine them.
5. Calibrate probabilities with eg.calibration if log-loss is the metric.
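The ensembling step of the workflow above can be sketched as greedy forward selection with replacement over out-of-fold predictions. This is a generic illustration under stated assumptions: hill_climb and the synthetic predictions are hypothetical, and the actual HillClimbingEnsemble may differ in its details.

```python
import numpy as np


def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def hill_climb(oof_preds, y, n_rounds=20):
    """Greedy forward selection with replacement over out-of-fold predictions."""
    # Start from the single best model.
    scores = [log_loss(y, p) for p in oof_preds]
    selected = [int(np.argmin(scores))]
    best = scores[selected[0]]
    for _ in range(n_rounds):
        current_sum = sum(oof_preds[i] for i in selected)
        trial = [
            log_loss(y, (current_sum + p) / (len(selected) + 1)) for p in oof_preds
        ]
        j = int(np.argmin(trial))
        if trial[j] >= best:
            break  # no candidate improves the blend
        selected.append(j)
        best = trial[j]
    # Selection counts become blend weights that sum to one.
    weights = np.bincount(selected, minlength=len(oof_preds)) / len(selected)
    return weights, best


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000).astype(float)
# Three hypothetical models: noisy views of the label with independent errors.
oof_preds = [
    np.clip(0.5 + (y - 0.5) * 0.6 + rng.normal(0, s, 1000), 0.01, 0.99)
    for s in (0.20, 0.25, 0.30)
]
weights, blended_loss = hill_climb(oof_preds, y)
single_best = min(log_loss(y, p) for p in oof_preds)
```

Because the search starts from the best single model and only accepts improvements, the blended out-of-fold loss can never be worse than that model's loss; whether the gain holds on unseen data still has to be checked on a held-out set.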
See Also¶
API Reference: models — complete parameter documentation
Ensemble Guide — combining multiple models
Calibration Guide — probability calibration and conformal prediction
Explainability Guide — SHAP, LIME, and partial dependence
Tuning Guide — Optuna integration with preset search spaces