# Models Guide

Endgame provides 100+ estimators organized into families. All models follow the scikit-learn interface: `fit`, `predict`, and `predict_proba` (classifiers) or `transform` (transformers). Every estimator is pipeline-compatible and accepts `sample_weight` where applicable.

## Model Family Overview

| Family | Key Classes | Best For |
|--------|-------------|----------|
| **GBDTs** | `LGBMWrapper`, `XGBWrapper`, `CatBoostWrapper` | General tabular, competitions |
| **Deep Tabular** | `FTTransformerClassifier`, `SAINTClassifier`, `NODEClassifier`, `TabPFNClassifier`, `NAMClassifier`, `GANDALFClassifier`, `TabularResNetClassifier` | Large datasets, categorical embeddings |
| **Custom Trees** | `RotationForestClassifier`, `C50Classifier`, `ObliqueRandomForestClassifier`, `QuantileRegressorForest`, `EvolutionaryTreeClassifier` | Structured data, diverse ensembles |
| **Rules** | `RuleFitClassifier`, `FURIAClassifier` | Interpretable rule extraction |
| **Bayesian** | `TANClassifier`, `KDBClassifier`, `ESKDBClassifier` | Probabilistic, small data |
| **Kernel** | `GPClassifier`, `SVMClassifier` | Small to medium datasets |
| **Interpretable** | `EBMClassifier`, `MARSClassifier`, `SymbolicRegressor` | Regulatory compliance, auditability |
| **Neural** | `ELMClassifier`, `EmbeddingMLPClassifier`, `TabNetClassifier` | Custom architectures, entity embeddings |
| **Probabilistic** | `NGBoostClassifier`, `BARTClassifier` | Uncertainty quantification |
| **Baselines** | `NaiveBayesClassifier`, `LDAClassifier`, `QDAClassifier`, `RDAClassifier`, `KNNClassifier`, `LinearClassifier` | Benchmarking, ensemble diversity |

---

## Preset System

The `preset` parameter loads competition-tuned hyperparameter configurations. It accepts four values across all GBDT wrappers:

- `'endgame'`: competition-tuned defaults (low learning rate, many trees, early stopping). This is the default.
- `'fast'`: higher learning rate, fewer trees. Useful for rapid iteration.
- `'overfit'`: aggressively deep trees, no regularization. Use only for ensembling experiments.
- `'custom'`: no preset applied; pass all hyperparameters explicitly.

```python
from endgame.models import LGBMWrapper

# Competition-ready defaults
model = LGBMWrapper(preset='endgame')
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])

# Fast iteration during feature engineering
quick_model = LGBMWrapper(preset='fast')
quick_model.fit(X_train, y_train)

# Override specific parameters within a preset
model = LGBMWrapper(preset='endgame', num_leaves=63, min_child_samples=50)
```

The `'endgame'` preset sets `learning_rate=0.01` and `n_estimators=10000`, and relies on early stopping to choose the number of boosting rounds. Always pass a validation set when using this preset.

---

## GBDTs

Gradient boosted decision trees are the default choice for tabular competitions. All three wrappers share the same interface via `GBDTWrapper`.

```python
from endgame.models import LGBMWrapper, XGBWrapper, CatBoostWrapper

# LightGBM: fastest training, strong default performance
lgbm = LGBMWrapper(preset='endgame')
lgbm.fit(X_train, y_train, eval_set=[(X_val, y_val)])
proba = lgbm.predict_proba(X_test)

# XGBoost: strong GPU support, wider ecosystem integration
xgb = XGBWrapper(preset='endgame', use_gpu=True)
xgb.fit(X_train, y_train, eval_set=[(X_val, y_val)])  # 'endgame' relies on early stopping

# CatBoost: native categorical feature handling, often best out of the box
catboost = CatBoostWrapper(preset='endgame', categorical_features=['city', 'product'])
catboost.fit(X_train, y_train, eval_set=[(X_val, y_val)])
```

Feature importances are available via `model.feature_importances_` after fitting.

---

## Deep Tabular Models

Deep learning models for tabular data. These require PyTorch and are imported from `endgame.models.tabular`. They tend to shine on datasets with many categorical features or when pre-trained representations are available.

### FT-Transformer

Feature Tokenizer + Transformer. Strong general-purpose deep tabular model.
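The "feature tokenizer" half of the name can be illustrated without a deep learning framework. The sketch below is not Endgame's implementation. It assumes the usual formulation for numerical features, in which feature `j` with value `x_j` becomes the token `x_j * W_j + b_j`, and it uses hypothetical names (`tokenize_numeric`, `weights`, `biases`) with random values standing in for learned parameters. The transformer then attends over the resulting sequence, one token per feature.

```python
import random


def tokenize_numeric(row, weights, biases):
    """Map each scalar feature to a d_token-dimensional vector.

    token_j = row[j] * weights[j] + biases[j], computed element-wise.
    """
    return [
        [x * w + b for w, b in zip(w_j, b_j)]
        for x, w_j, b_j in zip(row, weights, biases)
    ]


rng = random.Random(0)
n_features, d_token = 3, 4  # d_token is kept tiny here for readability

# Hypothetical stand-ins for learned parameters:
# one weight vector and one bias vector per feature.
weights = [[rng.gauss(0, 1) for _ in range(d_token)] for _ in range(n_features)]
biases = [[rng.gauss(0, 1) for _ in range(d_token)] for _ in range(n_features)]

tokens = tokenize_numeric([0.5, -1.2, 3.0], weights, biases)
print(len(tokens), len(tokens[0]))  # one token per feature, each of length d_token
```

Because every feature gets its own weight and bias vectors, two features with the same raw value still map to different tokens.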
```python
from endgame.models.tabular import FTTransformerClassifier

ft = FTTransformerClassifier(
    d_token=192,
    n_blocks=3,
    attention_dropout=0.2,
    n_epochs=100,
    batch_size=512,
)
ft.fit(X_train, y_train)
proba = ft.predict_proba(X_test)
```

### SAINT

Self-Attention and Intersample Attention Transformer. Captures both feature-level and sample-level interactions.

```python
from endgame.models.tabular import SAINTClassifier

saint = SAINTClassifier(depth=6, heads=8, n_epochs=50)
saint.fit(X_train, y_train)
```

### NODE

Neural Oblivious Decision Ensembles. Differentiable tree structure: fast and competitive with GBDTs on structured data.

```python
from endgame.models.tabular import NODEClassifier

node = NODEClassifier(num_trees=2048, tree_depth=6, n_epochs=50)
node.fit(X_train, y_train)
```

### NAM

Neural Additive Models. Each feature is modeled by an independent neural network, enabling per-feature shape functions with neural expressiveness.

```python
from endgame.models.tabular import NAMClassifier

nam = NAMClassifier(hidden_units=[64, 64], n_epochs=100)
nam.fit(X_train, y_train)

# Access per-feature shape functions
contributions = nam.feature_contributions(X_test)
```

### GANDALF

Gated Adaptive Network for Deep Automated Learning of Features. Requires the `pytorch-tabular` package and should be imported directly.

```python
from endgame.models.tabular.gandalf import GANDALFClassifier

gandalf = GANDALFClassifier(gflu_stages=6, n_epochs=100)
gandalf.fit(X_train, y_train)
```

### TabularResNet

Residual network architecture adapted for tabular data. Straightforward and reliable with normalization and skip connections.

```python
from endgame.models.tabular import TabularResNetClassifier

resnet = TabularResNetClassifier(
    hidden_dim=256,
    n_layers=4,
    dropout=0.1,
    n_epochs=100,
)
resnet.fit(X_train, y_train)
```

---

## Custom Trees

### Rotation Forest

Applies PCA rotations to random feature subsets before building decision trees.
Increases diversity substantially over standard random forests.

```python
from endgame.models import RotationForestClassifier

rf = RotationForestClassifier(n_estimators=100, n_features_per_subset=3)
rf.fit(X_train, y_train)
```

### C5.0

The classic C5.0 decision tree algorithm. Includes rule extraction, pruning, and boosting.

```python
from endgame.models import C50Classifier

c50 = C50Classifier(n_trials=10, pruning=True)
c50.fit(X_train, y_train)
rules = c50.get_rules()  # Human-readable rule set
```

### Oblique Random Forest

Uses linear combinations of features at each split, rather than axis-aligned splits. Captures diagonal decision boundaries.

```python
from endgame.models import ObliqueRandomForestClassifier

orf = ObliqueRandomForestClassifier(n_estimators=100, max_depth=10)
orf.fit(X_train, y_train)
```

### Quantile Regressor Forest

Provides prediction intervals via quantile regression. Each leaf stores the full empirical distribution of training targets.

```python
from endgame.models import QuantileRegressorForest

qrf = QuantileRegressorForest(n_estimators=200)
qrf.fit(X_train, y_train)
lower, median, upper = qrf.predict_quantiles(X_test, quantiles=[0.1, 0.5, 0.9])
```

### Evolutionary Tree

Optimizes tree structure via evolutionary algorithms rather than greedy splitting. Can find globally better tree structures, at the cost of longer training time.

```python
from endgame.models.trees.evtree import EvolutionaryTreeClassifier

evt = EvolutionaryTreeClassifier(population_size=100, n_generations=50)
evt.fit(X_train, y_train)
```

---

## Rule-Based Models

### RuleFit

Extracts decision rules from an ensemble of trees, then fits a sparse linear model over those rules. The result is a human-readable list of weighted conditions.
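To make the two stages concrete, the sketch below shows what the linear model sees: each rule is a conjunction of threshold conditions that evaluates to 0 or 1 for a row, and the score is a weighted sum of the rules that fire. This is an illustration only, not Endgame's implementation. The rules, weights, and helper names (`rule_fires`, `score`) are hypothetical; in practice the rules come from tree paths and the weights from an L1-penalized fit.

```python
# A rule is a list of (feature_index, operator, threshold) conditions joined by AND.
RULES = [
    [(0, ">", 0.5), (1, "<=", 2.0)],  # x0 > 0.5 AND x1 <= 2.0
    [(2, ">", 10.0)],                 # x2 > 10.0
    [(0, "<=", 0.5)],                 # x0 <= 0.5
]
# Hypothetical weights; a sparse fit sets many of them to exactly zero.
WEIGHTS = [1.5, -0.75, 0.0]
INTERCEPT = -0.25


def rule_fires(rule, row):
    """Return 1 if every condition in the rule holds for the row, else 0."""
    for idx, op, threshold in rule:
        value = row[idx]
        holds = value > threshold if op == ">" else value <= threshold
        if not holds:
            return 0
    return 1


def score(row):
    """Linear model evaluated on the binary rule features."""
    return INTERCEPT + sum(w * rule_fires(r, row) for w, r in zip(WEIGHTS, RULES))


row = [0.8, 1.0, 12.0]
features = [rule_fires(r, row) for r in RULES]
print(features, score(row))
```

Rules whose weight is zero drop out of the model entirely, which is what keeps the final rule list short enough to read.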
```python
from endgame.models import RuleFitClassifier

rulefit = RuleFitClassifier(tree_size=4, max_rules=2000)
rulefit.fit(X_train, y_train)

rules_df = rulefit.get_rules()
print(rules_df[rules_df['importance'] > 0.01])
```

### FURIA

Fuzzy Unordered Rule Induction Algorithm. Produces fuzzy rule sets that handle overlapping class regions gracefully.

```python
from endgame.models import FURIAClassifier

furia = FURIAClassifier(n_rules=20)
furia.fit(X_train, y_train)
rule_list = furia.rules_  # List of FuzzyRule objects
```

---

## Bayesian Network Classifiers

Bayesian classifiers are well-suited for small datasets where probabilistic structure is meaningful and calibrated probabilities are important.

### TAN (Tree Augmented Naive Bayes)

Extends Naive Bayes by allowing each feature to have one additional parent besides the class, so the features form a single dependency tree.

```python
from endgame.models import TANClassifier

tan = TANClassifier()
tan.fit(X_train, y_train)
proba = tan.predict_proba(X_test)
```

### KDB (k-Dependence Bayesian)

Generalizes TAN by allowing each feature to depend on up to `k` other features. Higher `k` captures more complex dependencies but needs more data to estimate them reliably.

```python
from endgame.models import KDBClassifier

kdb = KDBClassifier(k=2)
kdb.fit(X_train, y_train)
```

### ESKDB (Ensemble Smoothed KDB)

Ensemble of KDB classifiers with Laplace smoothing and random structure perturbation for improved accuracy.

```python
from endgame.models import ESKDBClassifier

eskdb = ESKDBClassifier(k=2, n_estimators=50)
eskdb.fit(X_train, y_train)
```

---

## Kernel Methods

### Gaussian Process Classifier

Provides well-calibrated probabilistic predictions with uncertainty estimates. Exact GP inference scales as O(n^3) in the number of training samples, so use it on datasets below ~5,000 samples.
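The cubic scaling is what motivates the sample cap. As a back-of-the-envelope sketch (plain arithmetic, not a benchmark of `GPClassifier`), the helpers below estimate how the cost of factorizing the n × n kernel matrix grows relative to a reference size, and how much memory a dense float64 kernel matrix occupies. The function names are illustrative.

```python
def relative_cubic_cost(n, n_ref=5_000):
    """Cost of an O(n^3) factorization relative to the cost at n_ref samples."""
    return (n / n_ref) ** 3


def kernel_matrix_megabytes(n, bytes_per_value=8):
    """Memory for a dense n x n kernel matrix of float64 values, in MB (10^6 bytes)."""
    return n * n * bytes_per_value / 1e6


for n in (1_000, 5_000, 10_000, 50_000):
    print(n, relative_cubic_cost(n), kernel_matrix_megabytes(n))
```

Doubling the sample count from 5,000 to 10,000 multiplies the factorization cost by 8 and the kernel matrix memory by 4 (from 200 MB to 800 MB).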
```python
from endgame.models import GPClassifier
from sklearn.gaussian_process.kernels import Matern

gp = GPClassifier(kernel=Matern(nu=2.5), n_restarts_optimizer=5)
gp.fit(X_train, y_train)
proba = gp.predict_proba(X_test)  # Well-calibrated probabilities
```

### SVM Classifier

Support Vector Machine with kernel selection. Competitive on medium-sized datasets with fewer than ~50,000 samples.

```python
from endgame.models import SVMClassifier

svm = SVMClassifier(kernel='rbf', C=10.0, probability=True)
svm.fit(X_train, y_train)
```

---

## Interpretable Models

These models are suitable for regulated industries where predictions must be auditable or explained to non-technical stakeholders.

### EBM (Explainable Boosting Machine)

EBMs are generalized additive models trained with gradient boosting. They often achieve near-GBDT accuracy while remaining fully interpretable via shape functions for each feature and pairwise interaction.

```python
from endgame.models import EBMClassifier

ebm = EBMClassifier(interactions=15, max_bins=256)
ebm.fit(X_train, y_train)

# Inspect global explanation
ebm.explain_global()

# Local explanations for the first five test rows
ebm.explain_local(X_test[:5])
```

EBMs support both classification and regression; use `EBMRegressor` for regression.

### MARS (Multivariate Adaptive Regression Splines)

Fits piecewise linear splines with automatic knot selection. Produces an explicit mathematical expression for the fitted model.

```python
from endgame.models import MARSClassifier

mars = MARSClassifier(max_degree=2, max_terms=20)
mars.fit(X_train, y_train)
print(mars.summary())  # Equation with hinge functions
```

### Symbolic Regression

Discovers explicit mathematical formulas via genetic programming. Best for scientific applications where the functional form matters.
```python
from endgame.models import SymbolicRegressor

sr = SymbolicRegressor(
    population_size=1000,
    generations=20,
    function_set=['add', 'mul', 'sqrt', 'log'],
)
sr.fit(X_train, y_train)
print(sr.best_program_)  # e.g., "0.42 * x1 + sqrt(x2) - 1.7"
```

---

## Neural Models

### ELM (Extreme Learning Machine)

Single hidden-layer network where input weights are randomly assigned and only the output layer is trained. Extremely fast, useful as a cheap ensemble member.

```python
from endgame.models import ELMClassifier

elm = ELMClassifier(n_hidden=1000, activation='relu')
elm.fit(X_train, y_train)
```

### Embedding MLP

MLP with learned entity embeddings for categorical features. Effective when categorical cardinality is high (cities, products, user IDs).

```python
from endgame.models.neural import EmbeddingMLPClassifier

mlp = EmbeddingMLPClassifier(
    cat_features=['city', 'product'],
    hidden_layers=[256, 128, 64],
    dropout=0.3,
    n_epochs=100,
)
mlp.fit(X_train, y_train)
```

### TabNet

Attention-based neural network using sequential attention to select features at each decision step. Provides built-in feature importance.

```python
from endgame.models.neural import TabNetClassifier

tabnet = TabNetClassifier(n_steps=5, gamma=1.5, n_epochs=100)
tabnet.fit(X_train, y_train)
importances = tabnet.feature_importances_
```

---

## Probabilistic Models

### NGBoost

Natural Gradient Boosting outputs full probability distributions rather than point estimates. Use when calibrated uncertainty is required.

```python
from endgame.models import NGBoostClassifier

ngb = NGBoostClassifier(n_estimators=500, learning_rate=0.01)
ngb.fit(X_train, y_train)

# Returns probability distributions, not just point estimates
distributions = ngb.pred_dist(X_test)
proba = ngb.predict_proba(X_test)
```

### BART (Bayesian Additive Regression Trees)

Fully Bayesian nonparametric model providing posterior distributions over predictions. Requires `pymc` and `pymc-bart`.
```python
from endgame.models import BARTClassifier

bart = BARTClassifier(m=50, n_samples=1000, tune=500)
bart.fit(X_train, y_train)
proba = bart.predict_proba(X_test)
credible_intervals = bart.predict_interval(X_test, hdi_prob=0.94)
```

---

## Foundation Models

### TabPFN

TabPFN is a prior-fitted network pre-trained on millions of synthetic tabular datasets. It performs in-context learning: no gradient-based training on your dataset is needed, because `fit` only stores the training data as context for prediction.

```python
from endgame.models.tabular import TabPFNClassifier

# No training loop: the model uses the dataset as context directly
tabpfn = TabPFNClassifier(n_ensemble_configurations=32)
tabpfn.fit(X_train, y_train)  # Stores context, no gradient updates
proba = tabpfn.predict_proba(X_test)
```

TabPFN works best on datasets with fewer than 10,000 samples and fewer than 100 features. For larger datasets, use TabPFNv2 or TabPFN25:

```python
from endgame.models.tabular import TabPFNv2Classifier, TabPFN25Classifier

# v2: extended context window, improved accuracy
tabpfn_v2 = TabPFNv2Classifier()
tabpfn_v2.fit(X_train, y_train)
```

Because TabPFN has large optional dependencies, import it directly from the submodule rather than from `endgame.models`.

---

## Baseline Models

Lightweight models useful for ensemble diversity and benchmarking.
```python
from endgame.models import (
    NaiveBayesClassifier,
    LDAClassifier,
    QDAClassifier,
    RDAClassifier,
    KNNClassifier,
    LinearClassifier,
)

# Linear discriminant analysis: fast, good baseline for linearly separable data
lda = LDAClassifier(solver='svd')
lda.fit(X_train, y_train)

# Regularized discriminant analysis: blend of LDA and QDA
rda = RDAClassifier(alpha=0.5)
rda.fit(X_train, y_train)

# KNN: strong baseline; fit only stores the training data
knn = KNNClassifier(n_neighbors=15, weights='distance')
knn.fit(X_train, y_train)
```

---

## Model Selection Guidance

Use the following heuristics as a starting point:

| Situation | Recommended approach |
|-----------|---------------------|
| Small dataset (< 1,000 samples) | `TANClassifier`, `ESKDBClassifier`, `TabPFNClassifier`, `GPClassifier` |
| Medium dataset (1K–100K samples) | `LGBMWrapper` or `XGBWrapper` with `preset='endgame'` |
| Large dataset (> 100K samples) | `LGBMWrapper`, `FTTransformerClassifier`, `TabularResNetClassifier` |
| High-cardinality categoricals | `CatBoostWrapper`, `EmbeddingMLPClassifier`, `SAINTClassifier` |
| Interpretability required | `EBMClassifier`, `RuleFitClassifier`, `MARSClassifier` |
| Regulatory compliance | `EBMClassifier`, `SymbolicRegressor`, `C50Classifier` (with `get_rules()`) |
| Calibrated uncertainty | `NGBoostClassifier`, `BARTClassifier`, `GPClassifier` |
| No training time budget | `TabPFNClassifier` (in-context learning), `ELMClassifier` |
| Ensembling diversity | Mix families: GBDT + rotation forest + ELM + KNN |
| Time series classification | See `eg.timeseries` (`MiniRocketClassifier`, `HydraClassifier`) |

A practical workflow for competitions:

1. Start with `LGBMWrapper(preset='endgame')` as your baseline.
2. Run `eg.benchmark` or `eg.quick.compare()` to survey model families.
3. Build a diverse set of out-of-fold predictions from multiple families.
4. Use `eg.ensemble.HillClimbingEnsemble` or `eg.ensemble.StackingEnsemble` to combine them.
5. Calibrate probabilities with `eg.calibration` if log-loss is the metric.

---

## See Also

- [API Reference: models](../api/models): complete parameter documentation
- [Ensemble Guide](ensembles.md): combining multiple models
- [Calibration Guide](calibration.md): probability calibration and conformal prediction
- [Explainability Guide](explainability.md): SHAP, LIME, and partial dependence
- [Tuning Guide](automl.md): Optuna integration with preset search spaces