AutoML¶
- class endgame.automl.AutoMLPredictor(label, problem_type='auto', eval_metric='auto', presets='medium_quality', time_limit=None, search_strategy='portfolio', track_experiments=True, output_path=None, random_state=42, verbosity=2)[source]¶
Bases: object
Unified AutoML predictor that automatically selects the right domain.
This is the main entry point for AutoML in Endgame. It provides a simple 3-line interface that matches AutoGluon’s simplicity while leveraging Endgame’s full capabilities.
- Parameters:
label (str) – Name of the target column.
problem_type (str, default="auto") – Type of problem: “classification”, “regression”, “multiclass”, or “auto”.
eval_metric (str, default="auto") – Evaluation metric. “auto” selects based on problem type.
presets (str, default="medium_quality") – Quality preset: “best_quality”, “high_quality”, “good_quality”, “medium_quality”, “fast”, “interpretable”.
time_limit (int, optional) – Time limit in seconds. If None, uses preset default.
search_strategy (str, default="portfolio") – Search strategy: “portfolio”, “heuristic”, “genetic”, “random”, “bayesian”.
track_experiments (bool, default=True) – Whether to track experiments to the meta-learning database.
output_path (str, optional) – Path to save outputs (models, logs, etc.).
random_state (int, default=42) – Random seed for reproducibility.
verbosity (int, default=2) – Verbosity level (0=silent, 1=progress, 2=detailed, 3=debug).
- predictor_¶
The underlying domain-specific predictor.
- Type:
Examples
3-line usage (matches AutoGluon):
>>> from endgame.automl import AutoMLPredictor
>>> predictor = AutoMLPredictor(label="target").fit("train.csv")
>>> predictions = predictor.predict("test.csv")
With more options:
>>> predictor = AutoMLPredictor(
...     label="price",
...     presets="best_quality",
...     time_limit=3600,
... )
>>> predictor.fit(train_df)
>>> predictions = predictor.predict(test_df)
Using different presets:
>>> # Fast training for prototyping
>>> predictor = AutoMLPredictor(label="target", presets="fast")
>>>
>>> # High quality for production
>>> predictor = AutoMLPredictor(label="target", presets="high_quality")
>>>
>>> # Best quality for competitions
>>> predictor = AutoMLPredictor(label="target", presets="best_quality")
Different search strategies:
>>> # Default portfolio search
>>> predictor = AutoMLPredictor(label="target", search_strategy="portfolio")
>>>
>>> # Genetic algorithm search
>>> predictor = AutoMLPredictor(label="target", search_strategy="genetic")
>>>
>>> # Bayesian optimization
>>> predictor = AutoMLPredictor(label="target", search_strategy="bayesian")
- fit(train_data, tuning_data=None, time_limit=None, presets=None, hyperparameters=None, domain=None, **kwargs)[source]¶
Fit the AutoML predictor.
- Parameters:
train_data (str, Path, DataFrame, or ndarray) – Training data. Can be a file path, DataFrame, or array.
tuning_data (optional) – Validation/tuning data. If None, uses internal holdout.
time_limit (int, optional) – Override the time limit.
presets (str, optional) – Override the preset.
hyperparameters (dict, optional) – Override hyperparameters for specific models.
domain (str, optional) – Data domain: “tabular”, “text”, “vision”, “timeseries”, “audio”. If None, auto-detects from data.
**kwargs – Additional arguments passed to the domain-specific predictor.
- Return type:
- Returns:
AutoMLPredictor – The fitted predictor.
- evaluate(data, metrics=None, silent=False)[source]¶
Evaluate the predictor on data.
- Parameters:
- Return type:
- Returns:
dict – Dictionary mapping metric names to scores.
- classmethod load(path)[source]¶
Load a predictor from disk.
- Parameters:
path (str) – Path to load from.
- Return type:
- Returns:
AutoMLPredictor – The loaded predictor.
- property fit_summary_¶
Get the fit summary.
- class endgame.automl.TabularPredictor(label, problem_type='auto', eval_metric='auto', presets='medium_quality', time_limit=None, search_strategy='portfolio', track_experiments=True, output_path=None, random_state=42, verbosity=2, logger=None, constraints=None, guardrails_strict=False, checkpoint_dir=None, keep_training=False, patience=5, min_improvement=0.0001, min_model_time=300.0, max_model_time=600.0, excluded_models=None, early_stopping_rounds=50, use_gpu=False)[source]¶
Bases: BasePredictor
AutoML predictor for tabular data.
This predictor automates the complete machine learning pipeline for structured/tabular data, including preprocessing, model selection, hyperparameter tuning, and ensemble building.
- Parameters:
label (str) – Name of the target column.
problem_type (str, default="auto") – Type of problem: “classification”, “regression”, “multiclass”, or “auto”.
eval_metric (str, default="auto") – Evaluation metric. “auto” selects based on problem type.
presets (str, default="medium_quality") – Quality preset: “best_quality”, “high_quality”, “good_quality”, “medium_quality”, “fast”, “exhaustive”, “interpretable”. “exhaustive” uses all 100+ models with evolutionary search.
time_limit (int, optional) – Time limit in seconds. If None, uses preset default.
search_strategy (str, default="portfolio") – Search strategy: “portfolio”, “heuristic”, “genetic”, “random”, “bayesian”, “bandit”, “adaptive”.
track_experiments (bool, default=True) – Whether to track experiments to the meta-learning database.
output_path (str, optional) – Path to save outputs (models, logs, etc.).
random_state (int, default=42) – Random seed for reproducibility.
verbosity (int, default=2) – Verbosity level (0=silent, 1=progress, 2=detailed, 3=debug).
min_model_time (float, default=300.0) – Minimum time budget (in seconds) for each individual model. If the remaining stage budget is less than this value (and at least one model has already been trained), training stops rather than giving models an inadequately short budget.
max_model_time (float, default=600.0) – Hard ceiling (in seconds) for any single model. If a model exceeds this time, its training is abandoned and the pipeline moves on. Prevents slow models from monopolizing the budget.
early_stopping_rounds (int, default=50) – Early stopping patience for GBDT models (LightGBM, XGBoost, CatBoost, NGBoost) during cross-validation. Training halts when no improvement is seen for this many consecutive rounds. Only applies during CV scoring — the final refit uses all boosting rounds.
use_gpu (bool, default=False) – Enable GPU acceleration for supported models. When True, training uses thread-based execution (instead of fork) to avoid CUDA re-initialization issues. Models that encounter CUDA out-of-memory errors automatically fall back to CPU.
logger (ExperimentLogger | None)
constraints (DeploymentConstraints | None)
guardrails_strict (bool)
checkpoint_dir (str | None)
keep_training (bool)
patience (int)
min_improvement (float)
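The two rules stated for min_model_time and max_model_time can be pictured with a small stand-alone simulation. This is only a sketch, not Endgame's scheduler: the function name plan_model_budgets and the idea of knowing each model's duration in advance are assumptions made for illustration. Only the two rules themselves (stop once less than min_model_time remains and at least one model has trained; cap every model at max_model_time) come from the parameter descriptions above.

```python
def plan_model_budgets(stage_budget, durations,
                       min_model_time=300.0, max_model_time=600.0):
    """Simulate per-model time budgeting (illustrative, not Endgame's code).

    durations: seconds each candidate model would need if left alone.
    Returns a list of (index, seconds_spent, status) tuples.
    """
    remaining = float(stage_budget)
    trained = 0
    log = []
    for i, needed in enumerate(durations):
        # Rule 1: stop once the leftover budget is below the per-model
        # minimum, but only after at least one model has been trained.
        if remaining < min_model_time and trained > 0:
            break
        if remaining <= 0:
            break
        # Rule 2: no single model may run longer than max_model_time.
        allowed = min(max_model_time, remaining)
        if needed <= allowed:
            spent, status = needed, "trained"
            trained += 1
        else:
            spent, status = allowed, "abandoned"
        remaining -= spent
        log.append((i, spent, status))
    return log


plan = plan_model_budgets(1000, [200, 900, 100, 100])
```

With a 1000-second stage budget, the first model trains in 200 seconds, the second is abandoned at the 600-second ceiling, and the remaining 200 seconds fall below the 300-second minimum, so the last two candidates are never started.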
- fit_summary_¶
Summary of the fitting process.
- Type:
- classes_¶
Class labels for classification problems.
- Type:
np.ndarray
- leaderboard_¶
Model performance leaderboard.
- Type:
pd.DataFrame
Examples
Basic usage with CSV file:
>>> predictor = TabularPredictor(label="target")
>>> predictor.fit("train.csv")
>>> predictions = predictor.predict("test.csv")
Using presets:
>>> predictor = TabularPredictor(label="price", presets="best_quality")
>>> predictor.fit(train_df, time_limit=3600)
>>> predictions = predictor.predict(test_df)
With explicit problem type:
>>> predictor = TabularPredictor(
...     label="survived",
...     problem_type="binary",
...     eval_metric="roc_auc",
... )
>>> predictor.fit(train_df)
>>> proba = predictor.predict_proba(test_df)
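The patience rule behind early_stopping_rounds (halt when no improvement is seen for that many consecutive rounds) can be sketched without any boosting library. The GBDT libraries named in the parameter description implement this internally; the helper below, best_round_with_early_stopping, is a hypothetical name and assumes a higher-is-better validation score recorded once per boosting round.

```python
def best_round_with_early_stopping(val_scores, early_stopping_rounds=50):
    """Return (best_round, rounds_run) for a higher-is-better score curve."""
    best_score = float("-inf")
    best_round = -1
    rounds_run = 0
    for round_idx, score in enumerate(val_scores):
        rounds_run = round_idx + 1
        if score > best_score:
            # New best: remember it and reset the patience window.
            best_score, best_round = score, round_idx
        elif round_idx - best_round >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` consecutive rounds.
            break
    return best_round, rounds_run


curve = [0.5, 0.6, 0.7, 0.69, 0.68, 0.67, 0.9]
best_round, rounds_run = best_round_with_early_stopping(curve, early_stopping_rounds=3)
```

In this toy curve training stops after six rounds with round 2 as the best, so the late jump to 0.9 is never reached. That is the usual trade-off of a short patience window, and per the parameter description it applies only to CV scoring, not to the final refit.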
- fit(train_data, tuning_data=None, time_limit=None, presets=None, hyperparameters=None, interpretable_only=False, **kwargs)[source]¶
Fit the AutoML predictor on tabular data.
- Parameters:
train_data (str, Path, DataFrame, or ndarray) – Training data. Can be a file path, DataFrame, or array.
tuning_data (optional) – Validation/tuning data. If None, uses internal holdout.
time_limit (int, optional) – Override the time limit.
presets (str, optional) – Override the preset.
hyperparameters (dict, optional) – Override hyperparameters for specific models.
interpretable_only (bool, default=False) – If True, only use interpretable models. This limits the model search to glass-box models that provide human-understandable explanations, including:
- GAM-style models: EBM, GAM, NAM, NODE-GAM, GAMI-Net
- Rule-based models: CORELS, RuleFit, FURIA, Symbolic Regression
- Sparse linear models: SLIM, FasterRisk, Linear, MARS
- Interpretable trees: GOSDT, C5.0
- Bayesian models: Naive Bayes, TAN, LDA
**kwargs – Additional arguments.
- Return type:
- Returns:
TabularPredictor – The fitted predictor.
- predict_distilled(data)[source]¶
Generate predictions using the distilled model.
The distilled model is a lightweight student model trained via knowledge distillation from the ensemble teacher. It provides faster inference while approximating ensemble accuracy.
- Parameters:
data (str, Path, DataFrame, or ndarray) – Input data to predict on.
- Return type:
- Returns:
np.ndarray – Predictions from the distilled model.
- Raises:
RuntimeError – If predictor is not fitted or no distilled model is available.
- predict_sets(data, alpha=0.1)[source]¶
Generate prediction sets/intervals using conformal prediction.
For classification, returns prediction sets (sets of plausible labels). For regression, returns prediction intervals.
- Parameters:
- Return type:
- Returns:
np.ndarray – For classification: boolean array of shape (n_samples, n_classes) indicating which classes are in each prediction set. For regression: array of shape (n_samples, 2) with [lower, upper] bounds.
- Raises:
RuntimeError – If predictor is not fitted or no conformal predictor is available.
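To make the classification return shape concrete, here is a minimal split-conformal sketch in plain Python. It is an assumption-laden illustration rather than Endgame's implementation: the nonconformity score (one minus the probability of the true class), the helper names conformal_threshold and prediction_sets, and the use of nested lists in place of np.ndarray are all choices made for this example.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Threshold on the score 1 - p(true class), from calibration data."""
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]


def prediction_sets(test_probs, threshold):
    """Boolean membership matrix of shape (n_samples, n_classes)."""
    return [[(1.0 - p) <= threshold for p in probs] for probs in test_probs]


cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
cal_labels = [0, 1, 0, 1]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
sets = prediction_sets([[0.95, 0.05], [0.25, 0.75]], threshold)
```

Each row of sets marks which classes belong to that sample's prediction set. A smaller alpha raises the threshold and therefore yields larger sets; with only four calibration points, alpha=0.1 already forces every class into every set.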
- explain()[source]¶
Get model explanations computed during fitting.
Returns the SHAP-based feature importances and optionally interaction effects that were computed by the explainability stage.
- Return type:
- Returns:
dict – Explanation results including feature_importance_df, top_features, and optionally shap_explanation and feature_interactions.
- Raises:
RuntimeError – If predictor is not fitted or no explanations available.
- display_models(*, top_rules=15, top_features=10, print_output=True)[source]¶
Display learned structures for all trained interpretable models.
Prints rules, trees, equations, scorecards, coefficients, and feature importances for every model that was trained.
- Parameters:
- Return type:
- Returns:
str – Complete formatted text for all models.
Example
>>> predictor = TabularPredictor(label="target", presets="interpretable")
>>> predictor.fit(train_df)
>>> predictor.display_models()
- display_model(model_name, *, top_rules=15, top_features=10, print_output=True)[source]¶
Display the learned structure of a single trained model.
- Parameters:
- Return type:
- Returns:
str – Formatted display text.
Example
>>> predictor.display_model("ebm")
- report()[source]¶
Get the AutoML performance report.
- Return type:
- Returns:
AutoMLReport – Structured report with summary, leaderboard, warnings, etc.
- Raises:
RuntimeError – If predictor is not fitted or no report available.
- get_model(name)[source]¶
Get a specific trained model.
- Parameters:
name (str) – Model name.
- Returns:
estimator – The trained model.
- refit_full(data=None)[source]¶
Retrain best model(s) on all available data (train + validation).
After cross-validation identifies the best model and hyperparameters, this method retrains on the full dataset for maximum deployment performance.
- Parameters:
data (DataInput, optional) – Full dataset including the label column. If None, uses the training data from the last fit() call.
- Return type:
- Returns:
TabularPredictor – Self with models retrained on full data.
- set_fit_request(*, hyperparameters='$UNCHANGED$', interpretable_only='$UNCHANGED$', presets='$UNCHANGED$', time_limit='$UNCHANGED$', train_data='$UNCHANGED$', tuning_data='$UNCHANGED$')¶
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
hyperparameters (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for hyperparameters parameter in fit.
interpretable_only (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for interpretable_only parameter in fit.
presets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for presets parameter in fit.
time_limit (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for time_limit parameter in fit.
train_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for train_data parameter in fit.
tuning_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for tuning_data parameter in fit.
self (TabularPredictor)
- Returns:
self (object) – The updated object.
- Return type:
- set_predict_proba_request(*, data='$UNCHANGED$', model='$UNCHANGED$')¶
Configure whether metadata should be requested to be passed to the predict_proba method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in predict_proba.
model (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for model parameter in predict_proba.
self (TabularPredictor)
- Returns:
self (object) – The updated object.
- Return type:
- set_predict_request(*, data='$UNCHANGED$', model='$UNCHANGED$')¶
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in predict.
model (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for model parameter in predict.
self (TabularPredictor)
- Returns:
self (object) – The updated object.
- Return type:
- class endgame.automl.BasePredictor(label, problem_type='auto', eval_metric='auto', presets='medium_quality', time_limit=None, search_strategy='portfolio', track_experiments=True, output_path=None, random_state=42, verbosity=2, logger=None)[source]¶
Bases: EndgameEstimator, ABC
Base class for all AutoML predictors.
This class provides the common interface and functionality for domain-specific predictors (Tabular, Vision, Text, etc.).
- Parameters:
label (str) – Name of the target column.
problem_type (str, default="auto") – Type of problem: “classification”, “regression”, “multiclass”, or “auto”.
eval_metric (str, default="auto") – Evaluation metric. “auto” selects based on problem type.
presets (str, default="medium_quality") – Quality preset: “best_quality”, “high_quality”, “good_quality”, “medium_quality”, “fast”, “interpretable”.
time_limit (int, optional) – Time limit in seconds. If None, uses preset default.
search_strategy (str, default="portfolio") – Search strategy: “portfolio”, “heuristic”, “genetic”, “random”, “bayesian”.
track_experiments (bool, default=True) – Whether to track experiments to the meta-learning database.
output_path (str, optional) – Path to save outputs (models, logs, etc.).
random_state (int, default=42) – Random seed for reproducibility.
verbosity (int, default=2) – Verbosity level (0=silent, 1=progress, 2=detailed, 3=debug).
logger (ExperimentLogger, optional) – Experiment logger for tracking params, metrics, and artifacts. When provided, fit() automatically logs training configuration and results. When None (default), no tracking overhead is added.
- fit_summary_¶
Summary of the fitting process.
- Type:
- classes_¶
Class labels for classification problems.
- Type:
np.ndarray
- abstractmethod fit(train_data, tuning_data=None, time_limit=None, presets=None, hyperparameters=None, **kwargs)[source]¶
Fit the AutoML predictor.
- Parameters:
train_data (str, Path, DataFrame, or ndarray) – Training data. Can be a file path, DataFrame, or array.
tuning_data (optional) – Validation/tuning data. If None, uses internal holdout.
time_limit (int, optional) – Override the time limit.
presets (str, optional) – Override the preset.
hyperparameters (dict, optional) – Override hyperparameters for specific models.
**kwargs – Additional arguments.
- Return type:
- Returns:
BasePredictor – The fitted predictor.
- evaluate(data, metrics=None, silent=False)[source]¶
Evaluate the predictor on data.
- Parameters:
- Return type:
- Returns:
dict – Dictionary mapping metric names to scores.
- save(path=None)[source]¶
Save the predictor to disk.
Uses the endgame persistence module for individual components while preserving the existing directory layout for backwards compatibility.
- classmethod load(path)[source]¶
Load a predictor from disk.
Supports both the legacy pickle format and the new endgame persistence format.
- Parameters:
path (str) – Path to load from.
- Return type:
- Returns:
BasePredictor – The loaded predictor.
- refit_full(data=None)[source]¶
Retrain best model(s) on all available data (train + validation).
After cross-validation identifies the best model and hyperparameters, this method retrains on the full dataset for maximum deployment performance. The refitted model cannot be evaluated (no holdout).
- Parameters:
data (DataInput, optional) – Full dataset. If None, uses the training data from the last fit() call (subclasses must store it).
- Return type:
- Returns:
BasePredictor – Self with models retrained on full data.
- Raises:
RuntimeError – If the predictor has not been fitted.
- set_fit_request(*, hyperparameters='$UNCHANGED$', presets='$UNCHANGED$', time_limit='$UNCHANGED$', train_data='$UNCHANGED$', tuning_data='$UNCHANGED$')¶
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
hyperparameters (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for hyperparameters parameter in fit.
presets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for presets parameter in fit.
time_limit (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for time_limit parameter in fit.
train_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for train_data parameter in fit.
tuning_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for tuning_data parameter in fit.
self (BasePredictor)
- Returns:
self (object) – The updated object.
- Return type:
- set_predict_proba_request(*, data='$UNCHANGED$', model='$UNCHANGED$')¶
Configure whether metadata should be requested to be passed to the predict_proba method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in predict_proba.
model (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for model parameter in predict_proba.
self (BasePredictor)
- Returns:
self (object) – The updated object.
- Return type:
- set_predict_request(*, data='$UNCHANGED$', model='$UNCHANGED$')¶
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in predict.
model (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for model parameter in predict.
self (BasePredictor)
- Returns:
self (object) – The updated object.
- Return type:
- class endgame.automl.FitSummary(total_time=0.0, n_models_trained=0, n_models_failed=0, best_model='', best_score=0.0, cv_score=0.0, stage_times=<factory>)[source]¶
Bases: object
Summary of the fitting process.
- Parameters:
- endgame.automl.display_model(name, model, feature_names=None, X_sample=None, *, top_rules=15, top_features=10, print_output=True)[source]¶
Display the learned structure of a fitted interpretable model.
- Parameters:
name (str) – Display name for the model (e.g. “EBM”, “RuleFit”).
model (estimator) – A fitted sklearn-compatible estimator.
feature_names (list of str, optional) – Feature names. If None, generic feature names are left unchanged in the output.
X_sample (ndarray, optional) – Sample data for computing per-sample contributions.
top_rules (int, default=15) – Maximum number of rules/terms to display.
top_features (int, default=10) – Maximum number of features in importance displays.
print_output (bool, default=True) – If True, print to stdout. Always returns the full text.
- Return type:
- Returns:
str – The complete formatted display text.
- endgame.automl.display_models(models, feature_names=None, X_sample=None, *, top_rules=15, top_features=10, print_output=True)[source]¶
Display learned structures for multiple models.
- Parameters:
models (dict[str, estimator]) – Mapping of model names to fitted estimators.
feature_names (list of str, optional) – Feature names for readable output.
X_sample (ndarray, optional) – Sample data for per-sample contribution displays.
top_rules (int, default=15) – Max rules/terms per model.
top_features (int, default=10) – Max features per importance display.
print_output (bool, default=True) – If True, print to stdout.
- Return type:
- Returns:
str – Complete formatted text for all models.
- class endgame.automl.PresetConfig(name, description, default_time_limit, cv_folds, num_bag_folds, num_stack_levels, hyperparameter_tune, tune_trials, ensemble_method, calibrate, use_holdout, holdout_frac, feature_engineering, model_pool, time_allocations=<factory>)[source]¶
Bases: object
Configuration for an AutoML preset.
- Parameters:
- feature_engineering¶
Feature engineering level: “none”, “light”, “moderate”, “aggressive”.
- Type:
- endgame.automl.get_preset(name)[source]¶
Get a preset configuration by name.
- Parameters:
name (str) – Name of the preset.
- Return type:
- Returns:
PresetConfig – The preset configuration.
- Raises:
ValueError – If preset name is not recognized.
- class endgame.automl.BaseSearchStrategy(task_type='classification', eval_metric='auto', random_state=None, verbose=0, excluded_models=None)[source]¶
Bases: ABC
Base class for pipeline search strategies.
A search strategy is responsible for suggesting pipeline configurations to try and updating its internal state based on the results.
- Parameters:
- results_¶
Results from all evaluated configurations.
- Type:
list of SearchResult
- best_result_¶
Best result found so far.
- Type:
SearchResult or None
- abstractmethod suggest(meta_features=None, n_suggestions=1)[source]¶
Suggest pipeline configurations to try.
- Parameters:
- Return type:
- Returns:
list of PipelineConfig – Suggested configurations.
- update(result)[source]¶
Update the search strategy with a new result.
- Parameters:
result (SearchResult) – Result from evaluating a configuration.
- Return type:
- get_best(n=1)[source]¶
Get the best results found so far.
- Parameters:
n (int, default=1) – Number of best results to return.
- Return type:
- Returns:
list of SearchResult – Top n results sorted by score (descending).
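The update / get_best contract described above can be sketched with two small stand-in classes. ToyResult and ToyStrategy are hypothetical names that mirror only the documented attributes (results_, best_result_) and a few SearchResult fields; skipping failed results when ranking is an assumption of this example, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ToyResult:
    """Stand-in for SearchResult with only the fields used here."""
    model_name: str
    score: float
    success: bool = True


class ToyStrategy:
    """Keeps every result and tracks the best one (higher score wins)."""

    def __init__(self):
        self.results_ = []
        self.best_result_ = None

    def update(self, result):
        self.results_.append(result)
        is_better = (self.best_result_ is None
                     or result.score > self.best_result_.score)
        if result.success and is_better:
            self.best_result_ = result

    def get_best(self, n=1):
        # Assumption: failed runs are excluded before ranking.
        ok = [r for r in self.results_ if r.success]
        return sorted(ok, key=lambda r: r.score, reverse=True)[:n]


strategy = ToyStrategy()
for res in (ToyResult("a", 0.71), ToyResult("b", 0.84),
            ToyResult("c", 0.99, success=False)):
    strategy.update(res)
top_two = strategy.get_best(n=2)
```

Here top_two holds the results for "b" and then "a", in descending score order, and the failed run "c" never becomes best_result_ despite its higher score.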
- class endgame.automl.PipelineConfig(model_name, model_params=<factory>, preprocessing=<factory>, feature_engineering=<factory>, ensemble_weight=1.0, config_id=None, metadata=<factory>)[source]¶
Bases: object
Represents a complete ML pipeline configuration.
A pipeline config specifies everything needed to train a model: preprocessing steps, model choice, and hyperparameters.
- Parameters:
Examples
>>> config = PipelineConfig(
...     model_name="lgbm",
...     model_params={"n_estimators": 1000, "learning_rate": 0.05},
...     preprocessing=[
...         ("imputer", {"strategy": "median"}),
...         ("encoder", {"method": "target"}),
...     ],
... )
- class endgame.automl.SearchResult(config, score, scores=<factory>, fit_time=0.0, predict_time=0.0, oof_predictions=None, feature_importances=None, success=True, error=None, metadata=<factory>)[source]¶
Bases: object
Result from evaluating a pipeline configuration.
- Parameters:
- config¶
The configuration that was evaluated.
- Type:
- oof_predictions¶
Out-of-fold predictions.
- Type:
np.ndarray, optional
- config: PipelineConfig¶
- class endgame.automl.PortfolioSearch(task_type='classification', eval_metric='auto', model_pool=None, preset='medium_quality', ensure_diversity=True, max_models=None, min_models=1, meta_learner=None, random_state=None, verbose=0, interpretable_only=False)[source]¶
Bases: BaseSearchStrategy
Portfolio-based search strategy with iterative HPO.
Phase 1 (initial sweep): suggests a diverse set of model types. Phase 2 (HPO variants): once all model types are trained, generates hyperparameter variants of the top performers so the continuous optimization loop never runs out of candidates.
- Parameters:
task_type (str) – Task type (“classification” or “regression”).
eval_metric (str) – Evaluation metric to optimize.
model_pool (list of str, optional) – Explicit list of models to consider. If None, uses preset.
preset (str, default="medium_quality") – Preset to use for model pool if model_pool not specified.
ensure_diversity (bool, default=True) – Whether to ensure at least one model from each family.
max_models (int, optional) – Maximum number of models to suggest.
min_models (int, default=1) – Minimum number of models to suggest.
meta_learner (MetaLearner, optional) – Pre-trained meta-learner for model ranking.
random_state (int, optional) – Random seed.
verbose (int, default=0) – Verbosity level.
interpretable_only (bool)
- suggest(meta_features=None, n_suggestions=1)[source]¶
Suggest pipeline configurations to try.
During the initial sweep, suggests new model types. After all model types have been tried, switches to HPO variant generation (perturbing hyperparameters of the best-performing models).
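The switch from the initial sweep to HPO variants can be illustrated with a toy class. This is a simplified sketch under several assumptions: ToyPortfolioSearch is a hypothetical name, a single learning_rate stands in for a full hyperparameter set, and only the single best model is perturbed, whereas the description above speaks of the top performers.

```python
import random


class ToyPortfolioSearch:
    """Two-phase suggestion loop (illustrative stand-in, not Endgame's class)."""

    def __init__(self, model_pool, random_state=0):
        self._pending = list(model_pool)   # model types not yet tried
        self._results = []                 # (score, model_name, params)
        self._rng = random.Random(random_state)

    def suggest(self):
        # Phase 1: initial sweep over untried model types.
        if self._pending:
            return self._pending.pop(0), {"learning_rate": 0.1}
        if not self._results:
            raise ValueError("nothing to suggest: empty pool and no results")
        # Phase 2: perturb the hyperparameters of the best model so far.
        _, name, params = max(self._results, key=lambda r: r[0])
        factor = self._rng.uniform(0.5, 2.0)
        return name, {"learning_rate": params["learning_rate"] * factor}

    def update(self, name, params, score):
        self._results.append((score, name, params))


search = ToyPortfolioSearch(["lgbm", "linear"])
first = search.suggest()
search.update(*first, score=0.8)
second = search.suggest()
search.update(*second, score=0.6)
third = search.suggest()   # pool exhausted, so this is an HPO variant
```

Because phase 2 always has a best result to perturb, suggest() keeps producing candidates for as long as the caller keeps reporting scores through update().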
- update(result)[source]¶
Update strategy with a new result.
- Return type:
- Parameters:
result (SearchResult)
- class endgame.automl.PipelineOrchestrator(preset='medium_quality', time_limit=None, search_strategy=None, verbose=1, checkpoint_callback=None, keep_training=False, patience=5, min_improvement=0.0001, min_model_time=300.0, max_model_time=600.0, eval_metric='auto', excluded_models=None, early_stopping_rounds=50, use_gpu=False)[source]¶
Bases: object
Coordinates AutoML pipeline stages with time budget management.
The orchestrator manages the execution of all pipeline stages, handles time allocation, and provides graceful degradation when stages fail or time runs out.
- Parameters:
preset (str or PresetConfig, default="medium_quality") – Preset configuration to use.
time_limit (int, optional) – Total time budget in seconds. Overrides preset default.
search_strategy (BaseSearchStrategy, optional) – Search strategy for model selection.
verbose (int, default=1) – Verbosity level.
keep_training (bool)
patience (int)
min_improvement (float)
min_model_time (float)
max_model_time (float)
eval_metric (str)
early_stopping_rounds (int)
use_gpu (bool)
- time_manager_¶
Time budget manager.
- Type:
Examples
>>> orchestrator = PipelineOrchestrator(preset="medium_quality")
>>> result = orchestrator.run(X_train, y_train, X_val, y_val)
>>> print(result.score)
- DEFAULT_STAGES = [('profiling', 0.01), ('quality_guardrails', 0.02), ('data_cleaning', 0.02), ('preprocessing', 0.05), ('feature_engineering', 0.03), ('data_augmentation', 0.02), ('model_selection', 0.04), ('model_training', 0.4), ('constraint_check', 0.01), ('hyperparameter_tuning', 0.2), ('ensembling', 0.06), ('threshold_opt', 0.02), ('calibration', 0.03), ('post_training', 0.02), ('explainability', 0.02), ('persistence', 0.01)]¶
- run(X, y, X_val=None, y_val=None, task_type='classification')[source]¶
Execute the full AutoML pipeline.
- Parameters:
X (array-like) – Training feature matrix.
y (array-like) – Training target vector.
X_val (array-like, optional) – Validation feature matrix.
y_val (array-like, optional) – Validation target vector.
task_type (str, default="classification") – Task type.
- Return type:
PipelineResult
- Returns:
PipelineResult – Complete pipeline results.
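As a rough illustration of what the DEFAULT_STAGES fractions mean in practice, the sketch below converts them into seconds for a given time limit. It assumes each fraction is applied proportionally to the total budget; how the orchestrator actually handles rounding, the unallocated remainder, or reallocation is not stated here, and the helper name `stage_budgets` is hypothetical.

```python
# Fractions copied from PipelineOrchestrator.DEFAULT_STAGES as documented above.
DEFAULT_STAGES = [
    ("profiling", 0.01), ("quality_guardrails", 0.02), ("data_cleaning", 0.02),
    ("preprocessing", 0.05), ("feature_engineering", 0.03),
    ("data_augmentation", 0.02), ("model_selection", 0.04),
    ("model_training", 0.4), ("constraint_check", 0.01),
    ("hyperparameter_tuning", 0.2), ("ensembling", 0.06),
    ("threshold_opt", 0.02), ("calibration", 0.03), ("post_training", 0.02),
    ("explainability", 0.02), ("persistence", 0.01),
]


def stage_budgets(total_seconds, stages=DEFAULT_STAGES):
    """Hypothetical helper: proportional per-stage budget in seconds."""
    return {name: fraction * total_seconds for name, fraction in stages}


budgets = stage_budgets(3600)
allocated = sum(fraction for _, fraction in DEFAULT_STAGES)

print(round(allocated, 2))               # 0.96
print(round(budgets["model_training"]))  # 1440
```

The listed fractions sum to 0.96, so under this proportional reading about 4% of the budget is not assigned to any named stage.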
- class endgame.automl.PipelineResult(best_model, ensemble, score, scores, stage_results, total_time, metadata=<factory>)[source]¶
Bases:
object
Complete result from running the AutoML pipeline.
- Parameters:
- best_model¶
Best trained model.
- Type:
Any
- ensemble¶
Final ensemble (if built).
- Type:
Any
- stage_results: dict[str, StageResult]¶
- class endgame.automl.StageResult(stage_name, success, duration, output=None, error=None, metadata=<factory>)[source]¶
Bases:
object
Result from executing a pipeline stage.
- Parameters:
- output¶
Stage output data.
- Type:
Any
- class endgame.automl.TimeBudgetManager(total_budget, allocations, min_stage_time=5.0)[source]¶
Bases:
object
Manages time allocation across AutoML pipeline stages.
This class tracks time spent in each stage and allows for dynamic reallocation of unused time to other stages.
- Parameters:
Examples
>>> allocations = {
...     "profiling": 0.05,
...     "training": 0.70,
...     "ensemble": 0.25,
... }
>>> mgr = TimeBudgetManager(total_budget=600, allocations=allocations)
>>> mgr.start()
>>> mgr.begin_stage("profiling")
>>> # ... do profiling ...
>>> mgr.end_stage()
>>> print(mgr.remaining_budget("training"))  # Training time remaining
- start()[source]¶
Start the overall timer.
- Return type:
- Returns:
TimeBudgetManager – Self for chaining.
- begin_stage(stage_name)[source]¶
Begin a new pipeline stage.
- Parameters:
stage_name (str) – Name of the stage to begin.
- Return type:
- Returns:
float – Budget available for this stage in seconds.
- Raises:
ValueError – If a stage is already in progress.
- end_stage()[source]¶
End the current stage.
- Return type:
- Returns:
float – Duration of the stage in seconds.
- Raises:
ValueError – If no stage is in progress.
- elapsed()[source]¶
Get total elapsed time since start.
- Return type:
- Returns:
float – Elapsed time in seconds.
- is_overtime()[source]¶
Check if overall time budget has been exceeded.
- Return type:
- Returns:
bool – True if overtime.
- redistribute(from_stage, to_stage, fraction=1.0)[source]¶
Redistribute unused time from one stage to another.
- redistribute_remaining(to_stage)[source]¶
Redistribute all unused time from completed stages to a target stage.
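The begin/end contract documented above (a budget returned on begin, a duration returned on end, and ValueError on misuse) can be sketched in a few lines. This is a simplified stand-in, not the library's code: the class name `StageTimer`, the injectable `clock` argument, and the proportional budget calculation are assumptions, and redistribution is left out.

```python
import time


class StageTimer:
    """Illustrative stage timer; mirrors only the documented contract."""

    def __init__(self, total_budget, allocations, clock=time.monotonic):
        self.total_budget = total_budget
        self.allocations = dict(allocations)
        self.clock = clock            # injectable so tests can fake time
        self.start_time = None
        self.current = None           # (stage_name, started_at) or None
        self.spent = {}               # seconds used per finished stage

    def start(self):
        self.start_time = self.clock()
        return self                   # returned for chaining

    def begin_stage(self, stage_name):
        if self.current is not None:
            raise ValueError(f"stage {self.current[0]!r} is already in progress")
        self.current = (stage_name, self.clock())
        # Assumed: budget is the stage's fraction of the total budget.
        return self.allocations.get(stage_name, 0.0) * self.total_budget

    def end_stage(self):
        if self.current is None:
            raise ValueError("no stage is in progress")
        name, started = self.current
        duration = self.clock() - started
        self.spent[name] = self.spent.get(name, 0.0) + duration
        self.current = None
        return duration

    def elapsed(self):
        return self.clock() - self.start_time

    def is_overtime(self):
        return self.elapsed() > self.total_budget
```

Using a monotonic clock avoids negative durations if the system wall clock is adjusted during a long run.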
- class endgame.automl.ModelInfo(name, display_name, family, class_path, task_types=<factory>, supports_sample_weight=True, supports_feature_importance=True, supports_gpu=False, requires_torch=False, requires_julia=False, typical_fit_time='medium', memory_usage='medium', interpretable=False, handles_categorical=False, handles_missing=False, max_samples=None, min_samples=10, default_params=<factory>, tuning_space=None, required_packages=<factory>, notes='')[source]¶
Bases:
object
Information about a model in the registry.
- Parameters:
name (str)
display_name (str)
family (str)
class_path (str)
supports_sample_weight (bool)
supports_feature_importance (bool)
supports_gpu (bool)
requires_torch (bool)
requires_julia (bool)
typical_fit_time (str)
memory_usage (str)
interpretable (bool)
handles_categorical (bool)
handles_missing (bool)
max_samples (int | None)
min_samples (int)
tuning_space (str | None)
notes (str)
- supports_feature_importance¶
Whether the model provides feature_importances_.
- Type:
- endgame.automl.register_model(name, display_name, family, class_path, task_types=None, supports_sample_weight=True, supports_feature_importance=True, supports_gpu=False, requires_torch=False, requires_julia=False, typical_fit_time='medium', memory_usage='medium', interpretable=False, handles_categorical=False, handles_missing=False, max_samples=None, min_samples=10, default_params=None, tuning_space=None, notes='', overwrite=False)[source]¶
Register a new model in the registry.
This function provides a convenient way to add new models to the registry at runtime, useful for plugins or custom model extensions.
- Parameters:
name (str) – Short name for the model (used as key).
display_name (str) – Human-readable name.
family (str) – Model family (gbdt, neural, linear, tree, kernel, rules, bayesian, foundation, ensemble).
class_path (str) – Full import path for the model class.
task_types (list of str, optional) – Supported task types. Default is [“classification”, “regression”].
supports_sample_weight (bool, default=True) – Whether the model supports sample_weight in fit().
supports_feature_importance (bool, default=True) – Whether the model provides feature_importances_.
supports_gpu (bool, default=False) – Whether the model can use GPU acceleration.
requires_torch (bool, default=False) – Whether PyTorch is required.
requires_julia (bool, default=False) – Whether Julia is required.
typical_fit_time (str, default="medium") – Expected fit time category: “fast”, “medium”, “slow”, “very_slow”.
memory_usage (str, default="medium") – Expected memory usage: “low”, “medium”, “high”.
interpretable (bool, default=False) – Whether the model is considered interpretable.
handles_categorical (bool, default=False) – Whether the model natively handles categorical features.
handles_missing (bool, default=False) – Whether the model natively handles missing values.
max_samples (int, optional) – Recommended maximum samples (None = no limit).
min_samples (int, default=10) – Recommended minimum samples.
default_params (dict, optional) – Default hyperparameters for the model.
tuning_space (str, optional) – Name of the predefined tuning space (if any).
notes (str, default="") – Additional notes about the model.
overwrite (bool, default=False) – If True, overwrite existing entry. If False, raise error if exists.
- Return type:
- Returns:
ModelInfo – The registered model info.
- Raises:
ValueError – If model already exists and overwrite=False.
Examples
>>> register_model(
...     name="my_model",
...     display_name="My Custom Model",
...     family="neural",
...     class_path="mypackage.models.MyModelClassifier",
...     supports_gpu=True,
...     requires_torch=True,
...     typical_fit_time="fast",
...     notes="My custom neural network model.",
... )
- endgame.automl.get_model_class(name)[source]¶
Get the model class for a given name.
- Parameters:
name (str) – Model name.
- Return type:
- Returns:
type – The model class.
- Raises:
ImportError – If the model class cannot be imported.
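Resolving a class from a dotted class_path string is usually done with the standard library's importlib. The helper below is a generic sketch of that technique, not the body of get_model_class; the name `resolve_class` is an assumption.

```python
import importlib


def resolve_class(class_path):
    """Import and return the object named by 'package.module.ClassName'."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ImportError(f"{class_path!r} is not a dotted import path")
    # import_module raises ModuleNotFoundError (an ImportError) if missing.
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            f"cannot import {class_name!r} from {module_name!r}"
        ) from exc


ordered_dict_cls = resolve_class("collections.OrderedDict")
```

Deferring the import until a model is requested means optional dependencies (for example PyTorch) only need to be installed when a model that requires them is actually used.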
- endgame.automl.get_default_portfolio(task_type='classification', n_samples=10000, time_budget='medium', gpu_available=False)[source]¶
Get a recommended portfolio of models based on data characteristics.
- endgame.automl.list_models(family=None, task_type=None, interpretable_only=False, gpu_only=False, exclude_slow=False, max_samples=None)[source]¶
List models matching criteria.
- Parameters:
family (str, optional) – Filter by model family.
task_type (str, optional) – Filter by task type (“classification” or “regression”).
interpretable_only (bool, default=False) – Only include interpretable models.
gpu_only (bool, default=False) – Only include GPU-capable models.
exclude_slow (bool, default=False) – Exclude slow and very_slow models.
max_samples (int, optional) – Only include models that can handle this many samples.
- Return type:
- Returns:
list of str – Model names matching criteria.
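The filters accepted by list_models can be pictured as successive predicates over registry entries. The sketch below uses a local `Entry` dataclass with a subset of the ModelInfo fields; the entries, the helper name `filter_models`, and the reading of max_samples (keep a model when it has no limit or its limit is at least the requested size) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Entry:
    """Local stand-in carrying a few ModelInfo-like fields."""
    name: str
    family: str
    task_types: Tuple[str, ...] = ("classification", "regression")
    typical_fit_time: str = "medium"
    interpretable: bool = False
    supports_gpu: bool = False
    max_samples: Optional[int] = None


def filter_models(entries, family=None, task_type=None,
                  interpretable_only=False, gpu_only=False,
                  exclude_slow=False, max_samples=None):
    names = []
    for e in entries:
        if family is not None and e.family != family:
            continue
        if task_type is not None and task_type not in e.task_types:
            continue
        if interpretable_only and not e.interpretable:
            continue
        if gpu_only and not e.supports_gpu:
            continue
        if exclude_slow and e.typical_fit_time in ("slow", "very_slow"):
            continue
        # Assumed semantics: drop models whose limit is below the data size.
        if (max_samples is not None and e.max_samples is not None
                and e.max_samples < max_samples):
            continue
        names.append(e.name)
    return names


entries = [  # made-up entries for illustration
    Entry("ridge", "linear", typical_fit_time="fast", interpretable=True),
    Entry("deep_net", "neural", typical_fit_time="slow", supports_gpu=True),
    Entry("small_kernel", "kernel", max_samples=5000),
]
fast_enough = filter_models(entries, exclude_slow=True, max_samples=100000)
```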
- class endgame.automl.DataLoader(label, problem_type='auto', sample_size=None, random_state=42)[source]¶
Bases:
object
Unified data loader for AutoML.
Handles loading from multiple sources and caches loaded data.
- Parameters:
- train_data_¶
Loaded training data.
- Type:
DataFrame
- X_train_¶
Training features.
- Type:
DataFrame
- y_train_¶
Training target.
- Type:
Series
Examples
>>> loader = DataLoader(label="target")
>>> loader.load_train("train.csv")
>>> X, y = loader.get_train()
>>> X_test = loader.load_test("test.csv")
- load_train(data, **kwargs)[source]¶
Load training data.
- Parameters:
data (str, Path, DataFrame, or ndarray) – Training data source.
**kwargs – Additional arguments for data loading.
- Return type:
DataLoader
- Returns:
self
- load_test(data, **kwargs)[source]¶
Load test data.
- Parameters:
data (str, Path, DataFrame, or ndarray) – Test data source.
**kwargs – Additional arguments for data loading.
- Return type:
pd.DataFrame
- Returns:
DataFrame – Test features.
- get_train()[source]¶
Get training features and target.
- Return type:
tuple[DataFrame, Series]
- Returns:
tuple of (DataFrame, Series) – Training features and target.
- endgame.automl.load_data(data, label=None, **kwargs)[source]¶
Load data from various sources.
- Parameters:
- Return type:
tuple[pd.DataFrame, pd.Series | None]
- Returns:
tuple of (DataFrame, Series or None) – Features and target (if label specified).
Examples
>>> X, y = load_data("train.csv", label="target")
>>> X, y = load_data(df, label="price")
>>> X, _ = load_data("test.csv")  # No label for test data
- endgame.automl.infer_task_type(y, threshold=10)[source]¶
Infer task type from target variable.
- Parameters:
y (array-like) – Target variable.
threshold (int, default=10) – Maximum number of unique values to consider as classification.
- Return type:
str
- Returns:
str – Task type: “binary”, “multiclass”, or “regression”.
Examples
>>> infer_task_type(pd.Series([0, 1, 0, 1]))
'binary'
>>> infer_task_type(pd.Series([0, 1, 2, 3]))
'multiclass'
>>> infer_task_type(pd.Series([1.5, 2.3, 4.1, 5.7]))
'regression'
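One heuristic that is consistent with the three documented examples is sketched below in plain Python. It is a guess at the kind of rule involved, not the library's implementation: the function name `guess_task_type` and the rule that any non-integral float implies regression are assumptions, and only numeric targets are considered.

```python
def guess_task_type(y, threshold=10):
    """Hypothetical heuristic: classify a numeric target by its values."""
    values = list(y)
    # Assumed rule: fractional values indicate a continuous target,
    # even when there are few unique values.
    if any(isinstance(v, float) and not v.is_integer() for v in values):
        return "regression"
    n_unique = len(set(values))
    if n_unique <= 2:
        return "binary"
    if n_unique <= threshold:
        return "multiclass"
    # Many distinct integer values: treat as a count-like regression target.
    return "regression"


small_int_target = guess_task_type([0, 1, 2, 3])
```

The fractional-value check is what separates the third documented example from the second: both have four unique values, which is below the default threshold of 10.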
- class endgame.automl.QualityGuardrailsExecutor(strict=False, leakage_threshold=0.95, redundancy_threshold=0.98)[source]¶
Bases:
BaseStageExecutor
Performs data quality checks early in the pipeline.
Checks for target leakage, feature redundancy, and general data health issues. By default issues are logged as warnings; set strict=True to abort on critical problems.
- Parameters:
strict (bool, default=False) – If True, sets fail_fast=True in context metadata on critical issues, causing the orchestrator to abort early.
leakage_threshold (float, default=0.95) – Absolute correlation with target above which a feature is flagged as potential leakage.
redundancy_threshold (float, default=0.98) – Absolute pairwise correlation above which a feature pair is flagged as redundant.
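The two correlation thresholds can be illustrated with a small pure-Python check. The sketch assumes Pearson correlation and uses made-up column names and data; the executor's actual correlation measure and handling of non-numeric features are not described here, and `pearson` and `check_features` are hypothetical helpers.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def check_features(features, target,
                   leakage_threshold=0.95, redundancy_threshold=0.98):
    # Leakage: a feature that is almost a copy of the target.
    leaks = [name for name, col in features.items()
             if abs(pearson(col, target)) > leakage_threshold]
    # Redundancy: two features that are almost copies of each other.
    redundant = [(a, b)
                 for (a, col_a), (b, col_b) in combinations(features.items(), 2)
                 if abs(pearson(col_a, col_b)) > redundancy_threshold]
    return leaks, redundant


target = [0, 1, 0, 1, 1, 0]
features = {
    "leak": [0, 1, 0, 1, 1, 0],       # identical to the target
    "leak_copy": [1, 3, 1, 3, 3, 1],  # linear transform of "leak"
    "noise": [3, 1, 4, 1, 5, 9],
}
leaks, redundant = check_features(features, target)
```

Taking the absolute value means a strongly negative correlation is flagged as well, which matches the "absolute correlation" wording of both parameters.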
- class endgame.automl.DataQualityWarning(category, severity, message, details=<factory>)[source]¶
Bases:
object
A single data quality issue.
- class endgame.automl.GuardrailsReport(warnings=<factory>, passed=True, n_critical=0, n_warnings=0)[source]¶
Bases:
object
Aggregated result from all guardrail checks.
- Parameters:
warnings (list[DataQualityWarning])
passed (bool)
n_critical (int)
n_warnings (int)
- warnings¶
All detected issues.
- Type:
- warnings: list[DataQualityWarning]¶
- add(warning)[source]¶
Add a warning and update counts.
- Return type:
- Parameters:
warning (DataQualityWarning)
- class endgame.automl.DeploymentConstraints(max_predict_latency_ms=None, max_model_size_mb=None, max_memory_mb=None, require_interpretable=False, max_features=None)[source]¶
Bases:
object
Constraints for model deployment.
- Parameters:
max_predict_latency_ms (float, optional) – Maximum prediction latency per batch of 100 samples (ms).
max_model_size_mb (float, optional) – Maximum serialized model size (MB).
max_memory_mb (float, optional) – Maximum memory usage (MB).
require_interpretable (bool, default=False) – If True, only allow interpretable models.
max_features (int, optional) – Maximum number of features the model can use.
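A constraint check of this kind amounts to comparing measured values against whichever limits are set. The sketch below uses a local `Constraints` dataclass that mirrors the documented fields and a hypothetical `find_violations` helper; the keys of the `measured` dict and the way Endgame measures latency, size, or memory are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraints:
    """Local stand-in mirroring the documented DeploymentConstraints fields."""
    max_predict_latency_ms: Optional[float] = None
    max_model_size_mb: Optional[float] = None
    max_memory_mb: Optional[float] = None
    require_interpretable: bool = False
    max_features: Optional[int] = None


def find_violations(constraints, measured):
    """Return human-readable violations; `measured` keys are hypothetical."""
    limits = [
        ("max_predict_latency_ms", "latency_ms"),
        ("max_model_size_mb", "size_mb"),
        ("max_memory_mb", "memory_mb"),
        ("max_features", "n_features"),
    ]
    violations = []
    for limit_name, key in limits:
        limit = getattr(constraints, limit_name)
        # A limit of None means the constraint is not enforced.
        if limit is not None and measured[key] > limit:
            violations.append(f"{key}={measured[key]} exceeds {limit_name}={limit}")
    if constraints.require_interpretable and not measured["interpretable"]:
        violations.append("model is not interpretable")
    return violations


measured = {"latency_ms": 42.0, "size_mb": 120.0, "memory_mb": 300.0,
            "n_features": 25, "interpretable": False}
violations = find_violations(
    Constraints(max_predict_latency_ms=50.0, max_model_size_mb=100.0),
    measured,
)
```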
- class endgame.automl.AutoMLReport(summary=<factory>, stage_summary=<factory>, model_leaderboard=<factory>, quality_warnings=<factory>, feature_importances=None, tuning_summary=<factory>, constraint_violations=<factory>)[source]¶
Bases:
object
Structured report from an AutoML run.
- Parameters:
- stage_summary¶
Per-stage timing and success status.
- Type:
pd.DataFrame
- model_leaderboard¶
Trained models sorted by score.
- Type:
pd.DataFrame
- feature_importances¶
Feature importance from explainability stage.
- Type:
pd.DataFrame or None
- stage_summary: pd.DataFrame¶
- model_leaderboard: pd.DataFrame¶
- to_html(title='AutoML Report')[source]¶
Render the report as a self-contained HTML page.
Returns a single HTML string with embedded CSS; no external dependencies are required. Suitable for saving as a standalone .html file or embedding in a dashboard.
- class endgame.automl.ReportGenerator[source]¶
Bases:
object
Generates an AutoMLReport from pipeline results.
- generate(pipeline_result, orchestrator, models=None)[source]¶
Generate a report from pipeline execution results.
- Parameters:
pipeline_result (PipelineResult) – Result from orchestrator.run().
orchestrator (PipelineOrchestrator) – The orchestrator that executed the pipeline.
models (dict, optional) – Model info dict from TabularPredictor._models.
- Return type:
- Returns:
AutoMLReport – The generated report.