Models

class endgame.models.GBDTWrapper(backend='lightgbm', task='auto', preset='endgame', use_gpu='auto', categorical_features=None, early_stopping_rounds=100, random_state=None, verbose=False, **kwargs)[source]

Bases: EndgameEstimator

Unified interface for XGBoost, LightGBM, and CatBoost.

Provides consistent API across gradient boosting frameworks with competition-tuned default parameters.

Parameters:
  • backend (str, default='lightgbm') – Boosting library: ‘xgboost’, ‘lightgbm’, ‘catboost’.

  • task (str, default='auto') – Task type: ‘auto’, ‘classification’, ‘regression’.

  • preset (str, default='endgame') – Hyperparameter preset: ‘endgame’, ‘fast’, ‘overfit’, ‘custom’.

  • use_gpu (bool or str, default='auto') – Enable GPU: True, False, or ‘auto’ (auto-detect).

  • categorical_features (List[str], optional) – Columns to treat as categorical.

  • early_stopping_rounds (int, default=100) – Early stopping patience.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • **kwargs – Override preset parameters.
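The interaction between preset and **kwargs can be pictured as a dictionary merge in which explicit keyword arguments take precedence. The sketch below is illustrative only. HYPOTHETICAL_PRESETS and resolve_params are invented names, and the values are placeholders rather than the actual contents of the 'endgame' or 'fast' presets.

```python
from typing import Any

# Hypothetical preset tables for illustration only; GBDTWrapper's real preset values
# are not documented in this reference. 'overfit' is omitted for brevity.
HYPOTHETICAL_PRESETS: dict[str, dict[str, Any]] = {
    "endgame": {"learning_rate": 0.03, "num_leaves": 63, "n_estimators": 5000},
    "fast": {"learning_rate": 0.1, "num_leaves": 31, "n_estimators": 500},
    "custom": {},  # assumed: 'custom' starts empty and relies entirely on kwargs
}


def resolve_params(preset: str, **kwargs: Any) -> dict[str, Any]:
    """Merge a preset with user overrides; explicit keyword arguments win."""
    if preset not in HYPOTHETICAL_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    params = dict(HYPOTHETICAL_PRESETS[preset])  # copy so the shared table is never mutated
    params.update(kwargs)
    return params


params = resolve_params("endgame", learning_rate=0.01)
```

Copying the preset before updating keeps repeated constructions independent of one another.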

model_

Fitted underlying model.

Type:

estimator

feature_importances_

Feature importance dictionary.

Type:

Dict[str, float]

best_iteration_

Best iteration (with early stopping).

Type:

int

Examples

>>> from endgame.models import GBDTWrapper
>>> model = GBDTWrapper(backend='lightgbm', preset='endgame')
>>> model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
>>> predictions = model.predict(X_test)
get_params(deep=True)[source]

Get parameters including kwargs for sklearn clone compatibility.

Return type:

dict[str, Any]

Parameters:

deep (bool)

set_params(**params)[source]

Set parameters including kwargs.

Return type:

GBDTWrapper

fit(X, y, eval_set=None, sample_weight=None, **fit_params)[source]

Fit the model.

Parameters:
  • X (array-like) – Training features.

  • y (array-like) – Target values.

  • eval_set (List[Tuple], optional) – Validation set(s) for early stopping.

  • sample_weight (array-like, optional) – Sample weights.

  • **fit_params – Additional fit parameters.

Return type:

GBDTWrapper

Returns:

self
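When eval_set is supplied, early_stopping_rounds acts as a patience window, and best_iteration_ records where the validation metric last improved. Each backend implements this logic itself, so the following is only a sketch of the common rule, assuming a lower-is-better metric. early_stopping_best_iteration is a hypothetical helper, and the backends differ on whether iterations are counted from 0 or 1.

```python
from typing import Sequence


def early_stopping_best_iteration(val_losses: Sequence[float], patience: int) -> tuple[int, int]:
    """Return (best_iteration, iterations_run) for a lower-is-better validation metric.

    Training halts once `patience` iterations pass without improving on the best loss.
    Iterations are counted from 0 here.
    """
    best_iter, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_iter, best_loss = i, loss      # new best: reset the patience window
        elif i - best_iter >= patience:
            return best_iter, i + 1             # patience exhausted: stop training
    return best_iter, len(val_losses)


best, ran = early_stopping_best_iteration([0.9, 0.7, 0.6, 0.65, 0.66, 0.67, 0.5], patience=3)
```

Here the late improvement to 0.5 is never reached, because training stops after three consecutive non-improving rounds.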

predict(X)[source]

Predict target values.

Parameters:

X (array-like) – Features to predict.

Return type:

ndarray

Returns:

ndarray – Predictions.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like) – Features to predict.

Return type:

ndarray

Returns:

ndarray – Class probabilities.

property feature_importances_: dict[str, float]

Feature importance dictionary.

property best_iteration_: int | None

Best iteration from early stopping.

score(X, y, sample_weight=None)[source]

Return the score on the given data.

For classification, returns accuracy. For regression, returns R² score.

Parameters:
  • X (array-like) – Test features.

  • y (array-like) – True labels or target values.

  • sample_weight (array-like, optional) – Sample weights.

Return type:

float

Returns:

float – Score.
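For reference, the two scores named above can be written out directly. This sketch follows the standard definitions used by scikit-learn's accuracy_score and r2_score, including sample weighting. It assumes, without confirmation from this reference, that GBDTWrapper.score computes the same quantities. The helper names are hypothetical.

```python
from typing import Hashable, Optional, Sequence


def _weights(n: int, sample_weight: Optional[Sequence[float]]) -> list[float]:
    # Unweighted scoring is the special case where every row has weight 1.
    return [1.0] * n if sample_weight is None else [float(w) for w in sample_weight]


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             sample_weight: Optional[Sequence[float]] = None) -> float:
    """Weighted fraction of exactly matching labels (classification score)."""
    w = _weights(len(y_true), sample_weight)
    return sum(wi for t, p, wi in zip(y_true, y_pred, w) if t == p) / sum(w)


def r2(y_true: Sequence[float], y_pred: Sequence[float],
       sample_weight: Optional[Sequence[float]] = None) -> float:
    """Weighted coefficient of determination; assumes y_true is not constant."""
    w = _weights(len(y_true), sample_weight)
    mean = sum(wi * t for t, wi in zip(y_true, w)) / sum(w)
    ss_res = sum(wi * (t - p) ** 2 for t, p, wi in zip(y_true, y_pred, w))
    ss_tot = sum(wi * (t - mean) ** 2 for t, wi in zip(y_true, w))
    return 1.0 - ss_res / ss_tot
```

A constant predictor equal to the target mean scores exactly 0 under R², and perfect predictions score 1.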

set_fit_request(*, eval_set='$UNCHANGED$', sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self (object) – The updated object.

Return type:

GBDTWrapper
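The four request options above amount to a small decision table. The toy function route below (a hypothetical name, not part of scikit-learn or endgame) models what a meta-estimator forwards to fit for one parameter under each setting. It ignores the nesting and validation that the real routing machinery performs.

```python
from typing import Union

# Mirrors the four request options listed above; UNCHANGED is omitted because it
# only means "keep whatever request was set before".
Request = Union[bool, None, str]


def route(request: Request, param: str, passed: dict[str, object]) -> dict[str, object]:
    """Toy model: which keyword arguments a meta-estimator forwards for one parameter.

    `passed` holds the metadata the user gave the meta-estimator, keyed by name.
    """
    if request is True:
        return {param: passed[param]} if param in passed else {}
    if request is False:
        return {}
    if request is None:
        if param in passed:
            raise ValueError(f"{param!r} was provided but its routing was never requested")
        return {}
    # A string is an alias: the user supplies the value under the alias name.
    return {param: passed[request]} if request in passed else {}
```

In real code, with scikit-learn 1.3 or later, routing is switched on with sklearn.set_config(enable_metadata_routing=True), after which a call such as model.set_fit_request(sample_weight=True) marks sample_weight as requested.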

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

GBDTWrapper

class endgame.models.LGBMWrapper(preset='endgame', task='auto', use_goss=False, use_gpu='auto', categorical_features=None, early_stopping_rounds=100, random_state=None, verbose=False, **kwargs)[source]

Bases: GBDTWrapper

LightGBM-specific wrapper that adds an option for Gradient-based One-Side Sampling (use_goss).

Parameters:
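  • preset (str, default='endgame') – Hyperparameter preset; see GBDTWrapper for the available presets.

  • task (str, default='auto') – Task type: ‘auto’, ‘classification’, ‘regression’.

  • use_goss (bool, default=False) – Use Gradient-based One-Side Sampling.

  • use_gpu (bool | str, default='auto') – Enable GPU: True, False, or ‘auto’ (auto-detect).

  • categorical_features (list[str] | None, default=None) – Columns to treat as categorical.

  • early_stopping_rounds (int, default=100) – Early stopping patience.

  • random_state (int | None, default=None) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • **kwargs – Additional parameters that override preset values.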
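GOSS, introduced in the LightGBM paper, keeps every row with a large gradient and only a random fraction of the remaining rows, up-weighting the sampled rows by (1 - a) / b so that gradient statistics stay roughly unbiased. The sketch borrows the names of LightGBM's top_rate (a) and other_rate (b) parameters with their usual defaults. How LGBMWrapper translates use_goss into LightGBM settings is not documented here, and goss_sample is a hypothetical helper.

```python
import random
from typing import Sequence


def goss_sample(gradients: Sequence[float], top_rate: float = 0.2,
                other_rate: float = 0.1, seed: int = 0) -> dict[int, float]:
    """Return {row_index: weight} chosen by Gradient-based One-Side Sampling.

    Assumes 0 < other_rate and top_rate + other_rate <= 1.
    """
    n = len(gradients)
    # Rank rows by gradient magnitude, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = min(int(other_rate * n), n - n_top)
    selected = {i: 1.0 for i in order[:n_top]}   # always keep large-gradient rows
    amplify = (1.0 - top_rate) / other_rate      # compensates for under-sampling the rest
    for i in random.Random(seed).sample(order[n_top:], n_other):
        selected[i] = amplify
    return selected
```

With the defaults, each tree sees 30% of the rows, and each sampled small-gradient row counts eight times.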

Examples

>>> from endgame.models import LGBMWrapper
>>> model = LGBMWrapper(preset='endgame')
>>> model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
set_fit_request(*, eval_set='$UNCHANGED$', sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self (object) – The updated object.

Return type:

LGBMWrapper

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

LGBMWrapper

class endgame.models.XGBWrapper(preset='endgame', task='auto', use_dart=False, use_gpu='auto', categorical_features=None, early_stopping_rounds=100, random_state=None, verbose=False, **kwargs)[source]

Bases: GBDTWrapper

XGBoost-specific wrapper that adds an option for DART boosting (use_dart).

Parameters:
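  • preset (str, default='endgame') – Hyperparameter preset; see GBDTWrapper for the available presets.

  • task (str, default='auto') – Task type: ‘auto’, ‘classification’, ‘regression’.

  • use_dart (bool, default=False) – Use DART boosting.

  • use_gpu (bool | str, default='auto') – Enable GPU: True, False, or ‘auto’ (auto-detect).

  • categorical_features (list[str] | None, default=None) – Columns to treat as categorical.

  • early_stopping_rounds (int, default=100) – Early stopping patience.

  • random_state (int | None, default=None) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • **kwargs – Additional parameters that override preset values.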
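DART applies dropout to boosting. Each round drops a random subset of existing trees, fits the new tree against the ensemble without them, and then rescales so the ensemble's output does not overshoot. The sketch shows only the weight bookkeeping, using the normalization from the original DART paper: with k dropped trees, the new tree gets weight 1/(k+1) and each dropped tree is scaled by k/(k+1). XGBoost's own normalize_type options also factor in the learning rate, so its exact factors differ. dart_round is a hypothetical helper.

```python
import random


def dart_round(tree_weights: list[float], rate_drop: float,
               rng: random.Random) -> tuple[list[float], list[int]]:
    """One DART round on the ensemble's tree weights, using paper-style normalization.

    Returns the updated weights (new tree appended last) and the indices that were dropped.
    Fitting the new tree itself is outside the scope of this sketch.
    """
    dropped = [i for i in range(len(tree_weights)) if rng.random() < rate_drop]
    k = len(dropped)
    updated = list(tree_weights)
    for i in dropped:
        updated[i] *= k / (k + 1)    # shrink each dropped tree
    updated.append(1.0 / (k + 1))    # the new tree enters with weight 1/(k+1)
    return updated, dropped
```

When no trees are dropped, the round reduces to ordinary boosting and the new tree enters with full weight.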

Examples

>>> from endgame.models import XGBWrapper
>>> model = XGBWrapper(preset='endgame')
>>> model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
set_fit_request(*, eval_set='$UNCHANGED$', sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self (object) – The updated object.

Return type:

XGBWrapper

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

XGBWrapper

class endgame.models.CatBoostWrapper(preset='endgame', task='auto', auto_class_weights=None, use_gpu='auto', categorical_features=None, early_stopping_rounds=100, random_state=None, verbose=False, **kwargs)[source]

Bases: GBDTWrapper

CatBoost-specific wrapper with native categorical handling.

Parameters:
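  • preset (str, default='endgame') – Hyperparameter preset; see GBDTWrapper for the available presets.

  • task (str, default='auto') – Task type: ‘auto’, ‘classification’, ‘regression’.

  • auto_class_weights (str, optional) – Auto class weighting: ‘Balanced’, ‘SqrtBalanced’.

  • use_gpu (bool | str, default='auto') – Enable GPU: True, False, or ‘auto’ (auto-detect).

  • categorical_features (list[str] | None, default=None) – Columns to treat as categorical.

  • early_stopping_rounds (int, default=100) – Early stopping patience.

  • random_state (int | None, default=None) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • **kwargs – Additional parameters that override preset values.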
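The CatBoost documentation defines 'Balanced' as giving class k the weight max_c(n_c) / n_k, where n_c is the (sample-weighted) size of class c, and 'SqrtBalanced' as the square root of that ratio. Below is a minimal sketch with unit sample weights. It assumes CatBoostWrapper passes the option through unchanged, and compute_class_weights is a hypothetical helper, not the wrapper's code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def compute_class_weights(y: Sequence[Hashable], mode: str = "Balanced") -> dict[Hashable, float]:
    """Per-class weights following CatBoost's documented formulas (unit sample weights)."""
    counts = Counter(y)
    largest = max(counts.values())
    if mode == "Balanced":
        return {c: largest / n for c, n in counts.items()}
    if mode == "SqrtBalanced":
        return {c: math.sqrt(largest / n) for c, n in counts.items()}
    raise ValueError(f"unsupported mode: {mode!r}")
```

The majority class always keeps weight 1. 'SqrtBalanced' gives a milder correction than 'Balanced' for heavily skewed labels.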

Examples

>>> from endgame.models import CatBoostWrapper
>>> model = CatBoostWrapper(preset='endgame', categorical_features=['category'])
>>> model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
set_fit_request(*, eval_set='$UNCHANGED$', sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self (object) – The updated object.

Return type:

CatBoostWrapper

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

CatBoostWrapper

class endgame.models.RotationForestClassifier(n_estimators=10, n_subsets=3, max_features=0.5, base_estimator=None, bootstrap=True, random_state=None, n_jobs=1, verbose=False)[source]

Bases: ClassifierMixin, BaseRotationForest

Rotation Forest for classification.

Parameters:
  • n_estimators (int, default=10) – Number of trees.

  • n_subsets (int, default=3) – Number of feature subsets per tree.

  • max_features (float, default=0.5) – Fraction of features per subset.

  • base_estimator (estimator, optional) – Base tree. Default: DecisionTreeClassifier.

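  • bootstrap (bool, default=True) – Whether to draw bootstrap samples when building each tree.

  • random_state (int, optional) – Random seed.

  • n_jobs (int, default=1) – Number of parallel jobs.

  • verbose (bool, default=False) – Enable verbose output.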
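In the classic Rotation Forest algorithm, each tree randomly partitions the features into n_subsets groups and fits PCA on (a sample of) the training rows restricted to each group. The component matrices are then assembled into a block-diagonal rotation that is applied before the base tree is trained. The sketch below covers only the partitioning and bootstrap steps, so that it stays dependency-free. How max_features interacts with n_subsets in this implementation is not documented here, and both helper names are hypothetical.

```python
import random


def partition_features(n_features: int, n_subsets: int,
                       rng: random.Random) -> list[list[int]]:
    """Randomly split feature indices into n_subsets disjoint, near-equal groups."""
    indices = list(range(n_features))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin so group sizes differ by at most one.
    return [sorted(indices[k::n_subsets]) for k in range(n_subsets)]


def bootstrap_rows(n_samples: int, rng: random.Random) -> list[int]:
    """Draw n_samples row indices with replacement (what bootstrap=True suggests)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(42)
groups = partition_features(10, 3, rng)
rows = bootstrap_rows(50, rng)
```

Because every tree draws its own partition, the rotations differ across the ensemble. This diversity is the main source of Rotation Forest's variance reduction.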

Examples

>>> from endgame.models import RotationForestClassifier
>>> clf = RotationForestClassifier(n_estimators=20)
>>> clf.fit(X_train, y_train)
>>> predictions = clf.predict(X_test)
fit(X, y, **fit_params)[source]

Fit the classifier.

Return type:

RotationForestClassifier

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

ndarray of shape (n_samples,) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

ndarray of shape (n_samples, n_classes) – Class probabilities.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

RotationForestClassifier

class endgame.models.RotationForestRegressor(n_estimators=10, n_subsets=3, max_features=0.5, base_estimator=None, bootstrap=True, random_state=None, n_jobs=1, verbose=False)[source]

Bases: BaseRotationForest, RegressorMixin

Rotation Forest for regression.

Parameters:
  • n_estimators (int, default=10) – Number of trees.

  • n_subsets (int, default=3) – Number of feature subsets per tree.

  • max_features (float, default=0.5) – Fraction of features per subset.

  • base_estimator (estimator, optional) – Base tree. Default: DecisionTreeRegressor.

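  • bootstrap (bool, default=True) – Whether to draw bootstrap samples when building each tree.

  • random_state (int, optional) – Random seed.

  • n_jobs (int, default=1) – Number of parallel jobs.

  • verbose (bool, default=False) – Enable verbose output.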

Examples

>>> from endgame.models import RotationForestRegressor
>>> reg = RotationForestRegressor(n_estimators=20)
>>> reg.fit(X_train, y_train)
>>> predictions = reg.predict(X_test)
predict(X)[source]

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

ndarray of shape (n_samples,) – Predicted values.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

RotationForestRegressor

class endgame.models.C50Classifier(min_cases=2, cf=0.25, use_subset=True, global_pruning=True, use_rust=False, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

C5.0 Decision Tree Classifier.

A high-performance implementation of the C5.0 decision tree algorithm with support for continuous and categorical features, missing values, and sophisticated pruning.

Parameters:
  • min_cases (int, default=2) – Minimum number of cases in a branch.

  • cf (float, default=0.25) – Confidence factor for pruning. Lower values produce more aggressive pruning.

  • use_subset (bool, default=True) – Use subset splits for categorical attributes.

  • global_pruning (bool, default=True) – Apply global pruning in addition to local pruning.

  • use_rust (bool, default=False) – Use Rust backend if available. Disabled by default due to a classification routing bug in the current Rust extension.

  • random_state (int or None, default=None) – Random state for reproducibility.
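The cf parameter follows the pessimistic pruning used in the C4.5 family. A node's observed error rate is replaced by an upper confidence bound, and a subtree is pruned when the replacement leaf's estimated errors are no worse than the subtree's combined estimate. The sketch uses the common normal approximation to that bound (C4.5 itself uses an exact binomial bound). It therefore illustrates the effect of cf rather than reproducing C50Classifier's arithmetic. pessimistic_error is a hypothetical helper.

```python
import math
from statistics import NormalDist


def pessimistic_error(errors: float, n: float, cf: float = 0.25) -> float:
    """Upper confidence bound on a node's error rate (normal approximation).

    Smaller cf -> larger z -> larger estimated error -> more subtrees get pruned.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1.0 - cf)   # one-sided z-score for confidence factor cf
    f = errors / n                       # observed error rate
    numerator = f + z * z / (2 * n) + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return numerator / (1 + z * z / n)
```

The bound shrinks toward the observed rate as n grows. Small leaves are therefore penalized most, which is what drives pruning.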

tree_

The fitted decision tree.

Type:

TreeNode

n_classes_

Number of classes.

Type:

int

n_features_in_

Number of features.

Type:

int

classes_

Unique class labels.

Type:

ndarray

feature_importances_

Feature importances based on split gains.

Type:

ndarray

Examples

>>> from endgame.models.trees import C50Classifier
>>> clf = C50Classifier()
>>> clf.fit(X_train, y_train)
>>> predictions = clf.predict(X_test)
fit(X, y, sample_weight=None, categorical_features=None)[source]

Fit the C5.0 classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

  • categorical_features (list of int, optional) – Indices of categorical features.

Return type:

C50Classifier

Returns:

self (C50Classifier) – Fitted classifier.

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples.

Return type:

ndarray

Returns:

y (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

property feature_importances_: ndarray

Feature importances.

get_structure(feature_names=None)[source]

Get a human-readable representation of the decision tree structure.

Parameters:

feature_names (list of str, optional) – Names for the features. If None, uses feature indices.

Return type:

str

Returns:

structure (str) – Text representation of the tree.

summary(feature_names=None)[source]

Alias for get_structure for API consistency.

Return type:

str

Parameters:

feature_names (list[str] | None)

set_fit_request(*, categorical_features='$UNCHANGED$', sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • categorical_features (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for categorical_features parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self (object) – The updated object.

Return type:

C50Classifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

C50Classifier

class endgame.models.C50Ensemble(n_trials=10, min_cases=2, cf=0.25, use_subset=True, use_rust=False, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

Boosted C5.0 Ensemble Classifier.

Uses AdaBoost-style boosting to combine multiple C5.0 trees.

Parameters:
  • n_trials (int, default=10) – Number of boosting iterations.

  • min_cases (int, default=2) – Minimum cases per branch.

  • cf (float, default=0.25) – Confidence factor for pruning.

  • use_subset (bool, default=True) – Use subset splits for categorical attributes.

  • use_rust (bool, default=False) – Use Rust backend if available. Disabled by default due to a classification routing bug in the current Rust extension.

  • random_state (int or None, default=None) – Random state for reproducibility.
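The description above says only that the boosting is AdaBoost-style. As a generic illustration, one AdaBoost.M1 round and the weighted vote over fitted trees look like this. This is not a transcription of C50Ensemble's update rule, since C5.0's boosting differs in detail, and both helper names are hypothetical.

```python
import math
from typing import Hashable, Sequence


def adaboost_round(weights: Sequence[float], y_true: Sequence[Hashable],
                   y_pred: Sequence[Hashable]) -> tuple[list[float], float]:
    """One AdaBoost.M1 round: return renormalized sample weights and the tree's vote weight."""
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if err == 0.0 or err >= 0.5:
        # A real implementation would stop boosting here; this sketch just
        # gives the tree no vote and leaves the (normalized) weights unchanged.
        return [w / total for w in weights], 0.0
    alpha = math.log((1.0 - err) / err)          # vote weight of this tree
    beta = err / (1.0 - err)                     # < 1, shrinks correctly classified rows
    updated = [w * (beta if t == p else 1.0) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return [w / norm for w in updated], alpha


def weighted_vote(per_tree_preds: Sequence[Sequence[Hashable]],
                  alphas: Sequence[float]) -> list[Hashable]:
    """Combine per-tree label predictions, each tree voting with its alpha."""
    combined = []
    for row in zip(*per_tree_preds):           # one tuple of tree predictions per sample
        scores: dict[Hashable, float] = {}
        for label, alpha in zip(row, alphas):
            scores[label] = scores.get(label, 0.0) + alpha
        combined.append(max(scores, key=scores.__getitem__))
    return combined
```

After a round, misclassified rows carry more of the total weight, so the next tree concentrates on them. The alphas play the role that estimator_weights_ plays in voting.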

estimators_

The fitted trees.

Type:

list of C50Classifier

estimator_weights_

Weights for each tree in voting.

Type:

ndarray

classes_

Unique class labels.

Type:

ndarray

fit(X, y, categorical_features=None)[source]

Fit the boosted ensemble.

Return type:

C50Ensemble

Parameters:
  • X (ArrayLike)

  • y (ArrayLike)

  • categorical_features (list[int] | None)

predict(X)[source]

Predict class labels.

Return type:

ndarray

Parameters:

X (ArrayLike)

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

Parameters:

X (ArrayLike)

get_structure(feature_names=None)[source]

Get a summary of the boosted ensemble structure.

Parameters:

feature_names (list of str, optional) – Names for the features.

Return type:

str

Returns:

structure (str) – Text representation of the ensemble.

summary(feature_names=None)[source]

Alias for get_structure for API consistency.

Return type:

str

Parameters:

feature_names (list[str] | None)

set_fit_request(*, categorical_features='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • categorical_features (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for categorical_features parameter in fit.

Returns:

self (object) – The updated object.

Return type:

C50Ensemble

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (C50Ensemble)

Returns:

self (object) – The updated object.

Return type:

C50Ensemble

class endgame.models.CubistRegressor(committees=1, neighbors=0, min_cases=10, max_rules=0, sample=1.0, extrapolation=0.05, unbiased=False, use_rust=True, random_state=None)[source]

Bases: BaseEstimator, RegressorMixin

Cubist Regression Model.

A high-performance implementation of the Cubist algorithm that combines decision trees with linear regression models. The resulting model consists of a set of rules, where each rule has conditions and a linear model for prediction.

Parameters:
  • committees (int, default=1) – Number of committee members (trees) to build. Using multiple committees creates a boosted ensemble where each subsequent model focuses on the residuals from previous models.

  • neighbors (int, default=0) – Number of nearest neighbors to use for instance-based correction. Set to 0 to disable instance-based correction. When enabled, predictions are adjusted based on the residuals of nearby training instances.

  • min_cases (int, default=10) – Minimum number of cases in a node before splitting is considered.

  • max_rules (int, default=0) – Maximum number of rules to generate (0 = unlimited).

  • sample (float, default=1.0) – Fraction of training data to use in each committee member.

  • extrapolation (float, default=0.05) – Amount of extrapolation allowed beyond training range (as fraction).

  • unbiased (bool, default=False) – If True, use unbiased splitting criterion.

  • use_rust (bool, default=True) – Use the Rust backend if available for better performance.

  • random_state (int or None, default=None) – Random state for reproducibility.
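When neighbors > 0, Cubist blends each rule-based prediction with nearby training cases. Each neighbour contributes its observed target, shifted by the difference the rule model predicts between the query point and that neighbour. The sketch below illustrates this adjustment under two assumptions. First, it averages the k neighbours with equal weight; Cubist itself uses distance-based weights, and endgame's exact weighting is not described here. Second, `knn_corrected_predict` and `model_predict` are hypothetical names.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_corrected_predict(model_predict, X_train, y_train, X_new, k=5):
    """Hypothetical helper: Cubist-style instance-based correction (unweighted)."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_new)          # (n_new, k) indices of nearest training rows
    f_new = model_predict(X_new)           # rule-model prediction at each query point
    f_train = model_predict(X_train)       # rule-model prediction at each training row
    # Neighbour n votes y_n + (f(x) - f(x_n)): its observed target, shifted by how
    # much the rule model thinks the query point differs from that neighbour.
    votes = y_train[idx] + (f_new[:, None] - f_train[idx])
    return votes.mean(axis=1)
```

With a perfect rule model the correction is a no-op. With a constant-zero model it reduces to plain k-NN regression. These two extremes make the sketch easy to check.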

n_features_in_

Number of features seen during fit.

Type:

int

n_rules_

Number of rules in the final model.

Type:

int

feature_importances_

Feature importances based on usage in splits.

Type:

ndarray of shape (n_features,)

Examples

>>> from endgame.models.trees import CubistRegressor
>>> import numpy as np
>>> X = np.random.randn(100, 5)
>>> y = X[:, 0] * 2 + X[:, 1] * 3 + np.random.randn(100) * 0.1
>>> reg = CubistRegressor(committees=5, neighbors=5)
>>> reg.fit(X, y)
>>> predictions = reg.predict(X[:10])

Notes

Cubist was developed by Ross Quinlan as a commercial product. This implementation is based on the algorithm described in the open-source C code released under GPL.

The algorithm works as follows:

  1. Build a regression tree by recursively splitting the data to minimize variance in the target variable.

  2. At each leaf node, fit a linear model using the cases at that node.

  3. Extract rules from the tree paths.

  4. Prune rules to remove redundant conditions.

  5. Optionally build multiple trees (committees) using boosting.

  6. Optionally apply instance-based correction using k-NN.
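The first two steps, a tree partition with a linear model in each leaf, can be illustrated with scikit-learn. A shallow DecisionTreeRegressor defines the partitions, and an ordinary least-squares model is fitted to the cases in each leaf. This is a toy model tree built under simplifying assumptions, not Cubist itself. It skips rule extraction, pruning, smoothing, committees, and the k-NN step, and `ToyModelTree` is a hypothetical name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class ToyModelTree:
    """Hypothetical, simplified model tree: tree partitions plus leaf-wise linear models."""

    def __init__(self, min_samples_leaf=10, max_depth=3):
        self.tree = DecisionTreeRegressor(
            min_samples_leaf=min_samples_leaf, max_depth=max_depth, random_state=0
        )

    def fit(self, X, y):
        self.tree.fit(X, y)                    # step 1: variance-reducing partition
        leaves = self.tree.apply(X)
        self.models = {}
        for leaf in np.unique(leaves):         # step 2: one linear model per leaf
            mask = leaves == leaf
            self.models[leaf] = LinearRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        out = np.empty(X.shape[0])
        for leaf in np.unique(leaves):         # route each row to its leaf's linear model
            mask = leaves == leaf
            out[mask] = self.models[leaf].predict(X[mask])
        return out
```

On data with a linear trend, the leaf-wise linear models fit far better than the constant leaf values of a plain regression tree of the same depth. This is the main motivation for the model-tree design.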

fit(X, y, sample_weight=None)[source]

Fit the Cubist regression model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights. Currently not fully supported.

Return type:

CubistRegressor

Returns:

self (CubistRegressor) – Fitted regressor.

predict(X)[source]

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples.

Return type:

ndarray

Returns:

y (ndarray of shape (n_samples,)) – Predicted values.

property feature_importances_: ndarray

Feature importances based on split usage.

property n_rules_: int

Number of rules in the model.

get_params(deep=True)[source]

Get parameters for this estimator.

Return type:

dict[str, Any]

Parameters:

deep (bool)

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (CubistRegressor)

Returns:

self (object) – The updated object.

Return type:

CubistRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (CubistRegressor)

Returns:

self (object) – The updated object.

Return type:

CubistRegressor

set_params(**params)[source]

Set parameters for this estimator.

Return type:

CubistRegressor

class endgame.models.ObliqueRandomForestRegressor(n_estimators=100, oblique_method='ridge', criterion='squared_error', max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features='sqrt', max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, feature_combinations=2, ridge_alpha=1.0)[source]

Bases: BaseEstimator, RegressorMixin

Oblique Random Forest for regression.

Same as ObliqueRandomForestClassifier but for continuous targets. Uses variance reduction (MSE) as the splitting criterion.

Parameters:
  • n_estimators (int, default=100) – Number of trees in the forest.

  • oblique_method (str, default='ridge') – Method for finding oblique split directions: ‘ridge’ (ridge regression, recommended), ‘pca’ (principal component analysis), ‘random’ (random projections, fastest), or ‘householder’ (Householder reflections). ‘lda’ and ‘svm’ fall back to ‘ridge’ for regression.

  • criterion (str, default='squared_error') – Splitting criterion: ‘squared_error’ (mean squared error, i.e. variance reduction) or ‘absolute_error’ (mean absolute error).

  • max_depth (int, default=None) – Maximum depth of each tree.

  • min_samples_split (int or float, default=2) – Minimum samples required to split a node.

  • min_samples_leaf (int or float, default=1) – Minimum samples required at a leaf.

  • max_features (int, float, str, or None, default='sqrt') – Features to consider per split.

  • max_leaf_nodes (int, default=None) – Maximum leaf nodes per tree.

  • min_impurity_decrease (float, default=0.0) – Minimum impurity decrease for split.

  • bootstrap (bool, default=True) – Whether to use bootstrap sampling.

  • oob_score (bool, default=False) – Whether to compute out-of-bag R² score.

  • n_jobs (int, default=None) – Number of parallel jobs.

  • random_state (int, RandomState, or None, default=None) – Random seed.

  • verbose (int, default=0) – Verbosity level.

  • warm_start (bool, default=False) – If True, reuse previous fit and add more trees.

  • feature_combinations (int, default=2) – Number of features per random combination (used by the ‘random’ method).

  • ridge_alpha (float, default=1.0) – Ridge regularization strength.
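With oblique_method='ridge', each split uses a linear combination of features rather than a single feature. The sketch below shows one way to find such a split at a single node. It assumes that the direction is the ridge-regression coefficient vector, with ridge_alpha as the penalty, and that the threshold is chosen by variance reduction along that direction. The function name and details are hypothetical and may differ from endgame's implementation, which also subsamples features via max_features.

```python
import numpy as np


def ridge_oblique_split(X, y, alpha=1.0, min_leaf=1):
    """Hypothetical helper: ridge direction, then the best variance-reducing threshold."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Closed-form ridge solution w = (Xc^T Xc + alpha I)^-1 Xc^T yc is the split direction.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    z = X @ w                                    # project every sample onto that direction
    order = np.argsort(z)
    z_s, y_s = z[order], y[order]
    n = len(y)
    csum, csq = np.cumsum(y_s), np.cumsum(y_s ** 2)   # prefix sums give each SSE in O(1)
    best_sse, best_thr = np.inf, None
    for i in range(min_leaf, n - min_leaf + 1):  # left child = first i sorted samples
        if z_s[i - 1] == z_s[i]:
            continue                             # cannot split between tied projections
        left_sse = csq[i - 1] - csum[i - 1] ** 2 / i
        r_sum, r_sq = csum[-1] - csum[i - 1], csq[-1] - csq[i - 1]
        right_sse = r_sq - r_sum ** 2 / (n - i)
        if left_sse + right_sse < best_sse:
            best_sse = left_sse + right_sse
            best_thr = 0.5 * (z_s[i - 1] + z_s[i])
    return w, best_thr, best_sse
```

On a target that jumps across the diagonal x0 + x1 = 0, a single oblique split of this kind nearly separates the two regions. An axis-aligned tree would need a staircase of splits to achieve the same separation.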

estimators_

The fitted tree estimators.

Type:

list of ObliqueDecisionTreeRegressor

n_features_in_

Number of features seen during fit.

Type:

int

feature_importances_

Impurity-based feature importances.

Type:

ndarray of shape (n_features_in_,)

oob_score_

Out-of-bag R² score (if oob_score=True).

Type:

float

oob_prediction_

OOB predictions (if oob_score=True).

Type:

ndarray of shape (n_samples,)

Examples

>>> from endgame.models.trees import ObliqueRandomForestRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
>>> reg = ObliqueRandomForestRegressor(n_estimators=100, random_state=42)
>>> reg.fit(X, y)
>>> print(reg.score(X, y))
fit(X, y, sample_weight=None)[source]

Build an oblique random forest from the training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Return type:

ObliqueRandomForestRegressor

Returns:

self (object) – Fitted estimator.

predict(X)[source]

Predict target values for samples in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

apply(X)[source]

Apply trees to X, return leaf indices.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

X_leaves (ndarray of shape (n_samples, n_estimators)) – Leaf indices for each sample in each tree.

property n_estimators_: int

Number of fitted estimators.

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (ObliqueRandomForestRegressor)

Returns:

self (object) – The updated object.

Return type:

ObliqueRandomForestRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (ObliqueRandomForestRegressor)

Returns:

self (object) – The updated object.

Return type:

ObliqueRandomForestRegressor

class endgame.models.ObliqueDecisionTreeClassifier(oblique_method='ridge', criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=None, min_impurity_decrease=0.0, random_state=None, ridge_alpha=1.0, feature_combinations=2)[source]

Bases: ClassifierMixin, BaseEstimator

A single oblique decision tree for classification.

This is the base estimator used by ObliqueRandomForestClassifier. Uses linear combinations of features for splits, enabling better capture of diagonal decision boundaries.

Parameters:
  • oblique_method (str, default='ridge') – Method for finding oblique splits: ‘ridge’ (ridge regression on class labels, recommended), ‘pca’ (principal component analysis), ‘lda’ (linear discriminant analysis), ‘random’ (random projections, fastest), ‘svm’ (linear SVM hyperplane), or ‘householder’ (Householder reflections).

  • criterion (str, default='gini') – Splitting criterion: ‘gini’ or ‘entropy’.

  • max_depth (int, default=None) – Maximum tree depth. None means unlimited.

  • min_samples_split (int or float, default=2) – Minimum samples required to split a node. If float, fraction of total samples.

  • min_samples_leaf (int or float, default=1) – Minimum samples required at a leaf. If float, fraction of total samples.

  • max_features (int, float, str, or None, default=None) – Number of features to consider per split: an int uses exactly max_features; a float uses max_features * n_features (a fraction); ‘sqrt’ uses sqrt(n_features); ‘log2’ uses log2(n_features); None uses all features.

  • min_impurity_decrease (float, default=0.0) – Minimum impurity decrease required for split.

  • random_state (int, RandomState, or None, default=None) – Random seed.

  • ridge_alpha (float, default=1.0) – Ridge regularization for ‘ridge’ method.

  • feature_combinations (int, default=2) – Features per random combination (for ‘random’ method).
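The max_features options map to a per-split feature count. The sketch below assumes that endgame follows scikit-learn's rounding, which truncates the result but never returns fewer than one feature. `resolve_max_features` is a hypothetical name. A float 1.0 means all features, while the integer 1 means a single feature.

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper following scikit-learn's conventions for max_features."""
    if max_features is None:
        return n_features                               # all features
    if isinstance(max_features, str):
        if max_features == "sqrt":
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"unknown max_features: {max_features!r}")
    if isinstance(max_features, float):
        return max(1, int(max_features * n_features))   # fraction of all features
    return int(max_features)                            # explicit count
```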

tree_

The root node of the fitted tree.

Type:

ObliqueTreeNode

classes_

Unique class labels.

Type:

ndarray of shape (n_classes,)

n_classes_

Number of classes.

Type:

int

n_features_in_

Number of features seen during fit.

Type:

int

feature_importances_

Impurity-based feature importances.

Type:

ndarray of shape (n_features_in_,)

n_nodes_

Number of nodes in the tree.

Type:

int

fit(X, y, sample_weight=None)[source]

Build the oblique decision tree.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target class labels.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Return type:

ObliqueDecisionTreeClassifier

Returns:

self (object) – Fitted estimator.

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

apply(X)[source]

Return leaf indices for samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

X_leaves (ndarray of shape (n_samples,)) – Leaf node id for each sample.

decision_path(X)[source]

Return decision path through the tree.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

tuple[ndarray, ndarray]

Returns:

indicator (ndarray of shape (n_samples, n_nodes)) – Dense matrix where element [i, j] = 1 if sample i passes through node j.

get_depth()[source]

Return the maximum depth of the tree.

Return type:

int

get_n_leaves()[source]

Return the number of leaves.

Return type:

int

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (ObliqueDecisionTreeClassifier)

Returns:

self (object) – The updated object.

Return type:

ObliqueDecisionTreeClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (ObliqueDecisionTreeClassifier)

Returns:

self (object) – The updated object.

Return type:

ObliqueDecisionTreeClassifier

class endgame.models.ObliqueDecisionTreeRegressor(oblique_method='ridge', criterion='squared_error', max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=None, min_impurity_decrease=0.0, random_state=None, ridge_alpha=1.0, feature_combinations=2)[source]

Bases: BaseEstimator, RegressorMixin

A single oblique decision tree for regression.

This is the base estimator used by ObliqueRandomForestRegressor. Uses linear combinations of features for splits, enabling better capture of diagonal decision boundaries.

Parameters:
  • oblique_method (str, default='ridge') – Method for finding oblique splits: ‘ridge’ (ridge regression, recommended), ‘pca’ (principal component analysis), ‘random’ (random projections, fastest), or ‘householder’ (Householder reflections). ‘lda’ and ‘svm’ are not available for regression.

  • criterion (str, default='squared_error') – Splitting criterion: ‘squared_error’ or ‘absolute_error’.

  • max_depth (int, default=None) – Maximum tree depth. None means unlimited.

  • min_samples_split (int or float, default=2) – Minimum samples required to split a node.

  • min_samples_leaf (int or float, default=1) – Minimum samples required at a leaf.

  • max_features (int, float, str, or None, default=None) – Features to consider per split.

  • min_impurity_decrease (float, default=0.0) – Minimum impurity decrease required for split.

  • random_state (int, RandomState, or None, default=None) – Random seed.

  • ridge_alpha (float, default=1.0) – Ridge regularization for ‘ridge’ method.

  • feature_combinations (int, default=2) – Features per random combination (for ‘random’ method).
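For oblique_method='random', feature_combinations sets how many features enter each random linear combination. The sketch below assumes coefficients drawn uniformly from [-1, 1], as in Breiman's Forest-RC. Endgame's actual coefficient distribution, and how it chooses among several candidate directions, are not documented here. `random_oblique_direction` is a hypothetical name.

```python
import numpy as np


def random_oblique_direction(n_features, feature_combinations=2, rng=None):
    """Hypothetical helper: sparse random projection in the style of Forest-RC."""
    rng = np.random.default_rng(rng)
    k = min(feature_combinations, n_features)
    w = np.zeros(n_features)
    chosen = rng.choice(n_features, size=k, replace=False)  # which features to combine
    w[chosen] = rng.uniform(-1.0, 1.0, size=k)              # random coefficients in [-1, 1]
    return w
```

Projecting the node's samples onto the direction (X @ w) gives a single derived feature, which can then be thresholded like an ordinary axis-aligned split.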

tree_

The root node of the fitted tree.

Type:

ObliqueTreeNode

n_features_in_

Number of features seen during fit.

Type:

int

feature_importances_

Impurity-based feature importances.

Type:

ndarray of shape (n_features_in_,)

n_nodes_

Number of nodes in the tree.

Type:

int

fit(X, y, sample_weight=None)[source]

Build the oblique decision tree.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Return type:

ObliqueDecisionTreeRegressor

Returns:

self (object) – Fitted estimator.

predict(X)[source]

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

apply(X)[source]

Return leaf indices for samples.

Return type:

ndarray

Parameters:

X (ndarray)

get_depth()[source]

Return the maximum depth of the tree.

Return type:

int

get_n_leaves()[source]

Return the number of leaves.

Return type:

int

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (ObliqueDecisionTreeRegressor)

Returns:

self (object) – The updated object.

Return type:

ObliqueDecisionTreeRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (ObliqueDecisionTreeRegressor)

Returns:

self (object) – The updated object.

Return type:

ObliqueDecisionTreeRegressor

class endgame.models.QuantileRegressorForest(n_estimators=100, quantiles=0.5, criterion='squared_error', max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=1.0, max_leaf_nodes=None, min_impurity_decrease=0.0, bootstrap=True, max_samples=None, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False)[source]

Bases: BaseEstimator, RegressorMixin

Random Forest for conditional quantile estimation.

Quantile Regression Forests (QRF) estimate the full conditional distribution P(Y|X), allowing prediction of any quantile, not just the mean. This is useful for:

  • Prediction intervals (the quantile estimates are consistent under the conditions in Meinshausen (2006), but finite-sample coverage is not guaranteed)

  • Uncertainty quantification in regression

  • Asymmetric loss functions (e.g., inventory optimization)

The forest works by tracking which training samples end up in each leaf of each tree. At prediction time, for a test point x, we collect all training samples from the leaves that x falls into across all trees, then compute empirical quantiles from this collection.
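The leaf-pooling procedure just described can be sketched on top of scikit-learn's RandomForestRegressor, whose apply method returns per-tree leaf indices. The sketch makes three assumptions. First, it records every training row, not only each tree's bootstrap sample. Second, it pools the leaves exactly as described above. Meinshausen's original estimator instead weights each training target by 1/(leaf size) within each tree and averages those weights across trees; this differs from pooling when leaf sizes vary. Third, the helper names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def record_leaf_samples(forest, X_train, y_train):
    """For each tree, map leaf id -> array of training targets that fall in that leaf."""
    leaves = forest.apply(X_train)                 # (n_train, n_estimators)
    per_tree = []
    for t in range(leaves.shape[1]):
        groups = {}
        for leaf_id, target in zip(leaves[:, t], y_train):
            groups.setdefault(leaf_id, []).append(target)
        per_tree.append({leaf: np.asarray(v) for leaf, v in groups.items()})
    return per_tree


def pooled_quantiles(forest, leaf_samples, X, quantiles):
    """Pool the training targets from every leaf each row of X reaches, then take quantiles."""
    leaves = forest.apply(X)
    out = np.empty((X.shape[0], len(quantiles)))
    for i, row in enumerate(leaves):
        pooled = np.concatenate([leaf_samples[t][leaf] for t, leaf in enumerate(row)])
        out[i] = np.quantile(pooled, quantiles)
    return out
```

Every leaf contains at least one bootstrap sample drawn from X_train, so applying the forest to X_train reaches every leaf. As a result, the lookup in pooled_quantiles cannot miss.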

Parameters:
  • n_estimators (int, default=100) – Number of trees in the forest.

  • quantiles (float or array-like of floats, default=0.5) – Quantile(s) to predict, each in [0, 1]. A single float predicts that quantile; an array predicts multiple quantiles simultaneously. The default, 0.5 (the median), is more robust to outliers than the mean. A common choice for prediction intervals is [0.1, 0.5, 0.9].

  • criterion (str, default='squared_error') – Splitting criterion for the trees: ‘squared_error’ (mean squared error, the standard choice), ‘absolute_error’ (mean absolute error), ‘friedman_mse’ (mean squared error with Friedman’s improvement score), or ‘poisson’ (Poisson deviance).

  • max_depth (int, default=None) – Maximum depth of each tree. None means unlimited depth (nodes expand until all leaves are pure or contain fewer than min_samples_split samples).

  • min_samples_split (int or float, default=2) – Minimum samples required to split an internal node. If float, fraction of n_samples.

  • min_samples_leaf (int or float, default=1) – Minimum samples required at a leaf node. If float, fraction of n_samples.

  • max_features (int, float, str, or None, default=1.0) – Number of features to consider for each split: an int uses exactly max_features; a float uses max_features * n_features; ‘sqrt’ uses sqrt(n_features); ‘log2’ uses log2(n_features); None or 1.0 uses all features. For QRF, using all features is often preferred because it tends to give better leaf distributions.

  • max_leaf_nodes (int, default=None) – Maximum number of leaf nodes per tree.

  • min_impurity_decrease (float, default=0.0) – Minimum impurity decrease required for a split.

  • bootstrap (bool, default=True) – Whether to use bootstrap sampling for each tree.

  • max_samples (int or float, default=None) – Number of samples to draw (with replacement) for each tree when bootstrap=True: None draws n_samples samples; an int draws max_samples samples; a float draws max_samples * n_samples samples.

  • oob_score (bool, default=False) – Whether to compute out-of-bag score. Note: OOB for QRF uses median prediction for scoring.

  • n_jobs (int, default=None) – Number of parallel jobs for fitting trees. None means 1, -1 means all processors.

  • random_state (int, RandomState, or None, default=None) – Random seed for reproducibility.

  • verbose (int, default=0) – Verbosity level for fitting progress.

  • warm_start (bool, default=False) – If True, reuse previous fit and add more trees.

estimators_

The fitted tree estimators.

Type:

list of DecisionTreeRegressor

leaf_samples_

For each tree, mapping from leaf node ID to y values.

Type:

list of dict

n_features_in_

Number of features seen during fit.

Type:

int

feature_importances_

Impurity-based feature importances.

Type:

ndarray of shape (n_features_in_,)

oob_score_

Out-of-bag R² score (if oob_score=True).

Type:

float

oob_prediction_

OOB predictions (if oob_score=True).

Type:

ndarray of shape (n_samples,)

Examples

>>> from endgame.models.trees import QuantileRegressorForest
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
>>>
>>> # Predict median (more robust than mean)
>>> qrf = QuantileRegressorForest(n_estimators=100, quantiles=0.5, random_state=42)
>>> qrf.fit(X, y)
>>> y_median = qrf.predict(X[:5])
>>>
>>> # Prediction intervals
>>> qrf = QuantileRegressorForest(n_estimators=100, quantiles=[0.1, 0.5, 0.9])
>>> qrf.fit(X, y)
>>> intervals = qrf.predict(X[:5])  # Shape: (5, 3)
>>> lower, median, upper = intervals[:, 0], intervals[:, 1], intervals[:, 2]
>>>
>>> # Change quantiles after fitting (no retraining needed!)
>>> qrf.quantiles = [0.25, 0.75]
>>> iqr_bounds = qrf.predict(X[:5])  # Shape: (5, 2)

Notes

QRF is particularly useful for:

  1. Prediction Intervals: Unlike a standard RF, which only gives point predictions, QRF can give prediction intervals, e.g., by predicting the [0.05, 0.95] quantiles for a nominal 90% interval.

  2. Heteroscedastic Data: When the variance of Y varies with X, QRF naturally captures this through different interval widths.

  3. Conformal Prediction: QRF quantiles can be calibrated using conformal methods for guaranteed coverage.

  4. Asymmetric Loss: For problems where over/under-prediction have different costs (inventory, load forecasting), predict the appropriate quantile that minimizes expected loss.
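One standard calibration method for item 3 is conformalized quantile regression (CQR; Romano, Patterson and Candès, 2019). The procedure fits QRF on one split and then widens or shrinks its intervals using a held-out calibration set. Under the assumption that calibration and test data are exchangeable, the adjusted interval has marginal coverage of at least 1 - alpha. The sketch below works on plain arrays, such as the lower and upper bounds returned by predict_interval. `conformalize_interval` is a hypothetical helper, not part of endgame.

```python
import math

import numpy as np


def conformalize_interval(lo_cal, hi_cal, y_cal, lo_test, hi_test, alpha=0.1):
    """Split-conformal (CQR) adjustment of quantile-based prediction intervals."""
    # Conformity score: how far each calibration target falls outside its interval
    # (negative when it lies inside).
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))      # rank of the finite-sample-corrected quantile
    q = np.inf if k > n else np.sort(scores)[k - 1]
    return lo_test - q, hi_test + q, q
```

If the calibration set is too small for the requested alpha, the correction is infinite, which signals that more calibration data are needed.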

Memory Usage: QRF stores all training y values at the leaves, which uses more memory than a standard RF. For very large datasets, consider subsampling via the max_samples parameter.

References

Meinshausen, N. (2006). “Quantile Regression Forests.” Journal of Machine Learning Research, 7, 983-999.

fit(X, y, sample_weight=None)[source]

Build a quantile regression forest from training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights for fitting.

Return type:

QuantileRegressorForest

Returns:

self (object) – Fitted estimator.

predict(X)[source]

Predict quantile(s) for samples in X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray) – Predicted quantile values, of shape (n_samples,) for a single quantile or (n_samples, n_quantiles) for multiple quantiles.

predict_quantiles(X, quantiles)[source]

Predict specific quantiles without changing the estimator.

This allows predicting different quantiles from what was specified at construction, without modifying the estimator’s state.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • quantiles (float or array-like of floats) – Quantile(s) to predict in [0, 1].

Return type:

ndarray

Returns:

y_pred (ndarray) – Predicted quantile values, of shape (n_samples,) for a single quantile or (n_samples, n_quantiles) for multiple quantiles.

predict_interval(X, coverage=0.9)[source]

Predict a central (equal-tailed) prediction interval.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • coverage (float, default=0.9) – Desired coverage probability in (0, 1). For example, 0.9 gives the [5th, 95th] percentile interval.

Return type:

tuple[ndarray, ndarray]

Returns:

  • lower (ndarray of shape (n_samples,)) – Lower bound of prediction interval.

  • upper (ndarray of shape (n_samples,)) – Upper bound of prediction interval.
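The mapping from coverage to quantile levels is simple arithmetic: an equal-tailed interval leaves (1 - coverage) / 2 of the probability mass in each tail. A minimal sketch with a hypothetical helper name:

```python
def interval_quantile_levels(coverage):
    """Lower and upper quantile levels of a central interval."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be in (0, 1)")
    tail = (1.0 - coverage) / 2.0  # probability mass left out in each tail
    return tail, 1.0 - tail


lower_q, upper_q = interval_quantile_levels(0.9)  # approximately (0.05, 0.95)
```

The same two levels could be passed to predict_quantiles to obtain both bounds in a single (n_samples, 2) array.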

predict_mean(X)[source]

Predict conditional mean (like standard Random Forest).

This collects all y values from relevant leaves and returns their mean, equivalent to standard RF prediction.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted mean values.

predict_std(X)[source]

Predict conditional standard deviation (uncertainty).

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_std (ndarray of shape (n_samples,)) – Predicted standard deviation for each sample.
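Given QRF weights w_i(x) (non-negative and summing to one), a conditional mean and standard deviation can be read from the same weighted distribution used for quantiles. The sketch below assumes this is roughly how predict_mean and predict_std behave; endgame may compute them differently. With Meinshausen weights, the weighted mean equals the average over trees of each tree's leaf mean, the usual random-forest prediction.

```python
import math


def weighted_mean_std(y, weights):
    """Mean and standard deviation of a weighted empirical distribution."""
    total = sum(weights)  # normally 1.0 for QRF weights
    mean = sum(w * v for w, v in zip(weights, y)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, y)) / total
    return mean, math.sqrt(var)


mean, std = weighted_mean_std([1.0, 3.0], [0.5, 0.5])
```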

apply(X)[source]

Apply trees to X, return leaf indices.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

X_leaves (ndarray of shape (n_samples, n_estimators)) – Leaf indices for each sample in each tree.

property n_estimators_: int

Number of fitted estimators.

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self (object) – The updated object.

Return type:

QuantileRegressorForest

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

QuantileRegressorForest

endgame.models.pinball_loss(y_true, y_pred, quantile)[source]

Compute pinball (quantile) loss.

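The pinball loss is a proper scoring rule for quantile regression. For a true value y and a predicted quantile q at level alpha:

L(y, q) = (1 - alpha) * max(q - y, 0) + alpha * max(y - q, 0)

Over-prediction (q > y) is weighted by (1 - alpha) and under-prediction (q < y) by alpha, so minimizing the expected loss yields the alpha-quantile.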

Parameters:
  • y_true (ndarray of shape (n_samples,)) – True values.

  • y_pred (ndarray of shape (n_samples,)) – Predicted quantile values.

  • quantile (float) – Quantile being predicted, in [0, 1].

Return type:

float

Returns:

float – Mean pinball loss.

Examples

>>> import numpy as np
>>> from endgame.models import pinball_loss
>>> y_true = np.array([1, 2, 3, 4, 5])
>>> y_pred = np.array([1.5, 2.0, 2.5, 4.0, 4.5])
>>> round(float(pinball_loss(y_true, y_pred, 0.5)), 4)  # median loss
0.15

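To make the formula concrete, here is a dependency-free reference version, written term by term from the definition above. The name pinball_loss_reference is hypothetical; it is a sketch for checking results, not endgame's implementation.

```python
def pinball_loss_reference(y_true, y_pred, quantile):
    """Mean pinball loss: (1 - q) * over-prediction + q * under-prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    losses = [
        (1 - quantile) * max(q - y, 0.0) + quantile * max(y - q, 0.0)
        for y, q in zip(y_true, y_pred)
    ]
    return sum(losses) / len(losses)


median_loss = pinball_loss_reference([1, 2, 3, 4, 5], [1.5, 2.0, 2.5, 4.0, 4.5], 0.5)
```

At quantile 0.9 the asymmetry is visible: over-predicting by 1 costs 0.1, while under-predicting by 1 costs 0.9.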
endgame.models.interval_coverage(y_true, lower, upper)[source]

Compute empirical coverage of prediction intervals.

Parameters:
  • y_true (ndarray of shape (n_samples,)) – True values.

  • lower (ndarray of shape (n_samples,)) – Lower bounds of intervals.

  • upper (ndarray of shape (n_samples,)) – Upper bounds of intervals.

Return type:

float

Returns:

float – Fraction of y_true values within [lower, upper].

endgame.models.interval_width(lower, upper)[source]

Compute mean width of prediction intervals.

Parameters:
  • lower (ndarray of shape (n_samples,)) – Lower bounds of intervals.

  • upper (ndarray of shape (n_samples,)) – Upper bounds of intervals.

Return type:

float

Returns:

float – Mean interval width.
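Coverage and width are usually reported together, because coverage alone can be raised trivially by widening intervals. A plain-Python sketch of both metrics as documented (closed intervals for coverage); the helper names are hypothetical stand-ins for interval_coverage and interval_width:

```python
def empirical_coverage(y_true, lower, upper):
    """Fraction of y_true values inside the closed interval [lower, upper]."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return inside / len(y_true)


def mean_width(lower, upper):
    """Average of upper - lower."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


cov = empirical_coverage([1, 2, 3, 10], [0, 2, 4, 0], [2, 3, 5, 10])
width = mean_width([0, 2, 4, 0], [2, 3, 5, 10])
```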

class endgame.models.EvolutionaryTreeClassifier(population_size=100, n_generations=100, max_depth=8, min_samples_leaf=5, alpha=1.0, mutation_prob=0.8, crossover_prob=0.2, patience=20, warm_start=True, n_jobs=1, random_state=None, verbose=False)[source]

Bases: _EvolutionaryTreeBase, ClassifierMixin

Evolutionary Tree Classifier: searches for globally optimal trees with genetic algorithms.

Unlike greedy methods (CART, C4.5), which pick the locally best split at each node, evolutionary trees use a genetic algorithm to search over whole tree structures for a globally optimal tree. This can uncover patterns that greedy methods miss, such as XOR-like interactions where no single split reduces impurity on its own.

Parameters:
  • population_size (int, default=100) – Number of trees in the population. Larger populations explore more of the search space but are slower.

  • n_generations (int, default=100) – Maximum number of evolutionary generations.

  • max_depth (int, default=8) – Maximum depth of trees in the population.

  • min_samples_leaf (int, default=5) – Minimum samples required in a leaf node.

  • alpha (float, default=1.0) – Complexity penalty coefficient. Higher values favor simpler trees. Controls the BIC-type tradeoff: loss + alpha * complexity.

  • mutation_prob (float, default=0.8) – Probability of applying mutation to offspring.

  • crossover_prob (float, default=0.2) – Probability of producing offspring by crossover rather than by mutation alone.

  • patience (int, default=20) – Generations without improvement before early stopping.

  • warm_start (bool, default=True) – If True, seed population with a greedy tree for faster convergence.

  • n_jobs (int, default=1) – Number of parallel jobs for fitness evaluation. -1 means using all processors.

  • random_state (int, optional) – Random seed for reproducibility.

  • verbose (bool, default=False) – If True, print progress every 10 generations.
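The alpha parameter trades fit against size. The sketch below assumes a classification criterion in the style of R's evtree: 2N times the misclassification rate, plus alpha times M log N for a tree with M leaves. endgame's exact loss and complexity terms may differ; the function name is hypothetical.

```python
import math


def bic_type_fitness(n_misclassified, n_samples, n_leaves, alpha=1.0):
    """Lower is better: misclassification loss plus a leaf-count penalty.

    Assumed evtree-style criterion; endgame's definition may differ.
    """
    loss = 2.0 * n_samples * (n_misclassified / n_samples)  # 2N * error rate
    complexity = n_leaves * math.log(n_samples)  # M * log N
    return loss + alpha * complexity


small = bic_type_fitness(10, 100, 4)  # 4 leaves, 10 errors
large = bic_type_fitness(5, 100, 8)   # 8 leaves, 5 errors
print(small < large)  # with alpha=1.0 the smaller tree is preferred
```

Lowering alpha to 0.5 flips the preference toward the larger, more accurate tree, which is the "higher values favor simpler trees" behavior described above.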

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features seen during fit.

Type:

int

tree_

The best tree found during evolution.

Type:

TreeNode

best_fitness_

Fitness of the best tree (lower is better).

Type:

float

fitness_history_

Best fitness at each generation.

Type:

list

Examples

>>> from endgame.models.trees.evtree import EvolutionaryTreeClassifier
>>> clf = EvolutionaryTreeClassifier(
...     population_size=50, n_generations=50, random_state=42
... )
>>> clf.fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)

Notes

Evolutionary trees are slower than greedy methods but can find better structures for complex problems. They’re particularly valuable for:

  1. Ensemble diversity: Different inductive bias from greedy trees

  2. Interpretability: Often finds simpler trees with similar accuracy

  3. Avoiding local optima: Global search escapes greedy suboptimality

Performance tips:

  • Use warm_start=True (the default) to seed the population with a greedy tree.

  • Increase patience for harder problems.

  • Use n_jobs=-1 for parallel fitness evaluation on large populations.

  • Reduce population_size for faster (but potentially worse) results.
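The roles of mutation_prob, crossover_prob, patience, and fitness_history_ can be seen in a deliberately tiny evolutionary loop. Here each individual is only a decision-stump threshold on one feature, not a full tree, and survivors are chosen by (mu + lambda) selection. This is a hypothetical toy for intuition, not endgame's algorithm.

```python
import random


def stump_errors(threshold, xs, ys):
    """Fitness (lower is better): mistakes of 'predict 1 if x > threshold'."""
    return sum((x > threshold) != bool(y) for x, y in zip(xs, ys))


def evolve_stump(xs, ys, population_size=20, n_generations=100,
                 mutation_prob=0.8, crossover_prob=0.2, patience=20, seed=0):
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)

    def fitness(t):
        return stump_errors(t, xs, ys)

    population = sorted((rng.uniform(lo, hi) for _ in range(population_size)),
                        key=fitness)
    best, best_fitness = population[0], fitness(population[0])
    history = [best_fitness]  # analogous to fitness_history_
    stale = 0
    for _ in range(n_generations):
        offspring = []
        for _ in range(population_size):
            child = rng.choice(population)
            if rng.random() < crossover_prob:
                # Arithmetic crossover: blend two parents' thresholds.
                child = (child + rng.choice(population)) / 2.0
            if rng.random() < mutation_prob:
                # Gaussian mutation scaled to the feature range.
                child += rng.gauss(0.0, 0.1 * (hi - lo))
            offspring.append(child)
        # (mu + lambda) survival: keep the fittest of parents and offspring.
        population = sorted(population + offspring, key=fitness)[:population_size]
        if fitness(population[0]) < best_fitness:
            best, best_fitness, stale = population[0], fitness(population[0]), 0
        else:
            stale += 1
        history.append(best_fitness)
        if stale >= patience:  # early stopping, as with the patience parameter
            break
    return best, best_fitness, history


xs = list(range(10))
ys = [0] * 5 + [1] * 5
best, best_fitness, history = evolve_stump(xs, ys)
print(best_fitness, len(history))
```

Because survivors always include the best parent, the recorded best fitness never gets worse from one generation to the next.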

References

Grubinger, T., Zeileis, A., and Pfeiffer, K.-P. (2014). “evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R.” Journal of Statistical Software, 61(1).

fit(X, y, **fit_params)[source]

Fit the evolutionary tree classifier.

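Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target class labels.
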
Return type:

EvolutionaryTreeClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

property feature_importances_: ndarray

Feature importance based on split frequency.
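Split-frequency importance simply counts how often each feature is used in an internal node and normalizes the counts. The sketch assumes a hypothetical nested-dict tree (the real tree_ is a TreeNode object) and normalization to sum to one; endgame's exact normalization is not confirmed here.

```python
def split_frequency_importance(tree, n_features):
    """Share of internal splits that use each feature."""
    counts = [0] * n_features
    stack = [tree]
    while stack:
        node = stack.pop()
        if "feature" in node:  # internal node: count its split feature
            counts[node["feature"]] += 1
            stack.extend((node["left"], node["right"]))
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Hypothetical tree: three splits, two on feature 0 and one on feature 2.
tree = {
    "feature": 0,
    "left": {"feature": 2, "left": {"value": 0}, "right": {"value": 1}},
    "right": {"feature": 0, "left": {"value": 1}, "right": {"value": 0}},
}
importances = split_frequency_importance(tree, n_features=3)
```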

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

EvolutionaryTreeClassifier

class endgame.models.EvolutionaryTreeRegressor(population_size=100, n_generations=100, max_depth=8, min_samples_leaf=5, alpha=1.0, mutation_prob=0.8, crossover_prob=0.2, patience=20, warm_start=True, n_jobs=1, random_state=None, verbose=False)[source]

Bases: _EvolutionaryTreeBase, RegressorMixin

Evolutionary Tree Regressor: searches for globally optimal regression trees with genetic algorithms.

Parameters:
  • population_size (int, default=100) – Number of trees in the population.

  • n_generations (int, default=100) – Maximum number of evolutionary generations.

  • max_depth (int, default=8) – Maximum depth of trees.

  • min_samples_leaf (int, default=5) – Minimum samples required in a leaf node.

  • alpha (float, default=1.0) – Complexity penalty coefficient.

  • mutation_prob (float, default=0.8) – Probability of applying mutation.

  • crossover_prob (float, default=0.2) – Probability of using crossover.

  • patience (int, default=20) – Generations without improvement before early stopping.

  • warm_start (bool, default=True) – Seed population with a greedy tree.

  • n_jobs (int, default=1) – Number of parallel jobs (-1 for all processors).

  • random_state (int, optional) – Random seed for reproducibility.

  • verbose (bool, default=False) – Print progress during training.

n_features_in_

Number of features seen during fit.

Type:

int

tree_

The best tree found.

Type:

TreeNode

best_fitness_

Fitness of the best tree (lower is better).

Type:

float

Examples

>>> from endgame.models.trees.evtree import EvolutionaryTreeRegressor
>>> reg = EvolutionaryTreeRegressor(population_size=50, random_state=42)
>>> reg.fit(X_train, y_train)
>>> y_pred = reg.predict(X_test)

fit(X, y, **fit_params)[source]

Fit the evolutionary tree regressor.

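Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.
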
Return type:

EvolutionaryTreeRegressor

Returns:

self

predict(X)[source]

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

property feature_importances_: ndarray

Feature importance based on split frequency.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

EvolutionaryTreeRegressor

class endgame.models.EBMClassifier(feature_names=None, feature_types=None, max_bins=512, max_interaction_bins=32, interactions=10, exclude=None, validation_size=0.15, outer_bags=8, inner_bags=0, learning_rate=0.02, greedy_ratio=10.0, cyclic_progress=False, smoothing_rounds=50, interaction_smoothing_rounds=50, max_rounds=25000, early_stopping_rounds=50, early_stopping_tolerance=1e-05, min_samples_leaf=4, min_hessian=0.0001, reg_alpha=0.0, reg_lambda=0.0, max_delta_step=0.0, gain_scale=5.0, min_cat_samples=10, cat_smooth=10.0, missing='separate', max_leaves=2, monotone_constraints=None, n_jobs=-2, random_state=42)[source]

Bases: ClassifierMixin, EBMBase

Explainable Boosting Machine for Classification.

An interpretable classifier that combines the accuracy of gradient boosting with the transparency of Generalized Additive Models (GAMs).

Parameters:
  • feature_names (list of str, optional) – Names for features. If None, uses default naming.

  • feature_types (list of str, optional) – Types for features (“continuous”, “nominal”, “ordinal”).

  • max_bins (int, default=512) – Maximum number of bins for continuous features.

  • max_interaction_bins (int, default=32) – Maximum bins for interaction terms.

  • interactions (int, float, str, or list, default=10) – Number or specification of interaction terms to detect. Can be an integer (number of interactions), float (fraction), string like “3x” (multiple of features), or explicit list.

  • exclude (list, optional) – Features or interactions to exclude.

  • validation_size (float, default=0.15) – Fraction of data to use for validation during training.

  • outer_bags (int, default=8) – Number of outer bags for ensembling.

  • inner_bags (int, default=0) – Number of inner bags (0 means no inner bagging).

  • learning_rate (float, default=0.02) – Learning rate for boosting.

  • greedy_ratio (float, default=10.0) – Ratio controlling greedy vs cyclic feature selection.

  • cyclic_progress (bool, default=False) – If True, use cyclic progress; if False, use greedy.

  • smoothing_rounds (int, default=50) – Number of smoothing rounds for main effects.

  • interaction_smoothing_rounds (int, default=50) – Number of smoothing rounds for interactions.

  • max_rounds (int, default=25000) – Maximum number of boosting rounds.

  • early_stopping_rounds (int, default=50) – Stop if no improvement after this many rounds.

  • early_stopping_tolerance (float, default=1e-5) – Tolerance for early stopping.

  • min_samples_leaf (int, default=4) – Minimum samples in a leaf.

  • min_hessian (float, default=0.0001) – Minimum hessian in a leaf.

  • reg_alpha (float, default=0.0) – L1 regularization.

  • reg_lambda (float, default=0.0) – L2 regularization.

  • max_delta_step (float, default=0.0) – Maximum delta step (0 means no limit).

  • gain_scale (float, default=5.0) – Scale factor for gain computation.

  • min_cat_samples (int, default=10) – Minimum samples for categorical bins.

  • cat_smooth (float, default=10.0) – Smoothing for categorical features.

  • missing (str, default="separate") – How to handle missing values (“separate”, “min”, “max”).

  • max_leaves (int, default=2) – Maximum leaves per tree (2 = stumps).

  • monotone_constraints (list, optional) – Monotonicity constraints per feature (-1, 0, 1).

  • n_jobs (int, default=-2) – Number of jobs for parallel processing.

  • random_state (int, default=42) – Random state for reproducibility.

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features seen during fit.

Type:

int

feature_names_in_

Feature names.

Type:

list of str

feature_types_in_

Detected feature types.

Type:

list of str

intercept_

Model intercept.

Type:

ndarray

term_features_

Feature indices for each term.

Type:

list of tuple

term_scores_

Score lookup tables for each term.

Type:

list of ndarray
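A fitted EBM scores a sample additively: intercept_ plus one table lookup per term, where term_features_ names the feature(s) each term uses and term_scores_ holds the per-bin contributions. The plain-Python sketch below illustrates that idea for binary classification with a logistic link. The bin layout (bisect on sorted cut points) and nested-list tables are simplifying assumptions; interpret's real tables may also reserve extra bins (for example for missing values), so this is not a drop-in reader for a fitted model.

```python
import bisect
import math


def ebm_logit(x, intercept, bin_cuts, term_features, term_scores):
    """Additive score: intercept + sum of per-term lookups."""
    score = intercept
    for features, table in zip(term_features, term_scores):
        cell = table
        for f in features:
            # Bin index = number of cut points <= x[f].
            cell = cell[bisect.bisect_right(bin_cuts[f], x[f])]
        score += cell
    return score


def ebm_proba(x, intercept, bin_cuts, term_features, term_scores):
    """Positive-class probability via the logistic link."""
    logit = ebm_logit(x, intercept, bin_cuts, term_features, term_scores)
    return 1.0 / (1.0 + math.exp(-logit))


model = (
    0.1,                      # intercept
    [[0.0], [0.0]],           # one cut point per feature -> 2 bins each
    [(0,), (1,), (0, 1)],     # two main effects and one pairwise term
    [[-1.0, 1.0], [0.5, -0.5], [[0.0, 0.2], [0.3, 0.0]]],
)
logit = ebm_logit([1.0, -1.0], *model)  # 0.1 + 1.0 + 0.5 + 0.3
```

Because each term contributes a single looked-up number, the per-term contributions are exactly what a local explanation displays.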

Examples

>>> from endgame.models import EBMClassifier
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> clf = EBMClassifier(interactions=5)
>>> clf.fit(X, y)
>>> clf.score(X, y)
0.98
>>> global_exp = clf.explain_global()
>>> local_exp = clf.explain_local(X[:5])

fit(X, y, sample_weight=None)[source]

Fit the EBM classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target labels.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

Return type:

EBMClassifier

Returns:

self (EBMClassifier) – Fitted classifier.

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

decision_function(X)[source]

Compute decision function values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples.

Return type:

ndarray

Returns:

decision (ndarray) – Decision function values.

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self (object) – The updated object.

Return type:

EBMClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

EBMClassifier

class endgame.models.EBMRegressor(feature_names=None, feature_types=None, max_bins=1024, max_interaction_bins=64, interactions=10, exclude=None, validation_size=0.15, outer_bags=8, inner_bags=0, learning_rate=0.02, greedy_ratio=10.0, cyclic_progress=False, smoothing_rounds=50, interaction_smoothing_rounds=50, max_rounds=25000, early_stopping_rounds=50, early_stopping_tolerance=1e-05, min_samples_leaf=4, min_hessian=0.0001, reg_alpha=0.0, reg_lambda=0.0, max_delta_step=0.0, gain_scale=5.0, min_cat_samples=10, cat_smooth=10.0, missing='separate', max_leaves=2, monotone_constraints=None, n_jobs=-2, random_state=42)[source]

Bases: EBMBase, RegressorMixin

Explainable Boosting Machine for Regression.

An interpretable regressor that combines the accuracy of gradient boosting with the transparency of Generalized Additive Models (GAMs).

Parameters:
  • feature_names (list of str, optional) – Names for features. If None, uses default naming.

  • feature_types (list of str, optional) – Types for features (“continuous”, “nominal”, “ordinal”).

  • max_bins (int, default=1024) – Maximum number of bins for continuous features.

  • max_interaction_bins (int, default=64) – Maximum bins for interaction terms.

  • interactions (int, float, str, or list, default=10) – Number or specification of interaction terms to detect.

  • exclude (list, optional) – Features or interactions to exclude.

  • validation_size (float, default=0.15) – Fraction of data to use for validation during training.

  • outer_bags (int, default=8) – Number of outer bags for ensembling.

  • inner_bags (int, default=0) – Number of inner bags.

  • learning_rate (float, default=0.02) – Learning rate for boosting.

  • greedy_ratio (float, default=10.0) – Ratio controlling greedy vs cyclic feature selection.

  • cyclic_progress (bool, default=False) – If True, use cyclic progress.

  • smoothing_rounds (int, default=50) – Number of smoothing rounds for main effects.

  • interaction_smoothing_rounds (int, default=50) – Number of smoothing rounds for interactions.

  • max_rounds (int, default=25000) – Maximum number of boosting rounds.

  • early_stopping_rounds (int, default=50) – Stop if no improvement after this many rounds.

  • early_stopping_tolerance (float, default=1e-5) – Tolerance for early stopping.

  • min_samples_leaf (int, default=4) – Minimum samples in a leaf.

  • min_hessian (float, default=0.0001) – Minimum hessian in a leaf.

  • reg_alpha (float, default=0.0) – L1 regularization.

  • reg_lambda (float, default=0.0) – L2 regularization.

  • max_delta_step (float, default=0.0) – Maximum delta step.

  • gain_scale (float, default=5.0) – Scale factor for gain computation.

  • min_cat_samples (int, default=10) – Minimum samples for categorical bins.

  • cat_smooth (float, default=10.0) – Smoothing for categorical features.

  • missing (str, default="separate") – How to handle missing values.

  • max_leaves (int, default=2) – Maximum leaves per tree.

  • monotone_constraints (list, optional) – Monotonicity constraints per feature.

  • n_jobs (int, default=-2) – Number of jobs for parallel processing.

  • random_state (int, default=42) – Random state for reproducibility.

n_features_in_

Number of features seen during fit.

Type:

int

feature_names_in_

Feature names.

Type:

list of str

feature_types_in_

Detected feature types.

Type:

list of str

intercept_

Model intercept.

Type:

float

term_features_

Feature indices for each term.

Type:

list of tuple

term_scores_

Score lookup tables for each term.

Type:

list of ndarray

Examples

>>> from endgame.models import EBMRegressor
>>> from sklearn.datasets import load_diabetes
>>> X, y = load_diabetes(return_X_y=True)
>>> reg = EBMRegressor(interactions=10)
>>> reg.fit(X, y)
>>> reg.score(X, y)
0.72
>>> importance = reg.get_feature_importances()

fit(X, y, sample_weight=None)[source]

Fit the EBM regressor.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

Return type:

EBMRegressor

Returns:

self (EBMRegressor) – Fitted regressor.

predict(X)[source]

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self (object) – The updated object.

Return type:

EBMRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self (object) – The updated object.

Return type:

EBMRegressor

endgame.models.show_explanation(explanation, share_graphs=False)[source]

Display an EBM explanation in a dashboard.

This is a convenience function that wraps interpret’s show() function.

Parameters:
  • explanation (EBMExplanation) – Explanation from explain_global() or explain_local().

  • share_graphs (bool, default=False) – If True, link axes across graphs.

Returns:

None – Opens an interactive dashboard.

class endgame.models.MARSRegressor(max_terms=None, max_degree=1, penalty=3.0, thresh=0.001, min_span=None, endspan=None, fast_k=20, feature_names=None, allow_linear=True)[source]

Bases: BaseEstimator, RegressorMixin

Multivariate Adaptive Regression Splines for regression.

MARS builds a piecewise linear model by discovering knots (thresholds) at which the relationship between features and target changes. The model is a weighted sum of basis functions built from the hinge pair max(0, x - knot) and max(0, knot - x); when max_degree > 1, a basis function can also be a product of hinges on different features.
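A fitted MARS model is just an intercept plus coefficients times basis functions. The plain-Python sketch below evaluates such a model from a hypothetical list of (coefficient, hinges) terms; it is not the structure of basis_functions_. A hinge pair with the same knot can also represent a linear effect, because 2 * x equals 2 * max(0, x) - 2 * max(0, -x).

```python
def hinge(x, knot, direction):
    """direction=+1: max(0, x - knot); direction=-1: max(0, knot - x)."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """terms: list of (coef, [(feature, knot, direction), ...]).

    Each basis function is the product of its hinges, so one hinge is a
    degree-1 term and two hinges on different features form a degree-2 term.
    """
    y = intercept
    for coef, hinges in terms:
        basis = 1.0
        for feature, knot, direction in hinges:
            basis *= hinge(x[feature], knot, direction)
        y += coef * basis
    return y


# Encodes y = max(0, x0 - 0.5) + 2 * x1, the target used in the example below.
terms = [
    (1.0, [(0, 0.5, +1)]),
    (2.0, [(1, 0.0, +1)]),
    (-2.0, [(1, 0.0, -1)]),
]
y_hat = mars_predict([1.5, -0.25, 0.0], 0.0, terms)  # 1.0 + 2 * (-0.25)
```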

Parameters:
  • max_terms (int, default=None) – Maximum number of basis functions (including intercept). If None, defaults to min(100, max(20, 2 * n_features)) + 1.

  • max_degree (int, default=1) – Maximum degree of interactions: 1 = additive model (no interactions); 2 = pairwise interactions allowed; 3 = three-way interactions allowed.

  • penalty (float, default=3.0) – Generalized Cross-Validation (GCV) penalty per knot. Higher values produce simpler models. Typical range: 2-4.

  • thresh (float, default=0.001) – Forward pass stopping threshold. Stops adding terms when R^2 improvement falls below this value.

  • min_span (int, default=None) – Minimum number of observations between knots. If None, automatically calculated based on data size.

  • endspan (int, default=None) – Minimum observations before first knot and after last knot. If None, automatically calculated based on data size.

  • fast_k (int, default=20) – In the forward pass, only consider the best fast_k parent terms when searching for new basis functions. Set to 0 to consider all parents (slower, but potentially more accurate). This is the “Fast MARS” heuristic of Friedman (1993).

  • feature_names (list of str, default=None) – Names for features (used in summary output).

  • allow_linear (bool, default=True) – If True, allows linear terms (no hinge) for features that appear to have purely linear relationships.
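The penalty parameter enters through the GCV criterion. The sketch assumes the effective-parameter count used by R's earth package, C = n_terms + penalty * (n_terms - 1) / 2 with n_terms including the intercept, and GCV = (RSS / N) / (1 - C / N)^2. endgame's exact formula is not confirmed here, and the function name is hypothetical.

```python
def gcv(rss, n_samples, n_terms, penalty=3.0):
    """Generalized cross-validation score (lower is better).

    Assumed earth-style effective parameter count:
    n_terms + penalty * (n_terms - 1) / 2, with n_terms including the intercept.
    """
    effective_params = n_terms + penalty * (n_terms - 1) / 2.0
    if effective_params >= n_samples:
        return float("inf")  # model too complex for the data
    return (rss / n_samples) / (1.0 - effective_params / n_samples) ** 2


score = gcv(rss=10.0, n_samples=100, n_terms=5)  # C = 5 + 3 * 4 / 2 = 11
```

A larger penalty inflates C, so an extra term is kept only if it lowers RSS enough to offset the larger denominator penalty.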

basis_functions_

The selected basis functions after pruning.

Type:

list of BasisFunction

coef_

Coefficients for each basis function.

Type:

ndarray of shape (n_basis_functions,)

intercept_

The intercept (coefficient of the constant basis function).

Type:

float

n_features_in_

Number of features seen during fit.

Type:

int

feature_names_in_

Names of features seen during fit.

Type:

ndarray of shape (n_features_in_,)

gcv_

Generalized Cross-Validation score of the final model.

Type:

float

rsq_

R-squared of the final model on training data.

Type:

float

forward_pass_record_

Record of terms added during forward pass (for diagnostics).

Type:

list

pruning_record_

Record of pruning decisions (for diagnostics).

Type:

list

Examples

>>> from endgame.models import MARSRegressor
>>> import numpy as np
>>> X = np.random.randn(100, 3)
>>> y = np.maximum(0, X[:, 0] - 0.5) + 2 * X[:, 1] + np.random.randn(100) * 0.1
>>> model = MARSRegressor(max_degree=1)
>>> model.fit(X, y)
MARSRegressor()
>>> print(model.summary())
>>> predictions = model.predict(X)

References

Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1-67.

Friedman, J. (1993). Fast MARS. Stanford University Technical Report 110.

Milborrow, S. Earth package vignette (R implementation reference).

fit(X, y, sample_weight=None)[source]

Fit the MARS model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), default=None) – Individual weights for each sample.

Return type:

MARSRegressor

Returns:

self (object) – Fitted estimator.

predict(X)[source]

Predict using the fitted MARS model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

get_basis_matrix(X)[source]

Compute the basis function matrix for given X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

B (ndarray of shape (n_samples, n_basis_functions)) – Matrix where B[i, j] is the value of basis function j evaluated at sample i.
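Evaluating a basis function amounts to multiplying hinge factors together, one factor per interacting variable, so a term's degree (bounded by max_degree) is its number of factors. The tuple encoding and the basis_matrix helper below are hypothetical stand-ins for the library's BasisFunction objects, and this sketch includes a constant column, which may differ from what get_basis_matrix returns.

```python
import numpy as np

# Hypothetical encoding: each basis function is a tuple of hinge factors
# (feature_index, knot, direction). The empty tuple is the constant term.
basis_functions = [
    (),                                # constant
    ((0, 0.5, +1),),                   # max(0, x0 - 0.5)
    ((1, 0.0, -1),),                   # max(0, 0.0 - x1)
    ((0, 0.5, +1), (2, 1.0, +1)),      # degree-2 interaction
]


def basis_matrix(X, basis_functions):
    """B[i, j] = value of basis function j at sample i."""
    X = np.asarray(X, dtype=float)
    B = np.ones((X.shape[0], len(basis_functions)))
    for j, factors in enumerate(basis_functions):
        for feature, knot, direction in factors:
            # Each hinge factor multiplies the column in place.
            B[:, j] *= np.maximum(0.0, direction * (X[:, feature] - knot))
    return B


X = np.array([[1.0, -2.0, 3.0],
              [0.0, 1.0, 0.0]])
print(basis_matrix(X, basis_functions))
```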

summary()[source]

Return a human-readable summary of the model.

Returns a string showing:

  • the model equation with all basis functions

  • R^2 and GCV statistics

  • variable importance

Return type:

Text

Returns:

summary (str) – Formatted model summary.

compute_variable_importance()[source]

Compute variable importance based on GCV decrease.

For each variable, compute how much GCV would increase if all basis functions involving that variable were removed.

Return type:

dict[str, float]

Returns:

importance (dict) – Mapping {feature_name: importance_score}. Scores are normalized so that the most important variable scores 100.

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (MARSRegressor)

Returns:

self (object) – The updated object.

Return type:

MARSRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (MARSRegressor)

Returns:

self (object) – The updated object.

Return type:

MARSRegressor

class endgame.models.MARSClassifier(max_terms=None, max_degree=1, penalty=3.0, thresh=0.001, min_span=None, endspan=None, fast_k=20, feature_names=None, allow_linear=True, method='logistic', logistic_C=1.0)[source]

Bases: ClassifierMixin, BaseEstimator

MARS for classification via logistic regression on basis functions.

Fits a MARS model to discover basis functions, then uses logistic regression on those basis functions for classification.

Parameters:
  • max_terms (int, default=None) – Maximum number of basis functions (including intercept). If None, defaults to min(100, max(20, 2 * n_features)) + 1.

  • max_degree (int, default=1) – Maximum degree of interactions: 1 = additive model (no interactions); 2 = pairwise interactions allowed; 3 = three-way interactions allowed.

  • penalty (float, default=3.0) – Generalized Cross-Validation (GCV) penalty per knot. Higher values produce simpler models.

  • thresh (float, default=0.001) – Forward pass stopping threshold.

  • min_span (int, default=None) – Minimum number of observations between knots.

  • endspan (int, default=None) – Minimum observations before first knot and after last knot.

  • fast_k (int, default=20) – Fast MARS parameter (see MARSRegressor).

  • feature_names (list of str, default=None) – Names for features.

  • allow_linear (bool, default=True) – If True, allows linear terms.

  • method (str, default='logistic') – Classification method: ‘logistic’ fits a logistic regression on the MARS basis functions; ‘threshold’ thresholds the MARS regression predictions at 0.5.

  • logistic_C (float, default=1.0) – Regularization parameter for logistic regression. Only used when method=’logistic’.
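To illustrate the ‘threshold’ method, this NumPy sketch regresses 0/1 labels on a fixed reflected hinge pair and cuts the fitted values at 0.5. The knot is assumed known and the helper name is hypothetical. With method='logistic', the same basis matrix would instead be passed to a logistic regression (the logistic_ attribute is typed as LogisticRegression).

```python
import numpy as np


def hinge_pair_basis(x, knot):
    # Hypothetical helper: intercept plus a reflected hinge pair at a known knot.
    return np.column_stack([np.ones_like(x),
                            np.maximum(0.0, x - knot),
                            np.maximum(0.0, knot - x)])


rng = np.random.default_rng(42)
x = rng.uniform(-2.0, 2.0, size=300)
labels = (x > 0.5).astype(int)  # class 1 lies to the right of the knot

B = hinge_pair_basis(x, 0.5)
coef, *_ = np.linalg.lstsq(B, labels.astype(float), rcond=None)

# 'threshold'-style prediction: regress the 0/1 labels, then cut at 0.5.
scores = B @ coef
pred = (scores >= 0.5).astype(int)
print("training accuracy:", (pred == labels).mean())
```

The least-squares scores are not probabilities and can fall outside [0, 1], and samples just past the knot may be misclassified, so the training accuracy here need not reach 1.0.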

classes_

Unique class labels.

Type:

ndarray of shape (n_classes,)

mars_regressor_

Underlying MARS model for basis function discovery.

Type:

MARSRegressor

logistic_

Fitted logistic regression model (when method=’logistic’).

Type:

LogisticRegression

Examples

>>> from endgame.models import MARSClassifier
>>> import numpy as np
>>> X = np.random.randn(100, 3)
>>> y = (X[:, 0] + X[:, 1] > 0).astype(int)
>>> model = MARSClassifier(max_degree=1)
>>> model.fit(X, y)
MARSClassifier()
>>> predictions = model.predict(X)
>>> probas = model.predict_proba(X)

fit(X, y, sample_weight=None)[source]

Fit the MARS classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target class labels.

  • sample_weight (array-like of shape (n_samples,), default=None) – Individual weights for each sample.

Return type:

MARSClassifier

Returns:

self (object) – Fitted estimator.

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Predicted class probabilities.

summary()[source]

Return a human-readable summary of the model.

Return type:

Text

Returns:

summary (str) – Formatted model summary.

property basis_functions_

Return basis functions from underlying MARS regressor.

get_basis_matrix(X)[source]

Compute the basis function matrix for given X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data.

Return type:

ndarray[tuple[Any, ...], dtype[floating]]

Returns:

B (ndarray of shape (n_samples, n_basis_functions)) – Basis matrix.

compute_variable_importance()[source]

Compute variable importance.

Return type:

dict[str, float]

Returns:

importance (dict) – {feature_name: importance_score}

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (MARSClassifier)

Returns:

self (object) – The updated object.

Return type:

MARSClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (MARSClassifier)

Returns:

self (object) – The updated object.

Return type:

MARSClassifier

class endgame.models.RuleFitRegressor(tree_generator=None, n_estimators=50, tree_max_depth=3, max_rules=300, min_support=0.01, max_support=0.99, alpha=None, cv=3, include_linear=True, standardize_linear=True, winsorize_linear=0.025, random_state=None, n_jobs=None)[source]

Bases: BaseEstimator, RegressorMixin

RuleFit: Rule-based regression combining tree ensembles with Lasso.

RuleFit generates interpretable models by extracting rules from a tree ensemble and fitting a sparse linear model on the original features plus binary rule features. The result is a human-readable model that shows exactly which rules and features drive predictions.

Parameters:
  • tree_generator (estimator or None, default=None) – The tree ensemble used to generate rules. If None, a GradientBoostingRegressor is built using n_estimators and tree_max_depth. A custom generator must expose the fitted trees through an estimators_ attribute after fitting. Common choices: GradientBoostingRegressor/Classifier, RandomForestRegressor/Classifier, and ExtraTreesRegressor/Classifier.

  • n_estimators (int, default=50) – Number of trees in the ensemble (if tree_generator is None). Ignored if tree_generator is provided.

  • tree_max_depth (int, default=3) – Maximum depth of trees (if tree_generator is None). Shallow trees (2-4) produce simpler, more interpretable rules. Ignored if tree_generator is provided.

  • max_rules (int or None, default=300) – Maximum number of rules to extract. If None, extracts all rules. Rules are selected by coverage (fraction of samples satisfying the rule).

  • min_support (float, default=0.01) – Minimum fraction of samples that must satisfy a rule for it to be included. Rules with lower support are discarded.

  • max_support (float, default=0.99) – Maximum fraction of samples satisfying a rule. Rules with higher support are too general and discarded.

  • alpha (float or None, default=None) – Lasso regularization strength. If None, selected via cross-validation. Higher values produce sparser (more interpretable) models.

  • cv (int, default=3) – Number of cross-validation folds for alpha selection. Only used if alpha is None.

  • include_linear (bool, default=True) – Whether to include original features (linear terms) in the model. If False, model uses only rule features.

  • standardize_linear (bool, default=True) – Whether to standardize linear features before fitting Lasso. Recommended for fair penalization across features.

  • winsorize_linear (float or None, default=0.025) – Winsorization quantile for linear features. If not None, clips extreme values at this quantile to reduce outlier influence.

  • random_state (int, RandomState, or None, default=None) – Random seed for reproducibility.

  • n_jobs (int, default=None) – Number of parallel jobs for cross-validation.
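The support bounds and the linear-term preprocessing can be sketched in a few lines of NumPy. This is an assumed reading of min_support, max_support, winsorize_linear, and standardize_linear (clip each column to its [q, 1 - q] quantiles, then z-score it); the helper names are hypothetical, and the library's exact order of operations is not documented here.

```python
import numpy as np


def filter_by_support(R, min_support=0.01, max_support=0.99):
    """Keep rule columns whose coverage lies in [min_support, max_support]."""
    support = R.mean(axis=0)
    keep = (support >= min_support) & (support <= max_support)
    return R[:, keep], support[keep]


def winsorize_and_standardize(X, q=0.025):
    """Clip each column to its [q, 1 - q] quantiles, then z-score it."""
    lo, hi = np.quantile(X, [q, 1.0 - q], axis=0)
    Xc = np.clip(X, lo, hi)
    std = Xc.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centered but unscaled
    return (Xc - Xc.mean(axis=0)) / std


# Binary rule matrix: column 0 covers every sample, so it is too general.
R = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 0],
              [1, 0, 1]])
R_kept, support = filter_by_support(R, min_support=0.2, max_support=0.9)
print(R_kept.shape, support)  # 2 rules kept, with supports 0.5 and 0.25

# A single extreme value no longer dominates the linear term after clipping.
X_lin = np.vstack([np.arange(99, dtype=float).reshape(-1, 1), [[1e6]]])
Z = winsorize_and_standardize(X_lin)
print(Z.max())
```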

rules_

Extracted rules with non-zero coefficients.

Type:

list of Rule

rule_ensemble_

All extracted rules (before Lasso selection).

Type:

RuleEnsemble

coef_

Coefficients for all features (linear + rules).

Type:

ndarray

intercept_

Model intercept.

Type:

float

linear_coef_

Coefficients for original linear features.

Type:

ndarray of shape (n_features_in_,)

rule_coef_

Coefficients for rule features.

Type:

ndarray of shape (n_rules,)

feature_names_in_

Feature names seen during fit.

Type:

ndarray of shape (n_features_in_,)

n_features_in_

Number of original features.

Type:

int

n_rules_

Number of rules extracted (before Lasso).

Type:

int

n_rules_selected_

Number of rules with non-zero coefficients.

Type:

int

alpha_

Regularization strength used (selected or provided).

Type:

float

cv_results_

Cross-validation results for alpha selection.

Type:

dict

tree_generator_

Fitted tree ensemble used for rule extraction.

Type:

estimator

feature_importances_

Importance of each original feature (sum of absolute coefficients for linear term and rules involving that feature).

Type:

ndarray
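Following the feature_importances_ description literally, each feature collects the absolute value of its linear coefficient plus the absolute coefficient of every rule that mentions it. The helper and the rule_features encoding (one set of feature indices per rule) are hypothetical.

```python
import numpy as np


def rulefit_feature_importances(linear_coef, rule_coef, rule_features, n_features):
    """Per-feature sum of |linear coefficient| and |coefficient| of every rule
    that mentions the feature (rule_features[k] = feature indices in rule k)."""
    imp = np.zeros(n_features)
    if linear_coef is not None:  # e.g. include_linear=False leaves no linear terms
        imp += np.abs(np.asarray(linear_coef, dtype=float))
    for coef, features in zip(rule_coef, rule_features):
        for f in features:
            imp[f] += abs(coef)
    return imp


imp = rulefit_feature_importances(
    linear_coef=[0.5, 0.0, -0.2],
    rule_coef=[1.0, -0.3],
    rule_features=[{0, 2}, {1}],
    n_features=3,
)
print(imp)  # [1.5, 0.3, 1.2] up to floating-point rounding
```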

Examples

>>> from endgame.models import RuleFitRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=500, n_features=10, random_state=42)
>>> model = RuleFitRegressor(tree_max_depth=3, random_state=42)
>>> model.fit(X, y)
>>> print(model.get_rules())  # Print selected rules
>>> predictions = model.predict(X)

Notes

For best interpretability:

  • Use shallow trees (tree_max_depth=2 or 3) for simpler rules.

  • Use a higher alpha (more regularization) for sparser models.

  • Provide meaningful feature_names for readable rule output.

References

Friedman, J. H., & Popescu, B. E. (2008). “Predictive learning via rule ensembles.” The Annals of Applied Statistics, 2(3), 916-954.

fit(X, y, feature_names=None, sample_weight=None)[source]

Fit the RuleFit model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • feature_names (list of str, default=None) – Names for features. If None, uses [‘x0’, ‘x1’, …].

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights for fitting.

Returns:

self (object) – Fitted estimator.

predict(X)[source]

Predict using the fitted RuleFit model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

transform(X)[source]

Transform X into rule features.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data.

Returns:

X_rules (ndarray of shape (n_samples, n_rules)) – Binary matrix of rule satisfactions.
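Each rule is a conjunction of threshold conditions, so transforming X reduces to evaluating every condition and combining the results with AND. The sketch assumes a hypothetical (feature_index, op, threshold) encoding rather than the library's Rule and Condition classes.

```python
import operator

import numpy as np

# Hypothetical rule encoding: a list of (feature_index, op, threshold)
# conditions joined by AND, e.g. one root-to-node path of a tree.
OPS = {"<=": operator.le, ">": operator.gt}


def rules_to_matrix(X, rules):
    """Binary matrix R with R[i, k] = 1 when sample i satisfies rule k."""
    X = np.asarray(X, dtype=float)
    R = np.zeros((X.shape[0], len(rules)), dtype=int)
    for k, conditions in enumerate(rules):
        mask = np.ones(X.shape[0], dtype=bool)
        for feature, op, threshold in conditions:
            mask &= OPS[op](X[:, feature], threshold)
        R[:, k] = mask
    return R


rules = [
    [(0, "<=", 1.0)],
    [(0, ">", 1.0), (1, "<=", 0.0)],
]
X = np.array([[0.5, 3.0],
              [2.0, -1.0],
              [2.0, 5.0]])
print(rules_to_matrix(X, rules))
```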

get_rules(exclude_zero_coef=True, sort_by='importance')[source]

Get the extracted rules with their coefficients.

Parameters:
  • exclude_zero_coef (bool, default=True) – If True, only return rules with non-zero coefficients.

  • sort_by (str, default='importance') – How to sort rules: ‘importance’ by absolute coefficient value * support (descending); ‘support’ by rule support/coverage (descending); ‘coefficient’ by raw coefficient value (descending); ‘length’ by number of conditions (ascending).

Return type:

list[dict]

Returns:

rules (list of dict) – Each dict contains: ‘rule’ (str), the human-readable rule; ‘coefficient’ (float), the Lasso coefficient; ‘support’ (float), the fraction of samples satisfying the rule; ‘importance’ (float), |coefficient| * support; and ‘conditions’ (list of Condition objects).
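The sort_by options map onto ordinary sorts over the returned dicts. This sketch uses made-up rules, with plain strings standing in for Condition objects, and computes importance as |coefficient| * support, as documented above. The sort_rules helper is hypothetical.

```python
rules = [
    {"rule": "x0 <= 1.0", "coefficient": 0.8, "support": 0.25,
     "conditions": ["x0 <= 1.0"]},
    {"rule": "x1 > 2.0 and x2 <= 0.5", "coefficient": -1.5, "support": 0.10,
     "conditions": ["x1 > 2.0", "x2 <= 0.5"]},
    {"rule": "x3 > 0.0", "coefficient": 0.3, "support": 0.60,
     "conditions": ["x3 > 0.0"]},
]
for r in rules:
    r["importance"] = abs(r["coefficient"]) * r["support"]

# sort_by option -> (key function, descending?)
SORT_KEYS = {
    "importance": (lambda r: r["importance"], True),
    "support": (lambda r: r["support"], True),
    "coefficient": (lambda r: r["coefficient"], True),
    "length": (lambda r: len(r["conditions"]), False),
}


def sort_rules(rules, sort_by="importance"):
    key, descending = SORT_KEYS[sort_by]
    return sorted(rules, key=key, reverse=descending)


print([r["rule"] for r in sort_rules(rules)])
```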

summary()[source]

Return a human-readable summary of the model.

Return type:

Text

Returns:

summary (str) – Formatted model summary including model statistics, the top rules by importance, and the linear feature coefficients.

get_equation(precision=4)[source]

Get the model as a human-readable equation.

Parameters:

precision (int) – Number of decimal places for coefficients.

Return type:

Text

Returns:

equation (str) – Model equation string.

visualize_rule(rule_idx)[source]

Visualize a specific rule’s effect.

Parameters:

rule_idx (int) – Index of the rule to visualize.

Returns:

fig (matplotlib Figure) – Visualization of rule effect.

set_fit_request(*, feature_names='$UNCHANGED$', sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • feature_names (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for feature_names parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (RuleFitRegressor)

Returns:

self (object) – The updated object.

Return type:

RuleFitRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (RuleFitRegressor)

Returns:

self (object) – The updated object.

Return type:

RuleFitRegressor

class endgame.models.RuleFitClassifier(tree_generator=None, n_estimators=50, tree_max_depth=3, max_rules=300, min_support=0.01, max_support=0.99, alpha=None, cv=3, include_linear=True, standardize_linear=True, winsorize_linear=0.025, random_state=None, n_jobs=None, class_weight=None)[source]

Bases: ClassifierMixin, BaseEstimator

RuleFit for classification.

For binary classification, it uses logistic regression on the rule features; for multiclass problems, it uses a one-vs-rest strategy.

Parameters:
  • tree_generator (estimator or None, default=None) – The tree ensemble used to generate rules. If None, a GradientBoostingClassifier is built using n_estimators and tree_max_depth.

  • n_estimators (int, default=50) – Number of trees in the ensemble (if tree_generator is None).

  • tree_max_depth (int, default=3) – Maximum depth of trees (if tree_generator is None).

  • max_rules (int or None, default=300) – Maximum number of rules to extract.

  • min_support (float, default=0.01) – Minimum fraction of samples that must satisfy a rule.

  • max_support (float, default=0.99) – Maximum fraction of samples satisfying a rule.

  • alpha (float or None, default=None) – Regularization strength (1/C for logistic regression). If None, selected via cross-validation.

  • cv (int, default=3) – Number of cross-validation folds for alpha selection.

  • include_linear (bool, default=True) – Whether to include original features (linear terms).

  • standardize_linear (bool, default=True) – Whether to standardize linear features.

  • winsorize_linear (float or None, default=0.025) – Winsorization quantile for linear features.

  • random_state (int, RandomState, or None, default=None) – Random seed for reproducibility.

  • n_jobs (int, default=None) – Number of parallel jobs.

  • class_weight (dict, 'balanced', or None, default=None) – Weights for classes in the logistic regression step.

classes_

Unique class labels.

Type:

ndarray of shape (n_classes,)

n_classes_

Number of classes.

Type:

int

All attributes listed for RuleFitRegressor are also available.

fit(X, y, feature_names=None, sample_weight=None)[source]

Fit the RuleFit classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target class labels.

  • feature_names (list of str, default=None) – Names for features.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights for fitting.

Returns:

self (object) – Fitted estimator.

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

predict_log_proba(X)[source]

Predict class log-probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

log_proba (ndarray of shape (n_samples, n_classes)) – Class log-probabilities.

transform(X)[source]

Transform X into rule features.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data.

Returns:

X_rules (ndarray of shape (n_samples, n_rules)) – Binary matrix of rule satisfactions.

get_rules(exclude_zero_coef=True, sort_by='importance')[source]

Get the extracted rules with their coefficients.

Parameters:
  • exclude_zero_coef (bool, default=True) – If True, only return rules with non-zero coefficients.

  • sort_by (str, default='importance') – How to sort rules: ‘importance’, ‘support’, ‘coefficient’, ‘length’.

Return type:

list[dict]

Returns:

rules (list of dict) – Each dict contains rule information.

summary()[source]

Return a human-readable summary of the model.

Return type:

Text

get_equation(precision=4)[source]

Get the model as a human-readable equation.

For binary classification only.

Parameters:

precision (int) – Number of decimal places for coefficients.

Return type:

Text

Returns:

equation (str) – Model equation string.

set_fit_request(*, feature_names='$UNCHANGED$', sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • feature_names (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for feature_names parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (RuleFitClassifier)

Returns:

self (object) – The updated object.

Return type:

RuleFitClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (RuleFitClassifier)

Returns:

self (object) – The updated object.

Return type:

RuleFitClassifier

class endgame.models.FURIAClassifier(max_rules=50, min_support=2, max_conditions=10, fuzzify=True, rule_stretching=True, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

Fuzzy Unordered Rule Induction Algorithm (FURIA) classifier.

FURIA learns a set of fuzzy rules for classification. It extends RIPPER with fuzzy boundaries that allow soft decision regions.

Parameters:
  • max_rules (int, default=50) – Maximum number of rules to learn.

  • min_support (int, default=2) – Minimum number of positive examples a rule must cover.

  • max_conditions (int, default=10) – Maximum number of conditions per rule.

  • fuzzify (bool, default=True) – Whether to fuzzify rules after learning.

  • rule_stretching (bool, default=True) – Whether to use rule stretching for uncovered instances.

  • random_state (int or None, default=None) – Random seed for reproducibility.

classes_

Unique class labels.

Type:

ndarray

rules_

Learned fuzzy rules.

Type:

list of FuzzyRule

n_features_in_

Number of features seen during fit.

Type:

int

feature_names_

Feature names.

Type:

list of str

Examples

>>> from endgame.models import FURIAClassifier
>>> clf = FURIAClassifier(max_rules=30)
>>> clf.fit(X_train, y_train)
>>> predictions = clf.predict(X_test)
>>> print(clf.get_rules_str())

fit(X, y, feature_names=None)[source]

Fit the FURIA classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target labels.

  • feature_names (list of str, optional) – Names for the features.

Return type:

FURIAClassifier

Returns:

self (FURIAClassifier) – Fitted classifier.

predict_proba(X)[source]

Predict class probabilities.

Uses the fuzzy firing strengths to compute class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.
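One simple way to turn fuzzy firing strengths into probabilities, assumed here for illustration, is to sum weight * strength over each class's rules and normalize per sample. The helper is hypothetical, and the uniform fallback for samples that no rule covers is a simplification; the library addresses uncovered instances through its rule_stretching option.

```python
import numpy as np


def class_probabilities(strengths, rule_classes, rule_weights, n_classes):
    """Aggregate fuzzy rule firing strengths into class probabilities.

    strengths: (n_samples, n_rules) firing strengths in [0, 1].
    Assumed aggregation: score_c = sum of weight * strength over rules
    predicting class c, normalized per sample; uniform if no rule fires.
    """
    strengths = np.asarray(strengths, dtype=float)
    scores = np.zeros((strengths.shape[0], n_classes))
    for k, (c, w) in enumerate(zip(rule_classes, rule_weights)):
        scores[:, c] += w * strengths[:, k]
    totals = scores.sum(axis=1, keepdims=True)
    uniform = np.full_like(scores, 1.0 / n_classes)
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    return np.where(totals > 0, scores / safe_totals, uniform)


proba = class_probabilities(
    strengths=[[1.0, 0.5], [0.0, 0.0]],
    rule_classes=[0, 1],
    rule_weights=[0.9, 0.6],
    n_classes=2,
)
print(proba)
```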

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

get_rules()[source]

Get the learned rules.

Return type:

list[FuzzyRule]

Returns:

rules (list of FuzzyRule) – The learned fuzzy rules.

get_rules_str()[source]

Get a human-readable string representation of the rules.

Return type:

Text

Returns:

rules_str (str) – String representation of all rules.

get_rule_importance()[source]

Get feature importance based on rule usage.

Return type:

ndarray

Returns:

importance (ndarray of shape (n_features,)) – Feature importance scores.

property feature_importances_: ndarray

Feature importance scores (sklearn compatible).

set_fit_request(*, feature_names='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • feature_names (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for feature_names parameter in fit.

  • self (FURIAClassifier)

Returns:

self (object) – The updated object.

Return type:

FURIAClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (FURIAClassifier)

Returns:

self (object) – The updated object.

Return type:

FURIAClassifier

class endgame.models.FuzzyRule(conditions=<factory>, consequent=0, weight=1.0, support=0)[source]

Bases: object

A fuzzy rule consisting of multiple fuzzy conditions.

The rule’s firing strength is the minimum membership across all conditions (fuzzy AND via t-norm).

Parameters:
  • conditions (list of FuzzyCondition) – The fuzzy conditions that make up this rule.

  • consequent (int) – The class label this rule predicts.

  • weight (float) – Rule weight (confidence/accuracy on training data).

  • support (int) – Number of training instances covered by this rule.

conditions: list[FuzzyCondition]
consequent: int = 0
weight: float = 1.0
support: int = 0
firing_strength(X)[source]

Compute firing strength (membership degree) for samples.

Uses minimum t-norm for fuzzy AND.

Parameters:

X (ndarray of shape (n_samples, n_features)) – Input data.

Return type:

ndarray

Returns:

strength (ndarray of shape (n_samples,)) – Firing strength in [0, 1] for each sample.
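With the minimum t-norm, a rule fires only as strongly as its weakest condition. The following minimal NumPy sketch assumes the per-condition memberships are already available as an array of shape (n_conditions, n_samples). The function names are hypothetical, and treating "covered" as a firing strength strictly above `threshold` is an assumption. Under that assumption, the default threshold=0.0 selects every sample with nonzero activation.

```python
import numpy as np

def firing_strength(memberships):
    """Fuzzy AND via the minimum t-norm.

    memberships : array of shape (n_conditions, n_samples), values in [0, 1]
    """
    return np.asarray(memberships, dtype=float).min(axis=0)

def covers(memberships, threshold=0.0):
    # Assumed semantics: a sample is covered when its strength exceeds the threshold
    return firing_strength(memberships) > threshold

m = np.array([[1.0, 0.6, 0.0],
              [0.8, 0.9, 1.0]])
print(firing_strength(m))
print(covers(m))
```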

covers(X, threshold=0.0)[source]

Check which samples are covered by this rule.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – Input data.

  • threshold (float) – Minimum firing strength to be considered covered.

Return type:

ndarray

Returns:

covered (ndarray of shape (n_samples,) dtype=bool) – True where sample is covered by rule.

class endgame.models.FuzzyCondition(feature_idx, feature_name, lower_bound=None, upper_bound=None, lower_support=None, upper_support=None)[source]

Bases: object

A fuzzy condition with trapezoidal membership function.

The membership function is defined by four points (a, b, c, d):

  • membership = 0 for x <= a or x >= d

  • membership = 1 for b <= x <= c

  • linear interpolation for a < x < b and c < x < d

For crisp conditions, a=b and c=d.
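To make the four-point definition concrete, here is a vectorized sketch of a trapezoidal membership function. It takes the points a <= b <= c <= d directly; how they map onto the class's `lower_bound`/`lower_support` fields is not shown. An open side is expressed by setting both of its points to an infinity, and a == b or c == d gives a crisp edge (the core is evaluated inclusively, so the crisp boundary point itself gets membership 1). The function name is hypothetical.

```python
import numpy as np

def trapezoid_membership(x, a, b, c, d):
    """Trapezoidal membership with points a <= b <= c <= d.

    Use a = b = -np.inf (or c = d = np.inf) for an open side;
    a == b or c == d gives a crisp edge.
    """
    x = np.asarray(x, dtype=float)
    mu = np.zeros_like(x)
    # Core: full membership on [b, c]
    mu[(x >= b) & (x <= c)] = 1.0
    # Rising edge: linear from 0 at a to 1 at b
    rising = (x > a) & (x < b)
    mu[rising] = (x[rising] - a) / (b - a)
    # Falling edge: linear from 1 at c to 0 at d
    falling = (x > c) & (x < d)
    mu[falling] = (d - x[falling]) / (d - c)
    return mu

x = np.array([0.0, 1.0, 2.0, 3.0, 4.5, 6.0])
print(trapezoid_membership(x, a=0.0, b=2.0, c=4.0, d=6.0))
```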

Parameters:
  • feature_idx (int) – Index of the feature.

  • feature_name (str) – Name of the feature.

  • lower_bound (float or None) – Lower bound (a, b) of trapezoidal function. None means -inf.

  • upper_bound (float or None) – Upper bound (c, d) of trapezoidal function. None means +inf.

  • lower_support (float or None) – Support point a (where membership starts rising).

  • upper_support (float or None) – Support point d (where membership falls back to 0).

feature_idx: int
feature_name: str
lower_bound: float | None = None
upper_bound: float | None = None
lower_support: float | None = None
upper_support: float | None = None
membership(X)[source]

Compute fuzzy membership degree for samples.

Parameters:

X (ndarray of shape (n_samples, n_features)) – Input data.

Return type:

ndarray

Returns:

membership (ndarray of shape (n_samples,)) – Membership degree in [0, 1] for each sample.

class endgame.models.TANClassifier(smoothing=1.0, root_selection='max_mi', missing_values='error', max_cardinality=100, auto_discretize=True, discretizer_strategy='mdlp', discretizer_max_bins=10, n_jobs=1, random_state=None, verbose=False)[source]

Bases: BaseBayesianClassifier

Tree Augmented Naive Bayes classifier.

TAN extends Naive Bayes by allowing each feature to have at most one parent among the other features, in addition to the class, so that the feature-to-feature edges form a tree. This captures pairwise feature dependencies while keeping learning and inference computationally tractable.

Parameters:
  • smoothing (float, default=1.0) – Laplace smoothing parameter (alpha). Use 0 for MLE, 1 for add-one smoothing.

  • root_selection ({'max_mi', 'random', int}, default='max_mi') – How to select the root of the tree structure:

    • ‘max_mi’: Feature with the highest mutual information (MI) with the target

    • ‘random’: Random selection (for ensembling)

    • int: Specific feature index

  • missing_values ({'error', 'marginalize'}, default='error') – Strategy for missing values during predict.

  • auto_discretize (bool, default=True) – If True, automatically discretize continuous features.

  • discretizer_strategy (str, default='mdlp') – Discretization strategy: ‘mdlp’, ‘equal_width’, ‘equal_freq’, ‘kmeans’.

  • discretizer_max_bins (int, default=10) – Maximum bins per feature when auto-discretizing.

  • n_jobs (int, default=1) – Parallelization for MI computation. -1 uses all cores.

  • random_state (int, optional) – Random seed for reproducibility.

  • verbose (bool, default=False) – Enable verbose output.

  • max_cardinality (int)
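The standard TAN construction weights each feature pair by the conditional mutual information I(X_i; X_j | Y), builds a maximum-weight spanning tree, and directs the edges away from a root. The sketch below illustrates that textbook procedure under stated assumptions: features are integer-coded, the tree is built with a plain Prim's algorithm, and the root is passed as an index (standing in for `root_selection`). It is not necessarily how TANClassifier implements structure learning, and the function names are hypothetical.

```python
import numpy as np
from itertools import product

def conditional_mi(xi, xj, y):
    """Empirical conditional mutual information I(X_i; X_j | Y) in nats."""
    cmi = 0.0
    for c in np.unique(y):
        in_class = y == c
        p_c = in_class.mean()
        a, b = xi[in_class], xj[in_class]
        for u, v in product(np.unique(a), np.unique(b)):
            p_uv = np.mean((a == u) & (b == v))
            if p_uv > 0:
                p_u, p_v = np.mean(a == u), np.mean(b == v)
                cmi += p_c * p_uv * np.log(p_uv / (p_u * p_v))
    return cmi

def tan_structure(X, y, root=0):
    """Return {child: parent} feature edges of a TAN tree rooted at `root`."""
    n_features = X.shape[1]
    weights = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            weights[i, j] = weights[j, i] = conditional_mi(X[:, i], X[:, j], y)
    # Prim's algorithm for a maximum-weight spanning tree grown from the root,
    # so each newly added edge is already directed parent -> child.
    in_tree, parent = {root}, {}
    while len(in_tree) < n_features:
        u, v = max(
            ((u, v) for u in in_tree for v in range(n_features) if v not in in_tree),
            key=lambda edge: weights[edge],
        )
        parent[v] = u
        in_tree.add(v)
    return parent  # the class Y is an extra parent of every feature

# Synthetic check: x1 is a deterministic function of x0 once y is known.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
x0 = rng.integers(0, 3, size=500)
x1 = (x0 + y) % 3
x2 = rng.integers(0, 2, size=500)  # independent noise feature
X = np.column_stack([x0, x1, x2])
print(tan_structure(X, y, root=0))
```

On this synthetic data, feature 1 attaches to feature 0, because the two are strongly dependent given the class.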

structure_

Learned TAN structure after fit().

Type:

nx.DiGraph

cpts_

Conditional probability tables keyed by feature index; the shape of cpts_[i] depends on the parent configuration of feature i.

Type:

dict[int, np.ndarray]

class_prior_

Prior class probabilities P(Y).

Type:

np.ndarray

feature_importances_

Normalized mutual information I(X_i; Y) between each feature and the target.

Type:

np.ndarray

Examples

>>> from endgame.models.bayesian import TANClassifier
>>> clf = TANClassifier(smoothing=1.0)
>>> clf.fit(X_train, y_train)
>>> clf.predict_proba(X_test)
predict_proba(X)[source]

Predict class probabilities using TAN structure.

For each class c: P(Y=c|X) ∝ P(Y=c) * ∏_i P(X_i | Pa(X_i), Y=c)

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

ndarray of shape (n_samples, n_classes) – Class probabilities.
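The proportionality above is resolved by normalizing over classes. This is usually done in log space to avoid underflow when many factors are multiplied. The following single-sample sketch assumes a hypothetical CPT layout: cpts[i][x_i, x_parent, c] for non-root features and cpts[root][x_root, c] for the root. The actual layout of `cpts_` may differ.

```python
import numpy as np

def tan_posterior(x, class_prior, parent, cpts):
    """P(Y | x) for one integer-coded sample under a TAN model.

    class_prior : (n_classes,) array, P(Y=c)
    parent      : {feature: parent feature}; the root has no entry
    cpts        : cpts[i][x_i, x_parent, c] for non-root features,
                  cpts[root][x_root, c] for the root (hypothetical layout)
    """
    log_p = np.log(np.asarray(class_prior, dtype=float))
    for i, table in cpts.items():
        if i in parent:
            log_p += np.log(table[x[i], x[parent[i]], :])
        else:
            log_p += np.log(table[x[i], :])
    # Normalize in log space: subtract the max before exponentiating
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()

prior = np.array([0.5, 0.5])
cpt0 = np.array([[0.8, 0.3], [0.2, 0.7]])                 # P(X0 | Y)
cpt1 = np.array([[[0.9, 0.5], [0.4, 0.1]],
                 [[0.1, 0.5], [0.6, 0.9]]])               # P(X1 | X0, Y)
print(tan_posterior([0, 0], prior, {1: 0}, {0: cpt0, 1: cpt1}))
```

For x = (0, 0), the unnormalized scores are 0.5 × 0.8 × 0.9 = 0.36 and 0.5 × 0.3 × 0.5 = 0.075, which normalize to roughly 0.83 and 0.17.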

class endgame.models.EBMCClassifier(score='bdeu', equivalent_sample_size=10.0, max_parents=3, max_features=None, convergence_threshold=0.0001, max_iter=100, use_equivalence_transform=True, smoothing=1.0, max_cardinality=100, auto_discretize=True, discretizer_strategy='mdlp', discretizer_max_bins=10, random_state=None, verbose=False)[source]

Bases: BaseBayesianClassifier

Efficient Bayesian Multivariate Classifier with automatic feature selection.

EBMC learns a Bayesian Network structure and then prunes features to those in the Markov Blanket of the target. This provides built-in feature selection while maintaining interpretability.
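The Markov blanket of the target consists of its parents, its children, and its children's other parents. Given these variables, the target is conditionally independent of all others. The sketch below extracts the blanket from a DAG stored as a hypothetical {node: set of parents} dictionary; EBMCClassifier itself stores its structure as an nx.DiGraph.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set of parent nodes}."""
    own_parents = set(parents.get(target, ()))
    children = {node for node, ps in parents.items() if target in ps}
    spouses = {p for child in children for p in parents[child]} - {target}
    return own_parents | children | spouses

dag = {
    "A": set(),
    "C": set(),
    "E": set(),
    "Y": {"A"},        # A -> Y
    "B": {"Y", "C"},   # Y -> B <- C
    "D": {"B"},        # B -> D (a grandchild, outside the blanket)
}
print(sorted(markov_blanket(dag, "Y")))
```

The result contains A (parent), B (child), and C (co-parent of B). The grandchild D and the unconnected E are excluded.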

Parameters:
  • score ({'bdeu', 'bic', 'k2'}, default='bdeu') – Scoring function for structure learning.

  • equivalent_sample_size (float, default=10.0) – Equivalent sample size (ESS) for the BDeu score. Lower values prune more aggressively.

  • max_parents (int, default=3) – Maximum parents per node (controls complexity).

  • max_features (int | None, default=None) – Maximum features to select. None = no limit.

  • convergence_threshold (float, default=1e-4) – Stop when score improvement falls below this.

  • max_iter (int, default=100) – Maximum iterations for structure search.

  • use_equivalence_transform (bool, default=True) – Whether to apply statistical equivalence transformation.

  • smoothing (float, default=1.0) – Laplace smoothing for CPT estimation.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • max_cardinality (int)

  • auto_discretize (bool)

  • discretizer_strategy (str)

  • discretizer_max_bins (int)

selected_features_

Indices of selected features (Markov Blanket).

Type:

list[int]

structure_

Learned DAG structure.

Type:

nx.DiGraph

cpts_

Conditional probability tables.

Type:

dict

current_score_

Final structure score.

Type:

float

Examples

>>> from endgame.models.bayesian import EBMCClassifier
>>> clf = EBMCClassifier(max_parents=2)
>>> clf.fit(X_train, y_train)
>>> print(f"Selected {len(clf.selected_features_)} features")
>>> clf.predict(X_test)
predict_proba(X)[source]

Predict class probabilities using learned structure.

Return type:

ndarray

class endgame.models.ESKDBClassifier(n_estimators=50, k=2, smoothing='hdp', diversity_method='sao', aggregation='averaging', n_jobs=-1, max_cardinality=100, auto_discretize=True, discretizer_strategy='mdlp', discretizer_max_bins=10, random_state=None, verbose=False)[source]

Bases: BaseBayesianClassifier

Ensemble of Selective K-Dependence Bayes classifiers.

ESKDB is a state-of-the-art ensemble of Bayesian network classifiers (BNCs) that achieves diversity through Stochastic Attribute Ordering (SAO), bootstrapping, or both.

Parameters:
  • n_estimators (int, default=50) – Number of KDB models in ensemble.

  • k (int, default=2) – Maximum number of parent features per node (K-dependence).

  • smoothing ({'laplace', 'hdp'}, default='hdp') – Smoothing method:

    • ‘laplace’: Standard add-alpha smoothing

    • ‘hdp’: Hierarchical Dirichlet Process (adapts to sparsity)

  • diversity_method ({'sao', 'bootstrap', 'both'}, default='sao') – How to generate ensemble diversity:

    • ‘sao’: Stochastic Attribute Ordering

    • ‘bootstrap’: Sample with replacement

    • ‘both’: Combine both methods

  • aggregation ({'averaging', 'voting', 'stacking'}, default='averaging') – How to combine predictions.

  • n_jobs (int, default=-1) – Parallelization. -1 uses all cores.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • max_cardinality (int)

  • auto_discretize (bool)

  • discretizer_strategy (str)

  • discretizer_max_bins (int)
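The ‘averaging’ and ‘voting’ aggregation options differ in what each member contributes. Averaging combines the members' probability vectors, while voting counts each member's most probable class. The sketch below shows both. It assumes each member's predict_proba output has already been stacked into one array; the function name is hypothetical and ‘stacking’ is not covered.

```python
import numpy as np

def aggregate(member_probas, method="averaging"):
    """Combine ensemble member probabilities.

    member_probas : array of shape (n_estimators, n_samples, n_classes)
    """
    P = np.asarray(member_probas, dtype=float)
    if method == "averaging":
        return P.mean(axis=0)
    if method == "voting":
        n_classes = P.shape[2]
        votes = P.argmax(axis=2)  # (n_estimators, n_samples)
        counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)], axis=1)
        return counts / P.shape[0]  # vote share per class
    raise ValueError(f"unsupported method: {method}")

probas = [
    [[0.6, 0.4]],
    [[0.55, 0.45]],
    [[0.1, 0.9]],
]
print(aggregate(probas, "averaging"))
print(aggregate(probas, "voting"))
```

With these three members, averaging favors class 1 because of one confident member, while a majority vote favors class 0.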

estimators_

Fitted KDB models.

Type:

list[KDBClassifier]

oob_score_

Out-of-bag accuracy (when diversity_method includes ‘bootstrap’).

Type:

float

feature_importances_

Average feature importance across estimators.

Type:

np.ndarray

Examples

>>> from endgame.models.bayesian import ESKDBClassifier
>>> clf = ESKDBClassifier(n_estimators=50, k=2)
>>> clf.fit(X_train, y_train)
>>> clf.predict_proba(X_test)
fit(X, y, **fit_params)[source]

Fit ensemble of KDB classifiers.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data. Can be continuous (will be auto-discretized if auto_discretize=True) or discrete/integer-valued.

  • y (array-like of shape (n_samples,)) – Target values.

Return type:

ESKDBClassifier

Returns:

self

predict_proba(X)[source]

Predict class probabilities by aggregating estimators.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

ndarray of shape (n_samples, n_classes) – Class probabilities.

class endgame.models.KDBClassifier(k=2, smoothing='laplace', smoothing_alpha=1.0, attribute_order=None, max_cardinality=100, auto_discretize=True, discretizer_strategy='mdlp', discretizer_max_bins=10, random_state=None, verbose=False)[source]

Bases: BaseBayesianClassifier

K-Dependence Bayes Classifier.

KDB allows each feature to have at most K feature parents in addition to the class. This generalizes:

  • K=0: Naive Bayes

  • K=1: A one-dependence estimator (each feature has at most one feature parent, as in TAN)

  • K >= n_features - 1: Unrestricted given the attribute order (but computationally expensive)

Parameters:
  • k (int, default=2) – Maximum number of feature parents per node.

  • smoothing ({'laplace', 'hdp'}, default='laplace') – Smoothing method:

    • ‘laplace’: Standard add-alpha smoothing

    • ‘hdp’: Hierarchical Dirichlet Process

  • smoothing_alpha (float, default=1.0) – Smoothing parameter (for Laplace).

  • attribute_order (list[int] | None, default=None) – Custom attribute ordering. If None, features are ranked by mutual information with the target.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • max_cardinality (int)

  • auto_discretize (bool)

  • discretizer_strategy (str)

  • discretizer_max_bins (int)
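Sahami's original KDB procedure first ranks features by I(X_i; Y). Each feature then takes as parents the (at most) k earlier-ranked features with the highest conditional mutual information I(X_i; X_j | Y). The sketch below shows only this parent-selection step and assumes both quantities are precomputed. It illustrates the textbook procedure rather than KDBClassifier's exact code (for example, `attribute_order` would replace the MI ranking), and the function name is hypothetical.

```python
import numpy as np

def kdb_parents(mi, cmi, k):
    """Select feature parents for a K-dependence Bayes classifier.

    mi  : (n_features,) mutual information I(X_i; Y)
    cmi : (n_features, n_features) conditional MI I(X_i; X_j | Y)
    k   : maximum number of feature parents per node
    Returns the attribute order and {feature: [parents]}.
    """
    order = [int(i) for i in np.argsort(-np.asarray(mi), kind="stable")]
    parents = {}
    for pos, i in enumerate(order):
        earlier = order[:pos]
        # Rank earlier features by CMI with the current one and keep the top k
        ranked = sorted(earlier, key=lambda j: cmi[i][j], reverse=True)
        parents[i] = ranked[:k]
    return order, parents

mi = [0.1, 0.5, 0.3]
cmi = [[0.0, 0.2, 0.05],
       [0.2, 0.0, 0.4],
       [0.05, 0.4, 0.0]]
print(kdb_parents(mi, cmi, k=1))
```

Setting k=0 leaves every parent list empty, which recovers Naive Bayes.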

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

class endgame.models.AutoSLE(solvers=None, partition_method='spectral', max_cluster_size=50, edge_threshold=0.5, n_jobs=-1, random_state=None, verbose=False)[source]

Bases: object

Scalable structure learning for massive variable sets.

AutoSLE works by:

  1. Partitioning variables into manageable clusters

  2. Running multiple structure learning algorithms on each cluster

  3. Combining results via edge voting

  4. Learning inter-cluster edges

  5. Fusing the results into a global DAG

Parameters:
  • solvers (list[str], default=['pc', 'fges', 'hc']) – Base solvers to ensemble:

    • ‘pc’: PC-Stable (constraint-based, via pgmpy if available)

    • ‘fges’: Fast Greedy Equivalence Search (via causal-learn if available)

    • ‘hc’: Hill Climbing with restarts (built-in)

    • ‘ges’: Greedy Equivalence Search

  • partition_method ({'spectral', 'correlation', 'random'}, default='spectral') – How to partition variables into clusters.

  • max_cluster_size (int, default=50) – Maximum variables per cluster.

  • edge_threshold (float, default=0.5) – Minimum fraction of solvers that must agree on an edge.

  • n_jobs (int, default=-1) – Parallelization for cluster solving. -1 uses all cores.

  • random_state (int, optional) – Random seed for reproducibility.

  • verbose (bool, default=False) – Enable verbose output.
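The edge-voting step and the `edge_threshold` parameter can be illustrated as follows. Each solver proposes a set of directed edges, and an edge's confidence is the fraction of solvers that proposed it. Only edges at or above the threshold survive. This is a sketch under assumptions: the function name is hypothetical, and whether the threshold test is `>=` or `>` is a guess. A real implementation would also need to resolve reversed edges and cycles before fusing the result into a DAG.

```python
from collections import Counter

def vote_edges(solver_edges, edge_threshold=0.5):
    """Combine directed edge sets from several solvers by voting.

    solver_edges : list of sets of (source, target) tuples, one per solver
    Returns {edge: confidence} for edges whose agreement ratio >= edge_threshold.
    """
    n_solvers = len(solver_edges)
    counts = Counter(edge for edges in solver_edges for edge in set(edges))
    confidence = {edge: c / n_solvers for edge, c in counts.items()}
    return {edge: conf for edge, conf in confidence.items() if conf >= edge_threshold}

results = [
    {(0, 1), (1, 2)},          # edges proposed by solver 1
    {(0, 1), (2, 3)},          # solver 2
    {(0, 1), (1, 2), (3, 2)},  # solver 3
]
print(vote_edges(results, edge_threshold=0.5))
```

Here (0, 1) is proposed by all three solvers and (1, 2) by two of them, so both pass the 0.5 threshold. The two conflicting directions between nodes 2 and 3 each receive one vote and are dropped.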

structure_

Learned global DAG.

Type:

nx.DiGraph

edge_confidence_

Confidence score (agreement ratio) for each edge.

Type:

dict[tuple, float]

cluster_assignments_

Which cluster each variable was assigned to.

Type:

np.ndarray

Examples

>>> from endgame.models.bayesian.structure import AutoSLE
>>> sle = AutoSLE(max_cluster_size=30)
>>> structure = sle.learn(data, variable_names=['x1', 'x2', ...])
learn(data, variable_names=None, cardinalities=None)[source]

Learn structure from data.

Parameters:
  • data (np.ndarray) – Shape (n_samples, n_variables). Should be discrete (integer).

  • variable_names (list[str], optional) – Names for variables. Defaults to indices.

  • cardinalities (dict[int, int], optional) – Number of states per variable. Auto-computed if not provided.

Return type:

DiGraph

Returns:

nx.DiGraph – Learned directed acyclic graph.

get_edge_confidence(u, v)[source]

Get confidence score for an edge.

Parameters:
  • u (int) – Source node index.

  • v (int) – Target node index.

Return type:

float

Returns:

float – Confidence score (0-1), or 0 if edge doesn’t exist.

get_highly_confident_edges(min_confidence=0.8)[source]

Get edges with confidence above threshold.

Parameters:

min_confidence (float, default=0.8) – Minimum confidence score.

Return type:

list[tuple]

Returns:

list[tuple] – List of (source, target) edges.

class endgame.models.NGBoostRegressor(preset='endgame', distribution='normal', score='crps', n_estimators=None, learning_rate=None, minibatch_frac=None, col_sample=None, base_learner=None, natural_gradient=True, early_stopping_rounds=None, random_state=None, verbose=False, **kwargs)[source]

Bases: EndgameEstimator, RegressorMixin

NGBoost Regressor for probabilistic regression.

Produces full probability distributions for predictions, enabling uncertainty quantification and scoring with proper scoring rules.

Parameters:
  • preset (str, default='endgame') – Hyperparameter preset: ‘endgame’, ‘fast’, ‘accurate’, ‘competition’.

  • distribution (str, default='normal') – Output distribution: ‘normal’, ‘lognormal’, ‘exponential’, ‘laplace’, ‘t’, ‘cauchy’, ‘poisson’.

  • score (str, default='crps') – Scoring rule: ‘crps’ (Continuous Ranked Probability Score) or ‘mle’/‘nll’ (Maximum Likelihood / Negative Log Likelihood).

  • n_estimators (int, optional) – Number of boosting iterations. Overrides preset.

  • learning_rate (float, optional) – Learning rate. Overrides preset.

  • minibatch_frac (float, optional) – Fraction of data to use in each iteration.

  • col_sample (float, optional) – Fraction of features to use in each iteration.

  • base_learner (estimator, optional) – Base learner for boosting. Default is DecisionTreeRegressor(max_depth=3).

  • natural_gradient (bool, default=True) – Use natural gradient (recommended).

  • early_stopping_rounds (int, optional) – Early stopping patience. If None, no early stopping.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • **kwargs – Additional parameters passed to NGBRegressor.
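For the default `distribution='normal'`, both scoring rules have closed forms. Let μ and σ be the predicted mean and standard deviation, y the observation, and z = (y − μ)/σ. Then NLL = log σ + ½ log(2π) + z²/2, and CRPS = σ [z (2Φ(z) − 1) + 2φ(z) − 1/√π], where Φ and φ are the standard normal CDF and density. The sketch below evaluates both with the standard library only. It illustrates the scoring rules themselves, not how ngboost computes natural gradients.

```python
import math
from statistics import NormalDist

def normal_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return math.log(sigma) + 0.5 * math.log(2 * math.pi) + 0.5 * z * z

def normal_crps(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) for observation y (lower is better)."""
    z = (y - mu) / sigma
    std = NormalDist()  # standard normal: Phi = std.cdf, phi = std.pdf
    return sigma * (z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z) - 1 / math.sqrt(math.pi))

print(normal_nll(0.0, 0.0, 1.0))   # equals 0.5 * log(2 * pi)
print(normal_crps(0.0, 0.0, 1.0))  # equals (sqrt(2) - 1) / sqrt(pi)
```

CRPS is expressed in the units of the target and scales linearly with σ. NLL heavily penalizes overconfident predictions (small σ with a large error).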

model_

Fitted NGBoost model.

Type:

NGBRegressor

feature_importances_

Feature importances from the base learners.

Type:

ndarray

Examples

>>> from endgame.models import NGBoostRegressor
>>> model = NGBoostRegressor(distribution='normal', score='crps')
>>> model.fit(X_train, y_train)
>>>
>>> # Point predictions
>>> y_pred = model.predict(X_test)
>>>
>>> # Full distribution predictions
>>> y_dist = model.pred_dist(X_test)
>>> mean = y_dist.mean()
>>> std = y_dist.std()
>>>
>>> # Prediction intervals
>>> lower, upper = model.predict_interval(X_test, alpha=0.1)  # 90% prediction interval
>>>
>>> # Negative log-likelihood
>>> nll = -y_dist.logpdf(y_test).mean()

References

Duan et al., 2020. “NGBoost: Natural Gradient Boosting for Probabilistic Prediction.” https://arxiv.org/abs/1910.03225

fit(X, y, X_val=None, y_val=None, sample_weight=None, val_sample_weight=None)[source]

Fit the NGBoost regressor.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • X_val (array-like, optional) – Validation features for early stopping.

  • y_val (array-like, optional) – Validation targets for early stopping.

  • sample_weight (array-like, optional) – Training sample weights.

  • val_sample_weight (array-like, optional) – Validation sample weights.

Return type:

NGBoostRegressor

Returns:

self

predict(X)[source]

Predict the mean of the distribution.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted means.

pred_dist(X)[source]

Predict the full distribution.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

dist (ngboost distribution) – Predicted distributions with methods:

  • mean(): Expected value

  • std(): Standard deviation

  • var(): Variance

  • logpdf(y): Log probability density

  • pdf(y): Probability density

  • cdf(y): Cumulative distribution function

  • ppf(q): Percent point function (inverse CDF)

  • sample(n): Draw n samples

predict_interval(X, alpha=0.1)[source]

Predict prediction intervals.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • alpha (float, default=0.1) – Significance level. The method returns a (1 - alpha) prediction interval; for example, alpha=0.1 gives a 90% interval.

Return type:

tuple[ndarray, ndarray]

Returns:

  • lower (ndarray of shape (n_samples,)) – Lower bound of prediction interval.

  • upper (ndarray of shape (n_samples,)) – Upper bound of prediction interval.
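For a normal predictive distribution, an equal-tailed (1 − alpha) interval lies between the alpha/2 and 1 − alpha/2 quantiles, that is, μ ± zσ with z = Φ⁻¹(1 − alpha/2). Whether predict_interval uses equal tails is an assumption here. The sketch also substitutes the standard library's `statistics.NormalDist` for the ngboost distribution's ppf(), and the function name is hypothetical.

```python
from statistics import NormalDist

def normal_interval(mu, sigma, alpha=0.1):
    """Equal-tailed (1 - alpha) prediction interval for N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)

lower, upper = normal_interval(mu=10.0, sigma=2.0, alpha=0.1)
print(lower, upper)
```

For alpha=0.1, z is about 1.645, so this interval is roughly 10 ± 3.29.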

predict_std(X)[source]

Predict the standard deviation (uncertainty).

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

std (ndarray of shape (n_samples,)) – Predicted standard deviations.

score(X, y, sample_weight=None)[source]

Return the negative log-likelihood on the given data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,)) – True target values.

  • sample_weight (array-like, optional) – Sample weights (not used, for API compatibility).

Return type:

float

Returns:

score (float) – Mean negative log-likelihood (lower is better). Note that this is the opposite of the scikit-learn convention, in which higher scores are better.

property feature_importances_: ndarray

Feature importances based on base learner splits.

set_fit_request(*, X_val='$UNCHANGED$', sample_weight='$UNCHANGED$', val_sample_weight='$UNCHANGED$', y_val='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X_val (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_val parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • val_sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_sample_weight parameter in fit.

  • y_val (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_val parameter in fit.

  • self (NGBoostRegressor)

Returns:

self (object) – The updated object.

Return type:

NGBoostRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NGBoostRegressor)

Returns:

self (object) – The updated object.

Return type:

NGBoostRegressor

class endgame.models.NGBoostClassifier(preset='endgame', n_estimators=None, learning_rate=None, minibatch_frac=None, col_sample=None, base_learner=None, natural_gradient=True, early_stopping_rounds=None, random_state=None, verbose=False, **kwargs)[source]

Bases: ClassifierMixin, EndgameEstimator

NGBoost Classifier for probabilistic classification.

Produces a probability distribution over classes for each sample, enabling uncertainty quantification.

Parameters:
  • preset (str, default='endgame') – Hyperparameter preset: ‘endgame’, ‘fast’, ‘accurate’, ‘competition’.

  • n_estimators (int, optional) – Number of boosting iterations. Overrides preset.

  • learning_rate (float, optional) – Learning rate. Overrides preset.

  • minibatch_frac (float, optional) – Fraction of data to use in each iteration.

  • col_sample (float, optional) – Fraction of features to use in each iteration.

  • base_learner (estimator, optional) – Base learner for boosting. Default is DecisionTreeRegressor(max_depth=3).

  • natural_gradient (bool, default=True) – Use natural gradient (recommended).

  • early_stopping_rounds (int, optional) – Early stopping patience. If None, no early stopping.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

  • **kwargs – Additional parameters passed to NGBClassifier.

model_

Fitted NGBoost model.

Type:

NGBClassifier

classes_

Unique class labels.

Type:

ndarray

n_classes_

Number of classes.

Type:

int

feature_importances_

Feature importances from the base learners.

Type:

ndarray

Examples

>>> from endgame.models import NGBoostClassifier
>>> model = NGBoostClassifier(preset='endgame')
>>> model.fit(X_train, y_train)
>>>
>>> # Class predictions
>>> y_pred = model.predict(X_test)
>>>
>>> # Probability predictions
>>> y_proba = model.predict_proba(X_test)
>>>
>>> # Distribution predictions
>>> y_dist = model.pred_dist(X_test)
>>>
>>> # Log-loss
>>> from sklearn.metrics import log_loss
>>> loss = log_loss(y_test, y_proba)

References

Duan et al., 2020. “NGBoost: Natural Gradient Boosting for Probabilistic Prediction.” https://arxiv.org/abs/1910.03225

fit(X, y, X_val=None, y_val=None, sample_weight=None, val_sample_weight=None)[source]

Fit the NGBoost classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target labels.

  • X_val (array-like, optional) – Validation features for early stopping.

  • y_val (array-like, optional) – Validation labels for early stopping.

  • sample_weight (array-like, optional) – Training sample weights.

  • val_sample_weight (array-like, optional) – Validation sample weights.

Return type:

NGBoostClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

pred_dist(X)[source]

Predict the full distribution over classes.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

dist (ngboost distribution) – Predicted distributions.

score(X, y, sample_weight=None)[source]

Return accuracy on the given data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,)) – True labels.

  • sample_weight (array-like, optional) – Sample weights.

Return type:

float

Returns:

score (float) – Accuracy score.

property feature_importances_: ndarray

Feature importances based on base learner splits.

set_fit_request(*, X_val='$UNCHANGED$', sample_weight='$UNCHANGED$', val_sample_weight='$UNCHANGED$', y_val='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X_val (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_val parameter in fit.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • val_sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_sample_weight parameter in fit.

  • y_val (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_val parameter in fit.

  • self (NGBoostClassifier)

Returns:

self (object) – The updated object.

Return type:

NGBoostClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NGBoostClassifier)

Returns:

self (object) – The updated object.

Return type:

NGBoostClassifier

class endgame.models.MLPClassifier(hidden_dims=None, dropout=0.3, batch_norm=True, activation='relu', learning_rate=0.001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=10, class_weight=None, scheduler='cosine', device='auto', random_state=None, verbose=False)[source]

Bases: ClassifierMixin, _BaseMLPEstimator

Multi-Layer Perceptron classifier.

PyTorch-based MLP for tabular classification, with batch normalization, dropout, weight decay, learning-rate scheduling, and early stopping.

Parameters:
  • hidden_dims (List[int], optional) – Hidden layer dimensions. None uses [256, 128].

  • dropout (float, default=0.3) – Dropout rate for regularization.

  • batch_norm (bool, default=True) – Whether to use batch normalization.

  • activation (str, default='relu') – Activation function.

  • learning_rate (float, default=1e-3) – Initial learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization strength.

  • n_epochs (int, default=100) – Maximum number of training epochs.

  • batch_size (int, default=256) – Training batch size.

  • early_stopping (int, default=10) – Patience for early stopping.

  • class_weight (str or dict, optional) – Class weights: ‘balanced’ or dict mapping classes to weights.

  • scheduler (str, default='cosine') – Learning rate scheduler.

  • device (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.
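The class_weight='balanced' option is not defined further above. A common convention, and the one scikit-learn's compute_class_weight uses, weights each class by n_samples / (n_classes * count). The sketch below assumes MLPClassifier follows the same rule; the source does not confirm it, and balanced_class_weights is a hypothetical helper.

```python
import numpy as np


def balanced_class_weights(y):
    """Return {class: weight} using the n_samples / (n_classes * count) rule."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Imbalanced toy target: six 0s and two 1s.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
print(balanced_class_weights(y))  # {0: 0.6666666666666666, 1: 2.0}
```

The rare class gets the larger weight, so both classes contribute equally to the total loss.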

classes_

Unique class labels.

Type:

ndarray

n_classes_

Number of classes.

Type:

int

model_

Fitted PyTorch model.

Type:

_MLPModule

history_

Training history with ‘train_loss’ and ‘val_loss’.

Type:

dict
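history_ records per-epoch train_loss and val_loss, and early_stopping sets the patience. Below is a minimal sketch of patience-based stopping. It assumes training halts once val_loss has failed to improve for early_stopping consecutive epochs. Whether the best weights are then restored is not stated in the source, and stopping_epoch is a hypothetical helper.

```python
def stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    Epochs are 0-indexed. Training stops after `patience` consecutive
    epochs without a strict improvement over the best val_loss so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Patience never ran out: training used every epoch.
    return best_epoch, len(val_losses) - 1


history = {"train_loss": [0.9, 0.7, 0.6, 0.5, 0.45, 0.4],
           "val_loss": [0.8, 0.6, 0.65, 0.62, 0.61, 0.7]}
print(stopping_epoch(history["val_loss"], patience=3))  # (1, 4)
```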

Examples

>>> from endgame.models.neural import MLPClassifier
>>> clf = MLPClassifier(hidden_dims=[128, 64], n_epochs=50)
>>> clf.fit(X_train, y_train, val_data=(X_val, y_val))
>>> predictions = clf.predict(X_test)
>>> probabilities = clf.predict_proba(X_test)

fit(X, y, val_data=None)

Fit the classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

  • val_data (tuple of (X_val, y_val), optional) – Validation data for early stopping.

Return type:

MLPClassifier

Returns:

self – Fitted classifier.

predict(X)

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

ndarray of shape (n_samples,) – Predicted class labels.

predict_proba(X)

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

ndarray of shape (n_samples, n_classes) – Class probabilities.
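predict and predict_proba presumably follow the usual scikit-learn relationship: the predicted label is the column with the highest probability, mapped back through classes_. The source does not show the implementation. The numpy illustration below uses made-up labels and probabilities.

```python
import numpy as np

classes_ = np.array(["cat", "dog", "fish"])
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])  # shape (n_samples, n_classes); rows sum to 1

# Each row's argmax is an index into classes_.
labels = classes_[proba.argmax(axis=1)]
print(labels)
```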

set_fit_request(*, val_data='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • val_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_data parameter in fit.

  • self (MLPClassifier)

Returns:

self (object) – The updated object.

Return type:

MLPClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (MLPClassifier)

Returns:

self (object) – The updated object.

Return type:

MLPClassifier

class endgame.models.MLPRegressor(hidden_dims=None, dropout=0.3, batch_norm=True, activation='relu', learning_rate=0.001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=10, loss='mse', scheduler='cosine', device='auto', random_state=None, verbose=False)

Bases: _BaseMLPEstimator, RegressorMixin

Multi-Layer Perceptron regressor.

PyTorch-based MLP with modern techniques for tabular regression.

Parameters:
  • hidden_dims (List[int], optional) – Hidden layer dimensions. Defaults to [256, 128] when None.

  • dropout (float, default=0.3) – Dropout rate for regularization.

  • batch_norm (bool, default=True) – Whether to use batch normalization.

  • activation (str, default='relu') – Activation function.

  • learning_rate (float, default=1e-3) – Initial learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization strength.

  • n_epochs (int, default=100) – Maximum number of training epochs.

  • batch_size (int, default=256) – Training batch size.

  • early_stopping (int, default=10) – Patience for early stopping.

  • loss (str, default='mse') – Loss function: ‘mse’, ‘mae’, ‘huber’.

  • scheduler (str, default='cosine') – Learning rate scheduler.

  • device (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.
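For reference, the three loss options computed with numpy. The Huber threshold delta=1.0 is an assumption (PyTorch's HuberLoss default), since the source does not say which value MLPRegressor uses. regression_loss is a hypothetical helper.

```python
import numpy as np


def regression_loss(y_true, y_pred, loss="mse", delta=1.0):
    """Mean regression loss; delta is the Huber threshold (assumed 1.0)."""
    residual = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    abs_r = np.abs(residual)
    if loss == "mse":
        return float(np.mean(residual ** 2))
    if loss == "mae":
        return float(np.mean(abs_r))
    if loss == "huber":
        # Quadratic for |r| <= delta, linear beyond it, so outliers weigh less than in MSE.
        quadratic = 0.5 * residual ** 2
        linear = delta * (abs_r - 0.5 * delta)
        return float(np.mean(np.where(abs_r <= delta, quadratic, linear)))
    raise ValueError(f"unknown loss: {loss!r}")


y_true = [0.0, 0.0, 0.0]
y_pred = [0.5, -1.0, 4.0]  # the last residual is an outlier
for name in ("mse", "mae", "huber"):
    print(name, regression_loss(y_true, y_pred, name))
```

The outlier dominates MSE (5.75) but contributes only linearly to MAE and Huber.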

model_

Fitted PyTorch model.

Type:

_MLPModule

history_

Training history with ‘train_loss’ and ‘val_loss’.

Type:

dict

Examples

>>> from endgame.models.neural import MLPRegressor
>>> reg = MLPRegressor(hidden_dims=[128, 64], n_epochs=50)
>>> reg.fit(X_train, y_train, val_data=(X_val, y_val))
>>> predictions = reg.predict(X_test)

fit(X, y, val_data=None)

Fit the regressor.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

  • val_data (tuple of (X_val, y_val), optional) – Validation data for early stopping.

Return type:

MLPRegressor

Returns:

self – Fitted regressor.

predict(X)

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

ndarray of shape (n_samples,) or (n_samples, n_targets) – Predicted values.

set_fit_request(*, val_data='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • val_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_data parameter in fit.

  • self (MLPRegressor)

Returns:

self (object) – The updated object.

Return type:

MLPRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (MLPRegressor)

Returns:

self (object) – The updated object.

Return type:

MLPRegressor

class endgame.models.EmbeddingMLPClassifier(categorical_features=None, embedding_dims=None, hidden_dims=None, dropout=0.3, embedding_dropout=0.1, batch_norm=True, activation='relu', learning_rate=0.001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=10, class_weight=None, scheduler='cosine', device='auto', random_state=None, verbose=False)

Bases: ClassifierMixin, _BaseEmbeddingMLP

MLP classifier with entity embeddings for categorical features.

Learns dense representations for categorical variables, enabling effective handling of high-cardinality features.

Parameters:
  • categorical_features (List[str] or List[int], optional) – Names or indices of categorical features. If None, categorical features are auto-detected from their number of unique values.

  • embedding_dims (Dict[str, int] or int, optional) – Embedding size per categorical feature: either a dict mapping each feature to its dimension, or a single int used for every categorical feature.

  • hidden_dims (List[int], optional) – Hidden layer dimensions. Defaults to [256, 128] when None.

  • dropout (float, default=0.3) – Dropout rate for hidden layers.

  • embedding_dropout (float, default=0.1) – Dropout rate for embeddings.

  • batch_norm (bool, default=True) – Whether to use batch normalization.

  • activation (str, default='relu') – Activation function.

  • learning_rate (float, default=1e-3) – Initial learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization strength.

  • n_epochs (int, default=100) – Maximum training epochs.

  • batch_size (int, default=256) – Training batch size.

  • early_stopping (int, default=10) – Early stopping patience.

  • class_weight (str or dict, optional) – Class weights: ‘balanced’ or a dict mapping classes to weights.

  • scheduler (str, default='cosine') – Learning rate scheduler.

  • device (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.
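Entity embeddings replace each categorical value with a dense vector. The numpy sketch below shows only the lookup step and one plausible reading of embedding_dims: a dict gives a per-feature size, and an int applies to every categorical feature. Several details are assumptions for illustration only: the helper resolve_embedding_dims, its fallback size of 8 for features missing from the dict, and the random initialisation. In the real model, the tables are trained parameters.

```python
import numpy as np


def resolve_embedding_dims(categorical_features, embedding_dims, default_dim=8):
    """Hypothetical helper: turn an int-or-dict embedding_dims into a per-feature dict."""
    if isinstance(embedding_dims, int):
        return {f: embedding_dims for f in categorical_features}
    embedding_dims = embedding_dims or {}
    return {f: embedding_dims.get(f, default_dim) for f in categorical_features}


rng = np.random.default_rng(0)
brand = np.array(["acme", "zenith", "acme", "orbit"])

# Map raw category values to integer codes 0..cardinality-1 (vocab is sorted).
vocab, codes = np.unique(brand, return_inverse=True)
dims = resolve_embedding_dims(["category", "brand"], {"category": 10, "brand": 8})

# One table per feature, shape (cardinality, embedding_dim); row lookup by code.
table = rng.normal(scale=0.1, size=(len(vocab), dims["brand"]))
embedded = table[codes]  # shape (n_samples, 8); equal values share a row
print(vocab, codes, embedded.shape)
```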

classes_

Unique class labels.

Type:

ndarray

n_classes_

Number of classes.

Type:

int

model_

Fitted PyTorch model.

Type:

_EmbeddingMLPModule

history_

Training history.

Type:

dict

Examples

>>> from endgame.models.neural import EmbeddingMLPClassifier
>>> clf = EmbeddingMLPClassifier(
...     categorical_features=['category', 'brand'],
...     embedding_dims={'category': 10, 'brand': 8},
...     hidden_dims=[128, 64]
... )
>>> clf.fit(X_train, y_train, val_data=(X_val, y_val))
>>> predictions = clf.predict(X_test)
>>> # Get learned embeddings
>>> category_embeddings = clf.get_embeddings('category')

fit(X, y, val_data=None)

Fit the classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

  • val_data (tuple of (X_val, y_val), optional) – Validation data for early stopping.

Return type:

EmbeddingMLPClassifier

Returns:

self – Fitted classifier.

predict(X)

Predict class labels.

Return type:

ndarray

predict_proba(X)

Predict class probabilities.

Return type:

ndarray

set_fit_request(*, val_data='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • val_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_data parameter in fit.

  • self (EmbeddingMLPClassifier)

Returns:

self (object) – The updated object.

Return type:

EmbeddingMLPClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (EmbeddingMLPClassifier)

Returns:

self (object) – The updated object.

Return type:

EmbeddingMLPClassifier

class endgame.models.EmbeddingMLPRegressor(categorical_features=None, embedding_dims=None, hidden_dims=None, dropout=0.3, embedding_dropout=0.1, batch_norm=True, activation='relu', learning_rate=0.001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=10, loss='mse', scheduler='cosine', device='auto', random_state=None, verbose=False)

Bases: _BaseEmbeddingMLP, RegressorMixin

MLP regressor with entity embeddings for categorical features.

Learns dense representations for categorical variables, enabling effective handling of high-cardinality features.

Parameters:
  • categorical_features (List[str] or List[int], optional) – Names or indices of categorical features.

  • embedding_dims (Dict[str, int] or int, optional) – Embedding size per categorical feature: either a dict mapping each feature to its dimension, or a single int used for every categorical feature.

  • hidden_dims (List[int], optional) – Hidden layer dimensions. Defaults to [256, 128] when None.

  • dropout (float, default=0.3) – Dropout rate for hidden layers.

  • embedding_dropout (float, default=0.1) – Dropout rate for embeddings.

  • batch_norm (bool, default=True) – Whether to use batch normalization.

  • activation (str, default='relu') – Activation function.

  • learning_rate (float, default=1e-3) – Initial learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization strength.

  • n_epochs (int, default=100) – Maximum training epochs.

  • batch_size (int, default=256) – Training batch size.

  • early_stopping (int, default=10) – Early stopping patience.

  • loss (str, default='mse') – Loss function: ‘mse’, ‘mae’, ‘huber’.

  • scheduler (str, default='cosine') – Learning rate scheduler.

  • device (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.
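scheduler='cosine' is not defined further in the source. Assume it means cosine annealing from learning_rate down to a floor over n_epochs, without restarts (the schedule torch.optim.lr_scheduler.CosineAnnealingLR implements). The rate at epoch t is then lr_t = lr_min + (lr_0 - lr_min)(1 + cos(pi * t / T)) / 2. The pure-Python sketch below sets lr_min = 0, which is also an assumption.

```python
import math


def cosine_lr(epoch, n_epochs, base_lr=1e-3, min_lr=0.0):
    """Cosine-annealed learning rate for a 0-indexed epoch (no restarts)."""
    progress = epoch / n_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [cosine_lr(t, n_epochs=100) for t in range(100)]
print(schedule[0], schedule[50])  # starts at learning_rate, about half of it at the midpoint
```

The rate decays slowly at first, fastest mid-training, and flattens out near the end.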

model_

Fitted PyTorch model.

Type:

_EmbeddingMLPModule

history_

Training history.

Type:

dict

Examples

>>> from endgame.models.neural import EmbeddingMLPRegressor
>>> reg = EmbeddingMLPRegressor(
...     categorical_features=['store_id', 'product_id'],
...     embedding_dims=16
... )
>>> reg.fit(X_train, y_train, val_data=(X_val, y_val))
>>> predictions = reg.predict(X_test)

fit(X, y, val_data=None)

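Fit the regressor.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like) – Target values.

  • val_data (tuple[Any, Any] | None, optional) – Validation data (X_val, y_val) for early stopping.

Return type:

EmbeddingMLPRegressor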

predict(X)

Predict target values.

Return type:

ndarray

set_fit_request(*, val_data='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • val_data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_data parameter in fit.

  • self (EmbeddingMLPRegressor)

Returns:

self (object) – The updated object.

Return type:

EmbeddingMLPRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (EmbeddingMLPRegressor)

Returns:

self (object) – The updated object.

Return type:

EmbeddingMLPRegressor

class endgame.models.TabNetClassifier(n_d=64, n_a=64, n_steps=5, gamma=1.5, n_independent=2, n_shared=2, momentum=0.3, clip_value=None, lambda_sparse=0.0001, optimizer_fn=None, optimizer_params=None, scheduler_fn=None, scheduler_params=None, mask_type='sparsemax', n_epochs=100, patience=15, batch_size=1024, virtual_batch_size=256, device_name='auto', random_state=None, verbose=0)

Bases: _BaseTabNetWrapper, ClassifierMixin

TabNet classifier wrapper.

Attention-based deep learning architecture for tabular classification with built-in feature selection and interpretability.

Parameters:
  • n_d (int, default=64) – Width of the decision prediction layer.

  • n_a (int, default=64) – Width of the attention embedding.

  • n_steps (int, default=5) – Number of decision steps.

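  • gamma (float, default=1.5) – Relaxation coefficient for feature reuse across decision steps. With 1.0, each feature can be selected at only one step; larger values allow more reuse.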

  • n_independent (int, default=2) – Number of independent GLU layers.

  • n_shared (int, default=2) – Number of shared GLU layers.

  • momentum (float, default=0.3) – Batch normalization momentum.

  • clip_value (float, optional) – Gradient clipping value.

  • lambda_sparse (float, default=1e-4) – Sparsity regularization coefficient.

  • optimizer_params (dict, optional) – Optimizer parameters.

  • mask_type (str, default='sparsemax') – Attention type: ‘sparsemax’ or ‘entmax’.

  • n_epochs (int, default=100) – Maximum training epochs.

  • patience (int, default=15) – Early stopping patience.

  • batch_size (int, default=1024) – Training batch size.

  • virtual_batch_size (int, default=256) – Ghost Batch Normalization batch size.

  • device_name (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (int, default=0) – Verbosity level.

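  • optimizer_fn (Any | None) – Optimizer class; optimizer_params are passed to it.

  • scheduler_fn (Any | None) – Learning rate scheduler class.

  • scheduler_params (dict | None) – Scheduler parameters.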
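mask_type selects how TabNet normalises its attention masks. Sparsemax is the Euclidean projection of a score vector onto the probability simplex. Unlike softmax, it can assign exactly zero weight to a feature, which is what makes TabNet's feature selection sparse. The numpy sketch below handles a single score vector; entmax, the alternative, is not shown.

```python
import numpy as np


def sparsemax(z):
    """Project a score vector onto the probability simplex (sparsemax)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]  # descending
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Largest k with 1 + k * z_(k) > sum of the top-k scores gives the support size.
    support = 1 + k * z_sorted > cumsum
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z  # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)


scores = np.array([1.0, 0.8, 0.1, -1.0])
mask = sparsemax(scores)
print(mask, mask.sum())  # two nonzero weights (0.6 and 0.4); the rest are exactly 0
```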

classes_

Unique class labels.

Type:

ndarray

n_classes_

Number of classes.

Type:

int

model_

Fitted TabNet model.

Type:

TabNetClassifier

feature_importances_

Feature importance dictionary.

Type:

dict
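virtual_batch_size enables Ghost Batch Normalization. Each training batch (batch_size=1024 by default) is split into chunks of virtual_batch_size rows, and each chunk is normalised with its own statistics, which adds regularising noise at large batch sizes. The numpy sketch below covers training mode only. It omits the learned scale and shift and the running averages used at inference, and ghost_batch_norm is a hypothetical helper.

```python
import numpy as np


def ghost_batch_norm(x, virtual_batch_size, eps=1e-5):
    """Normalise each virtual batch of rows with its own mean and variance."""
    n_chunks = int(np.ceil(len(x) / virtual_batch_size))
    chunks = np.array_split(x, n_chunks)
    normed = [(c - c.mean(axis=0)) / np.sqrt(c.var(axis=0) + eps) for c in chunks]
    return np.concatenate(normed)


rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(1024, 4))  # batch_size=1024
out = ghost_batch_norm(batch, virtual_batch_size=256)  # 4 virtual batches

# Each 256-row chunk now has near-zero mean and near-unit variance per feature.
print(np.abs(out[:256].mean(axis=0)).max() < 1e-6)
```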

Examples

>>> from endgame.models.neural import TabNetClassifier
>>> clf = TabNetClassifier(n_steps=3, n_epochs=50)
>>> clf.fit(X_train, y_train, eval_set=[(X_val, y_val)])
>>> predictions = clf.predict(X_test)
>>> proba = clf.predict_proba(X_test)
>>> # Get feature importance masks
>>> explain_matrix, masks = clf.explain(X_test)

fit(X, y, eval_set=None, eval_name=None, eval_metric=None, weights=None, **fit_params)

Fit the classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

  • eval_set (list of (X, y) tuples, optional) – Validation sets for early stopping.

  • eval_name (list of str, optional) – Names for evaluation sets.

  • eval_metric (list of str, optional) – Evaluation metrics.

  • weights (int or ndarray, optional) – Sample weights (0 for unweighted, 1 for balanced, or array).

  • **fit_params – Additional fit parameters.

Return type:

TabNetClassifier

Returns:

self – Fitted classifier.
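The weights=1 ('balanced') option is not detailed here. One common reading, sketched below as an assumption, gives each sample a weight inversely proportional to its class frequency, so every class carries equal total weight. The source does not say whether the wrapper applies these as sampling probabilities or as loss weights. inverse_frequency_weights is a hypothetical helper.

```python
import numpy as np


def inverse_frequency_weights(y):
    """Per-sample weights proportional to 1 / count(class of that sample)."""
    _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]
    return weights / weights.sum()  # normalise so the weights sum to 1


y = np.array([0, 0, 0, 1])
w = inverse_frequency_weights(y)
print(w)  # the single class-1 sample gets as much total weight as the three class-0 samples
```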

predict(X)

Predict class labels.

Parameters:

X (array-like) – Input samples.

Return type:

ndarray

Returns:

ndarray – Predicted class labels.

predict_proba(X)

Predict class probabilities.

Parameters:

X (array-like) – Input samples.

Return type:

ndarray

Returns:

ndarray – Class probabilities.

set_fit_request(*, eval_metric='$UNCHANGED$', eval_name='$UNCHANGED$', eval_set='$UNCHANGED$', weights='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_metric (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_metric parameter in fit.

  • eval_name (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_name parameter in fit.

  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • weights (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for weights parameter in fit.

  • self (TabNetClassifier)

Returns:

self (object) – The updated object.

Return type:

TabNetClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (TabNetClassifier)

Returns:

self (object) – The updated object.

Return type:

TabNetClassifier

class endgame.models.TabNetRegressor(n_d=64, n_a=64, n_steps=5, gamma=1.5, n_independent=2, n_shared=2, momentum=0.3, clip_value=None, lambda_sparse=0.0001, optimizer_fn=None, optimizer_params=None, scheduler_fn=None, scheduler_params=None, mask_type='sparsemax', n_epochs=100, patience=15, batch_size=1024, virtual_batch_size=256, device_name='auto', random_state=None, verbose=0)

Bases: _BaseTabNetWrapper, RegressorMixin

TabNet regressor wrapper.

Attention-based deep learning architecture for tabular regression with built-in feature selection and interpretability.

Parameters:
  • n_d (int, default=64) – Width of the decision prediction layer.

  • n_a (int, default=64) – Width of the attention embedding.

  • n_steps (int, default=5) – Number of decision steps.

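  • gamma (float, default=1.5) – Relaxation coefficient for feature reuse across decision steps. With 1.0, each feature can be selected at only one step; larger values allow more reuse.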

  • n_independent (int, default=2) – Number of independent GLU layers.

  • n_shared (int, default=2) – Number of shared GLU layers.

  • momentum (float, default=0.3) – Batch normalization momentum.

  • clip_value (float, optional) – Gradient clipping value.

  • lambda_sparse (float, default=1e-4) – Sparsity regularization coefficient.

  • optimizer_params (dict, optional) – Optimizer parameters.

  • mask_type (str, default='sparsemax') – Attention type: ‘sparsemax’ or ‘entmax’.

  • n_epochs (int, default=100) – Maximum training epochs.

  • patience (int, default=15) – Early stopping patience.

  • batch_size (int, default=1024) – Training batch size.

  • virtual_batch_size (int, default=256) – Ghost Batch Normalization batch size.

  • device_name (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (int, default=0) – Verbosity level.

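  • optimizer_fn (Any | None) – Optimizer class; optimizer_params are passed to it.

  • scheduler_fn (Any | None) – Learning rate scheduler class.

  • scheduler_params (dict | None) – Scheduler parameters.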
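gamma acts through TabNet's prior scale. In the TabNet formulation, the prior after step i is P[i] = P[i-1] * (gamma - M[i]), starting from P[0] = 1, and it multiplies the attention scores at the next step. The numpy sketch below applies one update to a made-up mask. update_prior is a hypothetical helper, not part of this API.

```python
import numpy as np


def update_prior(prior, mask, gamma):
    """TabNet prior-scale update: P[i] = P[i-1] * (gamma - M[i])."""
    return prior * (gamma - mask)


prior = np.ones(3)  # P[0]: no feature used yet
mask = np.array([1.0, 0.0, 0.0])  # step 1 attends only to feature 0 (made-up mask)

for gamma in (1.0, 1.5):
    # gamma=1.0 zeroes feature 0's prior, so it cannot be selected again;
    # gamma=1.5 leaves it 0.5, so it can still be reused at lower weight.
    print(gamma, update_prior(prior, mask, gamma))
```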

model_

Fitted TabNet model.

Type:

TabNetRegressor

feature_importances_

Feature importance dictionary.

Type:

dict
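feature_importances_ summarises the per-sample explanations that explain returns. The source does not document how they are aggregated. The sketch below assumes the common TabNet convention: sum the explanation matrix over samples, normalise to 1, and key the result by feature name (the attribute is described as a dictionary). The column names and matrix values are hypothetical.

```python
import numpy as np

feature_names = ["age", "income", "tenure"]  # hypothetical columns
explain_matrix = np.array([[0.6, 0.3, 0.1],  # shape (n_samples, n_features)
                           [0.2, 0.7, 0.1],
                           [0.4, 0.4, 0.2]])

totals = explain_matrix.sum(axis=0)  # aggregate attribution per feature
importances = dict(zip(feature_names, totals / totals.sum()))
print(importances)  # "income" ranks highest; values sum to 1
```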

Examples

>>> from endgame.models.neural import TabNetRegressor
>>> reg = TabNetRegressor(n_steps=3, n_epochs=50)
>>> reg.fit(X_train, y_train, eval_set=[(X_val, y_val)])
>>> predictions = reg.predict(X_test)
>>> # Get feature importance masks
>>> explain_matrix, masks = reg.explain(X_test)

fit(X, y, eval_set=None, eval_name=None, eval_metric=None, weights=None, **fit_params)

Fit the regressor.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target values.

  • eval_set (list of (X, y) tuples, optional) – Validation sets for early stopping.

  • eval_name (list of str, optional) – Names for evaluation sets.

  • eval_metric (list of str, optional) – Evaluation metrics.

  • weights (int or ndarray, optional) – Sample weights.

  • **fit_params – Additional fit parameters.

Return type:

TabNetRegressor

Returns:

self – Fitted regressor.

predict(X)

Predict target values.

Parameters:

X (array-like) – Input samples.

Return type:

ndarray

Returns:

ndarray – Predicted values.

set_fit_request(*, eval_metric='$UNCHANGED$', eval_name='$UNCHANGED$', eval_set='$UNCHANGED$', weights='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_metric (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_metric parameter in fit.

  • eval_name (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_name parameter in fit.

  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • weights (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for weights parameter in fit.

  • self (TabNetRegressor)

Returns:

self (object) – The updated object.

Return type:

TabNetRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (TabNetRegressor)

Returns:

self (object) – The updated object.

Return type:

TabNetRegressor
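The four request options listed above can be modeled as a small routing function. This is a simplified, hypothetical model of what a meta-estimator does with one consumer method's requests. The helper name `route_metadata` is illustrative and is not part of scikit-learn or endgame. Real scikit-learn routing also validates metadata that no consumer requests.

```python
from typing import Any, Dict, Optional, Union

Request = Optional[Union[bool, str]]


def route_metadata(requests: Dict[str, Request], provided: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical simplification of metadata routing for a single consumer method.

    requests: maps a method parameter (e.g. 'sample_weight') to True, False, None, or an alias str.
    provided: metadata the user passed to the meta-estimator, keyed by the name they used.
    Returns the keyword arguments the sub-estimator's method would receive.
    """
    routed: Dict[str, Any] = {}
    for param, request in requests.items():
        if request is True:
            # Requested: forward it if the user supplied it; otherwise do nothing.
            if param in provided:
                routed[param] = provided[param]
        elif request is False:
            # Explicitly not requested: never forwarded to this consumer.
            continue
        elif request is None:
            # Request not set: supplying this metadata is treated as an error.
            if param in provided:
                raise ValueError(f"{param!r} was provided but its request is not set")
        else:
            # Alias: the user passes the value under the alias name instead.
            if request in provided:
                routed[param] = provided[request]
    return routed


# The consumer requests sample_weight under the alias 'score_weights'.
print(route_metadata({"sample_weight": "score_weights"}, {"score_weights": [1, 2, 3]}))
# {'sample_weight': [1, 2, 3]}
```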

class endgame.models.NeuralKDBClassifier(k=2, embedding_dim=16, hidden_dim=64, n_hidden_layers=2, epochs=20, batch_size=256, learning_rate=0.001, weight_decay=1e-05, dropout=0.1, device='auto', early_stopping=5, validation_fraction=0.1, max_cardinality=100, auto_discretize=True, discretizer_strategy='mdlp', discretizer_max_bins=10, random_state=None, verbose=False)[source]

Bases: BaseBayesianClassifier

K-Dependence Bayes with neural conditional probability estimators.

NeuralKDB keeps the interpretable DAG structure of classical KDB but uses neural networks, instead of count-based probability tables, to estimate the conditional probabilities. This is intended to handle high-cardinality features and to generalize better where count tables would be sparse.
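For reference, classical KDB structure learning (Sahami, 1996) works in two steps. First, it ranks features by their mutual information with the class. Second, it gives each feature up to k parents, chosen among the higher-ranked features by conditional mutual information given the class. The pure-Python sketch below implements that classical procedure on discrete data. Whether NeuralKDBClassifier builds structure_ exactly this way is an assumption, and `kdb_structure` is a hypothetical helper.

```python
from collections import Counter
from math import log
from typing import Dict, List, Sequence, Tuple


def mutual_info(xs: Sequence, ys: Sequence) -> float:
    """Empirical I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def cond_mutual_info(xs: Sequence, zs: Sequence, ys: Sequence) -> float:
    """Empirical I(X; Z | Y) in nats."""
    n = len(xs)
    cxzy = Counter(zip(xs, zs, ys))
    cxy, czy, cy = Counter(zip(xs, ys)), Counter(zip(zs, ys)), Counter(ys)
    return sum(
        c / n * log(c * cy[y] / (cxy[(x, y)] * czy[(z, y)]))
        for (x, z, y), c in cxzy.items()
    )


def kdb_structure(columns: List[Sequence], y: Sequence, k: int) -> Tuple[List[int], Dict[int, List[int]]]:
    """Classical KDB: return the feature order and each feature's feature-parents.

    The class is implicitly a parent of every feature.
    """
    order = sorted(range(len(columns)), key=lambda i: mutual_info(columns[i], y), reverse=True)
    parents: Dict[int, List[int]] = {}
    for pos, i in enumerate(order):
        # Candidate parents are the features already placed (higher I(X_i; Y)).
        earlier = order[:pos]
        ranked = sorted(earlier, key=lambda j: cond_mutual_info(columns[i], columns[j], y), reverse=True)
        parents[i] = ranked[:k]
    return order, parents


y = [0, 0, 1, 1, 0, 1, 0, 1]
f0 = list(y)                              # identical to the class
f1 = [0, 1, 0, 1, 0, 1, 1, 0]             # independent of the class
f2 = [a ^ b for a, b in zip(f1, y)]       # determined by f1 once the class is known
order, parents = kdb_structure([f0, f1, f2], y, k=1)
print(parents)  # {0: [], 1: [0], 2: [1]}
```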

Parameters:
  • k (int, default=2) – Maximum parents per feature (excluding class).

  • embedding_dim (int, default=16) – Dimensionality of value embeddings.

  • hidden_dim (int, default=64) – Hidden layer size in conditional networks.

  • n_hidden_layers (int, default=2) – Number of hidden layers per conditional network.

  • epochs (int, default=20) – Training epochs.

  • batch_size (int, default=256) – Mini-batch size for training.

  • learning_rate (float, default=1e-3) – Adam learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization.

  • dropout (float, default=0.1) – Dropout rate in networks.

  • device (str, default='auto') – ‘cuda’, ‘cpu’, or ‘auto’ (detect GPU).

  • early_stopping (int | None, default=5) – Stop if validation loss doesn’t improve for this many epochs. None disables early stopping.

  • validation_fraction (float, default=0.1) – Fraction of the training data held out for validation when X_val is not provided.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.

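  • max_cardinality (int, default=100)

  • auto_discretize (bool, default=True) – Automatically discretize continuous features before fitting (see fit).

  • discretizer_strategy (str, default='mdlp') – Discretization strategy used when auto_discretize=True.

  • discretizer_max_bins (int, default=10) – Maximum number of bins per discretized feature.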
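The early_stopping parameter describes patience-based stopping. The minimal sketch below assumes that "improve" means a strictly lower validation loss. The library may use a tolerance and may restore the weights from the best epoch. `early_stopping_epoch` is a hypothetical helper.

```python
from typing import Optional, Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float], patience: Optional[int]) -> Tuple[int, int]:
    """Return (last_epoch_run, best_epoch) for patience-based early stopping.

    patience=None disables early stopping, so every epoch runs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if patience is not None and wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


print(early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.95], patience=2))  # (3, 1)
```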

structure_

Learned KDB structure.

Type:

nx.DiGraph

conditionals_

Neural conditional estimators for each feature.

Type:

nn.ModuleDict

class_prior_

Prior class probabilities.

Type:

np.ndarray

Examples

>>> from endgame.models.bayesian import NeuralKDBClassifier
>>> clf = NeuralKDBClassifier(k=2, epochs=10)
>>> clf.fit(X_train, y_train)
>>> clf.predict_proba(X_test)

fit(X, y, X_val=None, y_val=None, **fit_params)[source]

Fit the Neural KDB classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data. Can be continuous (will be auto-discretized if auto_discretize=True) or discrete/integer-valued.

  • y (array-like of shape (n_samples,)) – Target values.

  • X_val (np.ndarray, optional) – Validation features for early stopping.

  • y_val (np.ndarray, optional) – Validation targets.

Return type:

NeuralKDBClassifier

Returns:

self

predict_proba(X)[source]

Compute P(Y|X) using neural conditionals.

For each class c: P(Y=c|X) ∝ P(Y=c) * ∏_i P(x_i | parents(x_i), Y=c)

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

ndarray of shape (n_samples, n_classes) – Class probabilities.
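The factorization above is usually evaluated in log space so that the product over many features does not underflow. The NumPy sketch below shows only that final combination step and takes the per-feature log-conditionals as given. In NeuralKDBClassifier these would come from the neural conditional estimators. `kdb_posterior` is a hypothetical helper, not part of the endgame API.

```python
import numpy as np


def kdb_posterior(class_prior: np.ndarray, log_cond: np.ndarray) -> np.ndarray:
    """Combine a class prior with per-feature conditionals into P(Y | X).

    class_prior: shape (n_classes,), P(Y=c).
    log_cond: shape (n_samples, n_features, n_classes),
        log P(x_i | parents(x_i), Y=c), supplied directly here.
    """
    # log P(Y=c) + sum_i log P(x_i | parents(x_i), Y=c), for every sample and class.
    log_joint = np.log(class_prior)[None, :] + log_cond.sum(axis=1)
    # Subtract the per-row max before exponentiating (log-sum-exp trick), then normalize.
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)
    proba = np.exp(log_joint)
    return proba / proba.sum(axis=1, keepdims=True)


prior = np.array([0.5, 0.5])
log_cond = np.log(np.array([[[0.2, 0.6]]]))  # one sample, one feature, two classes
print(kdb_posterior(prior, log_cond))  # approximately [[0.25, 0.75]]
```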

to_onnx(path)[source]

Export model to ONNX format for production deployment.

Parameters:

path (str) – Path to save ONNX model.

Return type:

None

set_fit_request(*, X_val='$UNCHANGED$', y_val='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X_val (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_val parameter in fit.

  • y_val (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_val parameter in fit.

  • self (NeuralKDBClassifier)

Returns:

self (object) – The updated object.

Return type:

NeuralKDBClassifier

class endgame.models.FTTransformerClassifier(n_blocks=3, d_token=192, n_heads=8, attention_dropout=0.2, ffn_dropout=0.1, residual_dropout=0.0, d_ffn_factor=1.3333333333333333, learning_rate=0.0001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=15, cat_cardinality_threshold=20, device='auto', random_state=None, verbose=False)[source]

Bases: ClassifierMixin, BaseEstimator

Feature Tokenizer Transformer for tabular classification.

Maps each feature to an embedding (a feature token) and applies transformer layers to the resulting token sequence. It is among the strongest deep-learning architectures for tabular data.
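The feature-tokenizer idea can be sketched with NumPy, following the original FT-Transformer design in simplified form:

  • Each numerical value scales a per-feature weight vector and adds a per-feature bias.

  • Each categorical feature looks up a row in its own embedding table.

  • A [CLS] token is added to the sequence for the prediction head.

Random weights stand in for learned parameters. `feature_tokenizer` is a hypothetical helper, and the shapes, not the values, are the point.

```python
import numpy as np


def feature_tokenizer(X_num, X_cat, cardinalities, d_token, seed=0):
    """Turn tabular rows into sequences of d_token-dimensional tokens.

    X_num: (n_samples, n_num) float array; X_cat: (n_samples, n_cat) int codes.
    Returns (n_samples, n_num + n_cat + 1, d_token): one token per feature plus [CLS].
    """
    rng = np.random.default_rng(seed)
    n, n_num = X_num.shape
    # Numerical features: token_j = x_j * W_j + b_j, with per-feature vectors W_j and b_j.
    W = rng.normal(size=(n_num, d_token))
    b = rng.normal(size=(n_num, d_token))
    num_tokens = X_num[:, :, None] * W + b  # (n, n_num, d_token)
    # Categorical features: one embedding table per feature, indexed by category code.
    tables = [rng.normal(size=(card, d_token)) for card in cardinalities]
    if tables:
        cat_tokens = np.stack([t[X_cat[:, j]] for j, t in enumerate(tables)], axis=1)
    else:
        cat_tokens = np.zeros((n, 0, d_token))
    # A [CLS] token (random here, learned in practice) is appended to every sequence.
    cls = np.tile(rng.normal(size=(1, 1, d_token)), (n, 1, 1))
    return np.concatenate([num_tokens, cat_tokens, cls], axis=1)


tokens = feature_tokenizer(
    np.random.default_rng(1).normal(size=(4, 3)),
    np.array([[0, 2], [1, 6], [4, 0], [3, 3]]),
    cardinalities=[5, 7],
    d_token=8,
)
print(tokens.shape)  # (4, 6, 8)
```

The transformer blocks then operate on this (n_samples, n_tokens, d_token) tensor, which is why d_token must be divisible by n_heads in standard multi-head attention.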

Parameters:
  • n_blocks (int, default=3) – Number of transformer blocks.

  • d_token (int, default=192) – Embedding dimension for each feature token.

  • n_heads (int, default=8) – Number of attention heads.

  • attention_dropout (float, default=0.2) – Attention dropout rate.

  • ffn_dropout (float, default=0.1) – Feed-forward dropout rate.

  • residual_dropout (float, default=0.0) – Residual connection dropout.

  • d_ffn_factor (float, default=4/3) – FFN hidden dimension factor (d_ffn = d_token * d_ffn_factor).

  • learning_rate (float, default=1e-4) – Learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization.

  • n_epochs (int, default=100) – Maximum training epochs.

  • batch_size (int, default=256) – Training batch size.

  • early_stopping (int, default=15) – Early stopping patience.

  • cat_cardinality_threshold (int, default=20) – Treat features with <= this many unique values as categorical.

  • device (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Enable verbose output.
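The cat_cardinality_threshold rule can be sketched as a per-column count of distinct values. This assumes the rule is applied exactly as stated: columns with at most that many unique values are treated as categorical. It ignores details the library may also handle, such as missing values or dtype checks. `split_by_cardinality` is a hypothetical helper.

```python
import numpy as np


def split_by_cardinality(X, threshold=20):
    """Return (categorical_columns, numerical_columns) indices for a 2-D array."""
    X = np.asarray(X)
    categorical, numerical = [], []
    for j in range(X.shape[1]):
        n_unique = np.unique(X[:, j]).size
        (categorical if n_unique <= threshold else numerical).append(j)
    return categorical, numerical


rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 3, size=200),  # at most 3 distinct values -> categorical
    rng.normal(size=200),          # continuous -> numerical
])
print(split_by_cardinality(X, threshold=20))  # ([0], [1])
```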

classes_

Unique class labels.

Type:

ndarray

n_classes_

Number of classes.

Type:

int

model_

Fitted PyTorch model.

Type:

_FTTransformerModule

history_

Training history.

Type:

dict

Examples

>>> from endgame.models.tabular import FTTransformerClassifier
>>> clf = FTTransformerClassifier(n_blocks=3, d_token=192)
>>> clf.fit(X_train, y_train, eval_set=(X_val, y_val))
>>> proba = clf.predict_proba(X_test)

fit(X, y, eval_set=None, **fit_params)[source]

Fit the FT-Transformer classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Training labels.

  • eval_set (tuple of (X_val, y_val), optional) – Validation set for early stopping.

Return type:

FTTransformerClassifier

Returns:

self – Fitted classifier.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like) – Test samples.

Return type:

ndarray

Returns:

ndarray of shape (n_samples, n_classes) – Class probabilities.

predict(X)[source]

Predict class labels.

Parameters:

X (array-like) – Test samples.

Return type:

ndarray

Returns:

ndarray – Predicted class labels.

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (FTTransformerClassifier)

Returns:

self (object) – The updated object.

Return type:

FTTransformerClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (FTTransformerClassifier)

Returns:

self (object) – The updated object.

Return type:

FTTransformerClassifier

class endgame.models.FTTransformerRegressor(n_blocks=3, d_token=192, n_heads=8, attention_dropout=0.2, ffn_dropout=0.1, residual_dropout=0.0, d_ffn_factor=1.3333333333333333, learning_rate=0.0001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=15, cat_cardinality_threshold=20, device='auto', random_state=None, verbose=False)[source]

Bases: BaseEstimator, RegressorMixin

Feature Tokenizer Transformer for regression.

Same architecture as FTTransformerClassifier but with a regression head.

Parameters:
  • n_blocks (int, default=3) – Number of transformer blocks.

  • d_token (int, default=192) – Embedding dimension for each feature token.

  • n_heads (int, default=8) – Number of attention heads.

  • attention_dropout (float, default=0.2) – Attention dropout.

  • ffn_dropout (float, default=0.1) – Feed-forward dropout.

  • learning_rate (float, default=1e-4) – Learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization.

  • n_epochs (int, default=100) – Maximum epochs.

  • batch_size (int, default=256) – Batch size.

  • early_stopping (int, default=15) – Early stopping patience.

  • device (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Verbose output.

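  • residual_dropout (float, default=0.0) – Residual connection dropout.

  • d_ffn_factor (float, default=4/3) – FFN hidden dimension factor (d_ffn = d_token * d_ffn_factor).

  • cat_cardinality_threshold (int, default=20) – Treat features with <= this many unique values as categorical.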

Examples

>>> from endgame.models import FTTransformerRegressor
>>> reg = FTTransformerRegressor()
>>> reg.fit(X_train, y_train, eval_set=(X_val, y_val))
>>> predictions = reg.predict(X_test)

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (FTTransformerRegressor)

Returns:

self (object) – The updated object.

Return type:

FTTransformerRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (FTTransformerRegressor)

Returns:

self (object) – The updated object.

Return type:

FTTransformerRegressor

fit(X, y, eval_set=None, **fit_params)[source]

Fit the FT-Transformer regressor.

Return type:

FTTransformerRegressor

predict(X)[source]

Predict target values.

Return type:

ndarray

class endgame.models.SAINTClassifier(n_layers=3, d_model=32, n_heads=4, attention_dropout=0.1, ffn_dropout=0.1, d_ffn_factor=4.0, use_intersample=True, learning_rate=0.001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=15, validation_fraction=0.1, cat_cardinality_threshold=20, device='auto', random_state=None, verbose=False)[source]

Bases: ClassifierMixin, BaseEstimator

SAINT: Self-Attention and Intersample Attention Transformer.

Combines column-wise self-attention with row-wise (intersample) attention to capture both feature interactions and sample similarities.
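The two attention types differ in which axis the tokens attend over:

  • Column attention lets the feature tokens of one row attend to each other.

  • Intersample attention flattens each row into a single token so that the rows of a batch attend to each other.

The NumPy sketch below uses single-head attention without learned query, key, and value projections. This is a simplification, since SAINT uses multi-head attention with learned projections. The helper names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention over the second-to-last axis."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def column_attention(H):
    """H: (batch, n_features, d). Each row's feature tokens attend to each other."""
    return attention(H, H, H)


def intersample_attention(H):
    """Each row becomes one token of size n_features * d; rows attend to rows."""
    b, f, d = H.shape
    rows = H.reshape(1, b, f * d)
    return attention(rows, rows, rows).reshape(b, f, d)


H = np.random.default_rng(0).normal(size=(4, 3, 8))  # batch of 4 rows, 3 features, d=8
H_changed = H.copy()
H_changed[1] += 1.0  # perturb row 1 only
# Row 0's column-attention output ignores row 1; its intersample output does not.
print(np.allclose(column_attention(H)[0], column_attention(H_changed)[0]))            # True
print(np.allclose(intersample_attention(H)[0], intersample_attention(H_changed)[0]))  # False
```

If the library follows the original batch-wise design, one practical consequence is that a sample's output can depend on which other samples share its batch.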

Parameters:
  • n_layers (int, default=3) – Number of SAINT layers. 2-4 works well for most datasets.

  • d_model (int, default=32) – Model dimension.

  • n_heads (int, default=4) – Number of attention heads.

  • attention_dropout (float, default=0.1) – Attention dropout.

  • ffn_dropout (float, default=0.1) – Feed-forward dropout.

  • d_ffn_factor (float, default=4.0) – FFN hidden dimension factor.

  • use_intersample (bool, default=True) – Whether to use intersample attention (unique to SAINT).

  • learning_rate (float, default=1e-3) – Learning rate. Higher rates (1e-3) often work better than 1e-4.

  • weight_decay (float, default=1e-5) – L2 regularization.

  • n_epochs (int, default=100) – Maximum epochs.

  • batch_size (int, default=256) – Batch size.

  • early_stopping (int, default=15) – Early stopping patience.

  • validation_fraction (float, default=0.1) – Fraction of training data to use for validation when eval_set is not provided.

  • cat_cardinality_threshold (int, default=20) – Threshold for categorical detection.

  • device (str, default='auto') – Device: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Verbose output.

classes_

Class labels.

Type:

ndarray

model_

Fitted model.

Type:

_SAINTModule

history_

Training history.

Type:

dict

Examples

>>> from endgame.models import SAINTClassifier
>>> clf = SAINTClassifier(n_layers=3, d_model=32)
>>> clf.fit(X_train, y_train, eval_set=(X_val, y_val))
>>> proba = clf.predict_proba(X_test)

Notes

SAINT’s intersample attention lets each sample attend to other samples (in the original SAINT design, the other rows of the same mini-batch), which can help the model learn patterns that span the dataset.

For best performance:

  • Use an eval_set for early stopping (or validation_fraction > 0).

  • Start with n_layers=3 and increase it if the model underfits.

  • Try higher learning rates (1e-3) than are typical for transformers; they often work better.

fit(X, y, eval_set=None, **fit_params)[source]

Fit the SAINT classifier.

Return type:

SAINTClassifier

Parameters:

eval_set (tuple[Any, Any] | None)

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

predict(X)[source]

Predict class labels.

Return type:

ndarray

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (SAINTClassifier)

Returns:

self (object) – The updated object.

Return type:

SAINTClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (SAINTClassifier)

Returns:

self (object) – The updated object.

Return type:

SAINTClassifier

class endgame.models.SAINTRegressor(n_layers=6, d_model=32, n_heads=8, attention_dropout=0.1, ffn_dropout=0.1, d_ffn_factor=4.0, use_intersample=True, learning_rate=0.0001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=15, cat_cardinality_threshold=20, device='auto', random_state=None, verbose=False)[source]

Bases: BaseEstimator, RegressorMixin

SAINT for regression.

Same architecture as SAINTClassifier but with MSE loss.

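Parameters have the same meaning as in SAINTClassifier. SAINTRegressor has no validation_fraction parameter, and several defaults differ (n_layers=6, n_heads=8, learning_rate=1e-4).

Parameters:
  • n_layers (int, default=6)

  • d_model (int, default=32)

  • n_heads (int, default=8)

  • attention_dropout (float, default=0.1)

  • ffn_dropout (float, default=0.1)

  • d_ffn_factor (float, default=4.0)

  • use_intersample (bool, default=True)

  • learning_rate (float, default=1e-4)

  • weight_decay (float, default=1e-5)

  • n_epochs (int, default=100)

  • batch_size (int, default=256)

  • early_stopping (int, default=15)

  • cat_cardinality_threshold (int, default=20)

  • device (str, default='auto')

  • random_state (int | None, default=None)

  • verbose (bool, default=False)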

fit(X, y, eval_set=None, **fit_params)[source]

Fit SAINT regressor.

Return type:

SAINTRegressor

predict(X)[source]

Predict target values.

Return type:

ndarray

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (SAINTRegressor)

Returns:

self (object) – The updated object.

Return type:

SAINTRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (SAINTRegressor)

Returns:

self (object) – The updated object.

Return type:

SAINTRegressor

class endgame.models.NODEClassifier(n_layers=1, n_trees=128, tree_depth=4, choice_function='softmax', bin_function='sigmoid', learning_rate=0.01, weight_decay=1e-05, n_epochs=100, batch_size=128, early_stopping=20, max_grad_norm=1.0, validation_fraction=0.1, device='auto', random_state=None, verbose=False)[source]

Bases: ClassifierMixin, BaseEstimator

NODE: Neural Oblivious Decision Ensembles for classification.

A differentiable ensemble of oblivious decision trees, trained end to end by gradient descent. It combines the structure of tree ensembles, as used in gradient boosting, with neural-network training.
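An oblivious tree uses the same split (feature and threshold) for every node at a given depth, so a depth-d tree has 2**d leaves indexed by d binary decisions. NODE makes both choices differentiable:

  • A choice function (softmax by default here) produces soft feature-selection weights per level.

  • A bin function (sigmoid by default) turns the hard threshold test into a probability.

The NumPy sketch below computes one soft tree's leaf probabilities and output, with random parameters standing in for learned ones. It follows the NODE paper's structure in simplified form, and `soft_oblivious_tree` is a hypothetical helper.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_oblivious_tree(X, selection_logits, thresholds, log_temperatures, leaf_values):
    """Forward pass of one soft oblivious decision tree.

    X: (n, n_features); selection_logits: (depth, n_features);
    thresholds, log_temperatures: (depth,); leaf_values: (2**depth, out_dim).
    Returns (outputs of shape (n, out_dim), leaf probabilities of shape (n, 2**depth)).
    """
    depth = selection_logits.shape[0]
    # Soft feature choice per level: a convex combination of the input features.
    chosen = X @ softmax(selection_logits, axis=1).T                      # (n, depth)
    # Soft split per level: probability of going "right" at that level.
    p_right = sigmoid((chosen - thresholds) / np.exp(log_temperatures))  # (n, depth)
    # Each leaf is one combination of left/right decisions; multiply along the path.
    leaf_probs = np.ones((X.shape[0], 1))
    for level in range(depth):
        p = p_right[:, level:level + 1]
        leaf_probs = np.concatenate([leaf_probs * (1 - p), leaf_probs * p], axis=1)
    return leaf_probs @ leaf_values, leaf_probs


rng = np.random.default_rng(0)
depth, n_features = 4, 6
X = rng.normal(size=(5, n_features))
out, leaf_probs = soft_oblivious_tree(
    X,
    selection_logits=rng.normal(size=(depth, n_features)),
    thresholds=rng.normal(size=depth),
    log_temperatures=np.zeros(depth),
    leaf_values=rng.normal(size=(2 ** depth, 3)),
)
print(out.shape, leaf_probs.shape)  # (5, 3) (5, 16)
```

A NODE layer evaluates n_trees such trees in parallel. In the NODE paper, multiple layers are stacked densely, with each layer receiving the original features together with the outputs of earlier layers.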

Parameters:
  • n_layers (int, default=1) – Number of dense NODE layers. Start with 1 for most datasets.

  • n_trees (int, default=128) – Number of trees per layer. 64-256 works well for most datasets.

  • tree_depth (int, default=4) – Depth of each oblivious tree. 3-5 works well; deeper trees risk overfitting.

  • choice_function (str, default='softmax') – Soft choice function: ‘entmax15’, ‘softmax’. Softmax is more stable.

  • bin_function (str, default='sigmoid') – Binning function: ‘entmoid15’, ‘sigmoid’. Sigmoid is more stable.

  • learning_rate (float, default=0.01) – Learning rate. Higher values (0.01-0.1) often work better for NODE.

  • weight_decay (float, default=1e-5) – L2 regularization.

  • n_epochs (int, default=100) – Maximum training epochs.

  • batch_size (int, default=128) – Training batch size. Smaller batches (64-256) often work better.

  • early_stopping (int, default=20) – Early stopping patience.

  • max_grad_norm (float, default=1.0) – Maximum gradient norm for clipping.

  • validation_fraction (float, default=0.1) – Fraction of training data to use for validation when eval_set is not provided. Set to 0 to disable the internal validation split (not recommended).

  • device (str, default='auto') – Device: ‘cuda’, ‘cpu’, ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Verbose output.

classes_

Class labels.

Type:

ndarray

model_

Fitted model.

Type:

_NODEModule

history_

Training history.

Type:

dict

Examples

>>> from endgame.models import NODEClassifier
>>> clf = NODEClassifier(n_layers=1, n_trees=128, tree_depth=4)
>>> clf.fit(X_train, y_train, eval_set=(X_val, y_val))
>>> proba = clf.predict_proba(X_test)

Notes

NODE tends to work best with:

  • An eval_set for early stopping (or validation_fraction > 0).

  • Higher learning rates (0.01-0.1) than are typical for neural networks.

  • Smaller batch sizes (64-256).

  • Fewer and shallower trees than you might expect; start small.

When no eval_set is passed, for example inside sklearn’s cross_val_score, which does not supply a per-fold validation set, the model automatically holds out validation_fraction of the training data as an internal validation split.
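The fallback described above can be sketched as follows. This is an assumption about the mechanics, not the library's code. It uses a plain random split, whereas the real implementation might stratify by class to keep class proportions in the held-out part. `resolve_validation` is a hypothetical helper.

```python
import numpy as np


def resolve_validation(X, y, eval_set=None, validation_fraction=0.1, random_state=None):
    """Return (X_train, y_train, eval_set) following the documented fallback.

    An explicit eval_set wins; otherwise hold out validation_fraction of the rows.
    """
    X, y = np.asarray(X), np.asarray(y)
    if eval_set is not None:
        return X, y, eval_set
    if not validation_fraction:
        return X, y, None  # no validation split, so no early-stopping signal
    idx = np.random.default_rng(random_state).permutation(len(X))
    n_val = max(1, int(round(len(X) * validation_fraction)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], (X[val_idx], y[val_idx])


X = np.arange(100).reshape(50, 2)
y = np.arange(50)
X_tr, y_tr, (X_val, y_val) = resolve_validation(X, y, validation_fraction=0.1, random_state=0)
print(len(X_tr), len(X_val))  # 45 5
```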

fit(X, y, eval_set=None, **fit_params)[source]

Fit the NODE classifier.

Parameters:
  • X (array-like) – Training features.

  • y (array-like) – Training labels.

  • eval_set (tuple, optional) – Validation set for early stopping.

Return type:

NODEClassifier

Returns:

self

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

predict(X)[source]

Predict class labels.

Return type:

ndarray

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (NODEClassifier)

Returns:

self (object) – The updated object.

Return type:

NODEClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NODEClassifier)

Returns:

self (object) – The updated object.

Return type:

NODEClassifier

class endgame.models.NODERegressor(n_layers=2, n_trees=256, tree_depth=4, choice_function='softmax', bin_function='sigmoid', learning_rate=0.001, weight_decay=1e-05, n_epochs=100, batch_size=512, early_stopping=15, max_grad_norm=1.0, device='auto', random_state=None, verbose=False)[source]

Bases: BaseEstimator, RegressorMixin

NODE for regression.

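Same architecture as NODEClassifier but with MSE loss. See NODEClassifier for parameter descriptions. NODERegressor has no validation_fraction parameter, and several defaults differ (n_layers=2, n_trees=256, learning_rate=1e-3, batch_size=512, early_stopping=15).

Parameters:
  • n_layers (int, default=2)

  • n_trees (int, default=256)

  • tree_depth (int, default=4)

  • choice_function (str, default='softmax')

  • bin_function (str, default='sigmoid')

  • learning_rate (float, default=1e-3)

  • weight_decay (float, default=1e-5)

  • n_epochs (int, default=100)

  • batch_size (int, default=512)

  • early_stopping (int, default=15)

  • max_grad_norm (float, default=1.0)

  • device (str, default='auto')

  • random_state (int | None, default=None)

  • verbose (bool, default=False)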

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (NODERegressor)

Returns:

self (object) – The updated object.

Return type:

NODERegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NODERegressor)

Returns:

self (object) – The updated object.

Return type:

NODERegressor

fit(X, y, eval_set=None, **fit_params)[source]

Fit NODE regressor.

Return type:

NODERegressor

predict(X)[source]

Predict target values.

Return type:

ndarray

class endgame.models.ModernNCAClassifier(n_neighbors=32, embedding_dim=128, hidden_dims=None, temperature=0.1, dropout=0.1, learning_rate=0.001, weight_decay=1e-05, n_epochs=100, batch_size=256, early_stopping=15, device='auto', random_state=None, verbose=False)[source]

Bases: ClassifierMixin, BaseEstimator

Modern Neighborhood Component Analysis classifier.

A kNN-based classifier whose distance metric is learned by a neural network. It can be competitive with gradient boosting on many tabular tasks.

The model learns an embedding space where samples of the same class are close together and samples of different classes are far apart. At inference, it uses kNN in this learned space.
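The training goal above ("same class close, different classes far") is the idea behind the classic Neighborhood Component Analysis objective. Under that objective, each point picks a soft neighbor through a softmax over negative squared distances, and the loss rewards a high probability that the neighbor shares the point's class. The sketch below computes that loss on fixed embeddings in pure Python. It is an assumption that ModernNCA trains with this exact form; the library may use a variant, for example with different distances or batching. nca_loss is a hypothetical name.

```python
import math


def nca_loss(embeddings, labels, temperature=0.1):
    """Classic NCA-style objective: mean negative log-probability that a point's
    soft (softmax-weighted) neighbor belongs to the same class. Needs >= 2 points."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Logits for every other point j: smaller squared distance -> larger logit.
        logits = {
            j: -math.dist(embeddings[i], embeddings[j]) ** 2 / temperature
            for j in range(n)
            if j != i
        }
        shift = max(logits.values())  # subtract the max for numerical stability
        weights = {j: math.exp(v - shift) for j, v in logits.items()}
        z = sum(weights.values())
        p_same = sum(w for j, w in weights.items() if labels[j] == labels[i]) / z
        total -= math.log(max(p_same, 1e-12))  # clip to avoid log(0)
    return total / n
```

Minimizing this loss by backpropagating through the embedding network pulls same-class points together. That is the geometry the kNN step at inference time relies on.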

Parameters:
  • n_neighbors (int, default=32) – Number of neighbors for kNN prediction.

  • embedding_dim (int, default=128) – Dimension of learned embedding space.

  • hidden_dims (List[int], default=None) – Hidden layer sizes of the embedding network; None uses [256, 256].

  • temperature (float, default=0.1) – Softmax temperature for neighbor weighting.

  • dropout (float, default=0.1) – Dropout rate in embedding network.

  • learning_rate (float, default=1e-3) – Learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization.

  • n_epochs (int, default=100) – Training epochs.

  • batch_size (int, default=256) – Batch size.

  • early_stopping (int, default=15) – Early stopping patience.

  • device (str, default='auto') – Device to use: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Verbose output.

classes_

Class labels.

Type:

ndarray

model_

Fitted embedding network.

Type:

_EmbeddingNetwork

train_embeddings_

Embeddings of training data.

Type:

ndarray

train_labels_

Training labels.

Type:

ndarray

history_

Training history.

Type:

dict

Examples

>>> clf = ModernNCAClassifier(n_neighbors=32, embedding_dim=128)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

This approach is particularly effective when:

  • The decision boundary is locally smooth.

  • Class separation can benefit from learned features.

  • You want probabilistic predictions based on each sample's neighborhood.

fit(X, y, eval_set=None, **fit_params)[source]

Fit the ModernNCA classifier.

Parameters:
  • X (array-like) – Training features.

  • y (array-like) – Training labels.

  • eval_set (tuple, optional) – Validation set as (X_val, y_val).

Return type:

ModernNCAClassifier

Returns:

self

predict_proba(X)[source]

Predict class probabilities using kNN in embedding space.

Return type:

ndarray
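predict_proba's description ("kNN in embedding space") combined with the temperature parameter suggests a softmax-weighted vote over the nearest training embeddings. The following is a minimal sketch of that step for a single query. It assumes Euclidean distance, weights of softmax(-distance / temperature) over the n_neighbors closest points, and labels already encoded as 0..n_classes-1. endgame's exact weighting may differ (for example, it may use squared distances). soft_knn_proba is a hypothetical name.

```python
import math


def soft_knn_proba(query, train_embeddings, train_labels, n_classes,
                   n_neighbors=32, temperature=0.1):
    """Class probabilities for one query embedding via softmax-weighted kNN."""
    dists = [math.dist(query, e) for e in train_embeddings]
    # Indices of the n_neighbors closest training embeddings.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:n_neighbors]
    # Softmax over -distance / temperature; lower temperature -> sharper weighting.
    logits = [-dists[i] / temperature for i in nearest]
    shift = max(logits)
    weights = [math.exp(v - shift) for v in logits]
    total = sum(weights)
    proba = [0.0] * n_classes
    for i, w in zip(nearest, weights):
        proba[train_labels[i]] += w / total
    return proba
```

If n_neighbors exceeds the number of stored training embeddings, the slice simply keeps all of them. The output always sums to 1 because the weights are normalized by their total.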

predict(X)[source]

Predict class labels.

Return type:

ndarray

transform(X)[source]

Transform features to embedding space.

Parameters:

X (array-like) – Input features.

Return type:

ndarray

Returns:

ndarray – Learned embeddings.

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (ModernNCAClassifier)

Returns:

self (object) – The updated object.

Return type:

ModernNCAClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (ModernNCAClassifier)

Returns:

self (object) – The updated object.

Return type:

ModernNCAClassifier

class endgame.models.NAMClassifier(n_hidden=32, n_layers=2, activation='relu', dropout=0.0, feature_dropout=0.0, learning_rate=0.005, weight_decay=1e-05, output_regularization=0.0, n_epochs=50, batch_size=1024, early_stopping=10, validation_fraction=0.1, device='auto', random_state=None, verbose=False)[source]

Bases: ClassifierMixin, BaseEstimator

Neural Additive Model for classification.

NAM learns a separate neural network for each input feature, providing interpretability similar to GAMs while leveraging neural network expressivity. Because the prediction is a sum of per-feature terms, each feature’s contribution can be inspected and plotted directly.

Parameters:
  • n_hidden (int, default=32) – Number of hidden units per feature network.

  • n_layers (int, default=2) – Number of hidden layers per feature network.

  • activation (str, default='relu') – Activation function: ‘relu’ or ‘exu’ (exponential units). ExU can capture more complex shapes but is less stable.

  • dropout (float, default=0.0) – Dropout rate within feature networks.

  • feature_dropout (float, default=0.0) – Probability of dropping entire feature networks during training. Acts as a regularizer that discourages co-adaptation between features.

  • learning_rate (float, default=5e-3) – Learning rate for the Adam optimizer.

  • weight_decay (float, default=1e-5) – L2 regularization strength.

  • output_regularization (float, default=0.0) – Regularization on feature network outputs to encourage sparsity.

  • n_epochs (int, default=50) – Maximum number of training epochs.

  • batch_size (int, default=1024) – Training batch size.

  • early_stopping (int, default=10) – Early stopping patience (epochs without improvement).

  • validation_fraction (float, default=0.1) – Fraction of training data held out for validation when eval_set is not provided.

  • device (str, default='auto') – Device to use: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed for reproducibility.

  • verbose (bool, default=False) – Whether to print training progress.
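feature_dropout and output_regularization both act on the per-feature outputs rather than on individual weights. The sketch below shows one plausible reading of each. Feature dropout zeroes whole feature terms and rescales the survivors by 1 / (1 - p), following the usual inverted-dropout convention. The output penalty is strength times the mean squared per-feature output. Both forms are assumptions, not a description of endgame's code, and apply_feature_dropout and output_penalty are hypothetical names.

```python
import random


def apply_feature_dropout(contributions, p, rng):
    """Zero out whole per-feature terms with probability p (training only).

    Survivors are rescaled by 1 / (1 - p) so the expected sum is unchanged
    (inverted-dropout convention; an assumption about endgame's behavior).
    """
    if p <= 0.0:
        return list(contributions)
    keep = 1.0 - p
    return [c / keep if rng.random() < keep else 0.0 for c in contributions]


def output_penalty(batch_contributions, strength):
    """strength * mean squared per-feature output over a batch (added to the loss)."""
    values = [c for row in batch_contributions for c in row]
    return strength * sum(c * c for c in values) / len(values)


# Example: one training row with three feature terms.
rng = random.Random(0)
train_terms = apply_feature_dropout([0.4, -1.2, 0.7], p=0.2, rng=rng)
```

Dropping an entire term forces the remaining features to explain the target on their own. The squared-output penalty pushes weakly useful feature networks toward zero.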

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features seen during fit.

Type:

int

model_

The fitted PyTorch module.

Type:

_NAMModule

feature_importances_

Feature importance scores.

Type:

ndarray

history_

Training history with loss values.

Type:

dict

Examples

>>> from endgame.models.tabular import NAMClassifier
>>> clf = NAMClassifier(n_hidden=64, n_layers=3)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)
>>> # Get feature contributions for interpretability
>>> contributions = clf.get_feature_contributions(X_test)

Notes

NAM provides several interpretability features:

  • get_feature_contributions(X): each feature’s contribution to each prediction.

  • feature_importances_: overall feature importance.

  • plot_feature_effects(): visualize the learned feature shapes (if matplotlib is available).

For best results:

  • Start with the default hyperparameters.

  • Use feature_dropout > 0 if features are correlated.

  • Try ‘exu’ activation for highly non-linear relationships.

fit(X, y, eval_set=None, **fit_params)[source]

Fit the NAM classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Training labels.

  • eval_set (tuple of (X_val, y_val), optional) – Validation set for early stopping. If not provided, uses validation_fraction of training data.

  • **fit_params (dict) – Additional parameters (ignored).

Return type:

NAMClassifier

Returns:

self (NAMClassifier) – Fitted classifier.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

get_feature_contributions(X)[source]

Get individual feature contributions for predictions.

This is the key interpretability feature of NAM. Each feature’s contribution shows how it affects the prediction independently.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to explain.

Return type:

ndarray

Returns:

contributions (ndarray of shape (n_samples, n_features)) – Each feature’s contribution to the prediction. Positive values push toward higher class indices.
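The additive structure is what makes these contributions meaningful. For a binary classifier, logit = bias + sum_j f_j(x_j), and each f_j sees only feature j. The following is a minimal pure-Python sketch with one-hidden-layer ReLU feature networks and random weights. It also includes one common way to summarize contributions into an importance score: mean absolute contribution per feature. Both the tiny network shape and that importance definition are assumptions; make_feature_net, feature_net, nam_predict, and mean_abs_importance are hypothetical helpers, not endgame API.

```python
import math
import random


def make_feature_net(n_hidden, rng):
    """Random weights (w, b, v) for a 1-input, 1-hidden-layer ReLU net."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_hidden)]


def feature_net(net, x):
    """f_j(x) = sum_k v_k * relu(w_k * x + b_k): sees only one feature value."""
    return sum(v * max(0.0, w * x + b) for w, b, v in net)


def nam_predict(nets, bias, row):
    """Binary NAM: logit = bias + sum_j f_j(x_j); returns (probability, contributions)."""
    contributions = [feature_net(net, x) for net, x in zip(nets, row)]
    logit = bias + sum(contributions)
    # Numerically stable sigmoid.
    if logit >= 0:
        prob = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        prob = e / (1.0 + e)
    return prob, contributions


def mean_abs_importance(contribution_rows):
    """Per-feature mean |contribution| across samples (one common importance choice)."""
    n = len(contribution_rows)
    n_features = len(contribution_rows[0])
    return [sum(abs(r[j]) for r in contribution_rows) / n for j in range(n_features)]


rng = random.Random(0)
nets = [make_feature_net(8, rng) for _ in range(3)]  # three features
prob, contribs = nam_predict(nets, bias=0.1, row=[0.5, -1.0, 2.0])
```

Changing one feature's value changes only that feature's contribution, which is why a row of contributions can be read term by term.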

plot_feature_effects(feature_idx=None, X=None, n_points=100)[source]

Plot learned feature effect shapes.

Parameters:
  • feature_idx (int, optional) – Index of feature to plot. If None, plots all features.

  • X (array-like, optional) – Data to determine feature ranges. If None, uses standard range.

  • n_points (int, default=100) – Number of points to evaluate.

Returns:

fig (matplotlib Figure) – The figure object.

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (NAMClassifier)

Returns:

self (object) – The updated object.

Return type:

NAMClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NAMClassifier)

Returns:

self (object) – The updated object.

Return type:

NAMClassifier

class endgame.models.NAMRegressor(n_hidden=32, n_layers=2, activation='relu', dropout=0.0, feature_dropout=0.0, learning_rate=0.005, weight_decay=1e-05, output_regularization=0.0, n_epochs=50, batch_size=1024, early_stopping=10, validation_fraction=0.1, device='auto', random_state=None, verbose=False)[source]

Bases: RegressorMixin, BaseEstimator

Neural Additive Model for regression.

Same architecture as NAMClassifier but with MSE loss for continuous target prediction.

Parameters:
  • n_hidden (int, default=32) – Number of hidden units per feature network.

  • n_layers (int, default=2) – Number of hidden layers per feature network.

  • activation (str, default='relu') – Activation function: ‘relu’ or ‘exu’.

  • dropout (float, default=0.0) – Dropout rate within feature networks.

  • feature_dropout (float, default=0.0) – Probability of dropping entire feature networks during training.

  • learning_rate (float, default=5e-3) – Learning rate.

  • weight_decay (float, default=1e-5) – L2 regularization.

  • output_regularization (float, default=0.0) – Regularization on feature network outputs.

  • n_epochs (int, default=50) – Maximum training epochs.

  • batch_size (int, default=1024) – Training batch size.

  • early_stopping (int, default=10) – Early stopping patience.

  • validation_fraction (float, default=0.1) – Fraction of training data held out for validation when eval_set is not provided.

  • device (str, default='auto') – Device to use: ‘cuda’, ‘cpu’, or ‘auto’.

  • random_state (int, optional) – Random seed.

  • verbose (bool, default=False) – Verbose output.

n_features_in_

Number of features.

Type:

int

model_

Fitted model.

Type:

_NAMModule

feature_importances_

Feature importance scores.

Type:

ndarray

history_

Training history.

Type:

dict

Examples

>>> from endgame.models.tabular import NAMRegressor
>>> reg = NAMRegressor(n_hidden=64)
>>> reg.fit(X_train, y_train)
>>> predictions = reg.predict(X_test)
>>> contributions = reg.get_feature_contributions(X_test)

fit(X, y, eval_set=None, **fit_params)[source]

Fit the NAM regressor.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Training targets.

  • eval_set (tuple of (X_val, y_val), optional) – Validation set for early stopping.

  • **fit_params (dict) – Additional parameters (ignored).

Return type:

NAMRegressor

Returns:

self (NAMRegressor) – Fitted regressor.

predict(X)[source]

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

get_feature_contributions(X)[source]

Get individual feature contributions for predictions.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to explain.

Return type:

ndarray

Returns:

contributions (ndarray of shape (n_samples, n_features)) – Each feature’s contribution to the prediction.

plot_feature_effects(feature_idx=None, X=None, n_points=100)[source]

Plot learned feature effect shapes.

Parameters:
  • feature_idx (int, optional) – Index of feature to plot. If None, plots all features.

  • X (array-like, optional) – Data to determine feature ranges.

  • n_points (int, default=100) – Number of points to evaluate.

Returns:

fig (matplotlib Figure) – The figure object.

set_fit_request(*, eval_set='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • self (NAMRegressor)

Returns:

self (object) – The updated object.

Return type:

NAMRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NAMRegressor)

Returns:

self (object) – The updated object.

Return type:

NAMRegressor

class endgame.models.GPClassifier(kernel='rbf', length_scale=1.0, n_restarts_optimizer=3, max_iter_predict=100, warm_start=False, multi_class='one_vs_rest', auto_scale=True, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

Gaussian Process Classifier with competition-tuned defaults.

A Bayesian kernel method that provides probabilistic predictions with principled uncertainty estimates. Its inductive bias differs from that of trees and neural networks, which makes it useful for adding diversity to ensembles.

Parameters:
  • kernel (str or sklearn kernel, default='rbf') – Kernel type. Options: ‘rbf’, ‘matern’, ‘matern12’, ‘matern32’, ‘matern52’, ‘rq’, ‘linear’, or a sklearn kernel object.

  • length_scale (float, default=1.0) – Length scale parameter for the kernel.

  • n_restarts_optimizer (int, default=3) – Number of restarts for the optimizer.

  • max_iter_predict (int, default=100) – Maximum number of Newton iterations used to approximate the posterior during prediction.

  • warm_start (bool, default=False) – Use previous fit as initialization.

  • multi_class (str, default='one_vs_rest') – Multi-class strategy: ‘one_vs_rest’ or ‘one_vs_one’.

  • auto_scale (bool, default=True) – Automatically scale features before fitting.

  • random_state (int, optional) – Random seed for reproducibility.
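The named kernel options correspond to standard stationary kernels that depend only on r = ||x - x'|| / length_scale. The sketch below writes out their textbook forms so the effect of length_scale is visible. Which Matérn order the plain ‘matern’ option uses is not stated here (scikit-learn's Matern defaults to nu=1.5), so that alias is left out. The ‘linear’ (dot-product) kernel is also omitted because it is not a function of r. kernel_value and rq_alpha are hypothetical names.

```python
import math


def kernel_value(name, x, y, length_scale=1.0, rq_alpha=1.0):
    """Stationary kernel k(x, y) as a function of r = ||x - y|| / length_scale."""
    r = math.dist(x, y) / length_scale
    if name == "rbf":
        return math.exp(-0.5 * r * r)
    if name == "matern12":  # nu = 1/2, the exponential kernel
        return math.exp(-r)
    if name == "matern32":  # nu = 3/2
        s = math.sqrt(3.0) * r
        return (1.0 + s) * math.exp(-s)
    if name == "matern52":  # nu = 5/2
        s = math.sqrt(5.0) * r
        return (1.0 + s + s * s / 3.0) * math.exp(-s)
    if name == "rq":  # rational quadratic: a scale mixture of RBF kernels
        return (1.0 + r * r / (2.0 * rq_alpha)) ** (-rq_alpha)
    raise ValueError(f"unsupported kernel: {name}")
```

Lower Matérn orders give rougher sample functions, and RBF gives the smoothest. A larger length_scale makes every kernel decay more slowly with distance.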

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

model_

Fitted sklearn GP classifier.

Type:

GaussianProcessClassifier

Examples

>>> from endgame.models.kernel import GPClassifier
>>> clf = GPClassifier(kernel='rbf', random_state=42)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)
>>> # Get uncertainty
>>> proba, std = clf.predict_proba(X_test, return_std=True)

Notes

Gaussian processes work best on small to medium datasets where calibrated uncertainty matters. Exact inference scales as O(n^3) in the number of training samples, so they are impractical for large datasets (roughly >10k samples) without approximations.

fit(X, y, **fit_params)[source]

Fit the Gaussian Process classifier.

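Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

Return type: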

GPClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X, return_std=False)[source]

Predict class probabilities.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • return_std (bool, default=False) – If True, also return uncertainty estimates.

Return type:

ndarray | tuple

Returns:

  • proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

  • std (ndarray of shape (n_samples,), optional) – Uncertainty estimates (if return_std=True).

set_predict_proba_request(*, return_std='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the predict_proba method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict_proba.

  • self (GPClassifier)

Returns:

self (object) – The updated object.

Return type:

GPClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (GPClassifier)

Returns:

self (object) – The updated object.

Return type:

GPClassifier

class endgame.models.GPRegressor(kernel='rbf', length_scale=1.0, alpha=1e-10, n_restarts_optimizer=3, normalize_y=True, auto_scale=True, random_state=None)[source]

Bases: RegressorMixin, BaseEstimator

Gaussian Process Regressor with competition-tuned defaults.

A Bayesian kernel method that provides predictions with principled uncertainty estimates through the posterior predictive distribution.

Parameters:
  • kernel (str or sklearn kernel, default='rbf') – Kernel type. Options: ‘rbf’, ‘matern’, ‘matern12’, ‘matern32’, ‘matern52’, ‘rq’, ‘linear’, or a sklearn kernel object.

  • length_scale (float, default=1.0) – Length scale parameter for the kernel.

  • alpha (float, default=1e-10) – Value added to the diagonal of the kernel matrix during fitting, for numerical stability.

  • n_restarts_optimizer (int, default=3) – Number of restarts for the optimizer.

  • normalize_y (bool, default=True) – Normalize target values.

  • auto_scale (bool, default=True) – Automatically scale features before fitting.

  • random_state (int, optional) – Random seed for reproducibility.
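The parameters above map onto the textbook exact GP regression computation. alpha is added to the kernel diagonal before a Cholesky factorization (the O(n^3) step), normalize_y standardizes the targets, and the posterior mean and standard deviation follow from triangular solves. The following is a minimal pure-Python sketch with a fixed RBF kernel and no hyperparameter optimization. It is an illustration of the standard algorithm, not endgame's or scikit-learn's code. gp_posterior and its helpers are hypothetical names.

```python
import math


def _rbf(a, b, length_scale):
    return math.exp(-0.5 * (math.dist(a, b) / length_scale) ** 2)


def _cholesky(A):
    """Lower-triangular L with L @ L.T == A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def _forward(L, b):
    """Solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def _backward(L, b):
    """Solve L.T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X_train, y_train, X_query, length_scale=1.0, alpha=1e-10, normalize_y=True):
    """Exact GP regression: posterior mean and std at each query point."""
    n = len(X_train)
    y_mean = sum(y_train) / n if normalize_y else 0.0
    y_scale = 1.0
    if normalize_y:
        y_scale = math.sqrt(sum((v - y_mean) ** 2 for v in y_train) / n) or 1.0
    y = [(v - y_mean) / y_scale for v in y_train]
    # K + alpha * I: alpha on the diagonal keeps the Cholesky factorization stable.
    K = [[_rbf(a, b, length_scale) + (alpha if i == j else 0.0)
          for j, b in enumerate(X_train)] for i, a in enumerate(X_train)]
    L = _cholesky(K)  # O(n^3): the cost that limits exact GPs to modest n
    weights = _backward(L, _forward(L, y))  # (K + alpha I)^{-1} y
    means, stds = [], []
    for q in X_query:
        k_star = [_rbf(q, b, length_scale) for b in X_train]
        v = _forward(L, k_star)
        var = max(1.0 - sum(t * t for t in v), 0.0)  # k(q, q) = 1 for the RBF kernel
        means.append(y_mean + y_scale * sum(k * w for k, w in zip(k_star, weights)))
        stds.append(y_scale * math.sqrt(var))
    return means, stds
```

Near the training points, the mean nearly interpolates the targets and the standard deviation shrinks toward zero. Far from the data, both revert to the prior, which after normalize_y means the target mean and standard deviation.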

n_features_in_

Number of features.

Type:

int

model_

Fitted sklearn GP regressor.

Type:

GaussianProcessRegressor

Examples

>>> from endgame.models.kernel import GPRegressor
>>> reg = GPRegressor(kernel='matern', random_state=42)
>>> reg.fit(X_train, y_train)
>>> y_pred, y_std = reg.predict(X_test, return_std=True)
>>> # Prediction intervals
>>> lower = y_pred - 1.96 * y_std
>>> upper = y_pred + 1.96 * y_std

fit(X, y, **fit_params)[source]

Fit the Gaussian Process regressor.

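Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target values.

Return type: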

GPRegressor

Returns:

self

predict(X, return_std=False, return_cov=False)[source]

Predict target values.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • return_std (bool, default=False) – If True, return standard deviation of predictions.

  • return_cov (bool, default=False) – If True, return covariance of predictions.

Returns:

  • y_pred (ndarray of shape (n_samples,)) – Predicted values.

  • y_std (ndarray of shape (n_samples,), optional) – Standard deviation (if return_std=True).

  • y_cov (ndarray of shape (n_samples, n_samples), optional) – Covariance matrix (if return_cov=True).

predict_interval(X, alpha=0.05)[source]

Predict with prediction intervals.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • alpha (float, default=0.05) – Significance level (0.05 = 95% interval).

Return type:

tuple

Returns:

  • y_pred (ndarray of shape (n_samples,)) – Point predictions.

  • lower (ndarray of shape (n_samples,)) – Lower bound of prediction interval.

  • upper (ndarray of shape (n_samples,)) – Upper bound of prediction interval.
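The 1.96 in the class example is the two-sided normal quantile for alpha=0.05. Assuming predict_interval builds a symmetric Gaussian interval from the predictive standard deviation, the following sketch shows how alpha maps to that multiplier using the standard library's statistics.NormalDist. The Gaussian form is an assumption, and gaussian_interval is a hypothetical name.

```python
from statistics import NormalDist


def gaussian_interval(y_pred, y_std, alpha=0.05):
    """Central (1 - alpha) interval, assuming a Gaussian predictive distribution."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 when alpha = 0.05
    lower = [m - z * s for m, s in zip(y_pred, y_std)]
    upper = [m + z * s for m, s in zip(y_pred, y_std)]
    return list(y_pred), lower, upper
```

A smaller alpha gives a larger z and therefore a wider interval. For example, alpha=0.01 corresponds to z of about 2.58.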

sample_y(X, n_samples=1, random_state=None)[source]

Sample from the posterior predictive distribution.

Parameters:
  • X (array-like of shape (n_query, n_features)) – Query points.

  • n_samples (int, default=1) – Number of samples to draw.

  • random_state (int, optional) – Random seed.

Return type:

ndarray

Returns:

samples (ndarray of shape (n_query, n_samples)) – Samples from posterior predictive.
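Drawing from the posterior predictive at several query points jointly means sampling a multivariate normal with the posterior mean and covariance, which is what return_cov=True exposes in predict. The standard recipe is mean + L z, where L is the Cholesky factor of the covariance and z is standard normal noise. The following sketch returns samples in the documented (n_query, n_samples) layout. It assumes the mean and covariance are already available and adds a small jitter to the diagonal. sample_mvn and jitter are hypothetical names.

```python
import math
import random


def sample_mvn(mean, cov, n_samples=1, random_state=None, jitter=1e-10):
    """Draw n_samples joint samples; result has shape (n_query, n_samples)."""
    rng = random.Random(random_state)
    n = len(mean)
    # Cholesky factor of cov + jitter * I (jitter keeps it well defined).
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(cov[i][i] + jitter - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    samples = [[0.0] * n_samples for _ in range(n)]
    for t in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for i in range(n):
            # Correlated draw: mean + (L @ z)[i]
            samples[i][t] = mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
    return samples
```

Each column is one coherent function draw across all query points. Passing the same random_state reproduces the same draws.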

set_predict_request(*, return_cov='$UNCHANGED$', return_std='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • return_cov (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_cov parameter in predict.

  • return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.

  • self (GPRegressor)

Returns:

self (object) – The updated object.

Return type:

GPRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (GPRegressor)

Returns:

self (object) – The updated object.

Return type:

GPRegressor

class endgame.models.SVMClassifier(kernel='rbf', C=1.0, gamma='scale', degree=3, probability=True, class_weight='balanced', auto_scale=True, max_iter=10000, cache_size=500, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

Support Vector Machine Classifier with competition-tuned defaults.

A kernel classifier that finds the separating hyperplane with the largest margin in the kernel-induced feature space. Its optimization objective differs from that of probabilistic models, which makes it useful for ensemble diversity.

Parameters:
  • kernel (str, default='rbf') – Kernel type: ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’.

  • C (float, default=1.0) – Inverse regularization strength; lower values mean stronger regularization.

  • gamma (str or float, default='scale') – Kernel coefficient for the ‘rbf’, ‘poly’, and ‘sigmoid’ kernels.

  • degree (int, default=3) – Degree for polynomial kernel.

  • probability (bool, default=True) – Enable probability estimates (uses Platt scaling).

  • class_weight (str or dict, default='balanced') – Class weights: ‘balanced’, None, or dict.

  • auto_scale (bool, default=True) – Automatically scale features before fitting.

  • max_iter (int, default=10000) – Maximum iterations for solver.

  • cache_size (float, default=500) – Kernel cache size in MB.

  • random_state (int, optional) – Random seed for reproducibility.
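
Two of these defaults are resolved from the data. In scikit-learn's SVC (model_ is documented as a fitted sklearn SVC), gamma='scale' means 1 / (n_features * X.var()), the RBF kernel is K(x, x') = exp(-gamma * ||x - x'||^2), and class_weight='balanced' assigns each class the weight n_samples / (n_classes * count_of_class). The NumPy sketch below reproduces those definitions; the helper names are hypothetical, and it assumes the wrapper forwards these parameters to SVC unchanged (with auto_scale=True, X.var() would be computed on the scaled features).

```python
import numpy as np

def scale_gamma(X):
    # scikit-learn's definition of gamma='scale'
    X = np.asarray(X, dtype=float)
    return 1.0 / (X.shape[1] * X.var())

def rbf_kernel(A, B, gamma):
    # K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def balanced_class_weight(y):
    # scikit-learn's class_weight='balanced': n_samples / (n_classes * count per class)
    classes, counts = np.unique(y, return_counts=True)
    return dict(zip(classes.tolist(), (len(y) / (len(classes) * counts)).tolist()))

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
gamma = scale_gamma(X)
K = rbf_kernel(X, X, gamma)
weights = balanced_class_weight(np.array([0, 0, 0, 1]))  # minority class gets the larger weight
```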

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

model_

Fitted sklearn SVC.

Type:

SVC

support_vectors_

Support vectors from training.

Type:

ndarray

Examples

>>> from endgame.models.kernel import SVMClassifier
>>> clf = SVMClassifier(kernel='rbf', C=1.0, random_state=42)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

SVMs work best when:

  • Features are scaled (auto_scale=True handles this).

  • The dataset is small to medium sized (kernel SVM training time scales roughly between O(n^2) and O(n^3) in the number of samples n).

  • The classes are separated by a clear margin.

The max-margin objective is fundamentally different from log-loss (logistic regression) or GBDT objectives, providing ensemble diversity.
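
To make the difference in objectives concrete, here is a small sketch of the two per-sample losses as functions of the margin m = y * f(x), with labels y in {-1, +1}. The hinge loss behind the SVM objective is exactly zero once m >= 1, so correctly classified points beyond the margin stop influencing the fit (only support vectors matter), whereas the logistic loss is positive for every m. The function names are illustrative.

```python
import numpy as np

def hinge_loss(margin):
    # SVM (max-margin) loss: zero for points at or beyond the margin (margin >= 1)
    return np.maximum(0.0, 1.0 - margin)

def log_loss(margin):
    # Logistic-regression loss for labels in {-1, +1}: log(1 + exp(-margin)), never exactly zero
    return np.logaddexp(0.0, -margin)

margins = np.array([-1.0, 0.0, 0.5, 1.0, 3.0])
for m, h, l in zip(margins, hinge_loss(margins), log_loss(margins)):
    print(f"margin={m:+.1f}  hinge={h:.3f}  log={l:.3f}")
```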

fit(X, y, sample_weight=None, **fit_params)[source]

Fit the SVM classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

Return type:

SVMClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Uses Platt scaling for probability calibration.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.
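
Platt scaling maps a decision value f to P(y=1 | f) = 1 / (1 + exp(A*f + B)) and fits A and B by minimizing log-loss. The sketch below fits the two parameters with plain gradient descent on binary labels. It is a simplified illustration: Platt's original method also smooths the targets, and scikit-learn's SVC fits the sigmoid with internal cross-validation, so its predict_proba can occasionally disagree with predict. The function fit_platt is hypothetical.

```python
import numpy as np

def fit_platt(f, y, lr=0.1, n_iter=5000):
    """Fit P(y=1|f) = 1 / (1 + exp(A*f + B)) by gradient descent on log-loss.

    f: decision values, y: labels in {0, 1}.
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    A, B = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * f + B))
        # For z = A*f + B and p = sigmoid(-z), d(log-loss)/dz = y - p
        grad_z = y - p
        A -= lr * np.mean(grad_z * f)
        B -= lr * np.mean(grad_z)
    return A, B

decision = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
labels = np.array([0, 0, 1, 0, 1, 1])  # not perfectly separable, so A stays finite
A, B = fit_platt(decision, labels)
proba = 1.0 / (1.0 + np.exp(A * decision + B))
```

A comes out negative, so the calibrated probability increases with the decision value.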

decision_function(X)[source]

Compute decision function values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

decision (ndarray) – Decision function values.

property support_vectors_

Support vectors from training.

property n_support_

Number of support vectors for each class.

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (SVMClassifier)

Returns:

self (object) – The updated object.

Return type:

SVMClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (SVMClassifier)

Returns:

self (object) – The updated object.

Return type:

SVMClassifier

class endgame.models.SVMRegressor(kernel='rbf', C=1.0, epsilon=0.1, gamma='scale', degree=3, auto_scale=True, max_iter=10000, cache_size=500)[source]

Bases: RegressorMixin, BaseEstimator

Support Vector Machine Regressor with competition-tuned defaults.

Epsilon-SVR fits a function inside a tube of half-width epsilon around the targets: deviations smaller than epsilon incur no loss, and larger deviations are penalized linearly.
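
The epsilon-insensitive loss that defines this tube can be written in one line. The sketch below (function name hypothetical) shows that residuals inside the tube cost nothing and that larger residuals are charged only for the excess over epsilon.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    # Residuals inside the tube |y - f(x)| <= epsilon cost nothing;
    # outside it the loss grows linearly with the excess.
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
```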

Parameters:
  • kernel (str, default='rbf') – Kernel type: ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’.

  • C (float, default=1.0) – Regularization parameter.

  • epsilon (float, default=0.1) – Epsilon in the epsilon-SVR model.

  • gamma (str or float, default='scale') – Kernel coefficient.

  • degree (int, default=3) – Degree for polynomial kernel.

  • auto_scale (bool, default=True) – Automatically scale features before fitting.

  • max_iter (int, default=10000) – Maximum iterations for solver.

  • cache_size (float, default=500) – Kernel cache size in MB.

n_features_in_

Number of features.

Type:

int

model_

Fitted sklearn SVR.

Type:

SVR

Examples

>>> from endgame.models.kernel import SVMRegressor
>>> reg = SVMRegressor(kernel='rbf', C=1.0)
>>> reg.fit(X_train, y_train)
>>> y_pred = reg.predict(X_test)

fit(X, y, sample_weight=None, **fit_params)[source]

Fit the SVM regressor.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

Return type:

SVMRegressor

Returns:

self

predict(X)[source]

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

property support_vectors_

Support vectors from training.

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (SVMRegressor)

Returns:

self (object) – The updated object.

Return type:

SVMRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (SVMRegressor)

Returns:

self (object) – The updated object.

Return type:

SVMRegressor

class endgame.models.ELMClassifier(n_hidden=500, activation='sigmoid', alpha=1e-06, auto_scale=True, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

Extreme Learning Machine Classifier.

A single-hidden-layer neural network with random input weights and analytically computed output weights. Training is typically very fast because the output weights are obtained in closed form, with no iterative optimization.

Parameters:
  • n_hidden (int, default=500) – Number of hidden neurons.

  • activation (str or callable, default='sigmoid') – Activation function: ‘sigmoid’, ‘tanh’, ‘relu’, ‘leaky_relu’, ‘sin’, ‘hardlim’, or a callable.

  • alpha (float, default=1e-6) – Regularization parameter for ridge regression on the output weights.

  • auto_scale (bool, default=True) – Automatically scale features before fitting.

  • random_state (int, optional) – Random seed for reproducibility.

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

input_weights_

Random input-to-hidden weights.

Type:

ndarray

biases_

Random hidden layer biases.

Type:

ndarray

output_weights_

Learned hidden-to-output weights.

Type:

ndarray

Examples

>>> from endgame.models.baselines import ELMClassifier
>>> clf = ELMClassifier(n_hidden=500, random_state=42)
>>> clf.fit(X_train, y_train)  # closed-form fit, typically very fast
>>> proba = clf.predict_proba(X_test)

Notes

ELM is valuable for ensemble diversity because:

  1. It uses no backpropagation, so its optimization is fundamentally different from gradient-trained models.

  2. Random projections explore different feature spaces.

  3. It is extremely fast, so many models can be trained for ensemble selection.

  4. It is often surprisingly competitive with slower methods.

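The unregularized analytical solution is beta = pinv(H) @ T, where H is the hidden-layer output matrix and T is the target matrix (for classification, typically a one-hot encoding of y). With ridge regularization (alpha > 0) this becomes beta = (H^T H + alpha * I)^-1 H^T T.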
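
The closed-form training can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not endgame's code: the names elm_fit and elm_predict are hypothetical, the random hidden weights are drawn from a standard normal, and the output weights use the ridge form with one-hot targets.

```python
import numpy as np

def elm_fit(X, y, n_hidden=50, alpha=1e-6, seed=0):
    """Minimal ELM classifier: random sigmoid hidden layer + ridge output weights."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    T = (np.asarray(y)[:, None] == classes[None, :]).astype(float)  # one-hot targets
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # sigmoid hidden-layer output
    # beta = (H^T H + alpha*I)^-1 H^T T, solved as a linear system rather than inverted
    beta = np.linalg.solve(H.T @ H + alpha * np.eye(n_hidden), H.T @ T)
    return W, b, beta, classes

def elm_predict(model, X):
    W, b, beta, classes = model
    H = 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ W + b)))
    return classes[np.argmax(H @ beta, axis=1)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, size=(100, 2)), rng.normal(2, 1, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = elm_fit(X, y)
acc = np.mean(elm_predict(model, X) == y)
```

Only beta is learned; W and b stay at their random values, which is why different seeds give different, diverse models.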

fit(X, y, **fit_params)[source]

Fit the ELM classifier.

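Training costs O(n * m * h) to compute the hidden-layer outputs plus roughly O(n * h^2 + h^3) to solve for the output weights, where n = samples, m = features and h = hidden neurons. Because there is no iterative optimization, this is typically very fast.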

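Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.
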
Return type:

ELMClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities (softmax normalized).

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (ELMClassifier)

Returns:

self (object) – The updated object.

Return type:

ELMClassifier

class endgame.models.ELMRegressor(n_hidden=500, activation='tanh', alpha=0.01, auto_scale=True, random_state=None)[source]

Bases: RegressorMixin, BaseEstimator

Extreme Learning Machine Regressor.

A single-hidden-layer neural network with random input weights and analytically computed output weights for regression.

Parameters:
  • n_hidden (int, default=500) – Number of hidden neurons.

  • activation (str or callable, default='tanh') – Activation function. ‘tanh’ is the default for regression because it is zero-centered (symmetric about the origin, with outputs in [-1, 1]), whereas ‘sigmoid’ compresses outputs to [0, 1].

  • alpha (float, default=0.01) – Regularization parameter for ridge regression on output weights.

  • auto_scale (bool, default=True) – Automatically scale features before fitting.

  • random_state (int, optional) – Random seed for reproducibility.

n_features_in_

Number of features.

Type:

int

input_weights_

Random input-to-hidden weights.

Type:

ndarray

output_weights_

Learned hidden-to-output weights.

Type:

ndarray

Examples

>>> from endgame.models.baselines import ELMRegressor
>>> reg = ELMRegressor(n_hidden=500, random_state=42)
>>> reg.fit(X_train, y_train)
>>> y_pred = reg.predict(X_test)

fit(X, y, **fit_params)[source]

Fit the ELM regressor.

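Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target values.
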
Return type:

ELMRegressor

Returns:

self

predict(X)[source]

Predict target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted values.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (ELMRegressor)

Returns:

self (object) – The updated object.

Return type:

ELMRegressor

class endgame.models.NaiveBayesClassifier(variant='auto', var_smoothing=1e-09, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)[source]

Bases: ClassifierMixin, BaseEstimator

Naive Bayes Classifier with automatic variant selection.

Automatically selects the appropriate Naive Bayes variant based on feature characteristics, or uses a specified variant.

The feature independence assumption is fundamentally different from tree-based models (which capture interactions) and neural networks (which learn complex dependencies), making this valuable for ensemble diversity.

Parameters:
  • variant (str, default='auto') – Naive Bayes variant:

      - ‘auto’: Automatically select based on the features.
      - ‘gaussian’: For continuous features.
      - ‘bernoulli’: For binary features.
      - ‘multinomial’: For count/frequency features.
      - ‘complement’: For imbalanced text classification.

  • var_smoothing (float, default=1e-9) – Portion of the largest variance of all features added to variances for stability (Gaussian only).

  • alpha (float, default=1.0) – Additive smoothing parameter (Bernoulli, Multinomial, Complement).

  • binarize (float or None, default=0.0) – Threshold for binarizing features (Bernoulli only). None means features are already binary.

  • fit_prior (bool, default=True) – Whether to learn class prior probabilities.

  • class_prior (array-like, optional) – Prior probabilities of the classes.
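
The exact rule behind variant='auto' is not documented here. The sketch below is a purely hypothetical heuristic of the kind such a rule might use (binary values suggest Bernoulli, non-negative integer counts suggest Multinomial, anything else Gaussian); it is not endgame's selection logic.

```python
import numpy as np

def guess_nb_variant(X):
    """Hypothetical heuristic for variant='auto' (NOT endgame's documented rule)."""
    X = np.asarray(X, dtype=float)
    if np.isin(X, (0.0, 1.0)).all():
        return "bernoulli"    # every value is 0 or 1
    if (X >= 0).all() and np.allclose(X, np.round(X)):
        return "multinomial"  # non-negative integer counts
    return "gaussian"         # anything else is treated as continuous

print(guess_nb_variant([[0, 1], [1, 0]]), guess_nb_variant([[0.5, -1.2], [1.0, 2.0]]))
```

After fitting, the variant_ attribute reports which variant was actually used.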

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

variant_

The actual variant used (resolved from ‘auto’).

Type:

str

model_

Fitted Naive Bayes model.

Type:

sklearn NB estimator

Examples

>>> from endgame.models.baselines import NaiveBayesClassifier
>>> clf = NaiveBayesClassifier(variant='auto')
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

Despite the naive independence assumption, Naive Bayes often works surprisingly well because:

  1. Classification only requires the correct ordering of class scores, not accurate probabilities.

  2. Dependencies between features often “cancel out” when aggregated.

  3. The strong independence assumption acts as a regularizer (high bias, low variance).

For ensembles, NB provides diversity because it makes fundamentally different errors from models that capture feature interactions.
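
For the Gaussian variant, the independence assumption means the class log-likelihood is a plain sum of per-feature Gaussian log-densities. The sketch below implements that from scratch, adding var_smoothing times the largest feature variance to every variance as scikit-learn's GaussianNB does. The function names are hypothetical and this is not the wrapper's code.

```python
import numpy as np

def gaussian_nb_fit(X, y, var_smoothing=1e-9):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    # As in scikit-learn's GaussianNB: epsilon = var_smoothing * largest feature variance
    eps = var_smoothing * X.var(axis=0).max()
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    log_priors = np.log(np.array([np.mean(y == c) for c in classes]))
    return classes, means, variances, log_priors

def gaussian_nb_predict_proba(model, X):
    classes, means, variances, log_priors = model
    X = np.asarray(X, dtype=float)
    # Independence: joint log-likelihood = sum of per-feature log-densities (shape n x K)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]).sum(axis=2)
    joint = log_lik + log_priors[None, :]
    joint -= joint.max(axis=1, keepdims=True)  # stabilize before exponentiating
    proba = np.exp(joint)
    return proba / proba.sum(axis=1, keepdims=True)

X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [4.0, 5.0], [4.2, 5.1], [3.8, 4.9]])
y = np.array([0, 0, 0, 1, 1, 1])
model = gaussian_nb_fit(X, y)
proba = gaussian_nb_predict_proba(model, np.array([[1.0, 2.0], [4.0, 5.0]]))
```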

fit(X, y, sample_weight=None, **fit_params)[source]

Fit the Naive Bayes classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights.

Return type:

NaiveBayesClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

predict_log_proba(X)[source]

Predict log class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

log_proba (ndarray of shape (n_samples, n_classes)) – Log class probabilities.

property feature_log_prob_

Log probability of features given a class (for discrete NB).

property class_log_prior_

Log probability of each class.

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (NaiveBayesClassifier)

Returns:

self (object) – The updated object.

Return type:

NaiveBayesClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NaiveBayesClassifier)

Returns:

self (object) – The updated object.

Return type:

NaiveBayesClassifier

class endgame.models.LDAClassifier(solver='svd', shrinkage='auto', n_components=None, store_covariance=False, tol=0.0001)[source]

Bases: ClassifierMixin, BaseEstimator

Linear Discriminant Analysis Classifier.

LDA assumes that all classes share the same covariance matrix. This leads to linear decision boundaries between classes.
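
Why the boundaries are linear: with one shared covariance Sigma, the class-k discriminant delta_k(x) = x^T Sigma^-1 mu_k - 0.5 * mu_k^T Sigma^-1 mu_k + log pi_k contains no term quadratic in x, because x^T Sigma^-1 x is identical for every class and drops out when classes are compared. The hypothetical helpers below implement this textbook form with a pooled covariance estimate and no shrinkage; they are not the endgame implementation.

```python
import numpy as np

def lda_discriminants(X_train, y_train):
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance, shared by all classes
    centered = X - means[np.searchsorted(classes, y)]
    sigma = centered.T @ centered / (len(X) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    coef = np.linalg.solve(sigma, means.T).T  # row k: Sigma^-1 mu_k
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, intercept

def lda_predict(model, X):
    classes, coef, intercept = model
    scores = np.asarray(X, dtype=float) @ coef.T + intercept  # linear in X
    return classes[np.argmax(scores, axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, size=(50, 2)), rng.normal([3, 3], 1, size=(50, 2))])
y = np.repeat([0, 1], 50)
model = lda_discriminants(X, y)
acc = np.mean(lda_predict(model, X) == y)
```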

Parameters:
  • solver (str, default='svd') – Solver: ‘svd’, ‘lsqr’, ‘eigen’.

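  • shrinkage (str, float, or None, default='auto') – Shrinkage parameter: ‘auto’ (Ledoit-Wolf), float in [0,1], or None. In scikit-learn's LinearDiscriminantAnalysis, shrinkage is supported only by the ‘lsqr’ and ‘eigen’ solvers, not ‘svd’.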

  • n_components (int, optional) – Number of components for dimensionality reduction (at most min(n_classes - 1, n_features)).

  • store_covariance (bool, default=False) – Store the covariance matrix.

  • tol (float, default=1e-4) – Tolerance for singular value decomposition.

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

coef_

Weights of the features.

Type:

ndarray

intercept_

Intercept term.

Type:

ndarray

Examples

>>> from endgame.models.baselines import LDAClassifier
>>> clf = LDAClassifier(solver='lsqr', shrinkage='auto')
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

LDA differs from logistic regression (LR) in that:

  1. LDA is generative (it models P(X|y)), whereas LR is discriminative (it models P(y|X)).

  2. LDA assumes Gaussian class-conditional distributions.

  3. LDA can be more data-efficient when data are limited and the Gaussian assumption approximately holds.

The shrinkage='auto' option uses the Ledoit-Wolf estimate of the shrinkage intensity, which can improve performance when n_features is large relative to n_samples, including when n_features > n_samples.

fit(X, y, **fit_params)[source]

Fit the LDA classifier.

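Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.
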
Return type:

LDAClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Return type:

ndarray

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

predict_log_proba(X)[source]

Predict log class probabilities.

Return type:

ndarray

transform(X)[source]

Project data to maximize class separation.

Return type:

ndarray

decision_function(X)[source]

Compute decision function.

Return type:

ndarray

property coef_

Feature weights.

property intercept_

Intercept term.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (LDAClassifier)

Returns:

self (object) – The updated object.

Return type:

LDAClassifier

class endgame.models.QDAClassifier(reg_param=0.0, store_covariance=False, tol=0.0001)[source]

Bases: ClassifierMixin, BaseEstimator

Quadratic Discriminant Analysis Classifier.

QDA allows each class to have its own covariance matrix, leading to quadratic decision boundaries between classes.

Parameters:
  • reg_param (float, default=0.0) – Regularization parameter: covariance = (1-reg_param)*cov + reg_param*I

  • store_covariance (bool, default=False) – Store the covariance matrices.

  • tol (float, default=1e-4) – Tolerance for rank estimation.

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

Examples

>>> from endgame.models.baselines import QDAClassifier
>>> clf = QDAClassifier(reg_param=0.1)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

QDA is more flexible than LDA because it allows each class its own covariance matrix. However, this requires estimating more parameters, where d is the number of features and K the number of classes:

  • LDA: O(d^2) for the shared covariance.

  • QDA: O(K * d^2), one covariance per class.

Use reg_param > 0 when you have few samples per class to regularize the covariance estimates toward the identity matrix.
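
A small worked example of both points. A symmetric d x d covariance has d * (d + 1) / 2 free entries, so with d = 50 features LDA estimates 1,275 covariance entries while QDA with 10 classes estimates 12,750. And a class with fewer samples than features has a singular covariance, which the reg_param formula (1 - reg_param) * cov + reg_param * I makes invertible. The helper names are hypothetical.

```python
import numpy as np

def regularize_covariance(cov, reg_param):
    # QDA-style regularization toward the identity: (1 - reg_param) * cov + reg_param * I
    cov = np.asarray(cov, dtype=float)
    return (1.0 - reg_param) * cov + reg_param * np.eye(cov.shape[0])

def n_covariance_params(d, n_classes=1):
    # A symmetric d x d matrix has d * (d + 1) / 2 free entries
    return n_classes * d * (d + 1) // 2

# Two samples in three dimensions: the class covariance has rank 1 (singular)
Xk = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 5.0]])
cov_k = np.cov(Xk, rowvar=False)
cov_reg = regularize_covariance(cov_k, 0.1)  # full rank after regularization
```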

fit(X, y, **fit_params)[source]

Fit the QDA classifier.

Return type:

QDAClassifier

predict(X)[source]

Predict class labels.

Return type:

ndarray

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

predict_log_proba(X)[source]

Predict log class probabilities.

Return type:

ndarray

decision_function(X)[source]

Compute decision function.

Return type:

ndarray

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (QDAClassifier)

Returns:

self (object) – The updated object.

Return type:

QDAClassifier

class endgame.models.RDAClassifier(alpha=0.5, shrinkage=0.0, store_covariance=False)[source]

Bases: ClassifierMixin, BaseEstimator

Regularized Discriminant Analysis Classifier.

RDA interpolates between LDA and QDA using a regularization parameter, which lets you tune the trade-off between the bias of LDA and the variance of QDA.

Parameters:
  • alpha (float, default=0.5) – Interpolation parameter between LDA (alpha=1) and QDA (alpha=0). alpha=0.5 is a common middle ground.

  • shrinkage (float, default=0.0) – Shrinkage toward scaled identity: cov = (1-shrinkage)*cov + shrinkage*trace(cov)/d*I, where d is the number of features.

  • store_covariance (bool, default=False) – Store the covariance matrices.

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

Examples

>>> from endgame.models.baselines import RDAClassifier
>>> clf = RDAClassifier(alpha=0.5, shrinkage=0.1)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

RDA was proposed by Friedman (1989) to handle the bias-variance trade-off between LDA and QDA. The regularized covariance is:

Sigma_k(alpha) = alpha * Sigma_pooled + (1-alpha) * Sigma_k

followed by shrinkage toward the scaled identity:

Sigma_k <- (1-shrinkage) * Sigma_k + shrinkage * trace(Sigma_k)/d * I

This provides a continuous family of classifiers that can adapt to the complexity supported by the data.
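
A direct NumPy transcription of the two regularization steps, the alpha blend followed by shrinkage toward trace(cov)/d * I, is shown below. The function name is hypothetical and the parameter names mirror RDAClassifier; the extreme settings recover the pooled (LDA) and class-specific (QDA) covariances.

```python
import numpy as np

def rda_covariance(cov_k, cov_pooled, alpha=0.5, shrinkage=0.0):
    """Regularized class covariance (sketch; parameter names follow RDAClassifier)."""
    cov_k = np.asarray(cov_k, dtype=float)
    cov_pooled = np.asarray(cov_pooled, dtype=float)
    d = cov_k.shape[0]
    # Step 1: blend pooled (LDA, alpha=1) and class-specific (QDA, alpha=0) covariances
    blended = alpha * cov_pooled + (1.0 - alpha) * cov_k
    # Step 2: shrink toward the scaled identity trace(blended)/d * I
    return (1.0 - shrinkage) * blended + shrinkage * np.trace(blended) / d * np.eye(d)

cov_k = np.array([[2.0, 0.5], [0.5, 1.0]])
cov_pooled = np.eye(2)
sigma_lda = rda_covariance(cov_k, cov_pooled, alpha=1.0)  # equals the pooled covariance
sigma_qda = rda_covariance(cov_k, cov_pooled, alpha=0.0)  # equals the class covariance
```

Shrinkage preserves the trace, so it redistributes variance evenly across directions without changing the total.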

fit(X, y, **fit_params)[source]

Fit the RDA classifier.

Return type:

RDAClassifier

predict(X)[source]

Predict class labels.

Return type:

ndarray

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

predict_log_proba(X)[source]

Predict log class probabilities.

Return type:

ndarray

decision_function(X)[source]

Compute decision function (log posteriors).

Return type:

ndarray

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (RDAClassifier)

Returns:

self (object) – The updated object.

Return type:

RDAClassifier

class endgame.models.KNNClassifier(n_neighbors=5, weights='distance', metric='minkowski', p=2, leaf_size=30, algorithm='auto', scale_features=True, n_jobs=-1)[source]

Bases: ClassifierMixin, BaseEstimator

K-Nearest Neighbors Classifier with competition-tuned defaults.

A wrapper around sklearn’s KNeighborsClassifier with automatic feature scaling and sensible defaults for competitive ML.

Parameters:
  • n_neighbors (int, default=5) – Number of neighbors to use.

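  • weights (str, default='distance') – Weight function: ‘uniform’ (all neighbors count equally) or ‘distance’ (neighbors weighted by the inverse of their distance, so closer neighbors have more influence). ‘distance’ often works better in practice.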

  • metric (str, default='minkowski') – Distance metric: ‘minkowski’, ‘euclidean’, ‘manhattan’, ‘cosine’, etc.

  • p (int, default=2) – Power parameter for Minkowski metric. p=2 is Euclidean, p=1 is Manhattan.

  • leaf_size (int, default=30) – Leaf size for BallTree or KDTree.

  • algorithm (str, default='auto') – Algorithm: ‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’.

  • scale_features (bool, default=True) – Whether to standardize features before fitting. Highly recommended for distance-based methods.

  • n_jobs (int, default=-1) – Number of parallel jobs.

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

Examples

>>> from endgame.models.baselines import KNNClassifier
>>> clf = KNNClassifier(n_neighbors=5, weights='distance')
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

KNN is different from other models because:

  1. Instance-based: stores the training data; there is no explicit model.

  2. Non-parametric: makes no assumptions about the data distribution.

  3. Local decision boundaries: can capture complex patterns.

  4. Sensitive to the curse of dimensionality in high dimensions.

The scale_features=True default is important because KNN relies on distance calculations that can be dominated by features with larger scales.
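A small NumPy example shows how scaling changes which neighbor is selected. The feature values and per-feature standard deviations below are made up for illustration. In practice StandardScaler estimates the deviations from the training data. It also subtracts the mean, but that does not change pairwise distances.

```python
import numpy as np

query = np.array([50_000.0, 25.0])  # [income in dollars, age in years]
candidates = np.array([
    [50_500.0, 60.0],  # similar income, very different age
    [53_000.0, 26.0],  # somewhat different income, nearly the same age
])

# Assumed per-feature standard deviations, e.g. estimated from training data
feature_std = np.array([10_000.0, 10.0])


def nearest_index(q, C, scale=None):
    """Index of the Euclidean-nearest candidate, optionally after rescaling."""
    diff = C - q
    if scale is not None:
        diff = diff / scale  # express every feature in units of its std
    return int(np.argmin(np.linalg.norm(diff, axis=1)))


raw_choice = nearest_index(query, candidates)                  # income dominates
scaled_choice = nearest_index(query, candidates, feature_std)  # both features count
```

Without scaling, the 500-dollar income gap outweighs a 35-year age gap. After scaling, the second candidate is correctly seen as closer.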

fit(X, y, **fit_params)[source]

Fit the KNN classifier.

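Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

Return type: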

KNNClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Return type:

ndarray

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

kneighbors(X=None, n_neighbors=None, return_distance=True)[source]

Find the K-neighbors of a point.

Parameters:
  • X (array-like, optional) – Query points. If None, returns neighbors of training data.

  • n_neighbors (int, optional) – Number of neighbors. If None, uses n_neighbors from init.

  • return_distance (bool, default=True) – Whether to return distances.

Returns:

  • neigh_dist (ndarray (if return_distance=True)) – Distances to neighbors.

  • neigh_ind (ndarray) – Indices of neighbors.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (KNNClassifier)

Returns:

self (object) – The updated object.

Return type:

KNNClassifier

class endgame.models.KNNRegressor(n_neighbors=5, weights='distance', metric='minkowski', p=2, leaf_size=30, algorithm='auto', scale_features=True, n_jobs=-1)[source]

Bases: RegressorMixin, BaseEstimator

K-Nearest Neighbors Regressor with competition-tuned defaults.

A wrapper around sklearn’s KNeighborsRegressor with automatic feature scaling and sensible defaults for competitive ML.

Parameters:
  • n_neighbors (int, default=5) – Number of neighbors to use.

  • weights (str, default='distance') – Weight function: ‘uniform’ or ‘distance’.

  • metric (str, default='minkowski') – Distance metric.

  • p (int, default=2) – Power parameter for Minkowski metric.

  • leaf_size (int, default=30) – Leaf size for BallTree or KDTree.

  • algorithm (str, default='auto') – Algorithm: ‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’.

  • scale_features (bool, default=True) – Whether to standardize features before fitting.

  • n_jobs (int, default=-1) – Number of parallel jobs.

n_features_in_

Number of features.

Type:

int

Examples

>>> from endgame.models.baselines import KNNRegressor
>>> reg = KNNRegressor(n_neighbors=10, weights='distance')
>>> reg.fit(X_train, y_train)
>>> predictions = reg.predict(X_test)

Notes

KNN regression averages (or weighted-averages) the target values of the k nearest neighbors. This provides a local, non-parametric estimate that can capture complex patterns but may suffer from the curse of dimensionality.
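Here is a minimal NumPy sketch of the two weighting schemes for a single query point. It mirrors scikit-learn's 'uniform' and 'distance' options: weights are inverse distances, and exact matches take all the weight. knn_regress is a hypothetical helper for illustration, not the endgame implementation.

```python
import numpy as np


def knn_regress(X_train, y_train, x, k=3, weights="distance"):
    """Predict y at query x from its k nearest training points."""
    d = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
    idx = np.argsort(d)[:k]                  # indices of the k nearest neighbors
    d_k, y_k = d[idx], y_train[idx]
    if weights == "uniform":
        return float(y_k.mean())             # plain average
    if np.any(d_k == 0):
        # Exact match(es): average only the matching points
        return float(y_k[d_k == 0].mean())
    w = 1.0 / d_k                            # inverse-distance weights
    return float(np.sum(w * y_k) / np.sum(w))


X_train = np.array([[0.0], [1.0], [3.0], [10.0]])
y_train = np.array([0.0, 10.0, 30.0, 100.0])
pred = knn_regress(X_train, y_train, np.array([0.5]), k=3)
```

With distance weighting, the two neighbors at distance 0.5 pull the estimate well below the uniform mean of the three neighbors.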

fit(X, y, **fit_params)[source]

Fit the KNN regressor.

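Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target values.

Return type: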

KNNRegressor

Returns:

self

predict(X)[source]

Predict target values.

Return type:

ndarray

kneighbors(X=None, n_neighbors=None, return_distance=True)[source]

Find the K-neighbors of a point.

Parameters:
  • X (array-like, optional) – Query points. If None, returns neighbors of training data.

  • n_neighbors (int, optional) – Number of neighbors. If None, uses n_neighbors from init.

  • return_distance (bool, default=True) – Whether to return distances.

Returns:

  • neigh_dist (ndarray (if return_distance=True)) – Distances to neighbors.

  • neigh_ind (ndarray) – Indices of neighbors.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (KNNRegressor)

Returns:

self (object) – The updated object.

Return type:

KNNRegressor

class endgame.models.LinearClassifier(penalty='l2', C=1.0, l1_ratio=0.5, solver='lbfgs', max_iter=1000, class_weight='balanced', scale_features=True, n_jobs=-1, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

Linear Classifier with competition-tuned defaults.

Wraps LogisticRegression with automatic feature scaling and sensible defaults for competitive ML. Supports L1, L2, and ElasticNet regularization.

Parameters:
  • penalty (str, default='l2') – Regularization: ‘l1’, ‘l2’, ‘elasticnet’, or ‘none’.

  • C (float, default=1.0) – Inverse of regularization strength. Smaller values = stronger regularization.

  • l1_ratio (float, default=0.5) – ElasticNet mixing parameter (only used when penalty=’elasticnet’).

  • solver (str, default='lbfgs') – Optimization algorithm. The default ‘lbfgs’ supports only L2 or no penalty; ‘saga’ supports L1 and ElasticNet.

  • max_iter (int, default=1000) – Maximum iterations for solver.

  • class_weight (str or dict, default='balanced') – Class weights: ‘balanced’ adjusts for class imbalance.

  • scale_features (bool, default=True) – Whether to standardize features before fitting.

  • n_jobs (int, default=-1) – Number of parallel jobs.

  • random_state (int, optional) – Random seed for reproducibility.
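Below is a rough scikit-learn equivalent of these defaults. It assumes the wrapper standardizes features and forwards its settings to LogisticRegression. make_linear_classifier and its automatic switch to ‘saga’ are assumptions for illustration. The last lines compute importances as absolute coefficients, matching the feature_importances_ description.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_linear_classifier(penalty="l2", C=1.0, l1_ratio=0.5,
                           max_iter=1000, random_state=None):
    """Hypothetical approximation of LinearClassifier's documented defaults."""
    kwargs = dict(C=C, max_iter=max_iter, class_weight="balanced",
                  random_state=random_state)
    if penalty in ("l1", "elasticnet"):
        kwargs["penalty"] = penalty
        kwargs["solver"] = "saga"      # lbfgs cannot fit L1 or ElasticNet
        if penalty == "elasticnet":
            kwargs["l1_ratio"] = l1_ratio
    elif penalty == "none":
        kwargs["penalty"] = None       # recent scikit-learn spells "no penalty" as None
    else:
        kwargs["penalty"] = "l2"
    return make_pipeline(StandardScaler(), LogisticRegression(**kwargs))


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
clf = make_linear_classifier(penalty="elasticnet", C=0.5, random_state=0).fit(X, y)
proba = clf.predict_proba(X)
importances = np.abs(clf[-1].coef_).ravel()  # |coefficients| as importances
```

Newer scikit-learn releases may emit deprecation or convergence warnings for some of these settings. The sketch still illustrates the idea.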

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

coef_

Feature coefficients.

Type:

ndarray

intercept_

Intercept term.

Type:

ndarray

Examples

>>> from endgame.models.baselines import LinearClassifier
>>> clf = LinearClassifier(penalty='l2', C=1.0)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

Linear classifiers are different from tree-based models because:

  1. Global decision boundary: the same coefficients apply to all regions.

  2. Monotonic feature relationships.

  3. Implicit feature selection with the L1 penalty.

  4. Well-calibrated probabilities (especially with Platt scaling).

The class_weight=’balanced’ default helps with imbalanced datasets.

fit(X, y, sample_weight=None, **fit_params)[source]

Fit the linear classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

  • sample_weight (array-like, optional) – Sample weights.

Return type:

LinearClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Return type:

ndarray

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

predict_log_proba(X)[source]

Predict log class probabilities.

Return type:

ndarray

decision_function(X)[source]

Compute decision function.

Return type:

ndarray

property coef_

Feature coefficients.

property intercept_

Intercept term.

property feature_importances_: ndarray

Feature importances (absolute value of coefficients).

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (LinearClassifier)

Returns:

self (object) – The updated object.

Return type:

LinearClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (LinearClassifier)

Returns:

self (object) – The updated object.

Return type:

LinearClassifier

class endgame.models.LinearRegressor(penalty='l2', alpha=1.0, l1_ratio=0.5, max_iter=1000, scale_features=True, random_state=None)[source]

Bases: RegressorMixin, BaseEstimator

Linear Regressor with competition-tuned defaults.

Wraps Ridge/Lasso/ElasticNet with automatic feature scaling and sensible defaults for competitive ML.

Parameters:
  • penalty (str, default='l2') – Regularization: ‘l1’ (Lasso), ‘l2’ (Ridge), ‘elasticnet’.

  • alpha (float, default=1.0) – Regularization strength. Larger values = stronger regularization.

  • l1_ratio (float, default=0.5) – ElasticNet mixing parameter (only used when penalty=’elasticnet’).

  • max_iter (int, default=1000) – Maximum iterations for solver (only for L1/ElasticNet).

  • scale_features (bool, default=True) – Whether to standardize features before fitting.

  • random_state (int, optional) – Random seed for reproducibility.
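As the class description suggests, the penalty option presumably selects among scikit-learn's Ridge, Lasso, and ElasticNet. Below is a sketch of that mapping with the hypothetical helper make_linear_regressor. Note that scikit-learn scales the objective differently for Ridge than for Lasso/ElasticNet. The latter divide the squared-error term by 2 * n_samples, so the same alpha is not equally strong across penalties.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_linear_regressor(penalty="l2", alpha=1.0, l1_ratio=0.5,
                          max_iter=1000, random_state=None):
    """Hypothetical approximation of LinearRegressor's penalty mapping."""
    if penalty == "l2":
        est = Ridge(alpha=alpha, random_state=random_state)
    elif penalty == "l1":
        est = Lasso(alpha=alpha, max_iter=max_iter, random_state=random_state)
    elif penalty == "elasticnet":
        est = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=max_iter,
                         random_state=random_state)
    else:
        raise ValueError(f"unknown penalty: {penalty!r}")
    # Standardize features first, as scale_features=True does
    return make_pipeline(StandardScaler(), est)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
ridge = make_linear_regressor("l2").fit(X, y)
```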

n_features_in_

Number of features.

Type:

int

coef_

Feature coefficients.

Type:

ndarray

intercept_

Intercept term.

Type:

float

Examples

>>> from endgame.models.baselines import LinearRegressor
>>> reg = LinearRegressor(penalty='l2', alpha=1.0)
>>> reg.fit(X_train, y_train)
>>> predictions = reg.predict(X_test)

Notes

Linear regression provides:

  1. Interpretable coefficients.

  2. Fast training and inference.

  3. An L1 penalty for feature selection.

  4. An L2 penalty for multicollinearity.

fit(X, y, sample_weight=None, **fit_params)[source]

Fit the linear regressor.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like, optional) – Sample weights.

Return type:

LinearRegressor

Returns:

self

predict(X)[source]

Predict target values.

Return type:

ndarray

property coef_

Feature coefficients.

property intercept_

Intercept term.

property feature_importances_: ndarray

Feature importances (absolute value of coefficients).

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (LinearRegressor)

Returns:

self (object) – The updated object.

Return type:

LinearRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (LinearRegressor)

Returns:

self (object) – The updated object.

Return type:

LinearRegressor

class endgame.models.PRIMClassifier(alpha=0.05, min_support=20, pasting=True, paste_alpha=0.01, n_boxes=1)[source]

Bases: ClassifierMixin, BaseEstimator

PRIM for classification via one-vs-rest subgroup discovery.

Trains a PRIM regressor per class (one-vs-rest) on the binary indicator for each class. At prediction time, the class whose box gives the highest density for a sample wins; samples not in any box are assigned the majority class.

Parameters:
  • alpha (float, default=0.05) – Peeling fraction.

  • min_support (int or float, default=20) – Minimum number of points in a box.

  • pasting (bool, default=True) – Whether to apply pasting after peeling.

  • paste_alpha (float, default=0.01) – Pasting fraction.

  • n_boxes (int, default=1) – Number of boxes to find per class.

Examples

>>> from endgame.models.subgroup import PRIMClassifier
>>> prim = PRIMClassifier(alpha=0.05)
>>> prim.fit(X, y)
>>> preds = prim.predict(X)
>>> print(prim.get_rules())

fit(X, y, feature_names=None)[source]

Fit one PRIM model per class (one-vs-rest).

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target labels.

  • feature_names (list of str, optional) – Names of features.

Return type:

PRIMClassifier

Returns:

self

property boxes_: list[list[Box]]

Get the discovered boxes for each class.

property feature_names_in_: ndarray | None

Get feature names.

property n_features_in_: int

Get number of features.

predict_proba(X)[source]

Estimate class probabilities based on box densities.

For each sample, the probability for class c is the density of the best box that contains it, or the base rate if no box contains it. Probabilities are row-normalised.
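The probability rule can be sketched with NumPy as follows. The input layout is an assumption made for illustration: precomputed box-membership masks, per-box densities, and class base rates. box_density_proba is a hypothetical helper, not the method's actual code.

```python
import numpy as np


def box_density_proba(in_box, densities, base_rates):
    """
    in_box:     bool array (n_classes, n_boxes, n_samples), True if inside the box
    densities:  float array (n_classes, n_boxes), class-c density of each box
    base_rates: float array (n_classes,), training frequency of each class
    """
    n_classes, _, n_samples = in_box.shape
    scores = np.empty((n_samples, n_classes))
    for c in range(n_classes):
        # Density of every box that contains the sample, -inf where it does not
        dens = np.where(in_box[c], densities[c][:, None], -np.inf)
        best = dens.max(axis=0)  # best containing box per sample
        # Fall back to the class base rate when no box contains the sample
        scores[:, c] = np.where(np.isfinite(best), best, base_rates[c])
    scores = np.clip(scores, 1e-12, None)  # assumption: avoid all-zero rows
    return scores / scores.sum(axis=1, keepdims=True)  # row-normalize


in_box = np.array([[[True, False, False]], [[False, True, False]]])
densities = np.array([[0.8], [0.9]])
base_rates = np.array([0.7, 0.3])
proba = box_density_proba(in_box, densities, base_rates)
```

Taking the argmax of each row then gives the predicted class. A sample covered by no box falls back to the base rates, which favors the majority class.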

Parameters:

X (array-like of shape (n_samples, n_features)) – Data points.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data points.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – Predicted class labels.

score(X, y)[source]

Classification accuracy.

Parameters:
  • X (array-like) – Features.

  • y (array-like) – Target labels.

Return type:

float

Returns:

score (float) – Accuracy.

get_rules()[source]

Get human-readable rules for all classes and boxes.

Return type:

list[list[list[Text]]]

Returns:

rules (list[list[list[str]]]) – rules[class_idx][box_idx] is a list of rule strings.

set_fit_request(*, feature_names='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • feature_names (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for feature_names parameter in fit.

  • self (PRIMClassifier)

Returns:

self (object) – The updated object.

Return type:

PRIMClassifier

class endgame.models.PRIMRegressor(alpha=0.05, threshold_type='quantile', threshold=0.9, min_support=20, pasting=True, paste_alpha=0.01, n_boxes=1)[source]

Bases: RegressorMixin, BaseEstimator

PRIM (Patient Rule Induction Method) for regression/continuous targets.

Finds rectangular regions where the target variable has unusually high mean values. Uses iterative peeling to shrink boxes while increasing target density.
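The peeling step can be sketched in NumPy as below. This minimal single-box version has no pasting or sequential covering and is written for illustration. The function name prim_peel, its return values, and the quantile-based peel are assumptions, not endgame's implementation.

```python
import numpy as np


def prim_peel(X, y, alpha=0.05, min_support=20):
    """Greedy top-down peeling of one box (no pasting)."""
    n, p = X.shape
    inside = np.ones(n, dtype=bool)
    limits = {j: (-np.inf, np.inf) for j in range(p)}
    trajectory = [(inside.mean(), y.mean())]  # (coverage, density) per step
    while True:
        best = None
        for j in range(p):
            xj = X[inside, j]
            lo_cut = np.quantile(xj, alpha)      # peel the lowest alpha fraction
            hi_cut = np.quantile(xj, 1 - alpha)  # or the highest alpha fraction
            for side, keep in (("low", X[:, j] >= lo_cut),
                               ("high", X[:, j] <= hi_cut)):
                cand = inside & keep
                k = cand.sum()
                if k < min_support or k == inside.sum():
                    continue  # box too small, or the peel removes nothing
                m = y[cand].mean()
                if best is None or m > best[0]:
                    cut = lo_cut if side == "low" else hi_cut
                    best = (m, cand, j, side, cut)
        if best is None:
            break  # no admissible peel left
        m, inside, j, side, cut = best
        lo, hi = limits[j]
        limits[j] = (cut, hi) if side == "low" else (lo, cut)
        trajectory.append((inside.mean(), m))
    return limits, inside, trajectory
```

Each step keeps the peel that most increases the mean target inside the box. Peeling stops once any further peel would drop the box below min_support. Full PRIM would then select a box from the trajectory rather than always taking the last one.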

Parameters:
  • alpha (float, default=0.05) – Peeling fraction - proportion of data removed in each peel. Smaller values = more “patient” peeling.

  • threshold_type (str, default='quantile') – How to define “interesting” regions: ‘quantile’ or ‘absolute’.

  • threshold (float, default=0.9) – Threshold for defining interesting regions. If ‘quantile’, fraction of top values to consider interesting.

  • min_support (int or float, default=20) – Minimum number of points in a box. If float, interpreted as fraction.

  • pasting (bool, default=True) – Whether to apply pasting (box expansion) after peeling.

  • paste_alpha (float, default=0.01) – Pasting fraction for box expansion.

  • n_boxes (int, default=1) – Number of boxes to find (sequential covering).

result_

Full PRIM analysis result.

Type:

PRIMResult

boxes_

The final boxes found.

Type:

List[Box]

feature_names_in_

Names of features.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

Examples

>>> from endgame.models.subgroup import PRIMRegressor
>>> prim = PRIMRegressor(alpha=0.05, min_support=30)
>>> prim.fit(X, y)
>>> print(prim.boxes_[0].to_rules())
>>> mask = prim.predict_membership(X)  # Boolean mask of points in any box

Notes

PRIM works best when:

  1. You’re looking for interpretable subgroups.

  2. The target has heterogeneous behavior across the feature space.

  3. You want rectangular (axis-aligned) regions.

fit(X, y, feature_names=None)[source]

Fit PRIM to find high-density regions.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Target values (higher = more interesting).

  • feature_names (list of str, optional) – Names of features for interpretable output.

Return type:

PRIMRegressor

Returns:

self

predict(X)[source]

Predict target values based on box membership.

Points inside a box get that box’s mean target density. Points outside all boxes get the global training mean.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data points.

Return type:

ndarray

Returns:

predictions (ndarray of shape (n_samples,)) – Predicted target values.

predict_membership(X)[source]

Predict whether points fall in the found box(es).

Parameters:

X (array-like of shape (n_samples, n_features)) – Data points.

Return type:

ndarray

Returns:

mask (ndarray of shape (n_samples,)) – Boolean mask, True if point is in any box.

score(X, y)[source]

Score the model: mean target value of points inside the boxes, minus the overall mean.

Parameters:
  • X (array-like) – Features.

  • y (array-like) – Target values.

Return type:

float

Returns:

score (float) – Mean target value in boxes minus overall mean.

get_rules()[source]

Get human-readable rules for all boxes.

Return type:

list[list[Text]]

Returns:

rules (list of list of str) – Rules for each box.

set_fit_request(*, feature_names='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • feature_names (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for feature_names parameter in fit.

  • self (PRIMRegressor)

Returns:

self (object) – The updated object.

Return type:

PRIMRegressor

class endgame.models.Box(limits=<factory>, coverage=1.0, density=0.0, support=0)[source]

Bases: object

A rectangular region (box) in feature space.

Parameters:
limits

Feature index -> (lower, upper) bound.

Type:

Dict[int, Tuple[float, float]]

coverage

Fraction of data points inside the box.

Type:

float

density

Mean target value inside the box.

Type:

float

support

Number of data points inside the box.

Type:

int

limits: dict[int, tuple[float, float]]
coverage: float = 1.0
density: float = 0.0
support: int = 0
contains(X)[source]

Check which points are inside the box.

Parameters:

X (ndarray of shape (n_samples, n_features)) – Data points to check.

Return type:

ndarray

Returns:

mask (ndarray of shape (n_samples,)) – Boolean mask, True if point is inside box.

to_rules(feature_names=None)[source]

Convert box to human-readable rules.

Parameters:

feature_names (list of str, optional) – Names of features.

Return type:

list[Text]

Returns:

rules (list of str) – List of rule strings.
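Box membership and rule rendering can be sketched with a small dataclass. SimpleBox is a hypothetical stand-in. It assumes closed intervals [lower, upper], infinite bounds for unconstrained sides, and a made-up rule string format. The real Box may differ on any of these.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class SimpleBox:
    limits: dict = field(default_factory=dict)  # feature index -> (lower, upper)
    coverage: float = 1.0
    density: float = 0.0
    support: int = 0

    def contains(self, X):
        """Boolean mask of rows of X that satisfy every bound (closed intervals)."""
        X = np.asarray(X, dtype=float)
        mask = np.ones(X.shape[0], dtype=bool)
        for j, (lo, hi) in self.limits.items():
            mask &= (X[:, j] >= lo) & (X[:, j] <= hi)
        return mask

    def to_rules(self, feature_names=None):
        """One string per finite bound, e.g. 'age >= 30' (hypothetical format)."""
        rules = []
        for j, (lo, hi) in sorted(self.limits.items()):
            name = feature_names[j] if feature_names is not None else f"x{j}"
            if np.isfinite(lo):
                rules.append(f"{name} >= {lo:g}")
            if np.isfinite(hi):
                rules.append(f"{name} <= {hi:g}")
        return rules


box = SimpleBox(limits={0: (0.5, np.inf), 1: (-np.inf, 2.0)})
mask = box.contains([[0.6, 1.0], [0.4, 1.0], [0.6, 3.0]])
```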

class endgame.models.PRIMResult(boxes=<factory>, peeling_trajectory=<factory>, selected_box=None, selected_idx=-1)[source]

Bases: object

Result of PRIM analysis.

Parameters:
boxes

Sequence of boxes from peeling trajectory.

Type:

List[Box]

peeling_trajectory

Statistics at each peeling step.

Type:

List[Dict]

selected_box

The selected box (based on some criterion).

Type:

Box

selected_idx

Index of selected box in trajectory.

Type:

int

boxes: list[Box]
peeling_trajectory: list[dict[str, float]]
selected_box: Box | None = None
selected_idx: int = -1
get_pareto_frontier()[source]

Get indices of boxes on the coverage-density Pareto frontier.

Return type:

list[int]
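One way to compute such a frontier is the quadratic-time check below, which treats both coverage and density as quantities to maximize. pareto_frontier is a hypothetical helper. How endgame breaks ties is not documented here.

```python
def pareto_frontier(coverages, densities):
    """Indices of boxes not dominated in (coverage, density), both maximized."""
    frontier = []
    for i, (ci, di) in enumerate(zip(coverages, densities)):
        # Box i is dominated if some box is at least as good on both axes
        # and strictly better on at least one.
        dominated = any(
            cj >= ci and dj >= di and (cj > ci or dj > di)
            for cj, dj in zip(coverages, densities)
        )
        if not dominated:
            frontier.append(i)
    return frontier


front = pareto_frontier([1.0, 0.8, 0.6, 0.5], [0.2, 0.3, 0.25, 0.5])
```

Here box 2 is dropped because box 1 has both higher coverage and higher density.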

class endgame.models.OrdinalClassifier(variant='auto', alpha=1.0, max_iter=1000, auto_scale=True, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

Unified Ordinal Regression Classifier with auto-variant selection.

Wraps ordinal regression methods from the mord library and selects a variant automatically based on data characteristics.

Ordinal regression is critical for ordered categorical targets where standard classification ignores the ordering (e.g., rating prediction, grade classification, severity levels).

Parameters:
  • variant (str, default='auto') – Ordinal regression variant:

      - ‘auto’: Automatically select based on the data.
      - ‘at’: All-Threshold (LogisticAT); the most common choice.
      - ‘it’: Immediate-Threshold (LogisticIT).
      - ‘se’: Squared-error variant (LogisticSE).
      - ‘lad’: Least Absolute Deviation.
      - ‘ridge’: Ordinal Ridge regression.

  • alpha (float, default=1.0) – Regularization strength (inverse of C for logistic models, regularization strength for Ridge/LAD).

  • max_iter (int, default=1000) – Maximum iterations for optimization.

  • auto_scale (bool, default=True) – Whether to standardize features before fitting.

  • random_state (int, optional) – Random seed (not used by all variants).

classes_

Ordered class labels.

Type:

ndarray

n_classes_

Number of classes.

Type:

int

n_features_in_

Number of features.

Type:

int

variant_

The actual variant used.

Type:

str

model_

Fitted ordinal regression model.

Type:

mord estimator

coef_

Feature coefficients.

Type:

ndarray

theta_

Class thresholds (boundaries).

Type:

ndarray

Examples

>>> from endgame.models.ordinal import OrdinalClassifier
>>> clf = OrdinalClassifier(variant='at', alpha=1.0)
>>> clf.fit(X_train, y_train)  # y_train has ordered labels
>>> y_pred = clf.predict(X_test)
>>> proba = clf.predict_proba(X_test)

Notes

Ordinal regression assumes:

  1. Target classes have a meaningful order.

  2. A latent continuous variable underlies the ordered categories.

  3. Thresholds partition this latent space into ordered categories.

The cumulative model is:

P(Y <= j) = g(theta_j - X @ beta)

where g is a link function (logistic, probit, etc.).
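NumPy can illustrate the role of the thresholds. Each sample's latent score X @ beta falls into one interval between consecutive thresholds, and that interval is its class. The coefficients and thresholds here are made-up values, not fitted ones.

```python
import numpy as np

theta = np.array([-1.0, 0.5, 2.0])  # hypothetical sorted thresholds -> 4 classes
beta = np.array([1.0, -0.5])        # hypothetical coefficients
X = np.array([[-2.0, 0.0], [0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])

latent = X @ beta  # continuous latent score per sample

# Class j covers theta_{j-1} < latent <= theta_j (with theta_{-1} = -inf and
# theta_K = +inf); searchsorted with side="left" returns exactly that j.
labels = np.searchsorted(theta, latent, side="left")
```

A larger latent score moves a sample into a higher class. This is consistent with P(Y <= j) = g(theta_j - X @ beta) decreasing as X @ beta grows.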

fit(X, y, sample_weight=None, **fit_params)[source]

Fit the ordinal regression model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Ordered target labels. Integer labels 0, 1, 2, … are used as-is; other labels are encoded to integers that preserve their order.

  • sample_weight (array-like, optional) – Ignored, because mord does not support sample weights.

Return type:

OrdinalClassifier

Returns:

self

predict(X)[source]

Predict ordinal class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

For ordinal regression, probabilities are derived from the cumulative model:

P(Y = j) = P(Y <= j) - P(Y <= j-1)
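Below is a sketch of the differencing step for a logistic link. It pads the cumulative probabilities with 0 and 1 at the ends so that every class, including the first and last, gets a probability. The coefficients and thresholds passed in are hypothetical; this is not the wrapper's code.

```python
import numpy as np


def ordinal_class_probs(X, beta, theta):
    """Class probabilities from a cumulative logit model with sorted theta."""
    eta = X @ beta                                              # latent scores
    cum = 1.0 / (1.0 + np.exp(-(theta[None, :] - eta[:, None])))  # P(Y <= j)
    n = X.shape[0]
    # P(Y <= -1) = 0 and P(Y <= K-1) = 1 close the sequence at both ends
    padded = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    return np.diff(padded, axis=1)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)


p = ordinal_class_probs(np.array([[0.0]]), np.array([1.0]), np.array([-1.0, 1.0]))
```

Because the thresholds are sorted, the cumulative probabilities increase with j, so every difference is non-negative and each row sums to 1.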

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

property coef_: ndarray

Feature coefficients.

property theta_: ndarray

Class thresholds (boundaries).

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (OrdinalClassifier)

Returns:

self (object) – The updated object.

Return type:

OrdinalClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (OrdinalClassifier)

Returns:

self (object) – The updated object.

Return type:

OrdinalClassifier

class endgame.models.OrdinalRidge(alpha=1.0, max_iter=1000, auto_scale=True, random_state=None)[source]

Bases: OrdinalClassifier

Ordinal Ridge Regression.

Ridge regression for ordinal targets, using L2 regularization. Well suited to smaller datasets and to problems with many ordinal classes.

Parameters:
  • alpha (float, default=1.0) – Regularization strength.

  • max_iter (int, default=1000) – Maximum iterations.

  • auto_scale (bool, default=True) – Whether to standardize features.

  • random_state (int | None)
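For illustration, one common way to build an ordinal ridge model is to regress on integer-coded labels and round the result back onto the label range. The sketch below assumes labels 0..K-1 and uses a closed-form ridge solve; ordinal_ridge_sketch is a hypothetical helper, and endgame's OrdinalRidge (which exposes max_iter, suggesting an iterative solver) may work differently.

```python
import numpy as np

def ordinal_ridge_sketch(X, y, alpha=1.0):
    """Hypothetical 'regress then round' ordinal model; returns a predict function.

    Assumes integer labels 0..K-1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so the intercept is not penalized.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Closed-form ridge solution: w = (Xc^T Xc + alpha * I)^-1 Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    k_max = int(y.max())

    def predict(X_new):
        raw = np.asarray(X_new, dtype=float) @ w + b
        # Round to the nearest class index and clip into the observed label range.
        return np.clip(np.rint(raw), 0, k_max).astype(int)

    return predict

predict = ordinal_ridge_sketch([[0], [1], [2], [3]], [0, 1, 2, 3], alpha=1.0)
print(predict([[1.2], [10.0], [-5.0]]))
```

Rounding and clipping keep every prediction a valid class index, even for inputs far outside the training range.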

Examples

>>> from endgame.models.ordinal import OrdinalRidge
>>> clf = OrdinalRidge(alpha=1.0)
>>> clf.fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)
set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (OrdinalRidge)

Returns:

self (object) – The updated object.

Return type:

OrdinalRidge

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (OrdinalRidge)

Returns:

self (object) – The updated object.

Return type:

OrdinalRidge

class endgame.models.LogisticAT(alpha=1.0, max_iter=1000, auto_scale=True, random_state=None)[source]

Bases: OrdinalClassifier

All-Threshold Ordinal Logistic Regression.

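A widely used ordinal regression model. Each class boundary has its own threshold parameter, and every threshold contributes to each sample's loss: the model is penalized for each threshold that lies on the wrong side of the sample's score, which makes the loss a surrogate for the absolute error between predicted and true class.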

Closely related to the Proportional Odds Model (Cumulative Logit Model).

Parameters:
  • alpha (float, default=1.0) – Regularization strength (inverse of C).

  • max_iter (int, default=1000) – Maximum iterations.

  • auto_scale (bool, default=True) – Whether to standardize features.

  • random_state (int | None)
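A sketch of the all-threshold logistic loss for a single sample, assuming class indices 0..K-1 and K-1 sorted thresholds. Each threshold adds a logistic penalty, so a score that lands several classes away from the label pays for every threshold it crosses. The function name is hypothetical; it illustrates the technique, not endgame's solver.

```python
import numpy as np

def all_threshold_logistic_loss(score, y, theta):
    """All-threshold logistic loss for one sample (illustrative sketch).

    score: linear score w @ x; y: class index in 0..K-1; theta: K-1 sorted thresholds.
    """
    theta = np.asarray(theta, dtype=float)
    j = np.arange(theta.size)
    # The score should lie below thresholds at or above the label (j >= y)
    # and above thresholds below the label (j < y).
    sign = np.where(j >= y, 1.0, -1.0)
    margins = sign * (theta - score)
    # Sum of log(1 + exp(-margin)), computed stably with logaddexp.
    return float(np.logaddexp(0.0, -margins).sum())

theta = [-1.0, 1.0, 2.0]
print(all_threshold_logistic_loss(5.0, 2, theta))  # label one class away
print(all_threshold_logistic_loss(5.0, 0, theta))  # label three classes away: larger
```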

Examples

>>> from endgame.models.ordinal import LogisticAT
>>> clf = LogisticAT(alpha=1.0)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)
set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (LogisticAT)

Returns:

self (object) – The updated object.

Return type:

LogisticAT

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (LogisticAT)

Returns:

self (object) – The updated object.

Return type:

LogisticAT

class endgame.models.LogisticIT(alpha=1.0, max_iter=1000, auto_scale=True, random_state=None)[source]

Bases: OrdinalClassifier

Immediate-Threshold Ordinal Logistic Regression.

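Only the two thresholds that bound the true class (the "immediate" thresholds) contribute to each sample's loss, whereas All-Threshold includes every threshold. The loss is therefore a surrogate for the 0-1 (exact-class) error rather than the absolute error between classes.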

Parameters:
  • alpha (float, default=1.0) – Regularization strength.

  • max_iter (int, default=1000) – Maximum iterations.

  • auto_scale (bool, default=True) – Whether to standardize features.

  • random_state (int | None)
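A sketch of the immediate-threshold logistic loss, assuming class indices 0..K-1 and K-1 sorted thresholds: only the lower and upper boundaries of the true class enter the loss. The helper is hypothetical and not endgame's implementation.

```python
import numpy as np

def immediate_threshold_logistic_loss(score, y, theta):
    """Immediate-threshold logistic loss for one sample (illustrative sketch)."""
    theta = np.asarray(theta, dtype=float)
    loss = 0.0
    if y > 0:
        # Lower boundary of class y is theta[y - 1]: the score should exceed it.
        loss += np.logaddexp(0.0, -(score - theta[y - 1]))
    if y < theta.size:
        # Upper boundary of class y is theta[y]: the score should stay below it.
        loss += np.logaddexp(0.0, -(theta[y] - score))
    return float(loss)

# A score deep inside class 3; label 0 is penalized only through theta[0].
print(immediate_threshold_logistic_loss(5.0, 0, [-1.0, 1.0, 2.0]))
```

Because thresholds other than the two immediate ones are ignored, a prediction three classes off is not necessarily penalized more than one a single class off, which matches the 0-1 error it approximates.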

Examples

>>> from endgame.models.ordinal import LogisticIT
>>> clf = LogisticIT(alpha=1.0)
>>> clf.fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)
set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (LogisticIT)

Returns:

self (object) – The updated object.

Return type:

LogisticIT

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (LogisticIT)

Returns:

self (object) – The updated object.

Return type:

LogisticIT

class endgame.models.LogisticSE(alpha=1.0, max_iter=1000, auto_scale=True, random_state=None)[source]

Bases: OrdinalClassifier

Squared-Error Ordinal Logistic Regression.

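All-Threshold variant that optimizes a surrogate of the squared error between predicted and true class indices. Because errors are squared, predictions far from the true class are penalized more heavily than near misses.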

Parameters:
  • alpha (float, default=1.0) – Regularization strength.

  • max_iter (int, default=1000) – Maximum iterations.

  • auto_scale (bool, default=True) – Whether to standardize features.

  • random_state (int | None)
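One common way to make a threshold loss target squared error is to weight each threshold j by how much crossing it changes the squared error, |(j + 1 - y)^2 - (j - y)^2| = |2(j - y) + 1|. The sketch assumes that construction, class indices 0..K-1, and sorted thresholds; whether endgame's LogisticSE uses exactly these weights is an assumption, and the helper name is hypothetical.

```python
import numpy as np

def squared_error_threshold_loss(score, y, theta):
    """Cost-weighted all-threshold logistic loss targeting squared error (sketch)."""
    theta = np.asarray(theta, dtype=float)
    j = np.arange(theta.size)
    sign = np.where(j >= y, 1.0, -1.0)
    margins = sign * (theta - score)
    # Crossing threshold j moves the prediction from class j to j + 1, changing the
    # squared error by |(j + 1 - y)^2 - (j - y)^2| = |2(j - y) + 1|.
    weights = np.abs(2 * (j - y) + 1)
    return float((weights * np.logaddexp(0.0, -margins)).sum())

print(squared_error_threshold_loss(0.0, 0, [-1.0, 1.0, 2.0]))
```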

Examples

>>> from endgame.models.ordinal import LogisticSE
>>> clf = LogisticSE(alpha=1.0)
>>> clf.fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)
set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (LogisticSE)

Returns:

self (object) – The updated object.

Return type:

LogisticSE

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (LogisticSE)

Returns:

self (object) – The updated object.

Return type:

LogisticSE

class endgame.models.LAD(alpha=1.0, max_iter=1000, auto_scale=True, random_state=None)[source]

Bases: OrdinalClassifier

Least Absolute Deviation Ordinal Regression.

Uses L1 loss (absolute errors) instead of L2. More robust to outliers in the target variable.

Parameters:
  • alpha (float, default=1.0) – Regularization strength (inverse of C parameter).

  • max_iter (int, default=1000) – Maximum iterations.

  • auto_scale (bool, default=True) – Whether to standardize features.

  • random_state (int | None)
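The robustness claim can be seen with the simplest possible model, a single constant prediction: the constant minimizing squared (L2) error is the mean, while the one minimizing absolute (L1) error is the median. This grid search only illustrates the two loss functions; it is not LAD's fitting procedure.

```python
import numpy as np

# Ordinal labels with one corrupted (outlying) target value.
y = np.array([1, 1, 2, 2, 2, 3, 40], dtype=float)

# Evaluate every constant prediction on a fine grid.
candidates = np.linspace(0, 45, 4501)
l2 = ((y[None, :] - candidates[:, None]) ** 2).sum(axis=1)
l1 = np.abs(y[None, :] - candidates[:, None]).sum(axis=1)

best_l2 = candidates[np.argmin(l2)]  # near the mean, pulled toward the outlier
best_l1 = candidates[np.argmin(l1)]  # the median, unaffected by the outlier
print(best_l2, best_l1)
```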

Examples

>>> from endgame.models.ordinal import LAD
>>> clf = LAD(alpha=1.0)
>>> clf.fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)
set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

  • self (LAD)

Returns:

self (object) – The updated object.

Return type:

LAD

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (LAD)

Returns:

self (object) – The updated object.

Return type:

LAD

class endgame.models.BARTClassifier(n_trees=50, n_samples=1000, n_tune=500, n_chains=2, alpha=0.95, beta=2.0, auto_scale=True, random_state=None)[source]

Bases: ClassifierMixin, BaseEstimator

Bayesian Additive Regression Trees Classifier.

BART for classification uses a probit or logit link function to model class probabilities. The latent function is modeled as a sum of many trees with Bayesian priors.

Parameters:
  • n_trees (int, default=50) – Number of trees in the ensemble.

  • n_samples (int, default=1000) – Number of posterior samples.

  • n_tune (int, default=500) – Number of tuning samples.

  • n_chains (int, default=2) – Number of MCMC chains.

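  • alpha (float, default=0.95) – Tree depth prior: base prior probability that a tree node splits. Higher values favor deeper trees.

  • beta (float, default=2.0) – Tree depth penalty: rate at which the split probability decays with node depth. Higher values penalize deeper trees more strongly.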

  • auto_scale (bool, default=True) – Whether to standardize features.

  • random_state (int, optional) – Random seed.

classes_

Unique class labels.

Type:

ndarray

n_features_in_

Number of features.

Type:

int

variable_importance_

Feature importance scores.

Type:

ndarray

Examples

>>> from endgame.models.probabilistic import BARTClassifier
>>> clf = BARTClassifier(n_trees=50, n_samples=500, random_state=42)
>>> clf.fit(X_train, y_train)
>>> proba = clf.predict_proba(X_test)

Notes

For binary classification, BART uses probit regression:

P(y=1|X) = Phi(sum of trees)

where Phi is the standard normal CDF.

For multiclass, a softmax or one-vs-rest approach is used.
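Given posterior draws of the latent sum-of-trees function, a natural binary probability estimate is the posterior mean of Phi(f). The sketch assumes the draws arrive as an (n_draws, n_samples) array; how endgame stores or averages its draws is not documented here, and the helper name is hypothetical.

```python
import math
import numpy as np

def probit_proba_from_draws(f_draws):
    """f_draws: array (n_draws, n_samples) of posterior sum-of-trees values."""
    f_draws = np.asarray(f_draws, dtype=float)
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1.0 + np.vectorize(math.erf)(f_draws / math.sqrt(2.0)))
    p1 = phi.mean(axis=0)  # posterior mean of P(y=1 | X) per sample
    # Columns ordered as (class 0, class 1), like predict_proba.
    return np.column_stack([1.0 - p1, p1])

draws = np.array([[0.0, 1.5], [0.2, 2.0], [-0.2, 1.0]])
print(probit_proba_from_draws(draws))
```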

fit(X, y, **fit_params)[source]

Fit the BART classifier.

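Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training samples.

  • y (array-like of shape (n_samples,)) – Target class labels.

  • **fit_params – Additional fit parameters.

Return type: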

BARTClassifier

Returns:

self

predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, n_classes)) – Class probabilities.

property feature_importances_: ndarray

Feature importance based on posterior split frequencies.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (BARTClassifier)

Returns:

self (object) – The updated object.

Return type:

BARTClassifier

class endgame.models.BARTRegressor(n_trees=50, n_samples=1000, n_tune=500, n_chains=2, alpha=0.95, beta=2.0, auto_scale=True, random_state=None)[source]

Bases: RegressorMixin, BaseEstimator

Bayesian Additive Regression Trees Regressor.

BART models the conditional mean function as a sum of many regression trees, using Bayesian priors to regularize complexity. Unlike greedy boosting (XGBoost, LightGBM), BART uses MCMC to explore the posterior distribution of tree structures.

Parameters:
  • n_trees (int, default=50) – Number of trees in the ensemble. Values of 50-200 typically work well; more trees give smoother predictions but slower fitting and inference.

  • n_samples (int, default=1000) – Number of posterior samples to draw via MCMC.

  • n_tune (int, default=500) – Number of tuning samples (burn-in) before posterior sampling.

  • n_chains (int, default=2) – Number of MCMC chains to run in parallel.

  • alpha (float, default=0.95) – Base prior probability that a tree node splits. A node at depth d splits with prior probability alpha * (1 + d)^(-beta), so higher values favor deeper trees.

  • beta (float, default=2.0) – Prior rate of decrease in split probability with depth. Higher values penalize deeper trees more strongly.

  • auto_scale (bool, default=True) – Whether to standardize features before fitting.

  • random_state (int, optional) – Random seed for reproducibility.
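The names and defaults of alpha and beta match the standard BART tree prior (Chipman, George and McCulloch), in which a node at depth d splits with probability alpha * (1 + d)^(-beta). Assuming endgame follows that prior, this sketch shows how quickly the split probability decays with the default values.

```python
def split_probability(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at `depth` splits (the root has depth 0)."""
    return alpha * (1.0 + depth) ** (-beta)

# With the defaults, splits become unlikely after the first couple of levels,
# so each tree stays small and acts as a weak learner.
for d in range(4):
    print(d, round(split_probability(d), 4))
```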

n_features_in_

Number of features seen during fit.

Type:

int

variable_importance_

Relative importance of each feature (based on split frequency).

Type:

ndarray of shape (n_features,)

Examples

>>> from endgame.models.probabilistic import BARTRegressor
>>> reg = BARTRegressor(n_trees=50, n_samples=500, random_state=42)
>>> reg.fit(X_train, y_train)
>>> y_pred = reg.predict(X_test)
>>> intervals = reg.predict_interval(X_test, alpha=0.1)  # 90% intervals

Notes

BART’s Bayesian approach provides:

  1. Uncertainty quantification: a full posterior distribution over predictions.

  2. Regularization via priors: the tree priors limit overfitting, reducing the need for cross-validated tuning.

  3. Variable importance: based on posterior split frequencies.

  4. A different inductive bias: complements greedy boosted trees.

This makes BART useful for ensemble diversity: because it explores tree space via MCMC rather than greedy sequential fitting, its errors tend to be less correlated with those of XGBoost/LightGBM.

fit(X, y, **fit_params)[source]

Fit the BART regressor using MCMC.

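Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training samples.

  • y (array-like of shape (n_samples,)) – Target values.

  • **fit_params – Additional fit parameters.

Return type: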

BARTRegressor

Returns:

self

predict(X)[source]

Predict mean target values.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,)) – Predicted mean values.

predict_interval(X, alpha=0.1)[source]

Predict credible intervals.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to predict.

  • alpha (float, default=0.1) – Significance level. Returns (1-alpha)*100% credible intervals.

Return type:

ndarray

Returns:

intervals (ndarray of shape (n_samples, 2)) – Lower and upper bounds of credible intervals.
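The output of predict_interval can be understood as per-sample posterior quantiles. The sketch assumes an (n_draws, n_samples) array of posterior predictions and equal-tailed intervals; endgame may compute its intervals differently (for example as highest-density intervals), and the helper name is hypothetical.

```python
import numpy as np

def credible_interval(draws, alpha=0.1):
    """Equal-tailed (1 - alpha) credible interval for each sample.

    draws: array (n_draws, n_samples) of posterior predictions.
    Returns an array of shape (n_samples, 2) holding lower and upper bounds.
    """
    draws = np.asarray(draws, dtype=float)
    # One quantile row per tail: shape (2, n_samples).
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return np.column_stack([lower, upper])

rng = np.random.default_rng(0)
draws = rng.normal(loc=[0.0, 5.0], scale=[1.0, 0.5], size=(2000, 2))
print(credible_interval(draws, alpha=0.1))
```

Taking draws.std(axis=0) over the same array gives the corresponding analogue of predict_std.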

predict_std(X)[source]

Predict posterior standard deviation.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Return type:

ndarray

Returns:

std (ndarray of shape (n_samples,)) – Posterior standard deviation for each prediction.

property feature_importances_: ndarray

Feature importance based on posterior split frequencies.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (BARTRegressor)

Returns:

self (object) – The updated object.

Return type:

BARTRegressor

class endgame.models.SymbolicRegressor(preset='default', operators='scientific', binary_operators=None, unary_operators=None, niterations=None, maxsize=None, maxdepth=None, populations=None, population_size=None, parsimony=None, model_selection='best', loss='L2DistLoss()', constraints=None, nested_constraints=None, denoise=False, select_k_features=None, turbo=False, parallelism='multithreading', procs=None, random_state=None, verbosity=0, temp_equation_file=True, output_directory=None)[source]

Bases: BaseEstimator, RegressorMixin

Symbolic Regression for discovering interpretable equations.

Uses multi-population genetic programming with Pareto-frontier tracking to find symbolic expressions balancing accuracy and complexity.

Parameters:
  • preset (str, default="default") – Preset configuration: “fast”, “default”, “competition”, “interpretable”.

  • operators (str or dict, default="scientific") – Operator set name or dict with “binary_operators”/“unary_operators”.

  • binary_operators (list of str, optional) – Explicit binary operators (overrides operators).

  • unary_operators (list of str, optional) – Explicit unary operators (overrides operators).

  • niterations (int, optional) – Number of GP iterations.

  • maxsize (int, optional) – Max tree complexity (nodes).

  • maxdepth (int, optional) – Max tree depth.

  • populations (int, optional) – Number of sub-populations.

  • population_size (int, optional) – Individuals per population.

  • parsimony (float, optional) – Complexity penalty added to loss.

  • model_selection (str, default="best") – “best” (lowest loss) or “score” (loss-complexity trade-off).

  • loss (str, default="L2DistLoss()") – Loss function name. Accepts Julia-style names for backward compatibility (e.g. "L2DistLoss()") or Python names ("mse", "mae", "huber").

  • constraints (dict, optional) – Reserved for API compatibility (not enforced in GP engine).

  • nested_constraints (dict, optional) – Reserved for API compatibility.

  • denoise (bool, default=False) – Reserved for API compatibility.

  • select_k_features (int, optional) – Reserved for API compatibility.

  • turbo (bool, default=False) – Reserved for API compatibility.

  • parallelism (str, default="multithreading") – Reserved for API compatibility (GP runs single-threaded).

  • procs (int, optional) – Reserved for API compatibility.

  • random_state (int, optional) – Random seed.

  • verbosity (int, default=0) – 0 = silent, 1 = progress, 2 = detailed.

  • temp_equation_file (bool, default=True) – Reserved for API compatibility.

  • output_directory (str, optional) – Reserved for API compatibility.
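model_selection="score" trades loss against complexity. The sketch below assumes a PySR-style score, the drop in log-loss per unit of added complexity between consecutive Pareto-frontier equations, and picks the highest-scoring equation; endgame's exact rule is an assumption, the helper name is hypothetical, and losses must be positive for the logarithm.

```python
import math

def select_by_score(equations):
    """equations: (complexity, loss) pairs on the Pareto frontier, sorted by complexity.

    Score of equation i: -(log(loss_i) - log(loss_{i-1})) / (c_i - c_{i-1}).
    Returns the index of the highest-scoring equation (index 0 if none improves).
    """
    best_i, best_score = 0, 0.0
    for i in range(1, len(equations)):
        (c0, l0), (c1, l1) = equations[i - 1], equations[i]
        score = -(math.log(l1) - math.log(l0)) / (c1 - c0)
        if score > best_score:
            best_i, best_score = i, score
    return best_i

frontier = [(1, 10.0), (3, 1.0), (5, 0.9), (9, 0.5)]
print(select_by_score(frontier))  # "score" favors the big early drop in loss
print(min(range(len(frontier)), key=lambda i: frontier[i][1]))  # "best": lowest loss
```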

equations_

All discovered equations with loss and complexity.

Type:

DataFrame

best_equation_

String of the best equation.

Type:

str

best_loss_

Loss of the best equation.

Type:

float

best_complexity_

Complexity of the best equation.

Type:

int

n_features_in_

Number of features seen during fit.

Type:

int

feature_names_in_

Feature names.

Type:

ndarray

fit(X, y, **fit_params)[source]

Fit symbolic regression model.

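Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training samples.

  • y (array-like of shape (n_samples,)) – Target values.

  • **fit_params – Additional fit parameters.

Return type: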

SymbolicRegressor

Returns:

self

predict(X, index=None)[source]

Predict using the discovered equation.

Parameters:
  • X (array-like of shape (n_samples, n_features))

  • index (int, optional) – Complexity level to use. If None, uses best equation.

Return type:

ndarray

Returns:

y_pred (ndarray of shape (n_samples,))

sympy(index=None)[source]

Return SymPy expression of the best (or indexed) equation.

Parameters:

index (int | None)

latex(index=None)[source]

Return LaTeX string of the equation.

Return type:

Text

Parameters:

index (int | None)

get_best_equation()[source]

Return the best equation as a string.

Return type:

Text

get_pareto_frontier()[source]

Return Pareto-optimal equations as a DataFrame.

Return type:

DataFrame
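The Pareto frontier keeps, at each complexity, the lowest-loss equation, and only if it improves on every simpler equation. A minimal sketch over (complexity, loss, expression) tuples; the helper name is hypothetical and the DataFrame packaging is omitted.

```python
def pareto_frontier(equations):
    """Return non-dominated (complexity, loss, expression) tuples, simplest first."""
    frontier = []
    best_loss = float("inf")
    # Sort by complexity, breaking ties by loss so the best equation of each size comes first.
    for complexity, loss, expr in sorted(equations, key=lambda e: (e[0], e[1])):
        if loss < best_loss:  # strictly better than every simpler equation
            frontier.append((complexity, loss, expr))
            best_loss = loss
    return frontier

eqs = [
    (1, 4.0, "x0"),
    (3, 1.0, "x0 + x1"),
    (3, 2.0, "2 * x0"),
    (5, 1.5, "sin(x0) + x1"),  # more complex but worse: dominated
    (7, 0.2, "x0 + x1 * x1"),
]
for row in pareto_frontier(eqs):
    print(row)
```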

get_equation_at_complexity(complexity)[source]

Return the equation with the given complexity, or None if there is no such equation.

Return type:

Text | None

Parameters:

complexity (int)

property feature_importances_: ndarray

Feature importances from equation structure (occurrence count).
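One reading of occurrence-count importance: count how often each feature name appears in the best equation's string, then normalize so the counts sum to 1. Both the whole-word regex matching and the normalization are assumptions of this hypothetical sketch.

```python
import re
import numpy as np

def occurrence_importances(equation, feature_names):
    """Hypothetical helper: normalized occurrence counts of each feature in `equation`."""
    # \b word boundaries keep "x1" from matching inside "x10".
    counts = np.array(
        [len(re.findall(rf"\b{re.escape(name)}\b", equation)) for name in feature_names],
        dtype=float,
    )
    total = counts.sum()
    return counts / total if total > 0 else counts

print(occurrence_importances("x0 * x0 + sin(x1) + 2.5", ["x0", "x1", "x2"]))
```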

summary()[source]

Return a text summary of the fitted model.

Return type:

Text

set_predict_request(*, index='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • index (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for index parameter in predict.

  • self (SymbolicRegressor)

Returns:

self (object) – The updated object.

Return type:

SymbolicRegressor

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (SymbolicRegressor)

Returns:

self (object) – The updated object.

Return type:

SymbolicRegressor

class endgame.models.SymbolicClassifier(preset='default', operators='scientific', binary_operators=None, unary_operators=None, niterations=None, maxsize=None, maxdepth=None, populations=None, population_size=None, parsimony=None, model_selection='best', constraints=None, nested_constraints=None, denoise=False, select_k_features=None, turbo=False, parallelism='multithreading', procs=None, random_state=None, verbosity=0, temp_equation_file=True, output_directory=None, threshold=0.5)[source]

Bases: BaseEstimator, ClassifierMixin

Symbolic classification via a logistic transformation of symbolic regression.

For binary classification, fits a symbolic regression model to the log-odds and applies a sigmoid transformation to obtain probabilities.
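The binary case can be illustrated with a short sketch that passes a log-odds score through a sigmoid and compares the result with a threshold. The expression in log_odds is invented for the example. In SymbolicClassifier that score would come from the fitted symbolic regressor. Whether the comparison with threshold uses >= or > is an assumption.

```python
import math


def log_odds(x):
    # Hypothetical stand-in for a fitted symbolic expression over x[0], x[1].
    return 2.0 * x[0] - x[1]


def sigmoid(z):
    # Numerically stable logistic function: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba_binary(X):
    # One row per sample: [P(class 0), P(class 1)].
    rows = []
    for x in X:
        p = sigmoid(log_odds(x))
        rows.append([1.0 - p, p])
    return rows


def predict_binary(X, threshold=0.5):
    # Assign the positive class when its probability reaches the threshold.
    return [1 if p1 >= threshold else 0 for _, p1 in predict_proba_binary(X)]
```

Raising threshold above 0.5 makes positive predictions rarer. For example, a sample with log-odds 2 (probability about 0.88) is labeled 1 at the default threshold and 0 at threshold=0.9.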

For multiclass problems, uses a one-vs-rest strategy with one symbolic regressor per class.
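A common way to turn one-vs-rest scores into a probability distribution is to apply the sigmoid to each class's log-odds and then renormalize so that each row sums to one. scikit-learn's OneVsRestClassifier uses this approach for multiclass predict_proba. Whether SymbolicClassifier combines its regressors in exactly this way is an assumption. The function names below are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ovr_predict_proba(scores):
    # scores: one log-odds value per class (one regressor per class) for one sample.
    raw = [sigmoid(s) for s in scores]
    total = sum(raw)
    # Renormalize: the per-class sigmoids need not sum to 1 on their own.
    return [r / total for r in raw]


def ovr_predict(scores, classes):
    # Pick the class with the highest normalized probability.
    probs = ovr_predict_proba(scores)
    best = max(range(len(classes)), key=lambda k: probs[k])
    return classes[best]
```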

Parameters:
  • All parameters from SymbolicRegressor are accepted.

  • threshold (float, default=0.5) – Probability threshold for predicting the positive class in binary classification.

  • preset (str)

  • operators (str | dict[str, list[str]])

  • binary_operators (list[str] | None)

  • unary_operators (list[str] | None)

  • niterations (int | None)

  • maxsize (int | None)

  • maxdepth (int | None)

  • populations (int | None)

  • population_size (int | None)

  • parsimony (float | None)

  • model_selection (str)

  • constraints (dict | None)

  • nested_constraints (dict | None)

  • denoise (bool)

  • select_k_features (int | None)

  • turbo (bool)

  • parallelism (str)

  • procs (int | None)

  • random_state (int | None)

  • verbosity (int)

  • temp_equation_file (bool)

  • output_directory (str | None)

model_

Underlying symbolic regressor(s).

Type:

SymbolicRegressor or list of SymbolicRegressor

classes_

Unique class labels.

Type:

ndarray

n_classes_

Number of classes.

Type:

int

fit(X, y, **fit_params)[source]

Fit symbolic classifier.

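Parameters:
  • X (array-like) – Training features.

  • y (array-like) – Target class labels.

  • **fit_params – Additional keyword arguments for fitting.

Return type: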

SymbolicClassifier

Returns:

self

predict_proba(X)[source]

Predict class probabilities.

Return type:

ndarray

predict(X)[source]

Predict class labels.

Return type:

ndarray

get_best_equation(class_idx=1)[source]

Return the best equation found for the class at index class_idx, as a string.

Return type:

str

Parameters:

class_idx (int)

sympy(class_idx=1)[source]

Return the best equation for the class at index class_idx as a SymPy expression.

Parameters:

class_idx (int)

property feature_importances_: ndarray

summary()[source]

Return a summary of the fitted model as a string.

Return type:

str

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (SymbolicClassifier)

Returns:

self (object) – The updated object.

Return type:

SymbolicClassifier

class endgame.models.NEATClassifier(population_size=150, n_generations=100, n_hidden=0, activation_default='sigmoid', random_state=None, verbose=0)[source]

Bases: BaseEstimator, ClassifierMixin

NEAT classifier using neat-python.

Evolves neural network topology and weights using the NEAT (NeuroEvolution of Augmenting Topologies) algorithm.
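NEAT differs from many other neuroevolution methods in that it groups genomes into species using a compatibility distance. The sketch below implements the distance from the original NEAT paper (Stanley and Miikkulainen, 2002): delta = c1*E/N + c2*D/N + c3*W. Here E and D count excess and disjoint genes, W is the mean weight difference of matching genes, and N is the size of the larger genome. neat-python's implementation differs in its details, so read this as an illustration of the idea, not as what NEATClassifier computes. The coefficient defaults are illustrative.

```python
def compatibility_distance(genome_a, genome_b, c1=1.0, c2=1.0, c3=0.4):
    """NEAT compatibility distance between two genomes.

    Each genome is a dict mapping a positive innovation number -> connection weight
    (a simplification: real genomes also carry node genes and enable flags).
    """
    if not genome_a and not genome_b:
        return 0.0
    innov_a, innov_b = set(genome_a), set(genome_b)
    matching = innov_a & innov_b
    non_matching = innov_a ^ innov_b
    # Genes beyond the other genome's highest innovation number are "excess";
    # the remaining non-matching genes are "disjoint".
    cutoff = min(max(innov_a, default=0), max(innov_b, default=0))
    excess = sum(1 for i in non_matching if i > cutoff)
    disjoint = len(non_matching) - excess
    n = max(len(genome_a), len(genome_b))
    # Mean absolute weight difference over matching genes.
    w_bar = (sum(abs(genome_a[i] - genome_b[i]) for i in matching) / len(matching)
             if matching else 0.0)
    return c1 * excess / n + c2 * disjoint / n + c3 * w_bar
```

A genome joins a species when its distance to that species' representative falls below a compatibility threshold. This grouping protects new structural innovations from being eliminated before their weights have been tuned.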

Parameters:
  • population_size (int, default=150) – Number of individuals per generation.

  • n_generations (int, default=100) – Number of evolutionary generations.

  • n_hidden (int, default=0) – Initial number of hidden nodes (0 = minimal topology).

  • activation_default (str, default='sigmoid') – Default activation function for new nodes.

  • random_state (int or None, default=None) – Random seed for reproducibility.

  • verbose (int, default=0) – Verbosity level (0 = silent).

fit(X, y)[source]

Fit the NEAT classifier by evolving neural network topology.

predict_proba(X)[source]

Predict class probabilities using the best evolved network.

predict(X)[source]

Predict class labels.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NEATClassifier)

Returns:

self (object) – The updated object.

Return type:

NEATClassifier

class endgame.models.NEATRegressor(population_size=150, n_generations=100, n_hidden=0, activation_default='tanh', random_state=None, verbose=0)[source]

Bases: BaseEstimator, RegressorMixin

NEAT regressor using neat-python.

Evolves neural network topology and weights using the NEAT algorithm, optimizing for mean squared error. Targets are normalized internally so that the network outputs, which lie roughly in [-1, 1], can match the target scale.
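The docstring does not specify which normalization is used. The sketch below assumes three steps: min-max scaling of the targets into [-1, 1], which matches the output range of a tanh unit; scoring a network by negative mean squared error on the scaled targets; and mapping predictions back to the original scale. TargetScaler and fitness are hypothetical names.

```python
class TargetScaler:
    """Hypothetical min-max scaler mapping targets into [-1, 1] and back."""

    def fit(self, y):
        self.lo = min(y)
        # Guard against constant targets, which would otherwise divide by zero.
        self.span = (max(y) - self.lo) or 1.0
        return self

    def transform(self, y):
        return [2.0 * (v - self.lo) / self.span - 1.0 for v in y]

    def inverse_transform(self, z):
        return [(v + 1.0) / 2.0 * self.span + self.lo for v in z]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def fitness(network_outputs, y_scaled):
    # neat-python treats higher fitness as better, so negate the error.
    return -mse(network_outputs, y_scaled)
```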

Parameters:
  • population_size (int, default=150) – Number of individuals per generation.

  • n_generations (int, default=100) – Number of evolutionary generations.

  • n_hidden (int, default=0) – Initial number of hidden nodes (0 = minimal topology).

  • activation_default (str, default='tanh') – Default activation function for new nodes.

  • random_state (int or None, default=None) – Random seed for reproducibility.

  • verbose (int, default=0) – Verbosity level (0 = silent).

fit(X, y)[source]

Fit the NEAT regressor by evolving neural network topology.

predict(X)[source]

Predict continuous values using the best evolved network.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (NEATRegressor)

Returns:

self (object) – The updated object.

Return type:

NEATRegressor

class endgame.models.TensorNEATClassifier(population_size=1000, n_generations=100, species_size=10, random_state=None, verbose=0)[source]

Bases: BaseEstimator, ClassifierMixin

TensorNEAT classifier — GPU-accelerated neuroevolution via JAX.

Parameters:
  • population_size (int, default=1000) – Number of individuals per generation.

  • n_generations (int, default=100) – Number of evolutionary generations.

  • species_size (int, default=10) – Target number of species for speciation.

  • random_state (int or None, default=None) – Random seed for reproducibility.

  • verbose (int, default=0) – Verbosity level (0 = silent).
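The docstring describes species_size only as a target number of species. A common way for NEAT implementations to steer toward such a target is to adapt the compatibility threshold each generation. With too few species the threshold is lowered, and with too many it is raised. The sketch below shows this idea on toy one-dimensional genomes. Whether TensorNEAT uses this mechanism is an assumption, and speciate and adjust_threshold are hypothetical names.

```python
def speciate(population, threshold, distance):
    """Assign each genome to the first species whose representative is close enough."""
    representatives, species = [], []
    for genome in population:
        for k, rep in enumerate(representatives):
            if distance(genome, rep) < threshold:
                species[k].append(genome)
                break
        else:
            # No compatible species: this genome founds a new one.
            representatives.append(genome)
            species.append([genome])
    return species


def adjust_threshold(threshold, n_species, target, step=0.5, floor=0.1):
    """Nudge the compatibility threshold toward a target species count."""
    if n_species < target:
        threshold -= step  # stricter compatibility -> more species
    elif n_species > target:
        threshold += step  # looser compatibility -> fewer species
    return max(threshold, floor)
```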

fit(X, y)[source]

Fit the TensorNEAT classifier.

predict_proba(X)[source]

Predict class probabilities using the best evolved genome.

predict(X)[source]

Predict class labels.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (TensorNEATClassifier)

Returns:

self (object) – The updated object.

Return type:

TensorNEATClassifier

class endgame.models.TensorNEATRegressor(population_size=1000, n_generations=100, species_size=10, random_state=None, verbose=0)[source]

Bases: BaseEstimator, RegressorMixin

TensorNEAT regressor — GPU-accelerated neuroevolution via JAX.

Parameters:
  • population_size (int, default=1000) – Number of individuals per generation.

  • n_generations (int, default=100) – Number of evolutionary generations.

  • species_size (int, default=10) – Target number of species for speciation.

  • random_state (int or None, default=None) – Random seed for reproducibility.

  • verbose (int, default=0) – Verbosity level (0 = silent).

fit(X, y)[source]

Fit the TensorNEAT regressor.

predict(X)[source]

Predict continuous values using the best evolved genome.

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (TensorNEATRegressor)

Returns:

self (object) – The updated object.

Return type:

TensorNEATRegressor