Calibration¶

class endgame.calibration.ConformalClassifier(estimator, method='lac', alpha=0.1, k_reg=1, lambda_reg=0.01, random_state=None)[source]¶

Bases: BaseEstimator, ClassifierMixin

Conformal prediction wrapper for classification.

Provides prediction sets with a marginal coverage guarantee: under exchangeability of the calibration and test data, P(y ∈ C(X)) >= 1 - alpha for a new test point.

Parameters:

estimator (sklearn-compatible classifier) – Base classifier (must have predict_proba).
method (str, default='lac') – Conformal method:
- ‘lac’: Least Ambiguous set-valued Classifier (adaptive)
- ‘aps’: Adaptive Prediction Sets (randomized, exact coverage)
- ‘raps’: Regularized APS (controls set size with penalty)
- ‘naive’: Simple threshold on class probabilities
alpha (float, default=0.1) – Miscoverage rate (1 - alpha = coverage probability).
k_reg (int, default=1) – RAPS regularization: penalize sets larger than k_reg.
lambda_reg (float, default=0.01) – RAPS regularization strength.
random_state (int, optional) – Random seed for reproducibility.
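The calibration step behind a 'lac'-style method can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes the common LAC conformity score (one minus the probability assigned to the true class) and the usual finite-sample rank ceil((n + 1)(1 - alpha)). The helper names conformal_quantile and lac_prediction_sets are hypothetical.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # 1-indexed rank among the sorted scores.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: every label is kept
    return sorted(scores)[rank - 1]

def lac_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    # Assumed LAC score: 1 - probability of the true class.
    scores = [1.0 - p[y] for p, y in zip(cal_probs, cal_labels)]
    q = conformal_quantile(scores, alpha)
    # Keep every class whose own score does not exceed the threshold.
    return [{k for k, pk in enumerate(p) if 1.0 - pk <= q} for p in test_probs]

# Toy calibration probabilities (rows sum to 1) and true labels.
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8],
             [0.7, 0.3], [0.1, 0.9], [0.55, 0.45], [0.95, 0.05]]
cal_labels = [0, 0, 1, 0, 1, 0, 1, 1, 0]
test_probs = [[0.97, 0.03], [0.5, 0.5]]

sets = lac_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.25)
print(sets)
```

With this score a prediction set can be empty when no class is confident enough, which is one reason the adaptive variants ('aps', 'raps') exist.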

estimator_¶

Fitted base classifier.

Type:: estimator

classes_¶

Class labels.

Type:: ndarray

n_classes_¶

Number of classes.

Type:: int

conformity_scores_¶

Calibration conformity scores.

Type:: ndarray

quantile_¶

Calibrated quantile threshold.

Type:: float

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import train_test_split
>>> from endgame.calibration import ConformalClassifier
>>>
>>> # Split data: train, calibration, test
>>> X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4)
>>> X_cal, X_test, y_cal, y_test = train_test_split(X_temp, y_temp, test_size=0.5)
>>>
>>> cp = ConformalClassifier(LogisticRegression(), method='aps', alpha=0.1)
>>> cp.fit(X_train, y_train, X_cal, y_cal)
>>> prediction_sets = cp.predict(X_test)  # Returns sets with ~90% coverage
>>> print(f"Coverage: {cp.coverage_score(X_test, y_test):.3f}")

fit(X_train, y_train, X_cal=None, y_cal=None, cal_size=0.2)[source]¶

Fit base model and calibrate conformal scores.

Parameters:

X_train (array-like of shape (n_train_samples, n_features)) – Training features.
y_train (array-like of shape (n_train_samples,)) – Training labels.
X_cal (array-like of shape (n_cal_samples, n_features), optional) – Calibration features. If None, splits from training data.
y_cal (array-like of shape (n_cal_samples,), optional) – Calibration labels. If None, splits from training data.
cal_size (float, default=0.2) – Fraction of training data to use for calibration if X_cal not provided.

Return type:

ConformalClassifier

Returns:

self – Fitted conformal classifier.

predict(X)[source]¶

Return prediction sets for each sample.

Parameters:: X (array-like of shape (n_samples, n_features)) – Test samples.
Return type:: list[set[int]]
Returns:: List[Set[int]] – Prediction set for each sample (set of class indices).

predict_proba(X)[source]¶

Return base classifier probabilities.

Parameters:: X (array-like of shape (n_samples, n_features)) – Test samples.
Return type:: ndarray
Returns:: ndarray of shape (n_samples, n_classes) – Class probabilities from base classifier.

predict_point(X)[source]¶

Return point predictions (most likely class).

Parameters:: X (array-like of shape (n_samples, n_features)) – Test samples.
Return type:: ndarray
Returns:: ndarray of shape (n_samples,) – Predicted class labels.

coverage_score(X, y)[source]¶

Compute empirical coverage on test data.

Parameters:

X (array-like) – Test features.
y (array-like) – True labels.

Return type:

float

Returns:

float – Fraction of samples where true label is in prediction set.

average_set_size(X)[source]¶

Average size of prediction sets (efficiency metric).

Smaller sets mean more informative predictions, provided coverage is maintained.

Parameters:: X (array-like) – Test features.
Return type:: float
Returns:: float – Average prediction set size.
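The two evaluation metrics above are simple to state in code. The sketch below works directly on a list of prediction sets, such as the one predict is documented to return. The function names are illustrative and are not claimed to match the library's internals.

```python
def coverage(prediction_sets, y_true):
    """Fraction of samples whose true label falls inside its prediction set."""
    hits = sum(1 for s, y in zip(prediction_sets, y_true) if y in s)
    return hits / len(y_true)

def average_set_size(prediction_sets):
    """Mean number of labels per prediction set (smaller is more informative)."""
    return sum(len(s) for s in prediction_sets) / len(prediction_sets)

# Hypothetical prediction sets for four samples over classes {0, 1, 2}.
prediction_sets = [{0}, {0, 1}, {2}, {1, 2}]
y_true = [0, 1, 1, 2]

print(coverage(prediction_sets, y_true))
print(average_set_size(prediction_sets))
```

The two numbers are best read together: coverage near 1 - alpha with a small average set size indicates an efficient conformal predictor.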

set_fit_request(*, X_cal='$UNCHANGED$', X_train='$UNCHANGED$', cal_size='$UNCHANGED$', y_cal='$UNCHANGED$', y_train='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_cal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_cal parameter in fit.
X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
cal_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for cal_size parameter in fit.
y_cal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_cal parameter in fit.
y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.
self (ConformalClassifier)

Returns:

self (object) – The updated object.

Return type:

ConformalClassifier

set_score_request(*, sample_weight='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (ConformalClassifier)

Returns:

self (object) – The updated object.

Return type:

ConformalClassifier

class endgame.calibration.ConformalRegressor(estimator, method='absolute', alpha=0.1, quantile_estimator=None, symmetry=True, random_state=None)[source]¶

Bases: BaseEstimator, RegressorMixin

Conformal prediction wrapper for regression.

Provides prediction intervals with a marginal coverage guarantee: under exchangeability of the calibration and test data, P(y ∈ [lower, upper]) >= 1 - alpha.

Parameters:

estimator (sklearn-compatible regressor) – Base regressor.
method (str, default='absolute') – Conformal method:
- ‘absolute’: Absolute residual method (symmetric intervals)
- ‘normalized’: Normalized residuals using variance estimate
- ‘cqr’: Conformalized Quantile Regression (asymmetric intervals)
- ‘cqr_asymmetric’: CQR with separate lower/upper scores
alpha (float, default=0.1) – Miscoverage rate (1 - alpha = coverage probability).
quantile_estimator (estimator, optional) – For CQR methods, estimator for quantiles. If None, uses GradientBoostingRegressor with quantile loss.
symmetry (bool, default=True) – For CQR, whether to use symmetric or asymmetric intervals.
random_state (int, optional) – Random seed.
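For intuition, the 'absolute' method can be sketched as standard split conformal regression: take absolute residuals on the calibration set, pick a finite-sample-corrected quantile, and add it symmetrically around each point prediction. This is an assumption about what the method name refers to, and absolute_residual_interval is a hypothetical helper, not part of the library.

```python
import math

def absolute_residual_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Symmetric split-conformal intervals from absolute calibration residuals."""
    scores = sorted(abs(y - p) for p, y in zip(cal_pred, cal_true))
    n = len(scores)
    # 1-indexed rank of the corrected (1 - alpha) quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    q = scores[rank - 1] if rank <= n else math.inf
    # Same half-width q for every test point (constant-width intervals).
    return [(p - q, p + q) for p in test_pred]

# Toy calibration predictions and targets.
cal_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_true = [1.5, 1.0, 3.25, 4.0, 7.0, 5.5, 7.75, 8.0, 8.5]

intervals = absolute_residual_interval(cal_pred, cal_true, [10.0, 20.0], alpha=0.25)
print(intervals)
```

Every interval has the same width here, which is why the normalized and CQR variants are offered for heteroscedastic data.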

estimator_¶

Fitted base regressor.

Type:: estimator

lower_estimator_¶

For CQR, fitted lower quantile estimator.

Type:: estimator

upper_estimator_¶

For CQR, fitted upper quantile estimator.

Type:: estimator

conformity_scores_¶

Calibration conformity scores.

Type:: ndarray

quantile_¶

Calibrated quantile threshold.

Type:: float

Examples

>>> from sklearn.ensemble import RandomForestRegressor
>>> from endgame.calibration import ConformalRegressor
>>>
>>> # Assumes X_train, y_train, X_cal, y_cal, X_test, y_test were split beforehand
>>> cr = ConformalRegressor(RandomForestRegressor(), method='cqr', alpha=0.1)
>>> cr.fit(X_train, y_train, X_cal, y_cal)
>>> lower, upper = cr.predict_interval(X_test)
>>> print(f"Coverage: {cr.coverage_score(X_test, y_test):.3f}")
>>> print(f"Avg width: {cr.average_interval_width(X_test):.3f}")

fit(X_train, y_train, X_cal=None, y_cal=None, cal_size=0.2)[source]¶

Fit base model and calibrate conformal scores.

Parameters:

X_train (array-like) – Training features.
y_train (array-like) – Training targets.
X_cal (array-like, optional) – Calibration features.
y_cal (array-like, optional) – Calibration targets.
cal_size (float, default=0.2) – Fraction for calibration if not provided.

Return type:

ConformalRegressor

Returns:

self – Fitted conformal regressor.

predict(X)[source]¶

Return point predictions.

Parameters:: X (array-like) – Test samples.
Return type:: ndarray
Returns:: ndarray – Point predictions.

predict_interval(X)[source]¶

Return prediction intervals.

Parameters:

X (array-like) – Test samples.

Return type:

tuple[ndarray, ndarray]

Returns:

lower (ndarray) – Lower bounds of prediction intervals.
upper (ndarray) – Upper bounds of prediction intervals.

coverage_score(X, y)[source]¶

Compute empirical coverage.

Parameters:

X (array-like) – Test features.
y (array-like) – True targets.

Return type:

float

Returns:

float – Fraction of samples where true value is within interval.

average_interval_width(X)[source]¶

Average width of prediction intervals.

Parameters:: X (array-like) – Test features.
Return type:: float
Returns:: float – Average interval width.

set_fit_request(*, X_cal='$UNCHANGED$', X_train='$UNCHANGED$', cal_size='$UNCHANGED$', y_cal='$UNCHANGED$', y_train='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_cal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_cal parameter in fit.
X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
cal_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for cal_size parameter in fit.
y_cal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_cal parameter in fit.
y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.
self (ConformalRegressor)

Returns:

self (object) – The updated object.

Return type:

ConformalRegressor

set_score_request(*, sample_weight='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (ConformalRegressor)

Returns:

self (object) – The updated object.

Return type:

ConformalRegressor

class endgame.calibration.ConformizedQuantileRegressor(quantile_estimator=None, alpha=0.1, cv=None, symmetric=True, random_state=None)[source]¶

Bases: BaseEstimator, RegressorMixin

Conformalized Quantile Regression for adaptive prediction intervals.

CQR combines quantile regression with conformal calibration to produce prediction intervals that:

1. Adapt to heteroscedasticity (wider where uncertainty is higher)
2. Have guaranteed coverage under exchangeability
3. Are typically narrower than standard conformal methods

The algorithm:

1. Train quantile regressors for the lower (α/2) and upper (1-α/2) quantiles.
2. On calibration data, compute conformity scores: E_i = max(q_lower(x_i) - y_i, y_i - q_upper(x_i))
3. Compute Q, the (1-α)(1 + 1/n) quantile of the scores.
4. At prediction time, return [q_lower(x) - Q, q_upper(x) + Q].
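The calibration and prediction steps of the algorithm can be sketched in a few lines of plain Python. The sketch assumes the quantile predictions are already available as lists and that n is the number of calibration samples. The names cqr_threshold and cqr_interval are hypothetical.

```python
import math

def cqr_threshold(q_lower, q_upper, y, alpha):
    """Q: the (1 - alpha)(1 + 1/n) empirical quantile of the CQR scores."""
    # E_i = max(q_lower(x_i) - y_i, y_i - q_upper(x_i))
    scores = sorted(max(lo - yi, yi - hi)
                    for lo, hi, yi in zip(q_lower, q_upper, y))
    n = len(scores)
    # (1 - alpha)(1 + 1/n) * n == (1 - alpha)(n + 1), rounded up to a rank.
    rank = math.ceil((1 - alpha) * (n + 1))
    return scores[rank - 1] if rank <= n else math.inf

def cqr_interval(q_lower, q_upper, Q):
    """Widen (or shrink, if Q < 0) each raw quantile interval by Q."""
    return [(lo - Q, hi + Q) for lo, hi in zip(q_lower, q_upper)]

# Toy calibration data: raw quantile band [0, 2] for every sample.
cal_lower = [0.0] * 7
cal_upper = [2.0] * 7
cal_y = [1.0, 2.5, -0.5, 3.0, 0.5, 2.25, 1.5]

Q = cqr_threshold(cal_lower, cal_upper, cal_y, alpha=0.25)
intervals = cqr_interval([1.0], [4.0], Q)
print(Q, intervals)
```

Scores are negative for points that fall well inside the raw band, so Q can be negative when the quantile regressors are too conservative, and the calibrated interval is then narrower than the raw one.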

This implementation integrates with QuantileRegressorForest but works with any regressor that can predict quantiles.

Parameters:

quantile_estimator (estimator, optional) – A regressor capable of predicting quantiles. Options:
- QuantileRegressorForest (recommended): Native quantile support
- GradientBoostingRegressor: With loss='quantile'
- Any estimator with a predict_quantiles(X, quantiles) method
If None, uses QuantileRegressorForest with default settings.
alpha (float, default=0.1) – Miscoverage rate. Target coverage is 1 - alpha. E.g., alpha=0.1 targets 90% coverage.
cv (int or None, default=None) – If int, use cross-conformal with cv folds for more efficient data usage. Each fold serves as calibration for models trained on other folds. If None, requires separate calibration set.
symmetric (bool, default=True) – Which conformity scores to use.
If True, use symmetric conformity scores: E = max(q_lower - y, y - q_upper).
If False, use asymmetric scores with separate lower and upper adjustments: E_lower = q_lower - y and E_upper = y - q_upper.
Asymmetric scores can give tighter intervals but require more calibration data.
random_state (int, RandomState, or None, default=None) – Random seed for reproducibility.
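The asymmetric variant can be sketched as two separate calibrations, one per tail. The sketch assumes each tail is calibrated at level 1 - alpha/2, following the asymmetric construction in Romano et al. (2019). Whether the library splits alpha this way is not stated here, and the helper names are hypothetical.

```python
import math

def corrected_quantile(scores, level):
    """Finite-sample-corrected empirical quantile at the given level."""
    s = sorted(scores)
    n = len(s)
    rank = math.ceil((n + 1) * level)
    return s[rank - 1] if rank <= n else math.inf

def asymmetric_adjustments(q_lower, q_upper, y, alpha):
    # E_lower = q_lower - y, E_upper = y - q_upper
    e_lower = [lo - yi for lo, yi in zip(q_lower, y)]
    e_upper = [yi - hi for hi, yi in zip(q_upper, y)]
    # Assumption: each tail receives half of the miscoverage budget.
    level = 1 - alpha / 2
    return corrected_quantile(e_lower, level), corrected_quantile(e_upper, level)

cal_lower = [0.0] * 7
cal_upper = [2.0] * 7
cal_y = [1.0, 2.5, -0.5, 3.0, 0.5, 2.25, 1.5]

adj_lower, adj_upper = asymmetric_adjustments(cal_lower, cal_upper, cal_y, alpha=0.5)
# Calibrated interval for a raw band [lo, hi] is [lo - adj_lower, hi + adj_upper].
print(adj_lower, adj_upper)
```

Because each side is estimated from a more extreme quantile of its own scores, this variant needs more calibration data for the same stability, which matches the caveat in the parameter description.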

quantile_estimator_¶

Fitted quantile estimator (single model predicting both quantiles).

Type:: estimator

conformity_scores_¶

Calibration conformity scores.

Type:: ndarray

quantile_¶

Calibrated quantile threshold for symmetric CQR.

Type:: float

lower_quantile_¶

Lower threshold for asymmetric CQR.

Type:: float

upper_quantile_¶

Upper threshold for asymmetric CQR.

Type:: float

n_features_in_¶

Number of features seen during fit.

Type:: int

Examples

>>> import numpy as np
>>> from endgame.models.trees import QuantileRegressorForest
>>> from endgame.calibration import ConformizedQuantileRegressor
>>> from sklearn.model_selection import train_test_split
>>>
>>> # Split data into train, calibration, and test sets
>>> X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2)
>>> X_train, X_cal, y_train, y_cal = train_test_split(X_rest, y_rest, test_size=0.25)
>>>
>>> # Create CQR with QRF (default)
>>> cqr = ConformizedQuantileRegressor(alpha=0.1)
>>> cqr.fit(X_train, y_train, X_cal, y_cal)
>>>
>>> # Get prediction intervals
>>> lower, upper = cqr.predict_interval(X_test)
>>> print(f"Coverage: {cqr.coverage_score(X_test, y_test):.3f}")
>>> print(f"Avg width: {np.mean(upper - lower):.3f}")
>>>
>>> # Using cross-conformal (no separate calibration set needed)
>>> cqr_cv = ConformizedQuantileRegressor(alpha=0.1, cv=5)
>>> cqr_cv.fit(X_train, y_train)
>>> lower, upper = cqr_cv.predict_interval(X_test)

Notes

When to use CQR vs standard conformal regression:

CQR produces adaptive intervals that are wider in high-uncertainty regions and narrower in low-uncertainty regions
Standard conformal produces constant-width intervals
CQR is preferred for heteroscedastic data (varying noise)
Standard conformal is simpler and may suffice for homoscedastic data

Calibration set size:

For reliable coverage, use a few hundred calibration samples (roughly 100-500 or more). When cv is an integer, each fold serves in turn as the calibration set, improving data efficiency.

Integration with QuantileRegressorForest:

QRF naturally estimates quantiles by storing leaf distributions. This makes it ideal for CQR as it provides well-calibrated quantile estimates out of the box without separate quantile loss training.

References

Romano, Y., Patterson, E., & Candès, E. (2019). “Conformalized Quantile Regression.” NeurIPS.

Sesia, M., & Candès, E. (2020). “A comparison of some conformal quantile regression methods.” Stat, 9(1), e261.

fit(X_train, y_train, X_cal=None, y_cal=None, cal_size=0.2)[source]¶

Fit the CQR model.

Parameters:

X_train (array-like of shape (n_train_samples, n_features)) – Training features.
y_train (array-like of shape (n_train_samples,)) – Training targets.
X_cal (array-like of shape (n_cal_samples, n_features), optional) – Calibration features. Used when cv is None. If None and cv is None, a fraction cal_size of the training data is split off for calibration.
y_cal (array-like of shape (n_cal_samples,), optional) – Calibration targets.
cal_size (float, default=0.2) – Fraction of training data for calibration if X_cal not provided.

Return type:

ConformizedQuantileRegressor

Returns:

self (object) – Fitted CQR model.

predict(X)[source]¶

Return point predictions (median).

Parameters:: X (array-like of shape (n_samples, n_features)) – Test samples.
Return type:: ndarray
Returns:: y_pred (ndarray of shape (n_samples,)) – Point predictions.

predict_interval(X)[source]¶

Return conformalized prediction intervals.

Parameters:

X (array-like of shape (n_samples, n_features)) – Test samples.

Return type:

tuple[ndarray, ndarray]

Returns:

lower (ndarray of shape (n_samples,)) – Lower bounds of prediction intervals.
upper (ndarray of shape (n_samples,)) – Upper bounds of prediction intervals.

predict_quantiles(X, quantiles=None)[source]¶

Predict arbitrary quantiles (uncalibrated).

Note: These are raw quantile predictions without conformal calibration. For calibrated intervals, use predict_interval().

Parameters:

X (array-like of shape (n_samples, n_features)) – Test samples.
quantiles (list of float, optional) – Quantiles to predict. Default: [0.1, 0.5, 0.9]

Return type:

ndarray

Returns:

ndarray of shape (n_samples, n_quantiles) – Quantile predictions.

coverage_score(X, y)[source]¶

Compute empirical coverage.

Parameters:

X (array-like of shape (n_samples, n_features)) – Test features.
y (array-like of shape (n_samples,)) – True targets.

Return type:

float

Returns:

float – Fraction of samples where true value is within interval.

average_interval_width(X)[source]¶

Compute average prediction interval width.

Parameters:: X (array-like of shape (n_samples, n_features)) – Test features.
Return type:: float
Returns:: float – Average interval width.

interval_width(X)[source]¶

Compute prediction interval widths for each sample.

Parameters:: X (array-like of shape (n_samples, n_features)) – Test features.
Return type:: ndarray
Returns:: ndarray of shape (n_samples,) – Interval width for each sample.

score(X, y)[source]¶

Return negative interval width (for sklearn compatibility).

Higher is better (narrower intervals).

Parameters:

X (array-like) – Test features.
y (array-like) – Test targets (unused, for API compatibility).

Return type:

float

Returns:

float – Negative average interval width.

set_fit_request(*, X_cal='$UNCHANGED$', X_train='$UNCHANGED$', cal_size='$UNCHANGED$', y_cal='$UNCHANGED$', y_train='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_cal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_cal parameter in fit.
X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
cal_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for cal_size parameter in fit.
y_cal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_cal parameter in fit.
y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.
self (ConformizedQuantileRegressor)

Returns:

self (object) – The updated object.

Return type:

ConformizedQuantileRegressor

class endgame.calibration.TemperatureScaling(method='nll', max_iter=100)[source]¶

Bases: BaseEstimator, TransformerMixin

Temperature scaling for neural network calibration.

Learns a single temperature parameter T to scale logits: calibrated_proba = softmax(logits / T)

This is a simple but effective method for calibrating neural networks, particularly when the model is already reasonably calibrated.
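To make the formula concrete, the sketch below fits T by minimizing the mean negative log-likelihood with a golden-section search over a bounded range. The optimizer, the search bounds, and the function names are assumptions chosen for a dependency-free illustration. The library's own optimizer is not described here.

```python
import math

def log_softmax_at(logits, label, T):
    """log softmax(logits / T)[label], computed stably via log-sum-exp."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return scaled[label] - log_norm

def mean_nll(logit_rows, labels, T):
    total = sum(log_softmax_at(r, y, T) for r, y in zip(logit_rows, labels))
    return -total / len(labels)

def fit_temperature(logit_rows, labels, lo=0.05, hi=20.0, n_iter=100):
    """Golden-section search; the NLL is unimodal in T, so this converges."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(n_iter):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if mean_nll(logit_rows, labels, c) < mean_nll(logit_rows, labels, d):
            b = d
        else:
            a = c
    return (a + b) / 2

# Overconfident toy model: margin of 4 on every sample, but one of six is wrong.
logit_rows = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0], [4.0, 0.0]]
labels = [0, 0, 1, 1, 1, 0]

T = fit_temperature(logit_rows, labels)
print(T)
```

In this toy case the optimum has a closed form, T = 4 / ln(5): the scaled confidence must equal the observed accuracy of 5/6. A fitted T above 1 softens overconfident probabilities without changing the predicted class.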

Parameters:

method (str, default='nll') – Optimization objective:
- ‘nll’: Negative log-likelihood (cross-entropy)
- ‘ece’: Expected Calibration Error
max_iter (int, default=100) – Maximum optimization iterations.
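For readers unfamiliar with the 'ece' objective, here is a minimal sketch of the standard equal-width-bin Expected Calibration Error. The bin count and the function name are assumptions, and the library may bin differently.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# Top-class confidences and whether each prediction was correct (1) or not (0).
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0, 1, 0]

ece = expected_calibration_error(confidences, correct)
print(ece)
```

ECE is piecewise constant in the bin assignments, so it is a less smooth objective than NLL when used to fit a temperature.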

temperature_¶

Learned temperature parameter.

Type:: float

Examples

>>> ts = TemperatureScaling()
>>> ts.fit(logits_val, y_val)
>>> calibrated = ts.transform(logits_test)

fit(logits, y)[source]¶

Fit temperature parameter on validation data.

Parameters:

logits (array-like of shape (n_samples, n_classes)) – Raw logits (pre-softmax outputs).
y (array-like of shape (n_samples,)) – True class labels.

Return type:

TemperatureScaling

Returns:

self

transform(logits)[source]¶

Apply temperature scaling to logits.

Parameters:: logits (array-like) – Raw logits.
Return type:: ndarray
Returns:: ndarray – Calibrated probabilities.

fit_transform(logits, y)[source]¶

Fit and transform in one step.

Return type:: ndarray

set_fit_request(*, logits='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

logits (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for logits parameter in fit.
self (TemperatureScaling)

Returns:

self (object) – The updated object.

Return type:

TemperatureScaling

set_transform_request(*, logits='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the transform method.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

logits (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for logits parameter in transform.
self (TemperatureScaling)

Returns:

self (object) – The updated object.

Return type:

TemperatureScaling

class endgame.calibration.PlattScaling(prior_correction=True, max_iter=100)[source]¶

Bases: BaseEstimator, TransformerMixin

Platt scaling (sigmoid calibration) for binary classification.

Fits logistic regression: P(y=1|f) = 1 / (1 + exp(A*f + B))
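The fit can be sketched with plain gradient descent on the cross-entropy loss, using the sign convention of the formula above and the smoothed targets from Platt's original method, (N+ + 1)/(N+ + 2) for positives and 1/(N- + 2) for negatives. The optimizer, learning rate, and function names are assumptions for illustration. The library's solver may differ.

```python
import math

def platt_fit(scores, labels, lr=0.1, n_iter=2000):
    """Fit A, B in P(y=1|f) = 1 / (1 + exp(A*f + B)) by gradient descent."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Platt's smoothed targets instead of hard 0/1 labels.
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]
    A, B = 0.0, 0.0
    for _ in range(n_iter):
        grad_A = grad_B = 0.0
        for f, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            # d(cross-entropy)/d(A*f + B) = t - p under this sign convention.
            grad_A += (t - p) * f
            grad_B += (t - p)
        A -= lr * grad_A / len(scores)
        B -= lr * grad_B / len(scores)
    return A, B

def platt_transform(f, A, B):
    return 1.0 / (1.0 + math.exp(A * f + B))

# Toy decision scores and binary labels.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]

A, B = platt_fit(scores, labels)
print(A, B, platt_transform(2.0, A, B))
```

With this parameterization A comes out negative when larger scores indicate the positive class, because the exponent enters with a plus sign.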

Parameters:

prior_correction (bool, default=True) – Apply prior correction for imbalanced datasets. Uses Platt’s method with adjusted target probabilities.
max_iter (int, default=100) – Maximum optimization iterations.

A_¶

Learned slope parameter.

Type:: float

B_¶

Learned intercept parameter.

Type:: float

Examples

>>> platt = PlattScaling()
>>> platt.fit(scores_val, y_val)
>>> calibrated = platt.transform(scores_test)

fit(scores, y)[source]¶

Fit Platt scaling parameters.

Parameters:

scores (array-like of shape (n_samples,)) – Raw scores or decision function values.
y (array-like of shape (n_samples,)) – Binary labels (0 or 1).

Return type:

PlattScaling

Returns:

self

transform(scores)[source]¶

Apply Platt scaling.

Parameters:: scores (array-like) – Raw scores.
Return type:: ndarray
Returns:: ndarray – Calibrated probabilities.

fit_transform(scores, y)[source]¶

Fit and transform.

Return type:: ndarray

set_fit_request(*, scores='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

scores (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for scores parameter in fit.
self (PlattScaling)

Returns:

self (object) – The updated object.

Return type:

PlattScaling

set_transform_request(*, scores='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the transform method.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

scores (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for scores parameter in transform.
self (PlattScaling)

Returns:

self (object) – The updated object.

Return type:

PlattScaling

class endgame.calibration.BetaCalibration(parameters='abm')[source]¶

Bases: BaseEstimator, TransformerMixin

Beta calibration for improved probability estimates.

More flexible than Platt scaling, handles different miscalibration patterns.

Fits: calibrated = 1 / (1 + 1/exp(c*log(p/(1-p)) + d*log(p) + e*log(1-p)))

Can be simplified to three-parameter form: calibrated = 1 / (1 + exp(-(a*logit(p) + b)))

Parameters:: parameters (str, default='abm') – Parameterization:
- ‘abm’: Three parameters (a, b, m), the most common choice
- ‘full’: Five parameters (more flexible, may overfit)
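As a point of reference, the calibration map published by Kull et al. (2017) can be written as a sigmoid of a*ln(p) - b*ln(1-p) + c, with the midpoint m related to c by c = b*ln(1-m) - a*ln(m). The sketch below implements that published form only. It is not claimed to match the exact expression this class fits, and both function names are hypothetical.

```python
import math

def beta_calibration_map(p, a, b, c, eps=1e-12):
    """Beta calibration map: 1 / (1 + 1 / (e^c * p^a / (1 - p)^b))."""
    p = min(max(p, eps), 1.0 - eps)  # keep the logs finite
    z = a * math.log(p) - b * math.log(1.0 - p) + c
    return 1.0 / (1.0 + math.exp(-z))

def c_from_midpoint(a, b, m):
    """Intercept c such that the map sends p = m to exactly 0.5."""
    return b * math.log(1.0 - m) - a * math.log(m)

# a = b = 1 and c = 0 leave probabilities unchanged.
identity = beta_calibration_map(0.3, 1.0, 1.0, 0.0)

# With midpoint m = 0.2, an input of 0.2 is mapped to 0.5.
c = c_from_midpoint(2.0, 0.5, 0.2)
midpoint_value = beta_calibration_map(0.2, 2.0, 0.5, c)
print(identity, midpoint_value)
```

Setting a = b recovers a Platt-style logistic map on logit(p), which is why beta calibration is described as more flexible than Platt scaling.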

a_, b_, m_

Learned parameters (abm mode).

Type:: float

References

Kull et al. “Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers” (2017)

Examples

>>> beta_cal = BetaCalibration()
>>> beta_cal.fit(proba_val, y_val)
>>> calibrated = beta_cal.transform(proba_test)

fit(proba, y)[source]¶

Fit beta calibration parameters.

Parameters:

proba (array-like) – Predicted probabilities for positive class.
y (array-like) – Binary labels.

Return type:

BetaCalibration

Returns:

self

transform(proba)[source]¶

Apply beta calibration.

Parameters:: proba (array-like) – Predicted probabilities.
Return type:: ndarray
Returns:: ndarray – Calibrated probabilities.

fit_transform(proba, y)[source]¶

Fit and transform.

Return type:: ndarray

set_fit_request(*, proba='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

proba (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for proba parameter in fit.
self (BetaCalibration)

Returns:

self (object) – The updated object.

Return type:

BetaCalibration

set_transform_request(*, proba='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the transform method.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

proba (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for proba parameter in transform.
self (BetaCalibration)

Returns:

self (object) – The updated object.

Return type:

BetaCalibration

class endgame.calibration.IsotonicCalibration(out_of_bounds='clip')[source]¶

Bases: BaseEstimator, TransformerMixin

Isotonic regression calibration.

Non-parametric calibration that preserves ranking. Fits a monotonically increasing step function mapping predicted probabilities to calibrated probabilities.

Best for large calibration sets (>1000 samples) where the flexibility doesn’t lead to overfitting.
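The monotone step function can be illustrated with the pool-adjacent-violators algorithm (PAVA), the classic way to compute an isotonic least-squares fit. The attribute type above suggests the class delegates to scikit-learn's IsotonicRegression, so this standalone version is only a sketch of the idea, with hypothetical function names and no special handling of tied probabilities.

```python
def pava(values):
    """Least-squares non-decreasing fit to a sequence (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def isotonic_fit(proba, labels):
    """Sort by predicted probability, then fit a monotone map to the labels."""
    order = sorted(range(len(proba)), key=lambda i: proba[i])
    xs = [proba[i] for i in order]
    ys = pava([labels[i] for i in order])
    return xs, ys

proba = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 0, 1, 1]

xs, ys = isotonic_fit(proba, labels)
print(list(zip(xs, ys)))
```

Each pooled block becomes one flat step of the calibration map. With few samples the steps are coarse, which is the overfitting risk noted above.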

Parameters:: out_of_bounds (str, default='clip') – How to handle predictions outside training range:
- ‘clip’: Clip to [min, max] of training range
- ‘nan’: Return NaN for out-of-bounds

isotonic_¶

Fitted isotonic regression model.

Type:: IsotonicRegression

Examples

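>>> from endgame.calibration import IsotonicCalibration
>>>
>>> # Fit on held-out validation probabilities, then calibrate test probabilities
>>> iso = IsotonicCalibration()
>>> iso.fit(proba_val, y_val)
>>> calibrated = iso.transform(proba_test)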
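
The monotone step-function mapping described above can be illustrated with scikit-learn's IsotonicRegression directly. This is a minimal sketch under assumptions, not the internals of IsotonicCalibration: the synthetic data and the variable names are illustrative only, and the out_of_bounds='clip' setting simply mirrors the documented default.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Synthetic validation data (illustrative only): the raw scores are a
# rank-preserving distortion of the true positive-class probability.
true_p = rng.uniform(0.0, 1.0, size=2000)
y_val = (rng.uniform(size=2000) < true_p).astype(int)
proba_val = true_p ** 2

# Fit a non-decreasing step function from raw scores to label frequencies.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(proba_val, y_val)

# Map new scores through the fitted function; their ordering is preserved.
proba_test = np.linspace(0.0, 1.0, 11)
calibrated = iso.predict(proba_test)
```

Because the fitted function is non-decreasing, the calibrated values never reverse the ranking of the inputs, although distinct inputs may be mapped to the same output.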

fit(proba, y)[source]¶

Fit isotonic regression.

Parameters:

proba (array-like) – Predicted probabilities.
y (array-like) – Binary labels.

Return type:

IsotonicCalibration

Returns:

self

transform(proba)[source]¶

Apply isotonic calibration.

Parameters:: proba (array-like) – Predicted probabilities.
Return type:: ndarray
Returns:: ndarray – Calibrated probabilities.

fit_transform(proba, y)[source]¶

Fit and transform.

Return type:: ndarray

set_fit_request(*, proba='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

proba (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for proba parameter in fit.
self (IsotonicCalibration)

Returns:

self (object) – The updated object.

Return type:

IsotonicCalibration

set_transform_request(*, proba='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the transform method.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

proba (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for proba parameter in transform.
self (IsotonicCalibration)

Returns:

self (object) – The updated object.

Return type:

IsotonicCalibration

class endgame.calibration.HistogramBinning(n_bins=10, strategy='uniform')[source]¶

Bases: BaseEstimator, TransformerMixin

Histogram binning calibration.

Divides probability space into bins and maps each bin to the empirical frequency of positives within that bin.

Simple and interpretable, but the per-bin estimates can be unreliable when few samples fall in a bin.

Parameters:

n_bins (int, default=10) – Number of bins.
strategy (str, default='uniform') – Binning strategy: - ‘uniform’: Equal-width bins - ‘quantile’: Equal-frequency bins

bin_edges_¶

Edges of calibration bins.

Type:: ndarray

bin_calibrations_¶

Calibrated probability for each bin.

Type:: ndarray

Examples

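>>> from endgame.calibration import HistogramBinning
>>>
>>> # Fit on held-out validation probabilities, then calibrate test probabilities
>>> hb = HistogramBinning(n_bins=15)
>>> hb.fit(proba_val, y_val)
>>> calibrated = hb.transform(proba_test)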
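
The binning idea can be sketched in plain NumPy as follows. The helper names fit_bins and apply_bins are hypothetical and not part of endgame; the handling of empty bins (falling back to the bin midpoint) is an assumption made for this sketch, and the library may treat that case differently.

```python
import numpy as np

def fit_bins(proba, y, n_bins=10, strategy="uniform"):
    """Return (edges, values): bin edges and the positive rate in each bin."""
    proba = np.asarray(proba, dtype=float)
    y = np.asarray(y, dtype=float)
    if strategy == "uniform":
        edges = np.linspace(0.0, 1.0, n_bins + 1)        # equal-width bins
    else:
        # "quantile": equal-frequency bins, widened to cover [0, 1]
        edges = np.quantile(proba, np.linspace(0.0, 1.0, n_bins + 1))
        edges[0], edges[-1] = 0.0, 1.0
    # Interior edges give bin indices 0 .. n_bins - 1.
    idx = np.digitize(proba, edges[1:-1])
    values = np.empty(n_bins)
    for b in range(n_bins):
        mask = idx == b
        # Assumption: an empty bin falls back to its midpoint.
        values[b] = y[mask].mean() if mask.any() else 0.5 * (edges[b] + edges[b + 1])
    return edges, values

def apply_bins(proba, edges, values):
    """Replace each probability with the calibrated value of its bin."""
    idx = np.digitize(np.asarray(proba, dtype=float), edges[1:-1])
    return values[idx]

edges, values = fit_bins([0.1, 0.15, 0.8, 0.9], [0, 1, 1, 1], n_bins=2)
calibrated = apply_bins([0.2, 0.7], edges, values)
```

With two uniform bins, the lower bin holds one positive out of two samples and the upper bin holds two positives out of two, so any input below 0.5 maps to 0.5 and any input at or above 0.5 maps to 1.0.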

fit(proba, y)[source]¶

Fit histogram binning.

Parameters:

proba (array-like) – Predicted probabilities.
y (array-like) – Binary labels.

Return type:

HistogramBinning

Returns:

self

transform(proba)[source]¶

Apply histogram binning calibration.

Parameters:: proba (array-like) – Predicted probabilities.
Return type:: ndarray
Returns:: ndarray – Calibrated probabilities.

fit_transform(proba, y)[source]¶

Fit and transform.

Return type:: ndarray

set_fit_request(*, proba='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

proba (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for proba parameter in fit.
self (HistogramBinning)

Returns:

self (object) – The updated object.

Return type:

HistogramBinning

set_transform_request(*, proba='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the transform method.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

proba (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for proba parameter in transform.
self (HistogramBinning)

Returns:

self (object) – The updated object.

Return type:

HistogramBinning

class endgame.calibration.VennABERS(estimator=None, inductive=True, precision=0.001)[source]¶

Bases: BaseEstimator, ClassifierMixin

Venn-ABERS predictors for well-calibrated probabilities.

Provides probability intervals [p0, p1] rather than point estimates. The intervals have theoretical validity guarantees under exchangeability.

For a new test point, computes:

p1: calibrated probability assuming the true label is 1
p0: calibrated probability assuming the true label is 0

The final probability estimate can be taken as the geometric mean or other combination of p0 and p1.

Parameters:

estimator (sklearn-compatible classifier, optional) – Base classifier with predict_proba. If None, only transform methods are available.
inductive (bool, default=True) – Use inductive (split) Venn-ABERS. If False, uses full (computationally expensive) Venn-ABERS.
precision (float, default=0.001) – Precision for isotonic regression calibration points.

estimator_¶

Fitted base classifier.

Type:: estimator

p0_calibrator_¶

Calibrator for label=0 assumption.

Type:: IsotonicRegression

p1_calibrator_¶

Calibrator for label=1 assumption.

Type:: IsotonicRegression

cal_scores_¶

Calibration set scores.

Type:: ndarray

cal_labels_¶

Calibration set labels.

Type:: ndarray

Examples

>>> from sklearn.ensemble import RandomForestClassifier
>>> from endgame.calibration import VennABERS
>>>
>>> va = VennABERS(RandomForestClassifier(n_estimators=100))
>>> va.fit(X_train, y_train, X_cal, y_cal)
>>>
>>> # Get probability intervals
>>> p0, p1 = va.predict_proba_interval(X_test)
>>>
>>> # Get point estimate (geometric mean)
>>> proba = va.predict_proba(X_test)[:, 1]
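
The two-calibrator idea behind the interval can be sketched as below: the test score is appended to the calibration set once with label 0 and once with label 1, and an isotonic regression is refit each time. This is a naive per-point illustration under assumptions, not the library's implementation (which, judging by the precision parameter, may precompute calibration points); the function name venn_abers_interval and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Return (p0, p1) for a single test score using two isotonic fits."""
    bounds = []
    for assumed_label in (0, 1):
        # Add the test point with a hypothesised label, then refit.
        scores = np.append(cal_scores, test_score)
        labels = np.append(cal_labels, assumed_label)
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(scores, labels)
        bounds.append(float(iso.predict([test_score])[0]))
    return bounds[0], bounds[1]

# Synthetic calibration set (illustrative only).
rng = np.random.default_rng(0)
cal_scores = rng.uniform(size=200)
cal_labels = (rng.uniform(size=200) < cal_scores).astype(int)

p0, p1 = venn_abers_interval(cal_scores, cal_labels, 0.7)
width = p1 - p0  # a wider interval signals more uncertainty at this score
```

Raising one label from 0 to 1 can only raise an isotonic fit, which is why p0 does not exceed p1 in this construction.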

fit(X_train, y_train, X_cal=None, y_cal=None, cal_size=0.2)[source]¶

Fit Venn-ABERS predictor.

Parameters:

X_train (array-like) – Training features.
y_train (array-like) – Training labels.
X_cal (array-like, optional) – Calibration features.
y_cal (array-like, optional) – Calibration labels.
cal_size (float, default=0.2) – Fraction for calibration if not provided separately.

Return type:

VennABERS

Returns:

self

predict_proba_interval(X)[source]¶

Predict probability intervals.

Parameters:

X (array-like) – Test samples or scores.

Return type:

tuple[ndarray, ndarray]

Returns:

p0 (ndarray) – Lower probability bounds (assuming label=0).
p1 (ndarray) – Upper probability bounds (assuming label=1).

predict_proba(X)[source]¶

Predict calibrated probabilities.

Uses geometric mean of interval endpoints as point estimate.

Parameters:: X (array-like) – Test samples.
Return type:: ndarray
Returns:: ndarray of shape (n_samples, 2) – Calibrated class probabilities.

predict(X)[source]¶

Predict class labels.

Parameters:: X (array-like) – Test samples.
Return type:: ndarray
Returns:: ndarray – Predicted class labels.

interval_width(X)[source]¶

Compute uncertainty (interval width) for each prediction.

Parameters:: X (array-like) – Test samples.
Return type:: ndarray
Returns:: ndarray – Interval widths (p1 - p0).

set_fit_request(*, X_cal='$UNCHANGED$', X_train='$UNCHANGED$', cal_size='$UNCHANGED$', y_cal='$UNCHANGED$', y_train='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the fit method.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_cal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_cal parameter in fit.
X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
cal_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for cal_size parameter in fit.
y_cal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_cal parameter in fit.
y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.
self (VennABERS)

Returns:

self (object) – The updated object.

Return type:

VennABERS

set_score_request(*, sample_weight='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (VennABERS)

Returns:

self (object) – The updated object.

Return type:

VennABERS

class endgame.calibration.CalibrationAnalyzer(n_bins=10, strategy='uniform')[source]¶

Bases: object

Analyze and visualize model calibration.

Computes comprehensive calibration metrics and generates diagnostic plots.

Parameters:

n_bins (int, default=10) – Number of bins for binning-based metrics.
strategy (str, default='uniform') – Binning strategy: ‘uniform’ or ‘quantile’.

Examples

>>> from endgame.calibration import CalibrationAnalyzer
>>>
>>> analyzer = CalibrationAnalyzer(n_bins=15)
>>> report = analyzer.analyze(y_true, y_proba)
>>> print(report)
>>>
>>> # Visualize
>>> analyzer.plot_reliability_diagram(y_true, y_proba)
>>> analyzer.plot_confidence_histogram(y_proba)

analyze(y_true, y_proba)[source]¶

Compute comprehensive calibration metrics.

Parameters:

y_true (array-like) – True binary labels.
y_proba (array-like) – Predicted probabilities for positive class.

Return type:

CalibrationReport

Returns:

CalibrationReport – Comprehensive calibration analysis results.

plot_reliability_diagram(y_true, y_proba, ax=None, show_histogram=True, show_ece=True, title='Reliability Diagram')[source]¶

Plot reliability (calibration) diagram.

A well-calibrated model has points close to the diagonal.

Parameters:

y_true (array-like) – True binary labels.
y_proba (array-like) – Predicted probabilities.
ax (matplotlib axes, optional) – Axes to plot on.
show_histogram (bool, default=True) – Show histogram of predictions at bottom.
show_ece (bool, default=True) – Show ECE value on plot.
title (str, default='Reliability Diagram') – Plot title.

Returns:

matplotlib axes
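
To make the "points close to the diagonal" criterion concrete, the points of a reliability diagram can be computed with scikit-learn's calibration_curve. This is a sketch under assumptions and does not show how plot_reliability_diagram is implemented; the synthetic data is calibrated by construction, so its points should lie near the diagonal.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Labels are drawn with exactly the predicted probability, so this
# synthetic model is well calibrated by construction.
y_proba = rng.uniform(size=5000)
y_true = (rng.uniform(size=5000) < y_proba).astype(int)

# One (mean predicted probability, observed positive fraction) pair per bin.
frac_pos, mean_pred = calibration_curve(
    y_true, y_proba, n_bins=10, strategy="uniform"
)

# Largest vertical distance from the diagonal across bins.
max_gap = float(np.max(np.abs(frac_pos - mean_pred)))
```

Plotting mean_pred against frac_pos and comparing with the line y = x gives the diagram; max_gap summarizes how far the worst bin sits from that line.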

plot_confidence_histogram(y_proba, y_true=None, ax=None, title='Confidence Distribution')[source]¶

Plot histogram of prediction confidences.

Parameters:

y_proba (array-like) – Predicted probabilities.
y_true (array-like, optional) – True labels for coloring by correctness.
ax (matplotlib axes, optional) – Axes to plot on.
title (str) – Plot title.

Returns:

matplotlib axes

compare_calibrations(y_true, probas_dict, ax=None, title='Calibration Comparison')[source]¶

Compare calibration of multiple models.

Parameters:

y_true (array-like) – True labels.
probas_dict (dict) – Dictionary mapping model names to predicted probabilities.
ax (matplotlib axes, optional) – Axes to plot on.
title (str) – Plot title.

Returns:

matplotlib axes

class endgame.calibration.CalibrationReport(ece, mce, brier_score, log_loss, reliability, resolution, uncertainty, bin_edges, bin_accuracies, bin_confidences, bin_counts, mean_confidence, accuracy, overconfidence)[source]¶

Bases: object

Container for calibration analysis results.

Parameters:

ece (float)
mce (float)
brier_score (float)
log_loss (float)
reliability (float)
resolution (float)
uncertainty (float)
bin_edges (ndarray)
bin_accuracies (ndarray)
bin_confidences (ndarray)
bin_counts (ndarray)
mean_confidence (float)
accuracy (float)
overconfidence (float)

ece: float¶

mce: float¶

brier_score: float¶

log_loss: float¶

reliability: float¶

resolution: float¶

uncertainty: float¶

bin_edges: ndarray¶

bin_accuracies: ndarray¶

bin_confidences: ndarray¶

bin_counts: ndarray¶

mean_confidence: float¶

accuracy: float¶

overconfidence: float¶

endgame.calibration.expected_calibration_error(y_true, y_proba, n_bins=10, strategy='uniform')[source]¶

Compute Expected Calibration Error (ECE).

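ECE is the bin-size-weighted average of the absolute gap between accuracy and mean confidence in each bin: ECE = Σ_m (|B_m| / n) × |acc(B_m) - conf(B_m)|, where B_m is the set of samples in bin m and n is the total number of samples.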

Parameters:

y_true (array-like) – True binary labels.
y_proba (array-like) – Predicted probabilities for positive class.
n_bins (int, default=10) – Number of bins.
strategy (str, default='uniform') – Binning strategy: ‘uniform’ or ‘quantile’.

Return type:

float

Returns:

float – Expected Calibration Error (lower is better, 0 is perfectly calibrated).

Examples

>>> import numpy as np
>>> from endgame.calibration import expected_calibration_error
>>>
>>> y_true = np.array([0, 0, 1, 1])
>>> y_proba = np.array([0.2, 0.4, 0.6, 0.9])
>>> ece = expected_calibration_error(y_true, y_proba)
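
The ECE formula can be worked through by hand in NumPy. The function ece_sketch below is a hypothetical helper, not the library function. It assumes uniform bins and, for binary labels, takes acc(B_m) to be the fraction of positives in the bin and conf(B_m) to be the mean predicted probability; the library's conventions may differ in detail.

```python
import numpy as np

def ece_sketch(y_true, y_proba, n_bins=10):
    """Weighted average of |acc - conf| over uniform-width bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_proba = np.asarray(y_proba, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(y_proba, edges[1:-1])   # bin index 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_proba[mask].mean())
            ece += mask.mean() * gap          # mask.mean() equals |B_m| / n
    return ece

y_true = np.array([0, 0, 1, 1])
y_proba = np.array([0.2, 0.4, 0.6, 0.9])

# Each sample lands in its own bin, so the result is
# (0.2 + 0.4 + 0.4 + 0.1) / 4 = 0.275, up to floating-point rounding.
value = ece_sketch(y_true, y_proba)
```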

endgame.calibration.maximum_calibration_error(y_true, y_proba, n_bins=10, strategy='uniform')[source]¶

Compute Maximum Calibration Error (MCE).

MCE is the largest absolute gap between accuracy and mean confidence over all bins: MCE = max_m |acc(B_m) - conf(B_m)|. Useful for identifying worst-case miscalibration.

Parameters:

y_true (array-like) – True binary labels.
y_proba (array-like) – Predicted probabilities.
n_bins (int, default=10) – Number of bins.
strategy (str, default='uniform') – Binning strategy.

Return type:

float

Returns:

float – Maximum Calibration Error.
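
A minimal sketch of the worst-bin computation, assuming uniform bins and ignoring empty bins (both are assumptions of this sketch, and mce_sketch is a hypothetical name rather than the library function):

```python
import numpy as np

def mce_sketch(y_true, y_proba, n_bins=10):
    """Largest |acc - conf| over the non-empty uniform-width bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_proba = np.asarray(y_proba, dtype=float)
    idx = np.digitize(y_proba, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    counts = np.bincount(idx, minlength=n_bins)                    # |B_m|
    pos = np.bincount(idx, weights=y_true, minlength=n_bins)       # positives per bin
    conf = np.bincount(idx, weights=y_proba, minlength=n_bins)     # summed confidence
    filled = counts > 0
    gaps = np.abs(pos[filled] - conf[filled]) / counts[filled]
    return float(gaps.max())

# Per-bin gaps are 0.2, 0.4, 0.4 and 0.1, so the maximum is 0.4.
worst = mce_sketch([0, 0, 1, 1], [0.2, 0.4, 0.6, 0.9])
```

Unlike ECE, a single sparsely populated bin can dominate this value, so it is worth reading alongside the bin counts.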

endgame.calibration.brier_score_decomposition(y_true, y_proba, n_bins=10)[source]¶

Decompose Brier score into reliability, resolution, and uncertainty.

Brier Score = Reliability - Resolution + Uncertainty

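Reliability: Measures calibration, the weighted squared gap between mean prediction and observed frequency in each bin (lower is better)
Resolution: Measures how much the observed frequency in each bin differs from the base rate (higher is better)
Uncertainty: Variance of the outcome at the base rate, ō(1 - ō), where ō is the fraction of positives (constant for a dataset)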

Parameters:

y_true (array-like) – True binary labels.
y_proba (array-like) – Predicted probabilities.
n_bins (int, default=10) – Number of bins for decomposition.

Return type:

dict[str, float]

Returns:

dict – Dictionary with ‘brier_score’, ‘reliability’, ‘resolution’, ‘uncertainty’.

Examples

>>> from endgame.calibration import brier_score_decomposition
>>>
>>> decomp = brier_score_decomposition(y_true, y_proba)
>>> print(f"Reliability: {decomp['reliability']:.4f}")
>>> print(f"Resolution: {decomp['resolution']:.4f}")
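
The decomposition can be checked numerically with a small NumPy sketch. The function brier_decomposition_sketch is hypothetical and only mirrors the documented dictionary keys; it assumes uniform bins and computes the uncertainty term as ō(1 - ō). The identity Brier Score = Reliability - Resolution + Uncertainty is exact only when all predictions within a bin are equal, which holds for the toy data below; with continuous predictions, binning makes it approximate.

```python
import numpy as np

def brier_decomposition_sketch(y_true, y_proba, n_bins=10):
    """Binned decomposition of the Brier score into three terms."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_proba, dtype=float)
    base_rate = y.mean()
    idx = np.digitize(p, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    reliability = 0.0
    resolution = 0.0
    for b in np.unique(idx):                  # non-empty bins only
        mask = idx == b
        weight = mask.mean()                  # fraction of samples in the bin
        reliability += weight * (p[mask].mean() - y[mask].mean()) ** 2
        resolution += weight * (y[mask].mean() - base_rate) ** 2
    return {
        "brier_score": float(np.mean((p - y) ** 2)),
        "reliability": float(reliability),
        "resolution": float(resolution),
        "uncertainty": float(base_rate * (1.0 - base_rate)),
    }

# Predictions take only three distinct values, one per occupied bin.
y_true = np.array([0, 0, 1, 0, 1, 1, 1, 0])
y_proba = np.array([0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9])
d = brier_decomposition_sketch(y_true, y_proba)
recombined = d["reliability"] - d["resolution"] + d["uncertainty"]
```

For this data the base rate is 0.5, so the uncertainty term is 0.25, and recombined matches d["brier_score"] up to floating-point rounding.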