Fairness

endgame.fairness.demographic_parity(y_true, y_pred, sensitive_attr)[source]

Compute demographic parity (statistical parity) across groups.

Demographic parity requires that the positive prediction rate is equal across all groups defined by the sensitive attribute:

P(Y_hat = 1 | A = a) = P(Y_hat = 1 | A = b) for all a, b

Parameters:
  • y_true (array-like of shape (n_samples,)) – Ground truth labels (not used in computation but included for API consistency).

  • y_pred (array-like of shape (n_samples,)) – Predicted labels (binary: 0 or 1).

  • sensitive_attr (array-like of shape (n_samples,)) – Sensitive attribute defining groups.

Return type:

Dict[str, Any]

Returns:

dict – Dictionary with keys:

  • "group_rates" : dict mapping group -> positive prediction rate

  • "max_disparity" : float, max difference between any two group rates

  • "ratio" : float, min(rate) / max(rate), 1.0 means perfect parity

  • "privileged_group" : group with highest positive rate

  • "unprivileged_group" : group with lowest positive rate
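
The definition and return keys above can be reproduced with a few lines of NumPy. This is a minimal sketch, not the library's source: the name demographic_parity_sketch and the handling of the "no positive predictions at all" case are assumptions made for illustration.

```python
import numpy as np


def demographic_parity_sketch(y_pred, sensitive_attr):
    """Illustrative re-implementation of the documented output; not the library source."""
    y_pred = np.asarray(y_pred)
    groups = np.asarray(sensitive_attr)
    # Positive prediction rate per group: P(Y_hat = 1 | A = g)
    rates = {
        g: float(np.mean(y_pred[groups == g] == 1))
        for g in np.unique(groups).tolist()
    }
    privileged = max(rates, key=rates.get)
    unprivileged = min(rates, key=rates.get)
    highest, lowest = rates[privileged], rates[unprivileged]
    return {
        "group_rates": rates,
        "max_disparity": highest - lowest,
        # Assumption: report 1.0 when no group receives any positive prediction
        "ratio": lowest / highest if highest > 0 else 1.0,
        "privileged_group": privileged,
        "unprivileged_group": unprivileged,
    }


result = demographic_parity_sketch(
    y_pred=[1, 0, 1, 0, 0, 0],
    sensitive_attr=["A", "A", "A", "B", "B", "B"],
)
print(result["max_disparity"], result["ratio"])
```

With these inputs group A has a positive rate of 2/3 and group B a rate of 0, so the ratio is 0.0 and the disparity is 2/3.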

Examples

>>> import numpy as np
>>> from endgame.fairness import demographic_parity
>>> y_true = np.array([1, 0, 1, 0, 1, 0])
>>> y_pred = np.array([1, 0, 1, 0, 0, 0])
>>> sensitive = np.array(["A", "A", "A", "B", "B", "B"])
>>> result = demographic_parity(y_true, y_pred, sensitive)
>>> result["group_rates"]
{'A': 0.6666666666666666, 'B': 0.0}

endgame.fairness.equalized_odds(y_true, y_pred, sensitive_attr)[source]

Compute equalized odds across groups.

Equalized odds requires that the true positive rate (TPR) and false positive rate (FPR) are equal across groups:

P(Y_hat = 1 | Y = y, A = a) = P(Y_hat = 1 | Y = y, A = b)

Parameters:
  • y_true (array-like of shape (n_samples,)) – Ground truth labels (binary: 0 or 1).

  • y_pred (array-like of shape (n_samples,)) – Predicted labels (binary: 0 or 1).

  • sensitive_attr (array-like of shape (n_samples,)) – Sensitive attribute defining groups.

Return type:

Dict[str, Any]

Returns:

dict – Dictionary with keys:

  • "group_tpr" : dict mapping group -> true positive rate

  • "group_fpr" : dict mapping group -> false positive rate

  • "tpr_disparity" : float, max difference in TPR across groups

  • "fpr_disparity" : float, max difference in FPR across groups

  • "max_disparity" : float, max of tpr_disparity and fpr_disparity

  • "satisfied" : bool, True if max_disparity < 0.05
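
As a rough illustration of how these keys relate to each other, the sketch below computes per-group TPR and FPR with NumPy. It is not the library's implementation: the function name, the tol argument, and the choice to report 0.0 for a group with no positives (or no negatives) are assumptions.

```python
import numpy as np


def equalized_odds_sketch(y_true, y_pred, sensitive_attr, tol=0.05):
    """Illustrative only; mirrors the documented keys, not the library source."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, sensitive_attr))
    tpr, fpr = {}, {}
    for g in np.unique(groups).tolist():
        in_group = groups == g
        positives = in_group & (y_true == 1)
        negatives = in_group & (y_true == 0)
        # Assumption: a rate with an empty denominator is reported as 0.0
        tpr[g] = float(np.mean(y_pred[positives] == 1)) if positives.any() else 0.0
        fpr[g] = float(np.mean(y_pred[negatives] == 1)) if negatives.any() else 0.0
    tpr_disparity = max(tpr.values()) - min(tpr.values())
    fpr_disparity = max(fpr.values()) - min(fpr.values())
    max_disparity = max(tpr_disparity, fpr_disparity)
    return {
        "group_tpr": tpr,
        "group_fpr": fpr,
        "tpr_disparity": tpr_disparity,
        "fpr_disparity": fpr_disparity,
        "max_disparity": max_disparity,
        "satisfied": max_disparity < tol,
    }


result = equalized_odds_sketch(
    y_true=[1, 0, 1, 0, 1, 0],
    y_pred=[1, 0, 1, 1, 0, 0],
    sensitive_attr=["A", "A", "A", "B", "B", "B"],
)
print(result["group_tpr"], result["group_fpr"], result["satisfied"])
```

Here group A has TPR 1.0 and FPR 0.0, while group B has TPR 0.0 and FPR 0.5, so the constraint is far from satisfied.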

Examples

>>> import numpy as np
>>> from endgame.fairness import equalized_odds
>>> y_true = np.array([1, 0, 1, 0, 1, 0])
>>> y_pred = np.array([1, 0, 1, 1, 0, 0])
>>> sensitive = np.array(["A", "A", "A", "B", "B", "B"])
>>> result = equalized_odds(y_true, y_pred, sensitive)
>>> "group_tpr" in result and "group_fpr" in result
True

endgame.fairness.disparate_impact(y_true, y_pred, sensitive_attr)[source]

Compute disparate impact ratio across groups.

Disparate impact measures the ratio of positive prediction rates between the least and most favored groups. The four-fifths rule considers a ratio below 0.8 as evidence of adverse impact.

DI = P(Y_hat = 1 | A = unprivileged) / P(Y_hat = 1 | A = privileged)

Parameters:
  • y_true (array-like of shape (n_samples,)) – Ground truth labels (not used in computation but included for API consistency).

  • y_pred (array-like of shape (n_samples,)) – Predicted labels (binary: 0 or 1).

  • sensitive_attr (array-like of shape (n_samples,)) – Sensitive attribute defining groups.

Return type:

Dict[str, Any]

Returns:

dict – Dictionary with keys:

  • "group_rates" : dict mapping group -> positive prediction rate

  • "disparate_impact_ratio" : float, min(rate)/max(rate)

  • "four_fifths_satisfied" : bool, True if ratio >= 0.8

  • "privileged_group" : group with highest positive rate

  • "unprivileged_group" : group with lowest positive rate
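
To make the four-fifths check concrete, here is a small worked example in NumPy. It is a sketch under stated assumptions: four_fifths_check is a hypothetical helper, and returning a ratio of 1.0 when no group has any positive prediction is an illustrative choice, not documented behavior.

```python
import numpy as np

FOUR_FIFTHS = 0.8


def four_fifths_check(y_pred, sensitive_attr):
    """Return (ratio, satisfied) for the least vs most favored group."""
    y_pred = np.asarray(y_pred)
    groups = np.asarray(sensitive_attr)
    rates = [
        float(np.mean(y_pred[groups == g] == 1))
        for g in np.unique(groups).tolist()
    ]
    # Assumption: with no positive predictions anywhere, treat the ratio as 1.0
    ratio = min(rates) / max(rates) if max(rates) > 0 else 1.0
    return ratio, ratio >= FOUR_FIFTHS


# Group A: 4 of 4 predicted positive; group B: 3 of 4 predicted positive
ratio, satisfied = four_fifths_check(
    y_pred=[1, 1, 1, 1, 1, 1, 1, 0],
    sensitive_attr=["A"] * 4 + ["B"] * 4,
)
print(ratio, satisfied)
```

The ratio is 0.75 / 1.0 = 0.75, which falls below 0.8, so the rule is not satisfied even though the two rates differ by only one prediction.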

Examples

>>> import numpy as np
>>> from endgame.fairness import disparate_impact
>>> y_true = np.array([1, 0, 1, 0, 1, 0])
>>> y_pred = np.array([1, 1, 1, 0, 1, 0])
>>> sensitive = np.array(["A", "A", "A", "B", "B", "B"])
>>> result = disparate_impact(y_true, y_pred, sensitive)
>>> result["four_fifths_satisfied"]
False

endgame.fairness.calibration_by_group(y_true, y_pred, sensitive_attr, y_proba=None, n_bins=10)[source]

Compute calibration metrics per sensitive group.

Evaluates whether predicted probabilities (or predicted labels) are equally well-calibrated across groups. When y_proba is provided, computes Brier score and expected calibration error (ECE) per group. When only y_pred is provided, computes accuracy per group.

Parameters:
  • y_true (array-like of shape (n_samples,)) – Ground truth labels (binary: 0 or 1).

  • y_pred (array-like of shape (n_samples,)) – Predicted labels (binary: 0 or 1).

  • sensitive_attr (array-like of shape (n_samples,)) – Sensitive attribute defining groups.

  • y_proba (array-like of shape (n_samples,), optional) – Predicted probabilities for the positive class. If provided, Brier score and ECE are computed per group.

  • n_bins (int, default=10) – Number of bins for ECE computation.

Return type:

Dict[str, Any]

Returns:

dict – Dictionary with keys:

  • "group_accuracy" : dict mapping group -> accuracy

  • "group_brier_score" : dict mapping group -> Brier score (only if y_proba is provided)

  • "group_ece" : dict mapping group -> ECE (only if y_proba is provided)

  • "accuracy_disparity" : float, max difference in accuracy

  • "max_brier_disparity" : float, max difference in Brier score (only if y_proba is provided)
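
The probability-based metrics can be sketched as follows. This is an illustration rather than the library's code: the helper names are hypothetical, and the ECE uses equal-width bins on [0, 1], which is a common convention but an assumption about how endgame bins probabilities.

```python
import numpy as np


def brier_score(y_true, y_proba):
    # Mean squared difference between predicted probability and outcome
    return float(np.mean((y_proba - y_true) ** 2))


def expected_calibration_error(y_true, y_proba, n_bins=10):
    # Assumption: equal-width bins; interior edges split [0, 1] into n_bins bins
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_index = np.digitize(y_proba, interior_edges)  # values in 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            gap = abs(y_true[in_bin].mean() - y_proba[in_bin].mean())
            ece += in_bin.mean() * gap  # weight the gap by the bin's share of samples
    return float(ece)


def calibration_by_group_sketch(y_true, y_proba, sensitive_attr, n_bins=10):
    y_true = np.asarray(y_true, dtype=float)
    y_proba = np.asarray(y_proba, dtype=float)
    groups = np.asarray(sensitive_attr)
    brier, ece = {}, {}
    for g in np.unique(groups).tolist():
        members = groups == g
        brier[g] = brier_score(y_true[members], y_proba[members])
        ece[g] = expected_calibration_error(y_true[members], y_proba[members], n_bins)
    return {
        "group_brier_score": brier,
        "group_ece": ece,
        "max_brier_disparity": max(brier.values()) - min(brier.values()),
    }


result = calibration_by_group_sketch(
    y_true=[1, 0, 1, 0, 1, 0],
    y_proba=[0.9, 0.2, 0.8, 0.4, 0.3, 0.1],
    sensitive_attr=["A", "A", "A", "B", "B", "B"],
)
print(result["group_brier_score"])
```

In this toy data group A is scored much more accurately (Brier score about 0.03) than group B (about 0.22), which is the kind of gap max_brier_disparity is meant to surface.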

Examples

>>> import numpy as np
>>> from endgame.fairness import calibration_by_group
>>> y_true = np.array([1, 0, 1, 0, 1, 0])
>>> y_pred = np.array([1, 0, 1, 0, 0, 0])
>>> sensitive = np.array(["A", "A", "A", "B", "B", "B"])
>>> result = calibration_by_group(y_true, y_pred, sensitive)
>>> "group_accuracy" in result
True

class endgame.fairness.ReweighingPreprocessor(sensitive_attr_index=None)[source]

Bases: BaseEstimator, TransformerMixin

Compute sample weights to achieve demographic parity.

Assigns higher weights to under-represented (group, label) combinations and lower weights to over-represented ones, so that the weighted label distribution is independent of the sensitive attribute.

For each (group g, label y) cell the weight is:

w(g, y) = [ P(Y=y) * P(A=g) ] / P(Y=y, A=g)

This is a pre-processing method: use the returned weights as the sample_weight argument in downstream estimators.
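
The weight formula can be checked numerically. The sketch below is a standalone NumPy illustration, not the class's implementation; reweighing_weights is a hypothetical helper, and skipping empty (group, label) cells is an assumption.

```python
import numpy as np


def reweighing_weights(y, sensitive_attr):
    """Compute w(g, y) = P(Y=y) * P(A=g) / P(Y=y, A=g) and per-sample weights."""
    y = np.asarray(y)
    groups = np.asarray(sensitive_attr)
    weight_map = {}
    for g in np.unique(groups).tolist():
        for label in np.unique(y).tolist():
            p_joint = np.mean((groups == g) & (y == label))
            # Assumption: cells that never occur get no entry
            if p_joint > 0:
                p_group = np.mean(groups == g)
                p_label = np.mean(y == label)
                weight_map[(g, label)] = float(p_group * p_label / p_joint)
    weights = np.array(
        [weight_map[(g, label)] for g, label in zip(groups.tolist(), y.tolist())]
    )
    return weight_map, weights


# Group A is mostly labeled 1, group B mostly labeled 0
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
sensitive = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
weight_map, weights = reweighing_weights(y, sensitive)

# After reweighing, the weighted positive rate is the same in both groups
rate_a = np.average(y[sensitive == "A"], weights=weights[sensitive == "A"])
rate_b = np.average(y[sensitive == "B"], weights=weights[sensitive == "B"])
print(weight_map, rate_a, rate_b)
```

The rare cells (A, 0) and (B, 1) receive weight 2.0, the common cells receive 2/3, and both groups end up with a weighted positive rate of 0.5.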

Parameters:

sensitive_attr_index (int or str, optional) – Column index (int) or column name (str) in X that contains the sensitive attribute. If None, the sensitive_attr parameter must be provided to fit / transform.

groups_

Unique groups seen during fit.

Type:

np.ndarray

labels_

Unique labels seen during fit.

Type:

np.ndarray

weight_map_

Mapping (group, label) -> weight.

Type:

dict

Examples

>>> import numpy as np
>>> from endgame.fairness import ReweighingPreprocessor
>>> X = np.array([[1, 0], [2, 0], [3, 1], [4, 1]])
>>> y = np.array([0, 1, 0, 1])
>>> sensitive = np.array(["A", "A", "B", "B"])
>>> rw = ReweighingPreprocessor()
>>> rw.fit(X, y, sensitive_attr=sensitive)
ReweighingPreprocessor()
>>> weights = rw.transform(X, y, sensitive_attr=sensitive)
>>> weights.shape
(4,)

fit(X, y, sensitive_attr=None, **fit_params)[source]

Compute reweighing weights from training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Training labels.

  • sensitive_attr (array-like of shape (n_samples,), optional) – Sensitive attribute values. Required if sensitive_attr_index was not set in the constructor.

  • **fit_params (dict) – Ignored. Present for API compatibility.

Return type:

ReweighingPreprocessor

Returns:

self – Fitted preprocessor.

transform(X, y=None, sensitive_attr=None, **transform_params)[source]

Return per-sample weights for bias correction.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,), optional) – Labels. Required to look up weights.

  • sensitive_attr (array-like of shape (n_samples,), optional) – Sensitive attribute values.

  • **transform_params (dict) – Ignored.

Return type:

ndarray

Returns:

np.ndarray of shape (n_samples,) – Sample weights.

Raises:

ValueError – If y is not provided.

fit_transform(X, y=None, sensitive_attr=None, **fit_params)[source]

Fit and return sample weights in one step.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,), optional) – Labels.

  • sensitive_attr (array-like of shape (n_samples,), optional) – Sensitive attribute values.

  • **fit_params (dict) – Ignored.

Return type:

ndarray

Returns:

np.ndarray of shape (n_samples,) – Sample weights.

set_fit_request(*, sensitive_attr='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sensitive_attr (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sensitive_attr parameter in fit.

  • self (ReweighingPreprocessor)

Returns:

self (object) – The updated object.

Return type:

ReweighingPreprocessor

set_transform_request(*, sensitive_attr='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sensitive_attr (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sensitive_attr parameter in transform.

  • self (ReweighingPreprocessor)

Returns:

self (object) – The updated object.

Return type:

ReweighingPreprocessor

class endgame.fairness.ExponentiatedGradient(estimator=None, constraint='demographic_parity', constraint_weight=0.5, max_iter=50, random_state=None)[source]

Bases: BaseEstimator, ClassifierMixin

Fairness-constrained classification via exponentiated gradient reduction.

Wraps any sklearn-compatible binary classifier and trains it under a fairness constraint (demographic parity or equalized odds) using the fairlearn library’s ExponentiatedGradient algorithm.

This is an in-processing method: the fairness constraint is enforced during training.

Parameters:
  • estimator (sklearn estimator) – Base binary classifier to wrap. Must implement fit and predict.

  • constraint (str, default="demographic_parity") –

    Fairness constraint to enforce. One of:

    • "demographic_parity" : equalize selection rates

    • "equalized_odds" : equalize TPR and FPR

    • "true_positive_rate_parity" : equalize TPR (equal opportunity)

    • "error_rate_parity" : equalize error rates

  • constraint_weight (float, default=0.5) – Trade-off parameter. Higher values enforce the constraint more strictly at the cost of overall accuracy. Must be in (0, 1].

  • max_iter (int, default=50) – Maximum number of iterations for the exponentiated gradient solver.

  • random_state (int or None, default=None) – Random seed for reproducibility.

mitigator_

The fitted fairlearn mitigator.

Type:

fairlearn.reductions.ExponentiatedGradient

classes_

Unique class labels.

Type:

np.ndarray

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from endgame.fairness import ExponentiatedGradient
>>> clf = ExponentiatedGradient(
...     estimator=LogisticRegression(),
...     constraint="demographic_parity",
... )
>>> # X_train, y_train, sensitive_train and X_test are placeholders for user data
>>> clf.fit(X_train, y_train, sensitive_attr=sensitive_train)
ExponentiatedGradient(...)
>>> y_pred = clf.predict(X_test)

fit(X, y, sensitive_attr=None, **fit_params)[source]

Fit the fairness-constrained classifier.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples,)) – Training labels.

  • sensitive_attr (array-like of shape (n_samples,)) – Sensitive attribute for fairness constraint.

  • **fit_params (dict) – Additional parameters (ignored).

Return type:

ExponentiatedGradient

Returns:

self – Fitted estimator.

Raises:
predict(X)[source]

Predict class labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

np.ndarray of shape (n_samples,) – Predicted class labels.

predict_proba(X)[source]

Predict class probabilities.

Uses the internal randomized classifier to return soft predictions. Falls back to hard predictions if the mitigator does not support _pmf_predict.
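
The fallback described above might look roughly like the following. This is a guess at the control flow, not the actual method body: soft_or_hard_proba and the two stand-in mitigator classes are hypothetical, and one-hot encoding the hard predictions is an assumed fallback format.

```python
import numpy as np


def soft_or_hard_proba(mitigator, X, classes):
    """Prefer soft predictions; otherwise one-hot encode hard predictions."""
    pmf_predict = getattr(mitigator, "_pmf_predict", None)
    if callable(pmf_predict):
        return np.asarray(pmf_predict(X), dtype=float)
    # Assumed fallback: probability 1.0 on the predicted class, 0.0 elsewhere
    y_hat = np.asarray(mitigator.predict(X))
    classes = np.asarray(classes)
    return (y_hat[:, None] == classes[None, :]).astype(float)


class HardOnlyMitigator:
    """Stand-in object without _pmf_predict (hypothetical)."""

    def predict(self, X):
        return np.array([0, 1, 1])


class SoftMitigator:
    """Stand-in object exposing _pmf_predict (hypothetical)."""

    def _pmf_predict(self, X):
        return [[0.3, 0.7], [0.6, 0.4], [0.5, 0.5]]


X = np.zeros((3, 2))
hard = soft_or_hard_proba(HardOnlyMitigator(), X, classes=[0, 1])
soft = soft_or_hard_proba(SoftMitigator(), X, classes=[0, 1])
print(hard)
print(soft)
```

Either path yields an array of shape (n_samples, n_classes), so callers do not need to know which branch was taken.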

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

np.ndarray of shape (n_samples, n_classes) – Class probability estimates.

set_fit_request(*, sensitive_attr='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sensitive_attr (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sensitive_attr parameter in fit.

  • self (ExponentiatedGradient)

Returns:

self (object) – The updated object.

Return type:

ExponentiatedGradient

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (ExponentiatedGradient)

Returns:

self (object) – The updated object.

Return type:

ExponentiatedGradient

class endgame.fairness.CalibratedEqOdds(estimator=None, cost_weight=1.0, grid_size=101, random_state=None)[source]

Bases: BaseEstimator, ClassifierMixin

Post-processing threshold adjustment for equalized odds.

Adjusts per-group classification thresholds on predicted probabilities to equalize true positive and false positive rates across groups. Finds optimal thresholds via grid search on calibration data.

This is a post-processing method: it wraps a trained classifier and adjusts its decisions without retraining.

Parameters:
  • estimator (sklearn classifier) – A fitted classifier with predict_proba.

  • cost_weight (float, default=1.0) – Relative cost of false negatives vs false positives. Higher values favor higher TPR (at the cost of higher FPR).

  • grid_size (int, default=101) – Number of threshold candidates to evaluate per group.

  • random_state (int or None, default=None) – Random seed (currently unused, reserved for future stochastic extensions).
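
One way to picture a per-group threshold grid search is sketched below. The selection rule is an assumption chosen for illustration (match each group's TPR and FPR to the pooled rates at a 0.5 cutoff); the library's actual objective, including how cost_weight enters it, is not shown here. The names rates_at and fit_group_thresholds are hypothetical.

```python
import numpy as np


def rates_at(y_true, proba, threshold):
    """Return (TPR, FPR) when predicting positive for proba >= threshold."""
    pred = proba >= threshold
    pos, neg = y_true == 1, y_true == 0
    tpr = float(pred[pos].mean()) if pos.any() else 0.0
    fpr = float(pred[neg].mean()) if neg.any() else 0.0
    return tpr, fpr


def fit_group_thresholds(y_true, proba, sensitive_attr, grid_size=101):
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=float)
    groups = np.asarray(sensitive_attr)
    grid = np.linspace(0.0, 1.0, grid_size)
    # Hypothetical target: the pooled (TPR, FPR) at a 0.5 cutoff
    target = np.array(rates_at(y_true, proba, 0.5))
    thresholds = {}
    for g in np.unique(groups).tolist():
        members = groups == g
        # L1 distance between this group's rates and the target, per candidate
        distances = [
            np.abs(np.array(rates_at(y_true[members], proba[members], t)) - target).sum()
            for t in grid
        ]
        thresholds[g] = float(grid[int(np.argmin(distances))])
    return thresholds


y_cal = np.array([1, 1, 0, 0, 1, 1, 0, 0])
proba_cal = np.array([0.92, 0.83, 0.31, 0.22, 0.47, 0.42, 0.16, 0.11])
sensitive_cal = np.array(["A"] * 4 + ["B"] * 4)
thresholds = fit_group_thresholds(y_cal, proba_cal, sensitive_cal)
print(thresholds)
```

Group B's scores are systematically lower, so it ends up with a lower threshold than group A, and both groups then share the same TPR and FPR on this calibration set.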

thresholds_

Mapping group -> optimal classification threshold.

Type:

dict

groups_

Unique groups seen during fit.

Type:

np.ndarray

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from endgame.fairness import CalibratedEqOdds
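>>> # X_train, y_train, X_cal, y_cal, sensitive_cal, X_test and sensitive_test are placeholders for user data
>>> base = LogisticRegression().fit(X_train, y_train)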
>>> ceqo = CalibratedEqOdds(estimator=base)
>>> ceqo.fit(X_cal, y_cal, sensitive_attr=sensitive_cal)
CalibratedEqOdds(...)
>>> y_pred = ceqo.predict(X_test, sensitive_attr=sensitive_test)

fit(X, y, sensitive_attr=None, **fit_params)[source]

Find per-group thresholds that equalize odds on calibration data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Calibration features.

  • y (array-like of shape (n_samples,)) – Calibration labels.

  • sensitive_attr (array-like of shape (n_samples,)) – Sensitive attribute values.

  • **fit_params (dict) – Ignored.

Return type:

CalibratedEqOdds

Returns:

self – Fitted post-processor.

Raises:

ValueError – If sensitive_attr is not provided or the base estimator lacks predict_proba.

predict(X, sensitive_attr=None)[source]

Predict class labels using per-group thresholds.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • sensitive_attr (array-like of shape (n_samples,)) – Sensitive attribute values.

Return type:

ndarray

Returns:

np.ndarray of shape (n_samples,) – Predicted class labels.

Raises:

ValueError – If sensitive_attr is not provided.
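
Applying per-group thresholds amounts to a lookup followed by a comparison. The sketch below assumes a thresholds mapping shaped like thresholds_ and assumes that a probability equal to the threshold counts as positive; predict_with_group_thresholds is a hypothetical helper, not the method itself.

```python
import numpy as np


def predict_with_group_thresholds(proba_pos, sensitive_attr, thresholds):
    """Label a sample positive when its probability reaches its group's threshold."""
    if sensitive_attr is None:
        raise ValueError("sensitive_attr is required to select per-group thresholds")
    proba_pos = np.asarray(proba_pos, dtype=float)
    groups = np.asarray(sensitive_attr)
    # Look up the threshold that applies to each sample's group
    cutoffs = np.array([thresholds[g] for g in groups.tolist()])
    # Assumption: ties (probability == threshold) are labeled positive
    return (proba_pos >= cutoffs).astype(int)


thresholds = {"A": 0.6, "B": 0.4}
y_pred = predict_with_group_thresholds(
    proba_pos=[0.55, 0.70, 0.45, 0.30],
    sensitive_attr=["A", "A", "B", "B"],
    thresholds=thresholds,
)
print(y_pred)
```

A score of 0.55 is rejected in group A (threshold 0.6) while 0.45 is accepted in group B (threshold 0.4), which is why predict needs the sensitive attribute at inference time.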

predict_proba(X)[source]

Return raw probabilities from the base estimator.

Post-processing adjusts thresholds, not probabilities. This method exposes the underlying predicted probabilities for transparency.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input samples.

Return type:

ndarray

Returns:

np.ndarray of shape (n_samples, n_classes) – Class probability estimates from the base estimator.

set_fit_request(*, sensitive_attr='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sensitive_attr (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sensitive_attr parameter in fit.

  • self (CalibratedEqOdds)

Returns:

self (object) – The updated object.

Return type:

CalibratedEqOdds

set_predict_request(*, sensitive_attr='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sensitive_attr (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sensitive_attr parameter in predict.

  • self (CalibratedEqOdds)

Returns:

self (object) – The updated object.

Return type:

CalibratedEqOdds

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

  • self (CalibratedEqOdds)

Returns:

self (object) – The updated object.

Return type:

CalibratedEqOdds

class endgame.fairness.FairnessReport(estimator, X, y, sensitive_attr, sensitive_name='sensitive_attr', threshold=0.5)[source]

Bases: object

Generate an HTML fairness report for a binary classifier.

Computes demographic parity, equalized odds, disparate impact, and per-group calibration metrics, then renders them into a standalone HTML document.

Parameters:
  • estimator (sklearn classifier) – A fitted classifier with predict and optionally predict_proba.

  • X (array-like of shape (n_samples, n_features)) – Evaluation features.

  • y (array-like of shape (n_samples,)) – Ground truth labels (binary: 0 or 1).

  • sensitive_attr (array-like of shape (n_samples,)) – Sensitive attribute values.

  • sensitive_name (str, default="sensitive_attr") – Human-readable name for the sensitive attribute (used in report headings).

  • threshold (float, default=0.5) – Classification threshold for converting probabilities to labels (only used when the estimator supports predict_proba).

Examples

>>> from sklearn.linear_model import LogisticRegression
>>> from endgame.fairness import FairnessReport
>>> # X_train, y_train, X_test, y_test and sensitive_test are placeholders for user data
>>> model = LogisticRegression().fit(X_train, y_train)
>>> report = FairnessReport(model, X_test, y_test, sensitive_test)
>>> html = report.generate()
>>> report.save("report.html")

compute_metrics()[source]

Compute all fairness metrics.

Return type:

Dict[str, Any]

Returns:

dict – Dictionary with keys "demographic_parity", "equalized_odds", "disparate_impact", "calibration", "y_pred", "y_proba".

generate()[source]

Generate the full HTML fairness report.

Return type:

str

Returns:

str – Complete HTML document as a string.

save(path)[source]

Save the HTML report to a file.

Parameters:

path (str) – Output file path.

Return type:

str

Returns:

str – The path the report was saved to.
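
The generate and save steps can be pictured with a stripped-down stand-in that renders one metric table and writes it to disk. Everything here is hypothetical (function names, HTML layout, output file name); the real report contains more sections and its markup is not documented on this page.

```python
import html
import tempfile
from pathlib import Path


def render_rates_table(group_rates, sensitive_name="sensitive_attr"):
    """Render per-group positive rates as a minimal standalone HTML document."""
    heading = html.escape(sensitive_name)
    rows = "\n".join(
        f"<tr><td>{html.escape(str(group))}</td><td>{rate:.3f}</td></tr>"
        for group, rate in group_rates.items()
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>Fairness report: {heading}</title></head>\n"
        f"<body><table>\n<tr><th>{heading}</th><th>Positive rate</th></tr>\n"
        f"{rows}\n</table></body></html>"
    )


def save_report(document, path):
    """Write the HTML string to disk and return the path as a string."""
    Path(path).write_text(document, encoding="utf-8")
    return str(path)


document = render_rates_table({"A": 2 / 3, "B": 0.0}, sensitive_name="group")
out_path = save_report(
    document, Path(tempfile.gettempdir()) / "fairness_report_sketch.html"
)
print(out_path)
```

Escaping group labels and the attribute name with html.escape keeps user-supplied values from being interpreted as markup in the rendered report.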