# Calibration Guide

Endgame's calibration module covers conformal prediction, Venn-ABERS calibration,
and classical probability calibration methods. All classes follow sklearn
conventions: fit with `fit`, then call `predict`, `predict_proba`, or
`predict_interval` as appropriate to the task.

## Conformal Prediction (Classification)

`ConformalClassifier` wraps any classifier to produce prediction sets that contain
the true label with at least `1 - alpha` marginal coverage. No distributional
assumptions are required beyond exchangeability.

```python
from endgame.calibration import ConformalClassifier
from endgame.models import LGBMWrapper
from sklearn.model_selection import train_test_split

X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.2, random_state=42
)

base = LGBMWrapper(preset='endgame')
base.fit(X_train, y_train)

cc = ConformalClassifier(
    estimator=base,
    alpha=0.1,          # target miscoverage rate; aims for >= 90% marginal coverage
    method='lac',       # 'lac' (least ambiguous set-valued classifier) or 'aps' (adaptive prediction sets)
)

cc.fit(X_cal, y_cal)   # calibrate on hold-out set

# Returns a list of sets, one per test point
prediction_sets = cc.predict(X_test)
for i, pset in enumerate(prediction_sets[:5]):
    print(f"Sample {i}: possible classes = {pset}")

# For a single hard label per sample, fall back to the underlying classifier
preds = base.predict(X_test)

# Empirical coverage on a labelled evaluation set
cov = cc.coverage_score(X_eval, y_eval)
print(f"Empirical coverage: {cov:.3f}")  # expected near or above 0.90, up to sampling noise
```

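The `'lac'` score (Least Ambiguous set-valued Classifier) yields the smallest
sets on average, but its coverage can be uneven across easy and hard examples.
The `'aps'` score (Adaptive Prediction Sets) typically yields larger sets whose
size tracks the difficulty of each example, which tends to give more even
coverage across subgroups. Both carry the same `1 - alpha` marginal guarantee.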
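The calibration step behind a LAC-style score is small enough to write out. The following is a minimal standard-library sketch of split conformal classification, not Endgame's implementation: `lac_threshold` and `lac_prediction_set` are hypothetical helper names, and the probabilities are made-up toy values.

```python
import math

def lac_threshold(cal_probs, cal_labels, alpha):
    """Conformal threshold q_hat for the score 1 - p(true class)."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every class is kept
    return scores[rank - 1]

def lac_prediction_set(probs, q_hat):
    """Keep every class whose score 1 - p is at most the threshold."""
    return {k for k, p in enumerate(probs) if 1.0 - p <= q_hat}

# Toy calibration data: per-class probabilities and true labels (made up).
cal_probs = [
    [0.9, 0.1, 0.0], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7], [0.5, 0.4, 0.1], [0.3, 0.3, 0.4],
    [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.4, 0.5, 0.1],
]
cal_labels = [0, 0, 1, 2, 1, 2, 0, 1, 0]

q_hat = lac_threshold(cal_probs, cal_labels, alpha=0.2)
confident = lac_prediction_set([0.85, 0.10, 0.05], q_hat)
ambiguous = lac_prediction_set([0.45, 0.42, 0.13], q_hat)
print(q_hat, confident, ambiguous)
```

A confident prediction produces a singleton set, while two close probabilities produce a two-class set. The threshold is the only quantity learned from the calibration data.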

## Conformal Prediction (Regression)

`ConformalRegressor` produces prediction intervals with guaranteed marginal
coverage under exchangeability. With plain split conformal, the interval
half-width is a single constant shared by all test points. The width adapts to
the local difficulty of each test point only when a difficulty estimator is
provided.

```python
import numpy as np
from endgame.calibration import ConformalRegressor
from endgame.models import LGBMWrapper

base = LGBMWrapper(preset='endgame')
base.fit(X_train, y_train)

cr = ConformalRegressor(
    estimator=base,
    alpha=0.05,          # 95% coverage
    method='split',      # 'split' (fast) or 'cv+' (cross-conformal, slower)
)

cr.fit(X_cal, y_cal)

# Returns a tuple of (lower, upper) arrays
lower, upper = cr.predict_interval(X_test)

widths = upper - lower
print(f"Median interval width: {np.median(widths):.4f}")

cov = cr.coverage_score(X_eval, y_eval)
print(f"Empirical coverage: {cov:.3f}")
```
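For intuition, split conformal regression with absolute residuals reduces to a single order statistic. The sketch below is a standard-library illustration under that assumption, not Endgame's code; `split_conformal_halfwidth` is a hypothetical name and the numbers are toy values.

```python
import math

def split_conformal_halfwidth(y_cal, y_pred_cal, alpha):
    """Half-width q_hat of the interval [pred - q_hat, pred + q_hat]."""
    residuals = sorted(abs(y - p) for y, p in zip(y_cal, y_pred_cal))
    n = len(residuals)
    # Finite-sample corrected rank of the residual used as the threshold.
    rank = math.ceil((n + 1) * (1 - alpha))
    return residuals[rank - 1] if rank <= n else float("inf")

# Toy calibration targets and base-model predictions (made up).
y_cal = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
y_pred_cal = [3.5, 4.0, 7.25, 9.0, 13.0, 12.5, 15.75]

q_hat = split_conformal_halfwidth(y_cal, y_pred_cal, alpha=0.25)
test_preds = [4.0, 10.0]
intervals = [(p - q_hat, p + q_hat) for p in test_preds]
widths = [hi - lo for lo, hi in intervals]
print(q_hat, intervals, widths)
```

Every interval has the same width `2 * q_hat` here, which is why a difficulty estimator or CQR is needed when the noise level varies across the input space.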

## Conformalized Quantile Regression (CQR)

`ConformizedQuantileRegressor` combines a quantile regressor with conformal
calibration to produce adaptive intervals. Intervals are wider where the
estimated quantiles are further apart, unlike split conformal, which applies one
fixed residual threshold to every point.

```python
from endgame.calibration import ConformizedQuantileRegressor
from endgame.models import LGBMWrapper

# Base model must support quantile regression
qr = LGBMWrapper(objective='quantile', preset='endgame')

cqr = ConformizedQuantileRegressor(
    estimator=qr,
    alpha=0.1,           # 90% coverage target
    quantile_low=0.05,   # lower quantile for the base regressor
    quantile_high=0.95,  # upper quantile for the base regressor
)

cqr.fit(X_train, y_train, X_cal=X_cal, y_cal=y_cal)

lower, upper = cqr.predict_interval(X_test)
```

CQR is the recommended method when prediction intervals of varying width are
needed. The conformity score is `max(q_low - y, y - q_high)`, and calibration
moves both ends of the raw quantile interval by a single scalar: outward when
the scalar is positive, inward when it is negative.

## Venn-ABERS Calibration

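`VennABERS` produces calibrated probability estimates without assuming a
parametric form for the calibration map. For each sample it outputs a pair of
probabilities `(p0, p1)`, one of which is guaranteed to be perfectly calibrated
provided the calibration and test data are exchangeable. The guarantee concerns
calibration in aggregate, not per-individual correctness.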

```python
from endgame.calibration import VennABERS
from endgame.models import LGBMWrapper

base = LGBMWrapper(preset='endgame')
base.fit(X_train, y_train)

va = VennABERS(estimator=base)
va.fit(X_cal, y_cal)

# Returns point probabilities (geometric mean of the interval bounds)
proba = va.predict_proba(X_test)

# Returns the full Venn-ABERS interval [p0, p1] per sample
intervals = va.predict_interval(X_test)
p0, p1 = intervals[:, 0], intervals[:, 1]

# Interval width indicates epistemic uncertainty
uncertainty = p1 - p0
```

Unlike Platt scaling or isotonic regression, Venn-ABERS has no hyperparameters
to tune, and its validity guarantee holds for any calibration-set size, although
the intervals widen when the set is small. It is particularly useful when the
base model has poorly calibrated raw probabilities (e.g., a gradient boosting
model).
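The interval `[p0, p1]` comes from fitting isotonic regression twice per test point, once with the test score labelled 0 and once labelled 1. The following naive standard-library sketch shows that idea for inductive Venn-ABERS. The helper names are hypothetical, the data is made up, and the merge rule `p1 / (1 - p0 + p1)` is one common choice from the literature that may differ from what `predict_proba` uses.

```python
def isotonic_value_at(scores, labels, query):
    """Isotonic fit (pool adjacent violators) evaluated at `query`,
    which must be one of `scores`."""
    # Aggregate tied scores first so they share one fitted value.
    groups = {}
    for s, y in zip(scores, labels):
        total, count = groups.get(s, (0.0, 0))
        groups[s] = (total + y, count + 1)
    blocks = []  # each block: [label sum, count, largest score in block]
    for s in sorted(groups):
        total, count = groups[s]
        blocks.append([total, count, s])
        # Merge while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            t, c, top = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
            blocks[-1][2] = top
    for total, count, top in blocks:
        if query <= top:
            return total / count
    raise ValueError("query must be one of the fitted scores")

def venn_abers(cal_scores, cal_labels, test_score):
    """Return (p0, p1) for one test score."""
    scores = cal_scores + [test_score]
    p0 = isotonic_value_at(scores, cal_labels + [0], test_score)
    p1 = isotonic_value_at(scores, cal_labels + [1], test_score)
    return p0, p1

# Toy calibration scores and binary labels (made up).
cal_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
cal_labels = [0, 0, 1, 0, 1, 1]

p0, p1 = venn_abers(cal_scores, cal_labels, test_score=0.5)
p_merged = p1 / (1.0 - p0 + p1)  # assumed merge rule, see lead-in
print(p0, p1, p_merged)
```

Refitting from scratch for every test point is slow. Practical implementations precompute the two isotonic fits over all candidate scores, but the outputs are the same.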

## Classical Probability Calibration

### Temperature Scaling

Temperature scaling divides the logits of a neural network (or any model exposing
logits) by a single learnable scalar `T`. It is the most common post-hoc
calibration technique for deep learning.

```python
from endgame.calibration import TemperatureScaling

ts = TemperatureScaling()
ts.fit(logits_cal, y_cal)    # calibrate on logits (pre-softmax)

calibrated_proba = ts.predict_proba(logits_test)
print(f"Learned temperature: {ts.temperature_:.4f}")
```
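To make the role of `T` concrete, the sketch below fits a temperature by minimising the negative log-likelihood with a golden-section search. It is a standard-library illustration, not the optimiser used by `TemperatureScaling`; the function names, search bounds, and toy logits are all assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits_batch, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(z, temperature)[y]) for z, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logits_batch, labels, lo=0.05, hi=20.0, iters=80):
    """Golden-section search; the NLL has a single minimum in T."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if nll(logits_batch, labels, c) < nll(logits_batch, labels, d):
            b = d
        else:
            a = c
    return (a + b) / 2

# Toy logits from an overconfident model: one confident mistake (made up).
logits_cal = [[4.0, 0.0], [3.0, 0.0], [0.0, 5.0], [4.0, 0.0], [0.0, 3.0], [5.0, 0.0]]
labels_cal = [0, 0, 1, 1, 1, 0]

temperature = fit_temperature(logits_cal, labels_cal)
print(temperature, softmax([4.0, 0.0], temperature))
```

A fitted temperature above 1 softens the probabilities of an overconfident model. Because every logit is divided by the same scalar, the predicted class never changes.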

### Platt Scaling

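Platt scaling fits a logistic regression on the model's raw scores. It is
effective when the raw scores are approximately normally distributed within each
class with similar variance, since that is the case in which a sigmoid is the
correct calibration map.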

```python
from endgame.calibration import PlattScaling

ps = PlattScaling()
ps.fit(scores_cal, y_cal)    # 1D array of decision scores

calibrated_proba = ps.predict_proba(scores_test)
```
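As an illustration of what is being fitted, the sketch below learns the two parameters of `sigmoid(a * score + b)` by plain gradient descent on the log loss. It uses only the standard library, omits the label smoothing from Platt's original method, and is not Endgame's implementation; `fit_platt`, the learning rate, and the toy scores are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit slope a and intercept b of sigmoid(a * s + b) by gradient descent."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Toy decision scores with overlapping classes (made up).
scores_toy = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels_toy = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = fit_platt(scores_toy, labels_toy)
calibrated = [sigmoid(a * s + b) for s in scores_toy]
print(a, b, calibrated)
```

The fitted map is monotone in the score, so the ranking of samples is preserved and only the probability values change.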

### Beta Calibration

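Beta calibration assumes the scores within each class follow a Beta distribution,
which yields a three-parameter calibration map fitted as a logistic regression on
`ln(s)` and `-ln(1 - s)`. It offers more flexibility than Platt scaling for scores
bounded in [0, 1] (e.g., already-softmaxed probabilities) and includes the
identity map as a special case.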

```python
from endgame.calibration import BetaCalibration

bc = BetaCalibration()
bc.fit(proba_cal, y_cal)   # uncalibrated probabilities in [0, 1]

calibrated_proba = bc.predict_proba(proba_test)
```
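The shape of the beta calibration map can be explored without fitting anything. The sketch below evaluates the three-parameter map for hand-picked parameters. `beta_calibration_map` is a hypothetical helper, and the parameter names `a`, `b`, `c` follow the beta calibration literature, not necessarily the attributes of `BetaCalibration`.

```python
import math

def beta_calibration_map(s, a, b, c, eps=1e-12):
    """Map an uncalibrated probability s in [0, 1] to a calibrated one."""
    s = min(max(s, eps), 1.0 - eps)  # keep the logarithms finite at 0 and 1
    z = a * math.log(s) - b * math.log(1.0 - s) + c
    return 1.0 / (1.0 + math.exp(-z))

inputs = (0.1, 0.5, 0.9)

# a = b = 1, c = 0 leaves probabilities unchanged.
identity = [beta_calibration_map(s, 1.0, 1.0, 0.0) for s in inputs]

# a = b < 1 pulls overconfident probabilities towards 0.5.
softened = [beta_calibration_map(s, 0.5, 0.5, 0.0) for s in inputs]

print(identity, softened)
```

Because the identity is inside the family, an already calibrated model is not forced through a distortion, which is not true of a sigmoid applied directly to probabilities.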

### Isotonic Calibration

Isotonic regression fits a non-parametric, monotone, piecewise-constant mapping
from scores to probabilities. It can correct any monotone distortion but tends to
overfit when the calibration set is small.

```python
from endgame.calibration import IsotonicCalibration

ic = IsotonicCalibration()
ic.fit(proba_cal, y_cal)

calibrated_proba = ic.predict_proba(proba_test)
```

## Evaluating Calibration Quality

`CalibrationAnalyzer` computes multiple calibration diagnostics and generates
reliability diagrams.

```python
from endgame.calibration import CalibrationAnalyzer

analyzer = CalibrationAnalyzer(n_bins=10, strategy='uniform')
analyzer.fit(proba_test, y_test)

# Scalar metrics
print(f"ECE  : {analyzer.ece_:.4f}")   # Expected Calibration Error
print(f"MCE  : {analyzer.mce_:.4f}")   # Maximum Calibration Error
print(f"Brier: {analyzer.brier_:.4f}") # Brier Score

# Reliability diagram (matplotlib figure)
fig = analyzer.plot_reliability_diagram(title="Model Calibration")
fig.savefig("reliability.png", dpi=150)

# Per-bin breakdown
print(analyzer.bin_stats_)  # DataFrame: bin_lower, bin_upper, fraction_pos, mean_conf, count
```
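The scalar metrics can be reproduced by hand for a binary problem, which helps when checking a reliability diagram. The sketch below uses equal-width bins and positive-class probabilities. It is a standard-library illustration under those assumptions and may differ in detail from `CalibrationAnalyzer` (for example in multiclass handling); `calibration_metrics` is a hypothetical name.

```python
def calibration_metrics(probs, labels, n_bins=10):
    """Return (ECE, MCE, Brier) for binary positive-class probabilities."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        gap = abs(frac_pos - mean_conf)
        ece += len(members) / n * gap  # gap weighted by bin population
        mce = max(mce, gap)            # worst single bin
    brier = sum((p - y) ** 2 for p, y in zip(probs, labels)) / n
    return ece, mce, brier

# Toy predictions: well calibrated at the low end, overconfident at the high end.
probs_toy = [0.15, 0.15, 0.85, 0.85]
labels_toy = [0, 0, 1, 0]

ece, mce, brier = calibration_metrics(probs_toy, labels_toy)
print(ece, mce, brier)
```

ECE averages the per-bin gap between confidence and observed frequency, MCE reports the worst bin, and the Brier score mixes calibration with sharpness, so the three can rank models differently.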

## Choosing a Calibration Method

| Method | Best for |
|---|---|
| `TemperatureScaling` | Neural networks with logit access; large calibration sets |
| `PlattScaling` | SVM or other margin-based models; unimodal score distributions |
| `BetaCalibration` | Models outputting probabilities; flexible boundary handling |
| `IsotonicCalibration` | Large calibration sets; monotone but non-sigmoid miscalibration patterns |
| `VennABERS` | Small calibration sets; validity under exchangeability alone; probability intervals |
| `ConformalClassifier` | Prediction sets with coverage guarantees |
| `ConformalRegressor` | Prediction intervals with coverage guarantees |
| `ConformizedQuantileRegressor` | Adaptive-width intervals; heteroscedastic regression |

## See Also

- [API Reference: calibration](../api/calibration)
- [Ensembles Guide](ensembles.md) for combining calibrated models
- [Models Guide](models.md) for base model options