Validation
- class endgame.validation.AdversarialValidator(estimator=None, sample_frac=1.0, cv=5, threshold=0.7, random_state=None, verbose=False)
Bases: EndgameEstimator
Detects train/test distribution drift using adversarial validation.
Trains a classifier to distinguish train from test data. An AUC near 0.5 means the two sets are indistinguishable; an AUC well above 0.5 (above the configured threshold) indicates distribution drift. Feature importances identify the drifting features.
This technique appears across winning competition solutions as a guard against leaderboard overfitting when CV scores do not correlate with the public leaderboard (LB).
- Parameters:
estimator (sklearn-compatible classifier, optional) – The classifier used for adversarial validation. If None, uses LightGBM if available, else RandomForest.
sample_frac (float, default=1.0) – Fraction of data to use (for large datasets).
cv (int, default=5) – Number of cross-validation folds.
threshold (float, default=0.7) – AUC threshold above which to flag significant drift.
random_state (int, optional) – Random seed for reproducibility.
verbose (bool, default=False) – Enable verbose output.
Examples
>>> from endgame.validation import AdversarialValidator
>>> av = AdversarialValidator(threshold=0.6)
>>> result = av.check_drift(X_train, X_test)
>>> print(f"Drift AUC: {result.auc_score:.3f}")
>>> if result.drift_severity == 'severe':
...     # Remove drifted features
...     drop_cols = result.drifted_features[:5]
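The underlying technique can be illustrated without the library. The following is a minimal sketch, assuming scikit-learn and a RandomForestClassifier as the adversarial model; the helper name adversarial_auc and the synthetic data are hypothetical and are not part of endgame.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def adversarial_auc(X_train, X_test, cv=5, random_state=0):
    """Return (auc, importances) of a classifier separating train from test."""
    X_all = np.vstack([X_train, X_test])
    # Label 0 = row came from train, label 1 = row came from test.
    is_test = np.r_[np.zeros(len(X_train), dtype=int), np.ones(len(X_test), dtype=int)]
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    folds = StratifiedKFold(n_splits=cv, shuffle=True, random_state=random_state)
    # Out-of-fold probabilities avoid an optimistic AUC.
    proba = cross_val_predict(clf, X_all, is_test, cv=folds, method="predict_proba")[:, 1]
    auc = roc_auc_score(is_test, proba)
    # Refit on everything to read which features separate the two sets.
    importances = clf.fit(X_all, is_test).feature_importances_
    return auc, importances


rng = np.random.default_rng(0)
X_tr = rng.normal(size=(300, 4))
X_te = rng.normal(size=(300, 4))
X_te[:, 2] += 3.0  # shift one feature to simulate drift
auc, imp = adversarial_auc(X_tr, X_te)
print(f"AUC={auc:.3f}, most drifted feature index={int(np.argmax(imp))}")
```

In this toy setup only feature 2 is shifted, so it should dominate the importances.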
- check_drift(X_train, X_test)
Check for distribution drift between train and test data.
- Parameters:
X_train (array-like of shape (n_train_samples, n_features)) – Training features.
X_test (array-like of shape (n_test_samples, n_features)) – Test features.
- Return type:
AdversarialValidationResult
- Returns:
AdversarialValidationResult – Result containing:
- auc_score: float (>0.5 indicates drift)
- drifted_features: List[str] (features with high importance)
- feature_importances: Dict[str, float]
- drift_severity: str (‘none’, ‘mild’, ‘moderate’, ‘severe’)
- get_test_like_samples(X_train, y_train, X_test, top_pct=0.2)
Get training samples most similar to test distribution.
Uses adversarial validation predictions to identify training samples that the classifier thinks look like test samples.
- Parameters:
X_train (array-like) – Training features.
y_train (array-like) – Training labels.
X_test (array-like) – Test features.
top_pct (float, default=0.2) – Top percentage of test-like samples to return.
- Returns:
X_selected (array-like) – Selected training features.
y_selected (array-like) – Selected training labels.
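As a rough illustration of the selection step, the sketch below scores each training row by its out-of-fold probability of being a test row and keeps the top fraction. It assumes scikit-learn with LogisticRegression as the adversarial model; select_test_like is a hypothetical helper, not the library's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def select_test_like(X_train, y_train, X_test, top_pct=0.2):
    """Keep the top_pct of training rows that look most like test rows."""
    X_all = np.vstack([X_train, X_test])
    is_test = np.r_[np.zeros(len(X_train), dtype=int), np.ones(len(X_test), dtype=int)]
    proba = cross_val_predict(
        LogisticRegression(max_iter=1000), X_all, is_test, cv=5, method="predict_proba"
    )[:, 1]
    train_scores = proba[: len(X_train)]  # P(row is test) for training rows only
    n_keep = max(1, int(round(top_pct * len(X_train))))
    keep = np.argsort(train_scores)[::-1][:n_keep]  # highest scores first
    return X_train[keep], y_train[keep]


rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))
y_train = rng.integers(0, 2, size=200)
X_test = rng.normal(1.5, 1.0, size=(200, 2))  # test set is shifted upward
X_sel, y_sel = select_test_like(X_train, y_train, X_test, top_pct=0.2)
print(X_sel.shape, round(float(X_train.mean()), 2), round(float(X_sel.mean()), 2))
```

Because the synthetic test set is shifted upward, the selected rows should have a higher feature mean than the full training set.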
- class endgame.validation.PurgedTimeSeriesSplit(n_splits=5, purge_gap=0, embargo_pct=0.01, max_train_size=None)
Bases: BaseCrossValidator
Time series CV with purging and embargo to prevent lookahead bias.
Essential for financial competitions (Optiver, Jane Street) where temporal leakage can severely overfit models.
Purging removes samples between train and validation that might contain information about the validation period.
Embargo adds a gap after validation to prevent using future information.
Examples
>>> from endgame.validation import PurgedTimeSeriesSplit
>>> cv = PurgedTimeSeriesSplit(n_splits=5, purge_gap=10, embargo_pct=0.01)
>>> for train_idx, val_idx in cv.split(X):
...     # train_idx ends purge_gap samples before val_idx starts
...     pass
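To make the purge gap concrete, here is a minimal walk-forward sketch using NumPy only. It is an assumption about the general technique, not the library's code: purged_walk_forward is a hypothetical helper, the fold sizing is one simple choice, and the embargo is left out because training data never follows the validation block in this forward-only layout.

```python
import numpy as np


def purged_walk_forward(n_samples, n_splits=5, purge_gap=0):
    """Yield (train_idx, val_idx) with purge_gap rows dropped before validation."""
    fold_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        val_start = i * fold_size
        # The last validation block absorbs any leftover rows.
        val_end = n_samples if i == n_splits else val_start + fold_size
        # Training stops purge_gap rows before validation begins.
        train_end = max(0, val_start - purge_gap)
        yield np.arange(0, train_end), np.arange(val_start, val_end)


for train_idx, val_idx in purged_walk_forward(100, n_splits=4, purge_gap=5):
    print(f"train 0..{train_idx[-1]}  |  val {val_idx[0]}..{val_idx[-1]}")
```

Each printed row shows a gap of five unused samples between the end of training and the start of validation.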
- split(X, y=None, groups=None)
Generate train/validation indices with purging and embargo.
- Parameters:
X (array-like) – Training data.
y (array-like, optional) – Target variable (ignored).
groups (array-like, optional) – Group labels (ignored).
- Yields:
train_idx (ndarray) – Training indices for this fold.
val_idx (ndarray) – Validation indices for this fold.
- class endgame.validation.StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=None)
Bases: BaseCrossValidator
Stratified K-Fold that respects groups.
Combines stratification (maintaining class balance) with group constraints (keeping all samples from a group in the same fold).
Essential when samples are related (e.g., patient_id, user_id) to prevent data leakage.
Examples
>>> from endgame.validation import StratifiedGroupKFold
>>> cv = StratifiedGroupKFold(n_splits=5)
>>> for train_idx, val_idx in cv.split(X, y, groups=patient_ids):
...     # No patient appears in both train and val
...     pass
- split(X, y, groups)
Generate stratified group-aware train/validation indices.
- Parameters:
X (array-like) – Training data.
y (array-like) – Target variable for stratification.
groups (array-like) – Group labels (e.g., patient_id).
- Yields:
train_idx (ndarray) – Training indices for this fold.
val_idx (ndarray) – Validation indices for this fold.
- class endgame.validation.MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=None)
Bases: BaseCrossValidator
Stratified K-Fold for multilabel classification.
Maintains label distribution across folds for multilabel problems using iterative stratification.
Examples
>>> from endgame.validation import MultilabelStratifiedKFold
>>> # y is shape (n_samples, n_labels) with binary labels
>>> cv = MultilabelStratifiedKFold(n_splits=5)
>>> for train_idx, val_idx in cv.split(X, y):
...     # Label proportions maintained across folds
...     pass
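For intuition about iterative stratification, the following is a simplified, deterministic sketch of the greedy idea: handle the rarest remaining label first and send each positive row to the fold that still needs that label most. The function iterative_stratify is hypothetical and is not claimed to match the library's algorithm in detail.

```python
import numpy as np


def iterative_stratify(Y, n_splits=5):
    """Assign each row of a binary label matrix Y to a fold index."""
    Y = np.asarray(Y, dtype=int)
    n_samples = Y.shape[0]
    want_total = np.full(n_splits, n_samples / n_splits)          # rows still wanted per fold
    want_label = np.tile(Y.sum(axis=0) / n_splits, (n_splits, 1))  # positives still wanted
    fold_of = np.full(n_samples, -1)
    while (fold_of == -1).any():
        unassigned = fold_of == -1
        counts = Y[unassigned].sum(axis=0)
        if counts.max() == 0:
            # Only rows without any label remain: fill the folds that are short.
            for i in np.flatnonzero(unassigned):
                f = int(np.argmax(want_total))
                fold_of[i] = f
                want_total[f] -= 1
            break
        # Process the rarest label that still has unassigned positives.
        label = int(np.argmin(np.where(counts > 0, counts, np.inf)))
        for i in np.flatnonzero(unassigned & (Y[:, label] == 1)):
            need = want_label[:, label]
            # Prefer the fold that most needs this label; break ties by fold size.
            candidates = np.flatnonzero(need == need.max())
            f = int(candidates[np.argmax(want_total[candidates])])
            fold_of[i] = f
            want_label[f] -= Y[i]
            want_total[f] -= 1
    return fold_of


rng = np.random.default_rng(0)
Y = (rng.random((200, 3)) < np.array([0.1, 0.3, 0.5])).astype(int)
folds = iterative_stratify(Y, n_splits=5)
for f in range(5):
    print(f, Y[folds == f].mean(axis=0).round(2))
```

The printed rows show the per-fold positive rate of each label, which should be close to the overall rates.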
- split(X, y, groups=None)
Generate multilabel-stratified train/validation indices.
Uses iterative stratification algorithm to maintain label proportions.
- Parameters:
X (array-like) – Training data.
y (array-like of shape (n_samples, n_labels)) – Multilabel target matrix.
groups (array-like, optional) – Ignored.
- Yields:
train_idx (ndarray) – Training indices for this fold.
val_idx (ndarray) – Validation indices for this fold.
- class endgame.validation.AdversarialKFold(n_splits=5, test_similarity_threshold=0.5, random_state=None)
Bases: BaseCrossValidator
K-Fold that weights folds by test-similarity.
Uses adversarial validation to identify training samples that look most like test data, then ensures each fold has similar proportions of test-like samples.
Examples
>>> from endgame.validation import AdversarialKFold
>>> cv = AdversarialKFold(n_splits=5)
>>> for train_idx, val_idx in cv.split(X_train, y, X_test=X_test):
...     # Each fold has similar proportion of test-like samples
...     pass
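One way to obtain folds with similar proportions of test-like samples is to bin adversarial scores and stratify on the bins. The sketch below assumes that approach, using quantile bins and scikit-learn's StratifiedKFold; adversarial_folds and n_bins are hypothetical, and the library may instead use test_similarity_threshold in a different way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def adversarial_folds(X_train, X_test, n_splits=5, n_bins=4, random_state=0):
    """Return (splits, bins): K-fold splits stratified on test-likeness bins."""
    X_all = np.vstack([X_train, X_test])
    is_test = np.r_[np.zeros(len(X_train), dtype=int), np.ones(len(X_test), dtype=int)]
    proba = cross_val_predict(
        LogisticRegression(max_iter=1000), X_all, is_test, cv=5, method="predict_proba"
    )[:, 1]
    scores = proba[: len(X_train)]  # test-likeness of each training row
    # Interior quantile edges turn the continuous score into n_bins classes.
    edges = np.quantile(scores, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(scores, edges)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return list(skf.split(X_train, bins)), bins


rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))
X_test = rng.normal(1.0, 1.0, size=(200, 2))
splits, bins = adversarial_folds(X_train, X_test)
for train_idx, val_idx in splits:
    print(np.bincount(bins[val_idx], minlength=4))
```

Each printed row counts how many validation samples fall in each test-likeness bin; the rows should be nearly identical.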
- fit(X_train, X_test)
Compute test similarity scores for training samples.
- Parameters:
X_train (array-like) – Training features.
X_test (array-like) – Test features.
- Returns:
self
- split(X, y=None, groups=None, X_test=None)
Generate adversarial-aware train/validation indices.
- Parameters:
X (array-like) – Training data.
y (array-like, optional) – Target variable.
groups (array-like, optional) – Ignored.
X_test (array-like, optional) – Test data for computing similarity (if not already fit).
- Yields:
train_idx (ndarray) – Training indices for this fold.
val_idx (ndarray) – Validation indices for this fold.
- set_fit_request(*, X_test='$UNCHANGED$', X_train='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
X_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_test parameter in fit.
X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
self (AdversarialKFold)
- Returns:
self (object) – The updated object.
- set_split_request(*, X_test='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the split method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to split if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to split.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
X_test (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_test parameter in split.
self (AdversarialKFold)
- Returns:
self (object) – The updated object.
- class endgame.validation.RepeatedStratifiedGroupKFold(n_splits=5, n_repeats=3, random_state=None)
Bases: BaseCrossValidator
Repeated Stratified Group K-Fold.
Runs multiple iterations of StratifiedGroupKFold with different random seeds for more robust CV estimates.
- class endgame.validation.CombinatorialPurgedKFold(n_folds=10, n_test_folds=2, purge_gap=0, embargo_pct=0.0)
Bases: BaseCrossValidator
Combinatorial Purged Cross-Validation for time series/financial data.
Implements the CPCV method from Marcos López de Prado’s “Advances in Financial Machine Learning” (Chapter 12). This method:
- Divides data into N sequential groups (folds)
- Uses combinations of k groups as test sets (C(N,k) total splits)
- Applies purging to remove training samples that overlap with test labels
- Applies embargo to remove training samples too close to test periods
This generates multiple “backtest paths” that can be recombined to compute statistics like the distribution of Sharpe ratios, enabling detection of backtest overfitting.
- Parameters:
n_folds (int, default=10) – Number of sequential groups to divide the data into. Must be >= 3.
n_test_folds (int, default=2) – Number of folds to use as test set in each split. Must be >= 1 and < n_folds. Total number of splits = C(n_folds, n_test_folds).
purge_gap (int, default=0) – Number of samples to purge (remove) from training set at boundaries with test set. These are samples whose labels might overlap with the test period.
embargo_pct (float, default=0.0) – Percentage of total samples to embargo after each test period. Embargo removes training samples that occur immediately after test samples to prevent lookahead bias from label leakage.
- fold_bounds_
Start and end indices for each fold (set after split is called).
Notes
The key insight of CPCV is that standard k-fold CV produces only ONE backtest path (the concatenation of all test folds). CPCV produces MULTIPLE backtest paths by using combinations of test folds, enabling statistical analysis of strategy performance across different scenarios.
For example, with n_folds=6 and n_test_folds=2:
- Standard KFold: 6 splits, 1 backtest path
- CPCV: C(6,2)=15 splits, multiple backtest paths
References
López de Prado, M. (2018). “Advances in Financial Machine Learning”. Chapter 12: Backtesting through Cross-Validation.
Examples
>>> from endgame.validation import CombinatorialPurgedKFold
>>> import numpy as np
>>>
>>> # Financial time series with 1000 samples
>>> X = np.random.randn(1000, 10)
>>> y = np.random.randn(1000)
>>>
>>> # Use 6 folds, 2 test folds per split, with purging and embargo
>>> cpcv = CombinatorialPurgedKFold(
...     n_folds=6,
...     n_test_folds=2,
...     purge_gap=10,
...     embargo_pct=0.01,
... )
>>>
>>> print(f"Number of splits: {cpcv.get_n_splits()}")  # 15 splits
>>>
>>> for train_idx, test_idx in cpcv.split(X):
...     # Train model on train_idx, evaluate on test_idx
...     pass
>>>
>>> # Get backtest paths for strategy analysis
>>> paths = cpcv.get_test_paths(X)
>>> print(f"Number of backtest paths: {len(paths)}")
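The combinatorial structure can be sketched with itertools and NumPy. This is an illustration under stated assumptions rather than the library's implementation: cpcv_splits is a hypothetical helper, purging is applied on both sides of each test block, the embargo is added after it, and the path count uses López de Prado's formula k/N · C(N, k).

```python
from itertools import combinations
from math import comb

import numpy as np


def cpcv_splits(n_samples, n_folds=6, n_test_folds=2, purge_gap=0, embargo_pct=0.0):
    """Yield (train_idx, test_idx) for every combination of test folds."""
    bounds = [(f[0], f[-1] + 1) for f in np.array_split(np.arange(n_samples), n_folds)]
    embargo = int(round(n_samples * embargo_pct))
    for test_folds in combinations(range(n_folds), n_test_folds):
        is_test = np.zeros(n_samples, dtype=bool)
        drop = np.zeros(n_samples, dtype=bool)
        for f in test_folds:
            start, end = bounds[f]
            is_test[start:end] = True
            # Purge training rows just before the test block.
            drop[max(0, start - purge_gap):start] = True
            # Purge plus embargo for training rows just after the test block.
            drop[end:min(n_samples, end + purge_gap + embargo)] = True
        yield np.flatnonzero(~is_test & ~drop), np.flatnonzero(is_test)


n_folds, n_test_folds = 6, 2
splits = list(cpcv_splits(1000, n_folds, n_test_folds, purge_gap=10, embargo_pct=0.01))
n_paths = n_test_folds * comb(n_folds, n_test_folds) // n_folds
print(f"splits={len(splits)}, backtest paths={n_paths}")
```

With 6 folds and 2 test folds, each fold appears as a test fold in 5 of the 15 splits, which is where the 5 reconstructible paths come from.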
- property n_test_paths: int
Number of reconstructible backtest paths.
Each path is a complete sequence through the data using different combinations of the test folds.
- split(X, y=None, groups=None)
Generate combinatorial purged train/test splits.
- Parameters:
X (array-like) – Training data. Used only to determine the number of samples.
y (array-like, optional) – Target variable (ignored, but accepted for sklearn compatibility).
groups (array-like, optional) – Group labels (ignored).
- Yields:
train_idx (np.ndarray) – Training indices for this split (purged and embargoed).
test_idx (np.ndarray) – Test indices for this split.
- get_test_paths(X)
Reconstruct all possible backtest paths from the splits.
A backtest path is a sequence of test sets that together cover the entire dataset in temporal order. CPCV allows reconstructing multiple such paths from the combinatorial splits.
- get_fold_info(X)
Get detailed information about the fold structure.
- Parameters:
X (array-like) – Training data.
- Returns:
Dict[str, Any] – Dictionary containing:
- n_samples: Total number of samples
- n_folds: Number of folds
- n_test_folds: Number of test folds per split
- n_splits: Total number of splits
- n_test_paths: Number of backtest paths
- fold_sizes: List of fold sizes
- purge_gap: Purge gap setting
- embargo_size: Embargo size in samples
- endgame.validation.cross_validate_oof(estimator, X, y, cv=5, scoring=None, fit_params=None, return_models=True, return_indices=False, groups=None, verbose=False)
Perform cross-validation and return out-of-fold predictions.
This is the standard approach for building stacked ensembles and getting unbiased training set predictions.
- Parameters:
estimator (sklearn-compatible estimator) – The model to cross-validate.
X (array-like of shape (n_samples, n_features)) – Training features.
y (array-like of shape (n_samples,)) – Target values.
cv (int or CV splitter, default=5) – Cross-validation strategy.
scoring (str or callable, optional) – Scoring metric. If None, uses estimator’s default.
fit_params (dict, optional) – Additional parameters to pass to estimator.fit().
return_models (bool, default=True) – Whether to return trained models from each fold.
return_indices (bool, default=False) – Whether to return train/val indices for each fold.
groups (array-like, optional) – Group labels for group-aware CV.
verbose (bool, default=False) – Print fold scores during cross-validation.
- Return type:
OOFResult
- Returns:
OOFResult – Result with the following fields:
- oof_predictions: Out-of-fold predictions
- fold_scores: Validation score for each fold
- mean_score: Mean score across folds
- std_score: Standard deviation of scores
- models: List of trained models (if return_models=True)
- fold_indices: List of (train_idx, val_idx) tuples
Examples
>>> from endgame.validation import cross_validate_oof
>>> result = cross_validate_oof(model, X, y, cv=5, scoring='roc_auc')
>>> print(f"CV Score: {result.mean_score:.4f} ± {result.std_score:.4f}")
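The out-of-fold mechanism itself is short enough to sketch with scikit-learn. The helper oof_predict below is hypothetical and covers only binary classification with ROC AUC; it illustrates the idea that every training row is predicted by a model that never saw it, and is not the library's implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def oof_predict(estimator, X, y, n_splits=5, random_state=0):
    """Return (oof, scores, models) for a binary classifier."""
    oof = np.zeros(len(y), dtype=float)
    scores, models = [], []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, val_idx in cv.split(X, y):
        # A fresh clone per fold keeps the folds independent.
        model = clone(estimator).fit(X[train_idx], y[train_idx])
        # Each row is predicted only by the model that did not train on it.
        oof[val_idx] = model.predict_proba(X[val_idx])[:, 1]
        scores.append(roc_auc_score(y[val_idx], oof[val_idx]))
        models.append(model)
    return oof, scores, models


X, y = make_classification(n_samples=300, n_features=8, random_state=0)
oof, scores, models = oof_predict(LogisticRegression(max_iter=1000), X, y)
print(f"CV Score: {np.mean(scores):.4f} ± {np.std(scores):.4f}")
```

The oof array can then serve as a leakage-free input feature for a second-level (stacked) model.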
- endgame.validation.check_cv_lb_correlation(cv_scores, lb_scores)
Compute correlation between CV and leaderboard scores.
Helps validate CV strategy by checking if CV improvements translate to LB improvements.
- Returns:
Dict[str, float] – Dictionary containing:
- pearson: Pearson correlation coefficient
- spearman: Spearman rank correlation
- rmse: RMSE between normalized scores
Examples
>>> from endgame.validation import check_cv_lb_correlation
>>> cv_scores = [0.85, 0.86, 0.87, 0.88]
>>> lb_scores = [0.82, 0.83, 0.84, 0.85]
>>> result = check_cv_lb_correlation(cv_scores, lb_scores)
>>> print(f"Correlation: {result['pearson']:.3f}")
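The three reported quantities can be approximated with NumPy and SciPy. In the sketch below, cv_lb_agreement is a hypothetical helper, and the use of z-score normalization for the RMSE is an assumption, since the documentation does not say how scores are normalized.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def cv_lb_agreement(cv_scores, lb_scores):
    """Return Pearson, Spearman, and RMSE between z-scored CV and LB scores."""
    cv = np.asarray(cv_scores, dtype=float)
    lb = np.asarray(lb_scores, dtype=float)
    # Assumption: "normalized" means z-scored, which removes offset and scale.
    cv_z = (cv - cv.mean()) / cv.std()
    lb_z = (lb - lb.mean()) / lb.std()
    return {
        "pearson": float(pearsonr(cv, lb)[0]),
        "spearman": float(spearmanr(cv, lb)[0]),
        "rmse": float(np.sqrt(np.mean((cv_z - lb_z) ** 2))),
    }


cv_scores = [0.85, 0.86, 0.87, 0.88]
lb_scores = [0.82, 0.83, 0.84, 0.85]
result = cv_lb_agreement(cv_scores, lb_scores)
print({k: round(v, 3) for k, v in result.items()})
```

Here the leaderboard scores are the CV scores minus a constant, so both correlations are essentially 1 and the normalized RMSE is essentially 0.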
- class endgame.validation.NestedCV(estimator=None, search=None, outer_cv=5, scoring='auto', return_oof=True, random_state=None, verbose=0)
Bases: object
Nested cross-validation for unbiased model evaluation.
The inner loop performs model selection (hyperparameter tuning or algorithm comparison) and the outer loop estimates generalization performance using the best model from each inner fold.
- Parameters:
estimator (estimator or None) – Base estimator to evaluate. If search is provided, this is ignored (the search object contains the estimator).
search (estimator with fit/predict or None) – A search object (e.g., GridSearchCV, RandomizedSearchCV, OptunaOptimizer) that performs inner-loop model selection. Must have best_estimator_ and best_params_ after fitting. If None, estimator is used directly without inner tuning.
outer_cv (int or CV splitter, default=5) – Number of outer folds or a CV splitter object.
scoring (str or callable, default='auto') – Scoring metric. ‘auto’ uses accuracy for classifiers, r2 for regressors. Can be a string key or a callable(y_true, y_pred).
return_oof (bool, default=True) – Whether to return out-of-fold predictions.
random_state (int or None, default=None) – Random state for reproducibility.
verbose (int, default=0) – Verbosity level. 0=silent, 1=progress, 2=detailed.
Example
>>> from endgame.validation import NestedCV
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import GridSearchCV
>>>
>>> # With hyperparameter search
>>> search = GridSearchCV(
...     RandomForestClassifier(random_state=42),
...     param_grid={'n_estimators': [50, 100, 200]},
...     cv=3, scoring='accuracy', refit=True
... )
>>> ncv = NestedCV(search=search, outer_cv=5)
>>> result = ncv.evaluate(X, y)
>>>
>>> # Without search (just evaluate a fixed model)
>>> ncv = NestedCV(estimator=RandomForestClassifier(n_estimators=100))
>>> result = ncv.evaluate(X, y)
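For comparison, the same inner/outer structure can be expressed in plain scikit-learn by passing a search object to cross_val_score. This sketch uses synthetic data and a deliberately small grid; it shows the general nested CV pattern and does not reproduce NestedCV's return values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Inner loop: hyperparameter selection, rerun inside every outer training fold.
inner = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 30]},
    cv=3,
    scoring="accuracy",
)
# Outer loop: scores the whole "search then refit" procedure on held-out folds.
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_scores = cross_val_score(inner, X, y, cv=outer, scoring="accuracy")
print(f"Nested CV accuracy: {outer_scores.mean():.3f} ± {outer_scores.std():.3f}")
```

Because tuning never sees the outer validation fold, the outer scores estimate the performance of the tuning procedure rather than of one fixed hyperparameter setting.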
- evaluate(X, y, groups=None)
Run nested cross-validation.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training features.
y (array-like of shape (n_samples,)) – Target values.
groups (array-like of shape (n_samples,), optional) – Group labels for GroupKFold-style splitting.
- Returns:
NestedCVResult – Results containing scores, best params, and OOF predictions.
- class endgame.validation.NestedCVResult(outer_scores=<factory>, mean_score=0.0, std_score=0.0, best_params=<factory>, oof_predictions=None, inner_scores=<factory>, scoring='accuracy')
Bases: object
Results from nested cross-validation.
- oof_predictions
Out-of-fold predictions (if return_oof=True).
- Type:
ndarray or None