Utils
- endgame.utils.quadratic_weighted_kappa(y_true, y_pred, labels=None)
Quadratic Weighted Kappa (QWK) metric.
Used in education competitions (e.g., essay scoring). Measures agreement between two ratings with quadratic weighting.
- Parameters:
y_true (array-like) – True labels.
y_pred (array-like) – Predicted labels.
labels (List[int], optional) – List of labels to use for the confusion matrix.
- Return type:
float
- Returns:
float – QWK score in the range [-1, 1], where 1 is perfect agreement.
Examples
>>> y_true = [1, 2, 3, 4, 5]
>>> y_pred = [1, 2, 3, 4, 4]
>>> qwk = quadratic_weighted_kappa(y_true, y_pred)
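As a rough illustration of what "agreement with quadratic weighting" means, the following sketch computes QWK from the observed and chance-expected confusion matrices using only the standard library. The function name `quadratic_weighted_kappa_sketch` is hypothetical, and the sketch assumes ordinal, equally spaced labels and at least two distinct labels; it is not necessarily how `endgame.utils.quadratic_weighted_kappa` is implemented.

```python
def quadratic_weighted_kappa_sketch(y_true, y_pred, labels=None):
    """Quadratic weighted kappa from observed vs. chance-expected confusion matrices."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    n_labels, n = len(labels), len(y_true)

    # Observed confusion matrix and the marginal histogram of each rater.
    observed = [[0.0] * n_labels for _ in range(n_labels)]
    for t, p in zip(y_true, y_pred):
        observed[index[t]][index[p]] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(col) for col in zip(*observed)]

    numerator = denominator = 0.0
    for i in range(n_labels):
        for j in range(n_labels):
            weight = (i - j) ** 2 / (n_labels - 1) ** 2  # quadratic penalty
            expected = hist_true[i] * hist_pred[j] / n   # count expected by chance
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator


# Same data as the example above: one prediction is off by a single grade.
qwk = quadratic_weighted_kappa_sketch([1, 2, 3, 4, 5], [1, 2, 3, 4, 4])
```

Because the penalty grows with the squared distance between grades, a one-grade miss costs far less than a four-grade miss.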
- endgame.utils.map_at_k(y_true, y_pred, k=5)
Mean Average Precision @ K.
For ranking competitions where each sample has multiple relevant items.
- Parameters:
- Return type:
float
- Returns:
float – MAP@K score.
Examples
>>> y_true = [[1, 2, 3], [4, 5]]
>>> y_pred = [[1, 3, 5, 2, 4], [4, 1, 5, 2, 3]]
>>> score = map_at_k(y_true, y_pred, k=5)
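To make the metric concrete, here is a minimal sketch of MAP@K on the example data above. It assumes the common competition convention of dividing each sample's score by min(len(actual), k) and counting each relevant item once; the helper names are hypothetical and the library's own normalization may differ.

```python
def average_precision_at_k(actual, predicted, k=5):
    """AP@K for one sample: precision at each hit, divided by min(len(actual), k)."""
    relevant = set(actual)
    if not relevant:
        return 0.0
    hits, score, seen = 0, 0.0, set()
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant and item not in seen:  # count each relevant item once
            hits += 1
            score += hits / rank  # precision at this cut-off
        seen.add(item)
    return score / min(len(relevant), k)


def map_at_k_sketch(y_true, y_pred, k=5):
    """Mean of the per-sample AP@K values."""
    pairs = list(zip(y_true, y_pred))
    return sum(average_precision_at_k(a, p, k) for a, p in pairs) / len(pairs)


score = map_at_k_sketch([[1, 2, 3], [4, 5]], [[1, 3, 5, 2, 4], [4, 1, 5, 2, 3]], k=5)
```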
- endgame.utils.ndcg_at_k(y_true, y_pred, k=10)
Normalized Discounted Cumulative Gain @ K.
Used in ranking competitions.
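The entry above does not describe the expected input format, so the following is only a sketch of the NDCG@K idea. It assumes the caller passes the graded relevance of each item in predicted rank order and uses linear gain with a log2 discount. The names `dcg_at_k` and `ndcg_at_k_sketch` are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: each relevance is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k_sketch(relevances_in_predicted_order, k=10):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances_in_predicted_order, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items at all
    return dcg_at_k(relevances_in_predicted_order, k) / ideal


perfect = ndcg_at_k_sketch([3, 2, 1, 0])   # already in the ideal order
swapped = ndcg_at_k_sketch([0, 1], k=2)    # the relevant item is ranked second
```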
- endgame.utils.competition_metric(metric_name)
Get metric function by name.
Handles both sklearn metrics and competition-specific metrics.
- class endgame.utils.SubmissionHelper(id_col='id', target_col='target', float_precision=6)
Bases: object
Helper for generating properly formatted submission files.
Handles common submission formats for Kaggle competitions.
- Parameters:
Examples
>>> helper = SubmissionHelper(id_col='Id', target_col='Prediction')
>>> helper.to_csv(predictions, ids, 'submission.csv')
>>> helper.validate('submission.csv', 'sample_submission.csv')
- to_csv(predictions, ids=None, filepath='submission.csv', sample_submission=None)
Generate submission CSV file.
- Parameters:
- Return type:
str
- Returns:
str – Path to generated submission file.
- validate(submission_path, sample_submission_path)
Validate submission against sample submission.
- class endgame.utils.SeedEverything(seed=42, restore=False)
Bases: object
Context manager for reproducible experiments.
Sets random seeds on entry and optionally restores state on exit.
- Parameters:
Examples
>>> with SeedEverything(42):
...     # Reproducible code here
...     pass
>>> seed_ctx = SeedEverything(42)
>>> with seed_ctx:
...     result = train_model()
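The seed-on-entry, restore-on-exit behavior can be sketched with the standard library alone. The class name `SeedScope` is hypothetical and the sketch only touches Python's `random` module; the real `SeedEverything` presumably also covers NumPy and the deep learning frameworks.

```python
import random


class SeedScope:
    """Seed Python's `random` on entry; optionally restore the previous state on exit."""

    def __init__(self, seed=42, restore=False):
        self.seed = seed
        self.restore = restore
        self._saved_state = None

    def __enter__(self):
        if self.restore:
            self._saved_state = random.getstate()  # remember the caller's RNG state
        random.seed(self.seed)
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.restore and self._saved_state is not None:
            random.setstate(self._saved_state)  # undo the seeding for outside code
        return False  # never swallow exceptions


with SeedScope(42):
    first = [random.random() for _ in range(3)]
with SeedScope(42):
    second = [random.random() for _ in range(3)]
```

With `restore=True`, code outside the `with` block continues from the random state it had before entry, so the seeded block does not disturb the surrounding stream.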
- endgame.utils.seed_everything(seed=42)
Set random seeds for reproducibility.
Sets seeds for:
- Python random
- NumPy
- PyTorch (if available)
- TensorFlow (if available)
- CUDA (if available)
Also sets environment variables for deterministic behavior.
Examples
>>> from endgame.utils import seed_everything
>>> seed_everything(42)
- endgame.utils.sharpe_ratio(returns, risk_free_rate=0.0, annualization_factor=252.0)
Calculate the annualized Sharpe ratio.
- Parameters:
- Return type:
float
- Returns:
float – Annualized Sharpe ratio.
Examples
>>> import numpy as np
>>> returns = np.random.randn(252) * 0.01 + 0.0005  # Daily returns
>>> sr = sharpe_ratio(returns)
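For orientation, the usual definition of the annualized Sharpe ratio is the mean excess return over its standard deviation, scaled by the square root of the annualization factor. The sketch below assumes the sample standard deviation (ddof=1) and a risk-free rate expressed per period. The name `sharpe_ratio_sketch` is hypothetical, and the library may make different choices.

```python
import math
import statistics


def sharpe_ratio_sketch(returns, risk_free_rate=0.0, annualization_factor=252.0):
    """Mean excess return / sample std of excess returns, scaled by sqrt(factor)."""
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.fmean(excess)
    volatility = statistics.stdev(excess)  # assumption: sample std (ddof=1)
    return mean_excess / volatility * math.sqrt(annualization_factor)


# Two daily returns with mean 0.02 and sample std sqrt(0.0002).
sr = sharpe_ratio_sketch([0.01, 0.03])
```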
- endgame.utils.sharpe_ratio_std(sharpe, n_obs, skewness=0.0, kurtosis=3.0)
Calculate the standard error of the Sharpe ratio estimate.
Uses the Lo (2002) / Mertens (2002) correction for non-normality.
- Parameters:
- Return type:
float
- Returns:
float – Standard error of the Sharpe ratio.
Notes
The formula accounts for:
- Sampling variability
- Non-normal returns (skewness and fat tails)
References
Lo, A. (2002). “The Statistics of Sharpe Ratios.” Financial Analysts Journal, 58(4), 36-52.
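The non-normality correction is usually written as Var(SR) ≈ (1 - γ3·SR + (γ4 - 1)/4·SR²) / (n - 1), with γ3 the skewness and γ4 the (non-excess) kurtosis. The sketch below assumes that form, with `sharpe` expressed per observation period (not annualized) and an n - 1 denominator. The name `sharpe_ratio_std_sketch` is hypothetical.

```python
import math


def sharpe_ratio_std_sketch(sharpe, n_obs, skewness=0.0, kurtosis=3.0):
    """Standard error of an estimated Sharpe ratio with skewness/kurtosis correction."""
    variance = (1.0 - skewness * sharpe
                + (kurtosis - 1.0) / 4.0 * sharpe ** 2) / (n_obs - 1)
    return math.sqrt(variance)


se_zero_sr = sharpe_ratio_std_sketch(sharpe=0.0, n_obs=101)        # only 1 / (n - 1) remains
se_normal = sharpe_ratio_std_sketch(sharpe=1.0, n_obs=151)         # (1 + 0.5) / 150
se_fat_tails = sharpe_ratio_std_sketch(sharpe=1.0, n_obs=151, kurtosis=6.0)
```

Negative skewness and kurtosis above 3 both widen the standard error, which is why the same track record is less convincing for fat-tailed strategies.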
- endgame.utils.probabilistic_sharpe_ratio(sharpe, benchmark_sharpe, n_obs, skewness=0.0, kurtosis=3.0)
Calculate the Probabilistic Sharpe Ratio (PSR).
PSR is the probability that the true Sharpe ratio exceeds the benchmark, accounting for non-normality of returns.
- Parameters:
- Return type:
float
- Returns:
float – Probability in [0, 1] that true SR > benchmark SR.
Examples
>>> # Test if strategy beats SR = 0
>>> psr = probabilistic_sharpe_ratio(sharpe=1.5, benchmark_sharpe=0,
...                                  n_obs=252, skewness=-0.2, kurtosis=4.0)
>>> print(f"Probability true SR > 0: {psr:.2%}")
Notes
PSR corrects for:
- Sample length (finite track record)
- Non-normal returns (skewness and fat tails)
It does not correct for multiple testing; use the Deflated Sharpe Ratio (DSR) for that.
References
Bailey, D.H. and López de Prado, M. (2012). “The Sharpe Ratio Efficient Frontier.” Journal of Risk, 15(2), 3-44.
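A minimal sketch of the PSR calculation, assuming the form given in the cited paper: the gap between the observed and benchmark Sharpe ratios is divided by the standard error of the estimate and passed through the standard normal CDF. Both Sharpe ratios are assumed to be per observation period, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def probabilistic_sharpe_ratio_sketch(sharpe, benchmark_sharpe, n_obs,
                                      skewness=0.0, kurtosis=3.0):
    """P(true SR > benchmark SR) under a normal approximation of the SR estimator."""
    std_err = math.sqrt((1.0 - skewness * sharpe
                         + (kurtosis - 1.0) / 4.0 * sharpe ** 2) / (n_obs - 1))
    return NormalDist().cdf((sharpe - benchmark_sharpe) / std_err)


psr = probabilistic_sharpe_ratio_sketch(sharpe=0.1, benchmark_sharpe=0.0, n_obs=252)
psr_at_benchmark = probabilistic_sharpe_ratio_sketch(0.1, 0.1, n_obs=252)
```

When the observed ratio equals the benchmark the result is exactly 0.5, and it approaches 1 as the track record lengthens.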
- endgame.utils.expected_max_sharpe(n_trials, sharpe_std, mean_sharpe=0.0)
Calculate the expected maximum Sharpe ratio under the null hypothesis.
This is the expected maximum SR when all strategies have true SR = mean_sharpe, but we observe inflated values due to multiple testing.
- Parameters:
- Return type:
float
- Returns:
float – Expected maximum Sharpe ratio E[max{SR_i}].
Notes
Uses the approximation from Bailey & López de Prado (2014):
E[max{SR}] ≈ μ + σ * [(1-γ)*Φ^(-1)(1-1/N) + γ*Φ^(-1)(1-1/(N*e))]
where γ is the Euler-Mascheroni constant.
Examples
>>> # After 100 trials, what SR do we expect by chance?
>>> e_max = expected_max_sharpe(n_trials=100, sharpe_std=0.5)
>>> print(f"Expected max SR: {e_max:.2f}")
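The approximation in the Notes translates almost directly into code. The sketch below uses `statistics.NormalDist().inv_cdf` for Φ^(-1) and rejects fewer than two trials, because Φ^(-1)(0) is undefined. The name `expected_max_sharpe_sketch` and the input check are assumptions, not the library's behavior.

```python
import math
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def expected_max_sharpe_sketch(n_trials, sharpe_std, mean_sharpe=0.0):
    """E[max SR] across n_trials strategies that all share the same true SR."""
    if n_trials < 2:
        raise ValueError("the approximation needs at least two trials")
    inv_cdf = NormalDist().inv_cdf
    z = ((1.0 - EULER_MASCHERONI) * inv_cdf(1.0 - 1.0 / n_trials)
         + EULER_MASCHERONI * inv_cdf(1.0 - 1.0 / (n_trials * math.e)))
    return mean_sharpe + sharpe_std * z


e_max_100 = expected_max_sharpe_sketch(n_trials=100, sharpe_std=0.5)
e_max_1000 = expected_max_sharpe_sketch(n_trials=1000, sharpe_std=0.5)
```

The expected maximum grows with the number of trials even though no strategy has any true edge, which is the selection bias that the DSR corrects for.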
- endgame.utils.deflated_sharpe_ratio(sharpe, n_trials, sharpe_std_trials, n_obs, skewness=0.0, kurtosis=3.0, mean_sharpe_null=0.0)
Calculate the Deflated Sharpe Ratio (DSR).
DSR corrects for multiple testing by computing the probability that the observed Sharpe ratio exceeds the expected maximum SR under the null hypothesis that all strategies have zero true SR.
- Parameters:
sharpe (float) – Estimated Sharpe ratio of the selected strategy.
n_trials (int) – Number of independent trials/strategies tested.
sharpe_std_trials (float) – Standard deviation of Sharpe ratios across all trials.
n_obs (int) – Number of observations (track record length).
skewness (float, default=0.0) – Skewness of returns.
kurtosis (float, default=3.0) – Kurtosis of returns (not excess kurtosis).
mean_sharpe_null (float, default=0.0) – Mean Sharpe ratio under null hypothesis.
- Return type:
float
- Returns:
float – Deflated Sharpe Ratio in [0, 1].
Examples
>>> # Tested 100 strategies, best has SR = 2.0
>>> dsr = deflated_sharpe_ratio(
...     sharpe=2.0,
...     n_trials=100,
...     sharpe_std_trials=0.5,
...     n_obs=252,
...     skewness=-0.3,
...     kurtosis=4.5,
... )
>>> print(f"DSR: {dsr:.2%}")
>>> # If DSR < 0.95, the strategy may be a statistical fluke
Notes
DSR answers: “What is the probability that this strategy would have beaten random chance, given that we tested N strategies?”
A DSR of 0.95 means there’s a 95% probability that the strategy’s performance is real and not due to overfitting from multiple testing.
References
Bailey, D.H. and López de Prado, M. (2014). “The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality.” The Journal of Portfolio Management, 40(5), 94-107.
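Conceptually, the DSR is a PSR whose benchmark is the expected maximum Sharpe ratio under the null. The following sketch combines the two steps under that assumption, with all Sharpe ratios expressed per observation period and at least two trials. The function name is hypothetical and the library's handling of edge cases may differ.

```python
import math
from statistics import NormalDist


def deflated_sharpe_ratio_sketch(sharpe, n_trials, sharpe_std_trials, n_obs,
                                 skewness=0.0, kurtosis=3.0, mean_sharpe_null=0.0):
    """PSR evaluated against the expected maximum SR of n_trials null strategies."""
    normal = NormalDist()
    gamma = 0.5772156649015329  # Euler-Mascheroni constant

    # Step 1: benchmark = expected best SR obtainable by chance (needs n_trials >= 2).
    expected_max = mean_sharpe_null + sharpe_std_trials * (
        (1.0 - gamma) * normal.inv_cdf(1.0 - 1.0 / n_trials)
        + gamma * normal.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    )

    # Step 2: probability that the true SR exceeds that benchmark.
    std_err = math.sqrt((1.0 - skewness * sharpe
                         + (kurtosis - 1.0) / 4.0 * sharpe ** 2) / (n_obs - 1))
    return normal.cdf((sharpe - expected_max) / std_err)


dsr_few = deflated_sharpe_ratio_sketch(0.15, n_trials=10, sharpe_std_trials=0.05, n_obs=252)
dsr_many = deflated_sharpe_ratio_sketch(0.15, n_trials=1000, sharpe_std_trials=0.05, n_obs=252)
```

The same observed Sharpe ratio earns a lower DSR as the number of trials rises, because the bar it has to clear moves up.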
- endgame.utils.analyze_sharpe(returns, n_trials=1, sharpe_std_trials=None, all_sharpes=None, risk_free_rate=0.0, annualization_factor=252.0, significance_level=0.05)
Comprehensive Sharpe ratio analysis with multiple testing correction.
- Parameters:
returns (np.ndarray) – Array of periodic returns for the selected strategy.
n_trials (int, default=1) – Number of independent trials/strategies tested.
sharpe_std_trials (float, optional) – Standard deviation of Sharpe ratios across all trials. If not provided and all_sharpes is given, it is computed from all_sharpes. If neither is provided, it is estimated as 1/sqrt(n_obs).
all_sharpes (np.ndarray, optional) – Sharpe ratios of all tested strategies (for computing variance).
risk_free_rate (float, default=0.0) – Risk-free rate (same period as returns).
annualization_factor (float, default=252.0) – Factor to annualize Sharpe ratio.
significance_level (float, default=0.05) – Significance level for hypothesis testing.
- Return type:
SharpeAnalysis
- Returns:
SharpeAnalysis – Comprehensive analysis results.
Examples
>>> # Single strategy analysis
>>> import numpy as np
>>> returns = np.random.randn(252) * 0.01 + 0.0005
>>> analysis = analyze_sharpe(returns)
>>> print(f"SR: {analysis.sharpe_ratio:.2f}")
>>> print(f"PSR (SR > 0): {analysis.probabilistic_sharpe:.2%}")
>>> # Multiple testing scenario
>>> all_sharpes = np.random.randn(100) * 0.5  # 100 strategies tested
>>> best_idx = np.argmax(all_sharpes)
>>> # best_returns is assumed to hold the return series of strategy best_idx
>>> analysis = analyze_sharpe(
...     returns=best_returns,
...     n_trials=100,
...     all_sharpes=all_sharpes,
... )
>>> print(f"DSR: {analysis.deflated_sharpe:.2%}")
>>> print(f"Significant: {analysis.is_significant}")
- endgame.utils.minimum_track_record_length(sharpe, benchmark_sharpe=0.0, confidence=0.95, skewness=0.0, kurtosis=3.0)
Calculate minimum track record length needed for statistical significance.
Answers: “How many observations do we need to be confident that the strategy’s Sharpe ratio is real?”
- Parameters:
- Return type:
int
- Returns:
int – Minimum number of observations needed.
Examples
>>> # How long to verify SR = 1.0 strategy?
>>> n_min = minimum_track_record_length(sharpe=1.0)
>>> print(f"Need at least {n_min} observations")
Notes
This is the “MinTRL” from Bailey & López de Prado (2012).
A strategy with SR = 2.0 and normal returns needs only ~16 observations. A strategy with SR = 0.5 needs ~256 observations!
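For reference, the MinTRL expression in the cited 2012 paper is 1 + (1 - γ3·SR + (γ4 - 1)/4·SR²)·(z_α / (SR - SR*))², where z_α is the normal quantile at the chosen confidence. The sketch below implements that expression with `sharpe` taken per observation period. The function name is hypothetical, and because the result depends on the units of `sharpe`, it is not guaranteed to reproduce the figures quoted in the Notes.

```python
import math
from statistics import NormalDist


def minimum_track_record_length_sketch(sharpe, benchmark_sharpe=0.0, confidence=0.95,
                                       skewness=0.0, kurtosis=3.0):
    """Smallest n for which the PSR against benchmark_sharpe reaches `confidence`."""
    if sharpe <= benchmark_sharpe:
        raise ValueError("sharpe must exceed benchmark_sharpe")
    z = NormalDist().inv_cdf(confidence)
    variance_term = 1.0 - skewness * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2
    return math.ceil(1.0 + variance_term * (z / (sharpe - benchmark_sharpe)) ** 2)


n_low_sr = minimum_track_record_length_sketch(sharpe=0.1)
n_high_sr = minimum_track_record_length_sketch(sharpe=0.2)
```

The required length scales roughly with 1 / (SR - SR*)², so halving the edge over the benchmark roughly quadruples the track record needed.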
- endgame.utils.haircut_sharpe_ratio(sharpe, n_trials, sharpe_std_trials=0.5)
Apply haircut to Sharpe ratio for multiple testing.
Returns an adjusted Sharpe ratio that accounts for data mining.
- Parameters:
- Return type:
Tuple[float, float]
- Returns:
Tuple[float, float] – (haircut_sharpe, haircut_percent)
- haircut_sharpe: Adjusted Sharpe ratio
- haircut_percent: Percentage reduction applied
Examples
>>> sr_adj, haircut = haircut_sharpe_ratio(sharpe=2.0, n_trials=100)
>>> print(f"Adjusted SR: {sr_adj:.2f} (haircut: {haircut:.1%})")
Notes
The haircut is the expected maximum SR under the null hypothesis. The adjusted SR is: SR_adjusted = SR_observed - E[max{SR}|null]
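The subtraction in the Notes can be sketched as follows. The expected maximum uses the Bailey & López de Prado (2014) approximation with a zero-mean null. Reporting `haircut_percent` as the fraction E[max] / SR_observed is an assumption about the second return value, and the sketch assumes a positive observed Sharpe ratio and at least two trials. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def haircut_sharpe_ratio_sketch(sharpe, n_trials, sharpe_std_trials=0.5):
    """Subtract the SR expected from luck alone; also report it as a fraction of SR."""
    normal = NormalDist()
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    expected_max = sharpe_std_trials * (
        (1.0 - gamma) * normal.inv_cdf(1.0 - 1.0 / n_trials)
        + gamma * normal.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    )
    haircut_sharpe = sharpe - expected_max
    haircut_percent = expected_max / sharpe  # assumption: fraction of the observed SR
    return haircut_sharpe, haircut_percent


sr_adj, haircut = haircut_sharpe_ratio_sketch(sharpe=2.0, n_trials=100)
```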
- endgame.utils.estimate_n_independent_trials(sharpe_ratios, method='variance')
Estimate effective number of independent trials from correlated strategies.
When strategies are correlated, the effective number of independent trials is less than the total number tested.
- Parameters:
sharpe_ratios (np.ndarray) – Array of Sharpe ratios from all tested strategies.
method (str, default="variance") – Method to estimate N:
- "variance": Use variance ratio (conservative)
- "count": Just use the raw count (anti-conservative)
- Return type:
int
- Returns:
int – Estimated number of independent trials.
Notes
López de Prado (2018) recommends using clustering (ONC algorithm) for more accurate estimation. This function provides simpler heuristics.
- endgame.utils.multiple_testing_summary(sharpe_ratios, returns_list=None, n_obs=252, significance_level=0.05)
Generate a summary report for multiple testing analysis.
- Parameters:
sharpe_ratios (np.ndarray) – Sharpe ratios of all tested strategies.
returns_list (List[np.ndarray], optional) – List of return arrays for each strategy (for detailed stats).
n_obs (int, default=252) – Number of observations per strategy.
significance_level (float, default=0.05) – Significance level for testing.
- Return type:
dict
- Returns:
dict – Summary statistics including:
- n_trials: Total strategies tested
- n_effective: Estimated independent trials
- best_sharpe: Highest observed SR
- expected_max: Expected max SR under null
- best_dsr: DSR of best strategy
- haircut: Haircut percentage
- n_significant: Number passing DSR threshold