Benchmark
- class endgame.benchmark.SuiteLoader(suite='sklearn-classic', max_datasets=None, max_samples=None, max_features=None, cache_dir=None, random_state=42, verbose=True)
Bases: object
Load benchmark datasets from various sources.
Supports OpenML benchmark suites, sklearn built-in datasets, and custom datasets. Provides a standardized interface for benchmark experiments.
- Parameters:
suite (str or List[int]) – Suite name (e.g., “OpenML-CC18”) or list of OpenML task IDs.
max_datasets (int, optional) – Maximum number of datasets to load.
max_samples (int, optional) – Maximum samples per dataset (larger datasets are sampled).
max_features (int, optional) – Maximum features per dataset.
cache_dir (str, optional) – Directory for caching downloaded datasets.
random_state (int, default=42) – Random seed for sampling.
verbose (bool, default=True) – Enable verbose output.
Examples
>>> loader = SuiteLoader(suite="sklearn-classic")
>>> for dataset in loader.load():
...     print(f"{dataset.name}: {dataset.n_samples} samples, {dataset.n_features} features")

>>> loader = SuiteLoader(suite="OpenML-CC18", max_datasets=10)
>>> datasets = list(loader.load())
- load()
Load datasets from the suite.
- Yields:
DatasetInfo – Dataset information and data.
- Return type:
Generator[DatasetInfo, None, None]
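As an illustration of the generator interface and the max_datasets / max_samples caps described above, here is a minimal standalone sketch. It does not use endgame; ToyDataset and load_lazily are hypothetical names, and the sampling strategy (seeded sampling without replacement) is an assumption rather than the library's documented behavior.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence, Tuple


@dataclass
class ToyDataset:
    """Hypothetical stand-in for DatasetInfo (name, X, y only)."""
    name: str
    X: List[List[float]]
    y: List[int]

    @property
    def n_samples(self) -> int:
        return len(self.X)


def load_lazily(
    sources: Sequence[Tuple[str, List[List[float]], List[int]]],
    max_datasets: Optional[int] = None,
    max_samples: Optional[int] = None,
    random_state: int = 42,
) -> Iterator[ToyDataset]:
    """Yield datasets one at a time, capping dataset count and row count."""
    rng = random.Random(random_state)
    for i, (name, X, y) in enumerate(sources):
        if max_datasets is not None and i >= max_datasets:
            break
        if max_samples is not None and len(X) > max_samples:
            # Assumed strategy: sample rows without replacement, keep row order.
            keep = sorted(rng.sample(range(len(X)), max_samples))
            X = [X[j] for j in keep]
            y = [y[j] for j in keep]
        yield ToyDataset(name=name, X=X, y=y)


sources = [
    ("a", [[float(i)] for i in range(10)], [i % 2 for i in range(10)]),
    ("b", [[float(i)] for i in range(3)], [0, 1, 0]),
    ("c", [[0.0]], [1]),
]
loaded = list(load_lazily(sources, max_datasets=2, max_samples=5))
```

Because the loader is a generator, datasets are only materialized as the caller iterates, which keeps memory bounded on large suites.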
- class endgame.benchmark.DatasetInfo(name, task_type, X, y, feature_names=<factory>, categorical_indicator=<factory>, n_samples=0, n_features=0, n_classes=0, class_distribution=<factory>, source='unknown', openml_id=None, cv_splits=None, metadata=<factory>)
Bases: object
Container for dataset information and data.
- Parameters:
- task_type
Type of ML task.
- Type:
TaskType
- X
Feature matrix.
- Type:
np.ndarray
- y
Target variable.
- Type:
np.ndarray
- cv_splits
Predefined cross-validation splits.
- Type:
Optional[List[Tuple[np.ndarray, np.ndarray]]]
- task_type: TaskType
- class endgame.benchmark.MetaProfiler(groups=None, use_pymfe=True, landmarking_cv=3, random_state=42, verbose=False)
Bases: object
Extract meta-features from datasets for meta-learning.
Uses pymfe when available, with fallback to pure numpy/sklearn implementations.
- Parameters:
groups (List[str], optional) – Meta-feature groups to extract. Default: [“simple”, “statistical”, “info-theory”]. Options: “simple”, “statistical”, “info-theory”, “landmarking”, “complexity”.
use_pymfe (bool, default=True) – Use pymfe library when available (more comprehensive features).
landmarking_cv (int, default=3) – Number of CV folds for landmarking meta-features.
random_state (int, default=42) – Random seed for reproducibility.
verbose (bool, default=False) – Enable verbose output.
Examples
>>> profiler = MetaProfiler(groups=["simple", "statistical"])
>>> meta_features = profiler.profile(X, y)
>>> print(meta_features.features)

>>> # With landmarking
>>> profiler = MetaProfiler(groups=["simple", "landmarking"])
>>> meta_features = profiler.profile(X, y)
- profile(X, y, categorical_indicator=None, task_type='classification')
Extract meta-features from a dataset.
- Parameters:
X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
y (np.ndarray) – Target variable of shape (n_samples,).
categorical_indicator (List[bool], optional) – Boolean mask indicating categorical features.
task_type (str, default="classification") – Type of task: “classification” or “regression”.
- Return type:
MetaFeatureSet
- Returns:
MetaFeatureSet – Extracted meta-features.
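To make the "pymfe when available, otherwise a fallback" idea concrete, the following sketch checks for the optional dependency and computes a handful of "simple"-style meta-features in pure Python. The function name and the exact feature set are assumptions for illustration; the features MetaProfiler actually extracts are not listed in this reference.

```python
import importlib.util
import math
from collections import Counter
from typing import Dict, List, Sequence

# Detect the optional dependency without importing it.
HAS_PYMFE = importlib.util.find_spec("pymfe") is not None


def simple_meta_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[str, float]:
    """Hypothetical fallback: a few dataset-level meta-features in pure Python."""
    n = len(y)
    if n == 0:
        raise ValueError("y must not be empty")
    counts = Counter(y)
    probs = [c / n for c in counts.values()]
    # Shannon entropy of the class distribution, in bits.
    entropy = -sum(p * math.log2(p) for p in probs)
    return {
        "n_samples": float(n),
        "n_features": float(len(X[0])) if len(X) else 0.0,
        "n_classes": float(len(counts)),
        "class_entropy": entropy,
    }


X_demo: List[List[float]] = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
y_demo = [0, 0, 1, 1]
features = simple_meta_features(X_demo, y_demo)
```

A balanced binary target gives a class entropy of exactly 1 bit, which is a convenient sanity check for the fallback path.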
- class endgame.benchmark.MetaFeatureSet(features=<factory>, groups=<factory>, extraction_time=0.0, errors=<factory>)
Bases: object
Container for extracted meta-features.
- Parameters:
- class endgame.benchmark.ExperimentTracker(name='benchmark', auto_save=False, save_path=None)
Bases: object
Track and store experiment results.
Provides methods for logging experiments, querying results, and exporting to various formats.
- Parameters:
Examples
>>> tracker = ExperimentTracker(name="my_benchmark")
>>> tracker.log_experiment(
...     dataset_name="iris",
...     model_name="RandomForest",
...     metrics={"accuracy": 0.95, "f1": 0.94},
...     hyperparameters={"n_estimators": 100},
... )
>>> df = tracker.to_dataframe()
- log_experiment(dataset_name, model_name, metrics, hyperparameters=None, pipeline_config=None, meta_features=None, cv_scores=None, fit_time=0.0, predict_time=0.0, memory_mb=0.0, n_samples=0, n_features=0, task_type='classification', dataset_id=None, status='success', error_message=None, tags=None, notes='', model_structure=None)
Log a single experiment.
- Parameters:
dataset_name (str) – Name of the dataset.
model_name (str) – Name of the model/pipeline.
hyperparameters (Dict, optional) – Model hyperparameters.
pipeline_config (Dict, optional) – Full pipeline configuration.
meta_features (Dict, optional) – Dataset meta-features.
cv_scores (List[float], optional) – Per-fold CV scores.
fit_time (float) – Training time in seconds.
predict_time (float) – Prediction time in seconds.
memory_mb (float) – Peak memory usage in MB.
n_samples (int) – Number of samples.
n_features (int) – Number of features.
task_type (str) – Task type.
dataset_id (str, optional) – External dataset ID.
status (str) – Experiment status.
error_message (str, optional) – Error message if failed.
tags (List[str], optional) – Tags for filtering.
notes (str) – Additional notes.
model_structure (str | None)
- Return type:
ExperimentRecord
- Returns:
ExperimentRecord – The logged experiment record.
- log_failure(dataset_name, model_name, error_message, **kwargs)
Log a failed experiment.
- Return type:
- Parameters:
- property records: list[ExperimentRecord]
Get all experiment records.
- get_by_dataset(dataset_name)
Get records for a specific dataset.
- Return type:
- Parameters:
dataset_name (str)
- get_by_model(model_name)
Get records for a specific model.
- Return type:
- Parameters:
model_name (str)
- to_dataframe(include_meta_features=True)
Convert to DataFrame.
- Parameters:
include_meta_features (bool, default=True) – Include meta-features as columns.
- Returns:
DataFrame – Polars DataFrame (or Pandas if Polars unavailable).
- load(path)
Load results from file.
- Parameters:
path (str) – Input path.
- Return type:
- Returns:
self
- merge(other, deduplicate=True)
Merge another tracker into this one.
- Parameters:
other (ExperimentTracker) – Tracker to merge.
deduplicate (bool, default=True) – Skip records with duplicate config_hash.
- Return type:
- Returns:
self
- save_to_master(path=None, deduplicate=True)
Save results to master database, appending to existing records.
This is the primary method for building a meta-learning dataset. New experiments are appended to the master database, with duplicate configurations (same dataset + model + hyperparameters) skipped.
- Parameters:
- Return type:
int
- Returns:
int – Number of new records added.
Examples
>>> tracker = ExperimentTracker()
>>> # ... run experiments ...
>>> n_added = tracker.save_to_master()
>>> print(f"Added {n_added} new experiments to master database")
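The append-with-deduplication behavior can be pictured with plain dictionaries. This is only a sketch under the assumption that each record carries a config_hash key (as ExperimentRecord does); append_new_records is a hypothetical helper, not part of endgame, and it ignores file I/O entirely.

```python
from typing import Dict, List


def append_new_records(master: List[Dict], new: List[Dict]) -> int:
    """Append records whose config_hash is not yet in master; return how many were added."""
    seen = {rec["config_hash"] for rec in master}
    added = 0
    for rec in new:
        if rec["config_hash"] in seen:
            continue  # duplicate configuration, skip it
        master.append(rec)
        seen.add(rec["config_hash"])
        added += 1
    return added


master = [{"config_hash": "aaa", "model_name": "RF"}]
new = [
    {"config_hash": "aaa", "model_name": "RF"},  # already in master
    {"config_hash": "bbb", "model_name": "LR"},  # new
    {"config_hash": "bbb", "model_name": "LR"},  # duplicate within the batch
]
n_added = append_new_records(master, new)
```

Tracking seen hashes in a set keeps each membership check constant-time, so the merge stays linear in the number of records.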
- classmethod load_master(path=None)
Load the master meta-learning database.
- Parameters:
path (str or Path, optional) – Path to master database. Defaults to ~/.endgame/meta_learning_db.parquet
- Return type:
ExperimentTracker
- Returns:
ExperimentTracker – Tracker with all historical experiments.
Examples
>>> tracker = ExperimentTracker.load_master()
>>> print(f"Master database has {len(tracker)} experiments")
- static get_master_db_path()
Get the default master database path.
- Return type:
Path
- Returns:
Path – Default path: ~/.endgame/meta_learning_db.parquet
- class endgame.benchmark.ExperimentRecord(experiment_id='', timestamp='', dataset_name='', dataset_id=None, model_name='', pipeline_config=<factory>, hyperparameters=<factory>, metrics=<factory>, meta_features=<factory>, cv_scores=None, fit_time=0.0, predict_time=0.0, memory_mb=0.0, n_samples=0, n_features=0, task_type='classification', status='pending', error_message=None, tags=<factory>, notes='', model_structure=None, config_hash='')
Bases: object
Single experiment record.
- Parameters:
- pipeline_config
Serialized pipeline configuration.
- Type:
Dict
- hyperparameters
Model hyperparameters.
- Type:
Dict
- endgame.benchmark.get_experiment_hash(dataset_name, model_name, hyperparameters, task_type='classification')
Generate a unique hash for an experiment configuration.
This hash is used to detect duplicate experiments in the master database. Two experiments are considered duplicates if they have the same:
- dataset name
- model name
- hyperparameters
- task type
- Parameters:
- Return type:
str
- Returns:
str – First 16 characters of a SHA-256 hash identifying this configuration.
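One plausible way to build such a configuration hash is to serialize the four fields to canonical JSON and truncate the SHA-256 hex digest. The sketch below is an assumption about the general technique, not the library's actual serialization; experiment_hash_sketch is a hypothetical name and its output need not match get_experiment_hash.

```python
import hashlib
import json
from typing import Any, Dict


def experiment_hash_sketch(
    dataset_name: str,
    model_name: str,
    hyperparameters: Dict[str, Any],
    task_type: str = "classification",
) -> str:
    """Return the first 16 hex characters of a SHA-256 over a canonical JSON payload."""
    payload = json.dumps(
        {
            "dataset": dataset_name,
            "model": model_name,
            "hyperparameters": hyperparameters,
            "task_type": task_type,
        },
        sort_keys=True,  # key order must not change the hash
        default=str,     # fall back to str() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


h1 = experiment_hash_sketch("iris", "RF", {"n_estimators": 100, "max_depth": 3})
h2 = experiment_hash_sketch("iris", "RF", {"max_depth": 3, "n_estimators": 100})
h3 = experiment_hash_sketch("iris", "RF", {"n_estimators": 100, "max_depth": 3}, "regression")
```

Sorting keys before hashing is what makes two dictionaries with the same contents but different insertion order count as the same configuration.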
- class endgame.benchmark.BenchmarkRunner(suite='sklearn-classic', config=None, max_datasets=None, fast_run=False, verbose=True, **kwargs)
Bases: object
Run systematic benchmarks across datasets and models.
Orchestrates the complete benchmark workflow:
1. Load datasets from the benchmark suite
2. Profile datasets (extract meta-features)
3. Run cross-validation for each model on each dataset
4. Record results with full provenance
- Parameters:
suite (str, default="sklearn-classic") – Benchmark suite name.
config (BenchmarkConfig, optional) – Full configuration object.
max_datasets (int, optional) – Override maximum number of datasets.
fast_run (bool, default=False) – Quick run with reduced settings.
verbose (bool, default=True) – Enable verbose output.
**kwargs – Additional configuration parameters.
Examples
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>>
>>> models = [
...     ("RF", RandomForestClassifier(n_estimators=100, random_state=42)),
...     ("LR", LogisticRegression(max_iter=1000)),
... ]
>>>
>>> runner = BenchmarkRunner(suite="sklearn-classic")
>>> results = runner.run(models)
>>> print(results.summary())
>>>
>>> # Save results
>>> results.save("benchmark_results.parquet")
- run(models, output_file=None, continue_on_error=True)
Run benchmark on all models and datasets.
- Parameters:
models (List[Union[Tuple[str, BaseEstimator], Tuple[str, BaseEstimator, BaseEstimator]]]) – List of model specifications. Each can be either:
- (name, estimator): Single estimator used for all tasks.
- (name, classifier, regressor): Pair of estimators; the classifier is used for classification tasks and the regressor for regression tasks. Either can be None to skip that task type.
output_file (str, optional) – Path to save results.
continue_on_error (bool, default=True) – Continue if a model fails on a dataset.
- Return type:
ExperimentTracker
- Returns:
ExperimentTracker – Tracker with all experiment results.
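The two accepted shapes of a model specification can be resolved per task type roughly as follows. This is a hedged sketch: select_estimator is a hypothetical helper, strings stand in for estimators, and the runner's real dispatch logic is not shown in this reference.

```python
from typing import Any, Optional, Tuple


def select_estimator(spec: Tuple, task_type: str) -> Tuple[str, Optional[Any]]:
    """Pick the estimator to use for task_type from a 2- or 3-tuple model spec.

    Returns (name, estimator); estimator is None when the spec opts out
    of that task type.
    """
    if len(spec) == 2:
        name, estimator = spec
        return name, estimator  # one estimator for every task
    if len(spec) == 3:
        name, classifier, regressor = spec
        chosen = classifier if task_type == "classification" else regressor
        return name, chosen
    raise ValueError(f"expected a 2- or 3-tuple, got {len(spec)} items")


# Strings stand in for real estimator objects in this sketch.
single = ("RF", "rf_estimator")
pair = ("GBM", "gbm_classifier", "gbm_regressor")
clf_only = ("LR", "logreg", None)

picked_single = select_estimator(single, "regression")
picked_pair = select_estimator(pair, "regression")
picked_skip = select_estimator(clf_only, "regression")
```

A None result signals that the model should be skipped for that dataset rather than treated as a failure.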
- property tracker: ExperimentTracker
Get the experiment tracker.
- property datasets: list[DatasetInfo]
Get loaded datasets.
- property meta_features: dict[str, MetaFeatureSet]
Get extracted meta-features.
- class endgame.benchmark.BenchmarkConfig(suite='sklearn-classic', max_datasets=None, max_samples=None, cv_folds=5, scoring_classification=<factory>, scoring_regression=<factory>, profile_datasets=True, profile_groups=<factory>, cache_meta_features=True, meta_features_cache_dir=None, timeout_per_fit=300, n_jobs=1, random_state=42, verbose=True, skip_completed=True)
Bases: object
Configuration for benchmark runs.
- Parameters:
- endgame.benchmark.quick_benchmark(model, model_name='model', suite='quick-test', **kwargs)
Quickly benchmark a single model on test datasets.
- Parameters:
- Return type:
ExperimentTracker
- Returns:
ExperimentTracker – Results tracker.
Examples
>>> from sklearn.ensemble import RandomForestClassifier
>>> results = quick_benchmark(RandomForestClassifier(), "RF")
>>> print(results.summary())
- endgame.benchmark.compare_models(models, suite='sklearn-classic', **kwargs)
Compare multiple models on benchmark datasets.
- Parameters:
- Return type:
ExperimentTracker
- Returns:
ExperimentTracker – Results tracker.
- class endgame.benchmark.ResultsAnalyzer(tracker, metric='accuracy', higher_is_better=True, significance_level=0.05)
Bases: object
Analyze and compare benchmark results.
Provides methods for:
- Ranking models across datasets
- Statistical significance testing
- Critical difference diagrams
- Performance profiles
- Meta-feature correlation analysis
- Parameters:
tracker (ExperimentTracker) – Tracker containing experiment results.
metric (str, default="accuracy") – Primary metric for comparisons.
higher_is_better (bool, default=True) – Whether higher metric values are better.
significance_level (float, default=0.05) – Alpha level for statistical tests.
Examples
>>> analyzer = ResultsAnalyzer(tracker, metric="accuracy")
>>> rankings = analyzer.rank_models()
>>> print(rankings)
>>>
>>> # Statistical comparison
>>> comparison = analyzer.compare_models("RF", "XGBoost")
>>> print(f"P-value: {comparison.p_value}")
- classmethod from_pivot(pivot, metric='accuracy', higher_is_better=True, significance_level=0.05)
Create a ResultsAnalyzer from a pivot dict.
Convenience factory for external experiment systems that already have results in {dataset: {method: score}} form.
- Parameters:
pivot (Dict[str, Dict[str, float]]) – Mapping of dataset_name -> {method_name: score}.
metric (str, default="accuracy") – Name of the metric the scores represent.
higher_is_better (bool, default=True) – Whether higher metric values are better.
significance_level (float, default=0.05) – Alpha level for statistical tests.
- Return type:
ResultsAnalyzer
- Returns:
ResultsAnalyzer – Analyzer ready for ranking, comparison, and statistical tests.
Examples
>>> pivot = {
...     "iris": {"RF": 0.95, "XGB": 0.96},
...     "wine": {"RF": 0.97, "XGB": 0.95},
... }
>>> analyzer = ResultsAnalyzer.from_pivot(pivot, metric="accuracy")
>>> print(analyzer.summary_table())
- property df
Get results as DataFrame.
- get_pivot_table(metric=None)
Get pivot table of models vs datasets.
- Parameters:
metric (str, optional) – Metric to use. If None, uses default metric.
- Returns:
DataFrame – Pivot table with models as rows, datasets as columns.
- rank_models(method=RankingMethod.MEAN_RANK, metric=None)
Rank models across all datasets.
- Parameters:
method (RankingMethod) – Ranking method to use.
metric (str, optional) – Metric to rank by.
- Return type:
Dict[str, float]
- Returns:
Dict[str, float] – Model name to rank/score mapping (sorted).
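For intuition, mean-rank ranking over a {dataset: {method: score}} pivot can be sketched in pure Python as below. The tie handling (tied models share the average of their positions) is an assumption, and mean_ranks is a hypothetical helper rather than the implementation behind RankingMethod.MEAN_RANK.

```python
from typing import Dict


def mean_ranks(pivot: Dict[str, Dict[str, float]], higher_is_better: bool = True) -> Dict[str, float]:
    """Average each model's per-dataset rank (1 = best); ties share the average rank."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for scores in pivot.values():
        # Best score first.
        ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=higher_is_better)
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            # Positions i..j (0-based) are tied; average their 1-based ranks.
            avg_rank = (i + j) / 2 + 1
            for name, _ in ordered[i : j + 1]:
                totals[name] = totals.get(name, 0.0) + avg_rank
                counts[name] = counts.get(name, 0) + 1
            i = j + 1
    means = {name: totals[name] / counts[name] for name in totals}
    return dict(sorted(means.items(), key=lambda kv: kv[1]))


pivot = {
    "iris": {"RF": 0.95, "XGB": 0.96, "LR": 0.90},
    "wine": {"RF": 0.97, "XGB": 0.95, "LR": 0.95},
}
rankings = mean_ranks(pivot)
```

Here XGB and LR tie on "wine" and each receive rank 2.5, so the lower mean rank still identifies the more consistent model.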
- compare_models(model_a, model_b, metric=None, test='wilcoxon')
Compare two models statistically.
- get_dataset_summary(dataset_name, metric=None)
Get detailed summary for a specific dataset.
- class endgame.benchmark.RankingMethod(*values)
Methods for ranking models.
- MEAN_SCORE = 'mean_score'
- MEAN_RANK = 'mean_rank'
- WIN_COUNT = 'win_count'
- BORDA_COUNT = 'borda_count'
- FRIEDMAN = 'friedman'
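Two of these methods have simple textbook definitions that can be sketched without any dependencies. The versions below are assumptions about what WIN_COUNT and BORDA_COUNT mean (wins per dataset with ties shared; one Borda point per model strictly beaten); the library may define or normalize them differently, and both function names are hypothetical.

```python
from typing import Dict


def win_counts(pivot: Dict[str, Dict[str, float]], higher_is_better: bool = True) -> Dict[str, int]:
    """Count, per model, the datasets on which it achieves the best score."""
    wins: Dict[str, int] = {}
    for scores in pivot.values():
        for name in scores:
            wins.setdefault(name, 0)
        best = max(scores.values()) if higher_is_better else min(scores.values())
        for name, score in scores.items():
            if score == best:
                wins[name] += 1  # tied winners each get a win
    return wins


def borda_counts(pivot: Dict[str, Dict[str, float]], higher_is_better: bool = True) -> Dict[str, int]:
    """Sum, per model, the number of models it strictly beats on each dataset."""
    points: Dict[str, int] = {}
    for scores in pivot.values():
        for name, score in scores.items():
            if higher_is_better:
                beaten = sum(1 for other in scores.values() if score > other)
            else:
                beaten = sum(1 for other in scores.values() if score < other)
            points[name] = points.get(name, 0) + beaten
    return points


pivot = {
    "iris": {"RF": 0.95, "XGB": 0.96, "LR": 0.90},
    "wine": {"RF": 0.97, "XGB": 0.95, "LR": 0.95},
}
wins = win_counts(pivot)
borda = borda_counts(pivot)
```

Win counts reward only first place, while Borda points also credit consistent runner-up finishes, so the two methods can order models differently.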
- class endgame.benchmark.MetaLearner(approach='ranking', base_estimator=None, metric='accuracy', n_top_models=3, random_state=42, verbose=False)
Bases: object
Learn to predict optimal models from dataset meta-features.
Trains a meta-model that predicts which model will perform best on a new dataset based on its meta-features.
- Parameters:
approach (str, default="ranking") – Meta-learning approach:
- “ranking”: Predict model rankings
- “classification”: Predict best model (classification)
- “regression”: Predict model scores (regression)
base_estimator (BaseEstimator, optional) – Base model for meta-learning. If None, uses RandomForest.
metric (str, default="accuracy") – Target metric to optimize.
n_top_models (int, default=3) – Number of top models to consider for recommendations.
random_state (int, default=42) – Random seed.
verbose (bool, default=False) – Enable verbose output.
Examples
>>> # Train meta-learner from benchmark results
>>> meta_learner = MetaLearner()
>>> meta_learner.fit(tracker)
>>>
>>> # Get recommendation for new dataset
>>> recommendation = meta_learner.recommend(X_new, y_new)
>>> print(f"Best model: {recommendation.model_name}")
- fit(tracker, metric=None)
Fit meta-learner from benchmark results.
- Parameters:
tracker (ExperimentTracker) – Tracker containing benchmark results.
metric (str, optional) – Override target metric.
- Return type:
- Returns:
self
- recommend(X, y, categorical_indicator=None, task_type='classification')
Get model recommendation for a new dataset.
- Parameters:
- Return type:
ModelRecommendation
- Returns:
ModelRecommendation – Recommended model with confidence and alternatives.
- recommend_from_features(meta_features)
Get recommendation from pre-computed meta-features.
- Parameters:
meta_features (MetaFeatureSet or Dict) – Pre-computed meta-features.
- Return type:
ModelRecommendation
- Returns:
ModelRecommendation – Recommended model.
- class endgame.benchmark.PipelineRecommender(meta_learner=None, preprocessing_options=None, verbose=False)
Bases: object
Recommend complete pipelines (preprocessing + model) for new datasets.
Extends MetaLearner to recommend full preprocessing pipelines in addition to models.
- Parameters:
meta_learner (MetaLearner, optional) – Pre-trained meta-learner.
preprocessing_options (List[str], default=["none", "scaling", "imputation"]) – Available preprocessing options.
verbose (bool, default=False) – Enable verbose output.
Examples
>>> recommender = PipelineRecommender()
>>> recommender.fit(tracker)
>>> pipeline = recommender.recommend_pipeline(X, y)
>>> print(pipeline)
- fit(tracker, **kwargs)
Fit recommender from benchmark results.
- Return type:
- Parameters:
tracker (ExperimentTracker)
- class endgame.benchmark.BenchmarkReportGenerator(tracker, title='Endgame Benchmark Report')
Bases: object
Generate HTML reports from benchmark results.
- Parameters:
tracker (ExperimentTracker) – The experiment tracker with benchmark results.
title (str, optional) – Report title.
Examples
>>> from endgame.benchmark import BenchmarkRunner, BenchmarkReportGenerator
>>> runner = BenchmarkRunner(suite="sklearn-classic")
>>> tracker = runner.run(models)
>>> report = BenchmarkReportGenerator(tracker)
>>> report.generate("benchmark_report.html")
- add_interpretability_output(model_name, dataset_name, output, output_type='text')
Add interpretability output for a model.
- endgame.benchmark.extract_interpretability_outputs(models, X_sample, y_sample, dataset_name, feature_names=None)
Extract interpretability outputs from fitted models.
- Parameters:
- Return type:
Dict[str, str]
- Returns:
Dict[str, str] – Dictionary mapping model names to their interpretability outputs.
- class endgame.benchmark.LearningCurveExperiment(suite, config=None, max_datasets=None, verbose=True)
Bases: object
Run learning curve experiments across datasets.
Implements the LCDB (Learning Curve Database) protocol for systematic evaluation of sample efficiency.
- Parameters:
suite (str or List[DatasetInfo]) – Benchmark suite name or list of datasets.
config (LearningCurveConfig, optional) – Experiment configuration.
max_datasets (int, optional) – Maximum number of datasets.
verbose (bool) – Enable verbose output.
Examples
>>> from endgame.benchmark import LearningCurveExperiment, LearningCurveConfig
>>> from endgame.models import LGBMWrapper
>>>
>>> config = LearningCurveConfig(anchors=[0.1, 0.5, 1.0], n_seeds=3)
>>> exp = LearningCurveExperiment(suite="sklearn-classic", config=config)
>>>
>>> models = [
...     ("LGBM", LGBMWrapper(preset="fast")),
... ]
>>> results = exp.run(models)
>>> print(results.summary())
- class endgame.benchmark.LearningCurveConfig(anchors=<factory>, n_seeds=5, cv_folds=0, test_fraction=0.2, metrics_classification=<factory>, metrics_regression=<factory>, timeout_per_fit=600, random_state=42, verbose=True)
Bases: object
Configuration for learning curve experiments.
- Parameters:
anchors (List[float]) – Training set fractions (LCDB protocol default).
n_seeds (int) – Number of random seeds per anchor point.
cv_folds (int) – Cross-validation folds per seed (0 = holdout only).
test_fraction (float) – Holdout test set fraction.
metrics_classification (List[str]) – Metrics for classification tasks.
metrics_regression (List[str]) – Metrics for regression tasks.
timeout_per_fit (int) – Timeout per model fit in seconds.
random_state (int) – Base random seed.
verbose (bool) – Enable verbose output.
- class endgame.benchmark.LearningCurveResults(records=<factory>, config=None)
Bases: object
Container for learning curve results with analysis methods.
- Parameters:
records (list[LearningCurveRecord])
config (LearningCurveConfig | None)
- records
All experiment records.
- Type:
List[LearningCurveRecord]
- config
Configuration used.
- Type:
LearningCurveConfig | None
- records: list[LearningCurveRecord]
- config: LearningCurveConfig | None = None
- add_record(record)
Add a record to results.
- Parameters:
record (LearningCurveRecord)
- to_dataframe()
Convert results to DataFrame.
- Returns:
DataFrame – Results in tabular format.
- save(path)
Save results to file.
- Parameters:
path (str) – Output path (.parquet, .csv, or .json).
- get_learning_curve(dataset, model, metric='accuracy')
Get learning curve for a specific dataset/model.
- compute_aulc(dataset, model, metric='accuracy')
Compute Area Under Learning Curve.
Higher AULC indicates better sample efficiency (learns faster).
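A common way to compute an area under a learning curve is trapezoidal integration of the mean score over the anchors, divided by the anchor range so the result stays on the metric's scale. The sketch below assumes that definition; aulc_trapezoid is a hypothetical helper, and compute_aulc may integrate or normalize differently.

```python
from typing import Sequence


def aulc_trapezoid(anchors: Sequence[float], scores: Sequence[float], normalize: bool = True) -> float:
    """Trapezoidal area under (anchor, score) points, optionally divided by the anchor range."""
    if len(anchors) != len(scores) or len(anchors) < 2:
        raise ValueError("need at least two (anchor, score) pairs of equal length")
    pairs = sorted(zip(anchors, scores))
    width = pairs[-1][0] - pairs[0][0]
    if width <= 0:
        raise ValueError("anchors must span a positive range")
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / width if normalize else area


anchors = [0.1, 0.5, 1.0]
fast_learner = [0.80, 0.90, 0.90]  # reaches its plateau early
slow_learner = [0.50, 0.70, 0.90]  # same final score, slower rise
aulc_fast = aulc_trapezoid(anchors, fast_learner)
aulc_slow = aulc_trapezoid(anchors, slow_learner)
```

Both curves end at the same score, but the early-plateau curve encloses more area, which is why a higher value reads as better sample efficiency.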
- class endgame.benchmark.LearningCurveRecord(dataset_name, model_name, anchor, n_train, seed, metrics, fit_time, status='success', error_message=None)
Bases: object
Single learning curve data point.
- Parameters:
- endgame.benchmark.quick_learning_curve(model, X, y, anchors=None, n_seeds=3, test_fraction=0.2, random_state=42)
Quick learning curve for a single model/dataset.
- Parameters:
- Return type:
- Returns:
anchors (ndarray) – Training fractions.
means (ndarray) – Mean accuracies.
stds (ndarray) – Standard deviations.
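The three returned arrays can be thought of as a per-anchor aggregation over seeds. The following sketch shows that aggregation on plain (anchor, seed, score) tuples; it is an assumption about the shape of the computation (population standard deviation, as NumPy's default), and aggregate_by_anchor is a hypothetical helper that returns lists rather than ndarrays.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_by_anchor(
    records: Iterable[Tuple[float, int, float]],
) -> Tuple[List[float], List[float], List[float]]:
    """Group (anchor, seed, score) tuples by anchor; return anchors, means, stds."""
    by_anchor: Dict[float, List[float]] = defaultdict(list)
    for anchor, _seed, score in records:
        by_anchor[anchor].append(score)
    anchors = sorted(by_anchor)
    means = [statistics.fmean(by_anchor[a]) for a in anchors]
    # Population standard deviation (ddof=0); 0.0 when only one seed was run.
    stds = [statistics.pstdev(by_anchor[a]) for a in anchors]
    return anchors, means, stds


records = [
    (0.1, 0, 0.6),
    (0.1, 1, 0.8),
    (1.0, 0, 0.9),
    (1.0, 1, 0.9),
]
anchors, means, stds = aggregate_by_anchor(records)
```

The spread across seeds is typically largest at small anchors, so plotting means with stds as an error band shows where the curve is least reliable.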
- endgame.benchmark.make_rotated_blobs(n_samples=1000, n_features=10, n_classes=3, rotation_angle=45.0, cluster_std=1.0, noise=0.0, random_state=None)
Generate synthetic dataset with known rotation.
Creates Gaussian blobs that are axis-aligned in a rotated coordinate system. Rotation learning should be able to recover the rotation and achieve high accuracy with axis-aligned splits in the rotated space.
This is the critical control experiment from the paper. Standard GBDTs are expected to fail on it because the decision boundaries are diagonal, while rotation learning should match MLP performance by learning the rotation.
- Parameters:
n_samples (int, default=1000) – Number of samples.
n_features (int, default=10) – Number of features.
n_classes (int, default=3) – Number of classes (blob centers).
rotation_angle (float, default=45.0) – Rotation angle in degrees applied pairwise to features.
cluster_std (float, default=1.0) – Standard deviation of clusters before rotation.
noise (float, default=0.0) – Additional Gaussian noise after rotation.
random_state (int, optional) – Random seed.
- Return type:
DatasetInfo
- Returns:
DatasetInfo – Synthetic dataset with metadata including ground truth rotation.
Examples
>>> from endgame.benchmark.synthetic import make_rotated_blobs
>>> dataset = make_rotated_blobs(n_samples=500, rotation_angle=45.0)
>>> print(dataset.name)
synthetic_rotated_45
>>> print(dataset.metadata['ground_truth_rotation'].shape)
(10, 10)
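A rotation "applied pairwise to features" can be represented as a block-diagonal matrix of 2x2 rotations. The sketch below assumes consecutive features are paired as (0, 1), (2, 3), and so on, with an unpaired last feature left unchanged; that pairing and the helper names are assumptions, since the reference does not specify how make_rotated_blobs builds its ground-truth rotation.

```python
import math
from typing import List

Matrix = List[List[float]]


def pairwise_rotation_matrix(n_features: int, angle_degrees: float) -> Matrix:
    """Block-diagonal rotation: each consecutive feature pair is rotated by the same angle."""
    theta = math.radians(angle_degrees)
    c, s = math.cos(theta), math.sin(theta)
    R = [[1.0 if i == j else 0.0 for j in range(n_features)] for i in range(n_features)]
    for k in range(0, n_features - 1, 2):
        R[k][k], R[k][k + 1] = c, -s
        R[k + 1][k], R[k + 1][k + 1] = s, c
    return R


def times_own_transpose(R: Matrix) -> Matrix:
    """Return R @ R^T, which is the identity for any rotation matrix."""
    n = len(R)
    return [[sum(R[i][k] * R[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


R = pairwise_rotation_matrix(5, 45.0)
product = times_own_transpose(R)
```

Because the matrix is orthogonal, distances between points are preserved: the blobs keep their shape and only the orientation of the class boundaries relative to the feature axes changes.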
Generate dataset with hidden linear structure.
The true decision boundary is simple (axis-aligned) in a rotated coordinate system. This tests whether rotation learning can discover the useful feature combinations.
- Parameters:
n_samples (int, default=1000) – Number of samples.
n_features (int, default=20) – Total number of features.
n_informative (int, default=5) – Number of truly informative features.
structure_type (str, default='diagonal') – Type of hidden structure:
- ‘diagonal’: Linear combination of pairs
- ‘block’: Block structure in feature space
- ‘random’: Random orthogonal transformation
flip_y (float, default=0.01) – Fraction of labels to flip (noise).
random_state (int, optional) – Random seed.
- Return type:
DatasetInfo
- Returns:
DatasetInfo – Synthetic dataset with hidden structure.
- endgame.benchmark.make_xor_rotated(n_samples=1000, n_features=10, rotation_angle=45.0, noise=0.1, random_state=None)
Generate XOR problem in rotated space.
Classic XOR problem where the decision boundary is the product of two features, but rotated so that axis-aligned trees fail.
- Parameters:
- Return type:
DatasetInfo
- Returns:
DatasetInfo – XOR dataset with rotation.
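A two-feature, noise-free version of the rotated XOR idea can be sketched as follows. Labels come from the sign of the product of two latent coordinates, and the observed features are those coordinates rotated by the given angle. This is an illustrative assumption: xor_rotated_sketch is a hypothetical function and omits the extra features and noise that make_xor_rotated accepts.

```python
import math
import random
from typing import List, Tuple


def xor_rotated_sketch(
    n_samples: int = 200,
    angle_degrees: float = 45.0,
    random_state: int = 0,
) -> Tuple[List[Tuple[float, float]], List[int]]:
    """Sample 2-D XOR data in latent coordinates (u, v), then rotate the features."""
    rng = random.Random(random_state)
    theta = math.radians(angle_degrees)
    c, s = math.cos(theta), math.sin(theta)
    X: List[Tuple[float, float]] = []
    y: List[int] = []
    for _ in range(n_samples):
        u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        y.append(1 if u * v > 0 else 0)          # XOR-style label in latent space
        X.append((c * u - s * v, s * u + c * v)) # observed, rotated features
    return X, y


X, y = xor_rotated_sketch()

# Undo the rotation to confirm the labels are axis-aligned in latent space.
theta = math.radians(45.0)
c, s = math.cos(theta), math.sin(theta)
recovered = [(c * a + s * b, -s * a + c * b) for a, b in X]
```

In the latent coordinates the class regions are the four axis-aligned quadrants; after a 45 degree rotation the same boundaries run diagonally through the observed features.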
- endgame.benchmark.make_regression_rotated(n_samples=1000, n_features=10, n_informative=5, rotation_angle=45.0, noise=0.1, random_state=None)
Generate regression dataset with rotated structure.
Linear regression problem where the true coefficients are axis-aligned in a rotated space.
- Parameters:
n_samples (int, default=1000) – Number of samples.
n_features (int, default=10) – Total features.
n_informative (int, default=5) – Number of features with non-zero coefficients.
rotation_angle (float, default=45.0) – Rotation angle.
noise (float, default=0.1) – Target noise level.
random_state (int, optional) – Random seed.
- Return type:
DatasetInfo
- Returns:
DatasetInfo – Regression dataset.
- endgame.benchmark.get_synthetic_suite(random_state=42)
Get dictionary of all synthetic datasets for benchmarking.
Returns a comprehensive suite of synthetic datasets designed to test rotation learning methods.
- Parameters:
random_state (int, default=42) – Random seed for reproducibility.
- Return type:
Dict[str, DatasetInfo]
- Returns:
Dict[str, DatasetInfo] – Dictionary mapping dataset names to DatasetInfo objects.
Examples
>>> from endgame.benchmark.synthetic import get_synthetic_suite
>>> suite = get_synthetic_suite()
>>> for name, dataset in suite.items():
...     print(f"{name}: {dataset.n_samples} samples, {dataset.n_features} features")
- endgame.benchmark.get_control_dataset(random_state=42)
Get the primary control dataset from the paper.
This is the Synthetic Rotated dataset used as the critical control experiment. Standard GBDTs should fail here while rotation learning should recover the rotation and match MLP performance.
- Parameters:
random_state (int, default=42) – Random seed.
- Return type:
DatasetInfo
- Returns:
DatasetInfo – The control dataset.