Anomaly Detection

class endgame.anomaly.IsolationForestDetector(n_estimators=200, contamination='auto', max_samples='auto', max_features=1.0, bootstrap=True, n_jobs=-1, random_state=None, warm_start=False)[source]

Bases: BaseEstimator, OutlierMixin

Isolation Forest with competition-tuned defaults.

This wrapper provides sensible defaults optimized for competition performance:

  • Higher n_estimators (200 vs sklearn’s 100)
  • Bootstrap sampling enabled
  • max_features tuned for high-dimensional data
  • Consistent scoring convention (higher = more anomalous)

Parameters:
  • n_estimators (int, default=200) – Number of isolation trees. More trees = more stable anomaly scores.

  • contamination (float or 'auto', default='auto') – Expected proportion of anomalies. ‘auto’ uses a heuristic based on the training data distribution.

  • max_samples (int or float or 'auto', default='auto') – Number of samples to draw for each tree.
      - ‘auto’: min(256, n_samples)
      - int: exact number of samples
      - float: fraction of samples

  • max_features (float or int, default=1.0) – Features to draw for each tree.
      - float: fraction of features
      - int: exact number of features

  • bootstrap (bool, default=True) – Whether to bootstrap samples. True improves diversity.

  • n_jobs (int, default=-1) – Parallel jobs for fitting trees. -1 uses all cores.

  • random_state (int or None, default=None) – Random seed for reproducibility.

  • warm_start (bool, default=False) – Reuse trees from previous fit and add more.
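
The max_samples and max_features rules above can be sketched as two small helpers. The function names are hypothetical and not part of endgame, and truncating fractional counts with int() is an assumption; the source does not state the rounding rule.

```python
def resolve_max_samples(max_samples, n_samples):
    """Turn a max_samples setting into a concrete per-tree sample count."""
    if max_samples == "auto":
        return min(256, n_samples)
    if isinstance(max_samples, int):
        return max_samples  # int: exact number of samples
    # float: fraction of samples (truncation is an assumption)
    return max(1, int(max_samples * n_samples))


def resolve_max_features(max_features, n_features):
    """Turn a max_features setting into a concrete per-tree feature count."""
    if isinstance(max_features, int):
        return max_features  # int: exact number of features
    # float: fraction of features (truncation is an assumption)
    return max(1, int(max_features * n_features))


print(resolve_max_samples("auto", 1000))
print(resolve_max_features(1.0, 20))
```

With the default max_samples='auto', every tree sees at most 256 samples regardless of dataset size.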

model_

Fitted sklearn IsolationForest instance.

Type:

IsolationForest

threshold_

Decision threshold for binary anomaly classification.

Type:

float

Examples

>>> from endgame.anomaly import IsolationForestDetector
>>> detector = IsolationForestDetector(contamination=0.1)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)  # Higher = more anomalous
>>> labels = detector.predict(X_test)  # 1 = anomaly, 0 = normal

fit(X, y=None)[source]

Fit the Isolation Forest on training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (ignored) – Not used, present for API consistency.

Return type:

IsolationForestDetector

Returns:

self (IsolationForestDetector) – Fitted detector.

decision_function(X)[source]

Compute anomaly scores for samples.

Higher scores indicate more anomalous samples (opposite of sklearn convention).

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to score.

Return type:

ndarray

Returns:

scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.
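
The convention flip can be illustrated with plain NumPy. This sketch assumes the wrapper obtains its scores by negating sklearn-style values; the source does not state the exact transform, and the numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical sklearn-style scores: lower (more negative) = more anomalous.
sklearn_style = np.array([0.12, 0.08, -0.21, 0.10])

# Negating reverses the ordering so that higher = more anomalous.
flipped = -sklearn_style

# The sample that was lowest under the sklearn convention is now highest.
most_anomalous = int(np.argmax(flipped))
print(most_anomalous)
```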

predict(X)[source]

Predict anomaly labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to classify.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.
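
One common way to derive 1/0 labels from anomaly scores is to cut at the (1 - contamination) quantile. The sketch below assumes that rule; how threshold_ is actually computed is not stated in the source, and labels_from_scores is a hypothetical helper.

```python
import numpy as np


def labels_from_scores(scores, contamination):
    """Label the top `contamination` fraction of scores as anomalies (1)."""
    scores = np.asarray(scores, dtype=float)
    # Assumed rule: threshold at the (1 - contamination) quantile of the scores.
    threshold = np.quantile(scores, 1.0 - contamination)
    return (scores > threshold).astype(int), threshold


# Made-up scores where higher = more anomalous.
scores = np.array([0.10, 0.20, 0.15, 0.90, 0.12, 0.18, 0.11, 0.14, 0.16, 0.13])
labels, threshold = labels_from_scores(scores, contamination=0.1)
print(labels)
```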

fit_predict(X, y=None)[source]

Fit and predict anomaly labels.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training samples.

  • y (ignored) – Not used, present for API consistency.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

score_samples(X)[source]

Return raw anomaly scores (average path length).

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to score.

Return type:

ndarray

Returns:

scores (ndarray of shape (n_samples,)) – Average path lengths (lower = more anomalous).
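
For background, the original Isolation Forest formulation (Liu et al., 2008) normalizes an average path length E[h(x)] into a score s = 2^(-E[h(x)] / c(n)), where c(n) is the expected path length of an unsuccessful binary search tree lookup over n points. The sketch below implements that formula only; whether this wrapper applies the same normalization is not stated in the source.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def isolation_score(mean_path_length, n):
    """Normalized score in (0, 1]; shorter paths give values closer to 1."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


print(average_path_length(256))
print(isolation_score(2.0, 256))
```

A sample whose mean path length equals c(n) scores exactly 0.5; samples isolated in fewer splits score higher.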

class endgame.anomaly.LocalOutlierFactorDetector(n_neighbors=20, contamination='auto', algorithm='auto', leaf_size=30, metric='minkowski', p=2, novelty=True, n_jobs=-1)[source]

Bases: BaseEstimator, OutlierMixin

Local Outlier Factor with competition-tuned defaults.

LOF compares the local density of a point with that of its neighbors. Points whose density is substantially lower than that of their neighbors are considered outliers. This makes LOF effective for detecting local anomalies in non-uniform distributions.
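
The density comparison can be sketched with a brute-force NumPy implementation of the standard LOF definition (k-distance, reachability distance, local reachability density). This is an illustration only, not the sklearn implementation wrapped by this class; it uses O(n^2) memory and assumes there are no duplicate points.

```python
import numpy as np


def local_outlier_factor(X, k):
    """Brute-force LOF; values well above 1 indicate lower density than neighbors."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, with self-distances excluded.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    neighbors = np.argsort(dist, axis=1)[:, :k]                # (n, k) indices
    neighbor_dist = np.take_along_axis(dist, neighbors, axis=1)
    k_distance = neighbor_dist[:, -1]                          # distance to k-th neighbor

    # reach_k(a, b) = max(k_distance(b), d(a, b))
    reach = np.maximum(neighbor_dist, k_distance[neighbors])
    lrd = 1.0 / reach.mean(axis=1)                             # local reachability density
    return lrd[neighbors].mean(axis=1) / lrd


# Five clustered points plus one isolated point at index 5.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]]
lof = local_outlier_factor(X, k=3)
print(int(np.argmax(lof)))
```

The clustered points get LOF values close to 1, while the isolated point gets a much larger value.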

Parameters:
  • n_neighbors (int, default=20) – Number of neighbors for density estimation. Higher values make the detector more robust but may miss small local anomalies.

  • contamination (float or 'auto', default='auto') – Expected proportion of anomalies. Used for threshold setting.

  • algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto') – Algorithm for nearest neighbor queries.

  • leaf_size (int, default=30) – Leaf size for tree algorithms.

  • metric (str or callable, default='minkowski') – Distance metric for neighbor queries.

  • p (int, default=2) – Power parameter for Minkowski metric (2 = Euclidean).

  • novelty (bool, default=True) – Whether to use LOF for novelty detection (scoring new samples). True enables predict() and decision_function() on unseen data.

  • n_jobs (int, default=-1) – Parallel jobs for neighbor queries. -1 uses all cores.

model_

Fitted sklearn LOF instance.

Type:

LocalOutlierFactor

threshold_

Decision threshold for binary classification.

Type:

float

Examples

>>> from endgame.anomaly import LocalOutlierFactorDetector
>>> detector = LocalOutlierFactorDetector(contamination=0.1)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)  # Higher = more anomalous
>>> labels = detector.predict(X_test)  # 1 = anomaly, 0 = normal

fit(X, y=None)[source]

Fit the LOF model on training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data (assumed to be mostly normal).

  • y (ignored) – Not used, present for API consistency.

Return type:

LocalOutlierFactorDetector

Returns:

self (LocalOutlierFactorDetector) – Fitted detector.

decision_function(X)[source]

Compute anomaly scores for samples.

Higher scores indicate more anomalous samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to score.

Return type:

ndarray

Returns:

scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.

predict(X)[source]

Predict anomaly labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to classify.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

fit_predict(X, y=None)[source]

Fit and predict anomaly labels on training data.

Note: For LOF, this uses the transductive scores computed during fit, not the inductive scores from predict().

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training samples.

  • y (ignored) – Not used, present for API consistency.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

score_samples(X)[source]

Return negative LOF scores (sklearn convention).

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to score.

Return type:

ndarray

Returns:

scores (ndarray of shape (n_samples,)) – Negative LOF scores (higher = more normal).

class endgame.anomaly.GritBotDetector(max_conditions=4, filtering_level=50.0, contamination=0.01, min_cases=None, categorical_features=None, n_jobs=1, random_state=None)[source]

Bases: BaseEstimator, OutlierMixin

GritBot-style anomaly detection via recursive partitioning.

GritBot finds anomalies by recursively partitioning data to find homogeneous subsets, then identifying values that are surprising given the subset context. This approach is particularly effective for:

  • Data with mixed attribute types
  • Context-dependent anomalies (value is only anomalous in certain contexts)
  • Interpretable anomaly explanations
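
The idea of a context-dependent anomaly can be shown with a much simpler stand-in: group values by a single context label and flag values that are far from their own group's mean. This is not the GritBot algorithm (there is no recursive partitioning here), and contextual_outliers and the z-score threshold are hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def contextual_outliers(rows, z_threshold=2.0):
    """rows: list of (context, value). Returns indices of values surprising in context."""
    groups = defaultdict(list)
    for idx, (context, value) in enumerate(rows):
        groups[context].append((idx, value))

    flagged = []
    for members in groups.values():
        values = [v for _, v in members]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a constant group has no surprising values
        flagged.extend(
            idx for idx, v in members if abs(v - mu) / sigma > z_threshold
        )
    return sorted(flagged)


# A temperature of 5.0 is ordinary in winter but surprising in summer.
rows = [
    ("summer", 30.0), ("summer", 31.0), ("summer", 29.0),
    ("summer", 30.0), ("summer", 32.0), ("summer", 5.0),
    ("winter", 2.0), ("winter", 4.0), ("winter", 3.0),
    ("winter", 5.0), ("winter", 1.0),
]
flagged = contextual_outliers(rows)
print(flagged)
```

Only the summer reading of 5.0 is flagged; the identical winter value is not, because it is typical for its context.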

Parameters:
  • max_conditions (int, default=4) – Maximum number of conditions (splits) defining a subset context.

  • filtering_level (float, default=50.0) – Controls sensitivity (0-100). Higher = fewer but more confident anomalies.
      - 0: MINABNORM=4 (more sensitive)
      - 50: MINABNORM=8 (default)
      - 100: MINABNORM=20 (very conservative)

  • contamination (float, default=0.01) – Maximum expected proportion of anomalies.

  • min_cases (int or None, default=None) – Minimum cases in a subset to check for anomalies. None uses max(35, 0.5% of data).

  • categorical_features (list or None, default=None) – Indices of categorical features. If None, auto-detected.

  • n_jobs (int, default=1) – Parallel jobs (currently unused; reserved for future use).

  • random_state (int or None, default=None) – Random seed for reproducibility.
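
The documented filtering_level anchor points (0, 50, 100) and the min_cases default can be sketched as follows. Linear interpolation between the three anchors and rounding 0.5% down are both assumptions; the source only gives the three anchor values and the max(35, 0.5% of data) rule. The function names are hypothetical.

```python
def min_abnormality(filtering_level):
    """Map filtering_level in [0, 100] to MINABNORM (assumed piecewise linear)."""
    if not 0 <= filtering_level <= 100:
        raise ValueError("filtering_level must be in [0, 100]")
    if filtering_level <= 50:
        # Documented anchors: 0 -> 4, 50 -> 8
        return 4.0 + (8.0 - 4.0) * filtering_level / 50.0
    # Documented anchors: 50 -> 8, 100 -> 20
    return 8.0 + (20.0 - 8.0) * (filtering_level - 50.0) / 50.0


def default_min_cases(n_samples):
    """max(35, 0.5% of data); rounding down is an assumption."""
    return max(35, n_samples // 200)


print(min_abnormality(50))
print(default_min_cases(20000))
```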

anomalies_

Detected anomalies with full context.

Type:

list[Anomaly]

anomaly_indices_

Indices of detected anomaly cases.

Type:

np.ndarray

anomaly_scores_

Scores for each sample (higher = more anomalous).

Type:

np.ndarray

References

Quinlan, J.R. (2010). GritBot GPL Edition. Rulequest Research.

Examples

>>> from endgame.anomaly import GritBotDetector
>>> detector = GritBotDetector(filtering_level=50)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)
>>> labels = detector.predict(X_test)  # 1 = anomaly
>>>
>>> # Get interpretable anomaly explanations
>>> for anomaly in detector.anomalies_[:5]:
...     print(f"Case {anomaly.case_idx}: feature {anomaly.feature_idx}")
...     print(f"  Value: {anomaly.value}, Expected: {anomaly.expected_value}")
...     print(f"  Context: {anomaly.context.conditions}")

fit(X, y=None)[source]

Fit the GritBot detector and find anomalies.

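Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (ignored) – Not used, present for API consistency.

Return type: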

GritBotDetector

Returns:

self (GritBotDetector) – Fitted detector.

decision_function(X)[source]

Compute anomaly scores for samples.

Higher scores indicate more anomalous samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to score.

Return type:

ndarray

Returns:

scores (ndarray of shape (n_samples,)) – Anomaly scores.

predict(X)[source]

Predict anomaly labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to classify.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

fit_predict(X, y=None)[source]

Fit and return anomaly labels for training data.

Return type:

ndarray

Parameters:

X (ArrayLike)

get_anomaly_report(max_anomalies=10)[source]

Generate a human-readable anomaly report.

Parameters:

max_anomalies (int, default=10) – Maximum anomalies to include in report.

Return type:

str

Returns:

report (str) – Formatted anomaly report.

class endgame.anomaly.Anomaly(case_idx, feature_idx, value, score, context, group_size, group_mean=0.0, group_std=0.0, expected_value=None)[source]

Bases: object

Detected anomaly with context.

Parameters:
case_idx: int
feature_idx: int
value: Any
score: float
context: AnomalyContext
group_size: int
group_mean: float = 0.0
group_std: float = 0.0
expected_value: Any = None

class endgame.anomaly.AnomalyContext(conditions=<factory>)[source]

Bases: object

Context conditions that define when an anomaly occurs.

Parameters:

conditions (list[tuple[int, str, Any, Any]])

conditions: list[tuple[int, str, Any, Any]]

class endgame.anomaly.PyODDetector(algorithm='ECOD', contamination=0.1, random_state=None, **kwargs)[source]

Bases: BaseEstimator, OutlierMixin

Universal wrapper for PyOD anomaly detection algorithms.

This wrapper provides a unified sklearn-compatible interface to all PyOD algorithms, with consistent scoring conventions and automatic hyperparameter defaults.

Parameters:
  • algorithm (str, default='ECOD') – Name of the PyOD algorithm. See PYOD_ALGORITHMS for available options. Popular choices:
      - ‘ECOD’: Empirical Cumulative Distribution (fast, parameter-free)
      - ‘COPOD’: Copula-Based (fast, parameter-free)
      - ‘IForest’: Isolation Forest
      - ‘LOF’: Local Outlier Factor
      - ‘KNN’: K-Nearest Neighbors
      - ‘HBOS’: Histogram-Based (very fast)
      - ‘PCA’: Principal Component Analysis
      - ‘AutoEncoder’: Deep learning autoencoder

  • contamination (float, default=0.1) – Expected proportion of anomalies.

  • random_state (int or None, default=None) – Random seed for reproducibility.

  • **kwargs (dict) – Additional algorithm-specific parameters passed to the PyOD model.

model_

Fitted PyOD detector instance.

Type:

PyOD model

threshold_

Decision threshold for binary classification.

Type:

float

Examples

>>> from endgame.anomaly import PyODDetector, PYOD_ALGORITHMS
>>>
>>> # List available algorithms
>>> print(list(PYOD_ALGORITHMS.keys()))
>>>
>>> # Fast parameter-free detection
>>> detector = PyODDetector(algorithm='ECOD')
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)
>>>
>>> # KNN-based detection
>>> detector = PyODDetector(algorithm='KNN', n_neighbors=15)
>>> detector.fit(X_train)
>>> labels = detector.predict(X_test)
>>>
>>> # Deep learning detector
>>> detector = PyODDetector(
...     algorithm='AutoEncoder',
...     hidden_neurons=[128, 64, 64, 128],
...     epochs=50
... )
>>> detector.fit(X_train)

fit(X, y=None)[source]

Fit the PyOD detector on training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (ignored) – Not used, present for API consistency.

Return type:

PyODDetector

Returns:

self (PyODDetector) – Fitted detector.

decision_function(X)[source]

Compute anomaly scores for samples.

Higher scores indicate more anomalous samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to score.

Return type:

ndarray

Returns:

scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.

predict(X)[source]

Predict anomaly labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to classify.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

fit_predict(X, y=None)[source]

Fit and predict anomaly labels.

Return type:

ndarray

Parameters:

X (ArrayLike)

predict_proba(X)[source]

Predict anomaly probabilities.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to classify.

Return type:

ndarray

Returns:

proba (ndarray of shape (n_samples, 2)) – Probabilities for [normal, anomaly] classes.
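
One simple way to turn anomaly scores into a two-column probability array is min-max scaling against the training scores, clipped to [0, 1]. The sketch below assumes that approach; the source does not say which conversion predict_proba uses, and scores_to_proba is a hypothetical helper that also assumes the training scores are not all identical.

```python
import numpy as np


def scores_to_proba(train_scores, test_scores):
    """Min-max scale test scores by the training range; returns [normal, anomaly]."""
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    lo, hi = train_scores.min(), train_scores.max()
    # Scores outside the training range are clipped to 0 or 1.
    p_anomaly = np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)
    return np.column_stack([1.0 - p_anomaly, p_anomaly])


proba = scores_to_proba([0.0, 1.0, 2.0, 4.0], [1.0, 5.0, -1.0])
print(proba)
```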

predict_confidence(X)[source]

Return prediction confidence scores.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to score.

Return type:

ndarray

Returns:

confidence (ndarray of shape (n_samples,)) – Confidence scores (higher = more confident prediction).

property available_algorithms: list[str]

List of available PyOD algorithms.

endgame.anomaly.create_detector_ensemble(algorithms=None, contamination=0.1, random_state=None)[source]

Create an ensemble of diverse PyOD detectors.

Parameters:
  • algorithms (list of str or None, default=None) – Algorithms to include. None uses a default diverse set: [‘ECOD’, ‘COPOD’, ‘IForest’, ‘LOF’, ‘KNN’, ‘HBOS’]

  • contamination (float, default=0.1) – Expected proportion of anomalies.

  • random_state (int or None, default=None) – Random seed for reproducibility.

Return type:

list[PyODDetector]

Returns:

detectors (list of PyODDetector) – List of configured detectors ready for fitting.

Examples

>>> import numpy as np
>>> from endgame.anomaly import create_detector_ensemble
>>> detectors = create_detector_ensemble(contamination=0.05)
>>> for det in detectors:
...     det.fit(X_train)
>>> # Combine scores
>>> scores = np.mean([d.decision_function(X_test) for d in detectors], axis=0)
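
Different detectors can produce scores on very different scales, so a plain mean may be dominated by one detector. A possible refinement, sketched below with stand-in score arrays rather than real detector output, is to standardize each detector's scores before averaging. combine_scores is a hypothetical helper, and standardizing with the statistics of the scored batch (rather than training-set statistics) is a simplifying assumption.

```python
import numpy as np


def combine_scores(score_lists):
    """Z-score each detector's scores, then average across detectors."""
    stacked = np.asarray(score_lists, dtype=float)  # (n_detectors, n_samples)
    mean = stacked.mean(axis=1, keepdims=True)
    std = stacked.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant scores
    return ((stacked - mean) / std).mean(axis=0)


# Stand-in scores from two hypothetical detectors on very different scales.
scores_a = [0.1, 0.2, 0.9, 0.15]
scores_b = [100.0, 120.0, 400.0, 90.0]
combined = combine_scores([scores_a, scores_b])
print(int(np.argmax(combined)))
```

Rank-based averaging is another option when score distributions are heavily skewed.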