Anomaly Detection

class endgame.anomaly.IsolationForestDetector(n_estimators=200, contamination='auto', max_samples='auto', max_features=1.0, bootstrap=True, n_jobs=-1, random_state=None, warm_start=False)[source]¶

Bases: BaseEstimator, OutlierMixin

Isolation Forest with competition-tuned defaults.

This wrapper provides sensible defaults optimized for competition performance:

- Higher n_estimators (200 vs sklearn’s 100)
- Bootstrap sampling enabled
- max_features tuned for high-dimensional data
- Consistent scoring convention (higher = more anomalous)

Parameters:

n_estimators (int, default=200) – Number of isolation trees. More trees = more stable anomaly scores.
contamination (float or 'auto', default='auto') – Expected proportion of anomalies. ‘auto’ uses a heuristic based on the training data distribution.
max_samples (int or float or 'auto', default='auto') – Number of samples to draw for each tree.
    - ‘auto’: min(256, n_samples)
    - int: exact number of samples
    - float: fraction of samples
max_features (float or int, default=1.0) – Features to draw for each tree.
    - float: fraction of features
    - int: exact number of features
bootstrap (bool, default=True) – Whether to sample with replacement when drawing data for each tree. True improves diversity among trees.
n_jobs (int, default=-1) – Parallel jobs for fitting trees. -1 uses all cores.
random_state (int or None, default=None) – Random seed for reproducibility.
warm_start (bool, default=False) – Reuse trees from previous fit and add more.
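
The max_samples rules listed above can be sketched as a small helper. This is only an illustration, assuming the wrapper forwards max_samples to sklearn's IsolationForest unchanged; `resolve_max_samples` is a hypothetical name and not part of endgame.

```python
def resolve_max_samples(max_samples, n_samples):
    """Hypothetical helper: number of rows drawn for each tree."""
    if max_samples == "auto":
        # Documented rule: min(256, n_samples).
        return min(256, n_samples)
    if isinstance(max_samples, int):
        # An exact count cannot exceed the rows available.
        return min(max_samples, n_samples)
    # A float is read as a fraction of the training rows.
    return int(max_samples * n_samples)


per_tree = resolve_max_samples("auto", 10_000)
```

With the default 'auto', each tree sees at most 256 rows regardless of dataset size, so raising n_estimators is usually the cheaper way to stabilize scores.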

model_¶

Fitted sklearn IsolationForest instance.

Type:: IsolationForest

threshold_¶

Decision threshold for binary anomaly classification.

Type:: float

Examples

>>> from endgame.anomaly import IsolationForestDetector
>>> detector = IsolationForestDetector(contamination=0.1)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)  # Higher = more anomalous
>>> labels = detector.predict(X_test)  # 1 = anomaly, 0 = normal

fit(X, y=None)[source]¶

Fit the Isolation Forest on training data.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (ignored) – Not used, present for API consistency.

Return type:

IsolationForestDetector

Returns:

self (IsolationForestDetector) – Fitted detector.

decision_function(X)[source]¶

Compute anomaly scores for samples.

Higher scores indicate more anomalous samples (opposite of sklearn convention).

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to score.
Return type:: ndarray
Returns:: scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.
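
The reversed convention can be illustrated with sklearn's own IsolationForest. The sketch below assumes the wrapper simply negates sklearn's decision_function and remaps sklearn's -1/+1 labels to 1/0; the actual implementation may differ, and the synthetic data is made up for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))
# Five ordinary points followed by one obvious outlier (last row).
X_test = np.vstack([rng.normal(size=(5, 4)), np.full((1, 4), 8.0)])

forest = IsolationForest(n_estimators=200, bootstrap=True, random_state=0)
forest.fit(X_train)

sk_scores = forest.decision_function(X_test)  # sklearn: lower = more anomalous
flipped = -sk_scores                          # higher = more anomalous
# sklearn labels anomalies -1 and inliers +1; remap to 1 and 0.
labels = (forest.predict(X_test) == -1).astype(int)
```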

predict(X)[source]¶

Predict anomaly labels.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to classify.
Return type:: ndarray
Returns:: labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

fit_predict(X, y=None)[source]¶

Fit and predict anomaly labels.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training samples.
y (ignored) – Not used, present for API consistency.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

score_samples(X)[source]¶

Return raw anomaly scores (average path length).

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to score.
Return type:: ndarray
Returns:: scores (ndarray of shape (n_samples,)) – Average path lengths (lower = more anomalous).

class endgame.anomaly.LocalOutlierFactorDetector(n_neighbors=20, contamination='auto', algorithm='auto', leaf_size=30, metric='minkowski', p=2, novelty=True, n_jobs=-1)[source]¶

Bases: BaseEstimator, OutlierMixin

Local Outlier Factor with competition-tuned defaults.

LOF compares the local density of a point with that of its neighbors. Points with substantially lower density are considered outliers. Effective for detecting local anomalies in non-uniform distributions.

Parameters:

n_neighbors (int, default=20) – Number of neighbors for density estimation. Higher values make the detector more robust but may miss small local anomalies.
contamination (float or 'auto', default='auto') – Expected proportion of anomalies. Used for threshold setting.
algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto') – Algorithm for nearest neighbor queries.
leaf_size (int, default=30) – Leaf size for tree algorithms.
metric (str or callable, default='minkowski') – Distance metric for neighbor queries.
p (int, default=2) – Power parameter for Minkowski metric (2 = Euclidean).
novelty (bool, default=True) – Whether to use LOF for novelty detection (scoring new samples). True enables predict() and decision_function() on unseen data.
n_jobs (int, default=-1) – Parallel jobs for neighbor queries. -1 uses all cores.

model_¶

Fitted sklearn LOF instance.

Type:: LocalOutlierFactor

threshold_¶

Decision threshold for binary classification.

Type:: float

Examples

>>> from endgame.anomaly import LocalOutlierFactorDetector
>>> detector = LocalOutlierFactorDetector(contamination=0.1)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)  # Higher = more anomalous
>>> labels = detector.predict(X_test)  # 1 = anomaly, 0 = normal

fit(X, y=None)[source]¶

Fit the LOF model on training data.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data (assumed to be mostly normal).
y (ignored) – Not used, present for API consistency.

Return type:

LocalOutlierFactorDetector

Returns:

self (LocalOutlierFactorDetector) – Fitted detector.

decision_function(X)[source]¶

Compute anomaly scores for samples.

Higher scores indicate more anomalous samples.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to score.
Return type:: ndarray
Returns:: scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.

predict(X)[source]¶

Predict anomaly labels.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to classify.
Return type:: ndarray
Returns:: labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

fit_predict(X, y=None)[source]¶

Fit and predict anomaly labels on training data.

Note: For LOF, this uses the transductive scores computed during fit, not the inductive scores from predict().

Parameters:

X (array-like of shape (n_samples, n_features)) – Training samples.
y (ignored) – Not used, present for API consistency.

Return type:

ndarray

Returns:

labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.
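
The difference between transductive and inductive scores can be seen with sklearn's LocalOutlierFactor directly. This is a sketch under the assumption that the wrapper stores a novelty=True model; the data is synthetic and the variable names are illustrative.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# 200 clustered points plus one far-away point (last row).
X_train = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)

# Transductive: computed during fit; a point is never its own neighbor.
transductive = -lof.negative_outlier_factor_
# Inductive: training rows re-scored as if they were new, so each row
# finds itself at distance zero among its neighbors.
inductive = -lof.score_samples(X_train)
```

Because the two neighbor sets differ, the two score vectors differ as well, which is why training data is better labeled from the scores computed during fit.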

score_samples(X)[source]¶

Return negative LOF scores (sklearn convention).

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to score.
Return type:: ndarray
Returns:: scores (ndarray of shape (n_samples,)) – Negative LOF scores (higher = more normal).

class endgame.anomaly.GritBotDetector(max_conditions=4, filtering_level=50.0, contamination=0.01, min_cases=None, categorical_features=None, n_jobs=1, random_state=None)[source]¶

Bases: BaseEstimator, OutlierMixin

GritBot-style anomaly detection via recursive partitioning.

GritBot finds anomalies by recursively partitioning data to find homogeneous subsets, then identifying values that are surprising given the subset context. This approach is particularly effective for:

- Data with mixed attribute types
- Context-dependent anomalies (value is only anomalous in certain contexts)
- Interpretable anomaly explanations
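
To make the idea of a context-dependent anomaly concrete, the sketch below compares a global z-score with a z-score computed inside one subset. It is not GritBot's actual test, only a simplified stand-in on synthetic data, with the context fixed by hand instead of found by partitioning.

```python
import numpy as np

rng = np.random.default_rng(0)
# Column 0: a binary context feature; column 1: the value being checked.
group_a = np.column_stack([np.zeros(50), rng.normal(10.0, 1.0, 50)])
group_b = np.column_stack([np.ones(50), rng.normal(50.0, 1.0, 50)])
odd_case = np.array([[1.0, 10.0]])  # ordinary for group A, surprising in group B
X = np.vstack([group_a, group_b, odd_case])

value = X[-1, 1]
# Globally, 10.0 is unremarkable: about half of the data sits near it.
z_global = (value - X[:, 1].mean()) / X[:, 1].std()
# Within the context "feature 0 == 1", it is far from every other case.
subset = X[X[:, 0] == 1, 1]
z_context = (value - subset.mean()) / subset.std()
```

The same value goes unnoticed globally but stands out once the comparison is restricted to the subset that shares its context.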

Parameters:

max_conditions (int, default=4) – Maximum number of conditions (splits) defining a subset context.
filtering_level (float, default=50.0) – Controls sensitivity (0-100). Higher = fewer but more confident anomalies.
    - 0: MINABNORM=4 (more sensitive)
    - 50: MINABNORM=8 (default)
    - 100: MINABNORM=20 (very conservative)
contamination (float, default=0.01) – Maximum expected proportion of anomalies.
min_cases (int or None, default=None) – Minimum cases in a subset to check for anomalies. None uses max(35, 0.5% of data).
categorical_features (list or None, default=None) – Indices of categorical features. If None, auto-detected.
n_jobs (int, default=1) – Parallel jobs (currently unused; reserved for future use).
random_state (int or None, default=None) – Random seed for reproducibility.
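
The documented default for min_cases, max(35, 0.5% of data), can be written out as follows. `default_min_cases` is a hypothetical helper, and rounding the 0.5% figure down to an integer is an assumption.

```python
def default_min_cases(n_samples):
    """Hypothetical helper for the documented rule max(35, 0.5% of data)."""
    # 0.5% of the data is n_samples / 200; rounding down is an assumption.
    return max(35, n_samples // 200)


small = default_min_cases(1_000)
large = default_min_cases(100_000)
```

Under this reading the floor of 35 applies to any dataset with fewer than 7,200 rows, and the percentage only takes over beyond that.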

anomalies_¶

Detected anomalies with full context.

Type:: list[Anomaly]

anomaly_indices_¶

Indices of detected anomaly cases.

Type:: np.ndarray

anomaly_scores_¶

Scores for each sample (higher = more anomalous).

Type:: np.ndarray

References

Quinlan, J.R. (2010). GritBot GPL Edition. Rulequest Research.

Examples

>>> from endgame.anomaly import GritBotDetector
>>> detector = GritBotDetector(filtering_level=50)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)
>>> labels = detector.predict(X_test)  # 1 = anomaly
>>>
>>> # Get interpretable anomaly explanations
>>> for anomaly in detector.anomalies_[:5]:
...     print(f"Case {anomaly.case_idx}: feature {anomaly.feature_idx}")
...     print(f"  Value: {anomaly.value}, Expected: {anomaly.expected_value}")
...     print(f"  Context: {anomaly.context.conditions}")

fit(X, y=None)[source]¶

Fit the GritBot detector and find anomalies.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (ignored) – Not used.

Return type:

GritBotDetector

Returns:

self (GritBotDetector) – Fitted detector.

decision_function(X)[source]¶

Compute anomaly scores for samples.

Higher scores indicate more anomalous samples.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to score.
Return type:: ndarray
Returns:: scores (ndarray of shape (n_samples,)) – Anomaly scores.

predict(X)[source]¶

Predict anomaly labels.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to classify.
Return type:: ndarray
Returns:: labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

fit_predict(X, y=None)[source]¶

Fit and return anomaly labels for training data.

Return type:: ndarray
Parameters:: X (ArrayLike)

get_anomaly_report(max_anomalies=10)[source]¶

Generate a human-readable anomaly report.

Parameters:: max_anomalies (int, default=10) – Maximum anomalies to include in report.
Return type:: str
Returns:: report (str) – Formatted anomaly report.

class endgame.anomaly.Anomaly(case_idx, feature_idx, value, score, context, group_size, group_mean=0.0, group_std=0.0, expected_value=None)[source]¶

Bases: object

Detected anomaly with context.

Parameters:

case_idx (int)
feature_idx (int)
value (Any)
score (float)
context (AnomalyContext)
group_size (int)
group_mean (float)
group_std (float)
expected_value (Any)

case_idx: int¶

feature_idx: int¶

value: Any¶

score: float¶

context: AnomalyContext¶

group_size: int¶

group_mean: float = 0.0¶

group_std: float = 0.0¶

expected_value: Any = None¶

class endgame.anomaly.AnomalyContext(conditions=<factory>)[source]¶

Bases: object

Context conditions that define when an anomaly occurs.

Parameters:: conditions (list[tuple[int, str, Any, Any]])

conditions: list[tuple[int, str, Any, Any]]¶

class endgame.anomaly.PyODDetector(algorithm='ECOD', contamination=0.1, random_state=None, **kwargs)[source]¶

Bases: BaseEstimator, OutlierMixin

Universal wrapper for PyOD anomaly detection algorithms.

This wrapper provides a unified sklearn-compatible interface to all PyOD algorithms, with consistent scoring conventions and automatic hyperparameter defaults.

Parameters:

algorithm (str, default='ECOD') – Name of the PyOD algorithm. See PYOD_ALGORITHMS for available options. Popular choices:
    - ‘ECOD’: Empirical Cumulative Distribution (fast, parameter-free)
    - ‘COPOD’: Copula-Based (fast, parameter-free)
    - ‘IForest’: Isolation Forest
    - ‘LOF’: Local Outlier Factor
    - ‘KNN’: K-Nearest Neighbors
    - ‘HBOS’: Histogram-Based (very fast)
    - ‘PCA’: Principal Component Analysis
    - ‘AutoEncoder’: Deep learning autoencoder
contamination (float, default=0.1) – Expected proportion of anomalies.
random_state (int or None, default=None) – Random seed for reproducibility.
**kwargs (dict) – Additional algorithm-specific parameters passed to the PyOD model.

model_¶

Fitted PyOD detector instance.

Type:: PyOD model

threshold_¶

Decision threshold for binary classification.

Type:: float
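
PyOD's base detector derives its threshold by cutting the training scores at the (1 - contamination) quantile. The sketch below reproduces that rule with NumPy, assuming the wrapper exposes PyOD's threshold unchanged; the scores are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
train_scores = rng.exponential(size=1000)  # stand-in for training anomaly scores
contamination = 0.1

# Cut at the (1 - contamination) quantile of the training scores.
threshold = np.percentile(train_scores, 100 * (1 - contamination))
# Scores strictly above the threshold are labeled anomalies (1).
train_labels = (train_scores > threshold).astype(int)
```

By construction, roughly a contamination-sized fraction of the training data ends up flagged, so the parameter acts as a budget more than an estimate.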

Examples

>>> from endgame.anomaly import PyODDetector, PYOD_ALGORITHMS
>>>
>>> # List available algorithms
>>> print(list(PYOD_ALGORITHMS.keys()))
>>>
>>> # Fast parameter-free detection
>>> detector = PyODDetector(algorithm='ECOD')
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)
>>>
>>> # KNN-based detection
>>> detector = PyODDetector(algorithm='KNN', n_neighbors=15)
>>> detector.fit(X_train)
>>> labels = detector.predict(X_test)
>>>
>>> # Deep learning detector
>>> detector = PyODDetector(
...     algorithm='AutoEncoder',
...     hidden_neurons=[128, 64, 64, 128],
...     epochs=50
... )
>>> detector.fit(X_train)

fit(X, y=None)[source]¶

Fit the PyOD detector on training data.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.
y (ignored) – Not used, present for API consistency.

Return type:

PyODDetector

Returns:

self (PyODDetector) – Fitted detector.

decision_function(X)[source]¶

Compute anomaly scores for samples.

Higher scores indicate more anomalous samples.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to score.
Return type:: ndarray
Returns:: scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.

predict(X)[source]¶

Predict anomaly labels.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to classify.
Return type:: ndarray
Returns:: labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.

fit_predict(X, y=None)[source]¶

Fit and predict anomaly labels.

Return type:: ndarray
Parameters:: X (ArrayLike)

predict_proba(X)[source]¶

Predict anomaly probabilities.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to classify.
Return type:: ndarray
Returns:: proba (ndarray of shape (n_samples, 2)) – Probabilities for [normal, anomaly] classes.
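
PyOD's default ('linear') conversion min-max scales scores by the range of the training scores and clips the result to [0, 1]. A minimal sketch of that idea follows, assuming the wrapper delegates to it; `scores_to_proba` is a hypothetical helper and the numbers are invented.

```python
import numpy as np


def scores_to_proba(train_scores, scores):
    """Hypothetical helper: min-max scale scores into [normal, anomaly] columns."""
    lo, hi = train_scores.min(), train_scores.max()
    # Scores outside the training range are clipped to 0 or 1.
    p_anomaly = np.clip((scores - lo) / (hi - lo), 0.0, 1.0)
    return np.column_stack([1.0 - p_anomaly, p_anomaly])


train_scores = np.array([0.0, 1.0, 2.0, 4.0])
proba = scores_to_proba(train_scores, np.array([-1.0, 2.0, 10.0]))
```

These values are rescaled scores, not calibrated probabilities, so they are best used for ranking or for blending with other models.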

predict_confidence(X)[source]¶

Return prediction confidence scores.

Parameters:: X (array-like of shape (n_samples, n_features)) – Samples to score.
Return type:: ndarray
Returns:: confidence (ndarray of shape (n_samples,)) – Confidence scores (higher = more confident prediction).

property available_algorithms: list[str]¶

List of available PyOD algorithms.

endgame.anomaly.create_detector_ensemble(algorithms=None, contamination=0.1, random_state=None)[source]¶

Create an ensemble of diverse PyOD detectors.

Parameters:

algorithms (list of str or None, default=None) – Algorithms to include. None uses a default diverse set: ['ECOD', 'COPOD', 'IForest', 'LOF', 'KNN', 'HBOS'].
contamination (float, default=0.1) – Expected proportion of anomalies.
random_state (int or None, default=None) – Random seed for reproducibility.

Return type:

list[PyODDetector]

Returns:

detectors (list of PyODDetector) – List of configured detectors ready for fitting.

Examples

>>> import numpy as np
>>> from endgame.anomaly import create_detector_ensemble
>>> detectors = create_detector_ensemble(contamination=0.05)
>>> for det in detectors:
...     det.fit(X_train)
>>> # Combine scores by averaging across detectors
>>> scores = np.mean([d.decision_function(X_test) for d in detectors], axis=0)
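
Raw scores from different algorithms can sit on very different scales, so a plain mean may be dominated by one detector. One option is to standardize each detector's scores against its training scores before averaging. The sketch below uses made-up score matrices; `combine_scores` is a hypothetical helper, not part of endgame.

```python
import numpy as np


def combine_scores(train_scores, test_scores):
    """Hypothetical helper: z-score each detector's scores, then average.

    Both arguments have shape (n_detectors, n_samples).
    """
    mean = train_scores.mean(axis=1, keepdims=True)
    std = train_scores.std(axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)  # guard against constant scores
    return ((test_scores - mean) / std).mean(axis=0)


# Two made-up detectors whose scores live on very different scales.
train = np.array([[0.1, 0.2, 0.3, 0.4], [100.0, 200.0, 300.0, 400.0]])
test = np.array([[0.25, 0.9], [250.0, 150.0]])
combined = combine_scores(train, test)
raw_mean = test.mean(axis=0)
```

Here the raw mean ranks the first test sample higher only because the second detector's scale is larger, while the standardized average ranks the second sample higher.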