Anomaly Detection
- class endgame.anomaly.IsolationForestDetector(n_estimators=200, contamination='auto', max_samples='auto', max_features=1.0, bootstrap=True, n_jobs=-1, random_state=None, warm_start=False)[source]¶
Bases: BaseEstimator, OutlierMixin
Isolation Forest with competition-tuned defaults.
This wrapper provides sensible defaults optimized for competition performance:
  - Higher n_estimators (200 vs sklearn's 100)
  - Bootstrap sampling enabled
  - max_features tuned for high-dimensional data
  - Consistent scoring convention (higher = more anomalous)
- Parameters:
n_estimators (int, default=200) – Number of isolation trees. More trees = more stable anomaly scores.
contamination (float or 'auto', default='auto') – Expected proportion of anomalies. 'auto' uses a heuristic based on the training data distribution.
max_samples (int or float or 'auto', default='auto') – Number of samples to draw for each tree.
  - 'auto': min(256, n_samples)
  - int: exact number of samples
  - float: fraction of samples
max_features (float or int, default=1.0) – Features to draw for each tree.
  - float: fraction of features
  - int: exact number of features
bootstrap (bool, default=True) – Whether to bootstrap samples. True improves diversity.
n_jobs (int, default=-1) – Parallel jobs for fitting trees. -1 uses all cores.
random_state (int or None, default=None) – Random seed for reproducibility.
warm_start (bool, default=False) – Reuse trees from previous fit and add more.
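The 'auto' / int / float rules for max_samples and max_features can be sketched as a small helper. This is only an illustration: resolve_count is a hypothetical name that is not part of endgame or sklearn, and truncating fractional counts with int() is an assumption.

```python
def resolve_count(value, total, auto_cap=256):
    """Turn an 'auto' / int / float setting into an absolute count.

    Hypothetical helper mirroring the documented rules; not library code.
    """
    if value == "auto":
        # 'auto' caps the per-tree sample size at 256
        return min(auto_cap, total)
    if isinstance(value, float):
        # float: fraction of the available rows (or columns)
        return max(1, int(value * total))
    # int: exact count
    return int(value)


n_samples, n_features = 10_000, 40
print(resolve_count("auto", n_samples))  # max_samples='auto'
print(resolve_count(0.5, n_samples))     # max_samples=0.5
print(resolve_count(1.0, n_features))    # max_features=1.0
print(resolve_count(8, n_features))      # max_features=8
```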
- model_¶
Fitted sklearn IsolationForest instance.
- Type:
IsolationForest
Examples
>>> from endgame.anomaly import IsolationForestDetector
>>> detector = IsolationForestDetector(contamination=0.1)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)  # Higher = more anomalous
>>> labels = detector.predict(X_test)  # 1 = anomaly, 0 = normal
- fit(X, y=None)[source]¶
Fit the Isolation Forest on training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (ignored) – Not used, present for API consistency.
- Returns:
self (IsolationForestDetector) – Fitted detector.
- decision_function(X)[source]¶
Compute anomaly scores for samples.
Higher scores indicate more anomalous samples (opposite of sklearn convention).
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to score.
- Returns:
scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.
- predict(X)[source]¶
Predict anomaly labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to classify.
- Returns:
labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.
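The score and label conventions documented here differ from plain sklearn, where lower decision_function values are more abnormal and predict returns -1 for outliers and 1 for inliers. The sketch below shows one way to obtain the documented conventions from a bare sklearn IsolationForest. It is an assumption about how such a wrapper could work, not the endgame implementation, and the synthetic data is made up for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))
# Five ordinary points plus one obvious outlier at (8, 8)
X_test = np.vstack([rng.normal(size=(5, 2)), [[8.0, 8.0]]])

model = IsolationForest(n_estimators=200, bootstrap=True, random_state=0)
model.fit(X_train)

# sklearn: lower decision_function = more abnormal, so negate it
scores = -model.decision_function(X_test)

# sklearn: predict returns -1 (outlier) / 1 (inlier); map to 1 / 0
labels = (model.predict(X_test) == -1).astype(int)

print(scores.round(3))
print(labels)
```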
- fit_predict(X, y=None)[source]¶
Fit and predict anomaly labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training samples.
y (ignored) – Not used, present for API consistency.
- Returns:
labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.
- score_samples(X)[source]¶
Return raw anomaly scores (average path length).
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to score.
- Returns:
scores (ndarray of shape (n_samples,)) – Average path lengths (lower = more anomalous).
- class endgame.anomaly.LocalOutlierFactorDetector(n_neighbors=20, contamination='auto', algorithm='auto', leaf_size=30, metric='minkowski', p=2, novelty=True, n_jobs=-1)[source]¶
Bases: BaseEstimator, OutlierMixin
Local Outlier Factor with competition-tuned defaults.
LOF compares the local density of a point with that of its neighbors. Points with substantially lower density are considered outliers. Effective for detecting local anomalies in non-uniform distributions.
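To make the density comparison concrete, here is a brute-force NumPy sketch of the standard LOF computation (k-distance, reachability distance, local reachability density). It is a teaching aid that ignores ties and assumes no duplicate points. It is not how sklearn or endgame compute LOF internally, and lof_scores is a hypothetical name.

```python
import numpy as np


def lof_scores(X, k=3):
    """Brute-force LOF; values well above 1 indicate outliers."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, with self-distances excluded
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    nbrs = np.argsort(dist, axis=1)[:, :k]             # k nearest neighbours
    nbr_dist = np.take_along_axis(dist, nbrs, axis=1)  # distances to them
    k_dist = nbr_dist[:, -1]                           # distance to the k-th neighbour

    # reach_dist(a, b) = max(k_dist(b), d(a, b))
    reach = np.maximum(nbr_dist, k_dist[nbrs])
    lrd = 1.0 / reach.mean(axis=1)                     # local reachability density

    # LOF = mean neighbour density divided by own density
    return lrd[nbrs].mean(axis=1) / lrd


# A 5x5 grid of inliers plus one isolated point
X = np.array([[i, j] for i in range(5) for j in range(5)] + [[20, 20]], dtype=float)
lof = lof_scores(X, k=3)
print(lof.round(2))
```

Grid points score close to 1 because their density matches that of their neighbours, while the isolated point scores far higher.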
- Parameters:
n_neighbors (int, default=20) – Number of neighbors for density estimation. Higher values make the detector more robust but may miss small local anomalies.
contamination (float or 'auto', default='auto') – Expected proportion of anomalies. Used for threshold setting.
algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto') – Algorithm for nearest neighbor queries.
leaf_size (int, default=30) – Leaf size for tree algorithms.
metric (str or callable, default='minkowski') – Distance metric for neighbor queries.
p (int, default=2) – Power parameter for Minkowski metric (2 = Euclidean).
novelty (bool, default=True) – Whether to use LOF for novelty detection (scoring new samples). True enables predict() and decision_function() on unseen data.
n_jobs (int, default=-1) – Parallel jobs for neighbor queries. -1 uses all cores.
- model_¶
Fitted sklearn LOF instance.
- Type:
LocalOutlierFactor
Examples
>>> from endgame.anomaly import LocalOutlierFactorDetector
>>> detector = LocalOutlierFactorDetector(contamination=0.1)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)  # Higher = more anomalous
>>> labels = detector.predict(X_test)  # 1 = anomaly, 0 = normal
- fit(X, y=None)[source]¶
Fit the LOF model on training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data (assumed to be mostly normal).
y (ignored) – Not used, present for API consistency.
- Returns:
self (LocalOutlierFactorDetector) – Fitted detector.
- decision_function(X)[source]¶
Compute anomaly scores for samples.
Higher scores indicate more anomalous samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to score.
- Returns:
scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.
- predict(X)[source]¶
Predict anomaly labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to classify.
- Returns:
labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.
- fit_predict(X, y=None)[source]¶
Fit and predict anomaly labels on training data.
Note: For LOF, this uses the transductive scores computed during fit, not the inductive scores from predict().
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training samples.
y (ignored) – Not used, present for API consistency.
- Returns:
labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.
- score_samples(X)[source]¶
Return negative LOF scores (sklearn convention).
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to score.
- Returns:
scores (ndarray of shape (n_samples,)) – Negative LOF scores (higher = more normal).
- class endgame.anomaly.GritBotDetector(max_conditions=4, filtering_level=50.0, contamination=0.01, min_cases=None, categorical_features=None, n_jobs=1, random_state=None)[source]¶
Bases: BaseEstimator, OutlierMixin
GritBot-style anomaly detection via recursive partitioning.
GritBot finds anomalies by recursively partitioning data to find homogeneous subsets, then identifying values that are surprising given the subset context. This approach is particularly effective for:
  - Data with mixed attribute types
  - Context-dependent anomalies (value is only anomalous in certain contexts)
  - Interpretable anomaly explanations
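A context-dependent anomaly can be illustrated with a deliberately simplified example. The sketch below skips recursive partitioning altogether. It assumes the subsets are already given by a single categorical column and measures surprise with a within-group z-score. The data and the choice of z-score are illustrative assumptions, not the GritBot procedure.

```python
import numpy as np

# One categorical context (group 0 / group 1) and one numeric feature
group = np.array([0] * 8 + [1] * 8)
value = np.array(
    [10, 11, 9, 10, 12, 10, 11, 30,    # group 0: the final 30 is out of place
     30, 31, 29, 30, 32, 28, 30, 31],  # group 1: 30 is typical here
    dtype=float,
)

# Global view: 30 sits comfortably inside the overall range
global_z = np.abs(value - value.mean()) / value.std()

# Contextual view: compare each value with its own group only
context_z = np.empty_like(value)
for g in np.unique(group):
    mask = group == g
    context_z[mask] = np.abs(value[mask] - value[mask].mean()) / value[mask].std()

print(global_z[7].round(2), context_z[7].round(2))
```

The value at index 7 is unremarkable globally but stands out once it is compared only with group 0.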
- Parameters:
max_conditions (int, default=4) – Maximum number of conditions (splits) defining a subset context.
filtering_level (float, default=50.0) – Controls sensitivity (0-100). Higher = fewer but more confident anomalies.
  - 0: MINABNORM=4 (more sensitive)
  - 50: MINABNORM=8 (default)
  - 100: MINABNORM=20 (very conservative)
contamination (float, default=0.01) – Maximum expected proportion of anomalies.
min_cases (int or None, default=None) – Minimum cases in a subset to check for anomalies. None uses max(35, 0.5% of data).
categorical_features (list or None, default=None) – Indices of categorical features. If None, auto-detected.
n_jobs (int, default=1) – Parallel jobs (currently not used, reserved for future).
random_state (int or None, default=None) – Random seed for reproducibility.
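The three documented MINABNORM values (4, 8 and 20 at filtering levels 0, 50 and 100) happen to lie on the curve 4 + 16 * (level / 100)^2. The sketch below uses that curve as an assumed interpolation, since the source does not state the formula. It also shows the documented min_cases default, with integer division as an assumed rounding rule. Both function names are hypothetical.

```python
def min_abnormal(filtering_level):
    """Assumed interpolation through the documented points (0, 4), (50, 8), (100, 20)."""
    if not 0 <= filtering_level <= 100:
        raise ValueError("filtering_level must be in [0, 100]")
    return 4 + 16 * (filtering_level / 100) ** 2


def default_min_cases(n_samples):
    """Documented default: max(35, 0.5% of data); the rounding is an assumption."""
    return max(35, n_samples // 200)  # 0.5% == 1/200


for level in (0, 25, 50, 75, 100):
    print(level, min_abnormal(level))

print(default_min_cases(1_000), default_min_cases(20_000))
```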
- anomaly_indices_¶
Indices of detected anomaly cases.
- Type:
np.ndarray
- anomaly_scores_¶
Scores for each sample (higher = more anomalous).
- Type:
np.ndarray
References
Quinlan, J.R. (2010). GritBot GPL Edition. Rulequest Research.
Examples
>>> from endgame.anomaly import GritBotDetector
>>> detector = GritBotDetector(filtering_level=50)
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)
>>> labels = detector.predict(X_test)  # 1 = anomaly
>>>
>>> # Get interpretable anomaly explanations
>>> for anomaly in detector.anomalies_[:5]:
...     print(f"Case {anomaly.case_idx}: feature {anomaly.feature_idx}")
...     print(f" Value: {anomaly.value}, Expected: {anomaly.expected_value}")
...     print(f" Context: {anomaly.context.conditions}")
- fit(X, y=None)[source]¶
Fit the GritBot detector and find anomalies.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (ignored) – Not used.
- Returns:
self (GritBotDetector) – Fitted detector.
- decision_function(X)[source]¶
Compute anomaly scores for samples.
Higher scores indicate more anomalous samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to score.
- Returns:
scores (ndarray of shape (n_samples,)) – Anomaly scores.
- predict(X)[source]¶
Predict anomaly labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to classify.
- Returns:
labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.
- class endgame.anomaly.Anomaly(case_idx, feature_idx, value, score, context, group_size, group_mean=0.0, group_std=0.0, expected_value=None)[source]¶
Bases: object
Detected anomaly with context.
- Parameters:
- context: AnomalyContext¶
- class endgame.anomaly.AnomalyContext(conditions=<factory>)[source]¶
Bases: object
Context conditions that define when an anomaly occurs.
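The constructor signatures above, in particular conditions=&lt;factory&gt;, suggest that both classes are dataclasses. The sketch below mirrors the documented field names and defaults. The type annotations and the format of each condition are assumptions, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AnomalyContext:
    # Element type is an assumption; the source only shows a default factory
    conditions: list = field(default_factory=list)


@dataclass
class Anomaly:
    case_idx: int
    feature_idx: int
    value: Any
    score: float
    context: AnomalyContext
    group_size: int
    group_mean: float = 0.0
    group_std: float = 0.0
    expected_value: Optional[Any] = None


# Hypothetical record: a reading of 30 in a subset whose mean is about 10.4
ctx = AnomalyContext(conditions=["feature_2 == 'winter'"])
a = Anomaly(case_idx=7, feature_idx=0, value=30.0, score=0.9,
            context=ctx, group_size=8, group_mean=10.4, group_std=1.0)
print(a.case_idx, a.value, a.context.conditions)
```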
- class endgame.anomaly.PyODDetector(algorithm='ECOD', contamination=0.1, random_state=None, **kwargs)[source]¶
Bases: BaseEstimator, OutlierMixin
Universal wrapper for PyOD anomaly detection algorithms.
This wrapper provides a unified sklearn-compatible interface to all PyOD algorithms, with consistent scoring conventions and automatic hyperparameter defaults.
- Parameters:
algorithm (str, default='ECOD') – Name of the PyOD algorithm. See PYOD_ALGORITHMS for available options. Popular choices:
  - 'ECOD': Empirical Cumulative Distribution (fast, parameter-free)
  - 'COPOD': Copula-Based (fast, parameter-free)
  - 'IForest': Isolation Forest
  - 'LOF': Local Outlier Factor
  - 'KNN': K-Nearest Neighbors
  - 'HBOS': Histogram-Based (very fast)
  - 'PCA': Principal Component Analysis
  - 'AutoEncoder': Deep learning autoencoder
contamination (float, default=0.1) – Expected proportion of anomalies.
random_state (int or None, default=None) – Random seed for reproducibility.
**kwargs (dict) – Additional algorithm-specific parameters passed to the PyOD model.
- model_¶
Fitted PyOD detector instance.
- Type:
PyOD model
Examples
>>> from endgame.anomaly import PyODDetector, PYOD_ALGORITHMS
>>>
>>> # List available algorithms
>>> print(list(PYOD_ALGORITHMS.keys()))
>>>
>>> # Fast parameter-free detection
>>> detector = PyODDetector(algorithm='ECOD')
>>> detector.fit(X_train)
>>> scores = detector.decision_function(X_test)
>>>
>>> # KNN-based detection
>>> detector = PyODDetector(algorithm='KNN', n_neighbors=15)
>>> detector.fit(X_train)
>>> labels = detector.predict(X_test)
>>>
>>> # Deep learning detector
>>> detector = PyODDetector(
...     algorithm='AutoEncoder',
...     hidden_neurons=[128, 64, 64, 128],
...     epochs=50
... )
>>> detector.fit(X_train)
- fit(X, y=None)[source]¶
Fit the PyOD detector on training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data.
y (ignored) – Not used, present for API consistency.
- Returns:
self (PyODDetector) – Fitted detector.
- decision_function(X)[source]¶
Compute anomaly scores for samples.
Higher scores indicate more anomalous samples.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to score.
- Returns:
scores (ndarray of shape (n_samples,)) – Anomaly scores. Higher = more anomalous.
- predict(X)[source]¶
Predict anomaly labels.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to classify.
- Returns:
labels (ndarray of shape (n_samples,)) – 1 for anomalies, 0 for normal samples.
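A common way to turn scores into 0/1 labels is to place the cut-off at the (1 - contamination) quantile of the training scores, so that roughly that fraction of the training data is flagged. The sketch below shows this idea with synthetic scores. Whether PyODDetector thresholds in exactly this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
train_scores = rng.normal(size=1000)  # stand-in for scores on the training set
contamination = 0.1

# Cut-off such that about 10% of training scores lie above it
threshold = np.percentile(train_scores, 100 * (1 - contamination))

test_scores = np.array([-1.0, 0.0, 5.0])
labels = (test_scores > threshold).astype(int)  # 1 = anomaly, 0 = normal

print(round(float(threshold), 3), labels)
```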
- fit_predict(X, y=None)[source]¶
Fit and predict anomaly labels.
- Parameters:
X (ArrayLike)
- predict_proba(X)[source]¶
Predict anomaly probabilities.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to classify.
- Returns:
proba (ndarray of shape (n_samples, 2)) – Probabilities for [normal, anomaly] classes.
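One simple way to produce the [normal, anomaly] columns is to min-max scale the scores against the range seen during training and clip the result to [0, 1]. The sketch below assumes that approach and assumes the training scores are not all identical. scores_to_proba is a hypothetical helper, not the endgame method.

```python
import numpy as np


def scores_to_proba(train_scores, test_scores):
    """Min-max scale scores to [0, 1] using the training range (assumed method)."""
    lo, hi = train_scores.min(), train_scores.max()
    p_anomaly = np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)
    # Column 0: probability of normal, column 1: probability of anomaly
    return np.column_stack([1.0 - p_anomaly, p_anomaly])


train_scores = np.array([0.0, 1.0, 2.0, 4.0])
proba = scores_to_proba(train_scores, np.array([-1.0, 1.0, 4.0, 9.0]))
print(proba)
```

Scores outside the training range are clipped, so the two columns always sum to 1.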
- predict_confidence(X)[source]¶
Return prediction confidence scores.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to score.
- Returns:
confidence (ndarray of shape (n_samples,)) – Confidence scores (higher = more confident prediction).
- endgame.anomaly.create_detector_ensemble(algorithms=None, contamination=0.1, random_state=None)[source]¶
Create an ensemble of diverse PyOD detectors.
- Parameters:
algorithms (list of str or None, default=None) – Algorithms to include. None uses a default diverse set: [‘ECOD’, ‘COPOD’, ‘IForest’, ‘LOF’, ‘KNN’, ‘HBOS’]
contamination (float, default=0.1) – Expected proportion of anomalies.
random_state (int or None, default=None) – Random seed for reproducibility.
- Returns:
detectors (list of PyODDetector) – List of configured detectors ready for fitting.
Examples
>>> import numpy as np
>>> from endgame.anomaly import create_detector_ensemble
>>> detectors = create_detector_ensemble(contamination=0.05)
>>> for det in detectors:
...     det.fit(X_train)
>>> # Combine scores
>>> scores = np.mean([d.decision_function(X_test) for d in detectors], axis=0)
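Different algorithms emit scores on very different scales, so a plain mean can be dominated by a single detector. One option is to standardize each detector's scores with statistics from its training scores before averaging. The sketch below uses made-up score arrays in place of real detector output, and combine_scores is a hypothetical helper.

```python
import numpy as np


def combine_scores(train_scores, test_scores):
    """Z-score each detector's test scores with its training statistics, then average."""
    standardized = [
        (te - tr.mean()) / tr.std()
        for tr, te in zip(train_scores, test_scores)
    ]
    return np.mean(standardized, axis=0)


# Two fake detectors whose scores live on very different scales
train = [np.array([0.1, 0.2, 0.3, 0.2]), np.array([100.0, 300.0, 200.0, 200.0])]
test = [np.array([0.2, 0.9]), np.array([200.0, 210.0])]

raw_mean = np.mean(test, axis=0)       # dominated by the second detector
combined = combine_scores(train, test)
print(raw_mean, combined.round(2))
```

With real detectors, the training-score arrays would come from calling decision_function on X_train for each fitted detector.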