# Case-Explainer Documentation

Case-Explainer provides model-agnostic explanations through training-set precedents and nearest-neighbor correspondence.
## Overview
While some explainability methods provide feature importance scores, case-based explainability answers "Why was this prediction made?" by showing similar training examples.

Instead of: *"Feature X has importance 0.45"*

You get: *"This sample is classified as class X because it resembles these 5 training examples"*
## Key Features
- **Model-agnostic**: Works with any classifier (scikit-learn, XGBoost, neural networks, etc.)
- **Correspondence metric**: Quantifies agreement between a prediction and its retrieved neighbors
- **Multiple indexing strategies**: K-D tree, ball tree, or brute force
- **Automatic scaling**: Optional feature standardization
- **Metadata tracking**: Attach provenance data to training samples
- **Sklearn-compatible API**: Familiar interface for ML practitioners
- **Batch explanations**: Explain multiple predictions efficiently
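The retrieval idea behind these features can be approximated with scikit-learn alone. The following is a minimal sketch, not the library's actual implementation: it standardizes features, builds a K-D tree index, and computes a correspondence score as the fraction of the k nearest training neighbors whose label agrees with the model's prediction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Optional feature standardization before indexing
scaler = StandardScaler().fit(X_train)

# K-D tree index over the (scaled) training set
index = NearestNeighbors(n_neighbors=5, algorithm='kd_tree')
index.fit(scaler.transform(X_train))

# Retrieve the 5 nearest training precedents for one test sample
query = scaler.transform(X_test[:1])
_, idx = index.kneighbors(query)
neighbor_labels = y_train[idx[0]]

# Correspondence: fraction of neighbors agreeing with the prediction
pred = clf.predict(X_test[:1])[0]
correspondence = np.mean(neighbor_labels == pred)
print(f"Predicted class {pred}; correspondence {correspondence:.2%}")
```

Swapping `algorithm='kd_tree'` for `'ball_tree'` or `'brute'` mirrors the indexing strategies listed above.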
## Quick Start
```python
from case_explainer import CaseExplainer
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Train classifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Create explainer
explainer = CaseExplainer(
    X_train=X_train,
    y_train=y_train,
    feature_names=['sepal_len', 'sepal_width', 'petal_len', 'petal_width'],
    algorithm='kd_tree'
)

# Explain a prediction
explanation = explainer.explain_instance(X_test[0], k=5, model=clf)
print(f"Correspondence: {explanation.correspondence:.2%}")
print(explanation.summary())
```
## Installation
```bash
# From source (current development version)
cd case-explainer
pip install -e .

# Dependencies
pip install numpy scipy scikit-learn matplotlib pandas
```
## Performance
Validated across multiple domains (single runs on reference hardware):

- **Hardware Trojan Detection**: 99.9% average correspondence, 25.7 ms/sample
- **Credit Card Fraud Detection**: 100% average correspondence, 36.4 ms/sample
- **Medical Diagnosis (Breast Cancer)**: 93.3% average correspondence, 25.9 ms/sample
- **Scalability**: Tested with up to 200k training samples
**Note on Correspondence**: This metric measures agreement between predictions and retrieved neighbors, not prediction accuracy or quality. High correspondence indicates consistency with training-data patterns.
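To make the metric concrete, here is a hedged sketch of the arithmetic under the assumption that correspondence is the fraction of retrieved neighbors whose label matches the prediction (the library may weight or aggregate differently):

```python
# Hypothetical retrieval result: the model predicts class 1,
# and 4 of the 5 nearest training neighbors are also class 1.
prediction = 1
neighbor_labels = [1, 1, 0, 1, 1]

matches = sum(label == prediction for label in neighbor_labels)
correspondence = matches / len(neighbor_labels)
print(correspondence)  # 0.8
```

A correspondence of 0.8 here says only that the prediction agrees with most of its precedents, not that the prediction is correct.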
## Contents

- API Reference