# Case-Explainer Documentation

Case-Explainer provides model-agnostic explanations through training-set precedents and nearest-neighbor correspondence.
## Overview
While some explainability methods provide feature importance scores, case-based explainability answers "Why was this prediction made?" by showing similar training examples.

Instead of: *"Feature X has importance 0.45"*

You get: *"This sample is classified as class X because it resembles these 5 training examples"*
## Key Features
- **Model-agnostic**: Works with any classifier (scikit-learn, XGBoost, neural networks, etc.)
- **Correspondence metric**: Quantifies agreement between a prediction and its retrieved neighbors
- **Multiple indexing strategies**: K-D tree, ball tree, or brute force
- **Automatic scaling**: Optional feature standardization
- **Metadata tracking**: Attach provenance data to training samples
- **Sklearn-compatible API**: Familiar interface for ML practitioners
- **Batch explanations**: Explain multiple predictions efficiently
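The retrieval idea behind these features can be approximated with scikit-learn alone. The following is a minimal sketch, not the library's actual implementation: it standardizes features, builds a K-D tree index, and computes a correspondence score as the fraction of the k nearest training neighbors whose label agrees with the model's prediction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Optional feature standardization before indexing
scaler = StandardScaler().fit(X_train)

# K-D tree index over the (scaled) training set
index = NearestNeighbors(n_neighbors=5, algorithm='kd_tree')
index.fit(scaler.transform(X_train))

# Retrieve the 5 nearest training precedents for one test sample
query = scaler.transform(X_test[:1])
_, idx = index.kneighbors(query)
neighbor_labels = y_train[idx[0]]

# Correspondence: fraction of neighbors agreeing with the prediction
pred = clf.predict(X_test[:1])[0]
correspondence = np.mean(neighbor_labels == pred)
print(f"Predicted class {pred}; correspondence {correspondence:.2%}")
```

Swapping `algorithm='kd_tree'` for `'ball_tree'` or `'brute'` mirrors the indexing strategies listed above.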
## Quick Start
```python
from case_explainer import CaseExplainer
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Train classifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Create explainer
explainer = CaseExplainer(
    X_train=X_train,
    y_train=y_train,
    feature_names=['sepal_len', 'sepal_width', 'petal_len', 'petal_width'],
    algorithm='kd_tree'
)

# Explain a prediction
explanation = explainer.explain_instance(X_test[0], k=5, model=clf)
print(f"Correspondence: {explanation.correspondence:.2%}")
print(explanation.summary())
```
## Installation
```bash
# From source (current development version)
cd case-explainer
pip install -e .

# Dependencies
pip install numpy scipy scikit-learn matplotlib pandas
```
## Performance
Validated across multiple domains (single runs on reference hardware):

- **Hardware Trojan Detection**: 99.9% average correspondence, 25.7 ms/sample
- **Credit Card Fraud Detection**: 100% average correspondence, 36.4 ms/sample
- **Medical Diagnosis (Breast Cancer)**: 93.3% average correspondence, 25.9 ms/sample
- **Scalability**: Tested with up to 200k training samples
**Note on Correspondence**: This metric measures agreement between predictions and retrieved neighbors, not prediction accuracy or quality. High correspondence indicates consistency with training-data patterns.
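To make the metric concrete, here is a hedged sketch of the arithmetic under the assumption that correspondence is the fraction of retrieved neighbors whose label matches the prediction (the library may weight or aggregate differently):

```python
# Hypothetical retrieval result: the model predicts class 1,
# and 4 of the 5 nearest training neighbors are also class 1.
prediction = 1
neighbor_labels = [1, 1, 0, 1, 1]

matches = sum(label == prediction for label in neighbor_labels)
correspondence = matches / len(neighbor_labels)
print(correspondence)  # 0.8
```

A correspondence of 0.8 here says only that the prediction agrees with most of its precedents, not that the prediction is correct.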
## Contents

- API Reference