Metrics
Distance and correspondence metrics for case-based explanations.
Functions
compute_correspondence
- case_explainer.metrics.compute_correspondence(neighbors, predicted_class, distance_weighted=True, class_weights=None)[source]
Quantify agreement between prediction and retrieved neighbors.
Based on the refined Method 2 formula from a hardware trojan detection pipeline: weight(class_c) = sum_{i in neighbors with class c} class_weight_c / (distance_i + 1)^3
Correspondence = weight(predicted_class) / sum(weight(all_classes))
- Parameters:
  - neighbors (List[Tuple[int, float, int]]) – List of (index, distance, label) tuples for the k nearest neighbors
  - predicted_class (int) – The predicted class label
  - distance_weighted (bool) – Whether to weight by inverse cubed distance (default: True)
  - class_weights (Optional[Dict[int, float]]) – Optional weights for each class, e.g., {0: 1.0, 1: 2.0} for imbalanced datasets (default: all weights = 1.0)
- Returns:
  correspondence – float in [0, 1]; example interpretation: “high” (≥ 0.85), “medium” (0.70–0.85), “low” (< 0.70)
- Return type:
  float
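The documented formula can be sketched as a standalone function. This is a minimal reference sketch following the stated formula; `compute_correspondence_sketch` is a hypothetical name, not the library source:

```python
from typing import Dict, List, Optional, Tuple


def compute_correspondence_sketch(
    neighbors: List[Tuple[int, float, int]],
    predicted_class: int,
    distance_weighted: bool = True,
    class_weights: Optional[Dict[int, float]] = None,
) -> float:
    """Sketch of the documented formula (not the library implementation)."""
    weights: Dict[int, float] = {}
    for _index, distance, label in neighbors:
        w = (class_weights or {}).get(label, 1.0)
        if distance_weighted:
            # +1 keeps the weight finite when distance is exactly 0
            w /= (distance + 1.0) ** 3
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return weights.get(predicted_class, 0.0) / total if total else 0.0
```

For the 5-neighbor example used later on this page, the sketch yields a score of about 0.868.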
euclidean_distance
Compute the Euclidean (L2) distance between two samples.
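For reference, a minimal NumPy sketch of a Euclidean distance function (a hypothetical implementation; the library's own may differ in signature or validation):

```python
import numpy as np


def euclidean_distance_sketch(a, b) -> float:
    # L2 norm of the element-wise difference, computed in double precision
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.linalg.norm(a - b))
```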
Theory
Correspondence Metric
The correspondence metric quantifies the agreement between a prediction and retrieved nearest neighbors. It uses inverse distance weighting to give more importance to closer neighbors.
Mathematical Definition
For a test sample with predicted class \(c_{\text{pred}}\), given \(k\) nearest neighbors, each neighbor \(i\) contributes the inverse cubed distance weight
\[
w_i = \frac{1}{(d_i + 1)^3}
\]
where \(d_i\) is the distance to neighbor \(i\).
The weight for each class \(c\) is:
\[
W(c) = \sum_{i \,:\, y_i = c} \omega_c \, w_i
\]
where \(y_i\) is the label of neighbor \(i\) and \(\omega_c\) is the optional class weight (default 1).
The correspondence score is:
\[
\mathrm{correspondence} = \frac{W(c_{\text{pred}})}{\sum_{c} W(c)}
\]
Interpretation
1.0 (100%): All neighbors have the same class as the prediction (complete neighbor agreement)
0.5 (50%): Neighbor weight is split evenly between the predicted class and the others (no agreement; exactly 0.5 only when the distance-weighted mass balances)
0.0 (0%): All neighbors have different classes than the prediction (complete disagreement)
Important: Correspondence measures neighbor agreement, not prediction correctness. High correspondence can occur with incorrect predictions if the training data contains systematic errors or learned incorrect patterns.
Example Interpretation Thresholds (domain-dependent, not universal standards):
High (≥0.85): Strong agreement with retrieved neighbors
Medium (0.70-0.85): Moderate agreement with retrieved neighbors
Low (<0.70): Weak agreement, potentially unusual sample or inconsistent neighbors
Note: Appropriate thresholds should be determined empirically for each specific domain and use case.
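The example thresholds above can be expressed as a small helper. This is hypothetical code using the page's illustrative cut-offs, not library constants:

```python
def interpret_correspondence(score: float) -> str:
    """Map a correspondence score to the example labels from this page.

    Thresholds are illustrative only and should be tuned per domain.
    """
    if score >= 0.85:
        return "high"
    if score >= 0.70:
        return "medium"
    return "low"
```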
Distance Weighting
The cubed inverse distance weighting scheme has several desirable properties:
Emphasizes closest neighbors: influence falls off with the cube of (distance + 1), so the nearest neighbors dominate the score
Stable at distance=0: The +1 term prevents division by zero
Smooth falloff: Neighbors further away contribute less but not zero
Alternative weighting schemes can be implemented by modifying the weight function.
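One way to sketch such an alternative is a factory over the falloff exponent. This is a hypothetical helper for illustration; the library's weight function hard-codes the cube:

```python
def make_inverse_power_weight(p: float = 3.0):
    """Return w(d) = 1 / (d + 1)**p; p = 3 reproduces the documented scheme."""
    def weight(distance: float) -> float:
        return 1.0 / (distance + 1.0) ** p
    return weight


cubed = make_inverse_power_weight(3.0)
squared = make_inverse_power_weight(2.0)
# At distance 2.0: cubed gives 1/27 ≈ 0.037, squared gives 1/9 ≈ 0.111,
# so the cubed scheme discounts distant neighbors more aggressively.
print(cubed(2.0), squared(2.0))
```

Lower exponents flatten the weighting toward a plain majority vote over the k neighbors; higher exponents make the score track only the very nearest neighbors.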
Example Usage
Computing Correspondence
from case_explainer.metrics import compute_correspondence
# Example: 5 nearest neighbors as (index, distance, label) tuples
neighbors = [(0, 0.1, 1), (1, 0.2, 1), (2, 0.3, 1), (3, 0.5, 0), (4, 0.8, 1)]
predicted_class = 1
# Compute correspondence
correspondence = compute_correspondence(neighbors, predicted_class)
print(f"Correspondence: {correspondence:.2%}")
# Output: Correspondence: 86.85%
Distance Computation
from case_explainer.metrics import euclidean_distance
import numpy as np
sample1 = np.array([1.0, 2.0, 3.0])
sample2 = np.array([1.5, 2.5, 3.5])
distance = euclidean_distance(sample1, sample2)
print(f"Distance: {distance:.3f}")
# Output: Distance: 0.866
Understanding Correspondence Behavior
from case_explainer.metrics import compute_correspondence
# Complete agreement: all neighbors share the predicted class
neighbors = [(0, 0.1, 1), (1, 0.2, 1), (2, 0.3, 1), (3, 0.4, 1), (4, 0.5, 1)]
corr = compute_correspondence(neighbors, predicted_class=1)
print(f"Complete neighbor agreement: {corr:.2%}")  # 100%
# Complete disagreement: no neighbor shares the predicted class
neighbors = [(0, 0.1, 0), (1, 0.2, 0), (2, 0.3, 0), (3, 0.4, 0), (4, 0.5, 0)]
corr = compute_correspondence(neighbors, predicted_class=1)
print(f"Complete disagreement: {corr:.2%}")  # 0%
# Mixed: 3 neighbors agree, 2 do not
neighbors = [(0, 0.1, 1), (1, 0.2, 1), (2, 0.3, 1), (3, 0.4, 0), (4, 0.5, 0)]
corr = compute_correspondence(neighbors, predicted_class=1)
print(f"Mixed (3:2): {corr:.2%}")  # ~73%: above the raw 3/5 because the agreeing neighbors are closer
# Effect of distance: closer neighbors matter more
# Agreeing neighbors close, disagreeing neighbors far
neighbors_close = [(0, 0.1, 1), (1, 0.2, 1), (2, 0.3, 1), (3, 1.0, 0), (4, 1.5, 0)]
corr_close = compute_correspondence(neighbors_close, predicted_class=1)
# Agreeing neighbors far, disagreeing neighbors close
neighbors_far = [(0, 1.0, 1), (1, 1.5, 1), (2, 2.0, 1), (3, 0.1, 0), (4, 0.2, 0)]
corr_far = compute_correspondence(neighbors_far, predicted_class=1)
print(f"Close neighbors same class: {corr_close:.2%}")  # Higher
print(f"Far neighbors same class: {corr_far:.2%}")  # Lower
Class Weights
For imbalanced datasets, you can adjust correspondence using class weights:
# Without class weights; class 1 is the minority among the neighbors
neighbors = [(0, 0.1, 1), (1, 0.2, 1), (2, 0.3, 0), (3, 0.4, 0), (4, 0.5, 0)]
corr = compute_correspondence(neighbors, predicted_class=1)
print(f"No weights: {corr:.2%}")
# With class weights (weight minority class more)
corr_weighted = compute_correspondence(
    neighbors, predicted_class=1,
    class_weights={0: 1.0, 1: 3.0}
)
print(f"With weights: {corr_weighted:.2%}")  # Higher
Notes
Performance
Correspondence computation is O(k) where k is the number of neighbors
Distance computation is O(d) where d is the number of features
Both operations are highly vectorized using NumPy for efficiency
Numerical Stability
The +1 term in the weight function prevents division by zero
All distance computations use double precision floats
Correspondence is always in the range [0, 1]
See Also
CaseExplainer: Uses these metrics for generating explanations
Explanation: Contains computed correspondence scores