Explainable AI Architectures:

Methods, Applications, Examples, and Results

Case Western Reserve University

Case School of Engineering

Electrical, Computer, and Systems Engineering

Outline

  • Introduction
  • Problem
  • Contributions
  • Background and Related Work
  • Property-Based Explainable (PBE) Method
    • PBE Handwritten Character Results
    • PBE Hardware Trojan Results
  • Case-Based Explainable (CBE) Method
    • CBE Handwritten Character Results
    • CBE Hardware Trojan Results
  • Conclusion
  • Future Work

Introduction

  • Artificial Intelligence (AI) and Machine Learning (ML) used widely
    Applications: Business, Medicine, Transportation
  • Lack of trust in AI - models are often an opaque box
  • Many AI systems cannot effectively explain or justify decisions
  • Explainable AI (XAI)

Problem

  • Neural Networks cannot explain their inferences
  • Therefore, there is a lack of trust in these systems
  • XAI systems attempt to explain the inferences of a trained NN and thereby increase trust in ML systems
  • An explanation should be in plain terms

Comparison

Contributions


  • PBE Method and Architecture
  • CBE Method and Architecture
  • Explainability metric - $Ex(c)$
  • Combining explainability with unexplainability
  • Metric for Effectiveness of a model - $E(j, c)$
  • A new metric for model performance - $E_{PARS}$
  • Communicating when a model can fail - FDR
  • Correspondence metric - $Corr$

Publications

Background and Related Work


AI Taxonomy - Capability and Functionality

AI Taxonomy - Algorithms and Architectures

Multi-layer Feed Forward Neural Network

Inference but no explanation

1995 - LeNet-5 CNN Neural Network

Major improvement in inference performance but still no explanation

2015 - ResNet

Outstanding performance but cannot explain results

XAI Research

  • 1999 - Case-Based Explanation of Non-Case-Based Learning Methods - Caruana et al.
  • 2018 - Explainable neural networks based on additive index models - Vaughan et al.

XAI Research - LIME

2016 - "Why Should I Trust You?" - Marco Tulio Ribeiro et al. - LIME


LIME superpixel mask for classification as a Bernese mountain dog


LIME on Handwritten Digits


XAI Research - Continued

  • 2017 - A Unified Approach to Interpreting Model Predictions - Lundberg et al. - SHAP

Confusion Matrix - One Versus Others


                  Predicted class
                   a    b    c    d
Actual class a    TN   FP   TN   TN
Actual class b    FN   TP   FN   FN
Actual class c    TN   FP   TN   TN
Actual class d    TN   FP   TN   TN

(Class b treated as the positive class.)

Legend
True Positives (TP)
True Negatives (TN)
False Positives (FP)
False Negatives (FN)
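The one-versus-others collapse above can be sketched in a few lines of Python (a minimal illustration; the function name and the sample matrix are my own, not from the slides):

```python
def one_vs_others(cm, pos):
    """Collapse a multiclass confusion matrix (rows = actual class,
    columns = predicted class) into TP/TN/FP/FN for one positive class."""
    n = len(cm)
    tp = cm[pos][pos]
    fn = sum(cm[pos][j] for j in range(n) if j != pos)   # missed positives
    fp = sum(cm[i][pos] for i in range(n) if i != pos)   # wrongly claimed
    tn = sum(cm[i][j] for i in range(n) for j in range(n)
             if i != pos and j != pos)
    return tp, tn, fp, fn

# Hypothetical 4x4 matrix for classes a, b, c, d; b (index 1) is positive
cm = [[10, 1, 0, 0],
      [2, 20, 1, 1],
      [0, 1, 12, 0],
      [0, 2, 0, 9]]
print(one_vs_others(cm, 1))  # (20, 31, 4, 4)
```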

Performance Metrics



  • Accuracy = $\frac{TP+TN}{TP+TN+FP+FN}$

  • Precision = $\frac{TP}{TP+FP}$

  • Recall = $\frac{TP}{TP+FN}$

  • Specificity = $\frac{TN}{TN+FP}$

  • False Discovery Rate (FDR) = $\frac{FP}{FP+TP}$

Imbalance Ratio (IR)

\[ IR = \frac{N_{maj}}{N_{min}} \]

If IR > 1 the data set is imbalanced
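The imbalance ratio can be computed directly from the class label counts (a sketch; the helper name and toy labels are assumptions):

```python
from collections import Counter

def imbalance_ratio(labels):
    """IR = N_majority / N_minority over the class label counts."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Toy labels: 6 non-trojan ('n') vs 2 trojan ('t') samples -> IR = 3.0
print(imbalance_ratio(list("nnnnnntt")))  # 3.0
```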


  • AUC
  • F1-Score
  • Cohen's Kappa
  • Matthew's Correlation Coefficient

Property-Based Explainable (PBE) Method

PBE Method


Intent: Produce a system that can explain decisions to a user in plain terms by reasoning about the system's decisions in relation to explainable properties.

Explainable Property: An attribute of an input sample that may differentiate between classes and provide rationale for a classification decision to a user.

Property Transform: a function that modifies an input sample to highlight or exemplify an explainable property in the resulting output.
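As a concrete sketch of one such transform, the Endpoints property can be extracted from an already-thinned binary stroke image by marking pixels with exactly one foreground 8-neighbor (pure-Python illustration; the function and toy image are my own, not the dissertation's implementation):

```python
def endpoints(img):
    """Endpoint property transform: given a binary stroke image that has
    already been thinned to one pixel wide, mark pixels with exactly one
    foreground 8-neighbour (i.e., stroke endpoints)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            nbrs = sum(img[y + dy][x + dx]
                       for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                       if (dy or dx)
                       and 0 <= y + dy < h and 0 <= x + dx < w)
            if nbrs == 1:
                out[y][x] = 1
    return out

# A thinned vertical stroke (a crude "1") has two endpoints
stroke = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]
print(sum(map(sum, endpoints(stroke))))  # 2
```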

PBE Method - Steps

PBE Architecture - Goal of Method

PBE Properties and Transforms - MNIST


Property          Image Transform
Stroke            Skeleton
Circle            Hough Circle
Circle            Hough Ellipse
Circle            Multiple Circle and Ellipse
Crossings         Intersection
Endpoints         Endpoints
Enclosed Region   Flood Fill
Enclosed Region   Convex Hull
Line              Hough Line
Corner            Harris Corner

Transform Training Data


Train ML Models


Build Knowledgebase


Voting Scheme

  • Voting: Selecting among potentially conflicting opinions from inference engines.
  • Effectiveness: Characterizes how well an inference engine performs. The effectiveness of an inference engine, $j$, to correctly recognize an item of class $c$ is expressed as $E(j,c)$.

Voting Scheme - Continued



Weighted Effectiveness, $WE(c)$ for a class $c$ is the sum of effectiveness for all IEs, $j$, that voted for $c$

\[ WE(c)=\sum_j E(j, c) \]

Confidence, $Conf(c)$, for a class $c$ is the Weighted Effectiveness of $c$ over the sum of Weighted Effectiveness of all classes that were voted upon

\[ Conf(c)=\frac{WE(c)}{\sum\limits_kWE(k)} \]

Class $c$ with highest confidence wins
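The voting scheme above can be sketched as follows (the engine names and effectiveness values are hypothetical, chosen only to mirror the digit example later in the slides):

```python
def vote(votes, E):
    """Weighted-effectiveness voting.
    votes: {engine j: class c it voted for}
    E:     {(j, c): effectiveness E(j, c) of engine j on class c}
    Returns (winning class, {class: confidence})."""
    WE = {}
    for j, c in votes.items():                 # WE(c) = sum of E(j, c)
        WE[c] = WE.get(c, 0.0) + E[(j, c)]
    total = sum(WE.values())
    conf = {c: w / total for c, w in WE.items()}  # Conf(c) = WE(c)/sum WE(k)
    return max(conf, key=conf.get), conf

votes = {"stroke": "4", "line": "9", "circle": "0"}
E = {("stroke", "4"): 1.0, ("line", "9"): 0.496, ("circle", "0"): 0.039}
winner, conf = vote(votes, E)
print(winner, round(conf["4"], 3))
```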

Explainability

  • Some properties and transforms perform poorly on their own
  • Adding an unexplainable inference engine improves performance
  • Need a means of quantifying explainability, $Ex(c)$, for a class $c$
  • Each property transform, $j$, has an explainability metric $0 \le X_j \le 1$
\[ Ex(c)=\frac{\sum_j E(j,c)\,X_j}{\sum_j E(j,c)} \]
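The $Ex(c)$ computation is a small weighted average (sketch only; the voter list below uses the effectiveness values from the digit-four example later in the slides, with $X_j = 1$ for explainable transforms and $X_j = 0$ for the unexplainable model):

```python
def explainability(voters):
    """Ex(c): effectiveness-weighted average of the explainability
    scores X_j over the engines that voted for class c.
    voters: list of (E_jc, X_j) pairs."""
    num = sum(e * x for e, x in voters)   # sum E(j,c) * X_j
    den = sum(e for e, _ in voters)       # sum E(j,c)
    return num / den

# Four explainable transforms (X=1) plus one unexplainable model (X=0)
voters = [(1.0, 1.0), (0.974, 1.0), (0.826, 1.0), (0.538, 1.0), (1.0, 0.0)]
print(round(explainability(voters), 3))  # 0.769
```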

Explanation Routine - XAI Block

  • Assemble the textual rationale composed of:
    • The winning vote, with confidence and explainability
    • Alternatives voted for, with confidence and explainability
    • Common failures based on historical FDR
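The assembly of the textual rationale can be sketched as a formatting routine (illustrative only; the function signature and sentence templates are my own approximation of the responses shown later in the slides):

```python
def rationale(cls, conf, expl, fdr, mistakes):
    """Assemble a plain-language rationale for one candidate class.
    conf, expl, fdr are fractions in [0, 1]; mistakes maps a commonly
    confused class to its historical share of false discoveries."""
    level = "high" if conf >= 0.5 else "low"
    text = (f"Confidence is {level}, {conf:.1%}, for class {cls!r}. "
            f"Explainability was {expl:.1%}. "
            f"The FDR shows that when selecting {cls!r}, "
            f"{fdr:.1%} of the time we are incorrect.")
    for other, rate in mistakes.items():
        text += f" The input may instead be {other!r} ({rate:.1%} of cases)."
    return text

print(rationale("4", 0.87, 0.769, 0.019, {"9": 0.009, "7": 0.003}))
```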

Property-Based Explainability Results

Handwritten Character Datasets


Widely used for benchmarking ML architectures

  • MNIST - 70,000 decimal digit images
  • EMNIST - Over 800,000 digits, uppercase and lowercase characters

  • Dataset balanced among classes
  • Images, therefore high dimensionality (many features)

Property Based Architecture
MNIST Aggregate Results


Accuracy (%)

ML Model   1 Unexpl.   10 Expl.   10 Expl. + 1 Unexpl.
MLP           98.3       96.2           97.9
SVM           97.9       95.4           97.3
CNN           99.4       97.3           98.7
ResNet50      98.9       97.6           98.8

Average Explainability (%)

ML Model   1 Unexpl.   10 Expl.   10 Expl. + 1 Unexpl.
MLP            0.0       100            67.2
SVM            0.0       100            76.8
CNN            0.0       100            75.5
ResNet50       0.0       100            69.9

Unexpl. = unexplainable; Expl. = explainable

MNIST Explainable Results: Digit

Fj    Property    Vote   E(j, vote)   X_j
F1    Stroke        4      1.0        1.0
F2    Circle        0      0.039      1.0
F3    Crossing      0      0.018      1.0
F4    Ellipse       0      0.004      1.0
F5    Ell-Cir       0      0.069      1.0
F6    Endpoint      4      0.974      1.0
F7    Enc. Reg.     0      0.021      1.0
F8    Line          9      0.496      1.0
F9    Con. Hull     4      0.826      1.0
F10   Corner        4      0.538      1.0
F11   Unexp.        4      1.0        0.0

                       c = 0    c = 4    c = 9
$WE(c)$                0.151    4.337    0.496
$\sum{E(j,c)X_j}$      0.151    3.337    0.496
Confidence             3.03%    87.0%    9.96%
Explainability         100.0%   76.9%    100%

PBE - Response for Digit

  • Confidence is high, 87%, for interpreting this character as a four due to the stroke, endpoint, convex hull, and corner properties. Explainability was 76.9%. The FDR shows that when selecting a four, we are incorrect 1.9% of the time. The most frequent mistakes are that the digit is actually a nine (0.9% of cases) or a seven (0.3% of cases).

PBE - Alternatives for Digit

  • Confidence is low, 9.96%, for interpreting this character as a nine due to the line property. Explainability was 100%. The FDR shows that when selecting a nine, we are incorrect 2.6% of the time. The most frequent mistakes are that the digit is actually a four (1.4% of cases) or an eight (0.5% of cases).
  • Confidence is low, 3.03%, for interpreting this character as a zero due to the ellipse-circle, circle, fill, crossing, and ellipse properties. Explainability was 100%. The FDR shows that when selecting a zero, we are incorrect 1.4% of the time. The most frequent mistake is that the digit is actually an eight (0.6% of cases).

EMNIST Aggregate Results


Unexplainable Benchmark
Explainable
Explainable + Unexplainable

EMNIST Explainable Results: Character

Fj    Property    Vote   E(j, vote)   X_j
F1    Stroke       C       0.964      1.0
F2    Circle       C       0.114      1.0
F3    Crossing     C       0.056      1.0
F4    Ellipse      T       0.009      1.0
F5    Ell-Cir      C       0.131      1.0
F6    Endpoint     C       0.574      1.0
F7    Enc. Reg.    X       0.005      1.0
F8    Line         U       0.244      1.0
F9    Con. Hull    C       0.603      1.0
F10   Corner       C       0.369      1.0
F11   Unexp.       C       0.989      0.0

                       c = C    c = T    c = U    c = X
$WE(c)$                3.801    0.009    0.244    0.005
$\sum{E(j,c)X_j}$      2.812    0.009    0.244    0.005
Confidence             73.6%    0.02%    6.02%    0.01%
Explainability         74.0%    100%     100%     100%

Metrics and PBE


$E_{PARS}$ as Effectiveness



\[ E_{PARS} = P \cdot ACC \cdot R \cdot S = \frac{TN \cdot TP^3 + TN^2 \cdot TP^2}{(TN{+}FP)(TP{+}FP)(TP{+}FN)(TP{+}TN{+}FP{+}FN)} \]
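The product and single-fraction forms of $E_{PARS}$ are algebraically identical, which is easy to check numerically (a sketch; the counts below are arbitrary):

```python
def e_pars(tp, tn, fp, fn):
    """E_PARS = Precision * Accuracy * Recall * Specificity."""
    p = tp / (tp + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    r = tp / (tp + fn)
    s = tn / (tn + fp)
    return p * acc * r * s

def e_pars_closed(tp, tn, fp, fn):
    """Equivalent single-fraction form of E_PARS."""
    num = tn * tp**3 + tn**2 * tp**2
    den = (tn + fp) * (tp + fp) * (tp + fn) * (tp + tn + fp + fn)
    return num / den

tp, tn, fp, fn = 90, 950, 10, 5
print(abs(e_pars(tp, tn, fp, fn) - e_pars_closed(tp, tn, fp, fn)) < 1e-12)
```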

Performance of Metrics as Effectiveness on Handwriting

Hardware Trojans

Rare Event Hardware Trojan


Problem

  • Static trojan detection using netlist features
    1. LGFi - Logic gate fanin
    2. FFi - Flip-flop input
    3. FFo - Flip-flop output
    4. PI - Primary input
    5. PO - Primary output
  • Highly imbalanced dataset
  • ML trained to make decisions
  • Trust in the decisions is lacking - Need Explanations

Hardware Trojan Results

Hardware Trojan - Data Processing


Dataset Characterization

  • 15 Trust-hub netlists - 52k entries
  • Five Features
  • Two Classes: Trojan and Non-Trojan
  • Highly Imbalanced data
  • Trojan: Non-Trojan - 1:250
  • Many Duplicates

Training and Test

  • 80% used for training
  • 20% used for test

PBE Architecture - Trojans

Property = grouping of features

PBE Example

Sample


Output

Properties

Case-Based Explainable (CBE) Method

CBE Method


Intent: Explain decisions by providing evidence about similar training cases.

Inspiration: Work by Caruana et al. Case-based explanation of non-case-based learning methods.

Consider training samples as case precedents. Similar training cases should support a decision.

CBE does not explain the model's internal behavior; rather, it shows which of the cases used to train the model are similar to the input.

CBE Method - Steps



CBE - Steps Detail

Train ML Model
Training Index

Query Scheme

Explanation Routine



Weight of Neighbors and Correspondence

Weight of Neighbors, $WN(c)$, sums over the $k$ nearest neighbors whose class $c_i$ equals $c$, where $d_i$ is the distance to neighbor $i$ and $bf(c)$ is a class balance factor:

\[ WN(c) = \sum_{\substack{i \in k \\ c_i = c}} \frac{bf(c)}{(d_i+1.0)^2} \]

\[ Corr(c) = \frac{WN(c)}{\sum\limits_{c_j \in k} WN(c_j)} \]
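The weighting and correspondence computation can be sketched as follows (the neighbor list and balance factors are hypothetical; nearer neighbors weigh more via $1/(d+1)^2$):

```python
def correspondence(neighbors, bf):
    """Correspondence among the k nearest training cases.
    neighbors: list of (class_label, distance) pairs for the k
    retrieved cases; bf maps each class to its balance factor bf(c)."""
    WN = {}
    for c, d in neighbors:                      # WN(c): inverse-square
        WN[c] = WN.get(c, 0.0) + bf[c] / (d + 1.0) ** 2
    total = sum(WN.values())
    return {c: w / total for c, w in WN.items()}  # Corr(c)

# Five neighbours, equal balance factors: four vote "4", one votes "9"
nbrs = [("4", 0.0), ("4", 0.5), ("4", 1.0), ("4", 1.0), ("9", 0.2)]
corr = correspondence(nbrs, {"4": 1.0, "9": 1.0})
print(max(corr, key=corr.get))  # 4
```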

CBE Architecture - Handwriting Results


Aggregate - Correspondence - 97.7%

CBE Architecture - Results for Digit



SVM Prediction: four

Correspondence = 92.3%

Alternatives: nine with 7.7% correspondence

CBE Architecture - Hardware Trojan Results


Aggregate - Correspondence - 97.4%

CBE Example

Sample


Output

SVM - Trojan

Conclusions

Conclusions - Method Strengths

  • PBE worked well on explaining handwritten digits
  • PBE is well suited for high-dimensional datasets
  • CBE worked well on explaining both handwriting and HW trojans
  • CBE accuracy can be among the best

Conclusions - Weaknesses

  • Explainable properties are difficult to elicit from a low-dimensional feature space
  • Marginal explainability in the PBE architecture with trojans
  • The PBE method is much more involved to implement
  • CBE takes longer to execute due to searching for neighbor cases

Conclusion - Continued

Important User Questions Driving XAI:


We successfully addressed these questions with evidence in the links above.

Final Conclusions

  • CBE outperformed PBE - better accuracy and explanations
  • Examples of how to answer all of the important user questions
  • Research in four published papers with contributions
    • Two explainable methods
    • Effectiveness and new EPARS metric
    • Confidence metric from PBE decisions
    • Quantifying explainability with a mix of explainability and unexplainability
    • Correspondence between neighbors in CBE
    • When the system can fail with FDR

Future Work

  • More applications for the methods
  • Generalizing the property based method
  • Scaling the case-based method to larger datasets
  • Expanding the explainable interface for user questions/interrogation.
    Large Language Models on the knowledgebase or training index

Live Examples of Explainable MNIST Recognition