Toward Faithful Neural Network Intrinsic Interpretation With Shapley Additive Self-Attribution
Summary
This summary is machine-generated. This study introduces the Shapley Additive Self-Attributing Neural Network (SASANet), a novel framework for self-interpreting neural networks. SASANet provides genuine interpretability and enhanced model expressiveness by integrating Shapley value attribution.
Area Of Science
- Artificial Intelligence
- Machine Learning
- Explainable AI (XAI)
Background
- Current self-interpreting neural networks often lack theoretical grounding for interpretability and limit model expressiveness.
- Existing methods struggle to provide genuine, theoretically sound explanations for model predictions.
Purpose Of The Study
- To propose a generic additive self-attribution (ASA) framework to unify existing approaches.
- To introduce a novel Shapley Additive Self-Attributing Neural Network (SASANet) that incorporates Shapley value attribution for enhanced interpretability.
- To address the limitations of current self-interpreting models by improving both interpretability and performance.
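As a point of reference for the terms above, the following is a sketch of the standard definitions, not the paper's own notation. The symbols $f$, $\phi_i$, $\phi_0$, $v$, $N$, and $d$ are assumptions introduced here for illustration: an additive attribution writes the output as a sum of per-feature terms (possibly with a constant base term), and the Shapley value of feature $i$ averages its marginal contribution over all subsets of the other features.

```latex
% Additive attribution (assumed notation): output as a sum of per-feature terms
f(x) = \phi_0 + \sum_{i=1}^{d} \phi_i(x)

% Classical Shapley value of feature i for a value function v over N = \{1, \dots, d\}
\phi_i = \sum_{S \subseteq N \setminus \{i\}}
         \frac{|S|!\,(d - |S| - 1)!}{d!}
         \bigl[ v(S \cup \{i\}) - v(S) \bigr]
```

When the $\phi_i$ are Shapley values, they sum to $v(N) - v(\emptyset)$, which is what makes the additive form exact rather than approximate.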
Main Methods
- Developed a novel Shapley Additive Self-Attributing Neural Network (SASANet).
- Designed an intermediate sequential schema utilizing marginal contributions (MCs) and an internal distillation procedure.
- Theoretically proved that the intermediate self-attribution values converge to the output's Shapley values.
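To make the role of marginal contributions concrete, here is a minimal sketch of the textbook Shapley computation: each feature's marginal contribution is recorded as features are added one by one along every ordering, then averaged. This is not SASANet's architecture or training procedure; the names `shapley_values` and `toy_value` and the toy model itself are hypothetical, and the brute-force loop is exponential in the number of features, so it only suits tiny examples.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List


def shapley_values(value_fn: Callable[[FrozenSet[int]], float],
                   n_features: int) -> List[float]:
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present: FrozenSet[int] = frozenset()
        prev = value_fn(present)
        for i in order:
            present = present | {i}
            cur = value_fn(present)
            totals[i] += cur - prev  # marginal contribution of feature i in this ordering
            prev = cur
    return [t / len(orderings) for t in totals]


def toy_value(present: FrozenSet[int]) -> float:
    """Hypothetical toy model: a present feature has value 1, an absent one 0."""
    x = [1.0 if i in present else 0.0 for i in range(3)]
    # Two additive terms plus one interaction between features 0 and 2.
    return 2.0 * x[0] + 3.0 * x[1] + 4.0 * x[0] * x[2]


phi = shapley_values(toy_value, 3)
print(phi)       # per-feature attributions
print(sum(phi))  # equals toy_value(all features) - toy_value(no features)
```

In this toy case the interaction term is split evenly between the two features involved, and the attributions sum to the difference between the full and empty model outputs. A self-attributing network that converges to Shapley values would, by the theoretical result stated above, produce attributions with the same properties without enumerating orderings at inference time.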
Main Results
- SASANet achieves high interpretability and outperforms existing self-attributing models in predictive performance.
- SASANet's performance is comparable to commonly used closed-box models.
- The self-attribution method in SASANet offers more accurate and efficient interpretations than post hoc methods.
Conclusions
- SASANet is the first self-interpreting neural network structure to achieve model-wise Shapley attribution.
- The proposed framework enhances both the interpretability and performance of neural networks.
- SASANet offers a theoretically sound and practically effective approach to explainable AI.

