Project Details
Description
Our ability to interpret deep neural networks (DNNs) is an essential step towards explainable AI, towards ensuring the fairness of models, and towards building models that generalize better. In this proposal, we work towards increasing the transparency of DNN models by providing fine-grained interpretation of their representations. Fine-grained interpretation targets how certain knowledge is encoded in the network. It answers questions such as: What is the role of individual neurons in the network? What task and language knowledge is learned in neurons? Knowing the role of neurons and their importance in the network paves the way towards transparent, fair, explainable, and efficient models.

In this proposal, we will make multiple contributions to the fine-grained interpretation of DNNs. More specifically, we will propose datasets and algorithms for interpretation, work towards causal interpretation that explains the predictions of a model, and develop applications that are enabled by neuron analysis. The specific projects are summarized below:

- Concept Dataset: Current work in DNN interpretation probes whether human-defined linguistic concepts, such as noun or verb, are learned by a model, but it does not analyze what other latent concepts the model has learned about the task and the language. This project aims to prepare a model-centric concept dataset. We will identify groups of words in high-dimensional space using their contextualized representations (see the clustering sketch after this list) and will manually annotate them under defined criteria. The resulting dataset will be a novel, multi-faceted resource that enables model-centered interpretation.
- Interpretation Algorithms: Neurons are multivariate in nature, and models exhibit redundancy due to training and architectural choices. How can we identify neurons that redundantly learn a concept? How can we find a group of neurons that work together to represent a concept? Common approaches to neuron analysis largely ignore these questions or require careful evaluation to prove their efficacy. This project aims to improve methods of neuron analysis. We will explore multivariate feature selection methods such as (sparse) group lasso to analyze neurons (see the sparse-probe sketch after this list).
- Causal Interpretation: The importance of a concept for a class or for a specific prediction is essential for explainable AI. This project targets concept-based causal interpretation of predictions. We will use gradient- and perturbation-based attribution algorithms to identify salient neurons for a prediction (see the attribution sketch after this list) and will connect them with neuron analysis to generate concept-based explanations of predictions.
- Applications: In this project, we will explore applications of neuron analysis, specifically controlling a model's behavior and domain adaptation. The idea is to identify neurons associated with concepts of interest, e.g. gender or domain, and to develop methods that manipulate those neurons at test time in order to control the behavior of the model in a desirable way.
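As a concrete illustration of the Concept Dataset step that groups words by their contextualized representations, the following minimal sketch clusters token occurrences with a Hugging Face encoder. The model name, example sentences, layer choice, and cluster count are illustrative assumptions, not the project's actual pipeline.

```python
# A minimal sketch, assuming a BERT-style encoder from Hugging Face:
# cluster token occurrences by their contextualized vectors so that each
# cluster becomes a candidate latent concept for manual annotation.
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-cased"  # illustrative choice of encoder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

sentences = [
    "The bank approved the loan.",
    "They walked along the river bank.",
]

tokens, vectors = [], []
with torch.no_grad():
    for sentence in sentences:
        encoded = tokenizer(sentence, return_tensors="pt")
        hidden = model(**encoded).hidden_states[-1][0]  # (seq_len, dim), last layer
        for position, token_id in enumerate(encoded["input_ids"][0]):
            token = tokenizer.convert_ids_to_tokens(token_id.item())
            if token in tokenizer.all_special_tokens:
                continue  # skip [CLS], [SEP], etc.
            tokens.append(token)
            vectors.append(hidden[position].numpy())

# Group the occurrences; the cluster count is arbitrary for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
for token, cluster in zip(tokens, kmeans.labels_):
    print(cluster, token)
```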
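For the Interpretation Algorithms project, a sparse linear probe gives a flavour of multivariate neuron selection. The sketch below uses plain L1-regularized logistic regression over synthetic activations as a stand-in; the (sparse) group lasso methods named in the proposal would require a dedicated solver and are not implemented here.

```python
# A minimal sketch, assuming synthetic neuron activations: rank neurons for a
# concept with a sparse (L1-regularized) linear probe. This is a stand-in for
# the proposal's (sparse) group lasso methods, which need a dedicated solver.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_words, n_neurons = 500, 64
X = rng.normal(size=(n_words, n_neurons))       # activations: one row per word
y = (X[:, 7] + 0.5 * X[:, 21] > 0).astype(int)  # concept driven by neurons 7 and 21

probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

# Neurons with the largest absolute probe weights are the candidate concept
# neurons; the sparsity penalty pushes redundant neurons towards zero weight.
ranking = np.argsort(-np.abs(probe.coef_[0]))
print("Top concept neurons:", ranking[:5].tolist())
```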
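For the Causal Interpretation project, one simple gradient-based attribution is the gradient × activation score over a hidden layer. The toy two-layer classifier below is a hypothetical stand-in for the proposal's models, and perturbation-based alternatives are not shown.

```python
# A minimal sketch of gradient x activation neuron attribution for a single
# prediction, using a toy two-layer classifier as a stand-in model.
import torch
import torch.nn as nn

torch.manual_seed(0)

encoder = nn.Sequential(nn.Linear(64, 32), nn.Tanh())  # toy representation layer
classifier = nn.Linear(32, 3)                          # toy 3-class head

x = torch.randn(1, 64)        # one hypothetical input representation
hidden = encoder(x)
hidden.retain_grad()          # keep gradients for the hidden-layer neurons
logits = classifier(hidden)
predicted = logits.argmax(dim=-1).item()

# Back-propagate the predicted-class logit to the hidden neurons.
logits[0, predicted].backward()

# Salience of each neuron for this prediction: |gradient * activation|.
salience = (hidden.grad * hidden.detach()).abs().squeeze(0)
top = torch.topk(salience, k=5)
print("Most salient neurons:", top.indices.tolist())
print("Salience scores:", [round(score, 4) for score in top.values.tolist()])
```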
Status | Active |
---|---|
Effective start/end date | 1/1/22 → … |
Funding
- Natural Sciences and Engineering Research Council of Canada: US$22,284.00
ASJC Scopus Subject Areas
- Artificial Intelligence
- Information Systems