Evaluating Attribution Methods for Explainable NLP with Transformers

Vojtěch Bartička and Ondřej Pražák and Miloslav Konopík and Jakub Sido
TSD (2022)


This paper describes an experimental evaluation of several attribution methods on two NLP tasks: sentiment analysis and multi-label document classification. Our motivation is to find the best method to use with Transformers to interpret model decisions. For this purpose, we introduce two new evaluation datasets. The first is derived from the Stanford Sentiment Treebank, where the sentiment of individual words is annotated along with the sentiment of the whole sentence. The second comes from the Czech Text Document Corpus, to which we added keyword information for each category. The keywords were manually assigned to each document and automatically propagated to categories via pointwise mutual information (PMI). We evaluate each attribution method on several models of different sizes. The evaluation results are reasonably consistent across all models and both datasets. This indicates that both datasets, together with the proposed evaluation metrics, are suitable for interpretability evaluation. We show how the attribution methods behave with respect to model size and task. We also consider practical applications: we show that while some methods perform well, they can be replaced with slightly worse-performing methods that require significantly less time to compute.
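The abstract does not spell out how keywords are propagated to categories, so the following is only a minimal sketch of one plausible reading: score each (category, keyword) pair by PMI estimated from document-level co-occurrence counts, then keep the highest-scoring keywords per category. The function name `pmi_keywords`, the `top_k` cutoff, and the toy documents are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def pmi_keywords(docs, top_k=2):
    """Rank keywords per category by PMI (illustrative, not the paper's code).

    docs: list of (categories, keywords) pairs, one pair per document.
    Returns {category: [(keyword, pmi), ...]} with at most top_k entries each.
    """
    n_docs = len(docs)
    cat_count, kw_count, joint_count = Counter(), Counter(), Counter()
    for categories, keywords in docs:
        cats, kws = set(categories), set(keywords)
        cat_count.update(cats)
        kw_count.update(kws)
        joint_count.update((c, k) for c in cats for k in kws)

    scores = {}
    for (c, k), n_ck in joint_count.items():
        # PMI(c, k) = log2( P(c, k) / (P(c) * P(k)) ),
        # with probabilities estimated as document frequencies.
        pmi = math.log2(n_ck * n_docs / (cat_count[c] * kw_count[k]))
        scores.setdefault(c, []).append((k, pmi))

    # Sort by descending PMI; break ties alphabetically for determinism.
    return {
        c: sorted(pairs, key=lambda kv: (-kv[1], kv[0]))[:top_k]
        for c, pairs in scores.items()
    }


# Toy data (assumed for illustration): each document has categories and keywords.
docs = [
    ({"sport"}, {"match", "goal"}),
    ({"sport"}, {"goal", "team"}),
    ({"politics"}, {"election", "team"}),
    ({"politics", "economy"}, {"budget", "election"}),
]
result = pmi_keywords(docs)
```

In this toy example, "team" occurs in both sport and politics documents, so its PMI with either category is 0 and it falls out of the top two keywords. That is the behaviour one would want from such a propagation step: keywords that do not discriminate between categories are ranked low.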



@inproceedings{bartivcka2022evaluating, title={Evaluating Attribution Methods for Explainable NLP with Transformers}, author={Barti{\v{c}}ka, Vojt{\v{e}}ch and Pra{\v{z}}{\'a}k, Ond{\v{r}}ej and Konop{\'\i}k, Miloslav and Sido, Jakub}, booktitle={International Conference on Text, Speech, and Dialogue}, pages={3--15}, year={2022} }