24th EANN 2023, 14 - 17 June 2023, León, Spain

Towards Explaining Shortcut Learning Through Attention Visualization and Adversarial Attacks

Pedro Gonçalo Correia, Henrique Lopes Cardoso


  Since its introduction, the attention-based Transformer architecture has become the de facto standard for building models with state-of-the-art performance on many Natural Language Processing tasks. However, the success of these models may partly stem from their exploitation of dataset artifacts, leaving them unable to generalize to other data and vulnerable to adversarial attacks. On the other hand, the attention mechanism present in all Transformer-based models, such as those built on BERT, has been seen by many as a potential way to explain these deep learning models: by visualizing attention weights, it may be possible to gain insight into the reasons behind these opaque models' decisions.

*** Title, author list and abstract as seen in the Camera-Ready version of the paper that was provided to the Conference Committee. Small changes that may have occurred during processing by Springer may not appear here.