Abstract
This work examines the robustness of self-attentive neural networks against adversarial input perturbations. Specifically, we investigate the attention and feature-extraction mechanisms of state-of-the-art recurrent neural networks and self-attentive architectures for sentiment analysis, entailment, and machine translation under adversarial attacks. We also propose a novel attack algorithm for generating more natural adversarial examples that can mislead neural models but not humans. Experimental results show that, compared to recurrent neural models, self-attentive models are more robust against adversarial perturbations. In addition, we provide theoretical explanations for their superior robustness to support our claims.
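For readers unfamiliar with this class of attack, the sketch below illustrates one common word-substitution scheme: greedily replacing words with near-synonyms to degrade a classifier's score while keeping the sentence readable. The toy bag-of-words classifier, its weights, and the synonym table are invented for illustration only; this is not the paper's proposed attack algorithm or its target models.

```python
# Minimal sketch of a greedy word-substitution adversarial attack.
# All weights and synonym candidates below are hypothetical stand-ins,
# NOT the models or the attack described in the paper.

# Toy bag-of-words sentiment weights (hypothetical).
WEIGHTS = {"great": 2.0, "good": 1.5, "fine": 0.5, "bad": -1.5, "awful": -2.0}

# Hypothetical near-synonym candidates defining the perturbation space.
SYNONYMS = {"great": ["good", "fine"], "excellent": ["good", "fine"], "love": ["like"]}

def score(tokens):
    """Higher score => more positive sentiment under the toy model."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)

def greedy_attack(sentence, max_swaps=2):
    """Greedily swap words for near-synonyms to push the sentiment score
    down, one substitution per round, stopping when no swap helps."""
    tokens = sentence.lower().split()
    for _ in range(max_swaps):
        best = None  # (new_score, position, replacement)
        for i, tok in enumerate(tokens):
            for cand in SYNONYMS.get(tok, []):
                trial = tokens[:i] + [cand] + tokens[i + 1:]
                s = score(trial)
                if best is None or s < best[0]:
                    best = (s, i, cand)
        if best is None or best[0] >= score(tokens):
            break  # no remaining substitution lowers the score
        tokens[best[1]] = best[2]
    return " ".join(tokens)

if __name__ == "__main__":
    original = "the movie was great and the acting was excellent"
    adversarial = greedy_attack(original)
    print(original, "->", score(original.split()))
    print(adversarial, "->", score(adversarial.split()))
```

Under this toy setup, swapping "great" for the weaker synonym "fine" lowers the sentiment score while leaving the sentence natural to a human reader, which is the intuition behind the "natural adversarial examples" the abstract refers to.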
| Original language | English |
|---|---|
| Title of host publication | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 1520-1529 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781950737482 |
| DOIs | |
| Publication status | Published - 2019 |
| Externally published | Yes |
| Event | 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Florence, Italy |
| Duration | 28 Jul 2019 → 2 Aug 2019 |
Publication series
| Name | ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference |
|---|---|
Conference
| Conference | 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 |
|---|---|
| Country/Territory | Italy |
| City | Florence |
| Period | 28/07/19 → 2/08/19 |
Bibliographical note
Publisher Copyright: © 2019 Association for Computational Linguistics