Abstract
User-generated social media data is constantly changing as new trends influence online discussion and personal information is deleted due to privacy concerns. However, traditional NLP models rely on fixed training datasets, which means they are unable to adapt to temporal change (both test distribution shift and deleted training data) without frequent, costly re-training. In this paper, we study temporal adaptation through the task of longitudinal hashtag prediction and propose a nonparametric dense retrieval technique, which does not require re-training, as a simple but effective solution. In experiments on a newly collected, publicly available, year-long Twitter dataset exhibiting temporal distribution shift, our method improves by 64% over the best static parametric baseline while avoiding costly gradient-based re-training. Our approach is also particularly well-suited to dynamically deleted user data in line with data privacy laws, with negligible computational cost/performance loss.
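The general idea of retrieval-based hashtag prediction without re-training can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_embed` is a hashed bag-of-words stand-in for a real dense encoder, `HashtagDatastore` is a hypothetical name, and the similarity-weighted voting over the nearest neighbours is an assumed aggregation rule. The point it shows is that new posts are handled by adding entries and deleted posts by removing them, with no gradient updates.

```python
import math
import zlib
from collections import defaultdict


def toy_embed(text, dim=64):
    """Assumed stand-in for a dense encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 gives a hash that is stable across Python processes.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class HashtagDatastore:
    """Hypothetical nonparametric store: post embeddings paired with hashtags."""

    def __init__(self):
        self._entries = {}  # post_id -> (embedding, hashtags)

    def add(self, post_id, text, hashtags):
        # Adapting to new data is an insertion, not a training step.
        self._entries[post_id] = (toy_embed(text), list(hashtags))

    def delete(self, post_id):
        # Honouring a deletion request is a removal, not a re-training run.
        self._entries.pop(post_id, None)

    def predict(self, text, k=3):
        query = toy_embed(text)
        # Cosine similarity reduces to a dot product on normalised vectors.
        scored = sorted(
            ((sum(q * e for q, e in zip(query, emb)), tags)
             for emb, tags in self._entries.values()),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Assumed aggregation: each of the k nearest posts votes for its
        # hashtags, weighted by its similarity to the query.
        votes = defaultdict(float)
        for sim, tags in scored[:k]:
            for tag in tags:
                votes[tag] += sim
        return sorted(votes, key=votes.get, reverse=True)


store = HashtagDatastore()
store.add("p1", "world cup final tonight", ["#worldcup"])
store.add("p2", "new phone launch event", ["#tech"])
before = store.predict("watching the world cup final", k=1)
store.delete("p1")
after = store.predict("watching the world cup final", k=1)
```

In this sketch the brute-force scan over all entries keeps the code short; a system at the scale of a year of Twitter data would presumably use an approximate nearest-neighbour index instead.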
| Original language | English |
|---|---|
| Title of host publication | EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings |
| Editors | Houda Bouamor, Juan Pino, Kalika Bali |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 7302-7311 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798891760608 |
| DOIs | |
| Publication status | Published - 2023 |
| Externally published | Yes |
| Event | 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - Hybrid, Singapore, Singapore; Duration: 6 Dec 2023 → 10 Dec 2023 |
Publication series
| Name | EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings |
|---|---|
Conference
| Conference | 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 |
|---|---|
| Country/Territory | Singapore |
| City | Hybrid, Singapore |
| Period | 6/12/23 → 10/12/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.