Constructing a Psychometric Testbed for Fair Natural Language Processing

Ahmed Abbasi*, David Dobolyi, John P. Lalor, Richard Netemeyer, Kendall Smith, Yi Yang

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report › Conference Paper published in a book › peer-review

12 Citations (Scopus)

Abstract

Psychometric measures of ability, attitudes, perceptions, and beliefs are crucial for understanding user behavior in various contexts including health, security, e-commerce, and finance. Traditionally, psychometric dimensions have been measured and collected using survey-based methods. Inferring such constructs from user-generated text could allow timely, unobtrusive collection and analysis. In this work we construct a corpus for psychometric natural language processing (NLP) related to important dimensions such as trust, anxiety, numeracy, and literacy, in the health domain. We discuss our multi-step process to align user text with their survey-based response items and provide an overview of the resulting testbed, which encompasses survey-based psychometric measures and accompanying user-generated text from 8,502 respondents. Our testbed also encompasses self-reported demographic information, including race, sex, age, income, and education, allowing for measuring bias and benchmarking fairness of text classification methods. We report preliminary results on use of the text to predict/categorize users' survey response labels and on the fairness of these models. We also discuss the important implications of our work and resulting testbed for future NLP research on psychometrics and fairness.

Original language: English
Title of host publication: EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 3748-3758
Number of pages: 11
ISBN (Electronic): 9781955917094
Publication status: Published - 2021
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Hybrid, Punta Cana, Dominican Republic
Duration: 7 Nov 2021 to 11 Nov 2021

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics
