Abstract
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment, that is, to minimize simple regret. However, this objective remains understudied. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit setting, where a tuning parameter determines the weight placed on cumulative regret minimization (for which we establish near-optimal minimax guarantees) versus simple regret minimization (for which we establish state-of-the-art guarantees). Our algorithms work with any function class, are robust to model misspecification, and can be used in continuous arm settings. This flexibility comes from constructing and relying on “conformal arm sets” (CASs). CASs provide a set of arms for every context, encompassing the context-specific optimal arm with a certain probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted with a negative result, which shows that no algorithm can achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.
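To make the two objectives and the coverage property concrete, the following is a minimal illustrative sketch, not the construction used in the paper. It assumes a finite arm set and known expected rewards per context, and it builds an arm set with a hypothetical thresholding rule (keep every arm whose predicted reward is within `slack` of the best prediction). All function and parameter names here are assumptions introduced for illustration.

```python
from typing import List, Sequence


def arm_set(predicted_rewards: Sequence[float], slack: float) -> List[int]:
    """Hypothetical arm set: arms predicted to be within `slack` of the best arm."""
    best = max(predicted_rewards)
    return [a for a, r in enumerate(predicted_rewards) if r >= best - slack]


def coverage(true_rewards: Sequence[Sequence[float]],
             arm_sets: Sequence[Sequence[int]]) -> float:
    """Fraction of contexts whose arm set contains a truly optimal arm."""
    hits = 0
    for rewards, arms in zip(true_rewards, arm_sets):
        best = max(rewards)
        if any(rewards[a] == best for a in arms):
            hits += 1
    return hits / len(true_rewards)


def cumulative_regret(true_rewards: Sequence[Sequence[float]],
                      chosen_arms: Sequence[int]) -> float:
    """Sum over rounds of the gap between the best arm and the arm played."""
    return sum(max(r) - r[a] for r, a in zip(true_rewards, chosen_arms))


def simple_regret(true_rewards: Sequence[Sequence[float]],
                  final_arms: Sequence[int]) -> float:
    """Average gap of the final policy's arm choice over evaluation contexts."""
    gaps = [max(r) - r[a] for r, a in zip(true_rewards, final_arms)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Two contexts, two arms; expected rewards are assumed known for illustration.
    truth = [[1.0, 0.0], [0.0, 1.0]]
    # Hypothetical model predictions for the same two contexts.
    predictions = [[0.9, 0.2], [0.55, 0.5]]
    sets = [arm_set(p, slack=0.1) for p in predictions]
    print("arm sets:", sets)
    print("coverage:", coverage(truth, sets))
    print("cumulative regret of always playing arm 0:",
          cumulative_regret(truth, [0, 0]))
    print("simple regret of the optimal policy:",
          simple_regret(truth, [0, 1]))
```

In this toy setup, cumulative regret sums the per-round gaps incurred while experimenting, whereas simple regret only scores the policy held at the end. A larger `slack` yields larger arm sets and higher coverage, which loosely mirrors the trade-off that the abstract attributes to the tuning parameter.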
| Original language | English |
|---|---|
| Title of host publication | Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023 |
| Editors | A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine |
| Publisher | Neural Information Processing Systems Foundation |
| ISBN (Electronic) | 9781713899921 |
| Publication status | Published - 2023 |
| Event | 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States. Duration: 10 Dec 2023 → 16 Dec 2023 |
Publication series
| Name | Advances in Neural Information Processing Systems |
|---|---|
| Volume | 36 |
| ISSN (Print) | 1049-5258 |
Conference
| Conference | 37th Conference on Neural Information Processing Systems, NeurIPS 2023 |
|---|---|
| Country/Territory | United States |
| City | New Orleans |
| Period | 10 Dec 2023 → 16 Dec 2023 |
Bibliographical note
Publisher Copyright: © 2023 Neural Information Processing Systems Foundation. All rights reserved.