The Multi-Armed Bandit (MAB) problem is a fundamental framework capturing the exploration vs. exploitation trade-off in sequential decision-making. Over the past decades, it has been rigorously studied and widely applied across diverse fields. This thesis explores algorithms for the MAB problem and its contextual version, incorporating variance information and privacy constraints. In the first part, we revisit the UCB-V algorithm, a variant of the canonical UCB that uses variance estimates. By providing an asymptotic characterization of arm-pulling rates, our results reveal a new phase transition phenomenon and instability. Additionally, we offer non-asymptotic bounds in high-probability regimes, leading to a refined regret bound, previously unknown even for more complex variance-aware algorithms. The second part focuses on bandits with contextual information, which often represents user-specific data. This motivates the study of effective learning while maintaining privacy. We introduce local differential privacy (LDP), a stringent privacy notion, to contextual bandits. We design LDP algorithms for stochastic generalized linear bandits to achieve nearly optimal regret bounds. Experiments demonstrate the consistently superior performance of our algorithms under LDP constraints.
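The abstract mentions the UCB-V index, which augments the empirical mean with a variance-aware exploration bonus. The following is a minimal sketch of that rule, following the standard UCB-V form (empirical-Bernstein bonus, rewards assumed bounded in [0, b]); the function names and the two-arm Bernoulli setup are illustrative, not the thesis's code.

```python
import math
import random

def ucbv_index(mean, var, pulls, t, b=1.0):
    """Standard UCB-V index: empirical mean plus a variance-aware bonus.
    Rewards are assumed to lie in [0, b]; constants follow the usual form."""
    bonus = math.sqrt(2.0 * var * math.log(t) / pulls) + 3.0 * b * math.log(t) / pulls
    return mean + bonus

def run_ucbv(arms, horizon, b=1.0, seed=0):
    """Minimal UCB-V loop. `arms` is a list of callables that take an RNG
    and return a reward in [0, b]. Returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1  # pull each arm once to initialize estimates
        else:
            def idx(j):
                mean = sums[j] / counts[j]
                # Plug-in (biased) empirical variance; clipped at zero.
                var = max(sq_sums[j] / counts[j] - mean * mean, 0.0)
                return ucbv_index(mean, var, counts[j], t, b)
            i = max(range(k), key=idx)
        r = arms[i](rng)
        counts[i] += 1
        sums[i] += r
        sq_sums[i] += r * r
    return counts

# Two Bernoulli arms with means 0.3 and 0.7; the better arm should
# accumulate most of the pulls over the horizon.
arms = [lambda rng: float(rng.random() < 0.3),
        lambda rng: float(rng.random() < 0.7)]
counts = run_ucbv(arms, horizon=2000)
```

The arm-pulling counts produced by loops like this are exactly the quantities whose asymptotic rates the first part of the thesis characterizes.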
| Date of Award | 2024 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
| Supervisor | Yang WANG (Supervisor) & Can YANG (Supervisor) |
Online decision making with variance information and privacy constraints
HAN, Y. (Author). 2024
Student thesis: Doctoral thesis