
Learning in POMDPs is Sample-Efficient with Hindsight Observability

  • Jonathan N. Lee*
  • Alekh Agarwal
  • Christoph Dann
  • Tong Zhang

*Corresponding author for this work

Research output: Contribution to journal › Conference article published in journal › peer-review

Abstract

POMDPs capture a broad class of decision-making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability. However, in many realistic problems, more information is either revealed or can be computed at some point during the learning process. Motivated by diverse applications ranging from robotics to data center scheduling, we formulate a Hindsight Observable Markov Decision Process (HOMDP) as a POMDP where the latent states are revealed to the learner in hindsight and only during training. We introduce new algorithms for the tabular and function approximation settings that are provably sample-efficient with hindsight observability, even in POMDPs that would otherwise be statistically intractable. We give a lower bound showing that the tabular algorithm is optimal in its dependence on latent state and observation cardinalities.
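The abstract defines an HOMDP as a POMDP whose latent states are revealed to the learner only in hindsight and only during training. The sketch below illustrates that interaction protocol for a small tabular POMDP. All names (`TabularPOMDP`, `run_episode`, `estimate_emissions`) and the toy parameters are hypothetical. The count-based emission estimate is only a simple example of what hindsight labels make possible. It is not the paper's algorithm.

```python
import random
from dataclasses import dataclass


@dataclass
class TabularPOMDP:
    # Hypothetical container; every distribution is a list of probabilities.
    init: list     # init[s] = P(s_1 = s)
    trans: list    # trans[s][a][s2] = P(s_{h+1} = s2 | s_h = s, a_h = a)
    emit: list     # emit[s][o] = P(o_h = o | s_h = s)
    reward: list   # reward[s][a], deterministic reward for (state, action)
    horizon: int


def _sample(rng, probs):
    # Draw an index according to the probability vector `probs`.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run_episode(pomdp, policy, rng, training):
    """Roll out one episode; reveal latent states only when training=True."""
    obs, acts, rews, latents = [], [], [], []
    s = _sample(rng, pomdp.init)
    for _ in range(pomdp.horizon):
        o = _sample(rng, pomdp.emit[s])
        obs.append(o)
        # The policy acts on the observable history only, never on s.
        a = policy(tuple(obs), tuple(acts))
        acts.append(a)
        rews.append(pomdp.reward[s][a])
        latents.append(s)
        s = _sample(rng, pomdp.trans[s][a])
    traj = {"observations": obs, "actions": acts, "rewards": rews}
    if training:
        # Hindsight observability: latent states are handed over after the episode.
        traj["latent_states"] = latents
    return traj


def estimate_emissions(trajs, num_states, num_obs):
    """Count-based estimate of P(o | s) from hindsight-labelled episodes."""
    counts = [[0] * num_obs for _ in range(num_states)]
    for traj in trajs:
        for s, o in zip(traj["latent_states"], traj["observations"]):
            counts[s][o] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def always_action_zero(obs_history, act_history):
    # Trivial hypothetical policy used only to generate data.
    return 0


pomdp = TabularPOMDP(
    init=[0.5, 0.5],
    trans=[[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]],
    emit=[[0.8, 0.2], [0.3, 0.7]],
    reward=[[1.0, 0.0], [0.0, 1.0]],
    horizon=5,
)
rng = random.Random(0)
train = [run_episode(pomdp, always_action_zero, rng, training=True) for _ in range(2000)]
deploy = run_episode(pomdp, always_action_zero, rng, training=False)
emission_estimate = estimate_emissions(train, num_states=2, num_obs=2)
```

Keeping the latent labels out of the policy's inputs mirrors the setting described above. Hindsight information can inform learning, but the deployed policy still acts on observations alone.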

Original language: English
Pages (from-to): 18714-18732
Number of pages: 19
Journal: Proceedings of Machine Learning Research
Volume: 202
Publication status: Published - 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 to 29 Jul 2023

Bibliographical note

Publisher Copyright:
© 2023 Proceedings of Machine Learning Research. All rights reserved.
