Space-progressive value iteration: An anytime algorithm for a class of POMDPs

Nevin L. Zhang, Weihong Zhang

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

1 Citation (Scopus)

Abstract

Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper,wefirst study a somewhat restrictive class of specialPOMDPscalled almostdiscernible POMDPs and propose an anytime algorithm called space-progressive value iteration(SPVI). SPVI does not perform DP updates over the entire belief space. Rather it restricts DP updates to a belief subspace that grows over time. It is argued that given sufficient time SPVI can find near-optimal policies for almostdiscernible POMDPs.We then show how SPVI can be applied to more a general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.

Original languageEnglish
Title of host publicationSymbolic and Quantitative Approaches to Reasoning with Uncertainty - 6th European Conference, ECSQARU 2001, Proceedings
EditorsSalem Benferhat, Philippe Besnard
PublisherSpringer Verlag
Pages72-83
Number of pages12
ISBN (Electronic)3540424644, 9783540424642
DOIs
Publication statusPublished - 2001
Event6th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, ECSQARU 2001 - Toulouse, France
Duration: 19 Sept 200121 Sept 2001

Publication series

NameLecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume2143
ISSN (Print)0302-9743

Conference

Conference6th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, ECSQARU 2001
Country/TerritoryFrance
CityToulouse
Period19/09/0121/09/01

Bibliographical note

Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2001.

Fingerprint

Dive into the research topics of 'Space-progressive value iteration: An anytime algorithm for a class of POMDPs'. Together they form a unique fingerprint.

Cite this