Optimal control of ergodic continuous-time Markov chains with average sample-path rewards

Xianping Guo*, Xi Ren Cao

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

22 Citations (Scopus)

Abstract

In this paper we study continuous-time Markov decision processes with the average sample-path reward (ASPR) criterion and possibly unbounded transition and reward rates. We propose conditions on the system's primitive data for the existence of ε-ASPR-optimal (deterministic) stationary policies within a class of randomized Markov policies satisfying some additional continuity assumptions. The proof rests on the time-discretization technique, martingale stability theory, and the concept of potential. We also provide both policy and value iteration algorithms for computing, or at least approximating, the ε-ASPR-optimal stationary policies. We illustrate our main results with examples, as well as the difference between the ASPR and the average expected reward criteria.
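The abstract mentions policy and value iteration algorithms for the average-reward criterion. As a hedged illustration only (not the paper's algorithm, which handles possibly unbounded rates), the sketch below runs relative value iteration on a *finite* continuous-time MDP by uniformization: the rate matrices are turned into stochastic matrices P = I + Q/Λ, and the continuous-time gain is recovered by rescaling. The function name `rvi_ctmdp` and the data layout are assumptions introduced for this sketch.

```python
import numpy as np

def rvi_ctmdp(Q, r, tol=1e-9, max_iter=10_000):
    """Relative value iteration for a finite CTMDP under the
    average-reward criterion, via uniformization (illustrative sketch).

    Q : (A, S, S) array of transition-rate matrices, rows sum to 0.
    r : (S, A) array of reward rates.
    Returns (gain estimate, greedy stationary policy).
    """
    A, S, _ = Q.shape
    # Uniformization rate strictly above every exit rate, so the
    # uniformized chain is aperiodic and RVI converges (unichain case).
    Lam = np.max(-np.diagonal(Q, axis1=1, axis2=2)) + 1.0
    P = np.eye(S) + Q / Lam          # (A, S, S) stochastic matrices
    h = np.zeros(S)                  # relative value function
    g = 0.0
    for _ in range(max_iter):
        vals = r.T / Lam + P @ h     # (A, S) one-step Bellman values
        Th = vals.max(axis=0)
        g = Lam * (Th[0] - h[0])     # gain estimate at reference state 0
        h_new = Th - Th[0]           # renormalize to keep h bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = vals.argmax(axis=0)     # greedy deterministic stationary policy
    return g, policy
```

For a single-action two-state chain with Q = [[-1, 1], [2, -2]] and reward rates (0, 3), the stationary distribution is (2/3, 1/3), so the sketch returns a gain of 1.0.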

Original language: English
Pages (from-to): 29-48
Number of pages: 20
Journal: SIAM Journal on Control and Optimization
Volume: 44
Issue number: 1
DOIs
Publication status: Published - 2006

Keywords

  • Average sample-path reward
  • Continuous-time Markov chain
  • Optimal stationary policy
  • Policy and value iteration algorithms
