
Optimization of Average Rewards and Bias: Single Class

  • Xi Ren Cao*
  • *Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report › Book Chapter › peer-review

Abstract

In this chapter, we study the optimization of the long-run average and bias of single-class (or uni-chain) time-nonhomogeneous Markov chains. Using confluencity, we define the performance potentials, the most central notion in performance optimization, and discuss their properties. With the performance potentials, we derive the difference formula for the average rewards of any two policies, and based on this formula we obtain the necessary and sufficient optimality conditions for average rewards. In addition, we study bias optimality, which optimizes the transient performance in the initial period. Bias potentials are defined, and bias optimality conditions are derived. The approach is called relative optimization because it is based on the performance difference formula, which gives the difference of the performance measures of any two policies over the entire infinite horizon. Under-selectivity is reflected in the optimality conditions: the difference formula makes clear that the conditions need not hold in any finite period, or on any “non-frequently” visited sequence of time instants.
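
For orientation, the kind of difference formula the abstract refers to can be sketched in the simpler time-homogeneous, ergodic setting. This is the classical analog, not the chapter's time-nonhomogeneous statement, and all notation here is assumed rather than taken from this record: P and f are the transition matrix and reward vector of one policy, π its steady-state distribution, η = πf its average reward, g its performance potential, e the all-ones vector, and primed symbols belong to a second policy.

```latex
% Assumed notation; time-homogeneous ergodic case only.
\begin{align}
  (I - P)\,g + \eta\,e &= f
    && \text{(Poisson equation defining the potential } g\text{)} \\
  \eta' - \eta &= \pi'\bigl[(P' - P)\,g + (f' - f)\bigr]
    && \text{(average-reward difference formula)}
\end{align}
% The second line follows from \pi' P' = \pi', \pi' e = 1 and \eta' = \pi' f':
% \pi'[(P'-P)g + f' - f] = \pi'(I-P)g + \eta' - \pi' f = \pi'(f - \eta e) + \eta' - \pi' f = \eta' - \eta.
```

Because π' is nonnegative, the sign of η' − η in this sketch is controlled by the bracketed vector, which depends on the second policy only through P' and f'. This is what lets optimality conditions be stated in terms of the potentials of a single policy.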

Original language: English
Title of host publication: SpringerBriefs in Control, Automation and Robotics
Publisher: Springer
Pages: 29-58
Number of pages: 30
Publication status: Published - 2021

Publication series

Name: SpringerBriefs in Control, Automation and Robotics
ISSN (Print): 2192-6786
ISSN (Electronic): 2192-6794

Bibliographical note

Publisher Copyright:
© 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
