Abstract
In this chapter, we study the optimization of the long-run average reward and bias of single-class (or uni-chain) time-nonhomogeneous Markov chains. With confluencity, we define the most central notion in performance optimization, the performance potentials, and discuss their properties. With the performance potentials, we derive the difference formula for the average rewards of any two policies, and from it we obtain the necessary and sufficient optimality conditions for average rewards. In addition, we study bias optimality, which optimizes transient performance in the initial period. Bias potentials are defined, and bias optimality conditions are derived. The approach is called relative optimization because it is based on the performance difference formula, which gives the difference between the performance measures of any two policies over the entire infinite horizon. Under-selectivity is reflected in the optimality conditions: the difference formula makes clear that the conditions need not hold in any finite period, or on any “non-frequently” visited sequence of time instants.
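For intuition only, the role of the potentials and the difference formula can be sketched numerically in the simpler time-homogeneous, finite-state ergodic case. That setting is an assumption made here for illustration; the chapter itself treats time-nonhomogeneous chains. In it, the potentials g solve the Poisson equation (I - P + e π) g = r, and the average rewards of two policies satisfy η' - η = π' [(P' - P) g + (r' - r)]. The function names and the two example policies below are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an ergodic transition matrix P."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def potentials(P, r):
    """Performance potentials g from (I - P + e pi) g = r (homogeneous case)."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), r)

# Two illustrative policies (made-up numbers): transition matrix and reward vector.
P = np.array([[0.5, 0.5], [0.2, 0.8]])
r = np.array([1.0, 3.0])
P_new = np.array([[0.1, 0.9], [0.6, 0.4]])
r_new = np.array([2.0, 0.0])

# Average rewards computed directly as eta = pi r.
eta = stationary_distribution(P) @ r
eta_new = stationary_distribution(P_new) @ r_new

# The same difference obtained from the potentials of the original policy only.
g = potentials(P, r)
diff_formula = stationary_distribution(P_new) @ ((P_new - P) @ g + (r_new - r))

print(eta_new - eta, diff_formula)  # the two values should agree
```

Because π' has nonnegative entries, the sign of η' - η is governed by the vector (P' - P) g + (r' - r), which is what makes a comparison between any two policies possible from the potentials of one of them.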
| Original language | English |
|---|---|
| Title of host publication | SpringerBriefs in Control, Automation and Robotics |
| Publisher | Springer |
| Pages | 29-58 |
| Number of pages | 30 |
| Publication status | Published - 2021 |
Publication series
| Name | SpringerBriefs in Control, Automation and Robotics |
|---|---|
| ISSN (Print) | 2192-6786 |
| ISSN (Electronic) | 2192-6794 |
Bibliographical note
Publisher Copyright: © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Fingerprint
Dive into the research topics of 'Optimization of Average Rewards and Bias: Single Class'. Together they form a unique fingerprint.