Optimization of Average Rewards: Multi-Chains

Xi Ren Cao*

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/Report › Book Chapter › peer-review

Abstract

In this chapter, we study the optimization of the long-run average of multi-class time-nonhomogeneous Markov chains (TNHMCs). We show that with confluencity, state classification, and relative optimization, we can obtain the necessary and sufficient conditions for optimal policies of the average reward of TNHMCs consisting of multiple confluent classes (multi-chains). The optimality conditions do not need to hold in any finite period, or “non-frequently visited” time sequence. In the analysis, we assume that the limit of the average exists. In general, the performance should be defined as the “liminf” of the average. However, because of the non-linear property of “liminf”, it is not well-defined for branching states, unless the TNHMC is “asynchronous” among different confluent classes. This property is also studied.

Original language: English
Title of host publication: SpringerBriefs in Control, Automation and Robotics
Publisher: Springer
Pages: 59-78
Number of pages: 20
Publication status: Published - 2021

Publication series

Name: SpringerBriefs in Control, Automation and Robotics
ISSN (Print): 2192-6786
ISSN (Electronic): 2192-6794

Bibliographical note

Publisher Copyright:
© 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
