TY - JOUR
T1 - The potential structure of sample paths and performance sensitivities of Markov systems
AU - Cao, Xi Ren
PY - 2004/12
Y1 - 2004/12
N2 - We study the structure of sample paths of Markov systems by using performance potentials as the fundamental units. With a sample-path-based approach, we show that performance sensitivity formulas (performance gradients and performance differences) of Markov systems can be constructed intuitively, by first principles, with performance potentials (or equivalently, perturbation realization factors) as building blocks. In particular, we derive sensitivity formulas for two Markov chains with possibly different state spaces. The proposed approach can be used to flexibly obtain sensitivity formulas for a wide range of problems, including those with partial information. These formulas are the basis for performance optimization of discrete event dynamic systems, including perturbation analysis, Markov decision processes, and reinforcement learning. The approach thus provides insight into on-line learning and performance optimization and opens up new research directions. Sample-path-based algorithms can be developed.
AB - We study the structure of sample paths of Markov systems by using performance potentials as the fundamental units. With a sample-path-based approach, we show that performance sensitivity formulas (performance gradients and performance differences) of Markov systems can be constructed intuitively, by first principles, with performance potentials (or equivalently, perturbation realization factors) as building blocks. In particular, we derive sensitivity formulas for two Markov chains with possibly different state spaces. The proposed approach can be used to flexibly obtain sensitivity formulas for a wide range of problems, including those with partial information. These formulas are the basis for performance optimization of discrete event dynamic systems, including perturbation analysis, Markov decision processes, and reinforcement learning. The approach thus provides insight into on-line learning and performance optimization and opens up new research directions. Sample-path-based algorithms can be developed.
KW - Markov decision processes
KW - Performance sensitivity
KW - Perturbation analysis
KW - Perturbation realization
KW - Reinforcement learning
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000225659200003
UR - https://openalex.org/W2096505218
UR - https://www.scopus.com/pages/publications/11044222936
U2 - 10.1109/TAC.2004.838494
DO - 10.1109/TAC.2004.838494
M3 - Journal Article
SN - 0018-9286
VL - 49
SP - 2129
EP - 2142
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 12
ER -