Abstract
This note presents a new basic formula for sample-path-based estimation of performance gradients of Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be easily developed. The formula follows naturally from a sensitivity equation in perturbation analysis. A new research direction is discussed.
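The style of sensitivity equation the abstract refers to can be illustrated numerically. The sketch below uses the standard perturbation-analysis identity for a finite ergodic chain, dη/dθ = π (dP/dθ) g, where η = πr is the average reward, π is the stationary distribution, and the potentials g solve the Poisson equation (I − P)g = r − η1. The 2-state chain and its parameterization are hypothetical illustrations, not taken from the note itself:

```python
import numpy as np

def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, r):
    """Return (g, eta, pi): potentials via the Poisson equation,
    average reward, and stationary distribution.
    g = (I - P + 1 pi^T)^{-1} (r - eta 1), normalized so pi . g = 0."""
    pi = stationary(P)
    eta = pi @ r
    n = P.shape[0]
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), r - eta)
    return g, eta, pi

# Hypothetical parameterized transition matrix P(theta), 2 states.
def P_of(theta):
    return np.array([[theta, 1.0 - theta],
                     [0.3,   0.7]])

r = np.array([1.0, 0.0])        # reward: 1 in state 0, 0 in state 1
theta = 0.6
P = P_of(theta)
g, eta, pi = potentials(P, r)

dP = np.array([[1.0, -1.0],     # dP/dtheta for this parameterization
               [0.0,  0.0]])
grad = pi @ (dP @ g)            # sensitivity formula: d eta / d theta

# Sanity check against a finite difference.
h = 1e-6
fd = (potentials(P_of(theta + h), r)[1] - eta) / h
print(grad, fd)
```

For this chain, η(θ) = 0.3 / (1.3 − θ), so the exact derivative at θ = 0.6 is 0.3 / 0.49 ≈ 0.6122, which both estimates reproduce. Sample-path policy-gradient algorithms of the kind the note discusses replace the exact π and g here with quantities estimated along a single trajectory.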
| Original language | English |
|---|---|
| Pages (from-to) | 696-699 |
| Number of pages | 4 |
| Journal | IEEE Transactions on Automatic Control |
| Volume | 50 |
| Issue number | 5 |
| DOIs | |
| Publication status | Published - May 2005 |
Keywords
- Markov decision processes
- Online estimation
- Perturbation analysis (PA)
- Perturbation realization
- Poisson equations
- Potentials
- Reinforcement learning