A basic formula for online policy gradient algorithms

Xi Ren Cao*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

37 Citations (Scopus)

Abstract

This note presents a (new) basic formula for sample-path-based estimates of performance gradients in Markov systems (called policy gradients in the reinforcement-learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be developed easily. The formula follows naturally from a sensitivity equation in perturbation analysis. New research directions are discussed.
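To make the abstract's idea concrete, below is a minimal sketch of a sample-path-based policy-gradient estimator in the spirit of the perturbation-analysis view, where a performance potential g(s) is approximated along the path by a truncated sum of centered rewards and paired with the policy's score function. This is an illustrative likelihood-ratio estimator consistent with the sensitivity equation dη/dθ = π (dP/dθ) g, not the paper's exact formula; the MDP (`P`, `r`), the softmax parameterization, and the helpers `policy`, `grad_log_policy`, and `estimate_gradient` are all assumptions introduced here for illustration.

```python
# A minimal sketch (assumed setup, not the paper's formula): an online
# sample-path policy-gradient estimator. The potential g(s) is approximated
# by a truncated sum of centered rewards after each visit to s.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 3, 2
# Hypothetical transition kernel P[a, s, s'] and state reward r[s].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.standard_normal(n_states)


def policy(theta, s):
    """Softmax policy pi_theta(a | s); theta has shape (n_states, n_actions)."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()


def grad_log_policy(theta, s, a):
    """Gradient of ln pi_theta(a | s) with respect to theta (same shape)."""
    g = np.zeros_like(theta)
    g[s] = -policy(theta, s)
    g[s, a] += 1.0
    return g


def estimate_gradient(theta, n_steps=50_000, horizon=50):
    """Sample-path estimate of d(eta)/d(theta), eta = long-run average reward.

    The potential of the state visited at step n is approximated by the sum
    of (reward - eta) over the next `horizon` steps, then combined with the
    score function grad ln pi -- a standard likelihood-ratio form.
    """
    # Generate one long sample path under pi_theta.
    states, actions, rewards = [], [], []
    s = 0
    for _ in range(n_steps):
        a = rng.choice(n_actions, p=policy(theta, s))
        states.append(s)
        actions.append(a)
        rewards.append(r[s])
        s = rng.choice(n_states, p=P[a, s])
    eta = np.mean(rewards)

    grad = np.zeros_like(theta)
    for n in range(n_steps - horizon):
        # Truncated, centered potential estimate following step n.
        g_hat = sum(rewards[n + 1 : n + 1 + horizon]) - horizon * eta
        grad += grad_log_policy(theta, states[n], actions[n]) * g_hat
    return grad / (n_steps - horizon)


theta = np.zeros((n_states, n_actions))
print(estimate_gradient(theta))
```

The truncation horizon trades bias for variance: a longer horizon better approximates the potential (the solution of the Poisson equation) but accumulates more noise; the centering by eta plays the role of a baseline.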

Original language: English
Pages (from-to): 696-699
Number of pages: 4
Journal: IEEE Transactions on Automatic Control
Volume: 50
Issue number: 5
DOIs
Publication status: Published - May 2005

Keywords

  • Markov decision processes
  • Online estimation
  • Perturbation analysis (PA)
  • Perturbation realization
  • Poisson equations
  • Potentials
  • Reinforcement learning

