Time-in-action RL

Jiangcheng Zhu*, Zhepei Wang, Douglas Mcilwraith, Chao Wu, Chao Xu, Yike Guo

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

1 Citation (Scopus)

Abstract

The authors propose a novel reinforcement learning (RL) framework in which agent behaviour is governed by traditional control theory. This integrated approach, called time-in-action RL, makes RL applicable to many real-world systems whose underlying dynamics are known in their control theoretical formalism. The key insight enabling this integration is to model an explicit time function that maps a state-action pair to the time the underlying controller takes to accomplish that action. In this framework, an action is described by its value (action value) and the time it takes to perform (action time). The action value results from the RL policy given a state, while the action time is estimated by an explicit time model learnt from the measured activities of the underlying controller. The RL value network is then trained with the embedded time model to predict action time. The approach is tested on a variant of Atari Pong and shown to converge.
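
The abstract does not give implementation details, but the core idea lends itself to a short sketch: a learned time model predicts how long the underlying controller needs to execute an action from a given state, and that predicted duration enters the value target. The sketch below assumes a semi-MDP-style reading in which the action time tau scales the discount as gamma**tau; the names (TimeModel, QNetwork, td_target), dimensions, and architecture are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of the time-in-action idea:
# a time model predicts the duration tau of executing an action value
# from a state, and the bootstrapped value target is discounted by
# gamma**tau rather than by a fixed per-step gamma.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 8, 2, 0.99  # illustrative sizes

class TimeModel(nn.Module):
    """Explicit time function: maps (state, action value) to action time tau."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Softplus())  # Softplus keeps tau positive

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

class QNetwork(nn.Module):
    """State-action value network trained against the time-aware target."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

time_model = TimeModel()
q_net = QNetwork()

def td_target(reward, next_state, next_action):
    """Bootstrap target whose discount depends on the predicted action time."""
    with torch.no_grad():
        tau = time_model(next_state, next_action)   # predicted duration
        q_next = q_net(next_state, next_action)
        return reward + GAMMA ** tau * q_next       # time-aware discounting

# Quick shape check on a batch of random transitions.
s2, a2 = torch.randn(4, STATE_DIM), torch.randn(4, ACTION_DIM)
r = torch.randn(4)
print(td_target(r, s2, a2).shape)  # torch.Size([4])
```

In the setup the abstract describes, the time model itself would be fit separately, by regression against execution times measured from the underlying controller; here it appears only as a module whose prediction shapes the value target.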

Original language: English
Pages (from-to): 28-37
Number of pages: 10
Journal: IET Cyber-systems and Robotics
Volume: 1
Issue number: 1
DOIs
Publication status: Published - Jun 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2021 Zhejiang University Press.

Keywords

  • RL value network
  • action time
  • action value
  • control theoretical formalism
  • embedded time model
  • explicit time function
  • learning (artificial intelligence)
  • reinforcement learning framework
  • time-in-action RL
