On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

Shuang Qiu*, Jieping Ye*, Zhaoran Wang*, Zhuoran Yang*

*Corresponding author for this work

Research output: Chapter in Book/Conference Proceeding/ReportConference Paper published in a bookpeer-review

21 Citations (Scopus)

Abstract

To achieve sample efficiency in reinforcement learning (RL), it necessitates to efficiently explore the underlying environment. Under the offline setting, addressing the exploration challenge lies in collecting an offline dataset with sufficient coverage. Motivated by such a challenge, we study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function. Then, given any extrinsic reward, the agent computes the optimal policy via offline RL with data collected in the exploration stage. Moreover, we tackle this problem under the context of function approximation, leveraging powerful function approximators. Specifically, we propose to explore via an optimistic variant of the value-iteration algorithm incorporating kernel and neural function approximations, where we adopt the associated exploration bonus as the exploration reward. Moreover, we design exploration and planning algorithms for both single-agent MDPs and zero-sum Markov games and prove that our methods can achieve Oe(1/ε2) sample complexity for generating a ε-suboptimal policy or ε-approximate Nash equilibrium when given an arbitrary extrinsic reward. To the best of our knowledge, we establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.

Original languageEnglish
Title of host publicationProceedings of the 38th International Conference on Machine Learning, ICML 2021
PublisherML Research Press
Pages8737-8747
Number of pages11
ISBN (Electronic)9781713845065
Publication statusPublished - 2021
Externally publishedYes
Event38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 202124 Jul 2021

Publication series

NameProceedings of Machine Learning Research
Volume139
ISSN (Electronic)2640-3498

Conference

Conference38th International Conference on Machine Learning, ICML 2021
CityVirtual, Online
Period18/07/2124/07/21

Bibliographical note

Publisher Copyright:
Copyright © 2021 by the author(s)

Fingerprint

Dive into the research topics of 'On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game'. Together they form a unique fingerprint.

Cite this