Online Pareto optimal control of mean-field stochastic multi-player systems using policy iteration

Xiushan Jiang, Yanshuang Wang, Dongya Zhao*, Ling Shi

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

51 Citations (Scopus)

Abstract

This study investigates the Pareto optimal strategy problem for multi-player mean-field stochastic systems governed by Itô differential equations using reinforcement learning (RL), and derives a partially model-free solution for Pareto optimal control. First, by exploiting the convexity of the cost functions, the Pareto optimal control problem is recast as a weighted-sum optimal control problem. Subsequently, using on-policy RL, a novel policy iteration (PI) algorithm is presented based on the ℌ-representation technique. In particular, the algorithm alternates between policy evaluation and policy update steps, and the Pareto optimal control policy is obtained when no further improvement in system performance occurs, which avoids directly solving the complicated cross-coupled generalized algebraic Riccati equations (GAREs). Practical numerical examples are presented to demonstrate the effectiveness of the proposed algorithm.
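
To illustrate the two ideas named in the abstract (weighted-sum scalarization and alternating policy evaluation with policy update), the following is a minimal sketch under strong simplifying assumptions. It is not the paper's algorithm: it drops the mean-field and diffusion terms, assumes the model matrices are fully known, and uses the classical Kleinman iteration for a deterministic linear-quadratic problem. All matrices, weights, and function names (`weighted_sum`, `solve_lyapunov`, `policy_iteration`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_sum(mats, weights):
    """Scalarize per-player cost matrices into one weighted-sum matrix."""
    return sum(w * M for w, M in zip(weights, mats))

def solve_lyapunov(Ac, M):
    """Solve Ac.T @ P + P @ Ac + M = 0 by column-major vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    # vec(Ac.T P) = (I kron Ac.T) vec(P); vec(P Ac) = (Ac.T kron I) vec(P)
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(L, -M.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Qw, Rw, K0, tol=1e-10, max_iter=100):
    """Kleinman-style policy iteration for u = -K x (K0 must be stabilizing)."""
    K = K0
    for _ in range(max_iter):
        # Policy evaluation: cost matrix P of the current gain K.
        P = solve_lyapunov(A - B @ K, Qw + K.T @ Rw @ K)
        # Policy update: improved gain from the evaluated cost.
        K_new = np.linalg.solve(Rw, B.T @ P)
        if np.linalg.norm(K_new - K) < tol:  # no further improvement
            return P, K_new
        K = K_new
    return P, K

# Hypothetical two-player example; u stacks both players' scalar inputs.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # stable, so K0 = 0 is stabilizing
B = np.array([[0.0, 0.0], [1.0, 0.5]])     # B = [B1 B2]
Qs = [np.eye(2), np.diag([2.0, 0.5])]      # state weights of players 1 and 2
Rs = [np.diag([1.0, 2.0]), np.diag([2.0, 1.0])]  # control weights per player
weights = (0.5, 0.5)                       # one point on the Pareto front

Qw, Rw = weighted_sum(Qs, weights), weighted_sum(Rs, weights)
P, K = policy_iteration(A, B, Qw, Rw, K0=np.zeros((2, 2)))
```

Sweeping `weights` over the simplex would, in this simplified setting, trace out different Pareto optimal gains; the iteration stops once the gain no longer changes, so the Riccati equation is never solved directly.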

Original language: English
Article number: 140202
Journal: Science China Information Sciences
Volume: 67
Issue number: 4
Publication status: Published - Apr 2024

Bibliographical note

Publisher Copyright:
© Science China Press 2024.

Keywords

  • Pareto optimal control
  • mean-field stochastic systems
  • policy iteration scheme
  • ℌ-representation
