TY - JOUR
T1 - Online Pareto optimal control of mean-field stochastic multi-player systems using policy iteration
AU - Jiang, Xiushan
AU - Wang, Yanshuang
AU - Zhao, Dongya
AU - Shi, Ling
N1 - Publisher Copyright: © Science China Press 2024.
PY - 2024/4
Y1 - 2024/4
N2 - In this study, the Pareto optimal strategy problem was investigated for multi-player mean-field stochastic systems governed by Itô differential equations using the reinforcement learning (RL) method. A partially model-free solution for Pareto-optimal control was derived. First, by applying the convexity of cost functions, the Pareto optimal control problem was solved using a weighted-sum optimal control problem. Subsequently, using on-policy RL, we present a novel policy iteration (PI) algorithm based on the ℌ-representation technique. In particular, by alternating between the policy evaluation and policy update steps, the Pareto optimal control policy is obtained when no further improvement occurs in system performance, which eliminates directly solving complicated cross-coupled generalized algebraic Riccati equations (GAREs). Practical numerical examples are presented to demonstrate the effectiveness of the proposed algorithm.
AB - In this study, the Pareto optimal strategy problem was investigated for multi-player mean-field stochastic systems governed by Itô differential equations using the reinforcement learning (RL) method. A partially model-free solution for Pareto-optimal control was derived. First, by applying the convexity of cost functions, the Pareto optimal control problem was solved using a weighted-sum optimal control problem. Subsequently, using on-policy RL, we present a novel policy iteration (PI) algorithm based on the ℌ-representation technique. In particular, by alternating between the policy evaluation and policy update steps, the Pareto optimal control policy is obtained when no further improvement occurs in system performance, which eliminates directly solving complicated cross-coupled generalized algebraic Riccati equations (GAREs). Practical numerical examples are presented to demonstrate the effectiveness of the proposed algorithm.
KW - Pareto optimal control
KW - mean-field stochastic systems
KW - policy iteration scheme
KW - ℌ-representation
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:001197513900004
UR - https://openalex.org/W4393942615
UR - https://www.scopus.com/pages/publications/85189620636
U2 - 10.1007/s11432-023-3982-y
DO - 10.1007/s11432-023-3982-y
M3 - Journal Article
SN - 1674-733X
VL - 67
JO - Science China Information Sciences
JF - Science China Information Sciences
IS - 4
M1 - 140202
ER -