Nowadays, urban systems are widely deployed in many major cities, e.g., ride-sharing systems, express delivery systems, take-out food systems, and emergency medical service systems. While these systems have significantly modernized and facilitated citizens' daily lives, they face severe operational challenges, such as how to match passengers to drivers in a ride-sharing system, or how to dispatch couriers in real time in an express system. Previously, operational problems in urban systems were often tackled with methods from operations research, e.g., optimization or heuristic algorithms tailored to practical system settings. For an urban system, since we often want to generate a sequence of real-time actions that maximizes the total reward over a long horizon, reinforcement learning is a natural choice. Moreover, as such systems are often large and complex, deep learning methods are necessary to capture rich, representative features of the environment. In this thesis, we investigate how Deep Reinforcement Learning (DRL) can effectively learn operation policies for urban systems.

For an urban system, depending on how it operates, either Central-Agent Reinforcement Learning (CARL) or Multi-Agent Reinforcement Learning (MARL) can be chosen to describe its operation process. For a system whose operation is described by CARL, we focus on how to properly formulate the problem and design each component of the model, i.e., the state, action, and immediate reward, so as to optimize the system's final objective. We adopt the take-out food system as an example and propose a Deep Reinforcement Order Packing model (DROP) to solve its operation problem. For a system whose operation is described by MARL, besides designing each component of the model, we also aim to guarantee that agents in the system cooperate with one another properly.
We adopt the express system, in which many couriers work, as an example, and propose a Deep Reinforcement Courier Dispatching model (DRCD) to solve its operation problem. DRCD guarantees cooperation among couriers to some extent but not globally; therefore, we further propose a Cooperative Multi-Agent Reinforcement Learning model (CMARL) that guarantees global cooperation among couriers by incorporating another Markov Decision Process along the agent sequence. Experiments based on real-world data confirm the superiority of DROP, DRCD, and CMARL over baselines. In MARL, besides cooperation among agents, competition also exists, although it is not common in modern urban systems. We briefly discuss this scenario at the end to make the thesis complete.
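The goal of generating a sequence of real-time actions that maximizes total reward over a long horizon can be written compactly in the standard Markov Decision Process notation. The following is a generic sketch using conventional RL symbols ($s_t$, $a_t$, $r_t$, $\gamma$, $\pi$), not the thesis's own formulation:

```latex
% At each step t, the agent observes state s_t, takes action a_t ~ pi(. | s_t),
% and receives immediate reward r_t. The objective is the expected discounted return:
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t}\right],
\qquad \gamma \in [0, 1),
```

where $\gamma$ trades off immediate against long-term reward; designing $s_t$, $a_t$, and $r_t$ for a concrete urban system is precisely the formulation problem studied for DROP, DRCD, and CMARL.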
| Date of Award | 2020 |
|---|---|
| Original language | English |
| Awarding Institution | The Hong Kong University of Science and Technology |
| Supervisor | Qiang YANG (Supervisor) |