
Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control

Research output: Chapter in Book/Conference Proceeding/Report; Conference Paper published in a book; peer-review

Abstract

In this paper, we propose a memory-based Q-learning algorithm called predictive Q-routing (PQ-routing) for adaptive traffic control. We attempt to address two problems encountered in Q-routing (Boyan & Littman, 1994), namely, the inability to fine-tune routing policies under low network load and the inability to learn new optimal policies under decreasing load conditions. Unlike other memory-based reinforcement learning algorithms in which memory is used to keep past experiences to increase learning speed, PQ-routing keeps the best experiences learned and reuses them by predicting the traffic trend. The effectiveness of PQ-routing has been verified under various network topologies and traffic conditions. Simulation results show that PQ-routing is superior to Q-routing in terms of both learning speed and adaptability.
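For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of the Q-routing update from Boyan & Littman (1994), not of PQ-routing itself. In Q-routing, each node keeps an estimate of the delivery time to each destination via each neighbor and, after forwarding a packet, moves that estimate toward the observed queueing time plus transmission time plus the neighbor's own best estimate. The class name `QRouter`, its method names, and the default learning rate are hypothetical choices made for illustration. The abstract does not specify what PQ-routing stores or how it predicts the traffic trend, so that part is deliberately left out.

```python
from collections import defaultdict


class QRouter:
    """Per-node table of estimated delivery times (hypothetical illustration)."""

    def __init__(self, neighbors, learning_rate=0.5):
        self.neighbors = list(neighbors)
        self.learning_rate = learning_rate  # assumed value, not from the paper
        # (destination, neighbor) -> estimated time to deliver via that neighbor
        self.q = defaultdict(float)

    def best_estimate(self, dest):
        """Smallest estimated delivery time to dest over all neighbors."""
        return min(self.q[(dest, n)] for n in self.neighbors)

    def choose_next_hop(self, dest):
        """Greedy policy: forward to the neighbor with the lowest estimate."""
        return min(self.neighbors, key=lambda n: self.q[(dest, n)])

    def update(self, dest, neighbor, queue_time, transmit_time, neighbor_estimate):
        """Move Q(dest, neighbor) toward queue + transmit + neighbor's estimate."""
        target = queue_time + transmit_time + neighbor_estimate
        old = self.q[(dest, neighbor)]
        self.q[(dest, neighbor)] = old + self.learning_rate * (target - old)
        return self.q[(dest, neighbor)]


if __name__ == "__main__":
    router = QRouter(["a", "b"])
    # A packet for "d" went via "a": 1.0 queued, 1.0 in transit, "a" reports 2.0.
    print(router.update("d", "a", 1.0, 1.0, 2.0))
    print(router.choose_next_hop("d"))
```

Because an estimate is revised only when a packet is actually sent through the corresponding neighbor, a route that once looked congested may never be retried after the load drops. This is one reading of the second problem the abstract attributes to Q-routing.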

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 8, NIPS 1995
Editors: D. Touretzky, M.C. Mozer, M. Hasselmo
Publisher: Neural information processing systems foundation
Pages: 945-951
Number of pages: 7
ISBN (Electronic): 0262201070, 9780262201070
Publication status: Published - 27 Nov 1995
Event: 8th Advances in Neural Information Processing Systems, NIPS 1995 - Denver, United States
Duration: 27 Nov 1995 to 30 Nov 1995

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 8
ISSN (Print): 1049-5258

Conference

Conference: 8th Advances in Neural Information Processing Systems, NIPS 1995
Country/Territory: United States
City: Denver
Period: 27/11/95 to 30/11/95

Bibliographical note

Publisher Copyright:
© 1995 Neural information processing systems foundation. All rights reserved.
