
Large Language Model Driven Reinforcement Learning for Portfolio Allocation

  • Meizi LI

Student thesis: Master's thesis

Abstract

This thesis studies when large language models (LLMs) add incremental value to reinforcement learning (RL) for portfolio management beyond strong price-based baselines. Portfolio management matters because investors must allocate capital under transaction costs, drawdown constraints, and regime shifts. Classical models provide static snapshots of this tradeoff, while deep-RL frameworks learn dynamic policies mostly from prices and technical indicators. In parallel, financial LLMs summarise earnings calls, regulatory filings, and news, yet it is unclear how such text signals affect an RL allocator in a portfolio setting.

We instantiate this question as a weekly allocation task over a 20-stock U.S. equity universe from mid-April 2019 to early September 2025, with proportional trading costs and a risk-aware reward. A multistage LLM analyst stack converts earnings calls, SEC filings, and news into structured signals and risk scores, which are fed to a Proximal Policy Optimisation (PPO) agent either as additional observations or as exogenous reward penalties. The empirical programme asks: (i) how agents with LLM-derived observations compare with a tuned price-only PPO agent and with equal-weight and minimum-variance baselines; (ii) whether LLM-based risk shaping improves downside protection; and (iii) whether curriculum-style feature introduction outperforms all-in-one multimodal training.
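To make the setting concrete, here is a minimal sketch of one weekly rebalancing step. It assumes long-only weights that sum to one, a proportional cost charged on the total absolute weight change, a log-return reward, and an LLM risk penalty formed as a coefficient times the portfolio-weighted average risk score. The function `rebalance_step` and the parameters `cost_rate`, `risk_scores`, and `risk_coef` are hypothetical; the thesis's actual reward (which may include drawdown or volatility terms) and penalty definitions may differ.

```python
import math
from typing import Optional, Sequence, Tuple


def rebalance_step(
    prev_weights: Sequence[float],
    target_weights: Sequence[float],
    asset_returns: Sequence[float],
    cost_rate: float = 0.001,
    risk_scores: Optional[Sequence[float]] = None,
    risk_coef: float = 0.0,
) -> Tuple[float, float]:
    """Simulate one weekly rebalance (hypothetical sketch).

    Returns (net_log_return, shaped_reward). Weights are assumed long-only
    and to sum to 1; asset_returns are simple returns over the holding week.
    """
    # Traded fraction of portfolio value: total absolute weight change.
    traded = sum(abs(t - p) for t, p in zip(target_weights, prev_weights))
    # Proportional transaction cost, deducted from portfolio value.
    cost = cost_rate * traded
    # Gross growth factor of the new allocation over the week.
    gross = sum(w * (1.0 + r) for w, r in zip(target_weights, asset_returns))
    net_log_return = math.log(gross * (1.0 - cost))

    # Optional exogenous penalty from LLM risk scores (assumed form):
    # risk_coef times the portfolio-weighted average risk score.
    penalty = 0.0
    if risk_scores is not None:
        penalty = risk_coef * sum(
            w * s for w, s in zip(target_weights, risk_scores)
        )
    return net_log_return, net_log_return - penalty


# Example: shift 10% of value from asset 2 to asset 1, with risk scores.
net, shaped = rebalance_step(
    prev_weights=[0.5, 0.5],
    target_weights=[0.6, 0.4],
    asset_returns=[0.02, -0.01],
    cost_rate=0.001,
    risk_scores=[0.2, 0.8],
    risk_coef=0.5,
)
```

In the example the penalty is 0.5 × (0.6 · 0.2 + 0.4 · 0.8) = 0.22. Passing `risk_scores=None` (or `risk_coef=0.0`) yields the unshaped reward, so the same step function can serve both sides of a with/without-shaping comparison.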

Experiments fix a reproducible environment and systematically vary observation sets, risk coefficients, and training schedules, logging turnover, concentration, and portfolio-weight heatmaps. First, a price-only PPO agent consistently matches or exceeds the classical baselines in net asset value while maintaining comparable risk profiles. Second, LLM-enhanced agents change trading style, inducing sector tilts and higher turnover, but do not robustly outperform the price-only backbone, a result consistent with semi-strong market efficiency in this finite, non-stationary sample. Third, LLM-based risk shaping and curricula offer at best modest, fragile gains relative to a simpler all-in-one backbone.
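The turnover and concentration diagnostics mentioned above can be computed from a sequence of weekly weight vectors. A minimal sketch follows, assuming two-way turnover (total absolute weight change; some studies report half this value) and the Herfindahl-Hirschman index (HHI) as the concentration measure. The abstract does not state the exact definitions used, and all function names here are hypothetical.

```python
from typing import Dict, List, Sequence


def turnover(prev_weights: Sequence[float], new_weights: Sequence[float]) -> float:
    """Two-way turnover: total absolute change in portfolio weights."""
    return sum(abs(n - p) for n, p in zip(new_weights, prev_weights))


def herfindahl(weights: Sequence[float]) -> float:
    """Herfindahl-Hirschman concentration index: sum of squared weights."""
    return sum(w * w for w in weights)


def effective_holdings(weights: Sequence[float]) -> float:
    """Effective number of positions, 1 / HHI (equals N for equal weights)."""
    return 1.0 / herfindahl(weights)


def summarize(weight_history: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Average weekly turnover and average HHI over a path of weight vectors."""
    turns: List[float] = [
        turnover(prev, cur)
        for prev, cur in zip(weight_history, weight_history[1:])
    ]
    hhis = [herfindahl(w) for w in weight_history]
    return {
        "mean_turnover": sum(turns) / len(turns) if turns else 0.0,
        "mean_hhi": sum(hhis) / len(hhis),
    }


# Equal weights over a 20-stock universe: HHI = 0.05, 20 effective holdings.
equal_weight = [1.0 / 20] * 20
```

As a reference point, an equal-weight portfolio over 20 stocks has an HHI of 0.05 and 20 effective holdings, so sustained HHI values well above 0.05 signal the kind of concentrated tilts described above.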

The thesis contributes an end-to-end LLM-driven, risk-aware portfolio framework, an ablation-centred evaluation of multi-source integration, and design lessons for LLM–RL systems in similar equity-portfolio settings.

Date of Award: 2026
Original language: English
Awarding Institution: The Hong Kong University of Science and Technology
Supervisor: Fangzhen LIN
