
Towards efficient multi-objective alignment of large language models

  • Rui YANG

Student thesis: Master's thesis

Abstract

This study addresses the challenge of multi-objective alignment of foundation models, particularly Large Language Models (LLMs), with human values and preferences—a crucial step towards developing helpful and harmless AI systems. Fine-tuning large foundation models using reinforcement learning (RL) is often costly and unstable. Additionally, the multi-dimensionality, heterogeneity, and conflicting nature of human preferences further complicate the alignment process. In this thesis, we introduce Rewards-in-Context (RiC), a novel approach that conditions the response of a foundation model on multiple rewards within its prompt context and employs supervised fine-tuning for alignment. RiC is characterized by its simplicity and adaptability, requiring only the supervised fine-tuning of a single foundation model and allowing for dynamic adjustment of user preferences during inference. Inspired by the analytical solution of an abstracted convex optimization problem, our dynamic inference-time adjustment method approximates the Pareto-optimal solution for multiple objectives. Empirical evidence demonstrates the efficacy of our method in aligning LLMs to accommodate diverse rewards with only approximately 10% of the GPU hours required by multi-objective RL baselines.
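The core mechanism the abstract describes—embedding per-objective reward scores in the prompt during supervised fine-tuning, then choosing target reward values at inference to reflect a user's preference weights—can be sketched as follows. This is a minimal illustration, not the thesis's exact implementation: the reward-tag format and the simple linear preference-to-reward mapping are assumptions standing in for the convex-optimization-derived adjustment the abstract refers to.

```python
def ric_prompt(prompt: str, rewards: dict[str, float]) -> str:
    """Embed per-objective reward scores in the prompt context so a
    supervised fine-tuned model learns to condition its response on them.
    The <name:score> tag format is an illustrative assumption."""
    tags = " ".join(f"<{name}:{score:.2f}>" for name, score in rewards.items())
    return f"{tags} {prompt}"

def desired_rewards(weights: dict[str, float],
                    r_min: float = 0.0, r_max: float = 1.0) -> dict[str, float]:
    """At inference time, map user preference weights to target reward values.
    A plain linear interpolation is used here as a placeholder for the
    convex-optimization-based adjustment described in the abstract."""
    total = sum(weights.values())
    return {k: r_min + (w / total) * (r_max - r_min) for k, w in weights.items()}

# Training: condition on the rewards actually scored for each response.
train_prompt = ric_prompt("Summarize the article.",
                          {"helpful": 0.90, "harmless": 0.40})

# Inference: a user who weights harmlessness 3:1 over helpfulness.
targets = desired_rewards({"helpful": 1.0, "harmless": 3.0})
infer_prompt = ric_prompt("Summarize the article.", targets)
```

Because the preference weights only affect the reward tags prepended to the prompt, a single fine-tuned model can serve any preference trade-off at inference time, which is the source of the efficiency gain over training one RL policy per objective combination.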
Date of Award: 2024
Original language: English
Awarding Institution:
  • The Hong Kong University of Science and Technology
Supervisor: Junxian HE
