
Learning to Scale Logits for Temperature-Conditional GFlowNets

  • Minsu Kim*
  • Woochang Kim
  • Joohwan Ko
  • Jinkyoo Park
  • Taeyoung Yun
  • Yoshua Bengio
  • Ling Pan
  • Dinghuai Zhang

*Corresponding author for this work

Research output: Working paper (Preprint)

Abstract

GFlowNets are probabilistic models that learn a stochastic policy for sequentially generating compositional structures, such as molecular graphs. They are trained to sample such objects with probability proportional to the object’s reward. Temperature-conditional GFlowNets represent a family of policies indexed by temperature, each associated with the correspondingly tempered reward function. Their major benefit is that the balance between exploration and exploitation can be controlled by adjusting the temperature. We propose Learning to Scale Logits for temperature-conditional GFlowNets (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed temperature-conditioning approaches introduced numerical challenges in training the deep network, because different temperatures may give rise to very different gradient profiles and ideal scales of the policy’s logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy’s logits directly. We empirically show that our strategy dramatically improves the performance of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in discovering diverse modes in multiple biochemical tasks.
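
The two ideas in the abstract, a tempered reward target and a temperature-dependent scaling of the policy's logits, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not the authors' implementation: `beta` is taken to be an inverse temperature that exponentiates the reward, `logit_scale` is a hypothetical softplus stand-in for the learned function of the temperature, and multiplying (rather than dividing) the logits by that scale is an arbitrary convention.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tempered_target(rewards, beta):
    """Target distribution proportional to R(x) ** beta (assumed tempering)."""
    weights = [r ** beta for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def logit_scale(beta, w=1.0, b=0.0):
    """Hypothetical stand-in for a learned scalar function of the temperature.

    A softplus of an affine map keeps the scale positive; in a real model
    w and b (or a small network) would be trained with the policy.
    """
    return math.log1p(math.exp(w * beta + b))


def scaled_policy(logits, beta, w=1.0, b=0.0):
    """Policy over actions: softmax of temperature-scaled logits."""
    scale = logit_scale(beta, w, b)
    return softmax([scale * v for v in logits])


if __name__ == "__main__":
    rewards = [1.0, 2.0, 4.0]
    logits = [0.0, 1.0, 2.0]
    for beta in (0.5, 1.0, 4.0):
        print(beta, tempered_target(rewards, beta), scaled_policy(logits, beta))
```

In this toy setting a larger `beta` concentrates both the tempered target and the scaled policy on the highest-reward option, while the unscaled logits stay the same across temperatures; only the scalar changes.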
Original language: English
Publication status: Published - 2023
Externally published: Yes

Publication series

Name: arXiv
