TY - UNPB
T1 - Learning to Scale Logits for Temperature-Conditional GFlowNets
AU - Kim, Minsu
AU - Kim, Woochang
AU - Ko, Joohwan
AU - Park, Jinkyoo
AU - Yun, Taeyoung
AU - Bengio, Yoshua
AU - Pan, Ling
AU - Zhang, Dinghuai
PY - 2023
Y1 - 2023
N2 - GFlowNets are probabilistic models that learn a stochastic policy that sequentially generates compositional structures, such as molecular graphs. They are trained with the objective of sampling such objects with probability proportional to the object’s reward. Among GFlowNets, temperature-conditional GFlowNets represent a family of policies indexed by temperature, each associated with the correspondingly tempered reward function. The major benefit of temperature-conditional GFlowNets is that exploration and exploitation can be controlled by adjusting the temperature. We propose Learning to Scale Logits for temperature-conditional GFlowNets (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed temperature-conditioning approaches introduced numerical challenges in the training of the deep network because different temperatures may give rise to very different gradient profiles and ideal scales of the policy’s logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy’s logits directly. We empirically show that our strategy dramatically improves the performance of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in terms of discovering diverse modes in multiple biochemical tasks.
AB - GFlowNets are probabilistic models that learn a stochastic policy that sequentially generates compositional structures, such as molecular graphs. They are trained with the objective of sampling such objects with probability proportional to the object’s reward. Among GFlowNets, temperature-conditional GFlowNets represent a family of policies indexed by temperature, each associated with the correspondingly tempered reward function. The major benefit of temperature-conditional GFlowNets is that exploration and exploitation can be controlled by adjusting the temperature. We propose Learning to Scale Logits for temperature-conditional GFlowNets (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed temperature-conditioning approaches introduced numerical challenges in the training of the deep network because different temperatures may give rise to very different gradient profiles and ideal scales of the policy’s logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy’s logits directly. We empirically show that our strategy dramatically improves the performance of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in terms of discovering diverse modes in multiple biochemical tasks.
UR - https://openalex.org/W4387390228
M3 - Preprint
T3 - arXiv
BT - Learning to Scale Logits for Temperature-Conditional GFlowNets
ER -