'Aha moments' during reinforcement learning
Investigating phase transitions during Group Relative Policy Optimization
We track how the Local Learning Coefficient (LLC) evolves during Group Relative Policy Optimization (GRPO) on progressively harder algebraic tasks. The LLC measures the local geometry of a model's loss landscape and can signal phase transitions. Sudden changes in reward coincide with peaks in the LLC.
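The "group relative" part of GRPO means that several completions are sampled for the same prompt, and each completion's reward is normalized against the other rewards in its group. The sketch below illustrates that normalization. It assumes scalar (for example, binary correct/incorrect) rewards and uses the population standard deviation. The function name `group_relative_advantages` and the `eps` parameter are hypothetical, and the project summary does not say which exact normalization the GRPO setup used.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Hypothetical sketch: GRPO-style advantages for one group of completions.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation. `eps` avoids division by zero when
    every completion in the group received the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over this group only
    return [(r - mean) / (std + eps) for r in rewards]


# Two correct and two incorrect answers to the same prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

# A group where every answer is wrong yields zero advantage for all samples.
print(group_relative_advantages([0.0, 0.0, 0.0]))
```

Under this normalization, a group in which every sample gets the same reward produces zero advantages and therefore no policy-gradient signal. On a hard task, learning only begins once some samples in a group start succeeding.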
Status: Won 3rd place at the Physics X AI Safety Grand Challenge by Apart Research.
Skills: Physics, Reinforcement learning
Time period: July 25–27, 2025