'Aha moments' during reinforcement learning

Investigating phase transitions during Group Relative Policy Optimization

We track how the Local Learning Coefficient (LLC), a measure that can indicate phase transitions in a model’s loss-landscape geometry, evolves during Group Relative Policy Optimization (GRPO) on progressively harder algebraic tasks. Sudden changes in reward coincide with peaks in the LLC.
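For readers unfamiliar with the "group relative" part of GRPO, here is a minimal sketch of the advantage computation. It is not the project's code: the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration. Each sampled completion for a prompt is scored against the mean and spread of its own group's rewards.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of completions sampled for one prompt.

    Hypothetical helper for illustration only. Assumes population standard
    deviation and a small eps to avoid division by zero when all rewards match.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions that beat the group mean get a positive advantage,
    # those below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one algebra prompt, reward 1.0 if correct else 0.0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal, which is one reason reward curves on harder tasks can stay flat and then move abruptly.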

Status: Won 3rd place at the Physics X AI Safety Grand Challenge by Apart Research.

Skills: Physics, Reinforcement learning

Time period: July 25–27, 2025