@hendrydong
💥 Thrilled to share our new work, Reinforce-Ada, which fixes signal collapse in GRPO. 🥳 No more blind oversampling or dead updates. Just sharper gradients, faster convergence, and stronger models. ⚙️ One-line drop-in. Real gains. https://t.co/kJTeVek1S3 https://t.co/7qLywG2KWR https://t.co/4BGowcLpl5