@ye_chenlu
1/5 Happy CNY🎊 Still bothered by RL off-policy instability in LLM? Introducing a new way💡Adaptive Layerwise Perturbation (ALP)💡, a simple but robust fix that outperforms GRPO/MIS/Bypass, achieves better stability (KL, entropy) and exploration! 🔗 Blog: https://t.co/0def1Nb7uI https://t.co/9epsd4xJNp