@omarsar0
AI agents suck at long-horizon tasks. AgentGym-RL aims to train LLM agents with strong long-horizon capabilities. The paper finds that post-training and test-time compute scale better than model size alone for agentic tasks, leading to 7B models that beat much larger systems. My notes: https://t.co/rfqmHaGWUr