@codezakh
Can we automate the process of generating data to improve a model on diverse, open-ended tasks, based on automatically discovered model weaknesses? Introducing DataEnvGym - a testbed for data-generation agents + teaching environments.

Environment trains/evaluates student model ➡️ Environment discovers skills/errors and gives feedback to agent ➡️ Agent generates updated training data to address weaknesses ➡️ Iterate

Key Idea -- Frame data generation + model improvement as an RL-style sequential decision-making task: states encode student errors, the policy decides actions encoding which data to generate, and the reward is the performance of the student model.

We provide several modular environments + teaching agents that can improve models on VQA/math/programming, and a leaderboard benchmarking these agents. We welcome more entries to our leaderboard!

Thread 🧵👇 (1/9)
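The RL-style loop above can be sketched as follows. This is a minimal illustrative toy, not the actual DataEnvGym API: all class and method names (`TeachingEnv`, `DataAgent`, `Student`) and the skill/accuracy model are hypothetical assumptions made for clarity.

```python
# Toy sketch of the environment/agent loop described in the post.
# NOTE: names and mechanics here are illustrative, not DataEnvGym's real API.
from dataclasses import dataclass


@dataclass
class State:
    """State: per-skill error rates of the student model."""
    error_rates: dict


class Student:
    """Toy student whose per-skill accuracy improves with training examples."""
    def __init__(self, skills):
        self.acc = {s: 0.2 for s in skills}

    def train(self, examples):
        for skill in examples:
            self.acc[skill] = min(1.0, round(self.acc[skill] + 0.1, 10))

    def evaluate(self):
        return dict(self.acc)


class TeachingEnv:
    """Environment: trains/evaluates the student, exposes errors as state."""
    def __init__(self, skills):
        self.student = Student(skills)

    def reset(self):
        acc = self.student.evaluate()
        return State({s: 1 - a for s, a in acc.items()})

    def step(self, action):
        # action: dict mapping skill -> number of training examples to generate
        examples = [s for s, n in action.items() for _ in range(n)]
        self.student.train(examples)
        acc = self.student.evaluate()
        reward = sum(acc.values()) / len(acc)  # reward = mean student accuracy
        return State({s: 1 - a for s, a in acc.items()}), reward


class DataAgent:
    """Policy: spend the whole data budget on the weakest skill."""
    def act(self, state, budget=4):
        worst = max(state.error_rates, key=state.error_rates.get)
        return {s: (budget if s == worst else 0) for s in state.error_rates}


env = TeachingEnv(["algebra", "geometry"])
state = env.reset()
agent = DataAgent()
for t in range(5):
    action = agent.act(state)     # decide which data to generate
    state, reward = env.step(action)  # retrain + re-evaluate student
```

The loop iterates exactly as the thread describes: evaluate, surface weaknesses as state, generate targeted data, retrain, repeat.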