@KevinQHLin
Thanks @_akhaliq for sharing our work! Can frontier Multimodal Agents play games as well as humans?
🤩 We are excited to introduce 🎮 GameWorld: towards standardized and verifiable evaluation for multimodal game agents.
🕹️ 34 browser games
📌 170 tasks
🤖 18 multimodal agent baselines, covering
1. Computer-use (CUA) agents 👉 raw keyboard + mouse actions
2. Generalist multimodal agents 👉 semantic action parsing
GameWorld shows that even SOTA agents still perform far below novice human players.
📹 Watch our live runs: https://t.co/wrhKJD9JVx
🌐 Project page: https://t.co/J906LQ6Sfj
💻 GitHub: https://t.co/W1vL99MDg5
Work done with @OuyyyangMingyu, @who_s_yuan, Hwee Tou Ng, and @MikeShou1