@sh_reya
Databases are arguably the most commonly used enterprise tool, and enterprises typically have many of them. Yet no popular AI agent benchmark actually tests how well agents can query, join, and make sense of data across different databases! So we built DAB (Data Agent Benchmark): 54 queries, 12 datasets, 9 domains, and 4 database management systems, grounded in a formative study of real enterprise data agent workloads. Even the best frontier model gets only 38% pass@1 (across 50 trials). Lots of room for improvement!