@_philschmid
Tau Bench got an update! Tau Bench is one of the most widely adopted agentic benchmarks. They've now added "Banking", a fintech-inspired customer support domain built around a realistic knowledge base of 698 documents across 21 product categories. Tasks require agents to search this corpus, reason over what they find, and execute multi-step tool calls. An example request: "There's this transaction I want to dispute. I also want to file a credit limit increase request." The best model achieves a 25% task success rate and less than ~10% on pass^4.
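For readers new to pass^k: it measures how often an agent solves the same task in all k of k independent attempts, so it rewards consistency rather than a single lucky run. Here is a minimal sketch of how it can be estimated, assuming each task was run n times and using the per-task estimator C(c, k) / C(n, k), where c is the number of successful trials. The function name and the sample numbers are hypothetical and are not taken from the Banking results.

```python
from math import comb


def pass_hat_k(successes_per_task, n, k):
    """Estimate pass^k: the chance that k trials of a task ALL succeed.

    successes_per_task: successful trial count c for each task (illustrative input)
    n: trials run per task
    k: number of trials that must all succeed (1 <= k <= n)
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if not successes_per_task:
        raise ValueError("need at least one task")
    # Per task: fraction of size-k subsets of the n trials that are all successes.
    # math.comb(c, k) is 0 when c < k, so such tasks contribute nothing.
    per_task = [comb(c, k) / comb(n, k) for c in successes_per_task]
    return sum(per_task) / len(per_task)


# Made-up data: 4 tasks, 4 trials each, with 4, 2, 0 and 1 successes.
example = [4, 2, 0, 1]
print(pass_hat_k(example, n=4, k=1))  # plain average success rate
print(pass_hat_k(example, n=4, k=4))  # only the task solved 4/4 times counts
```

With k = 1 this reduces to the average success rate, and as k grows only tasks the agent solves nearly every time still contribute, which is why pass^4 sits well below the single-attempt number.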