@iScienceLuvr
MTBBench: A Multimodal Sequential Clinical Decision-Making Benchmark in Oncology

Going beyond the standard single-turn multiple-choice benchmarks, this paper introduces a multimodal, longitudinal, agentic benchmark that simulates molecular tumor boards, where oncologists review patient cases to make clinical decisions.

Evaluates both VLMs and LLMs, as well as the use of domain-specific foundation models as tools; the tool use provides gains of up to 9.0% and 11.2% on multimodal and longitudinal reasoning, respectively.