@paperswithdata
👀 Perception Test: a benchmark for evaluating the perception and reasoning skills of multimodal models. It uses real-world videos to define tasks that require understanding of memory, patterns, physics, and semantics across the visual, audio, and text modalities. https://t.co/rZ9VFkjquM https://t.co/5yl802Fkuy