@HuggingPapers
Introducing SpatialVID: A massive new video dataset for 3D spatial intelligence Crucial for training next-gen models, it features over 7,000 hours of diverse, in-the-wild video with dense annotations like camera poses, depth maps, and dynamic masks. https://t.co/FJXyPgOxxy