@eshedob
Turns out audio can provide supervision for visual odometry. Meet XVO, a generalized visual odometry model trained from YouTube videos to estimate motion with *real-world scale* (no camera parameters!) Project Page: https://t.co/qGn0rG0Tt2 #ICCV2023 @ICCVConference https://t.co/5z56blePim