@psermanet
I’m really happy to share that we’re launching UMA. Together with @RemiCadene, @alibert_s, @therobotstudio, and an exceptional founding team, we’re building general-purpose mobile and humanoid robots. If you want to be part of this adventure, reach out at https://t.co/X7IMRZtHfB

Throughout my career, I have been obsessed with scalable learning and data acquisition methods that require little to no labels. Back in 2005 with @ylecun, we self-supervised our “deep” 2-layer network to do long-range vision from short-range stereo information, running live onboard our robot. But because our deep model was so slow, the robot crashed constantly, so I designed a decoupled fast & far architecture for robust navigation, allowing fast control to coexist with slow, long-horizon thinking, much like Systems 1 and 2 in modern humanoids.

My PhD focused on making deep learning work for computer vision, including unsupervised feature learning with @koraykv, writing and open-sourcing a C++ deep learning library with @soumithchintala, and open-sourcing one of the first deep learning vision systems.

I returned to robotics at @Google Brain and @GoogleDeepMind, where I pushed for entirely label-free methods on real robots. In 2017, @coreylynch and I got our robot to imitate human motion by jointly co-training self-supervised objectives across sim and real domains, without any labels. With @imkelvinxu and @svlevine, we showed that unsupervised visual reward learning could drive RL in the real world.

In 2020, Corey and I developed the first manipulation VLA, trained with very few language labels thanks to self-supervision on play data (playing is an efficient way to demonstrate and practice a broad set of skills, and is essential for human development).

I was never satisfied with the status quo of top-down data collection, where researchers decide on a few tasks to collect data for. Instead, I believed we should let the data speak: tasks should be automatically discovered bottom-up (scalable and general) from cheap and continuous data collection, with a sprinkle of more expensive data and labels. In 2022, I explored long-horizon reasoning for robotics using scalable automatic labeling augmentations for VQA tasks, and studied the economics of different data collection schemes. Most recently, I developed approaches to scalably discover laws of robotics from real data (images, hospital reports, sci-fi literature) in a broad, bottom-up fashion, improving robot behavior over top-down approaches like Asimov’s laws.

All these experiences shaped my vision for UMA, which I’m joining as Chief Scientist. I’m incredibly excited to put everything together, and so grateful that I get to contribute to this incredible moment in human history.

Picture: Yann supporting UMA as an advisor and investor, with the team in Paris a couple of weeks ago.