@brian_ichter
How do you get zero-shot robot control from VLMs? Introducing Prompting with Iterative Visual Optimization, or PIVOT! It casts spatial reasoning tasks as visual question answering (VQA) by visually annotating images, so the VLM gets a question it can understand and answer. Project website: https://t.co/HBO4WHPJk6 https://t.co/g01YqRzzsa
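
A rough sketch of what "iterative visual optimization" could look like in code. This is one reading of the idea, not something the post spells out. It assumes a loop that samples candidate 2D points, labels them with numbers (standing in for markers drawn on the image), asks a VLM which labels look best, and refits the sampling distribution to the picks. All function names are hypothetical, and `stub_vlm_choose` is a placeholder that ranks points by distance to a hidden target instead of calling a real model.

```python
import random
from statistics import mean, pstdev


def sample_candidates(center, spread, n, rng):
    """Draw n candidate 2D points from a Gaussian around `center`."""
    return [
        (rng.gauss(center[0], spread[0]), rng.gauss(center[1], spread[1]))
        for _ in range(n)
    ]


def annotate(candidates):
    """Stand-in for drawing numbered markers on the image: label -> point."""
    return {i + 1: point for i, point in enumerate(candidates)}


def stub_vlm_choose(labeled, k, target):
    """Hypothetical placeholder for the VLM query (assumption, not a real API).

    Returns the labels of the k points closest to a hidden `target`.
    A real system would send the annotated image and a question to a VLM.
    """
    def sq_dist(label):
        x, y = labeled[label]
        return (x - target[0]) ** 2 + (y - target[1]) ** 2

    return sorted(labeled, key=sq_dist)[:k]


def iterative_visual_optimization(
    choose, center=(320.0, 240.0), spread=(160.0, 120.0),
    n=20, k=3, iters=5, seed=0,
):
    """Sample, annotate, ask, refit; repeat for `iters` rounds."""
    rng = random.Random(seed)
    for _ in range(iters):
        labeled = annotate(sample_candidates(center, spread, n, rng))
        picked = [labeled[label] for label in choose(labeled, k)]
        xs, ys = zip(*picked)
        # Refit the sampling distribution to the chosen points.
        center = (mean(xs), mean(ys))
        # Keep a small floor so the spread never collapses to zero.
        spread = (max(pstdev(xs), 1e-3), max(pstdev(ys), 1e-3))
    return center


if __name__ == "__main__":
    hidden_target = (500.0, 100.0)  # made-up pixel location for the demo
    pick = lambda labeled, k: stub_vlm_choose(labeled, k, hidden_target)
    print(iterative_visual_optimization(pick))
```

Swapping `stub_vlm_choose` for a real VLM call would mean rendering the numbered points onto the camera image and parsing the chosen numbers out of the model's reply; both steps are left out here.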