@pablovelagomez1
Some updates on the multiview VistaDream pipeline with @rerundotio! @rerundotio came in extremely handy here: being able to visualize depths at each stage of the pipeline let me track down some nasty bugs.

Last time, I was only working with a single image input. Since then, I've added VGGT as my multiview pose + depth estimator. It works REALLY well for getting camera poses, but the depths are not that great. To try and fix that, I estimated depth maps from MoGeV2 for each of the views, and scale+shift aligned them so that they match the confident regions of VGGT's depth predictions. You can see in the video just how much sharper the visualized 2D depth maps are!

The biggest issue continues to be multiview consistency 🫠 That's up next, along with actually training the Gaussian splat.

Lots of work went into actually understanding the inputs+outputs for VGGT. I had some funky bugs where the confidence values would all collapse to true.

I'm also really excited for this pipeline to use NVIDIA's Difix3D+ instead of Flux Inpainting; it seems better suited to a multiview pipeline.
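
For anyone curious what the scale+shift alignment step could look like, here is a minimal sketch, not the actual pipeline code. It assumes a plain least-squares fit in depth space over pixels where the reference confidence clears a threshold. The function name `align_scale_shift`, the argument names, and the `conf_thresh` default are all hypothetical; a real pipeline might use a robust fit or align in disparity space instead.

```python
import numpy as np


def align_scale_shift(src_depth, ref_depth, ref_conf, conf_thresh=0.5):
    """Fit scale and shift so that scale * src_depth + shift ~= ref_depth.

    Hypothetical helper: only pixels whose reference confidence exceeds
    conf_thresh (an assumed, illustrative value) are used for the fit.
    """
    # Keep confident, finite pixels only.
    mask = (ref_conf > conf_thresh) & np.isfinite(src_depth) & np.isfinite(ref_depth)
    if mask.sum() < 2:
        raise ValueError("need at least two confident pixels to fit scale and shift")

    x = src_depth[mask].astype(np.float64)
    y = ref_depth[mask].astype(np.float64)

    # Solve min over (scale, shift) of ||scale * x + shift - y||^2.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)

    # Apply the fitted transform to the full source depth map.
    aligned = scale * src_depth + shift
    return aligned, float(scale), float(shift)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.uniform(1.0, 5.0, size=(8, 8))   # stand-in for a sharp monocular depth map
    ref = 2.0 * src + 0.5                      # stand-in for the multiview depth
    conf = np.ones_like(src)
    ref[:2] = 100.0                            # corrupt a region...
    conf[:2] = 0.0                             # ...that is marked low confidence
    _, scale, shift = align_scale_shift(src, ref, conf)
    print(f"scale={scale:.3f}, shift={shift:.3f}")
```

Restricting the fit to confident pixels keeps bad reference depths from skewing the transform, while the sharper source map supplies the detail everywhere else.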