🐦 Twitter Post Details

@pablovelagomez1

Some updates on the multiview VistaDream pipeline with @rerundotio!

@rerundotio came in extremely useful here: being able to visualize depths at each stage of the pipeline let me track down some nasty bugs.

Last time, I was only working with a single image input. Since then, I've added VGGT as my multiview pose + depth estimator. It works REALLY well for getting camera poses, but the depths are not that great.

To try and fix that, I estimated depth maps from MoGeV2 for each of the views, and scale+shift aligned them so that they match the confident sections of VGGT's depth predictions.

You can see in the video just how much sharper the visualized 2D depth maps are!

The biggest issue continues to be multiview consistency 🫠

That's up next, along with actually training the Gaussian splat. Lots of work went into actually understanding the inputs+outputs for VGGT. I had some funky bugs where the confidence values would all collapse to true.

I'm also really excited for this pipeline to use NVIDIA's Difix3D+ instead of Flux inpainting, since it seems better suited to a multiview pipeline.
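The scale+shift alignment described in the post can be sketched as a least-squares fit. This is a minimal illustration, not the author's code: the function name `align_scale_shift`, the `conf_threshold` parameter, and the assumption that the confidence map is a float array compared against a threshold are all hypothetical. It fits `ref ≈ scale * src + shift` using only pixels where the reference (e.g. VGGT) is confident, then applies that fit to the whole source (e.g. MoGeV2) depth map.

```python
import numpy as np


def align_scale_shift(src_depth, ref_depth, ref_conf, conf_threshold=0.5):
    """Align src_depth to ref_depth with a single scale and shift.

    Hypothetical helper: solves min over (scale, shift) of
    ||scale * src + shift - ref||^2, restricted to pixels where
    ref_conf exceeds conf_threshold and both depths are finite.
    """
    # Keep only confident, finite pixels for the fit.
    mask = (
        (ref_conf > conf_threshold)
        & np.isfinite(src_depth)
        & np.isfinite(ref_depth)
    )
    if mask.sum() < 2:
        raise ValueError("need at least two confident pixels to fit scale and shift")

    x = src_depth[mask].astype(np.float64)
    y = ref_depth[mask].astype(np.float64)

    # Design matrix [x, 1] so that A @ [scale, shift] approximates y.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)

    # Apply the fitted transform to every pixel, confident or not.
    aligned = scale * src_depth + shift
    return aligned, float(scale), float(shift)
```

Calling it once per view, with that view's two depth maps and confidence map, returns the aligned depth along with the fitted `scale` and `shift`. A plain least-squares fit is sensitive to outliers, so a robust variant (for example, fitting on a trimmed subset) may be preferable on real data; checking `mask.mean()` before fitting is also a cheap way to notice a confidence mask that has collapsed to all-true or all-false.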

📊 Media Metadata

{
  "media": [
    {
      "url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1956460113000349953/media_0.mp4?",
      "media_url": "https://crmoxkoizveukayfjuyo.supabase.co/storage/v1/object/public/media/posts/1956460113000349953/media_0.mp4?",
      "type": "video",
      "filename": "media_0.mp4"
    }
  ],
  "processed_at": "2025-08-15T22:37:33.892859",
  "pipeline_version": "2.0"
}

🔧 Raw API Response

{
  "type": "tweet",
  "id": "1956460113000349953",
  "url": "https://x.com/pablovelagomez1/status/1956460113000349953",
  "twitterUrl": "https://twitter.com/pablovelagomez1/status/1956460113000349953",
  "text": "Some updates on the multiview vistadream pipeline with @rerundotio!\n\n@rerundotio came in extremely useful here, as being able to visualize depths at each stage of the pipeline allowed me to debug some nasty bugs.\n\nSince the last time, I was only working with a single image input. I've added in VGGT as my multiview pose + depth estimator. It works REALLY well for getting camera poses, but the depths are not that great.\n\nTo try and fix that, I estimated depth maps from MoGeV2 for each of the views, and scale+shift aligned them so that they would match up to the confident sections of VGGT's depth predictions.\n\nYou can see in the video just how much sharper the visualized 2d depth maps are!\n\nThe biggest issue continues to be the multiview consistency 🫠\n\nThat's up next, along with actually training the Gaussian splat. Lots of work went into actually understanding inputs+outputs for VGGT. I had some funky bugs where the confidence values would all collapse to true\n\nI'm also really excited for this pipeline to use Difix3D+ Nvidia instead of Flux Inpainting, it seems like a better suited for a multiview pipeline.",
  "source": "Twitter for iPhone",
  "retweetCount": 6,
  "replyCount": 2,
  "likeCount": 11,
  "quoteCount": 0,
  "viewCount": 4767,
  "createdAt": "Fri Aug 15 20:56:54 +0000 2025",
  "lang": "en",
  "bookmarkCount": 2,
  "isReply": false,
  "inReplyToId": null,
  "conversationId": "1956460113000349953",
  "displayTextRange": [
    0,
    281
  ],
  "inReplyToUserId": null,
  "inReplyToUsername": null,
  "author": {
    "type": "user",
    "userName": "pablovelagomez1",
    "url": "https://x.com/pablovelagomez1",
    "twitterUrl": "https://twitter.com/pablovelagomez1",
    "id": "844241772951822336",
    "name": "Pablo Vela",
    "isVerified": false,
    "isBlueVerified": true,
    "verifiedType": null,
    "profilePicture": "https://pbs.twimg.com/profile_images/1740024916156481536/D3NUhaAp_normal.jpg",
    "coverPicture": "https://pbs.twimg.com/profile_banners/844241772951822336/1747276266",
    "description": "",
    "location": "",
    "followers": 1683,
    "following": 580,
    "status": "",
    "canDm": true,
    "canMediaTag": false,
    "createdAt": "Tue Mar 21 17:38:18 +0000 2017",
    "entities": {
      "description": {
        "urls": []
      },
      "url": {}
    },
    "fastFollowersCount": 0,
    "favouritesCount": 1837,
    "hasCustomTimelines": true,
    "isTranslator": false,
    "mediaCount": 61,
    "statusesCount": 1097,
    "withheldInCountries": [],
    "affiliatesHighlightedLabel": {},
    "possiblySensitive": false,
    "pinnedTweetIds": [
      "1943025322267639809"
    ],
    "profile_bio": {
      "description": "Computer Vision engineer with a focus on 3D",
      "entities": {
        "description": {},
        "url": {
          "urls": [
            {
              "display_url": "pablovela.dev",
              "expanded_url": "https://pablovela.dev/",
              "indices": [
                0,
                23
              ],
              "url": "https://t.co/SdGe7Welay"
            }
          ]
        }
      }
    },
    "isAutomated": false,
    "automatedBy": null
  },
  "extendedEntities": {
    "media": [
      {
        "additional_media_info": {
          "monetizable": false
        },
        "display_url": "pic.twitter.com/xQiWJWrVze",
        "expanded_url": "https://twitter.com/pablovelagomez1/status/1956460113000349953/video/1",
        "ext_media_availability": {
          "status": "Available"
        },
        "id_str": "1956460059279650816",
        "indices": [
          282,
          305
        ],
        "media_key": "13_1956460059279650816",
        "media_results": {
          "id": "QXBpTWVkaWFSZXN1bHRzOgwABAoAARsmvg7imnAAAAA=",
          "result": {
            "__typename": "ApiMedia",
            "id": "QXBpTWVkaWE6DAAECgABGya+DuKacAAAAA==",
            "media_key": "13_1956460059279650816"
          }
        },
        "media_url_https": "https://pbs.twimg.com/amplify_video_thumb/1956460059279650816/img/jz52qHJlJSVSY2-Q.jpg",
        "original_info": {
          "focus_rects": [],
          "height": 1080,
          "width": 1688
        },
        "sizes": {
          "large": {
            "h": 1080,
            "w": 1688
          }
        },
        "type": "video",
        "url": "https://t.co/xQiWJWrVze",
        "video_info": {
          "aspect_ratio": [
            211,
            135
          ],
          "duration_millis": 14550,
          "variants": [
            {
              "content_type": "application/x-mpegURL",
              "url": "https://video-s.twimg.com/amplify_video/1956460059279650816/pl/d0TPdbGV1lHlBdMm.m3u8?tag=14"
            },
            {
              "bitrate": 288000,
              "content_type": "video/mp4",
              "url": "https://video-s.twimg.com/amplify_video/1956460059279650816/vid/avc1/422x270/IUTeEOWNzhuM6tm_.mp4?tag=14"
            },
            {
              "bitrate": 832000,
              "content_type": "video/mp4",
              "url": "https://video-s.twimg.com/amplify_video/1956460059279650816/vid/avc1/562x360/Ch8T27X8gfXvjCd6.mp4?tag=14"
            },
            {
              "bitrate": 2176000,
              "content_type": "video/mp4",
              "url": "https://video-s.twimg.com/amplify_video/1956460059279650816/vid/avc1/1124x720/ug0XajuoekOmdKf5.mp4?tag=14"
            }
          ]
        }
      }
    ]
  },
  "card": null,
  "place": {},
  "entities": {
    "user_mentions": [
      {
        "id_str": "1476271442455142403",
        "indices": [
          55,
          66
        ],
        "name": "Rerun",
        "screen_name": "rerundotio"
      },
      {
        "id_str": "1476271442455142403",
        "indices": [
          69,
          80
        ],
        "name": "Rerun",
        "screen_name": "rerundotio"
      }
    ]
  },
  "quoted_tweet": {
    "type": "tweet",
    "id": "1953934728534339960",
    "url": "https://x.com/pablovelagomez1/status/1953934728534339960",
    "twitterUrl": "https://twitter.com/pablovelagomez1/status/1953934728534339960",
    "text": "Some more updates on the image -> splat pipeline visualized with @rerundotio 🖼️➡️✨ I did a lot more with blueprints to try to make things easier to understand and visually pleasing 🎨\n\nI also wanted to get things working really well with the single image version before moving on to multiview, so I was meticulous about logging each part of the pipeline. I also made it easy to run each part of the pipeline described below.\n\n1. No outpainting, only splat training (no-outpaint) 🚫\n2. Outpainting + Splat Training (outpaint) 🖌️\n3. Outpainting + Splat Training + Inpainting (coarse) 🩹\n\nAll using a custom flux model from the great Asuka paper (https://github. com/Yikai-Wang/asuka-misato)\n\nWith a 1024 max resolution on my 5090, I'm getting around 4 seconds for the first stage, 30 seconds for the second stage, and 2 minutes for the third stage ⏱️\n\nI'm not yet doing the \"fine\" stage, which incorporates multiview constrained diffusion, and I'm not sure if I'll work on that before moving on to true sparse multiview \n\nFinally, I've been using AI to write code a lot more, and so I've been very methodical about using beartype + jaxtyping 🐻. There's a performance hit, BUT only when running in the dev environment (pixi shell -e dev), and in my opinion, it's totally worth it. When developing, it works as a two-fold process of making the code self-documenting AND giving type guarantees at runtime ✅\n\nI promise the next step is getting this working with VGGT for sparse views, I just really wanted the single image version to look great! 🎯",
    "source": "Twitter for iPhone",
    "retweetCount": 15,
    "replyCount": 6,
    "likeCount": 151,
    "quoteCount": 1,
    "viewCount": 15879,
    "createdAt": "Fri Aug 08 21:41:56 +0000 2025",
    "lang": "en",
    "bookmarkCount": 83,
    "isReply": false,
    "inReplyToId": null,
    "conversationId": "1953934728534339960",
    "displayTextRange": [
      0,
      282
    ],
    "inReplyToUserId": null,
    "inReplyToUsername": null,
    "author": {
      "type": "user",
      "userName": "pablovelagomez1",
      "url": "https://x.com/pablovelagomez1",
      "twitterUrl": "https://twitter.com/pablovelagomez1",
      "id": "844241772951822336",
      "name": "Pablo Vela",
      "isVerified": false,
      "isBlueVerified": true,
      "verifiedType": null,
      "profilePicture": "https://pbs.twimg.com/profile_images/1740024916156481536/D3NUhaAp_normal.jpg",
      "coverPicture": "https://pbs.twimg.com/profile_banners/844241772951822336/1747276266",
      "description": "",
      "location": "",
      "followers": 1683,
      "following": 580,
      "status": "",
      "canDm": true,
      "canMediaTag": false,
      "createdAt": "Tue Mar 21 17:38:18 +0000 2017",
      "entities": {
        "description": {
          "urls": []
        },
        "url": {}
      },
      "fastFollowersCount": 0,
      "favouritesCount": 1837,
      "hasCustomTimelines": true,
      "isTranslator": false,
      "mediaCount": 61,
      "statusesCount": 1097,
      "withheldInCountries": [],
      "affiliatesHighlightedLabel": {},
      "possiblySensitive": false,
      "pinnedTweetIds": [
        "1943025322267639809"
      ],
      "profile_bio": {
        "description": "Computer Vision engineer with a focus on 3D",
        "entities": {
          "description": {},
          "url": {
            "urls": [
              {
                "display_url": "pablovela.dev",
                "expanded_url": "https://pablovela.dev/",
                "indices": [
                  0,
                  23
                ],
                "url": "https://t.co/SdGe7Welay"
              }
            ]
          }
        }
      },
      "isAutomated": false,
      "automatedBy": null
    },
    "extendedEntities": {
      "media": [
        {
          "additional_media_info": {
            "monetizable": false
          },
          "display_url": "pic.twitter.com/MU1tKqbluu",
          "expanded_url": "https://twitter.com/pablovelagomez1/status/1953934728534339960/video/1",
          "ext_media_availability": {
            "status": "Available"
          },
          "id_str": "1953929026331901952",
          "indices": [
            283,
            306
          ],
          "media_key": "13_1953929026331901952",
          "media_results": {
            "id": "QXBpTWVkaWFSZXN1bHRzOgwABAoAARsdwBja1oAAAAA=",
            "result": {
              "__typename": "ApiMedia",
              "id": "QXBpTWVkaWE6DAAECgABGx3AGNrWgAAAAA==",
              "media_key": "13_1953929026331901952"
            }
          },
          "media_url_https": "https://pbs.twimg.com/amplify_video_thumb/1953929026331901952/img/WJm1mKbtZ1L-ZfVS.jpg",
          "original_info": {
            "focus_rects": [],
            "height": 1080,
            "width": 1928
          },
          "sizes": {
            "large": {
              "h": 1080,
              "w": 1928
            }
          },
          "type": "video",
          "url": "https://t.co/MU1tKqbluu",
          "video_info": {
            "aspect_ratio": [
              241,
              135
            ],
            "duration_millis": 14600,
            "variants": [
              {
                "content_type": "application/x-mpegURL",
                "url": "https://video-s.twimg.com/amplify_video/1953929026331901952/pl/CbESJ3jqzaB_Jcik.m3u8?tag=21"
              },
              {
                "bitrate": 256000,
                "content_type": "video/mp4",
                "url": "https://video-s.twimg.com/amplify_video/1953929026331901952/vid/avc1/482x270/a47GizqMZIsXr9EM.mp4?tag=21"
              },
              {
                "bitrate": 832000,
                "content_type": "video/mp4",
                "url": "https://video-s.twimg.com/amplify_video/1953929026331901952/vid/avc1/642x360/lRSlyJxYk-6Fv_z9.mp4?tag=21"
              },
              {
                "bitrate": 2176000,
                "content_type": "video/mp4",
                "url": "https://video-s.twimg.com/amplify_video/1953929026331901952/vid/avc1/1284x720/9VOwcWDv1oZfwT8r.mp4?tag=21"
              },
              {
                "bitrate": 10368000,
                "content_type": "video/mp4",
                "url": "https://video-s.twimg.com/amplify_video/1953929026331901952/vid/avc1/1928x1080/UcuIxyrQQ1yteg0Z.mp4?tag=21"
              }
            ]
          }
        }
      ]
    },
    "card": null,
    "place": {},
    "entities": {
      "user_mentions": [
        {
          "id_str": "1476271442455142403",
          "indices": [
            65,
            76
          ],
          "name": "Rerun",
          "screen_name": "rerundotio"
        }
      ]
    },
    "quoted_tweet": {
      "type": "tweet",
      "id": "1951367496755188212",
      "url": "",
      "twitterUrl": "",
      "text": "",
      "source": "Twitter for iPhone",
      "retweetCount": 0,
      "replyCount": 0,
      "likeCount": 0,
      "quoteCount": 0,
      "viewCount": 0,
      "createdAt": "",
      "lang": "",
      "bookmarkCount": 0,
      "isReply": false,
      "inReplyToId": null,
      "conversationId": "",
      "displayTextRange": [],
      "inReplyToUserId": null,
      "inReplyToUsername": null,
      "author": {},
      "extendedEntities": {},
      "card": null,
      "place": {},
      "entities": {},
      "quoted_tweet": null,
      "retweeted_tweet": null,
      "article": null
    },
    "retweeted_tweet": null,
    "article": null
  },
  "retweeted_tweet": null,
  "article": null
}