@iScienceLuvr
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

"We present BIFROST-1, a unified framework that bridges pretrained multimodal LLMs (MLLMs) and diffusion models using patch-level CLIP image embeddings as latent variables, which are natively aligned with the MLLM’s CLIP visual encoder."
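The following sketch shows the shape bookkeeping behind "patch-level" CLIP latents. It is not Bifrost-1's code. It assumes a ViT-style CLIP encoder that splits a square image into non-overlapping patches and emits one embedding per patch in row-major order. For example, a 224px input with 14px patches (ViT-L/14-style) gives 16 x 16 = 256 patch embeddings. The helper names `patch_grid` and `to_spatial_latent` are hypothetical. They illustrate how a flat patch sequence can be laid back out as a spatial grid, which is a natural form for a diffusion model's conditioning input.

```python
from typing import List, Tuple

Vector = List[float]


def patch_grid(image_size: int, patch_size: int) -> Tuple[int, int]:
    """Hypothetical helper: rows and columns of non-overlapping patches
    for a square image, as in a ViT-style patchify step."""
    if patch_size <= 0 or image_size % patch_size != 0:
        raise ValueError("image_size must be a positive multiple of patch_size")
    side = image_size // patch_size
    return side, side


def to_spatial_latent(
    patch_embeddings: List[Vector], rows: int, cols: int
) -> List[List[Vector]]:
    """Hypothetical helper: reshape a flat, row-major sequence of patch
    embeddings into a rows x cols grid of embedding vectors."""
    if len(patch_embeddings) != rows * cols:
        raise ValueError("expected exactly rows * cols patch embeddings")
    return [patch_embeddings[r * cols:(r + 1) * cols] for r in range(rows)]


# Assumed setup: 224px image, 14px patches, toy 1-dim "embeddings".
rows, cols = patch_grid(224, 14)
flat = [[float(i)] for i in range(rows * cols)]
latent = to_spatial_latent(flat, rows, cols)
print(rows, cols, len(flat))  # 16 16 256
```

Keeping one embedding per patch preserves spatial layout. This is the property that a patch-level latent offers over a single pooled image embedding.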