@__JohnNguyen__
Transfusion combines autoregressive modeling with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔

🌊OneFlow🌊 is the first non-autoregressive model to generate text and images concurrently with a single transformer, unifying Edit Flow (text) with Flow Matching (images). Performance boost while unlocking new capabilities:

🔥 Scales better than Transfusion (AR): 50% fewer FLOPs than Transfusion for similar performance
⚡ Mixed-modal training boosts both generation & understanding
😑 Mask diffusion: fixed-length + extra FLOPs for masked tokens. Transfusion: sequential generation
✅ OneFlow: variable-length via token deletion + concurrent mixed-modal generation with fewer FLOPs

🧵1/n