@iScienceLuvr
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale "Autoregressive models, generating content step-by-step like reading a sentence, excel in language but struggle with images. Traditionally, they either depend on costly diffusion models or compress images into discrete, lossy tokens via vector quantization (VQ). NextStep-1 takes a different path: a 14B-parameter autoregressive model that works directly with continuous image tokens, preserving the full richness of visual data. It models sequences of discrete text tokens and continuous image tokens jointly, using a standard LM head for text and a lightweight 157M-parameter flow matching head for visuals. This unified next-token prediction framework is simple, scalable, and capable of producing stunningly detailed images"
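
For readers unfamiliar with flow matching, here is a rough, toy-scale sketch of what a flow matching objective for one continuous token could look like. Everything in it is an assumption made for illustration, not NextStep-1's actual code: the linear noise-to-data path, the convention that t=0 is noise and t=1 is data, the Euler sampler, and the hypothetical names flow_matching_target, mse, and euler_sample. In the real model the velocity would come from a learned head conditioned on the transformer's hidden state; here a closed-form stand-in is used.

```python
# Toy flow matching sketch in pure Python (illustrative assumptions only;
# not the NextStep-1 implementation).

def flow_matching_target(x1, x0, t):
    """Assumed linear path: x_t = (1 - t) * x0 + t * x1.

    x1 is a clean continuous token, x0 is a noise sample, t is in [0, 1].
    Returns the interpolated point and the regression target x1 - x0.
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def mse(pred, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

During training, a head would be fit by minimizing mse(head(x_t, t, hidden_state), velocity); at generation time, euler_sample would start from noise and follow the predicted velocity to produce the next image token, while text positions would use the ordinary softmax LM head instead.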