@RisingSayak
The @bfl_ml team released Klein KV and showed how KV-caching can be incorporated into a flow pipeline 🤯 The idea is simple and elegant: in the first denoising step, the reference image tokens are included in the full DiT forward pass, and their per-layer KVs are computed and cached. In subsequent steps, KVs are computed only for the noisy latents, while the cached reference KVs are injected when computing attention. As a result, it delivers up to 2.5x speedups over Klein for multi-reference editing tasks.

I learned about it from this PR: https://t.co/4jbAboaStf The PR is poetry in motion and comes from the BFL team itself! Kudos to them for always being the best at designing codebases for flow and diffusion models. The best!

Check out the model here: https://t.co/f3NOHkg2HQ