@maximelabonne
Open recipe to turn Qwen3 into a diffusion LLM

> Swap the causal mask for bidirectional attention
> Source model matters a lot for performance
> Block diffusion (BD3LM) >> masked diffusion (MDLM)
> Light SFT with masking

Great work from @asapzzhou with his dLLM library! https://t.co/ec2tSXAUA1
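
To make the mask swap and the masked SFT step concrete, here is a rough pure-Python sketch. It is not the dLLM library's API or its actual training code. Every name in it (`MASK_ID`, `causal_mask`, `bidirectional_mask`, `block_mask`, `mask_response`) is hypothetical. The block mask is a simplified reading of block diffusion: full attention inside a block, causal attention across blocks.

```python
import random

MASK_ID = -1  # hypothetical mask-token id, chosen only for this illustration


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def block_mask(n, block_size):
    """Simplified block pattern: bidirectional within a block, causal across blocks."""
    return [[j // block_size <= i // block_size for j in range(n)] for i in range(n)]


def mask_response(tokens, prompt_len, ratio, rng):
    """Replace each response token with MASK_ID with probability `ratio`.

    Returns the noisy sequence and the indices that were masked; in this
    sketch the training loss would be computed only at those indices.
    """
    noisy = list(tokens)
    masked_positions = []
    for i in range(prompt_len, len(tokens)):
        if rng.random() < ratio:
            noisy[i] = MASK_ID
            masked_positions.append(i)
    return noisy, masked_positions


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [11, 12, 13, 21, 22, 23, 24]  # 3 prompt tokens, 4 response tokens
    noisy, positions = mask_response(tokens, prompt_len=3, ratio=0.5, rng=rng)
    print(noisy, positions)
    for row in block_mask(4, block_size=2):
        print(["x" if allowed else "." for allowed in row])
```

With `block_size` equal to the sequence length the block mask reduces to the fully bidirectional case, and with `block_size=1` it reduces to the causal mask, which is one way to see block diffusion as sitting between autoregressive and fully masked diffusion.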