@eliebakouch
wow, this looks like a very solid open model from Xiaomi, competing with K2/DSV3.2 on benchmarks with fewer parameters. it's MIT licensed, with a very good tech report and both base and thinking versions available. it uses the same sliding window attention arch as gpt-oss (attention sinks, SWA window size = 128) but with far fewer global attention layers, plus multi-token prediction for speculative decoding support and a new post-training distillation method. really seems like a beast at inference, with day 0 @sgl_project support! very exciting
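for anyone wondering what "sink + SWA" means mechanically, here's a minimal single-head sketch in plain Python. it is an illustration, not Xiaomi's or gpt-oss's actual code. assumptions: the sink is modeled as one learned scalar logit per head that joins the softmax denominator but has no value vector, so a head can put probability mass "nowhere". each query only sees the last `window` positions (128 in the model; tiny here). the function name `swa_with_sink` is hypothetical.

```python
import math

def swa_with_sink(q, k, v, window, sink_logit):
    """Causal sliding-window attention for one head, with a sink logit.

    q, k: lists of T query/key vectors of dimension d.
    v: list of T value vectors (any dimension).
    window: how many most recent positions (including the current one) a query sees.
    sink_logit: scalar that enters the softmax denominator only (no value attached).
    """
    T, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for t in range(T):
        start = max(0, t - window + 1)          # oldest visible position
        positions = range(start, t + 1)
        scores = [scale * sum(a * b for a, b in zip(q[t], k[s])) for s in positions]
        m = max(max(scores), sink_logit)        # subtract max for numerical stability
        exps = [math.exp(x - m) for x in scores]
        denom = sum(exps) + math.exp(sink_logit - m)  # the sink absorbs some mass
        weights = [e / denom for e in exps]     # these sum to < 1 because of the sink
        out.append([sum(w * v[s][j] for w, s in zip(weights, positions))
                    for j in range(len(v[0]))])
    return out
```

with a window of 2, position 2 only mixes values 1 and 2 and never sees position 0. that bounded lookback is why SWA layers keep a fixed-size KV cache, and why having far fewer global (full attention) layers helps so much at inference.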