@rasbt
Pretty cool that they open-sourced the actual full-sized production model. Here’s the Grok 2.5 architecture overview next to a Qwen3 model of roughly similar size. The MoE residual is quite interesting: kind of like a shared expert. I don't think I've seen this setup before. https://t.co/3t3c2buu3A
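
For anyone wondering what "kind of like a shared expert" could look like in code, here's a rough toy sketch. It assumes the MoE residual is a dense feed-forward branch that runs on every token and gets added to the output of the routed experts; that reading is my assumption, not something taken from the released Grok code. All names (`moe_with_residual`, `shared`, `router`) and the plain ReLU feed-forward are hypothetical simplifications for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ffn(W_in, W_out, x):
    """Toy feed-forward block with ReLU (a stand-in for a real gated FFN)."""
    hidden = [max(0.0, v) for v in matvec(W_in, x)]
    return matvec(W_out, hidden)


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def moe_with_residual(x, router, experts, shared, top_k=2):
    """Routed top-k experts plus an always-on branch (assumed 'MoE residual')."""
    logits = matvec(router, x)
    # Pick the top_k experts with the highest router scores for this token.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])

    routed = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = ffn(*experts[idx], x)
        routed = [r + gate * v for r, v in zip(routed, y)]

    # The shared branch sees every token, regardless of the router's choice.
    always_on = ffn(*shared, x)
    return [r + a for r, a in zip(routed, always_on)], top


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden, n_experts = 4, 8, 4  # toy sizes, not the real model's

router = rand_matrix(n_experts, d_model, rng)
experts = [
    (rand_matrix(d_hidden, d_model, rng), rand_matrix(d_model, d_hidden, rng))
    for _ in range(n_experts)
]
shared = (rand_matrix(d_hidden, d_model, rng), rand_matrix(d_model, d_hidden, rng))
x = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]

out, chosen = moe_with_residual(x, router, experts, shared)
print("experts used:", chosen)
print("output:", out)
```

The point of the sketch: only `top_k` of the routed experts fire per token, while the `shared` branch contributes every time, which is what makes it behave like a shared expert.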