@rasbt
Hold on a sec, Mistral 3 Large uses the DeepSeek V3 architecture, including MLA? Just went through the config files; the only difference I could see is that Mistral 3 Large used 2x fewer experts but made each expert 2x large. https://t.co/jHxLKHBetr