@rasbt
@mwcrutcher No worries, and thanks for the follow-up. I am not sure I am seeing the problem correctly. I.e., out of the 8 routed experts, are they *not* (weighted) summing over them? Or do you mean the top-k expert selection + weighted sum should be shown in more detail?