@gerardsans
@eigenron It has the same problems RL ran into, maybe worse. Collapsing branches to win benchmarks doesn’t improve real capability. It mostly compresses output variance; by shifting the weights, it distorts the latent space and hurts performance elsewhere. Careful what you optimize for. Benchmaxxing isn’t the path forward.