@gerardsans
@PsyPost They wouldn’t be surprised if they spent more time learning how LLMs work. This is not a mystery. It’s the result of models sharing a high percentage of the same training data, plus excessive RL/post-training focused on the same benchmarks, a practice known as “eval maxxing.”