@emollick
It turns out that Meta had 27 different models on LM Arena prior to the launch of Llama 4, but they announced it as if they had one model that topped the leaderboard. An extreme example of benchmark hacking (which other labs also do to lesser degrees). https://t.co/JfPmqyZiOg https://t.co/bVsUWh1218