@JonathanBerant
On most games, performance is flat or even decreasing. What went wrong? Using classic NLP, we find AI models suffer from low discourse coherence, leading to weak performance despite relatively high information density - even when using twice as many tokens as humans. https://t.co/piUFPWyLnO