@arankomatsuzaki
Inverse IFEval: a new bench testing whether LLMs can unlearn stubborn training habits and follow counter-intuitive instructions.

- 8 challenge types (e.g. counterfactuals, flawed text)
- 1k Qs + 23 domains
- Reveals LLMs’ cognitive inertia and need for adaptability

https://t.co/CewuwI4h2W