@emollick
@TheStalwart There is a sort of file drawer bias: AI benchmarks that don’t meaningfully measure performance get dropped, but mostly because scores on them sit at either 0 or 100. The whole point of a benchmark is to measure something about AI performance. Verisimilitude is a different matter, though.