@seb_ruder
Multilingual LLMs ≠ Native speakers Evaluating native-like generation across language varieties is hard, subjective, and inconsistent. MENLO provides: – A structured evaluation protocol – Human preference data – Model-based reward modeling for 47 languages https://t.co/zEBXazlUUG