@omarsar0
The Illusion of Progress It's well known that there are caveats with benchmarks and metrics that measure LLM capabilities. It's no different for hallucination detection. "ROUGE fails to reliably capture true hallucination" Here are my notes: https://t.co/GFM9BPUDxh