@omarsar0
Results: State-of-the-art across three medical QA benchmarks. On ExplainCPE, MIRAGE reaches 84.8% accuracy and the best GPT-4o ranking; similar gains appear on GenMedGPT-5k and CMCQA. Robustness holds when swapping in DeepSeek-R1-32B as the backbone. Human evals on GenMedGPT-5k also prefer MIRAGE.