@iScienceLuvr
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

"We present DuPO, a dual learning-based preference optimization framework that generates annotation-free feedback via a generalized duality."

"DuPO decomposes a primal task's input into known and unknown components, then constructs its dual task to reconstruct the unknown part using the primal output and known information (e.g., reversing math solutions to recover hidden variables)."
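The decomposition described in the quote can be illustrated with a toy sketch (not the paper's implementation): a primal task computes an answer from a hidden variable, and a hypothetical dual task tries to recover that variable from the primal output plus the known components, with agreement serving as annotation-free feedback.

```python
# Toy illustration of DuPO's duality idea (hypothetical names, assumed setup):
# the primal task maps (known, unknown) -> output; the dual task reconstructs
# the unknown from the output and the known parts; a match is a self-check.

def primal(known: dict, unknown: int) -> int:
    """Primal task: compute y = a*x + b, where x is the 'unknown' component."""
    return known["a"] * unknown + known["b"]

def dual(known: dict, primal_output: int) -> int:
    """Dual task: reconstruct the unknown x from the output and known info."""
    return (primal_output - known["b"]) // known["a"]

def self_verify(known: dict, unknown: int) -> bool:
    """Annotation-free feedback: did the dual task recover the hidden variable?"""
    y = primal(known, unknown)
    return dual(known, y) == unknown

known = {"a": 3, "b": 7}
print(self_verify(known, unknown=5))  # True: the dual reconstruction succeeds
```

In DuPO itself this reconstruction quality scores candidate outputs for preference optimization; the sketch only shows the primal/dual round trip.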