@jaseweston
Diversity Aware RL (DARLING)
https://t.co/MH0tui34Cb
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL on both quality AND diversity metrics, e.g. higher pass@1 and pass@k
- Works for both non-verifiable & verifiable tasks
🧵 1/5 https://t.co/AhEYPQwbkg