@omarsar0
Ablations that isolate the wins: RL improves both components individually; memory distillation further boosts the Answer Agent. Gains compound when paired with a stronger memory manager. It's interesting to also see that this approach generalizes well across different backbones. Very promising. Paper: https://t.co/uj1mkHYHSO