@arankomatsuzaki
RLHI: Reinforcement Learning from Human Interaction

• Moves beyond expert-annotated data: learns from real user conversations
• Two methods:
  1. User-Guided Rewrites
  2. User-Based Rewards
• Outperforms baselines in personalization, instruction-following & reasoning

https://t.co/JA6PR6it59