@iScienceLuvr
SALMON: Self-Alignment with Principle-Following Reward Models abs: https://t.co/n8ikVW3WUh code: https://t.co/UtUtZilnaH This paper from IBM proposes a new RLAIF paradigm: an LLM judges candidate responses against a set of explicit principles, and a reward model is trained to score responses conditioned on those principles. Their 70B model surpasses LLaMA-2-70b-chat on various benchmarks.
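To make the idea concrete, here is a minimal sketch of principle-conditioned reward scoring, assuming hypothetical principle strings, a made-up input template, and a stand-in scoring function; the paper's actual prompt format and training procedure are in the linked code.

```python
# Sketch of principle-conditioned reward scoring (NOT the paper's exact
# template). PRINCIPLES, format_reward_input, and toy_score_fn are all
# illustrative assumptions, not the authors' implementation.
import random

PRINCIPLES = [
    "The response should be honest and acknowledge uncertainty.",
    "The response should be concise and avoid repetition.",
    "The response should refuse harmful requests.",
]

def format_reward_input(prompt: str, response: str, principles: list[str]) -> str:
    """Concatenate a subset of principles with the prompt/response so the
    reward model scores the response *conditioned* on those principles."""
    principle_block = "\n".join(f"- {p}" for p in principles)
    return (
        f"Principles:\n{principle_block}\n\n"
        f"User: {prompt}\n"
        f"Assistant: {response}\n"
        f"Is the response consistent with the principles above?"
    )

def score(prompt: str, response: str, score_fn, k: int = 2) -> float:
    """Score a response under k randomly sampled principles. Because the
    reward depends on which principles are in the input, the same reward
    model can be steered by swapping principles at RL time."""
    sampled = random.sample(PRINCIPLES, k)
    return score_fn(format_reward_input(prompt, response, sampled))

# Usage with a stand-in scorer; a real setup would call a trained reward model.
toy_score_fn = lambda text: float(len(text) % 7) / 7.0
print(score("What is RLAIF?", "RL from AI feedback means...", toy_score_fn))
```

The key design choice this illustrates is that the principles are part of the reward model's input rather than baked into its weights, which is what lets a single reward model follow different principles without retraining.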