@rasbt
As we all know by now, reasoning models often generate longer responses, which raises compute costs. Now, this new paper (https://t.co/SwxBs8RsTq) shows that this behavior comes from the RL training process, not from an actual need for long answers for better accuracy. The RL… https://t.co/JnTmDNiVgg