@zihaozhou_
Recently, I saw the papers "rl on one sample" and "spurious reward". The findings are interesting, but they are indeed expected. In fact, the math solving ability of the Qwen models is really easy to activate, even without any training! 🤣 I'd like to share… https://t.co/7wEmmZgnzA