@winglian
We've implemented a simple toolkit for fine-tuning powerful coding models using only RL with an entirely local, zero-setup sandboxed code interpreter. We found very promising results using a tiny fraction of data & training time vs SFT. Check out our blogpost for more details! … https://t.co/IMiRO3LS3C