@f_charton
Math transformers learn better when trained from repeated examples. New paper with @KempeLab https://t.co/aTIBfmqAtJ On 3 problems (modular multiplication, GCD, and eigenvalues), for the same training budget, models trained on smaller datasets achieve better performance. 1/5 https://t.co/SLZd458wcq