@paperswithdata
➗Lila: a mathematical reasoning benchmark consisting of 23 tasks spanning four dimensions: mathematical abilities, language format, language diversity, and external knowledge. It extends 20 existing datasets by collecting task instructions and solutions written as Python programs. https://t.co/DbaL9KSsNB https://t.co/A46re8Bm9g