@BlancheMinerva
These findings can’t be squared with the claims Sakana made in their marketing materials, such as “near human accuracy” in reviewing papers (when tested on 10 papers, it had 50% precision, 20% recall, and a 28.6% F1-score) or the ability to write and run code without human input.
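
For anyone checking the arithmetic: the quoted F1 is consistent with the quoted precision and recall, since F1 is their harmonic mean. A minimal sketch, where the helper name `f1_score` is my own illustration and not anything from Sakana or the evaluation:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        # Avoid dividing by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted above: 50% precision, 20% recall.
f1 = f1_score(0.50, 0.20)
print(f"F1 = {f1:.1%}")  # F1 = 28.6%
```

Because the harmonic mean is pulled toward the smaller value, the 20% recall drags F1 well below the 50% precision.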