@dair_ai
Can coding-agents replicate scientific ML papers? We know this is possible because we can already do this @dair_ai. Still a great read. So they try to replicate an ML paper from its materials alone. They use a coding-agent skill that turns each selected paper claim into a target with recorded evidence. The agent reconstructs the method, runs experiments, links outputs to provenance, compares against the paper's claims, and passes validation checks before completion. Completion depends on workspace evidence, not on the agent's final message. Across twelve runs over four scientific ML papers, all twelve workspaces pass the completion gate and all 158 recorded targets are matched with report coverage. Yet repeated runs still differ in how papers are split into targets, in numerical fidelity, in elapsed time, and in the rules used to accept evidence. Basically the completion becomes reproducible even when the path is not. Paper: https://t.co/IkhknqCUiv Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c