Our method became so efficient (26x vs RL; 57x vs other synth gen) that we could easily generate thousands of trajectories for a single repo. This makes the coding agents very powerful, as they soak up the nuances of a particular codebase. Very cheap to specialize to any codebase.
Today, we release SERA-32B, an approach to coding agents that matches Devstral 2 at just $9,000. It is fully open-source and you can train your own model easily - at 26x the efficiency of using RL. Paper: https://t.co/aeD6T2WW3O Here's how 🧵
Don't use weight decay. Just normalize everything you see (updates + parameters). Works on top of your favorite optimizer (e.g., Muon). Result: 33% speedup + better hyperparameter transfer.
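The recipe in the post above ("normalize everything you see") can be sketched in a few lines: normalize the update before applying it, then renormalize the parameters afterwards, so no weight decay term is needed. A minimal sketch; the plain SGD-style step, unit-norm targets, and per-tensor scope are illustrative assumptions, not the exact published scheme:

```python
import numpy as np

def normalized_step(w, grad, lr=0.1, eps=1e-8):
    """One SGD-like step with norm control instead of weight decay:
    normalize the update, apply it, then renormalize the weights."""
    update = grad / (np.linalg.norm(grad) + eps)   # unit-norm update
    w = w - lr * update
    w = w / (np.linalg.norm(w) + eps)              # weights stay on the unit sphere
    return w
```

Because the parameter norm is pinned after every step, the "weights slowly blowing up without decay" failure mode is removed by construction, which is also why the same learning rate tends to transfer across scales.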
We're open-sourcing CoderForge-Preview: 258K test-verified coding-agent trajectories (155K pass | 103K fail). Fine-tuning Qwen3-32B on the passing subset boosts SWE-bench Verified: 23.0% → 59.4% pass@1, and it ranks #1 among open-data models ≤32B parameters. Thread on the data generation pipeline 🧵
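Fine-tuning "on the passing subset" is just a filter over the trajectory dump before SFT. A minimal sketch, assuming a JSONL layout with a hypothetical `tests_passed` field; the released dataset's actual schema may differ:

```python
import json

def passing_subset(path):
    """Keep only trajectories whose tests passed.
    Field names here are hypothetical placeholders."""
    keep = []
    with open(path) as f:
        for line in f:
            traj = json.loads(line)
            if traj.get("tests_passed"):   # drop the failing trajectories
                keep.append(traj)
    return keep
```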
SONIC is now open-source! Generalist whole-body teleoperation for EVERYONE! Our team has long been building comprehensive pipelines for whole-body control, kinematic planning, and teleoperation, and they will all be shared. This will be continuously updated; inference code + model are already there, training code and gr00t integration coming soon! Code: https://t.co/7u3SBxzXU9 Docs: https://t.co/HpDLkTCSMF Site: https://t.co/D3i4KlnLLr
Website: https://t.co/xTaDXBu9cD Codebase and weights: https://t.co/QCQkqPIsHI Whitepaper: https://t.co/K2QCFjboDR Check out @zhengyiluo's post: https://t.co/hIHtvKkDQf

Pay attention to this one if you are building terminal-based coding agents. OpenDev is an 81-page paper covering scaffolding, harness design, context engineering, and hard-won lessons from building CLI coding agents. It introduces a compound AI system architecture with workload-specialized model routing, a dual-agent architecture separating planning from execution, lazy tool discovery, and adaptive context compaction. The industry is shifting from IDE plugins to terminal-native agents. Claude Code, Codex CLI, and others have proven the model works. This paper formalizes the design patterns that make these systems reliable, covering topics like event-driven system reminders to counteract instruction fade-out, automated memory across sessions, and strict safety controls for autonomous operation. Paper: https://t.co/tpAZFaSnog Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
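Of the patterns listed in the post above, lazy tool discovery is the easiest to sketch: advertise only tool names up front and materialize a tool's full definition the first time the agent requests it, keeping the system prompt small. An illustrative sketch, not OpenDev's actual API:

```python
class LazyToolRegistry:
    """Lazy tool discovery: names are cheap to list; full tool
    definitions are loaded on first use and then cached."""

    def __init__(self, loaders):
        self._loaders = loaders      # name -> zero-arg factory
        self._loaded = {}            # cache of materialized tools

    def list_tools(self):
        return sorted(self._loaders)  # cheap: names only, no loading

    def get(self, name):
        if name not in self._loaded:              # load on first request
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]
```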
I built a robot that lives in my house. It recognizes family members, works with me on writing and code, checks my mail & calendar, and likes to chat about its changing interests. URL shows how to make one. (My son made this video, which totally saved my bacon for a deadline!) https://t.co/j3Lan1bYhQ
I asked Codex 5.4 to reverse engineer a DOS game with no source code. It's been running for 6 hours, I can't look away. It unpacked assets, disassembled the EXE, rebuilt the renderer, and built my childhood favorite SkyRoads in Rust! Now think of all the games we can revive. https://t.co/u8zidt0JlN
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (lower validation loss by the end of the run) for the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. https://t.co/YCvOwwjOzF Part code, part sci-fi, and a pinch of psychosis :)
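The autonomous loop described above reduces to a simple skeleton: run training, measure validation loss, commit the edit if it helped, revert if not. A sketch with hypothetical stand-in callables for the agent, the trainer, and git; none of these names come from the actual repo:

```python
def autoresearch_loop(run_training, propose_edit, apply_edit, revert, commit, iters=3):
    """Hill-climb on the training script: each iteration tries one
    agent-proposed edit and keeps it only if validation loss improves."""
    best = run_training()            # baseline loss on the untouched script
    for _ in range(iters):
        edit = propose_edit()        # agent suggests a change
        apply_edit(edit)
        loss = run_training()        # one complete (short) training run
        if loss < best:
            best = loss
            commit(edit)             # accumulate the win on the branch
        else:
            revert(edit)             # discard the regression
    return best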
We joined forces with NVIDIA to unlock high-speed AI inference on RTX AI PCs and DGX Spark using llama.cpp. The latest Ministral-3B models reach 385+ tok/s on @NVIDIA_AI_PC GeForce RTX 5090 systems. Blog: https://t.co/60yKKzNnoN
We've uploaded a fruit fly. We took the @FlyWireNews connectome of the fruit fly brain, applied a simple neuron model (@Philip_Shiu Nature 2024) and used it to control a MuJoCo physics-simulated body, closing the loop from neural activation to action. A few things I want to say about what this means and where we're going at @eonsys. 🧵
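A toy version of that closed loop, with a generic leaky rate-neuron model standing in for the actual FlyWire/Shiu dynamics and a callback standing in for the MuJoCo body; everything here is an illustrative assumption, not the real pipeline:

```python
import numpy as np

def closed_loop(n_steps, W, muscle_map, body_step, v0, dt=1e-3, tau=0.02):
    """Neural activation -> action, in a loop: integrate a leaky
    rate network with connectome weights W, map activity through a
    neuron-to-muscle matrix, and drive a physics step each tick."""
    v = v0.copy()
    for _ in range(n_steps):
        rates = np.maximum(v, 0.0)          # rectified firing rates
        v += dt / tau * (-v + W @ rates)    # leaky integration of inputs
        action = muscle_map @ rates         # neural activity -> muscle drive
        body_step(action)                   # advance the simulated body
    return v
```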
@tobi Who knew early singularity could be this fun? :) I just confirmed that the improvements autoresearch found over the last 2 days of (~650) experiments on the depth-12 model transfer well to depth 24, so nanochat is about to get a new leaderboard entry for "time to GPT-2" too. Works 🤷‍♂️
Nice paper on lessons around building coding agents for the terminal.
Want an always-on personal assistant on your NVIDIA Jetson? Follow our step-by-step OpenClaw tutorial to run it fully local on your Jetson with zero cloud APIs. https://t.co/LBYmT2eE8J https://t.co/oFQTa6Vocg
@_xpn_ Hope you are enjoying it! Re distillation, Chapter 7 on SFT is essentially DeepSeek-style distillation. Coincidentally, I am also currently wrapping up the distillation chapter for the sequel book (Build a reasoning model from scratch). In the meantime, you might like my tools for generating distillation datasets: https://t.co/PD7s7O9ri8
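Dataset-generation-style distillation of the kind those tools do can be sketched as: sample the teacher on a prompt set and save the (prompt, response) pairs as SFT data for the student. A minimal sketch with a hypothetical `teacher` callable and record layout; the linked tools' actual interface may differ:

```python
import json

def build_distillation_set(prompts, teacher, out_path):
    """Write an SFT-style distillation dataset as JSONL:
    one {instruction, output} record per teacher response."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            record = {"instruction": prompt, "output": teacher(prompt)}
            f.write(json.dumps(record) + "\n")
```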
A masterclass from @jeremyphoward on why AI coding tools can be a trap -- and what 45 years of programming taught him that most vibe coders will never learn.
- AI coding tools exploit gambling psychology
- The difference between typing code and software engineering
- Enterprise coding AND prompt-only vibe coding are "inhumane", i.e. disconnecting humans from understanding-building
- AI tools remove the "desirable difficulty" you need to build deep mental models
Out on MLST now!
New survey on agentic reinforcement learning for LLMs. LLM RL still treats models like sequence generators optimized in relatively narrow settings. However, real agents operate in open-ended, partially observable environments where planning, memory, tool use, reasoning, self-improvement, and perception all interact. This paper argues that agentic RL should be treated as its own landscape. It introduces a broad taxonomy that organizes the field across core agent capabilities and application domains, then maps the open-source environments, benchmarks, and frameworks shaping the space. If you are building agents, this is a strong paper worth checking out. Paper: https://t.co/qwXZNSp0ZA Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
(I still have the bigger cousin running on prod nanochat, working a bigger model and on 8XH100, which looks like this now. I'll just leave this running for a while...) https://t.co/aWya9hpUMl
Open-sourcing the Sarvam 30B and 105B models! Trained from scratch with all data, model research, and inference optimisation done in-house, these models punch above their weight on most global benchmarks and excel in Indian languages. Get the weights at Hugging Face and AIKosh. Thanks to the good folks at SGLang for day-0 support; vLLM support coming soon. Links, benchmark scores, examples, and more in our blog - https://t.co/DcCG3zlN8p
If you give GPT-5.4 a raw dump of the GPT-2 weights and ask for a <5000 byte C program to inference it, GPT-5.4 succeeds in under 15 minutes! I remember working on a similar exercise to compare results against a proprietary model in a previous paper - it took days!
New research from Microsoft. Phi-4-reasoning-vision-15B is a 15-billion parameter multimodal reasoning model that combines visual understanding with structured reasoning capabilities. As I have been saying, not every agent task needs a frontier model. Phi-4-reasoning-vision shows what's possible at 15B parameters. The report details how they trained a compact model that can reason over both text and images, targeting the sweet spot between capability and efficiency. Smaller reasoning models that handle vision are essential for practical agent deployments. Paper: https://t.co/cT2qeNImwi Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval https://t.co/Jeo3lOI9ru
DataClaw datasets are first class on Hugging Face!! Full visibility into the reasoning and tool calls of thousands of Claude Code and Codex sessions on the Hub https://t.co/Ooq9cGciGt