I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play with it over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (i.e., lower validation loss by the end of the run) for the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. https://t.co/YCvOwwjOzF Part code, part sci-fi, and a pinch of psychosis :)
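A minimal sketch of the accept/commit loop the post describes. The helper names and the `val_loss:` output format are hypothetical, assumed for illustration; the actual repo's internals aren't shown here:

```python
def val_loss_from(output: str) -> float:
    # Parse the final validation loss from the training run's stdout
    # (hypothetical "val_loss: <x>" format; the real repo may differ).
    for line in reversed(output.splitlines()):
        if line.startswith("val_loss:"):
            return float(line.split(":", 1)[1])
    raise ValueError("no val_loss found in training output")

def autoresearch_step(best_loss: float, train_output: str) -> tuple[float, bool]:
    # Keep the agent's edit (i.e., commit it to the feature branch) only if
    # the fixed 5-minute run ended with a lower validation loss than the
    # best run so far; otherwise revert the edit.
    loss = val_loss_from(train_output)
    if loss < best_loss:
        return loss, True    # commit
    return best_loss, False  # revert
```

Each dot in the plot would correspond to one call of `autoresearch_step` after a 5-minute training run.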
If you give GPT-5.4 a raw dump of the GPT-2 weights and ask for a <5000 byte C program to inference it, GPT-5.4 succeeds in under 15 minutes! I remember working on a similar exercise to compare results against a proprietary model in a previous paper - it took days!
New research from Microsoft. Phi-4-reasoning-vision-15B is a 15-billion parameter multimodal reasoning model that combines visual understanding with structured reasoning capabilities. As I have been saying, not every agent task needs a frontier model. Phi-4-reasoning-vision shows what's possible at 15B parameters. The report details how they trained a compact model that can reason over both text and images, targeting the sweet spot between capability and efficiency. Smaller reasoning models that handle vision are essential for practical agent deployments. Paper: https://t.co/cT2qeNImwi Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
New survey on agentic reinforcement learning for LLMs. LLM RL still treats models like sequence generators optimized in relatively narrow settings. However, real agents operate in open-ended, partially observable environments where planning, memory, tool use, reasoning, self-improvement, and perception all interact. This paper argues that agentic RL should be treated as its own landscape. It introduces a broad taxonomy that organizes the field across core agent capabilities and application domains, then maps the open-source environments, benchmarks, and frameworks shaping the space. If you are building agents, this is a strong paper worth checking out. Paper: https://t.co/qwXZNSp0ZA Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

DARE Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval https://t.co/Jeo3lOI9ru
DataClaw datasets are first class on Hugging Face Datasets!! Full visibility into the reasoning, tool calls, and thousands of Claude Code and Codex sessions on the Hub https://t.co/Ooq9cGciGt
@giffmana @ChengleiSi It's a commit that lowered val loss but *increased* the wall-clock time, so it gets rejected for being slower. A commit must improve one, the other, or both in this version. In my (new) autoresearch repo I have an alternative approach where you *always* train for e.g. 5 minutes and try to reduce val loss as much as possible. Possibly less confusing, but it has its own issues too.
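The acceptance rule described here can be sketched as a Pareto check; this is a guess at the logic from the post, not code from the repo:

```python
def accept_commit(old_loss: float, old_time: float,
                  new_loss: float, new_time: float) -> bool:
    # Accept only if the new run Pareto-dominates the old one:
    # neither val loss nor wall-clock time regresses, and at least
    # one of the two improves.
    no_worse = new_loss <= old_loss and new_time <= old_time
    improved = new_loss < old_loss or new_time < old_time
    return no_worse and improved
```

Under this rule, a commit that lowers val loss but takes longer (the case in the post) is rejected, whereas under the fixed-time-budget alternative only val loss matters.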
Cursor with Kimi K2.5. Don't sleep on this combo. From a prompt to a personal HN feed in ~60 seconds. The future of building is going to be so wild. With faster models, you can quickly iterate on more ideas while improving quality. https://t.co/WOYFcCBqM7
GPU Puzzle #6: implement a kernel that adds 10 to each position of a vector. The solution is just 3 lines, and getting there requires understanding global thread indexing and what breaks when you skip the bounds check. Full walkthrough in our new video: https://t.co/BPmZugk3q6
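The puzzle's actual solution is a CUDA kernel; here is a pure-Python simulation of the same indexing logic, showing how the global index `block_idx * block_dim + thread_idx` is formed and why the bounds check matters when the grid launches more threads than there are elements:

```python
def add10_kernel(out, a, n, block_idx, thread_idx, block_dim):
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:                # bounds check: extra threads past the end do nothing
        out[i] = a[i] + 10

def launch(a, block_dim=4):
    # Ceil-division launch: the last block may contain out-of-range threads,
    # which is exactly what the bounds check above guards against.
    n = len(a)
    out = [0] * n
    num_blocks = (n + block_dim - 1) // block_dim
    for b in range(num_blocks):
        for t in range(block_dim):
            add10_kernel(out, a, n, b, t, block_dim)
    return out
```

With `n = 5` and `block_dim = 4`, two blocks launch 8 threads; threads 5-7 fall outside the array and are skipped by the `i < n` check. Remove that check and the CUDA version writes out of bounds.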
Mercury 2 is live! The world's first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I'm incredibly proud of what we've built. We're just getting started on what diffusion can do for language.
New paper: Mamba-Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs. - Only 25% of visual tokens - 3.8-4.2x faster prefilling (TTFT) - Near-baseline accuracy (can exceed baseline with light finetuning) https://t.co/CJaCktyWCt
I'm unreasonably excited about the fact that we wrote everything in CuTe-DSL, embedded in Python. Installing / "compiling" now takes seconds instead of minutes / hours (looking at you, C++ templates). Try pip install fa4!
Blackwell has shifted the bottleneck for attention computation: it is no longer GEMM! Check out FA4
The FA4 integration into @huggingface Transformers is here: https://t.co/48XPxmKbMv You will need to apply my proposed changes at the end for it to work, if the owner hasn't merged them already by the time you try it out.
FlexAttention now has a FlashAttention-4 backend. FlexAttention has enabled researchers to rapidly prototype custom attention variants, with 1000+ repos adopting it and dozens of papers citing it. But users consistently hit a performance ceiling. Until now. We've added a FlashAttention-4 backend to FlexAttention on Hopper and Blackwell GPUs. PyTorch now auto-generates CuTeDSL score/mask modifications and JIT-instantiates FlashAttention-4 for your custom attention variant. The result: 1.2x to 3.2x speedups over Triton on compute-bound workloads. Read our latest blog here: https://t.co/KVElBn4TEE No more choosing between flexibility and performance. #PyTorch #FlexAttention #FlashAttention #OpenSourceAI
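The score-modification idea behind FlexAttention can be illustrated with a toy example. This is a scalar-valued, pure-Python sketch of the concept (a hook applied to each attention logit before softmax), not FlexAttention's real API, which operates on batched tensors:

```python
import math

def attention(q, k, v, score_mod):
    # Toy single-head attention over lists of scalars. score_mod(score, q_idx,
    # kv_idx) is applied to each logit before softmax, FlexAttention-style.
    out = []
    for qi, qv in enumerate(q):
        scores = [score_mod(qv * kv, qi, ki) for ki, kv in enumerate(k)]
        m = max(scores)                         # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(w / z * vv for w, vv in zip(exps, v)))
    return out

# A causal mask expressed as a score_mod: future positions score -inf,
# so they receive zero attention weight after softmax.
def causal(score, q_idx, kv_idx):
    return score if kv_idx <= q_idx else float("-inf")
```

Swapping in a different `score_mod` (ALiBi bias, sliding window, etc.) changes the attention variant without rewriting the kernel, which is the flexibility the backend now compiles into FlashAttention-4 speed.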
We believe Cursor discovered a novel solution to Problem Six of the First Proof challenge, a set of math research problems that approximate the work of academics at Stanford, MIT, and Berkeley. Cursor's solution yields stronger results than the official, human-written solution. Notably, we used the same harness that built a browser from scratch a few weeks ago. It ran fully autonomously, without nudging or hints, for four days. This suggests that our technique for scaling agent coordination might generalize beyond coding.
We're announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium-sized language model (~100B parameters), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models. https://t.co/27sxAHPgZM
New course: Build and Train an LLM with JAX, built in partnership with @Google and taught by @chrisachard. JAX is the open-source library behind Google's Gemini, Veo, and other advanced models. This short course teaches you to build and train a 20-million parameter language model from scratch using JAX and its ecosystem of tools. You'll implement a complete MiniGPT-style architecture from scratch, train it, and chat with your finished model through a graphical interface. Skills you'll gain: - Learn JAX's core primitives: automatic differentiation, JIT compilation, and vectorized execution - Build a MiniGPT-style LLM using Flax/NNX, implementing embedding and transformer blocks - Load a pretrained MiniGPT model and run inference through a chat interface Come learn this important software layer for building LLMs! https://t.co/wm6NZOGIKC
@BarathAnandan7 Still have to simplify GatedDeltaNet. For now, I took a bit of a shortcut here and use the GatedDeltaNet implementation from HF (to make sure it works as a reference), but the rest is from scratch :)
I just implemented inference for Qwen3.5 0.8B based on https://t.co/W8bSA5TRiO, and successfully ran it on an M1 Pro. https://t.co/z0g1ynNlq3
Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. Less VRAM. Faster inference. Run powerful models on limited-GPU setups. Grab the weights + example code: Hugging Face: https://t.co/3MSb7miq68 ModelScope: https://t.co/LGHruBHP6Q
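For intuition on why Int4 weights save VRAM, here is a rough sketch of symmetric per-group 4-bit quantization: each group of weights stores one floating-point scale plus integers in [-8, 7]. This is only the basic idea, not GPTQ's actual error-compensating algorithm:

```python
def quantize_int4(weights, group_size=4):
    # Symmetric per-group quantization: one fp scale per group, weights
    # mapped to 4-bit signed integers in [-8, 7].
    qs, scales = [], []
    for g in range(0, len(weights), group_size):
        group = weights[g:g + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid div-by-zero group
        scales.append(scale)
        qs.append([max(-8, min(7, round(w / scale))) for w in group])
    return qs, scales

def dequantize(qs, scales):
    # Reconstruct approximate fp weights: integer * group scale.
    return [q * s for group, s in zip(qs, scales) for q in group]
```

Four bits per weight plus a small per-group scale is roughly a 4x reduction versus fp16 storage, which is what makes these checkpoints fit on limited-GPU setups.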
One of the more interesting and thought-provoking research papers I've seen in a while. A system for reading and reimplementing NeRF papers, and it seems to work very well. It's pretty easy to extrapolate from here to what CVPR 2027 papers will look like. https://t.co/gokzG27mIT https://t.co/jPpRESdKkd