Your curated collection of saved posts and media

Showing 24 posts · last 30 days · by score
karpathy (@karpathy) · Mar 07, 2026 · 2d ago · 🆔18931079

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code. Then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (i.e. lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. https://t.co/YCvOwwjOzF Part code, part sci-fi, and a pinch of psychosis :)

πŸ–ΌοΈ Media 1
markchen90 (@markchen90) · Mar 07, 2026 · 2d ago · 🆔64670486 · ⭐0.40

If you give GPT-5.4 a raw dump of the GPT-2 weights and ask for a <5000 byte C program to inference it, GPT-5.4 succeeds in under 15 minutes! I remember working on a similar exercise to compare results against a proprietary model in a previous paper - it took days!

πŸ”omarsar0 retweeted
elvis (@omarsar0) · Mar 06, 2026 · 3d ago · 🆔40912429 · ⭐0.38

New research from Microsoft. Phi-4-reasoning-vision-15B is a 15-billion parameter multimodal reasoning model that combines visual understanding with structured reasoning capabilities. As I have been saying, not every agent task needs a frontier model. Phi-4-reasoning-vision shows what's possible at 15B parameters. The report details how they trained a compact model that can reason over both text and images, targeting the sweet spot between capability and efficiency. Smaller reasoning models that handle vision are essential for practical agent deployments. Paper: https://t.co/cT2qeNImwi Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

❀️ 271 likes · πŸ” 50 retweets
omarsar0 (@omarsar0) · Mar 07, 2026 · 2d ago · 🆔17153248

New survey on agentic reinforcement learning for LLMs. LLM RL still treats models like sequence generators optimized in relatively narrow settings. However, real agents operate in open-ended, partially observable environments where planning, memory, tool use, reasoning, self-improvement, and perception all interact. This paper argues that agentic RL should be treated as its own landscape. It introduces a broad taxonomy that organizes the field across core agent capabilities and application domains, then maps the open-source environments, benchmarks, and frameworks shaping the space. If you are building agents, this is a strong paper worth checking out. Paper: https://t.co/qwXZNSp0ZA Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

πŸ–ΌοΈ Media 1 · Media 2
_akhaliq (@_akhaliq) · Mar 06, 2026 · 3d ago · 🆔71764808

DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval https://t.co/Jeo3lOI9ru

πŸ–ΌοΈ Media 1
calebfahlgren (@calebfahlgren) · Mar 06, 2026 · 3d ago · 🆔00410505

DataClaw 🦞 datasets are first-class on Hugging Face Datasets!! Full visibility into the reasoning, tool calls, and thousands of Claude Code and Codex sessions on the Hub https://t.co/Ooq9cGciGt

πŸ–ΌοΈ Media
karpathy (@karpathy) · Mar 07, 2026 · 2d ago · 🆔59225650 · ⭐0.38

@giffmana @ChengleiSi It's a commit that lowered val loss but *increased* the wall-clock time, so it gets rejected for being slower. In this version a commit must improve one, the other, or both. In my (new) autoresearch repo I take an alternative approach where you *always* train for e.g. 5 minutes and try to reduce val loss as much as possible. Possibly less confusing, but it has its own issues too.
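The two acceptance rules being contrasted can be written down directly. This is a hedged sketch, not code from either repo; the function names and numbers are invented.

```python
# Rule 1 (speedrun-style): a commit may not regress val loss OR wall-clock
# time, and must strictly improve at least one of them (Pareto improvement).
def accept_pareto(old_loss, old_secs, new_loss, new_secs):
    no_regression = new_loss <= old_loss and new_secs <= old_secs
    strict_gain = new_loss < old_loss or new_secs < old_secs
    return no_regression and strict_gain

# Rule 2 (fixed budget, as in the autoresearch repo): every run trains for a
# fixed time (e.g. 5 minutes), so only the final val loss matters.
def accept_fixed_budget(old_loss, new_loss):
    return new_loss < old_loss

# The case from the thread: lower loss but slower -> rejected under rule 1.
print(accept_pareto(3.2, 300, 3.1, 320))   # -> False
print(accept_fixed_budget(3.2, 3.1))       # -> True
```

Under the fixed-budget rule the slower-but-better commit would simply show up as a worse loss within the 5-minute window, which is why that variant can be less confusing to reason about.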

omarsar0 (@omarsar0) · Mar 06, 2026 · 3d ago · 🆔13900871

Cursor with Kimi K2.5. Don't sleep on this combo. From a prompt to a personal HN feed in ~60 seconds. The future of building is going to be so wild. With faster models, you can quickly iterate on more ideas while improving quality. https://t.co/WOYFcCBqM7

πŸ–ΌοΈ Media
πŸ”dair_ai retweeted
O
elvis
@omarsar0
πŸ“…
Mar 06, 2026
3d ago
πŸ†”13900871
⭐0.34

Cursor with Kimi K2.5. Don't sleep on this combo. From a prompt to a personal HN feed in about ~60 seconds. The future of building is going to be so wild. With faster models, you can quickly iterate on more ideas, while improving quality. https://t.co/WOYFcCBqM7

❀️81
likes
πŸ”14
retweets
Modular (@Modular) · Mar 06, 2026 · 3d ago · 🆔30216289

GPU Puzzle #6: implement a kernel that adds 10 to each position of a vector. The solution is just 3 lines, and getting there requires understanding global thread indexing and what breaks when you skip the bounds check. πŸ€” Full walkthrough in our new video: https://t.co/BPmZugk3q6

πŸ–ΌοΈ Media 1
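As a CPU-only illustration of what the puzzle exercises, here is the global-index-plus-bounds-check pattern simulated in plain Python. The `launch` helper and all names are invented; the actual puzzle is a GPU kernel, not this sequential loop.

```python
# Simulate GPU-style indexing: every (block, thread) pair computes one
# global index i = block_idx * block_dim + thread_idx.
def launch(kernel, num_blocks, block_dim, *args):
    for block_idx in range(num_blocks):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

def add_ten_kernel(block_idx, block_dim, thread_idx, out, vec, n):
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < n:                                # bounds check: the grid is rounded
        out[i] = vec[i] + 10                 # up, so surplus threads must no-op

n = 10
vec = list(range(n))
out = [0] * n
# Ceil-divide so the grid covers all n elements: 3 blocks x 4 threads = 12
# threads for 10 elements, so 2 threads fall past the end and need the guard.
launch(add_ten_kernel, (n + 3) // 4, 4, out, vec, n)
# out is now [10, 11, ..., 19]
```

Skipping the `i < n` check is exactly "what breaks": the surplus threads would access past the end of the vector (an IndexError in this simulation, an out-of-bounds memory access on a real GPU).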
πŸ”tri_dao retweeted
Stefano Ermon (@StefanoErmon) · Feb 24, 2026 · 13d ago · 🆔64520670 · ⭐0.34

Mercury 2 is live πŸš€πŸš€ The world's first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I'm incredibly proud of what we've built. We're just getting started on what diffusion can do for language.

❀️ 4,154 likes · πŸ” 584 retweets
wonmin_byeon (@wonmin_byeon) · Mar 04, 2026 · 5d ago · 🆔46418709

πŸš€ New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs.
βœ… Only 25% of visual tokens
πŸš€ 3.8–4.2× faster prefilling (TTFT)
🎯 Near-baseline accuracy (can exceed baseline with light finetuning)
https://t.co/CJaCktyWCt

πŸ–ΌοΈ Media 1
tri_dao (@tri_dao) · Mar 05, 2026 · 4d ago · 🆔95894742 · ⭐0.34

I'm unreasonably excited about the fact that we wrote everything in Cute-DSL, embedded in Python. Installing / "compiling" now takes seconds instead of minutes / hours (looking at you, C++ templates). Try pip install fa4!

MayankMish98 (@MayankMish98) · Mar 05, 2026 · 4d ago · 🆔73374546 · ⭐0.34

Blackwell has shifted the bottleneck for attention computation: it is no longer GEMM! Check out FA4 πŸš€πŸš€πŸš€

StasBekman (@StasBekman) · Mar 05, 2026 · 4d ago · 🆔75487320

The FA4 integration into @huggingface Transformers is here: https://t.co/48XPxmKbMv You will need to apply my proposed changes at the end for it to work, if the owner hasn't merged them already by the time you try it out.

πŸ–ΌοΈ Media 1
PyTorch (@PyTorch) · Mar 05, 2026 · 4d ago · 🆔99381376

FlexAttention now has a FlashAttention-4 backend. FlexAttention has enabled researchers to rapidly prototype custom attention variants, with 1000+ repos adopting it and dozens of papers citing it. But users consistently hit a performance ceiling. Until now. We've added a FlashAttention-4 backend to FlexAttention on Hopper and Blackwell GPUs. PyTorch now auto-generates CuTeDSL score/mask modifications and JIT-instantiates FlashAttention-4 for your custom attention variant. The result: 1.2× to 3.2× speedups over Triton on compute-bound workloads. πŸ–‡οΈ Read our latest blog here: https://t.co/KVElBn4TEE No more choosing between flexibility and performance. #PyTorch #FlexAttention #FlashAttention #OpenSourceAI

πŸ–ΌοΈ Media 1
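A rough, pure-Python reference for the score-modification idea mentioned above. The `score_mod(score, b, h, q_idx, kv_idx)` callback shape follows FlexAttention's documented interface; the eager loop and toy inputs below are my own stdlib sketch of the semantics, not the library's (compiled, fused) implementation.

```python
import math

def causal(score, b, h, q_idx, kv_idx):
    # A classic score_mod: mask out future positions before the softmax.
    return score if kv_idx <= q_idx else float("-inf")

def flex_attention_ref(q, k, v, score_mod, b=0, h=0):
    """Eager reference: apply score_mod to every (q_idx, kv_idx) score."""
    out = []
    for qi, qrow in enumerate(q):
        scores = [
            score_mod(sum(a * c for a, c in zip(qrow, krow)), b, h, qi, ki)
            for ki, krow in enumerate(k)
        ]
        m = max(scores)                      # numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        out.append([sum(wi * vrow[d] for wi, vrow in zip(w, v))
                    for d in range(len(v[0]))])
    return out

q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = flex_attention_ref(q, k, v, causal)
# Row 0 can only attend to position 0, so out[0] == v[0] == [1.0, 2.0].
```

What the announcement adds is that this user-written `score_mod` is now lowered into a FlashAttention-4 CuTeDSL kernel instead of a Triton one, so the flexibility of the callback no longer costs the fused-kernel performance.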
mntruell (@mntruell) · Mar 03, 2026 · 6d ago · 🆔47841336 · ⭐0.38

We believe Cursor discovered a novel solution to Problem Six of the First Proof challenge, a set of math research problems that approximate the work of academics at Stanford, MIT, and Berkeley. Cursor's solution yields stronger results than the official, human-written solution. Notably, we used the same harness that built a browser from scratch a few weeks ago. It ran fully autonomously, without nudging or hints, for four days. This suggests that our technique for scaling agent coordination might generalize beyond coding.

bertgodel (@bertgodel) · Mar 03, 2026 · 5d ago · 🆔11940087

We're announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium-sized language model (~100B), it delivers these results at a fraction of the serving cost of frontier trillion-parameter models. https://t.co/27sxAHPgZM

πŸ–ΌοΈ Media 1
AndrewYNg (@AndrewYNg) · Mar 04, 2026 · 5d ago · 🆔78693378

New course: Build and Train an LLM with JAX, built in partnership with @Google and taught by @chrisachard.

JAX is the open-source library behind Google's Gemini, Veo, and other advanced models. This short course teaches you to build and train a 20-million-parameter language model from scratch using JAX and its ecosystem of tools. You'll implement a complete MiniGPT-style architecture from scratch, train it, and chat with your finished model through a graphical interface.

Skills you'll gain:
- Learn JAX's core primitives: automatic differentiation, JIT compilation, and vectorized execution
- Build a MiniGPT-style LLM using Flax/NNX, implementing embedding and transformer blocks
- Load a pretrained MiniGPT model and run inference through a chat interface

Come learn this important software layer for building LLMs! https://t.co/wm6NZOGIKC

πŸ–ΌοΈ Media 2
rasbt (@rasbt) · Mar 03, 2026 · 6d ago · 🆔40702354 · ⭐0.34

@BarathAnandan7 Still have to simplify GatedDeltaNet. For now, I took a bit of a shortcut and used the GatedDeltaNet implementation from HF (to make sure it works as a reference), but the rest is from scratch :)

JohnMai_Dev (@JohnMai_Dev) · Mar 03, 2026 · 6d ago · 🆔69881465

I just implemented inference for Qwen3.5 0.8B based on https://t.co/W8bSA5TRiO, and successfully ran it on an M1 Pro. https://t.co/z0g1ynNlq3

πŸ–ΌοΈ Media 1 (+1 more)
Alibaba_Qwen (@Alibaba_Qwen) · Mar 03, 2026 · 6d ago · 🆔57616477

πŸ”₯ Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. ⚑ Less VRAM. Faster inference. Run powerful models on limited-GPU setups. πŸ‘‡ Grab the weights + example code: Hugging Face: https://t.co/3MSb7miq68 ModelScope: https://t.co/LGHruBHP6Q

πŸ–ΌοΈ Media 1
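For intuition only, here is a minimal round-to-nearest int4 quantizer in plain Python. This is not the GPTQ algorithm itself (GPTQ additionally uses second-order information to compensate rounding error, and real kernels pack two 4-bit values per byte with per-group scales); it just shows why 4-bit weights plus a scale cut memory roughly 4x versus fp16.

```python
def quantize_int4(weights):
    """Map floats to signed 4-bit ints in [-8, 7] plus one float scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [qi * scale for qi in q]

w = [0.21, -0.07, 0.14, -0.28]
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
# Rounding error per (non-clipped) weight is bounded by scale / 2.
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Each weight now needs 4 bits instead of 16, at the cost of this bounded reconstruction error, which is the trade the Int4 release above is making for lower VRAM and faster inference.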
πŸ”ai_fast_track retweeted
A
Qwen
@Alibaba_Qwen
πŸ“…
Mar 03, 2026
6d ago
πŸ†”57616477
⭐0.38

πŸ”₯ Qwen 3.5 Series GPTQ-Int4 weights are live. Native vLLM & SGLang support. ⚑️ Less VRAM. Faster inference. Run powerful models on limited-GPU setups. πŸ‘‡ Grab the weights + example code: Hugging Face: https://t.co/3MSb7miq68 ModelScope: https://t.co/LGHruBHP6Q

❀️856
likes
πŸ”83
retweets
jon_barron (@jon_barron) · Mar 03, 2026 · 6d ago · 🆔31236246

One of the more interesting and thought-provoking research papers I've seen in a while. A system for reading and reimplementing NeRF papers, and it seems to work very well. Pretty easy to extrapolate from here to what CVPR 2027 papers will look like. https://t.co/gokzG27mIT https://t.co/jPpRESdKkd

πŸ–ΌοΈ Media 1