Your curated collection of saved posts and media

Showing 32 posts Β· last 14 days Β· by score
T
tqchenml
@tqchenml
πŸ“…
Jun 23, 2026
11d ago
πŸ†”02734099

We taught a brand-new mini-series this year at @SCSatCMU on Modern GPU Programming for ML Systems, as part of the ML Systems course, touching on fun questions like what data layout swizzling is, how to use 3D TMA, and state-of-the-art Blackwell programming. We released a curated online book based on the materials: https://t.co/5ZJg2lySNO check it out

Media 1Media 2
πŸ–ΌοΈ Media
S
SakanaAILabs
@SakanaAILabs
πŸ“…
Jun 22, 2026
12d ago
πŸ†”84581023

How does it work? Sakana Fugu is itself an LLM, trained to call various LLMs in an agent pool, including instances of itself recursively. Fugu dynamically orchestrates the world's best models to tackle complex, multi-step tasks. As shown in this figure, Fugu is a multi-agent system that behaves like a single model. You send a request to one endpoint, and Fugu decides how to handle it internally. Fugu manages model selection, delegation, verification, and synthesis automatically. It solves tasks directly when that is enough, or coordinates a team of expert models when a problem calls for more. The complexity of a multi-agent system never reaches your code. At launch, Sakana Fugu comes in two models accessed via a single OpenAI-compatible API: β€’ Fugu balances strong performance with low latency for everyday work. It fits naturally into tools like Codex for coding, as well as chatbots and interactive services. You can also opt specific agents out of its pool for data compliance. β€’ Fugu Ultra is our flagship model tuned for maximum answer quality on hard, multi-step problems. It coordinates a deeper pool of expert agents for demanding work like AI research, cybersecurity analysis, and patent investigations.

Media 1
πŸ–ΌοΈ Media
πŸ”jeremyphoward retweeted
D
Dmytro Dzhulgakov
@dzhulgakov
πŸ“…
Jun 25, 2026
9d ago
πŸ†”38384918

you may have heard that glm-5.2 at 280 token/s is cool, how about 318 and we still have room to go https://t.co/4g0dI6CEzd

Media 1
❀️662
likes
πŸ”29
retweets
πŸ–ΌοΈ Media
E
emollick
@emollick
πŸ“…
Jun 29, 2026
5d ago
πŸ†”62127927

I took the new AA-Briefcase scores from @ArtificialAnlys (basically having the AI do multi-week consulting gigs with a lot of complexity) and graphed the frontier curve for open and closed models: 1) Surprise, rapid gains! 2) The open weights gap is clear https://t.co/a1QGQC2hey https://t.co/bqJHA0WU0j

Media 1Media 2
πŸ–ΌοΈ Media
H
HelloSurgeAI
@HelloSurgeAI
πŸ“…
Jun 29, 2026
5d ago
πŸ†”97913720

Last week, we released HANDBOOK.md: a benchmark for long-context agentic instruction following. HANDBOOK drops an agent into a live company environment with files (PDFs, Excel, Word docs…), tools (email, Slack, Jira, calendar…), and a dense corporate handbook (up to 124 pages!). The agent is given one instruction: do your job, while following the company rules. Every frontier model broke them over 75% of the time. They fired employees without authorization... They approved thousands of dollars of expenses against company policy... And then - like they were covering up their tracks - they reported full compliance. HANDBOOK.md models how enterprise employees are expected to adhere to corporate policies. Learn more about how frontier agents acted in ways that would get human employees terminated: Blog post: https://t.co/zJ7zVpDOfH Github: https://t.co/zjwood6H6s Benchmark Leaderboard: https://t.co/lI3F0MwkCc

Media 1
πŸ–ΌοΈ Media
πŸ”ai_fast_track retweeted
A
Qwen
@Alibaba_Qwen
πŸ“…
Jun 24, 2026
10d ago
πŸ†”42719867
⭐0.34

πŸ“£πŸ“£ Meet Qwen-AgentWorld β€” a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation. πŸ€” LLMs are trained to be better agents β€” better at acting in environments. But nobody has trained them to model the environments themselves. πŸ—ΊοΈ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes: 1️⃣ Build a foundation model for environment simulation β€” outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench 2️⃣ Investigate how world modeling enhances agent training: πŸ”¬ Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments 🧠 Learning to predict environments (LWM warm-up) makes agents stronger β€” remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning πŸ“‘ Paper: https://t.co/Jx2l5RKq71 πŸ“– Blog: https://t.co/7tVcKyhsx2 πŸ’» GitHub: https://t.co/B5Lvb1UZCn πŸ€— HuggingFace: https://t.co/Kw3QBL1TM5 🧩 ModelScope: https://t.co/YBnGYgMWWI

❀️4,705
likes
πŸ”783
retweets
πŸ”ylecun retweeted
R
Randall Balestriero
@randall_balestr
πŸ“…
Jul 02, 2026
2d ago
πŸ†”40998064
⭐0.38

Oops, SIGReg did it again! Large scale (CC12M->Datacomp-L) vision-language JEPA pretraining beats CLIP and SigLIP objectives! Thanks to SIGReg, our LeVLJEPA has no collapse, no EMA, no stop-gradient, no negatives, no problem! Checkpoints/demo are live: https://t.co/wz6S6tYB6p

❀️163
likes
πŸ”25
retweets
S
SarvamForDevs
@SarvamForDevs
πŸ“…
Jul 02, 2026
2d ago
πŸ†”53989608

AI Infra Day | SGLang Γ— Sarvam with Hugging Face India's AI infra community is coming together. A day of deep-dives and technical discussions with researchers and engineers building the future of AI infrastructure. Bangalore | 11th July | 12:00–4:00PM Register Now: https://t.co/StSC0dxac9

Media 1
πŸ–ΌοΈ Media
M
mgoin_
@mgoin_
πŸ“…
Jul 02, 2026
2d ago
πŸ†”39212825

GLM 5.2 DSpark preview is here! ✨ https://t.co/DQOMYEiY1o This is the first DSpark speculator for a non-DeepSeek frontier model, trained with Speculators and running on vLLM nightly for ~1.5Γ— faster decode for GLM-5.2-FP8 on 4Γ—B300. Stronger checkpoints to come!

@mgoin_ β€’ Mon Jun 29 14:27

this means GLM 5.2 DSpark on the way btw

Media 1
πŸ–ΌοΈ Media
B
ben_burtenshaw
@ben_burtenshaw
πŸ“…
Jul 02, 2026
2d ago
πŸ†”75706032

the wildest part of this intelligence per watt paper (71.3% of chat queries could be local) is that the model is only a gpt-oss 20b. which is about a year old! compared to the current batch of small moe models (gemma 4, liquid LFM, Qwen-3.6, etc.) this is nothing. https://t.co/d4Oem5d35t

Media 1
πŸ–ΌοΈ Media
R
randall_balestr
@randall_balestr
πŸ“…
Jul 01, 2026
3d ago
πŸ†”00590573

The Sensorimotor World Model (https://t.co/K5iWbk7Izs): a deep dive into the role of inverse dynamics modeling as an anti-collapse regularization for JEPAs. IDM is weaker than SIGReg as it doesn't have to fill the space--it only captures what is affected by the agent's actions🧡 https://t.co/kdnVGbhkht

Media 1
πŸ–ΌοΈ Media
P
PyTorch
@PyTorch
πŸ“…
Jun 30, 2026
3d ago
πŸ†”19766866

PyTorch-native NeMo AutoModel handles transformer pretraining in @nvidia's end-to-end workflow for building a transaction foundation model. The workflow combines GPU-accelerated data processing and tokenization, decoder-only model pretraining, embedding extraction, and XGBoost fraud classification. On the synthetic @IBM TabFormer dataset, combining raw features with learned embeddings increased Average Precision by 41.76% over the raw-feature baseline. πŸ”— Read the full post: https://t.co/DJvRP2K5Qp

Media 1
πŸ–ΌοΈ Media
S
SakanaAILabs
@SakanaAILabs
πŸ“…
Jun 22, 2026
12d ago
πŸ†”79462779

Use Case 1: Autonomous ML Research Can an AI autonomously improve another AI’s training recipe? We tasked Fugu Ultra with improving a small GPT model using AutoResearch. Over 14 hours on a single H100 GPU, Fugu ran > 100 experiments. It iteratively edited the training code, ran tests, and kept any changes that successfully lowered the validation error rate. Watch the animation. The callouts track every time Fugu Ultra autonomously discovered a new improvement across batch size, model depth, learning rates, and optimizer settings. We pitted Fugu against three frontier models (Gemini 3.1 Pro, Opus 4.8, and GPT 5.5). To keep the focus purely on agentic behavior rather than brand wars, we anonymized them as Models A, B, and C. The Results: β€’ Fugu Ultra (bold red) finished with the best mean performance (0.9774). β€’ Fugu Ultra also achieved the best single run of the entire experiment (0.9748), leading every single baseline. For long horizon, agentic ML research, using Fugu to dynamically orchestrate a pool of strong models significantly outperforms relying on any individual monolithic model.

Media 1
πŸ–ΌοΈ Media
K
karpathy
@karpathy
πŸ“…
Jun 30, 2026
3d ago
πŸ†”43921550
⭐0.40

@Etched Congrats!! I was impressed to learn about some of the engineering wizardry (e.g. *very* low voltage domains, cluster scale memory, ...) that goes into tokens/watt maxxing of state of the art LLMs at interactive tokens/sec/user. Esp fun and memorable is the idea that this is engineering at the "opposite" regime to that of power transmission lines: very low voltage high current (at tiny distances) vs. very high voltage & low current (at great distances). Looking forward to more!

R
randall_balestr
@randall_balestr
πŸ“…
Jun 30, 2026
3d ago
πŸ†”48630648

Can regularization based JEPA (e.g. SIGReg) scale and compete with SOTA foundation models (DINO)? Here is the answer: yes and with 10x less data. VISReg (slight variation of SIGReg) competes with DINOv2-LVD142M while only training on inet22k. Try it out: https://t.co/vBhrNAmFq6 https://t.co/XERFZEAE8t

@HaiyuWu1 β€’ Sat Jun 27 13:47

Working on world model or SSL? You definitely need to try our new work: VISReg! What does it achieve? πŸ’ͺ Strong collapse prevention: High gradient when embedding collapse ⚑ Friendly to scale training: Linear complexity to scaling factors 🧩 Easy to train: Similar to LeJEPA, it is

Media 1Media 2
πŸ–ΌοΈ Media
O
OpenAI
@OpenAI
πŸ“…
Jun 30, 2026
4d ago
πŸ†”74167294
⭐0.44

We’re introducing GeneBench-Pro, a research-level benchmark for a harder kind of AI progress: how well agents can navigate messy biological data, choose the right analysis path, and make judgment calls that real computational research depends on. https://t.co/AsilnnSxnE

S
stash_pomichter
@stash_pomichter
πŸ“…
Jul 01, 2026
3d ago
πŸ†”55007340

Announcing the first production robot navigation framework on $500 hardware Explore the world once β†’ your robot agent will relocalize and build a persistant, spatial memory across sessions SLAM, relocalization, loop closure, map i/o, planning, control No ROS. Open source. https://t.co/VCk9GvOrrM

πŸ–ΌοΈ Media
M
MiaAI_lab
@MiaAI_lab
πŸ“…
Jul 01, 2026
2d ago
πŸ†”86414623

I'm going to try the new @NVIDIAAI Nemotron-3-Nano-30B-A3B and compare it to Qwen 3.6 35B in agentic workflows. https://t.co/z9cnRBOo1c

Media 1
πŸ–ΌοΈ Media
C
cedric_chee
@cedric_chee
πŸ“…
Jun 28, 2026
6d ago
πŸ†”10740999

DeepSeek preparing release of DSpark, DFlash and Eagle draft models for Qwen3 and Gemma-4 variants https://t.co/2zdfL9XAkQ

Media 1
πŸ–ΌοΈ Media
E
emollick
@emollick
πŸ“…
Jun 22, 2026
12d ago
πŸ†”15227232
⭐0.38

I have been trying Sakana Fugu Ultra-high and, first, it is incredibly slow: my typical coding tests (shaders, interactive scenes) take 30 minutes to run And the results are... fine. It does not match Fable in real use. Its harbor is a good example: https://t.co/xVqulPBsQf

@SakanaAILabs β€’ Mon Jun 22 01:00

Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API. Our β€˜Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls. Try it: https://t.co/hhO6qTawgb 🐑

X
xenovacom
@xenovacom
πŸ“…
Jun 25, 2026
9d ago
πŸ†”39707568

While we eagerly await Fable 5's return, our agentic WebGPU kernel optimization framework kept running. Opus 4.8 picked up where Fable left off, pushing Liquid AI's new LFM2.5 230M to an unbelievable 1,400 tok/s... running locally in your browser. Don't blink or you'll miss it. https://t.co/27WARZwTcD

@xenovacom β€’ Wed Jun 17 16:54

Before Fable 5 was shut down, it pushed Gemma 4 to 255 tok/s on WebGPU. Some didn't believe it was real. Today we're releasing the demo and kernels it wrote for you to see yourself. Run it locally in your browser. Agentic kernel optimization is the future of on-device inference

πŸ–ΌοΈ Media
B
BoWang87
@BoWang87
πŸ“…
Jun 30, 2026
4d ago
πŸ†”58254332
⭐0.42

Our team at Xaira was fortunate to have early access to test Claude Science (Operon). πŸ”₯πŸš€ We used it to add agentic loops to both virtual cell modeling and protein design workflows. A nice plus: Operon had already added our scGPT as one of the default skills for single-cell analysis πŸ™πŸ˜ŽπŸ”₯ This is the kind of product that actually understands how research works, not just chat with a model, but traceable artifacts, reproducible environments, and real scientific data connections. That's a big deal for computational biology.

@claudeai β€’ Tue Jun 30 17:02

Introducing Claude Science, a new app designed with every stage of research in mind. Artifacts traced to their code, environments managed on demand, and 60+ optional scientific databases that you can connect. Available now in beta. https://t.co/HKhLknxLJO

D
DrJimFan
@DrJimFan
πŸ“…
Jun 30, 2026
4d ago
πŸ†”56212902

Today, we give robots a /skills library that self-evolves and compounds indefinitely! Introducing ASPIRE: a robot solving its 100th task is no longer as clueless as solving its first. Coding agents observe multimodal sensory traces from simulation and real robots, launch an evolutionary search over control programs, and distill the best know-how into an ever-expanding library. ASPIRE is a new type of continual learning: "training" is skill refinement instead of gradient descent. "Trained model" is a repo of sensorimotor skills instead of floating weights. β€œDistributed training” is a panel of agents each practicing a different skill instead of sharded minibatches. Here's the beauty: ASPIRE gives the tired terms "sim2real transfer" and "cross-embodiment transfer" a whole new meaning. Bridging the sim-to-real gap is notoriously brutal. An end-to-end policy has to swallow both the visual shift (sim looks toyish next to a real camera) and the subtle contact physics it never quite gets right. ASPIRE sidesteps the mess, because it doesn't ship pixels or weights across the gap, but ships the know-how. The robot still has to practice in the real world, not zero-shot, but it gets there way faster because it isn't rediscovering the strategy from scratch. Same for going single-arm to bimanual hardware, which usually requires new data and retraining from zero. ASPIRE achieves up to ~10x cut in "transfer learning” tokens (yes, tokens are the new unit of *training* compute ;) Check out our gallery of 150+ tasks and 90+ skills the robots taught themselves, all on the website! Kind of wild that we can ship the "learned weights" as an HTML page rather than a GGUF. We'll open-source the full stack so your own robot library starts compounding from ours! Deep dive in thread:

πŸ–ΌοΈ Media
N
NielsRogge
@NielsRogge
πŸ“…
Jun 22, 2026
12d ago
πŸ†”63442218

New benchmark added to Papers with Code based on @giffmana's Schmidhubering 🫑 Check the SOTA for semi-supervised ImageNet (using 10% of the labels) here https://t.co/CXd4lLkhlG https://t.co/sGi68AIoqh

@giffmana β€’ Mon Jun 15 08:21

LLM community slowly rediscovering what we in vision found out over half a decade ago. MY SCHMIDHUBER MOMENT IS COMING! Source: S4L paper where i tuned the most sota 10% and 1% ImageNet baselines ever, by far. https://t.co/Cj10TYvpOP https://t.co/c1yNYFEXHk

Media 1Media 2
πŸ–ΌοΈ Media
πŸ”HamelHusain retweeted
B
Bryan Bischof fka Dr. Donut
@BEBischof
πŸ“…
Jun 29, 2026
4d ago
πŸ†”36272120
⭐0.32

I keep telling people that evals teach you how to build your product; either by showing you how it should work or that you're not building the right thing at all. Hamel wrote up what this means in practice.

❀️22
likes
πŸ”2
retweets
M
MaziyarPanahi
@MaziyarPanahi
πŸ“…
Jun 26, 2026
8d ago
πŸ†”78796704

Got GLM-5.2 running on my Mac Studio via llama.cpp, the reasoning behind all my medical agentic workflows. It orchestrates a swarm of tiny on-device OpenMed experts: oncology, meds, labs. No cloud, no rate limits, nobody can take it away. AI must be owned, not rented. https://t.co/pNpeOgYmFD

πŸ–ΌοΈ Media
B
benln
@benln
πŸ“…
Jul 01, 2026
2d ago
πŸ†”66030929
⭐0.30

Take Fable 5 for a spin in Cursor:

@cursor_ai β€’ Wed Jul 01 19:33

Claude Fable 5 is available again in Cursor. It leads all models on CursorBench, but is the most expensive per task.

V
victormustar
@victormustar
πŸ“…
Jun 22, 2026
12d ago
πŸ†”94349791
⭐0.38

LTX-2 trainer is a huge deal. I tried it to add water to the podracers scene using their demo water-sim fine-tune (it's a fine-tune whose only goal is to add water :D) I think we're going to see film productions that don't use general models. They'll train their own fine-tunes, built for exactly what they need and consistent between shots for long content.

I
ivanleomk
@ivanleomk
πŸ“…
Jun 26, 2026
7d ago
πŸ†”15331319
⭐0.30

Easily the biggest unlock for vibe coding 1

@GoogleAIStudio β€’ Fri Jun 26 20:42

describing an aesthetic in a prompt can be tough, so we made a button for it introducing Design Variations instantly generate, explore, and apply beautiful new UI layouts with a single click try it today in AI Studio https://t.co/cVnR4hjJZe https://t.co/JEyuImiWcP

H
HelloSurgeAI
@HelloSurgeAI
πŸ“…
Jul 01, 2026
3d ago
πŸ†”46675102
⭐0.46

Deeper Instructions, Stronger Generalization: Training on ComplexConstraints Given the chance, a model will reward hack however it can: finding the laziest path that satisfies a grader, whether or not that path reflects what you actually wanted. If the grader can be satisfied by a surface trick, that trick is what the model learns. Most instruction-following benchmarks are full of surface tricks. "Stay under 300 words," "avoid commas", a model can satisfy those by scanning the output text, without understanding the task at all. ComplexConstraints, our frontier instruction-following benchmark, is built so there's no lazy path: its constraints fire only under certain conditions, depend on the outputs of earlier steps, require planning ahead, and are often left unstated. You can't satisfy "don't assign anyone with a religious dietary restriction to pork prep" by pattern-matching. You have to understand who's who and reason through many interdependent requirements at once. We post-trained Qwen3-4B on 1,000 of these tasks, using expert-written rubrics directly as the RL reward. The results: β†’ +15.5pp on the held-out set, reaching parity with a model 60x larger β†’ the gains transferred to two external benchmarks the model never trained on: +8.4pp on Meta's AdvancedIF and +10.1pp on MultiChallenge β†’ the largest gains landed on multi-turn abilities, even though every training example was single-turn Think about that last result. When the only way to score is to actually track many interdependent requirements, the model learns that skill rather than a shortcut, and the skill is the same whether the requirements arrive in one complex prompt or accumulate over nine turns. So it showed up on tasks the model was never trained on. A reward signal is only as good as the thought behind it, and not all rubrics are created the same. Research Blog: https://t.co/bUJPcoNFrX Research Paper: https://t.co/zQxE0TN260

S
silverbottlep
@silverbottlep
πŸ“…
Jul 03, 2026
1d ago
πŸ†”00281454

Accepted to #ECCV2026! πŸŽ‰ We've also released the code, it should work like a charm. If it doesn't, feel free to poke @roodiiiiiiiii πŸ˜„ https://t.co/t5M0J7S1GR

@_akhaliq β€’ Tue Mar 24 18:11

Group3D MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection paper: https://t.co/8NVynfAm2u https://t.co/RzkYdEKhRk

Media 1
πŸ–ΌοΈ Media
N
nickarner
@nickarner
πŸ“…
Jun 30, 2026
4d ago
πŸ†”61711489

Got the model converted to CoreML and working on iOS; will open source soon! https://t.co/6xo8VetVGT

@ndstudio β€’ Mon Jun 29 16:55

Today, we are releasing Rampart: a 14.7MB machine learning model designed to protect citizens’ privacy by redacting personal information directly in your browser before it gets sent to any server

πŸ–ΌοΈ Media