Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
M
MiaAI_lab
@MiaAI_lab
📅
Jul 01, 2026
3d ago
🆔86414623

I'm going to try the new @NVIDIAAI Nemotron-3-Nano-30B-A3B and compare it to Qwen 3.6 35B in agentic workflows. https://t.co/z9cnRBOo1c

C
code
@code
📅
Jun 25, 2026
9d ago
🆔14424638

🛠️ Agent Customization: Customize AI workflows with agents, instructions, skills, prompts, and hooks. 🔗 https://t.co/ag5zffSLjd https://t.co/NSP4H9DwYj

B
benln
@benln
📅
Jul 01, 2026
3d ago
🆔66030929
⭐0.30

Take Fable 5 for a spin in Cursor:

@cursor_ai • Wed Jul 01 19:33

Claude Fable 5 is available again in Cursor. It leads all models on CursorBench, but is the most expensive per task.

C
cognition
@cognition
📅
Jul 01, 2026
3d ago
🆔82432109
⭐0.32

Introducing Devin Security Swarm: a more cost-effective and accurate way to find security vulnerabilities in complex codebases, based on a new architecture, Agentic MapReduce.

R
random_walker
@random_walker
📅
Jul 01, 2026
3d ago
🆔69719374

📢 1) We have a few papers that advance the state of the art of AI agent evaluation. Details and links in Stephan's post.
2) AI agent evaluation has quickly become a distinct discipline. We're working on a paper titled "Emerging trends in AI agent evaluation" that extracts best practices for this community.
3) I'm giving an invited talk at ICML, addressing anxiety about supposedly imminent Recursive Self Improvement and the question of what will remain for humans to work on (especially scientists, researchers, software engineers). I hope to make it provocative but cautiously optimistic. https://t.co/rYHlxPGEXY (I also plan to share the ideas from the talk as essays on the AI as Normal Technology newsletter.)

@steverab • Wed Jul 01 13:42

📣 I'll be in Seoul next week to present one main conference paper and four workshop papers at ICML! I'll also be on a panel at the https://t.co/D3wwI18H7o alignment workshop! Reach out if you are around and want to chat about uncertainty, reliability, or AI evals! 😊 Details ⬇️ 📄P

E
emollick
@emollick
📅
Jun 28, 2026
6d ago
🆔47036858
⭐0.38

It is worth being very, very careful about how you are approaching routing, especially when the systems are primarily tested on verifiable IT benchmarks, which may lead you to overestimate the ability of weaker models.

H
HelloSurgeAI
@HelloSurgeAI
📅
Jul 01, 2026
3d ago
🆔46675102
⭐0.46

Deeper Instructions, Stronger Generalization: Training on ComplexConstraints
Given the chance, a model will reward hack however it can: finding the laziest path that satisfies a grader, whether or not that path reflects what you actually wanted. If the grader can be satisfied by a surface trick, that trick is what the model learns.
Most instruction-following benchmarks are full of surface tricks. "Stay under 300 words," "avoid commas": a model can satisfy those by scanning the output text, without understanding the task at all. ComplexConstraints, our frontier instruction-following benchmark, is built so there's no lazy path: its constraints fire only under certain conditions, depend on the outputs of earlier steps, require planning ahead, and are often left unstated. You can't satisfy "don't assign anyone with a religious dietary restriction to pork prep" by pattern-matching. You have to understand who's who and reason through many interdependent requirements at once.
We post-trained Qwen3-4B on 1,000 of these tasks, using expert-written rubrics directly as the RL reward. The results:
→ +15.5pp on the held-out set, reaching parity with a model 60x larger
→ the gains transferred to two external benchmarks the model never trained on: +8.4pp on Meta's AdvancedIF and +10.1pp on MultiChallenge
→ the largest gains landed on multi-turn abilities, even though every training example was single-turn
Think about that last result. When the only way to score is to actually track many interdependent requirements, the model learns that skill rather than a shortcut, and the skill is the same whether the requirements arrive in one complex prompt or accumulate over nine turns. So it showed up on tasks the model was never trained on.
A reward signal is only as good as the thought behind it, and not all rubrics are created the same.
Research Blog: https://t.co/bUJPcoNFrX
Research Paper: https://t.co/zQxE0TN260

R
rseroter
@rseroter
📅
Jun 22, 2026
12d ago
🆔90446193

"That is the difference between using a coding agent and engineering an autonomous coding system. One gives you a conversation. The other gives you a harness." https://t.co/47NWbraF3G < I liked the descriptions and visuals from @omarsar0 here. Very understandable! https://t.co/nIthf99EMB

P
PyTorch
@PyTorch
📅
Jun 25, 2026
9d ago
🆔92803948

One runtime, multiple GPU architectures, and zero vendor-specific model code. In this blog post, the TokenSpeed team @lightseekorg introduces TokenSpeed-Kernel, a portable, high-performance kernel system built for modern LLM inference. Using GPT-OSS 120B as a case study, they show how specialized kernels for @AIatAMD and @NVIDIAAI GPUs can seamlessly coexist behind a common API. This unified approach delivers up to 3.6x higher throughput on the AMD MI355X, all without requiring any changes to the underlying model logic. Link to blog in comments section 👇

A
ArmSoftwareDev
@ArmSoftwareDev
📅
Jun 29, 2026
5d ago
🆔99116750

Deploying AI models at the edge comes with a different set of challenges. These hands-on Jupyter labs walk you through using ExecuTorch to deploy and optimize @PyTorch models on Arm CPUs and NPUs, with examples you can run on hardware including Raspberry Pi. https://t.co/mJv4hbYFUZ

S
StabilityAI
@StabilityAI
📅
Jul 01, 2026
2d ago
🆔27668193

Most AI audio models have never heard a maqam. Team Motif fine-tuned Stable Audio 3.0 on Arabic maqam, built an Ableton plugin for microtonal style transfer, and won our Stable Audio 3.0 Challenge at Music Hackspace running locally on device. Watch Jad Al Masri break it down 👇

🔁 arankomatsuzaki retweeted
O
Overworld
@overworld_ai
📅
Jul 01, 2026
3d ago
🆔51968222
⭐0.38

The Waypoint-1.5 technical paper is now live. Waypoint-1.5 is a real-time video diffusion world model designed to run on consumer GPUs, bringing interactive world models closer to practical, accessible deployment. https://t.co/U04x1YEwhF

❀️166
likes
πŸ”19
retweets
G
GregKamradt
@GregKamradt
📅
Jun 30, 2026
3d ago
🆔80347586

.@tufalabs just open sourced their 1st place notebook 👀 https://t.co/tLs8aNmJ7P

A
askalphaxiv
@askalphaxiv
📅
Jun 27, 2026
7d ago
🆔45859097

Sakana Fugu Technical Report: Instead of training one larger model, Sakana AI trains an orchestrator that reads each query and dynamically routes or composes GPT-5.5, Gemini-3.1-Pro, Claude Opus 4.8 and other agents into query-specific workflows. Fugu is the fast router and Fugu-Ultra the deep multi-agent conductor, trained with SFT, evolutionary strategies and GRPO to build adaptive scaffolds. The idea is to have the model pick GPT for math, Gemini for science and recall, Opus for debugging, then synthesize them when no single agent is best. This router gets SoTA results across SWE-Bench Pro, Terminal Bench, LiveCodeBench, GPQA-Diamond, CharXiv and more, demonstrating the potential of orchestration as a practical alternative to training a larger model.

S
stash_pomichter
@stash_pomichter
📅
Jul 01, 2026
3d ago
🆔55007340

Announcing the first production robot navigation framework on $500 hardware
Explore the world once → your robot agent will relocalize and build a persistent spatial memory across sessions
SLAM, relocalization, loop closure, map i/o, planning, control
No ROS. Open source. https://t.co/VCk9GvOrrM

O
OpenAI
@OpenAI
📅
Jun 30, 2026
4d ago
🆔74167294
⭐0.44

We’re introducing GeneBench-Pro, a research-level benchmark for a harder kind of AI progress: how well agents can navigate messy biological data, choose the right analysis path, and make judgment calls that real computational research depends on. https://t.co/AsilnnSxnE

K
karpathy
@karpathy
📅
Jun 30, 2026
3d ago
🆔43921550
⭐0.40

@Etched Congrats!! I was impressed to learn about some of the engineering wizardry (e.g. *very* low voltage domains, cluster scale memory, ...) that goes into tokens/watt maxxing of state of the art LLMs at interactive tokens/sec/user. Esp fun and memorable is the idea that this is engineering at the "opposite" regime to that of power transmission lines: very low voltage high current (at tiny distances) vs. very high voltage & low current (at great distances). Looking forward to more!

T
tri_dao
@tri_dao
📅
Jun 30, 2026
4d ago
🆔81891525
⭐0.36

If you ever wondered about how open/closed model makers and inference providers make economic sense, this is the piece to read

@vipulved • Mon Jun 29 01:26

https://t.co/TIeuZQUj5D

N
NielsRogge
@NielsRogge
📅
Jun 22, 2026
12d ago
🆔63442218

New benchmark added to Papers with Code based on @giffmana's Schmidhubering 🫡 Check the SOTA for semi-supervised ImageNet (using 10% of the labels) here https://t.co/CXd4lLkhlG https://t.co/sGi68AIoqh

@giffmana • Mon Jun 15 08:21

LLM community slowly rediscovering what we in vision found out over half a decade ago. MY SCHMIDHUBER MOMENT IS COMING! Source: S4L paper where i tuned the most sota 10% and 1% ImageNet baselines ever, by far. https://t.co/Cj10TYvpOP https://t.co/c1yNYFEXHk

🔁 GaryMarcus retweeted
D
Dongyang Fan
@dyfan22
📅
Jun 23, 2026
11d ago
🆔95438532
⭐0.32

HalluHard update: We’ve added GLM-5.2, using adaptive thinking with maximum reasoning effort, to our leaderboard. Despite its impressive performance on other benchmarks, GLM-5.2 still hallucinates frequently on our challenging multiturn benchmark. https://t.co/xbppFeo7Pd

❀️139
likes
πŸ”13
retweets
C
cedric_chee
@cedric_chee
📅
Jun 28, 2026
6d ago
🆔10740999

DeepSeek preparing release of DSpark, DFlash and Eagle draft models for Qwen3 and Gemma-4 variants https://t.co/2zdfL9XAkQ

N
nickarner
@nickarner
📅
Jun 30, 2026
4d ago
🆔61711489

Got the model converted to CoreML and working on iOS; will open source soon! https://t.co/6xo8VetVGT

@ndstudio • Mon Jun 29 16:55

Today, we are releasing Rampart: a 14.7MB machine learning model designed to protect citizens’ privacy by redacting personal information directly in your browser before it gets sent to any server

D
DrJimFan
@DrJimFan
📅
Jun 30, 2026
4d ago
🆔56212902

Today, we give robots a /skills library that self-evolves and compounds indefinitely! Introducing ASPIRE: a robot solving its 100th task is no longer as clueless as solving its first. Coding agents observe multimodal sensory traces from simulation and real robots, launch an evolutionary search over control programs, and distill the best know-how into an ever-expanding library.
ASPIRE is a new type of continual learning: "training" is skill refinement instead of gradient descent. "Trained model" is a repo of sensorimotor skills instead of floating weights. "Distributed training" is a panel of agents each practicing a different skill instead of sharded minibatches.
Here's the beauty: ASPIRE gives the tired terms "sim2real transfer" and "cross-embodiment transfer" a whole new meaning. Bridging the sim-to-real gap is notoriously brutal. An end-to-end policy has to swallow both the visual shift (sim looks toyish next to a real camera) and the subtle contact physics it never quite gets right. ASPIRE sidesteps the mess, because it doesn't ship pixels or weights across the gap, but ships the know-how. The robot still has to practice in the real world, not zero-shot, but it gets there way faster because it isn't rediscovering the strategy from scratch. Same for going single-arm to bimanual hardware, which usually requires new data and retraining from zero. ASPIRE achieves up to ~10x cut in "transfer learning" tokens (yes, tokens are the new unit of *training* compute ;)
Check out our gallery of 150+ tasks and 90+ skills the robots taught themselves, all on the website! Kind of wild that we can ship the "learned weights" as an HTML page rather than a GGUF. We'll open-source the full stack so your own robot library starts compounding from ours! Deep dive in thread:

B
BoWang87
@BoWang87
📅
Jun 30, 2026
4d ago
🆔58254332
⭐0.42

Our team at Xaira was fortunate to have early access to test Claude Science (Operon). 🔥🚀 We used it to add agentic loops to both virtual cell modeling and protein design workflows. A nice plus: Operon had already added our scGPT as one of the default skills for single-cell analysis 🙏😎🔥 This is the kind of product that actually understands how research works, not just chat with a model, but traceable artifacts, reproducible environments, and real scientific data connections. That's a big deal for computational biology.

@claudeai • Tue Jun 30 17:02

Introducing Claude Science, a new app designed with every stage of research in mind. Artifacts traced to their code, environments managed on demand, and 60+ optional scientific databases that you can connect. Available now in beta. https://t.co/HKhLknxLJO

R
randall_balestr
@randall_balestr
📅
Jun 30, 2026
3d ago
🆔48630648

Can regularization based JEPA (e.g. SIGReg) scale and compete with SOTA foundation models (DINO)? Here is the answer: yes and with 10x less data. VISReg (slight variation of SIGReg) competes with DINOv2-LVD142M while only training on inet22k. Try it out: https://t.co/vBhrNAmFq6 https://t.co/XERFZEAE8t

@HaiyuWu1 • Sat Jun 27 13:47

Working on world model or SSL? You definitely need to try our new work: VISReg! What does it achieve?
💪 Strong collapse prevention: High gradient when embedding collapse
⚡ Friendly to scale training: Linear complexity to scaling factors
🧩 Easy to train: Similar to LeJEPA, it is

A
angelaqdai
@angelaqdai
📅
Jul 04, 2026
2h ago
🆔68872759

📢 WorldMesh is accepted to #ECCV2026, and we're releasing the code today! 🎉 Led by @mschneider456: navigable, multi-room 3D scenes from a text prompt, with a mesh scaffold conditioning image diffusion for global consistency + photorealistic detail 👇 https://t.co/8fXCl2flIu https://t.co/Z1HkoO3s37

X
xenovacom
@xenovacom
📅
Jun 25, 2026
9d ago
🆔39707568

While we eagerly await Fable 5's return, our agentic WebGPU kernel optimization framework kept running. Opus 4.8 picked up where Fable left off, pushing Liquid AI's new LFM2.5 230M to an unbelievable 1,400 tok/s... running locally in your browser. Don't blink or you'll miss it. https://t.co/27WARZwTcD

@xenovacom • Wed Jun 17 16:54

Before Fable 5 was shut down, it pushed Gemma 4 to 255 tok/s on WebGPU. Some didn't believe it was real. Today we're releasing the demo and kernels it wrote for you to see yourself. Run it locally in your browser. Agentic kernel optimization is the future of on-device inference

U
UnslothAI
@UnslothAI
📅
Jun 23, 2026
11d ago
🆔75564484

1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5 We gave 3 models the same prompt and compared one-shot outputs. The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s. Which output do you like best? GGUF: https://t.co/BMkxswdj5N https://t.co/UoXsCSh4Gn

@UnslothAI • Thu Jun 18 12:40

GLM-5.2 can now be run locally! 🔥 The 2-bit model retains ~82% accuracy after we shrunk it from 1.51TB to 238GB (-84% size). Run on a 256GB Mac or RAM/VRAM setups. GLM-5.2 is the strongest open model to date. Guide: https://t.co/bI7FeeKHDd GGUF: https://t.co/BMkxswdj5N https:/

I
ivanleomk
@ivanleomk
📅
Jun 26, 2026
7d ago
🆔15331319
⭐0.30

Easily the biggest unlock for vibe coding 1

@GoogleAIStudio • Fri Jun 26 20:42

describing an aesthetic in a prompt can be tough, so we made a button for it introducing Design Variations instantly generate, explore, and apply beautiful new UI layouts with a single click try it today in AI Studio https://t.co/cVnR4hjJZe https://t.co/JEyuImiWcP

V
victormustar
@victormustar
📅
Jun 24, 2026
10d ago
🆔26947290

At Hugging Face we've been building our own agent that we use via Slack (Moon Bot). Honestly, building your own is quite simple and you'll be happy you did: any model you want (self-hosted if needed), fully customizable to your stack (drop in a skill file and it can use any internal tool, codebase, or DB), your data never leaves your infra, every session auditable in your own bucket. and ofc no lock-in and no waitlist and not overpriced :). read more: https://t.co/7l1w3cib0M

@claudeai • Tue Jun 23 17:12

Introducing Claude Tag, a new way for teams to work with Claude. In Slack, Claude joins as a team member with access to the channels and tools you choose. Tag Claude in and delegate tasks to it while you focus on other work. https://t.co/R2C6A5Kcye

N
NVIDIAAI
@NVIDIAAI
📅
Jun 22, 2026
12d ago
🆔06628287

3D scene reconstruction works great until the camera never sees part of the scene. ArtiFixer from NVIDIA Research is an open autoregressive model that fills in the missing geometry that other methods leave blank. #SIGGRAPH2026 paper, code + demo: https://t.co/D9PX2OzbZf https://t.co/AGQicvVKkW

J
jerryjliu0
@jerryjliu0
📅
Jun 27, 2026
7d ago
🆔38758217

LiteParse is unreasonably good for document parsing
✅ It is the fastest document parsing tool out there - average parse time per page is 3ms ⚡️⚡️
✅ Now that we support markdown, it tops opendataloader-bench, OlmOCR-bench, and ParseBench in terms of accuracy
✅ It supports 50+ other document formats
✅ It even gives you basic bounding boxes that your coding agent can stitch together
Even if you need deeper VLM-enabled parsing (e.g. LlamaParse), there's no reason you shouldn't be using this as a first pass for everything. https://t.co/JNER0mVcB8

@llama_index • Thu Jun 25 16:26

We built LiteParse, the fastest document parsing solution on the planet and made it open source. And it just hit 10k github stars. 🦙 Fast to run. Fast to love. Thanks for building with us. If you haven't tried it already, repo at: https://t.co/wXRxvlREQq https://t.co/Shv0J1CRO
