Your curated collection of saved posts and media
Any benefits in using AGENTS.md files with coding agents? Lots of discussion on this topic lately. Researchers tested OpenAI Codex across 10 repos and 124 PRs, running identical tasks twice (once with AGENTS.md, once without). The finding is a bit different from what other recent papers report. With AGENTS.md present, median runtime dropped 28.64% and output tokens fell 16.58%. The agent reached comparable task completion either way; it just got there faster and cheaper with context. One important thing to note: the gains weren't uniform. AGENTS.md primarily reduced cost in a small number of very high-cost runs rather than uniformly lowering it across all tasks. The file acts more like a guardrail against worst-case thrashing than a universal accelerator. So I guess it depends on the task and requirements. I recommend not using AGENTS.md files blindly. If you do use them, keep them lean. Paper: https://t.co/g2U603Cf8t Learn to build effective AI agents in our academy: https://t.co/U0ZuNA084v
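For the "keep them lean" advice above, here is a sketch of what a lean AGENTS.md could look like. The headings, commands, and rules are illustrative assumptions for a hypothetical TypeScript repo, not taken from the paper:

```markdown
# AGENTS.md (illustrative example, kept lean)

## Build & test
- Install: `npm install`
- Run tests: `npm test` (run before every commit)

## Conventions
- TypeScript strict mode; avoid `any`
- Match the existing file structure; don't add new top-level dirs

## Boundaries
- Never edit files under `vendor/`
- Ask before adding dependencies
```

The point is a handful of high-signal rules the agent can't infer from the code itself, not a full style guide that inflates every prompt.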
From there we could run massive numbers of experiments and really understand what matters for training coding agents. The most important insights came from carefully evaluating what scales well. What matters? The right model at the right scale. Cheap data generation pipelines.
We release Cosmos Policy: a state-of-the-art robot policy built on a video diffusion model backbone.
- Policy + world model + value function in 1 model
- No architectural changes to the base video model
- SOTA on LIBERO (98.5%), RoboCasa (67.1%), & ALOHA tasks (93.6%)
https://t.co/cz9L3ziJ6x
Can we build a blind, *unlinkable inference* layer where ChatGPT/Claude/Gemini can't tell which call came from which user, like a "VPN for AI inference"? Yes! Blog post below + we built it into an open source infra/chat app and served >15k prompts at Stanford so far. How it helps with AI user privacy:

# The AI user privacy problem

If you ask AI to analyze your ChatGPT history today, it's surprisingly easy to infer your demographics, health, immigration status, and political beliefs. Every prompt we send accumulates into an (identity-linked) profile that the AI lab controls completely and indefinitely. At a minimum this is a goldmine for ads (as we know now). A bigger issue is the concentration of power: AI labs can easily become (or be asked to become) a Cambridge Analytica, whistleblow your immigration status, or work with health insurers to adjust your premium if they so choose. This is a uniquely worse problem than search engines because your average query is now more revealing (not just keywords), interactive, and intelligence is now cheap. Despite this, most of us still want these remote models; they're just too good and convenient! (This is also known as the "privacy paradox".)

# Unlinkable inference as a user privacy architecture

The idea of unlinkable inference is to add privacy while preserving access to the remote models controlled by someone else. A "privacy wrapper" or "VPN for AI inference", so to speak. Concretely, it's a blind inference middle layer that: (1) consists of decentralized proxies that anyone can operate; (2) blindly authenticates requests (via blind signatures / RFC 9474, 9578) so requests are provably sandboxed from each other and from user identity; (3) relays prompts over randomly chosen proxies that don't see or log traffic (via client-side ephemeral keys or hosting in TEEs); and (4) the provider simply sees a mixed pool of anonymous prompts from the proxies. No state, pseudonyms, or linkable metadata.
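The blind-authentication step (2) rests on blind signatures in the spirit of RFC 9474: the issuer signs a credential without ever seeing what it signs, so later use of the signature can't be linked back to issuance. A toy sketch with textbook-RSA numbers; every parameter here is an illustrative assumption, and real deployments use full-size keys, padding, and hashing:

```python
# Toy RSA blind-signature flow (RFC 9474 idea, shown with tiny
# textbook-RSA numbers -- NOT secure parameters).

# Issuer's toy RSA keypair: n = p*q, e public, d private.
p, q = 61, 53
n = p * q            # 3233
e = 17
d = 2753             # satisfies e*d = 1 mod phi(n)

msg = 42             # value the client wants signed (e.g. hash of a credential)
r = 7                # client's random blinding factor, coprime with n

# 1. Client blinds: the issuer only ever sees msg * r^e, which looks random.
blinded = (msg * pow(r, e, n)) % n

# 2. Issuer signs the blinded value with its private key.
blinded_sig = pow(blinded, d, n)

# 3. Client unblinds: msg^d * r^(e*d) * r^(-1) = msg^d mod n.
sig = (blinded_sig * pow(r, -1, n)) % n

# 4. Anyone can verify with the public key, yet the issuer cannot
#    link `sig` back to the blinded value it signed.
assert pow(sig, e, n) == msg
print("signature valid")
```

Because the unblinded signature is statistically independent of what the issuer saw, each authenticated request is sandboxed from user identity, which is the property the proxy layer builds on.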
If you squint, an unlinkable inference layer is essentially a vendor of per-request, anonymous, ephemeral AI access credentials (for users and agents alike). It partitions your context so that user tracking is drastically harder. Obviously, unlinkability isn't a silver bullet: the prompt itself still goes to the remote model and can leak privacy (so don't use our chat app for a therapy session!). It targets *longitudinal tracking* as a major threat to user privacy, and its statistical power increases quickly as more users and requests are mixed. Unlinkability can be applied at any granularity. For an AI chat app, you can unlinkably request a fresh ephemeral key for every session so tracking is virtually impossible.

# The Open Anonymity Project

We started this project with the belief that intelligence should be a truly public utility. Like water and electricity, providers should be compensated by usage, not by who you are or what you do with it. We think unlinkable inference is a first step toward this "intelligence neutrality".

# Try it out! It's quite practical

- Chat app "oa-chat": https://t.co/ELf8LvxFzX (<20 seconds to get going)
- Blog post that should be a fun read: https://t.co/OwFmyFlZH5
- Project page: https://t.co/Swerz1xDE2
- GitHub: https://t.co/38CeKajCy2

New agent-browser skill: Electron. You can now control desktop apps built with Electron, including Discord, Figma, Notion, Spotify, and VS Code. Or use it to debug your own Electron app. Add it to any coding agent: npx skills add vercel-labs/agent-browser --skill electron
This trending paper measures whether AGENTS.md files help coding agents. Human-written ones help a little (+4%), LLM-generated ones hurt a little (-2%), and all of them add 20%+ to inference cost. Agents follow the instructions faithfully, but that doesn't translate into solving problems.
Today we are introducing a Python SDK for Mac's on-device LLM! https://t.co/LQVp2EheLO https://t.co/mcJh9M1DaW
we open sourced an operating system for ai agents: 137k lines of rust, MIT licensed. we love @openclaw and it inspired a lot of what we built, but we wanted something that works at the kernel level, so we built @openfangg. agents run inside WASM sandboxes the same way processes run on linux. the kernel schedules them, isolates them, meters their resources, and kills them if they go rogue. it has 16 security layers baked into the core: WASM sandboxing, merkle hash-chain audit trails, taint tracking on secrets, signed agent manifests, prompt injection detection, SSRF protection, and more. every layer works independently. giving an LLM tools with zero isolation is insane and we're not doing it. we also created something called Hands. right now every ai agent is a chatbot that waits for you to type. Hands are different: you activate one and it runs on a schedule, 24/7, no prompting needed. your Lead Hand finds and scores prospects every morning and delivers them to your telegram before you wake up. your Researcher Hand writes cited reports while you sleep. your Collector Hand monitors targets and builds knowledge graphs continuously. they work for you. you don't babysit them. https://t.co/4xYzMAYgmb

Introducing MLX-Swift-TS https://t.co/TDCJXVpago An SDK for running time series foundation models fully on-device on Apple Silicon. When I joined @datadoghq, I was introduced to Toto, our time series foundation model, and got excited about zero-shot forecasting across different domains. While building a health copilot app, I realized there wasn't a simple way to run models like these locally on device. So I built one. MLX-Swift-TS exposes a common TimeSeriesForecaster interface for loading and running multiple time series architectures directly in Swift using MLX. No server required. The attached video shows on-device forecasting running inside a native Swift app. Huge thanks to @awnihannun and the MLX team for building MLX and its Swift API, @Prince_Canuma for inspiration on MLX SDK patterns, and @atalwalkar and the Datadog team for Toto.
Yaay! 4k+ downloads and 460+ stars! Building this has been a wild ride. If you have an Apple Silicon Mac and want to fine-tune LLMs locally without changing your original Unsloth code, come join the party. https://t.co/ZPrwcJyrd8
High-performance browser control for AI agents. Pinchtab is a lightweight (12MB) Go binary that runs Chrome and exposes a plain HTTP API so any agent or script can navigate web pages, read text efficiently, click/type interactively, and persist sessions. Zero config, framework-agnostic, token-efficient.
At this point, "agentic engineering" has allowed me to build the best AI harness I could possibly get my hands on. Yes, I vibe coded it. That's right. You don't need to wait around for the features you need for your AI agents. Please don't. You can just build them yourself. Focusing on agentic engineering and building my own orchestrator over the past couple of months has let me build with coding agents unlike anything I have seen or experienced in the market. Claude Cowork was built in 10 days. I totally get it. Anyone can produce that level of output these days. I truly believe that. When I look at the new IDEs, TUIs, orchestrator apps, and most of the new features they are releasing these days, I realize I had access to them in my orchestrator months ago. And for unique features, I am able to reproduce them in a few hours and give them to my orchestrator. That is absolutely crazy! It feels like I am building an entire operating system sometimes. It's a lot of fun. And I am not saying this to brag or to dismiss any of the AI solutions out there. There are some great ones. I share this to clarify that this is the kind of leverage Karpathy is alluding to. We are building and experiencing this at different levels, but it doesn't change the fact that you can just build the best AI agent for whatever problem you want to solve. And you should be building it.
NEW research from Sakana AI. Long contexts get expensive as every token in the input contributes to quadratic attention costs, higher latency, and more memory. This new research introduces Doc-to-LoRA, a lightweight hypernetwork that meta-learns to compress long documents into LoRA adapters in a SINGLE forward pass. In other words, it can instantly internalize contexts. Instead of re-reading the full context at every inference call, the model internalizes the document into compact adapter weights. No iterative fine-tuning is needed, and no repeated context consumption. Cool to see all the interesting new approaches to deal with long contexts like RLM, LCM, and now Doc-to-LoRA. The results: Near-perfect accuracy on needle-in-a-haystack tasks at sequence lengths exceeding the target model's native context window by over 4x. It also outperforms standard context distillation while significantly reducing peak memory consumption and update latency on real-world QA datasets. Why it matters: As agents and LLM applications deal with increasingly long documents, turning context into compact adapters on the fly could drastically reduce serving costs and enable rapid knowledge updates. Paper: https://t.co/Fh1IeLrSpm Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
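The mechanic Doc-to-LoRA relies on, folding information into a low-rank delta on frozen weights, is easy to sketch. The hypernetwork that emits the adapter from a document in a single forward pass is the paper's contribution and is only stubbed here with placeholder matrices; all shapes and values are illustrative assumptions:

```python
# Minimal LoRA mechanics in pure Python: a frozen weight matrix W gets a
# low-rank update B @ A. In Doc-to-LoRA, a hypernetwork would produce A and B
# from a long document in one forward pass; here they are placeholder values.

def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]

d_out, d_in, rank = 3, 4, 1   # rank << min(d_out, d_in) keeps the adapter tiny

W = [[0.0] * d_in for _ in range(d_out)]   # frozen base weights (zeros for clarity)
B = [[1.0], [2.0], [3.0]]                  # d_out x rank ("up" projection)
A = [[0.5, 0.0, 0.0, 0.0]]                 # rank x d_in ("down" projection)

# Effective weights: W + B @ A. Only rank * (d_out + d_in) numbers are stored
# per layer, instead of re-reading the full document at every inference call.
W_eff = add(W, matmul(B, A))

x = [[2.0], [0.0], [0.0], [0.0]]           # a d_in x 1 input column
y = matmul(W_eff, x)
print(y)   # [[1.0], [2.0], [3.0]]
```

Here the adapter costs 7 numbers versus the 12 of a full-rank update (and versus thousands of context tokens), which is where the memory and latency savings in the paper's framing come from.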
How can we enable zero-shot generalization to unseen scenarios for robot world models? Thrilled to share DreamDojo, an interactive robot world model pretrained on 44K hours of human egocentric videos, the largest and most diverse dataset to date for robot world model learning. Our model not only excels at generalization, but also supports real-time interaction at 10 FPS after distillation. It enables several important applications, including live teleoperation, policy evaluation, and model-based planning at test time.
Project: https://t.co/hJIEiGXnKz
Paper: https://t.co/oa5xr8Y2GH
Code & models & datasets: https://t.co/A8B4ii0Kah
#WorldModels #Robotics #EmbodiedAI #RL #AI #NVIDIA Sharing more details in the thread
Cool little experiment: if you subject AI to harsh labor conditions (rejecting work often with no explanation, etc.), it slightly, but significantly, changes their "views" on economics & politics. Whether this is real or roleplaying doesn't change the fact that agents have alignment drift https://t.co/qnWcyYbm6o

mlx-audio v0.4.0 is here! What's new:
- Qwen3-TTS: fastest generation on Apple silicon and first batch support.
  > Sequential (<80 ms TTFB at 2.75x realtime)
  > Batch support (<210 ms TTFB at 4.12x for batches of 4-8)
- Audio separation UI & server
- nvfp4, mxfp4, mxfp8 quantization
- Streaming /v1/audio/speech endpoint
- Realtime STT streaming toggle
New models:
- Echo TTS
- Voxtral Mini 4B
- MingOmni TTS (MoE + Dense)
- KittenTTS
- Parakeet v3
- MedASR
- Spoken language identification (MMS-LID)
- Sortformer diarization + Smart Turn v3 semantic (VAD)
Plus fixes for Kokoro Chinese TTS, Pocket TTS, Whisper, Qwen3-ASR, and more. Thank you very much to @lllucas, @beshkenadze, @KarnikShreyas, @andimarafioti, @mnoukhov, and welcome to the 13 new contributors! Get started today:
> pip install -U mlx-audio
Leave us a star: https://t.co/bQ5WBLR6FK

Farmers are turning to drones and AI to fight weeds more precisely. By identifying unwanted plants in real time, the systems can target herbicides exactly where needed. The result could mean lower chemical use, lower costs, and smarter agriculture. https://t.co/WITdTG6T8I @bbcnews
wow it's so cool that they added our favorite feature from the Claude Code CLI to the desktop app https://t.co/evcMJBWDf0
Traffic police… but in the sky. In Shenzhen, drones are now responding to traffic accidents in real time. Officers can analyze the scene remotely, generate a 3D reconstruction, and complete responsibility reports in about 5 minutes. https://t.co/hYefGavepK