Efficient RL Training for LLMs with Experience Replay "Empirically, we show that a well-designed replay buffer can drastically reduce inference compute without degrading, and in some cases even improving, final model performance, while preserving policy entropy." https://t.co/8KeFNPQ4mK
No cameras. No extra sensors. Your smartwatch already has everything it needs to track your hand. Monday at #CHI2026, @jiwan_hci and I are presenting WatchHand, a continuous 3D hand pose tracking system that uses just the speaker and mic in your smartwatch. https://t.co/8bXMI2Mux4
ARC-AGI-3 has the lowest human bar of any AI benchmark out there. Almost all benchmarks require specialized knowledge that makes them inaccessible to 99%+ of humans (like, say, SWE-Bench). ARC-AGI-3 is feasible for regular people.
Any smart human giving it real effort should score >90% on ARC-AGI-3
Super excited to share our ClawBench to test real-world tasks. Check out our website at https://t.co/qkW3LJA77b
ClawBench: Can AI Agents Complete Everyday Online Tasks? A real-world benchmark for AI agents: 153 everyday online tasks across live websites (shopping, booking, job apps). Even top models struggle, dropping from ~70% on sandbox benchmarks to as low as 6.5% here. https://t.co/A
ARC-AGI-3 Human Baseline Dataset. Today we're open-sourcing the ARC-AGI-3 Human Baseline. This is the most exhaustive human testing study in the ARC-AGI series. Every environment was solved by at least 2 people (many by more) from the general public, with no prior training. https://t.co/yk1QBrHWln
A prototype that turns everyday life into something like an adventure game. It's built on a Pixel 10 Pro with Gemma 4 via AI Core. https://t.co/AnBZ7GeS5F
I built something where AI looks at the streets around you and displays messages like in a game. Local VLM (no internet connection required). https://t.co/nlx5t8cc1H
Super excited to introduce our latest work: Squeeze Evolve. We unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: https://t.co/5hmOyZvSKU
We are looking for excellent people to help build our vertically integrated AI stack. Numerics, quantization, HW simulators, compiler, runtime, kernel performance, RTL, verification, emulation, DFT, physical design, post Si bringup. Join us at Tesla!
Use Vercel Sandbox with the OpenAI Agents SDK as an official extension. Build agents that can run code, read files, and analyze data safely inside isolated microVMs. Control the compute and data flow from your secure cloud environment.
Build long-running agents with more control over agent execution. New capabilities in the Agents SDK: • Run agents in controlled sandboxes • Inspect and customize the open-source harness • Control when memories are created and where they're stored https://t.co/zPyuLup6b6