Your curated collection of saved posts and media

Showing 10 posts · last 14 days · by score
iScienceLuvr
@iScienceLuvr
📅 Apr 13, 2026
11d ago
🆔 67650808

Efficient RL Training for LLMs with Experience Replay "Empirically, we show that a well-designed replay buffer can drastically reduce inference compute without degrading – and in some cases even improving – final model performance, while preserving policy entropy." https://t.co/8KeFNPQ4mK

🖼️ Media
LeeChiJung
@LeeChiJung
📅 Apr 12, 2026
11d ago
🆔 79538485

No cameras. No extra sensors. Your smartwatch already has everything it needs to track your hand. ⌚️✌🏻 Monday at #CHI2026, @jiwan_hci and I are presenting WatchHand, a continuous 3D hand pose tracking system that uses just the speaker and mic in your smartwatch. https://t.co/8bXMI2Mux4

πŸ–ΌοΈ Media
fchollet
@fchollet
📅 Apr 15, 2026
9d ago
🆔 18310832
⭐0.40

ARC-AGI-3 has the lowest human bar of any AI benchmark out there. Almost all benchmarks require specialized knowledge that makes them inaccessible to 99%+ of humans (like, say, SWE-Bench). ARC-AGI-3 is feasible for regular people.

fchollet
@fchollet
📅 Apr 15, 2026
9d ago
🆔 58066554
⭐0.32

Any smart human giving it real effort should score >90% on ARC-AGI-3

WenhuChen
@WenhuChen
📅 Apr 10, 2026
14d ago
🆔 28096939

Super excited to share our ClawBench to test real-world tasks. Check out our website at https://t.co/qkW3LJA77b

@arankomatsuzaki • Fri Apr 10 03:18

ClawBench: Can AI Agents Complete Everyday Online Tasks? A real-world benchmark for AI agents: 153 everyday online tasks across live websites (shopping, booking, job apps). Even top models struggle, dropping from ~70% on sandbox benchmarks to as low as 6.5% here. https://t.co/A

🖼️ Media
arcprize
@arcprize
📅 Apr 14, 2026
9d ago
🆔 87056294

ARC-AGI-3 Human Baseline Dataset: Today we're open-sourcing the ARC-AGI-3 Human Baseline. This is the most exhaustive human testing study in the ARC-AGI series. Every environment was solved by at least 2 people (many by more) from the general public, with no prior training. https://t.co/yk1QBrHWln

πŸ–ΌοΈ Media
GoogleAI
@GoogleAI
📅 Apr 10, 2026
13d ago
🆔 08378257
⭐0.38

A prototype that turns everyday life into something like an adventure game. It’s built on a Pixel 10 Pro with Gemma 4 via AI Core. https://t.co/AnBZ7GeS5F

@GOROman • Mon Apr 06 11:03

I built something where an AI watches the streets and shows messages like in a game. Local VLM, no internet connection needed. https://t.co/nlx5t8cc1H

πŸ”tri_dao retweeted
S
Monishwaran Maheswaran
@sudomonish
πŸ“…
Apr 10, 2026
13d ago
πŸ†”67173352
⭐0.32

Super excited to introduce our latest work: Squeeze Evolve. We unify test-time scaling methods into one evolutionary framework, then orchestrate many models across it. 3x lower cost. 10x throughput. 97.5% (SoTA) on ARC-AGI-V2. No verifier required. Framework: https://t.co/5hmOyZvSKU

❤️ 51 likes
🔁 12 retweets
rawat_ritvik
@rawat_ritvik
📅 Apr 15, 2026
8d ago
🆔 42603056
⭐0.36

We are looking for excellent people to help build our vertically integrated AI stack. Numerics, quantization, HW simulators, compiler, runtime, kernel performance, RTL, verification, emulation, DFT, physical design, post Si bringup. Join us at Tesla!

vercel_dev
@vercel_dev
📅 Apr 15, 2026
8d ago
🆔 73960733
⭐0.36

Use Vercel Sandbox with the OpenAI agents SDK as an official extension. Build agents that can run code, read files, and analyze data safely inside isolated microVMs. Control the compute and data flow from your secure cloud environment.

@OpenAIDevs • Wed Apr 15 17:23

Build long-running agents with more control over agent execution. New capabilities in the Agents SDK: • Run agents in controlled sandboxes • Inspect and customize the open-source harness • Control when memories are created and where they’re stored https://t.co/zPyuLup6b6