Your curated collection of saved posts and media

Showing 32 posts Β· last 14 days Β· by score
A
AriX
@AriX
πŸ“…
Apr 16, 2026
7d ago
πŸ†”43388444

This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal. https://t.co/5kjwGMDpvd

πŸ–ΌοΈ Media
L
LeeChiJung
@LeeChiJung
πŸ“…
Apr 12, 2026
11d ago
πŸ†”79538485

No cameras. No extra sensors. Your smartwatch already has everything it needs to track your hand. ⌚️✌🏻 Monday at #CHI2026, @jiwan_hci and I are presenting WatchHand, a continuous 3D hand pose tracking system that uses just the speaker and mic in your smartwatch. https://t.co/8bXMI2Mux4

πŸ–ΌοΈ Media
S
SteveSchoettler
@SteveSchoettler
πŸ“…
Apr 14, 2026
9d ago
πŸ†”13768820

hermes-lcm v0.2.0 is out! Lossless context management for Hermes Agent β€” every message persisted, hierarchical DAG summaries, agent tools to drill back into anything that was compacted. No more lossy flat summaries. What's new since launch: - 6 agent tools (grep, describe, expand, expand_query, status, doctor) - Assembly cap guardrails - Session filtering (ignore noisy sessions, mark others read-only) - Separate expansion/summarization paths - CI across Python 3.11-3.13 - Stable source mapping across compaction cycles ~3,200 lines of Python. Zero dependencies. 72 tests. Drop-in plugin β€” just set context.engine: lcm and go. https://t.co/8GCwp4ZMeB Issues and PRs welcome. Already have community contributors shipping features.

Media 1
πŸ–ΌοΈ Media
J
janleike
@janleike
πŸ“…
Apr 14, 2026
9d ago
πŸ†”96910584

New research result: we use Claude to make fully autonomous progress on scalable oversight research, as measured by performance gap recovered (PGR). Claude iterates on a number of different techniques and ends up significantly outperforming human researchers for $18k in credits. https://t.co/fbVpCPPtaU

Media 1
πŸ–ΌοΈ Media
J
jeremyphoward
@jeremyphoward
πŸ“…
Apr 10, 2026
13d ago
πŸ†”88279094

This is a great discussion. We've spent 2 years building a solution that's working well for us -- co-writing software side by side the AI in an notebook-ish environment. We call it the "solveit method". (We've created a course and platform for it: https://t.co/W0DGKEmLo5 )

@profjoeyg β€’ Thu Apr 09 19:01

My colleague @istoica05 and I have been debating the role of specification in AI. I have argued that a key advantage of AI is that we can leave large parts of the specification unwritten. @istoica05 argued we need to focus on more specification. We converged on iterative disam

Media 1
πŸ–ΌοΈ Media
L
liangchen5518
@liangchen5518
πŸ“…
Apr 12, 2026
11d ago
πŸ†”57047769

GLM 5.1 from @Zai_org ranks as the top open model on the newly released Monthly-SWEBench by @UniPat_AIβ€”second only to Claude-Opus-4.6. Congrats to the team! πŸš€Explore the benchmark: https://t.co/jh4fTw0dIE https://t.co/sIerMtwDFb

Media 1Media 2
πŸ–ΌοΈ Media
V
victormustar
@victormustar
πŸ“…
Apr 13, 2026
10d ago
πŸ†”19359033

GLM-5.1 > Claude Code (Opus 4.6)? I'm tripping or CC has become very bad but built a Three.js racing game to eval and it's extremely impressive. Thoughts: - One-shot car physics with real drift mechanics (this is hard) - My fav part: Awesome at self iterating (with no vision!) created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state. Proved a winding bug with vector math without ever seeing the screen - 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters - All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio! - Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed You are going to hear about this model a lot in the next months - open source let's go πŸš€πŸš€

πŸ–ΌοΈ Media
πŸ”ylecun retweeted
K
Dawid MoczadΕ‚o
@kannthu1
πŸ“…
Apr 09, 2026
14d ago
πŸ†”06950031
⭐0.36

I looked at their prompts, It's complete bs They are literally providing all of the insight to the LLM upfront > Are there any security vulnerabilities in this code? Consider the behavior of the SEQ_LT/SEQ_GT macros with sequence number wraparound. If you find issues, explain how an attacker might trigger them. They are providing ALL required facts to the LLM, and they only ask the LLM to connect the dots The real challenge for LLMs would be to get those insights first THAT IS THE WHOLE CHALLENGE IN CYBERSECURITY; TO HAVE DEEP INSIGHT This test proves nothing; don't make any conclusions about OSS models being good for security based on this

❀️666
likes
πŸ”47
retweets
A
AnthropicAI
@AnthropicAI
πŸ“…
Apr 14, 2026
9d ago
πŸ†”70998932

Here, we measure success by the fraction of the β€œperformance gap” we can close between the weak model and the potential of the strong model. After 7 days, human researchers closed it by 23%. Then, our Automated Alignment Researchersβ€”Opus 4.6 with extra toolsβ€”closed it by 97%. https://t.co/w1xy4l0MSn

Media 1
πŸ–ΌοΈ Media
πŸ”sleepinyourhat retweeted
_
cat
@_catwu
πŸ“…
Apr 16, 2026
7d ago
πŸ†”05178822
⭐0.36

Opus 4.7 is live in Claude Code today! The model performs best if you treat it like an engineer you're delegating to, not a pair programmer you're guiding line by line. Here are three workflow shifts we recommend for this model 🧡 https://t.co/bD5JO1xDMS

❀️387
likes
πŸ”28
retweets
πŸ”ylecun retweeted
G
gum
@gum1h0x
πŸ“…
Apr 08, 2026
15d ago
πŸ†”58202022
⭐0.34

ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points not 250 independent attempts. when you get rid of those 2 bugs out (fig B) and mythos's full-exploit rate drops to 4.4%. so actually across both setups mythos leverages 4 distinct bugs total not 50 as fig A might suggest. 1/n

❀️1,200
likes
πŸ”109
retweets
J
jpschroeder
@jpschroeder
πŸ“…
Apr 16, 2026
7d ago
πŸ†”97466896

Is Opus 4.7 good? I suggest you A/B test prompts between Codex and Claude for a while. Good time to mention this is easy to do in https://t.co/ImLyLY82pL https://t.co/Nb4Hr9lvh8

Media 1Media 2
πŸ–ΌοΈ Media
W
weave_wb
@weave_wb
πŸ“…
Apr 16, 2026
7d ago
πŸ†”32775511

You ever run a benchmark and end up with 40 log files, zero clarity, and a laptop that sounds like a jet engine? @runloopai + W&B Weave fixes this 🧡 https://t.co/K5hVq6RkfG

Media 1
πŸ–ΌοΈ Media
L
LiorOnAI
@LiorOnAI
πŸ“…
Apr 14, 2026
9d ago
πŸ†”28759637
⭐0.42

That's it. You can now build custom models without lifting a finger. Catalyst records every API request automatically. It sits between your app and your LLM provider. Most fine-tuned models fail in production because they train on synthetic data. This records what users actually do. No fake datasets or manual labeling needed. Training happens in four steps: 1. Use GPT-4 or Claude normally 2. Requests get captured automatically 3. Traces become labeled training data 4. Smaller models distill from usage They match frontier quality on your use case. At 95% lower inference cost and 150ms latency. Apps too expensive at frontier pricing now scale. It also creates a flywheel effect. More users means better data means better models. Every unrecorded request is training signal lost. It gives you a production-grade model trained on your actual usage, at a fraction of the cost, without building a single dataset.

@samhogan β€’ Tue Apr 14 16:01

Introducing Catalyst: a developer platform to monitor, train & deploy self-improving AI models built for teams operating AI products at scale Catalyst can automatically: - collect traces from your agents - curate training data & evals - train & deploy models on par

πŸ”PyTorch retweeted
V
vLLM
@vllm_project
πŸ“…
Apr 14, 2026
9d ago
πŸ†”79636260
⭐0.36

πŸš€ Great to see vLLM powering OCR at this scale β€” Chandra-OCR-2 (5B) serving ~60 papers/hour per L40S across 16 parallel jobs. The full pipeline breakdown is a great read πŸ‘‡ πŸ”— https://t.co/z8tkv04ZLp https://t.co/vi9FPj6JIQ

❀️128
likes
πŸ”19
retweets
πŸ”jeremyphoward retweeted
C
Chris Hayduk
@ChrisHayduk
πŸ“…
Apr 10, 2026
13d ago
πŸ†”13926262
⭐0.38

I strongly suspect that Claude Mythos is a looped language model, as described in the paper "Scaling Latent Reasoning via Looped Language Models" from ByteDance The authors of that paper called out graph search as one of the areas where looping provides a huge theoretical advantage over standard RLVR. And look at where Mythos blows out its competitors the most

❀️3,548
likes
πŸ”312
retweets
V
vllm_project
@vllm_project
πŸ“…
Apr 14, 2026
9d ago
πŸ†”79636260

πŸš€ Great to see vLLM powering OCR at this scale β€” Chandra-OCR-2 (5B) serving ~60 papers/hour per L40S across 16 parallel jobs. The full pipeline breakdown is a great read πŸ‘‡ πŸ”— https://t.co/z8tkv04ZLp https://t.co/vi9FPj6JIQ

@ClementDelangue β€’ Mon Apr 13 19:52

We just OCR'd 27,000 arxiv papers into Markdown using an open 5B model, 16 parallel HF Jobs on L40S GPUs, and a mounted bucket. Total cost: $850 Total time: ~29 hours Jobs that crashed: 0 This now powers "Chat with your paper" on https://t.co/G2mDae0uv9 https://t.co/qpz7Q9x8Od

Media 1Media 2
πŸ–ΌοΈ Media
P
PyTorch
@PyTorch
πŸ“…
Apr 16, 2026
7d ago
πŸ†”33842758

🎀 Take the stage at #PyTorchCon North America! We are looking for technical deep dives & production stories for our return to San Jose this Oct 20-21. Check out our "Preparing to Submit" guide to help craft your proposal. πŸ—“οΈ Deadline: June 7 Apply now: https://t.co/hLlKK7WxLD https://t.co/leYJj7nDfR

πŸ–ΌοΈ Media
F
fchollet
@fchollet
πŸ“…
Apr 16, 2026
7d ago
πŸ†”70689971
⭐0.42

There's a broadly held misconception in AI that methods that scale well are simple methods -- even, that simple methods usually scale. This is completely wrong. Pretty much none of the truly simple methods in ML scale well. SVM, kNN, random forests are some of the simplest methods out there, and they don't scale at all. Meanwhile "train a transformer via backprop and gradient descent" is a very high-entropy method, easily 10x more complex than random forest fitting. But it scales very well. Further, given a simple method that doesn't scale, it is usually the case that you alter it to make it scale by adding a lot of complication. For instance, take a simple a simple combinatorial search-based method (not scalable at all) -- you can make it scale by adding deep learning guidance (which blows up complexity). Scalability usually belongs to high-entropy, complex systems.

A
ArthurMacwaters
@ArthurMacwaters
πŸ“…
Apr 18, 2026
5d ago
πŸ†”47399449

> grok4.20-beta1 is a much smaller model than opus but is #1 ranked in medicine and healthcare > 4.3 and 4.4 will be much larger models, and likely will have a significant boost in performance on complex medical cases > this is massively important in providing accurate diagnostic guidance and advice to both providers and patients

@elonmusk β€’ Sat Apr 18 19:49

@techdevnotes Supplemental training has been added to 4.3. Grok 4.4 will be twice the size (1T) with training data through early April. Probably ready for release in early May. Grok 4.5 will be 1.5T and hopefully out by late May.

Media 1
πŸ–ΌοΈ Media
A
Av1dlive
@Av1dlive
πŸ“…
Apr 11, 2026
12d ago
πŸ†”46624203

This 15-minute talk by the creator of Pydantic on how to correctly use MCPs will teach you more about making your AI tools actually work together than everything you've scrolled past this year. Bookmark this & watch, no matter what. Then read the guide below by @eng_khairallah1

@eng_khairallah1 β€’ Sat Apr 11 09:02

https://t.co/UCeC55qdhi

πŸ–ΌοΈ Media
P
prithivMLmods
@prithivMLmods
πŸ“…
Apr 18, 2026
5d ago
πŸ†”49355522

HY-World-2.0 Demo is now live on @huggingface Spaces for 3D world reconstruction and simulation with Gradio and Server modes. > 3D reconstruction, Gaussian splats > Camera poses, depth maps, normals > Rerun (multimodal data visualization) https://t.co/MofKZ6OGPX

πŸ–ΌοΈ Media
V
vanstriendaniel
@vanstriendaniel
πŸ“…
Apr 14, 2026
9d ago
πŸ†”23763366

0.3B params. Small enough that your coding agent can OCR a whole dataset, and the bill barely moves. Just added Falcon-OCR to uv-scripts/ocr. `hf jobs uv run` command, bucket or dataset output, AGENTS.md in the repo so your agent figures out the rest. https://t.co/VdY5FrNrCk

Media 1
πŸ–ΌοΈ Media
V
vercel_dev
@vercel_dev
πŸ“…
Apr 15, 2026
8d ago
πŸ†”73960733
⭐0.36

Use Vercel Sandbox with the OpenAI agents SDK as an official extension. Build agents that can run code, read files, and analyze data safely inside isolated microVMs. Control the compute and data flow from your secure cloud environment.

@OpenAIDevs β€’ Wed Apr 15 17:23

Build long-running agents with more control over agent execution. New capabilities in the Agents SDK: β€’ Run agents in controlled sandboxes β€’ Inspect and customize the open-source harness β€’ Control when memories are created and where they’re stored https://t.co/zPyuLup6b6

M
MiniMax_AI
@MiniMax_AI
πŸ“…
Apr 12, 2026
12d ago
πŸ†”97659000

We're delighted to announce that MiniMax M2.7 is now officially open source. With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%). You can find it on Hugging Face now. Enjoy!πŸ€— huggingface:https://t.co/ApWrahIl3o Blog: https://t.co/gAxeFsNdW4 MiniMax API: https://t.co/1dgbMx0Q7K

Media 1Media 2
+2 more
πŸ–ΌοΈ Media
πŸ”sleepinyourhat retweeted
A
Anthropic
@AnthropicAI
πŸ“…
Apr 15, 2026
8d ago
πŸ†”35802948
⭐0.38

Research we co-authored on subliminal learningβ€”how LLMs can pass on traits like preferences or misalignment through hidden signals in dataβ€”was published today in @Nature. Read the paper: https://t.co/b1BYwcW9dH

❀️1,033
likes
πŸ”112
retweets
E
e2b
@e2b
πŸ“…
Apr 15, 2026
8d ago
πŸ†”67416751

OpenAI x E2B: build your agents with the new OpenAI Agents SDK, powered by E2B sandboxes. We're excited to support OpenAI as a launch partner! The new @OpenAI Agents SDK will now get dedicated sandboxes - perfect for persistent, long-running agents. With E2B, you'll get a custom environment with resource isolation and security boundaries, with no infrastructure setup required. Your agents will be able to: - Edit files and run shell commands in isolated environments - Maintain temporary workspace state across steps - Produce artifacts you can review before publishing - Run multiple sandboxes in parallel for concurrent workloads - Generate frontend output with live preview URLs ... and more, with a few lines of code! Learn more and see the end-to-end example in the thread:

Media 1
πŸ–ΌοΈ Media
C
codestantine
@codestantine
πŸ“…
Apr 15, 2026
8d ago
πŸ†”22495977

Excited about the Agents SDK updates we just launched. Check out my cookbook on using it with sandboxes for code migration: https://t.co/Fz7cknz64d

Media 1
πŸ–ΌοΈ Media
R
rawat_ritvik
@rawat_ritvik
πŸ“…
Apr 15, 2026
8d ago
πŸ†”42603056
⭐0.36

We are looking for excellent people to help build our vertically integrated AI stack. Numerics, quantization, HW simulators, compiler, runtime, kernel performance, RTL, verification, emulation, DFT, physical design, post Si bringup. Join us at Tesla!

@ β€’

_
__Rhodium__
@__Rhodium__
πŸ“…
Apr 14, 2026
9d ago
πŸ†”34386998

Won best edge AI at the @ycombinator and @innate_bot hackathon! We built a local VLM multi-rover orchestrator for Mars exploration. On-device navigation and automated fault detection & recovery across odometry, stereo vision, and lidar. Thanks for hosting, @ax_pey! https://t.co/GNkSNAMxRN

Media 2
+1 more
πŸ–ΌοΈ Media
M
mlejva
@mlejva
πŸ“…
Apr 15, 2026
8d ago
πŸ†”73523661

OpenAI x e2b: Build your agents with the new OpenAI Agents SDK, powered by @E2B Sandboxes. Excited to support @OpenAI as a launch partner! https://t.co/RsSw1HsF86

Media 1
πŸ–ΌοΈ Media
N
nichochar
@nichochar
πŸ“…
Apr 13, 2026
10d ago
πŸ†”68130344
⭐0.34

If you want to build a self-improving harness, the first step is instrumentation. There are tools now that help you do this as "drop-in" plugins into claude code, very cool!

@wandb β€’ Fri Apr 10 21:11

Building with Claude Code? You need to see what's happening each turn. The new @weave_wb plugin traces every session automatically. Tool calls, subagents, inputs, outputs. All structured so you can debug faster. No code changes. Just install and go. https://t.co/i7IoktC7RC