Your curated collection of saved posts and media

Showing 32 posts · last 7 days · newest first
Teknium (@Teknium) · Apr 17, 2026

@hexyn7 Every tool call an agent makes has to send back the whole chat history EVERY time. So if you are at 50k tokens, each tool call from there sends back 50k more tokens. These tokens are mostly input tokens, and cached: input tokens usually cost ~5x less than output tokens, and cached input tokens cost 90% less than uncached input tokens in most cases.
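The arithmetic behind this can be sketched with a toy cost model (all prices and token counts here are hypothetical, not any provider's actual rates):

```python
# Toy estimate of agent token costs: every tool call re-sends the whole
# history, so input tokens dominate. All prices here are hypothetical.

def agent_cost(history_tokens: int, tool_calls: int,
               price_in: float, price_cached: float) -> float:
    """Dollar cost of re-sending the history once per tool call.

    price_in / price_cached are per 1M tokens; the first send is uncached,
    the repeats hit the prompt cache (often ~90% cheaper).
    """
    first = history_tokens * price_in / 1e6
    repeats = (tool_calls - 1) * history_tokens * price_cached / 1e6
    return first + repeats

# 50k-token history, 10 tool calls, $3/M input vs $0.30/M cached input:
uncached = 10 * 50_000 * 3 / 1e6            # 1.50 if nothing were cached
cached = agent_cost(50_000, 10, 3.0, 0.3)   # 0.285 with caching
print(uncached, cached)
```

With caching, ten round trips cost barely more than two uncached ones, which is why repeated-history tool calling stays affordable in practice.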

Teknium (@Teknium) · Apr 17, 2026

@hexyn7 @Max_brandkernel What tools do you find unnecessary? `hermes tools` lets you disable whichever tools you want, and `hermes skills config` lets you disable whichever skills you want. Auxiliary models are cheaper than the main agent model in most cases; you can set them to dirt-cheap ones if you want.

๐Ÿ”Scobleizer retweeted
A
Allen Braden
@allen_explains
๐Ÿ“…
Apr 16, 2026
19h ago
๐Ÿ†”49319172
โญ0.38

This 2-hour Stanford lecture breaks down how models like ChatGPT and Claude are actually built, clearer than what many people in top AI roles ever get exposed to. Save this and set aside two hours today. It might end up being the most valuable thing you learn all week. https://t.co/5u97uZCWxd

โค๏ธ12,177
likes
๐Ÿ”1,974
retweets
๐Ÿ”Scobleizer retweeted
Z
zostaff
@zostaff
๐Ÿ“…
Apr 16, 2026
8h ago
๐Ÿ†”01092752
โญ0.36

AI FOOTBALL ANALYSIS. A FULL COMPUTER VISION SYSTEM. BUILT ON YOLO, OPENCV, AND PYTHON. You upload a regular match video. No sensors, no GPS trackers, just camera footage. The neural network finds every player, referee, and ball on its own. Every frame, in real time. KMeans clustering breaks down jersey colors pixel by pixel. The system splits players into teams automatically. Without a single manual hint. Optical Flow tracks camera movement. Separates it from player movement. Perspective Transformation converts pixels into real meters. Speed of every player. Distance covered. Ball possession percentage. All calculated automatically. Four hours of tutorial from zero to a working system. The model is trained on real Bundesliga matches. Runs on a regular GPU. Python code - take it and run. Sports analytics is no longer behind closed doors. AI leveled the playing field.

โค๏ธ937
likes
๐Ÿ”88
retweets
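The team-split step described above (KMeans clustering on jersey colors) can be sketched with a tiny NumPy-only 2-means on fake pixel data; the real tutorial works on YOLO player crops from actual frames, and this is only an illustrative stand-in:

```python
# Sketch of the jersey-color team split: cluster RGB pixels into two
# teams with a minimal k-means. NumPy only; data is synthetic.
import numpy as np

def kmeans_2(pixels: np.ndarray, iters: int = 20) -> np.ndarray:
    """Assign each RGB pixel row to one of 2 color clusters."""
    # init with two well-separated samples (first and last row)
    centers = pixels[[0, -1]].astype(float)
    for _ in range(iters):
        # distance of every pixel to each center, then nearest-center labels
        d = np.linalg.norm(pixels[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = pixels[labels == k].mean(axis=0)
    return labels

# Fake crops: reddish shirts vs bluish shirts (stand-ins for real frames).
rng = np.random.default_rng(1)
red = np.tile([200, 30, 40], (50, 1)) + rng.integers(0, 20, (50, 3))
blue = np.tile([30, 40, 200], (50, 1)) + rng.integers(0, 20, (50, 3))
labels = kmeans_2(np.vstack([red, blue]))
print(labels[:50], labels[50:])  # each team lands in a single cluster
```

Because jersey colors are well separated in RGB space, even this bare-bones clustering splits the teams cleanly with no manual hints.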
percyliang (@percyliang) · Apr 17, 2026

Marin is using quantile balancing from @Jianlin_S (who developed RoPE, which was also a good idea) to train our current 1e23 FLOPs MoE. The idea is elegant: assigning tokens to experts by solving a linear program. No hyperparameters to tune. Yields stable training.

@classiclarryd • Wed Apr 15 16:26

Researchers' brilliant ideas often get lost in the sea of endless SOTA claims on weak baselines. At Marin we battle-test ideas in an open arena, where anyone's idea can be promoted to the next hero run. One that recently rose up was @Jianlin_S MoE Quantile Balancing, used in our
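The post doesn't spell out the formulation, but "assigning tokens to experts by solving a linear program" can be illustrated with a generic balanced-assignment LP. This is a sketch in that spirit, not Marin's actual quantile-balancing code; sizes and router scores are made up:

```python
# Generic balanced token->expert assignment as a linear program:
# maximize router affinity subject to perfectly balanced expert loads.
# NOT Marin's actual code; a toy illustration only.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
T, E = 12, 4                          # tokens, experts (T divisible by E)
affinity = rng.normal(size=(T, E))    # router scores per token/expert

c = -affinity.ravel()                 # maximize affinity = minimize -affinity

A_eq, b_eq = [], []
for t in range(T):                    # each token's assignment weights sum to 1
    row = np.zeros(T * E)
    row[t * E:(t + 1) * E] = 1
    A_eq.append(row)
    b_eq.append(1.0)
for e in range(E):                    # each expert receives exactly T/E tokens
    col = np.zeros(T * E)
    col[e::E] = 1
    A_eq.append(col)
    b_eq.append(T / E)

res = linprog(c, A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, 1))
x = res.x.reshape(T, E)
print(np.round(x.sum(axis=0), 6))     # every expert load is exactly T/E
```

This is a transportation polytope, whose vertices are integral, so the LP optimum gives each token a hard expert assignment with perfectly balanced loads and, as the post notes, no hyperparameters to tune.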

percyliang (@percyliang) · Apr 17, 2026

See all the gory details on GitHub: https://t.co/CfUbhtcBOp and follow along on wandb: https://t.co/UWU00HPknJ

๐Ÿ”_akhaliq retweeted
K
Kevin Lin
@KevinQHLin
๐Ÿ“…
Apr 16, 2026
15h ago
๐Ÿ†”44620811
โญ0.34

Thanks @_akhaliq for sharing our work! Can frontier multimodal agents play games as well as humans? 🤩 We are excited to introduce 🎮 GameWorld: towards standardized and verifiable evaluation for multimodal game agents. 🕹️ 34 browser games 📌 170 tasks 🤖 18 multimodal agent baselines, covering 1. Computer-use (CUA) agents 👉 raw keyboard + mouse actions 2. Generalist multimodal agents 👉 semantic action parsing. Results on GameWorld show that even SOTA agents still perform far below novice human players. 📹 Watch our live runs: https://t.co/wrhKJD9JVx 🌐 Project page: https://t.co/J906LQ6Sfj 💻 GitHub: https://t.co/W1vL99MDg5 Work done with @OuyyyangMingyu @who_s_yuan Hwee Tou Ng, @MikeShou1

โค๏ธ18
likes
๐Ÿ”5
retweets
๐Ÿ”drfeifei retweeted
W
Wenlong Huang
@wenlong_huang
๐Ÿ“…
Apr 15, 2026
1d ago
๐Ÿ†”89701624
โญ0.34

I recently gave some talks on PointWorld. In this latest version, I discussed: Why world models? Why 3D? Why does it matter amidst scaling data in robotics? Why is it a missing side of the coin for "The Bitter Lesson"? (It's more than just a better backbone for training policies) https://t.co/oGhLvuyB6B

โค๏ธ78
likes
๐Ÿ”11
retweets
๐Ÿ”jxnlco retweeted
C
AGIใƒฉใƒœ
@ctgptlb
๐Ÿ“…
Apr 16, 2026
14h ago
๐Ÿ†”09851775
โญ0.36

ใ€้€Ÿๅ ฑใ€‘OpenAIใ€Codex ใซๅคงๅž‹ใ‚ขใƒƒใƒ—ใƒ‡ใƒผใƒˆโ€ฆ!! ใƒปใ‚ณใƒณใƒ”ใƒฅใƒผใ‚ฟไฝฟ็”จ ใƒปใ‚ขใƒ—ใƒชๅ†…ใƒ–ใƒฉใ‚ฆใ‚ถ ใƒป็”ปๅƒ็”Ÿๆˆ/็ทจ้›† ใƒป90่ถ…ใฎๆ–ฐใƒ—ใƒฉใ‚ฐใ‚คใƒณ(Atlassian / GitLab / Microsoftใชใฉ) ใƒปใƒกใƒขใƒช ใƒป้•ทๆœŸใ‚ฟใ‚นใ‚ฏใฎ่‡ชๅ‹•ๅพฉๅธฐ(ๆ•ฐๆ—ฅใ€œๆ•ฐ้€ฑ้–“ใ‚’ใพใŸใไฝœๆฅญ็ถ™็ถš) ๆœฌๆ—ฅใ‚ˆใ‚Š้ †ๆฌกๅฑ•้–‹๐Ÿ‘‡ https://t.co/CzmZ1MOUOT

โค๏ธ829
likes
๐Ÿ”125
retweets
๐Ÿ”jxnlco retweeted
G
null-sensei
@GOROman
๐Ÿ“…
Apr 16, 2026
8h ago
๐Ÿ†”64152187

The macOS version of the Codex Desktop app now has a "Computer Use" feature, so I'm installing it to try it out. https://t.co/xBU891BSUa

❤️ 119 likes · 🔁 8 retweets
Shaughnessy119 (@Shaughnessy119) · Apr 17, 2026

Let me break down why @NousResearch Tool Gateway is so important for agents. Right now if you want:
- Image generation
- Text to speech
- Browser automation
- Web scraping
- LLMs
or any of the 100s of other tools, you need to sign up with each provider and get an API key. Dozens of API keys and billing accounts to manage. It's very annoying. With Nous Tool Gateway you get ALL of these tools set up AUTOMATICALLY through your one Nous account. Switch your agent to a Nous plan and cancel your laundry list of APIs. Signup: https://t.co/3XhGYKH7AC

@NousResearch • Thu Apr 16 20:39

Tool Gateway is now live in Nous Portal. No separate accounts, no API key juggling. All you need is one subscription, and everything works. A paid Nous Portal subscription now includes access to 300+ models and a growing set of third-party tools. Launching with: → Web scraping

๐Ÿ”llama_index retweeted
J
Jerry Liu
@jerryjliu0
๐Ÿ“…
Apr 16, 2026
9h ago
๐Ÿ†”46363016
โญ0.34

We comprehensively benchmarked Opus 4.7 on document understanding, using ParseBench, our OCR benchmark for enterprise documents that evaluates tables, text, charts, and visual grounding. The results 🧑‍🔬:
- Opus 4.7 is a general improvement over Opus 4.6; it has gotten much better at charts than the previous iteration
- Opus 4.7 is quite good at tables, though not quite as good as Gemini 3 Flash
- Opus 4.7 wins on content faithfulness across all techniques (including ours)
- Using Opus 4.7 as an OCR solution is expensive at ~7c per page! For comparison, our agentic mode is 1.25c per page and cost-effective mode is ~0.4c by default
Take a look at these results and more on ParseBench! https://t.co/tYiSOMbd6p

โค๏ธ60
likes
๐Ÿ”9
retweets
SathvikBil (@SathvikBil) · Apr 17, 2026

THREAD 1/7 Every AI benchmark a lab bragged about this year is compromised. Not because labs are cheating, but because the game itself is broken.

render (@render) · Apr 17, 2026

Codex writes your code → Codex ships it on Render. We built a Codex plugin with @OpenAI that lets you deploy, debug, and monitor your entire stack on Render, without leaving your flow. Just type @render https://t.co/xrgcCksKRj

@OpenAI • Thu Apr 16 17:18

Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks. https://t.co/UEEsYBDYfo

๐Ÿ–ผ๏ธ Media
gerardsans (@gerardsans) · Apr 17, 2026

@ChanPerco The general public is missing an important dimension to judge AI models: operational costs. Compute time is the silent variable that the press and benchmarks ignore. A model that spends 48h of inference to reach what another hits in seconds simply doesn't show up in today's data. Yet that's exactly what would reveal whether the approach is economically viable.

Claims of "recursive self-improvement" are mostly bounded optimization over fixed support. LLMs aren't open-ended learners here: they're function approximators resampling the same distribution. That alone locks in diminishing returns ≠ takeoff.

Every agent loop or test-time compute burns tokens and FLOPs. Benchmarks show the wins. They rarely show the avg@k reality: how many background runs it actually took.

Businesses don't have unlimited VC capital to burn on tokens. The moment the subsidised token pot goes dry, the whole hotdog stand may crash and burn from its own operation. Real organisations optimise for return on spend, not "best output no matter the price."

Technical gains plateau fast while inference costs scale linearly with every extra loop. So you hit two ceilings at once:
→ Technical: diminishing returns from bounded optimisation
→ Economic: compounding costs for shrinking gains

There's no free compounding flywheel. You're trading ever-more compute for incremental refinement, and that trade stops making sense long before takeoff. Your agentic AI workforce looks magically self-sustaining. Right up until the bill arrives and you are forced to close shop.
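The avg@k point above can be made concrete with a toy cost model: a benchmark may report the best of k attempts, but you pay for all k. All numbers below are hypothetical:

```python
# Toy model of the avg@k cost argument: headline pass@k numbers hide
# that every attempt is billed, not just the one that succeeded.

def cost_per_solved_task(tokens_per_run: int,
                         price_per_1k_tokens: float,
                         k: int,
                         pass_at_k: float) -> float:
    """Expected dollar spend per solved task when each task runs k samples."""
    cost_per_task = k * tokens_per_run * price_per_1k_tokens / 1000
    return cost_per_task / pass_at_k  # amortize over the success rate

# One attempt looks cheap...
single = cost_per_solved_task(50_000, 0.01, k=1, pass_at_k=0.2)    # 2.5
# ...but a pass@16 headline hides a 16x token bill per task.
best_of_16 = cost_per_solved_task(50_000, 0.01, k=16, pass_at_k=0.8)  # 10.0
print(single, best_of_16)
```

Even though pass@16 quadruples the success rate in this toy setup, the cost per solved task still quadruples, which is the "compounding costs for shrinking gains" ceiling in miniature.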

gerardsans (@gerardsans) · Apr 17, 2026

@saprmarks AI doesn't have a self, hidden intentions or beliefs. It's easy to project human traits onto it, but that's metaphor, not reality. The industry has normalised aspirational narratives while downplaying limitations. Full breakdown: https://t.co/gKLjtn12uz

Mid0 (@Mid0) · Apr 17, 2026

@theo Works when you trigger ultrathink mode (I know they deprecated it), but somehow reasoning effort is higher now, like xHigh. Might be a bug…

๐Ÿ”_akhaliq retweeted
H
Han Wang
@HanWang98
๐Ÿ“…
Apr 17, 2026
8h ago
๐Ÿ†”48390924
โญ0.34

Thanks @_akhaliq for sharing our work! If you are interested in multimodal agents on open-web search, please see our thread for more details: https://t.co/uh9rr6hFcZ

โค๏ธ3
likes
๐Ÿ”2
retweets

HanWang98 (@HanWang98) · Apr 17, 2026

@_akhaliq • Thu Apr 16 18:22

MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments. paper: https://t.co/UZpJdGxIxY https://t.co/ZmRa2TcuAu

๐Ÿ”Adam_Fish retweeted
B
Boris Cherny
@bcherny
๐Ÿ“…
Apr 16, 2026
13h ago
๐Ÿ†”35156457
โญ0.32

Dogfooding Opus 4.7 the last few weeks, I've been feeling incredibly productive. Sharing a few tips to get more out of 4.7 🧵

โค๏ธ6,280
likes
๐Ÿ”506
retweets

modal (@modal) · Apr 16, 2026

Next Tuesday 12pm EST: @erikdunteman will break down the custom agent harness we launched with Modal sandboxes + @OpenAIDevs Agent SDK. Sandboxes, parallel coding agents, context mgmt, and more. Register here: https://t.co/HAIsKAJY6I

@erikdunteman • Thu Apr 16 20:36

Yesterday we launched our custom agent harness built for parallel background coding tasks, built on @modal sandboxes and @OpenAIDevs Agent SDK. I'll be talking in greater depth about harness design, sandboxes, context management, and more this Tuesday, link below https://t.co/mY

omarsar0 (@omarsar0) · Apr 16, 2026

We need more challenging benchmarks to test long-horizon coding capabilities. FrontierSWE looks like a nice new set of tasks to test out your best coding agents or harnesses.

@MatternJustus • Thu Apr 16 20:30

Introducing FrontierSWE, an ultra-long horizon coding benchmark. We test agents on some of the hardest technical tasks like optimizing a video rendering library or training a model to predict the quantum properties of molecules. Despite having 20 hours, they rarely succeed http

jerryjliu0 (@jerryjliu0) · Apr 16, 2026

@llama_index • Thu Apr 16 21:11

Anthropic says Opus 4.7 hits 80.6% on Document Reasoning — up from 57.1%. But "reasoning about documents" ≠ "parsing documents for agents." We ran it on ParseBench. → Charts: 13.5% → 55.8% (+42.3) — huge → Formatting: 64.2% → 69.4% (+5.2) → Content: 89.7% → 90.3% (+0.6) → T
dair_ai (@dair_ai) · Apr 16, 2026

Coding agents learn from experience, but that knowledge stays locked in silos. Solve a thousand SWE tasks, and none of that wisdom helps with competitive coding. What if memories could transfer across domains?

The work introduces Memory Transfer Learning, a framework where coding agents share a unified memory pool across 6 heterogeneous benchmarks. They test four memory formats ranging from raw execution traces to high-level insights, and find that cross-domain memory improves average performance by 3.7%.

Why does it matter? The transferable value isn't task-specific code. It's meta-knowledge: validation routines, structured action workflows, safe interaction patterns with execution environments. Algorithmic strategy transfer accounts for only 5.5% of the gains. The real benefit comes from procedural guidance on how to act, not what to code. Abstraction dictates transferability: high-level insights generalize well, while low-level execution traces often cause negative transfer by anchoring agents to incompatible implementation details.

Paper: https://t.co/XPD5kczsoZ
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

sean_t_strong (@sean_t_strong) · Apr 16, 2026

@emollick Hey Ethan! Sean here, PM on https://t.co/KZTlPpbqBQ - thanks for the feedback. This isn't a router, this is the model being trained to decide when to think based on the context -- we've been running this for a while in Sonnet 4.6 in https://t.co/KZTlPpbqBQ as well as Claude Code. Understood that it's not tuned perfectly in https://t.co/3Rk7wAMA7D yet - we're sprinting on tuning this more internally and should have some updates here shortly. Feel free to DM us examples of queries where you expected thinking and didn't see it

wandb (@wandb) · Apr 16, 2026

🤯 Wild day in AI — three major releases all dropped today! Anthropic's Claude Opus 4.7, OpenAI's big Codex update (computer use, built-in browser, image gen, memory), and Alibaba's open-weights Qwen. Which one are you most excited by?

iScienceLuvr (@iScienceLuvr) · Apr 16, 2026

We do!! @SophontAI has released the Medmarks benchmark suite, which is the largest completely open-source automated evaluation suite for medical capabilities. (new version coming soon) We'd love to help any frontier lab evaluate their model using our suite! https://t.co/ACNe1b9Vko

@DrellLabs • Thu Apr 16 21:00

@iScienceLuvr Does Sophont have/building its own bench?

NickADobos (@NickADobos) · Apr 16, 2026

With Codex computer use + the Mac's iPhone Mirroring app, GPT can use any app on your phone!!! Seems less accurate with clicks vs actual Mac desktop apps, but it does work! What's the best app to use this with?! https://t.co/44BbT9UJie

๐Ÿ–ผ๏ธ Media
dkundel (@dkundel) · Apr 16, 2026

Because computer use in Codex doesn't take over your own cursor, Codex can work in the background and you can truly cursor max! 🔥

@priyashah_ • Thu Apr 16 20:13

brb, cursor-maxxing https://t.co/gryiF1MCik

Modular (@Modular) · Apr 16, 2026

We partnered with @ProximalHQ to run five frontier coding agents on a hard task: rebuild the full Wan 2.1 text-to-video pipeline on MAX (no PyTorch, no diffusers) in 20 hours as part of their new Frontier-SWE benchmark. Two nearly pulled it off. Every model understood the architecture. The agents that produced results kept debugging numerical errors layer by layer. Several others just tried to smuggle in a torch import. Full report: https://t.co/Se2MJ4ySok

๐Ÿ”s_batzoglou retweeted
O
OpenAI
@OpenAI
๐Ÿ“…
Apr 16, 2026
12h ago
๐Ÿ†”11850863
โญ0.38

Introducing GPT-Rosalind, our frontier reasoning model built to support research across biology, drug discovery, and translational medicine. https://t.co/PubLU0FkSv

โค๏ธ4,663
likes
๐Ÿ”395
retweets