Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
renckorzay
@renckorzay
📅 Sep 17, 2025 · 220d ago
🆔 56834857

Google just gave AI agents access to your credit card. Yesterday they launched Agent Payments Protocol (AP2), which lets AI complete purchases for you 100% autonomously. Here's why you'll never transact the same way again: 🧵 https://t.co/bZpOgNPxeo

alexutopia
@alexutopia
📅 Sep 17, 2025 · 220d ago
🆔 14502420

If you’re not following @Scobleizer’s X lists, you’re missing out. Just activated a full deck in X Pro 🤯 I can see everything now! https://t.co/0ohMEfFoKQ

PawelHuryn
@PawelHuryn
📅 Sep 16, 2025 · 221d ago
🆔 31897459

I got permission to publish a new AI Evals FAQ (Sep, 2025). It’s massive. And it's a goldmine for engineers and AI PMs.

@HamelHusain and @sh_reya run the world’s No. 1 AI Evals course. Together with top AI architects and ML researchers, they answer the most common questions from 1,500+ students. 51 pages of unique insights and resources. And 100% free.

Some of the questions:
Q: What are LLM Evals?
Q: What’s a minimum viable evaluation setup?
Q: Why is "error analysis" so important in LLM evals, and how is it performed?
Q: What is the best approach for generating synthetic data?
Q: Are there scenarios where synthetic data may not be reliable?
Q: Why do you recommend binary (pass/fail) evaluations?
Q: Should I use "ready-to-use" evaluation metrics?
Q: How many people should annotate my LLM outputs?
Q: Should PMs and engineers collaborate on error analysis? How?
Q: What parts of evals can be automated with LLMs?
Q: Should I stop writing prompts manually in favor of automated tools?
Q: What makes a good custom interface for reviewing LLM outputs?
Q: What gaps in eval tooling should I be prepared to fill myself?
Q: How should I version and manage prompts?
Q: How are evaluations used differently in CI/CD vs. monitoring production?
Q: What’s the difference between guardrails & evaluators?
Q: Is RAG dead?
Q: How should I approach evaluating my RAG system?
Q: How do I evaluate sessions with human handoffs?
Q: How do I evaluate complex multi-step workflows?
Q: How do I evaluate agentic workflows?

Get a full PDF (Google Drive, 51 pages): https://t.co/4azaPGfIxr

Hope that helps! Feel free to share it with your network.

P.S. The guys shared a crazy amount of knowledge for free. But if you want to dive even deeper, here's a 35% discount for the AI Evals Cohort: https://t.co/MtnOgX99i9 (The next cohort: Oct 6–Nov 1, 2025)

HamelHusain
@HamelHusain
📅 Sep 16, 2025 · 221d ago
🆔 54972230

These are X’s ad creative rules, which seem worth paying attention to even if you aren’t using X for ads because they will likely down rank organic content with this? (Already knew about URLs, but not emojis) https://t.co/n8iEOh9QQn

HamelHusain
@HamelHusain
📅 Sep 16, 2025 · 221d ago
🆔 97619197

@JnBrymn Making good progress but need to be more aggressive https://t.co/E9HAskiE3X

🔁 HamelHusain retweeted
Claude
@claudeai
📅 Sep 16, 2025 · 221d ago
🆔 49447408

Multi-directory support: https://t.co/Pccmmf9KEE

❤️ 493 likes
🔁 22 retweets

charles_irl
@charles_irl
📅 Sep 04, 2025 · 233d ago
🆔 42215323

I like making GPUs go brrt at @modal. I wrote up what I've learned along the way in an extension to the GPU Glossary -- our "CUDA Docs for Humans". Introducing: the GPU 𝔓𝔢𝔯𝔣𝔬𝔯𝔪𝔞𝔫𝔠𝔢 Glossary. https://t.co/9IDfgGqVFX https://t.co/ESU62gVa0A

sh_reya
@sh_reya
📅 Sep 17, 2025 · 220d ago
🆔 46637036

I am giving a talk in ~1 hr on some of the recent work we have been doing at Berkeley around the DocETL and DocWrangler projects! The talk is titled "Agentic Query Optimization for Unstructured Data Processing," and it will be livestreamed on YouTube: https://t.co/S5hC4oM5kS

HamelHusain
@HamelHusain
📅 Sep 17, 2025 · 220d ago
🆔 25844081

Making Eval memes. (Click to expand this one) https://t.co/IsgCxiU2CB

🔁 HamelHusain retweeted
Lenny Rachitsky
@lennysan
📅 Sep 17, 2025 · 220d ago
🆔 78647913

Also learning evals is the #1 course on Maven right now https://t.co/GG34qRqNYY

@lennysan • Wed Sep 17 18:47

Trend I'm following: How AI labs' hunger for high-quality evals and data-labeling is creating some of the fastest growing and profitable companies in the world, e.g.
- @mercor_ai ($1m -> $500m in 17 months 😮)
- @joinHandshake ($0 -> $100m in <12 months 😮)
- @HelloSurgeAI ($1.5B+

❤️ 81 likes
🔁 9 retweets

eugeneyan
@eugeneyan
📅 Sep 17, 2025 · 220d ago
🆔 64651336

Demo of the Qwen3-recommender hybrid returning both semantic IDs and natural language!
• steering recs via natural language
• explaining the recommendation
• naming the bundle of recommendations
• multi-turn conversation to get recs
watch till the end for the bloopers lol https://t.co/mHpKhiY7OQ

omarsar0
@omarsar0
📅 Sep 17, 2025 · 220d ago
🆔 26265937

Learning happens, but needs many examples. With 50–100 examples in a prompt, accuracy improves steadily and models of different sizes and brands start looking similar. This challenges the common few-shot story: a handful of examples usually isn’t enough. https://t.co/V7H79E42F6

omarsar0
@omarsar0
📅 Sep 17, 2025 · 220d ago
🆔 43229665

Prompt wording matters less over time
If you replace instructions with random word salad, performance eventually catches up, as long as the exemplars remain intact. But if you scramble the examples themselves (“salad-of-thought”), performance collapses. https://t.co/VzlMgkpxRe

omarsar0
@omarsar0
📅 Sep 17, 2025 · 220d ago
🆔 86628214

Weak on robustness. When the test data looks different from the training examples (distribution shift), performance drops sharply. Chain-of-thought and automated prompt optimization are especially brittle. https://t.co/lU8cnBAxoQ

omarsar0
@omarsar0
📅 Sep 17, 2025 · 220d ago
🆔 30500103

Not all tasks are equal. Some formal problems (like simple pattern matching) are nearly solved, while others (like string reversal or certain arithmetic tasks) remain tough. Interestingly, two tasks that look similar can differ by as much as 31% in accuracy. https://t.co/qkZgJliUng

omarsar0
@omarsar0
📅 Sep 18, 2025 · 219d ago
🆔 89077366

Towards a Physics Foundation Model
Proposes GPhyT (General Physics Transformer), a large transformer trained on 1.8 TB of simulation data across fluid flows, shock waves, heat transfer, and multiphase dynamics. Here are a few key notes: https://t.co/CKkW9mQGsM

omarsar0
@omarsar0
📅 Sep 18, 2025 · 219d ago
🆔 50232615

How the model works
Think of GPhyT as a hybrid of a neural net and a physics engine. It takes in a short history of what’s happening (like a few frames of a simulation), figures out the rules of change from that, then applies a simple update step to predict what comes next. It’s like teaching a transformer to play physics frame prediction with hints from basic calculus.
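
The "rules of change, then a simple update step" idea in the post above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `estimate_rate_of_change` is a hypothetical stand-in that uses a finite difference over the last two frames where the real model would use a trained transformer, and the explicit Euler update is an assumption about what "simple update step" means.

```python
from typing import List

Frame = List[float]  # one snapshot of a 1-D field (assumed toy representation)


def estimate_rate_of_change(history: List[Frame], dt: float) -> Frame:
    """Hypothetical stand-in for the learned network.

    Uses a backward finite difference over the two most recent frames;
    a real model would infer the rate of change from the whole history.
    """
    newest, previous = history[-1], history[-2]
    return [(a - b) / dt for a, b in zip(newest, previous)]


def predict_next_frame(history: List[Frame], dt: float) -> Frame:
    """Explicit Euler update (assumed): next = current + dt * rate."""
    rate = estimate_rate_of_change(history, dt)
    return [x + dt * r for x, r in zip(history[-1], rate)]


# Toy history of three frames in which each cell grows linearly over time.
history = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
next_frame = predict_next_frame(history, dt=0.5)
print(next_frame)  # [3.0, 6.0]
```

Splitting the step into "estimate the rate" and "integrate" keeps the integration rule swappable, so a higher-order scheme could replace the Euler update without touching the estimator.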

omarsar0
@omarsar0
📅 Sep 18, 2025 · 219d ago
🆔 21207028

The data it learned from
Instead of sticking to one type of fluid or system, the team pulled together 1.8 TB of simulations covering many different scenarios: calm flows, turbulent flows, heat transfer, fluids going around obstacles, even two-phase flows through porous material. Variable Δt sub-sampling and per-dataset normalization encourage in-context inference across scales.

omarsar0
@omarsar0
📅 Sep 18, 2025 · 219d ago
🆔 73496261

How well it performs
On single-step forecasts across all test sets, GPhyT cuts median MSE vs. UNet by about 5× and vs. FNO by about 29× at similar parameter counts. https://t.co/1ELUh7WMwG

omarsar0
@omarsar0
📅 Sep 18, 2025 · 219d ago
🆔 84703987

Generalization to new cases
The wild part: you can hand it a new situation it never trained on, like a boundary condition it hasn’t seen, or even supersonic flow, and it will still produce physically plausible results. It doesn’t nail the fine details, but it gets the big structures right, like correctly forming bow shocks.

omarsar0
@omarsar0
📅 Sep 18, 2025 · 219d ago
🆔 85359491

Stability over time
Rollouts over 50 steps show the model keeps the overall dynamics consistent. Tiny details fade as predictions accumulate, but the large-scale behavior (like global flow structure) stays believable much longer than you’d expect for a learned model.

This tells me that there is still an area of exploration for transformer-based foundation models for different domains.

Paper: https://t.co/WVlW51Qkqt

winglian
@winglian
📅 Sep 07, 2025 · 230d ago
🆔 41424304

@kalomaze @cocinomial https://t.co/v2GjSCOW24

🔁 jxnlco retweeted
Jo Kristian Bergum
@jobergum
📅 Sep 17, 2025 · 220d ago
🆔 49886341

We are hiring at https://t.co/kWsq4hb8PC - I’m in SF next week too so would love to connect with builders. Let’s go! https://t.co/OZfFyGxghe

❤️ 33 likes
🔁 4 retweets

jxnlco
@jxnlco
📅 Sep 17, 2025 · 220d ago
🆔 32217802

happening in 30 minutes! enroll even if you can't make it and we'll send you the recording and notes! https://t.co/sphGArPkZL

jxnlco
@jxnlco
📅 Sep 17, 2025 · 220d ago
🆔 26503107

first office hours of our rag cohort https://t.co/RSvLGzkEs5

jxnlco
@jxnlco
📅 Sep 17, 2025 · 220d ago
🆔 93387490

We kicked things off today. Check out the course here: https://t.co/UsWStpXh1D https://t.co/FVJ031l9l3

jxnlco
@jxnlco
📅 Sep 17, 2025 · 220d ago
🆔 25207934

Most teams debug reactively, missing systematic patterns that could prevent user churn before it happens. Ben Hylak (CTO of Raindrop, ex-Apple Vision Pro/SpaceX/Google) teaches you to proactively identify which user intents break your agents and build monitoring that catches issues before users leave.

"Reliable Agents: Intent-Driven Failure Detection" - Oct 1, 6PM UTC / 2PM EST / 11AM PDT.

If you want to get the study notes or the recording, everything will be sent to participants. Just make sure to enroll! 🛡️ https://t.co/fawWTnMtjZ

jxnlco
@jxnlco
📅 Sep 17, 2025 · 220d ago
🆔 23594016

Learn more about Systematically Improving RAG Applications here: https://t.co/UsWStpXh1D

perceptroninc
@perceptroninc
📅 Sep 17, 2025 · 220d ago
🆔 70150077

1/ Introducing Isaac 0.1 - our first perceptive-language model. 2B params, open weights. Matches or beats models significantly larger on core perception. We are pushing the efficient frontier for physical AI. https://t.co/dJ1Wjh2ARK https://t.co/hf3aq3Vb4i
