Your curated collection of saved posts and media
ποΈIntroducing Max Agency Max Agency is a new podcast where we go deep on how the best agents are actually being built: architecture decisions, tradeoffs, evals, and everything in between. Each episode, I sit down with engineering leaders who are doing this work in production. Our first episode features Izzy Miller (@isidoremiller), AI Engineer at Hex (@_hex_tech). Hex has been shipping data agents since before most teams were even thinking about them, starting with single-cell text-to-SQL and graduating to a full Notebook agent that can work autonomously for 20 minutes on a complex analysis. Izzy has a lot of perspective on what it actually takes to get agents working well in production, and what breaks along the way. A few takeaways from our conversation: - Keep your eval sets small enough to hold in your head: Izzy runs 30-50 handcrafted "traps" with multiple repetitions, rather than hundreds of variants. If you can't explain why your agent fails each one, your eval set is too big - Day zero performance is almost irrelevant: The more interesting question is how the agent compounds. Izzy is building a 90-day simulation where the warehouse evolves and the agent has to accumulate understanding - You can catch agent errors without seeing the raw outputs: By running an LLM-as-a-judge over production usage and clustering the results, you can surface places where something likely went wrong, without needing to read individual conversations Watch the full episode on: - Youtube: https://t.co/AdkQbV3Pq2 - Apple Podcasts: https://t.co/1MKF7mcYSr - Spotify: https://t.co/DxACw24oob

Embarrassingly Simple Self-Distillation Improves Code Generation paper: https://t.co/Iwyx1ebDCW https://t.co/iWry3xlxz6
MedGemma 1.5 Technical Report paper: https://t.co/LBgoAzd4A8 https://t.co/mt28b1UxLU
Think in Strokes, Not Pixels Process-Driven Image Generation via Interleaved Reasoning paper: https://t.co/SPggWj7Hvx https://t.co/Vsc0qCo6aY
RAGEN-2 Reasoning Collapse in Agentic RL paper: https://t.co/nMo9xTq9x6 https://t.co/qjIKTUNMW1
MARS Enabling Autoregressive Models Multi-Token Generation paper: https://t.co/dUJac9spi7 https://t.co/sWfZ5Vx6CH
New course: Efficient Inference with SGLang: Text and Image Generation, built in partnership with LMSys @lmsysorg and RadixArk @radixark, and taught by Richard Chen @richardczl, a Member of Technical Staff at RadixArk. Running LLMs in production is expensive, and much of that cost comes from redundant computation. This short course teaches you to eliminate that waste using SGLang, an open-source inference framework that caches computation already done and reuses it across future requests. When ten users share the same system prompt, SGLang processes it once, not ten times. The speedups compound quickly, especially when there's a lot of shared context across requests. Skills you'll gain: - Implement a KV cache from scratch to eliminate redundant computation within a single request - Scale caching across users and requests with RadixAttention, so shared context is only processed once - Accelerate image generation with diffusion models using SGLang's caching and multi-GPU parallelism Join and learn to make LLM inference faster and more cost-efficient at scale! https://t.co/vUiu6goWCO
Weβre excited to announce today that @mastra has raised a $22M Series A led by Spark Capital. This brings our total capital raised to $35M: https://t.co/CHKHyf2hn8
π¨ WhatsAppβs βend-to-end encryptedβ privacy is a total lie. New class-action lawsuit just dropped: Meta secretly let employees, contractors like Accenture, and third parties read, intercept, and store your private messages WITHOUT consent. All while marketing it as βonly you and the recipient can read it.β Zuck lied to billions. Your chats were never safe.
Yesterdayβs Mythos announcement from Anthropic was overblown. β’Β Sandboxing was turned off, so test didnβt show much about the real world. β’Β Cheap open-weight models can (already) do some similar stuff β’Β No evidence that Mythos itself is a major qualitative jump. In short, we got played. cc @tomfriedman @RonanFarrow

Transformers.js monthly downloads π€―π 2023/03 β3.2K 2024/03 β 266K 2025/03 ββ 764K 2026/03 ββββββββββββ 4.4M Thanks so much to this amazing community! π€ We're excited to see what you build next! https://t.co/ruvsLMQNGl
βBut the curse of every ancient civilization was that its men in the end became unable to fight. Materialism, luxury, safety, even sometimes an almost modern sentimentality, weakened the fibre of each civilized race in turn; each became in the end a nation of pacifists, and then each was trodden under foot by some ruder people that had kept that virile fighting power the lack of which makes all other virtues useless and sometimes even harmful.β -Teddy Roosevelt
I just built my own wiki generator plugin for my agents. My agents can now generate wikis for anything I ask. One of my favorite wikis is called PaperWiki. This is a great example of what @karpathy describes. It uses obsidian vaults to organize papers, retrieve LLM-generated summaries, diagrams, and other advanced views for paper exploration. When Obsidian UI is not enough, I use my own artifact generator inside my agent orchestrator (see clip for example). This allows my agents to build any kind of view or exploration feature that I need. The papers are all curated with automations and several rules/patterns I have manually built over the years. On the surface, this looks basic. But behind the scenes, there are advanced search capabilities, connections, metadata, derived data, and other interesting bits of information that are extremely useful for my research agents. This is mostly built for agents. The artifact preview is just a high-level way to validate and quickly assess the quality of the wiki, suggest improvements, and it's also great for research. I use @tobi's qmd for all search capabilities. Everything is markdown. The summaries and even the diagrams. The wiki updates on its own based on several automations I have optimized over the past couple of weeks. The wiki grows and self-improves based on several requirements important for my research use cases. This is as personalized as it gets. There is nothing like it out there. And I use my research expertise to continue improving it over time. This is a vanilla wiki. There are so many things I want to build on top of this. Different aggregations, views, artifacts, etc. All to help automate more of my research work and accelerate productivity. I think the biggest leverage here is how powerful this could be for discovery and experimentation. One of my goals is to use it to find deeper connections and insights that would otherwise elude the top human researchers and use those to generate interesting new hypotheses and research experiments. That way, my agents can use autoresearch to explore research ideas at the frontier. Stay tuned for more.
LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating know
@PrintedPathways @igormomentum @karpathy @THDX @rauchg @mitchellh @dhh @addyosmani @alexwg I'm far better at "chronicling" than anyone else. Plus I have the most complete lists of AI industry here on X: https://t.co/9eRY65x3IQ (by far, second isn't even close). I also built a news site that goes through everyone in AI here on X and finds you the best: https://t.co/8L5xphk0qQ
@bstract_thot My brain is so fried I thought you meant aspergillus https://t.co/rtCwXNRsI9
@igormomentum @karpathy @THDX @rauchg @mitchellh @dhh @addyosmani Some things about me. I curate the best of the AI industry. And have done so ever since the very beginning. Been here 19 years. Siri, the first consumer app that used AI, was launched in my home. So no one else can claim to be there "from the beginning of AI." Insta360, the first camera to use AI, and the MaticRobot company, the first to use computer vision in a robot, were also launched in my home. Every day I read thousands of posts and share the best from the AI world. And I have the only site that reads all of X's AI community and finds you the best: https://t.co/8L5xphk0qQ My lists are the most complete of AI industry here on X. By far. https://t.co/9eRY65x3IQ Following lists is a lot better for your feed than following individual people. Plus mine are complete, which you never will be by following individual people (most people's accounts are kept from following more than 2,000 here on X until you get more than 2,000 followers yourself). I also wrote eight books about technology futures, which are highly reviewed and predicted decade-long change. Last night I had the CTO of @nousresearch on my audio space, which was spectacular (OpenClaw competitor). I'd suggest you follow me, include me on your lists, but better yet follow my lists. They are really good and no one else has anything like them.
@ZainanZhou I built the best news site to follow the AI community here on X: https://t.co/kiuZ7QXLzb And I have the most complete lists of builders in tech industry. By far: https://t.co/fasUz7PuHq
@Shy_Agarwal @SCSatCMU @msrconf Awesome. You might check out my AI on X news site. It collects all the best papers three times a day: https://t.co/kiuZ7QXLzb
.@ApacheArrow is downloaded hundreds of millions of times each month. It's the columnar format behind Pandas 2.0, Polars, Spark, DataFusion, and Parquet. @kszucs_, Arrow PMC Member and @huggingface Open Source Engineer, built Marrow, a pure-Mojo Arrow implementation, and it's already 1.3-3.9x faster than PyArrow on conversions. Mojo's SIMD abstractions enable vectorized conversion without Python overhead. Zero-copy PyArrow interop, early GPU support via DeviceContext, and a lot more to come: https://t.co/OqQsz7w6w5
AI chatbots are becoming part of peopleβs emotional lives. With widespread use, many are turning to them for advice, support and even companionship, raising questions about their impact on mental health. The challenge is not just usage, but dependence. When support becomes artificial, the effects can be hard to measure. https://t.co/dvBSmZxNeD @ConversationUS
Everyone's talking about Anthropic's new model discovering new security vulnerabilities. What people aren't talking about is the millions of KNOWN vulnerabilities remaining unfixed due to lack time, interest, etc. e.g. OpenClaw has 67 CVEs right now, including 4 critical ones. https://t.co/HZ1FaiD1hb
@igormomentum Follow my AI Newsmaker list: https://t.co/fasUz7PuHq 1,900 hand picked from 50,000. But my AI reads everyone on X and finds the best three times a day: https://t.co/kiuZ7QXLzb
3,400 likes and only one guy told him about my lists: https://t.co/fasUz7PuHq And no one suggested he watch my AI on X News site: https://t.co/kiuZ7QXLzb Sigh.
Here are some best accounts to follow for original content on AI, engineering and design: @karpathy β on llms @thdx β opencode creator @rauchg β vercel ceo @mitchellh β ghostty, ex-hashicorp founder @dhh β ruby on rails creator, 37signals/basecamp cto @addyosmani β google clo
INSPATIO-WORLD A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling paper: https://t.co/8pRBe73t6B https://t.co/FEHdtXA2Mg
https://t.co/thNKojJJoI
Congratulations to Alex and the whole team at MSL. As a sucker for all things speedy (https://t.co/Od1fQoL3FQ), I thought this was an impressive chart: https://t.co/7S5sgXeQNW
https://t.co/thNKojJJoI
@igormomentum @karpathy @THDX @rauchg @mitchellh @dhh @addyosmani I have 40,000 in AI all arranged in lists: https://t.co/fasUz7PuHq But I built you an AI that reads them all and finds the best: https://t.co/kiuZ7QXLzb
Computer now connects with Plaid to link bank accounts, credit cards, and loans. Track spending in detail, build custom budget tools, and visualize your net worth alongside your investment portfolio. https://t.co/m9nws4VjKO
Mythos' Firefox exploitation didn't actually have sandbox enabled and built on top of research from Opus. Shocker. https://t.co/xwWUsb82hW
We're live in 5 minutes! Today we're about Modernizing Legacy .NET applications with GitHub Copilot Modernization in VS Code. Join us to chat live with the team βΆοΈ https://t.co/Ki2NMlPhjv https://t.co/oXorLDBCBT
Agents like @openclaw are incredibly powerful, as long as the information they receive is clean and structuredπ¦ When it comes to PDFs and other unstructured documents, most agents struggle. The tools they rely on often return only raw text, losing critical context like layout, tables, and imagesβ Thatβs why we created LlamaParse and LiteParse Agent Skills, designed to give agents access to a deeper layer of document understanding, enabling more reliable knowledge extraction and automation across complex documentsπ πLearn more about the problem, and how the skills solve it: https://t.co/dn33HE6Z0k π¦ Get started with LlamaParse: https://t.co/ADG9CPTcAV
The real story of AI in enterprises is more grounded than the hype suggests. While many pilots fail, adoption is happening where ROI is clear, in areas tied to measurable impact rather than experimentation alone. The gap is not whether AI works, but where it works best. Enterprise AI is moving from curiosity to select, proven use cases that actually deliver value. https://t.co/KXXDtTvYXv @a16z @kimberlywtan