Your curated collection of saved posts and media
omg this title, this paper https://t.co/OXL9C4v2nX

Fascinating paper with so many interesting observations. One that jumped out to me, and arguably could have got more attention, is the divergence between discrimination and calibration of agents. Calibration (see "CAL" in the predictability column), the alignment between predicted confidence and actual accuracy, has improved noticeably in recent frontier models. But discrimination ("AUROC" in the predictability column), the ability to distinguish tasks the agent will solve from those it won't, shows divergent trends and has in some cases worsened. This matters enormously for deployment in real-world contexts. An agent can be well calibrated in aggregate (e.g. saying "I'm 70% confident" and being right 70% of the time) while being completely unable to flag which specific tasks it will fail at. Discrimination is therefore critical for anyone building autonomous workflows: you need the agent to know when to escalate, rather than just having good statistical properties across a population of tasks.

I'm intrigued by what this means from a hardware perspective. Most of these reliability failures will stem from properties of model weights and training. But if this paper is correct, and trends in agent reliability continue to lag capabilities, it creates a strong case for architectures that enable rapid re-inference and consistency checking (running the same query multiple times and comparing outputs). Here, low-latency, high-throughput inference hardware would have an outsized advantage. In this sense, the reliability tax on compute is basically a multiplier on inference demand.
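To make the calibration-vs-discrimination distinction concrete, here is a minimal, self-contained sketch (my own illustration, not code or data from the paper): an agent that reports 70% confidence on every task and succeeds on about 70% of them looks perfectly calibrated in aggregate, yet its confidence scores give an AUROC near 0.5, i.e. no ability to tell which specific tasks will fail.

```python
# Illustration only: aggregate calibration can be perfect while
# discrimination (AUROC) is no better than chance.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# 1000 tasks; the agent reports 0.7 confidence on every single one.
confidence = np.full(1000, 0.7)
# Outcomes succeed ~70% of the time, independently of the confidence score.
success = rng.random(1000) < 0.7

# Calibration: predicted confidence matches observed accuracy almost exactly.
print("mean confidence:", confidence.mean())   # 0.70
print("observed accuracy:", success.mean())    # ~0.70

# Discrimination: the scores carry no information about which tasks fail,
# so AUROC sits at chance level (tiny noise added only to break ties).
scores = confidence + rng.normal(0, 1e-3, size=1000)
print("AUROC:", roc_auc_score(success, scores))  # ~0.5
```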
Most agents don't fail on models... they fail on context: those ugly, messy, complex documents that trip up even the latest LLMs (PDFs, tables, messy scans). Don't worry. We got you.
VC-backed (seed+) startup? Join the LlamaParse Startup Program:
• free credits
• dedicated Slack channel + priority support
• alignment call with our founder Jerry Liu
• community spotlight (millions of devs)
• production-ready ingestion pipelines
Apply today, spots are limited: https://t.co/61csPhQULp
LlamaIndex has evolved far beyond a RAG framework - we're now focused on agentic document processing that automates knowledge work.
• Agent orchestration has fundamentally changed with sophisticated reasoning loops, tool discovery through Skills/MCP, and coding agents that write Python for you
• Document understanding remains a massive opportunity - frontier vision models still struggle with complex tables, charts, and long documents at scale
• LlamaParse now serves 300k+ users across 50+ formats, for enterprises like @OneCarlyle, @CEMEX, and @KPMG, with multi-agent workflows combining OCR, computer vision, and LLM reasoning
• Real automation potential exists in workflows where humans manually process documents daily - financial analysis, contract review, and insurance underwriting can all become end-to-end agentic processes
Our mission is now providing core infrastructure to automate knowledge work over documents, not just being connective tissue between LLMs and data. Read about our evolution and what's next: https://t.co/M0DbsIdGrF

3 years ago, you might've known @llama_index as a RAG framework. Today we are not a RAG framework. We are an agentic document processing platform.

I wrote a blog post detailing the evolution of our company over the past ~3 years and why we believe our current position is enduring in the rapidly evolving AI landscape. There are two main points that I want to highlight:

1. One of the most important opportunities in today's world is to provide high-quality unstructured context to AI agents. We see ourselves as the best-in-class OCR module that can unlock context from the hardest document containers (PDFs, Word, PowerPoint, Excel, and more).

2. Agent reasoning loops have gotten a lot more sophisticated. General LLM abstractions are a lot less relevant. Retrieval patterns have completely changed. We need to build deep, focused tooling that actually provides value in this world of long-running agents.

Note: we are not giving up on OSS tooling. We think open-source software is extremely important for democratizing AI access. We will continue to build OSS that is more aligned with our core focus area of AI-native document processing, and we will continue to support framework users and point them to updated resources for relevant releases.

Come check out our blog: https://t.co/2hGgzYtI3v
Our core managed platform is LlamaParse. If you're interested, come check out our platform: https://t.co/TqP6OT5U5O
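For context, a minimal sketch of what calling LlamaParse from Python can look like; treat the import path, parameters, and file name as illustrative, since the exact SDK surface may differ by version (the linked platform docs are authoritative).

```python
# Illustrative LlamaParse call; import path and parameters may vary by
# SDK version, so check the official docs before relying on this shape.
from llama_parse import LlamaParse

parser = LlamaParse(
    api_key="llx-...",        # or set the LLAMA_CLOUD_API_KEY env var
    result_type="markdown",   # parsed pages returned as markdown text
)

# Turn a hard document container (PDF with tables, scans, forms) into
# LLM-ready text that downstream agents or indexes can consume.
documents = parser.load_data("./example_report.pdf")  # hypothetical file
print(documents[0].text[:500])
```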

If you need to split complex or composite documents into structured categories or sections, LlamaSplit is built for the job. With the intuitive UI, you can:
• Define a custom configuration for how your documents should be categorized
• Specify the exact sections or impact types you want extracted
• Run the job and explore the results through an interactive interface
In this walkthrough, @itsclelia demonstrates how to configure LlamaSplit to break down Environmental Impact Reports into clearly defined impact categories.
Watch the full video here, or get started right away with the docs (UI + code examples): https://t.co/kAMUqwOCDW
Huge thank you to everyone who joined the @GoogleDeepMind hackathon in NYC with us over the weekend. Our DevRel @tuanacelik gave a 30-minute workshop to get participants started on document agents with LlamaParse. We saw some amazing projects being submitted with no lack of creativity and imagination. Congrats to the 3 winning teams, and see you next time!
Adobe Acrobat has PDF splitting. We have agentic PDF splitting. Simply define the categories you want in natural language, and our split agent will automatically "chunk" the document into subsets of pages and tag them with the appropriate categories. This is super useful to break apart complicated document packets like resumes, tax forms, identification docs, expense reports, and more. Check out @itsclelia's video below, and come sign up to LlamaParse if you're interested!
Docs: https://t.co/UdxT3sJfkF
LlamaParse: https://t.co/TqP6OT5U5O
Creating agent workflows and architecting the logic is one thing; making them durable, fail-safe, and scalable is another. New integration for durable agent workflows with @DBOS_Inc execution - make sure your agents survive crashes, restarts, and errors without writing any checkpoint code.
• Every step transition persists automatically - workflows resume exactly where they left off
• Zero external dependencies with SQLite, or scale to multi-replica deployments with Postgres
• Built for replication - each replica owns its workflows, with Postgres coordinating across instances
• Idle release feature frees memory for long-running workflows waiting on human input
• Built-in crash recovery detects and relaunches incomplete workflows automatically
This integration with DBOS removes all the manual snapshot work from durable workflows. Just pass a DBOS runtime to your workflow and get great reliability - whether you're running a single process or multiple replicas in production. Learn how to build durable agents on our new docs: https://t.co/9AfefFWkXl
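As a rough sketch of the durability model described above, here is what a checkpointed workflow looks like with DBOS's own Python decorators. This is generic DBOS usage, not the exact LlamaIndex integration API (see the linked docs for that), and the config and step bodies are placeholder assumptions.

```python
# Generic DBOS durability sketch (not the LlamaIndex integration API).
# Each @DBOS.step() result is persisted, so a crash between steps means the
# workflow resumes from the last completed step instead of starting over.
from dbos import DBOS

DBOS(config={"name": "doc-agent"})  # assumption: defaults suffice locally; point at Postgres for multi-replica

@DBOS.step()
def parse_document(path: str) -> str:
    # placeholder for a parsing / LLM call; its result is checkpointed
    return f"parsed contents of {path}"

@DBOS.step()
def summarize(text: str) -> str:
    # placeholder for a second LLM call
    return text[:100]

@DBOS.workflow()
def document_workflow(path: str) -> str:
    parsed = parse_document(path)   # checkpoint 1
    return summarize(parsed)        # checkpoint 2

if __name__ == "__main__":
    DBOS.launch()                   # must run before invoking workflows
    print(document_workflow("report.pdf"))
```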
"Just send the PDF to GPT-4o" Ok. We did. Here's what happened: β’ Reading order? Wrong. β’ Tables? Half missing. β’ Hallucinated data? Everywhere. β’ Bounding boxes? Nonexistent. β’ Cost at 100K pages? Brutal. So we're doing it live. LlamaParse vs. The LLMs β a free webinar where we parse the ugliest documents we can find across every leading model and show the results side by side. Hosted by George, Head of Engineering, LlamaIndex When: March 26th; 9 AM PST Register π https://t.co/To4m9Zmu7m
I love the Big Arch Burger. I also love Big Harnesses™ and Big Complex PDFs™ with hundreds of pages of tables, images and forms. https://t.co/deD8sUcyj0
"Just send the PDF to GPT-5.4" Ok. We did. Here's what happened: β’ Reading order? Wrong. β’ Tables? Half missing. β’ Hallucinated data? Everywhere. β’ Bounding boxes? Nonexistent. β’ Cost at 100K pages? Brutal. So we're doing it live. LlamaParse vs. The LLMs β a free webinar where we parse the ugliest documents we can find across every leading model and show the results side by side. Hosted by George, Head of Engineer at @llama_index Register π https://t.co/To4m9ZlWhO
Prof. Donald Knuth opened his new paper with "Shock! Shock!" Claude Opus 4.6 had just solved an open problem he'd been working on for weeks: a graph decomposition conjecture from The Art of Computer Programming. He named the paper "Claude's Cycles." 31 explorations. ~1 hour. Knuth read the output, wrote the formal proof, and closed with: "It seems I'll have to revise my opinions about generative AI one of these days." The man who wrote the bible of computer science just said that. In a paper named after an AI. Paper: https://t.co/juSOmK9vOt
A new Nature paper from Johns Hopkins (by Prof. Lin @DingchangLin) just solved one of the hardest problems in biology: how do you record what every cell in a tissue experienced over time, not just what it looks like right now?

The answer: GEMINI (Granularly Expanding Memory for Intracellular Narrative Integration). It works exactly like tree rings. Cells are genetically engineered to express a computationally designed protein assembly. As the assembly grows inside the cell, it captures cellular activity as fluorescent ring patterns: each ring a timestamp, each ring's properties encoding signal intensity. Look at a cross-section under a microscope and you can read the cell's history backward, with ~15-minute resolution. The key: cells build the recorder themselves. GEMINI doesn't interfere with normal function, it just quietly writes.

What they demonstrated: In a full tumor xenograft, GEMINI captured every cancer cell's activity history across the entire tumor while it continued to grow normally. For the first time, researchers can look back and see how different regions of the same tumor responded differently to therapy over time, not snapshots, but film. In a mouse brain, GEMINI recorded neural activity dynamics without disrupting behavior, coordination, or memory. It could temporally resolve the history of a brain seizure.

Why this matters: Every tool we have in biology gives you state, what the cell looks like now. Sequencing, imaging, proteomics: all snapshots. GEMINI gives you trajectory. It's the difference between a photograph and a video, applied to every cell in an organ simultaneously.

The team is explicit that AI-based decoding tools will be central to reading GEMINI's output at whole-brain scale. This is the data layer that makes temporal single-cell atlases possible.

Paper: https://t.co/TsObknQqga
Congratulations @DingchangLin
GPT-5.4 is our most factual and efficient model: fewer tokens, faster speed. In ChatGPT, GPT-5.4 Thinking has improved deep web research and better context retention when it thinks for longer. And oh, you can now interrupt the model and add instructions or adjust its direction mid-response. Steering is available this week on Android and web. iOS coming soon.
Building for the AI era means rethinking the stack from the ground up. Modular co-founder and CEO @clattner_llvm joined @shanselman on @Hanselminutes to talk about Mojo 🔥, heterogeneous compute, and why AI infrastructure demands new abstractions. Watch here: https://t.co/AKCJQEoKNJ
MAX is how Modular is rethinking the AI stack from first principles, bringing together modeling, performance, and portability in one open framework. Hear directly from our co-founder and CEO @clattner_llvm on why the stack needs to evolve and what that means for the future of AI infrastructure.
Watch here: https://t.co/vBI679nVqH
You shouldn't have to choose between peak GPU performance and code you can actually maintain. We built Structured Mojo 🔥 Kernels to fix that. Performance, usability, and portability without the tradeoff: from 14k lines down to 7k, with ~1.8k TFLOPS held. We wrote a 4-part series on how. Part 1 is up: https://t.co/zMYWMfDOb2
We just shipped Designs. Here's the problem it solves: most UI work fails because you don't know what it should look like until after your coding agent already built it wrong. You describe a dashboard. The agent builds it. You realize the layout doesn't work. You prompt again. The agent rebuilds. Something else breaks. Three iterations later you're debugging CSS instead of shipping features.

Designs puts the iteration where it belongs, before a single line of code gets written. BrainGrid now generates actual UI designs for your requirements. You can iterate on them with the agent, annotate what needs to change, select specific elements to tweak. Once you lock it in, that design becomes part of the requirement that gets handed to your coding tool. No more building the wrong UI three times because you couldn't visualize it from a text prompt.

It works with new apps and existing ones. If you're adding a feature to something you've already built, BrainGrid matches your existing app's look and feel so the new design doesn't feel bolted on. The designs get included in your Requirements doc when you fetch from CLI or MCP. Your coding agent knows exactly what to build.

This is the part most builders skip, and it's why UI work takes twice as long as it should. Now you can see it, fix it, and lock it before the agent touches your codebase.
Claude Code worked non-stop for 5 hours and 5 minutes doing this refactor. Another level. https://t.co/DIiidHbgHz

BIG BIG SPACE IV https://t.co/KDKd0f9bw9
Introducing Voice Mode in Perplexity Computer. You can now just talk and do things. https://t.co/eTZW1F8tUW