Your curated collection of saved posts and media

Showing 24 posts · last 7 days · quality filtered
πŸ”random_walker retweeted
L
Leah Pierson
@leah_pierson
πŸ“…
Mar 04, 2026
10d ago
πŸ†”54760301

omg this title, this paper https://t.co/OXL9C4v2nX

Media 1Media 2
❀️19,486
likes
πŸ”1,329
retweets
πŸ–ΌοΈ Media
@Tom_Westgarth15 · Mar 05, 2026 · ID 87920397
Fascinating paper with so many interesting observations. One that jumped out at me, and arguably deserved more attention, is the divergence between discrimination and calibration of agents.

Calibration (see "CAL" in the predictability column) — the alignment between predicted confidence and actual accuracy — has improved noticeably in recent frontier models. But discrimination ("AUROC" in the predictability column) — the ability to distinguish tasks the agent will solve from those it won't — shows divergent trends and has in some cases worsened.

This matters enormously for deployment in real-world contexts. An agent can be well calibrated in aggregate (e.g. saying "I'm 70% confident" and being right 70% of the time) while being completely unable to flag which specific tasks it will fail at. Discrimination is therefore critical for anyone building autonomous workflows: you need the agent to know when to escalate, rather than just having good statistical properties across a population of tasks.

I'm intrigued by what this means from a hardware perspective. Most of these reliability failures will stem from properties of model weights and training. But if this paper is correct, and trends in agent reliability continue to lag capabilities, it creates a strong case for architectures that enable rapid re-inference and consistency checking (running the same query multiple times and comparing outputs). Here, low-latency, high-throughput inference hardware would have an outsized advantage. In this sense, the reliability tax on compute is basically a multiplier on inference demand.

[1 media attachment]
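The calibration-vs-discrimination distinction in the post above can be made concrete with a toy computation. This is a minimal sketch with made-up numbers; `calibration_gap` and `auroc` are illustrative helpers, not anything from the paper:

```python
def calibration_gap(confs, outcomes):
    """Gap between mean stated confidence and observed solve rate (aggregate calibration)."""
    return abs(sum(confs) / len(confs) - sum(outcomes) / len(outcomes))

def auroc(confs, outcomes):
    """Probability a solved task got a higher confidence than an unsolved one (discrimination)."""
    pos = [c for c, o in zip(confs, outcomes) if o]
    neg = [c for c, o in zip(confs, outcomes) if not o]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# An agent that says "70%" on every task and solves 70% of them is
# perfectly calibrated in aggregate yet has zero discrimination:
confs = [0.7] * 100
outcomes = [1] * 70 + [0] * 30

print(round(calibration_gap(confs, outcomes), 6))  # 0.0 -> aggregate calibration looks perfect
print(auroc(confs, outcomes))                      # 0.5 -> chance-level discrimination
```

An AUROC of 0.5 means the confidence scores carry no signal about which specific tasks will fail, which is exactly the escalation problem the post describes.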
@llama_index · Mar 03, 2026 · ID 79048472

Most agents don't fail on models… they fail on context: those ugly, messy, complex documents that trip up even the latest LLMs (PDFs, tables, messy scans). Don't worry. We got you. 🚀

VC-backed (seed+) startup? Join the LlamaParse Startup Program:
✅ free credits
✅ dedicated Slack channel + priority support
✅ alignment call with our founder Jerry Liu
✅ community spotlight (millions of devs)
✅ production-ready ingestion pipelines

Apply today, spots are limited → https://t.co/61csPhQULp

[1 media attachment]

@llama_index · Mar 03, 2026 · ID 69269691

LlamaIndex has evolved far beyond a RAG framework: we're now focused on agentic document processing that automates knowledge work.

🚀 Agent orchestration has fundamentally changed with sophisticated reasoning loops, tool discovery through Skills/MCP, and coding agents that write Python for you
📄 Document understanding remains a massive opportunity: frontier vision models still struggle with complex tables, charts, and long documents at scale
🏢 LlamaParse now serves 300k+ users across 50+ formats for enterprises like @OneCarlyle, @CEMEX, and @KPMG with multi-agent workflows combining OCR, computer vision, and LLM reasoning
⚙️ Real automation potential exists in workflows where humans manually process documents daily: financial analysis, contract review, and insurance underwriting can all become end-to-end agentic processes

Our mission is now providing core infrastructure to automate knowledge work over documents, not just being connective tissue between LLMs and data. Read about our evolution and what's next: https://t.co/M0DbsIdGrF

[2 media attachments]

@jerryjliu0 · Mar 03, 2026 · ID 79643299

3 years ago, you might've known @llama_index as a RAG framework. Today we are not a RAG framework. We are an agentic document processing platform 🦙📑

I wrote a blog post detailing the evolution of our company over the past ~3 years and why we believe our current position is enduring in the rapidly evolving AI landscape. There are two main points I want to highlight:

1️⃣ One of the most important opportunities in today's world is to provide high-quality unstructured context to AI agents. We see ourselves as the best-in-class OCR module that can unlock context from the hardest document containers (PDFs, Word, PowerPoint, Excel, and more)

2️⃣ Agent reasoning loops have gotten a lot more sophisticated. General LLM abstractions are a lot less relevant. Retrieval patterns have completely changed. We need to build deep, focused tooling that actually provides value in this world of long-running agents.

Note: we are not giving up on OSS tooling. We think open-source software is extremely important for democratizing AI access. We will continue to build OSS that is more aligned with our core focus area of AI-native document processing, continue to support framework users, and point them to updated resources for relevant releases.

Come check out our blog: https://t.co/2hGgzYtI3v

Our core managed platform is LlamaParse. If you're interested, come check out our platform: https://t.co/TqP6OT5U5O

[2 media attachments]

@llama_index · Mar 04, 2026 · ID 06983152

If you need to split complex or composite documents into structured categories or sections, LlamaSplit is built for the job ✂️

With the intuitive UI, you can:
• Define a custom configuration for how your documents should be categorized
• Specify the exact sections or impact types you want extracted
• Run the job and explore the results through an interactive interface 🔍

In this walkthrough, @itsclelia demonstrates how to configure LlamaSplit to break down Environmental Impact Reports into clearly defined impact categories 🌳

🎥 Watch the full video here:
📘 Or get started right away with the docs (UI + code examples): https://t.co/kAMUqwOCDW

[media attachment]

@llama_index · Mar 04, 2026 · ID 96714965

Huge thank you to everyone who joined the @GoogleDeepMind hackathon in NYC with us over the weekend 💛 Our DevRel @tuanacelik gave a 30-minute workshop to get participants started on document agents with LlamaParse. We saw some amazing projects submitted, with no lack of creativity and imagination. Congrats to the 3 winning teams, and see you next time!

[1 media attachment]

@jerryjliu0 · Mar 05, 2026 · ID 30425369

Adobe Acrobat has PDF splitting. We have agentic PDF splitting 🤖✂️

Simply define the categories you want in natural language, and our split agent will automatically "chunk" the document into subsets of pages and tag them with the appropriate categories. This is super useful for breaking apart complicated document packets like resumes, tax forms, identification docs, expense reports, and more.

Check out @itsclelia's video below, and come sign up for LlamaParse if you're interested!

Docs: https://t.co/UdxT3sJfkF
LlamaParse: https://t.co/TqP6OT5U5O

[media attachment]
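The "chunk pages and tag them" step described above can be sketched in a few lines. This is a hypothetical illustration, not the LlamaParse API: assume some LLM-based classifier has already labeled each page with a category, and we group consecutive pages sharing a label into tagged chunks:

```python
from itertools import groupby

def split_by_category(page_labels):
    """Group consecutive (page_number, category) pairs into tagged chunks.

    `page_labels` would come from a per-page classifier (e.g. an LLM prompted
    with each page's text); here it is supplied by hand for illustration."""
    chunks = []
    for category, group in groupby(page_labels, key=lambda p: p[1]):
        pages = [n for n, _ in group]
        chunks.append({"category": category, "pages": (pages[0], pages[-1])})
    return chunks

# A 5-page packet: two resume pages, one tax form, two ID pages.
labels = [(1, "resume"), (2, "resume"), (3, "tax form"), (4, "id"), (5, "id")]
print(split_by_category(labels))
# [{'category': 'resume', 'pages': (1, 2)}, {'category': 'tax form', 'pages': (3, 3)}, {'category': 'id', 'pages': (4, 5)}]
```

The grouping itself is trivial; the "agentic" part of the product is the natural-language page classifier standing in front of it.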
@llama_index · Mar 05, 2026 · ID 83795631

Creating agent workflows and architecting the logic is one thing; making them durable, fail-safe, and scalable is another 👇

New integration for durable agent workflows with @DBOS_Inc execution - make sure your agents survive crashes, restarts, and errors without writing any checkpoint code.

🔄 Every step transition persists automatically - workflows resume exactly where they left off
⚡ Zero external dependencies with SQLite, or scale to multi-replica deployments with Postgres
👯‍♀️ Built for replication - each replica owns its workflows, with Postgres coordinating across instances
💤 Idle release feature frees memory for long-running workflows waiting on human input
🛡️ Built-in crash recovery detects and relaunches incomplete workflows automatically

This integration with DBOS removes all the manual snapshot work from durable workflows. Just pass a DBOS runtime to your workflow and get great reliability — whether you're running a single process or multiple replicas in production.

Learn how to build durable agents on our new docs: https://t.co/9AfefFWkXl

[1 media attachment]
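The "resume exactly where they left off" behavior above boils down to checkpointing each step's result and skipping steps that already have one. A minimal conceptual sketch in plain Python with SQLite — this is not the DBOS or LlamaIndex API, just the underlying idea:

```python
import json
import sqlite3

class DurableRun:
    """Toy durable workflow: each completed step's result is persisted,
    so re-running the same workflow skips work that already finished."""

    def __init__(self, db_path, run_id):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS steps "
            "(run_id TEXT, step TEXT, result TEXT, PRIMARY KEY (run_id, step))"
        )
        self.run_id = run_id

    def step(self, name, fn):
        row = self.db.execute(
            "SELECT result FROM steps WHERE run_id = ? AND step = ?",
            (self.run_id, name),
        ).fetchone()
        if row is not None:          # checkpoint exists: skip re-execution
            return json.loads(row[0])
        result = fn()                # run the step, then persist its result
        self.db.execute(
            "INSERT INTO steps VALUES (?, ?, ?)",
            (self.run_id, name, json.dumps(result)),
        )
        self.db.commit()
        return result

run = DurableRun(":memory:", "ingest-42")  # a real setup would use a file or Postgres
first = run.step("parse", lambda: {"pages": 12})
again = run.step("parse", lambda: {"pages": 999})  # skipped: returns the checkpoint
print(first == again)  # True
```

With an on-disk database, a restarted process using the same `run_id` would replay only the steps that never committed — the property the integration provides without this manual bookkeeping.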
@llama_index · Mar 05, 2026 · ID 90767806

"Just send the PDF to GPT-4o"

Ok. We did. Here's what happened:
• Reading order? Wrong.
• Tables? Half missing.
• Hallucinated data? Everywhere.
• Bounding boxes? Nonexistent.
• Cost at 100K pages? Brutal.

So we're doing it live. LlamaParse vs. The LLMs — a free webinar where we parse the ugliest documents we can find across every leading model and show the results side by side.

Hosted by George, Head of Engineering, LlamaIndex
When: March 26th, 9 AM PST

Register 👇 https://t.co/To4m9Zmu7m

[1 media attachment]

@jerryjliu0 · Mar 05, 2026 · ID 65563933

I love the Big Arch Burger 🍔 I also love Big Harnesses™ and Big Complex PDFs™ with hundreds of pages of tables, images and forms. https://t.co/deD8sUcyj0

[media attachment]

@llama_index · Mar 06, 2026 · ID 95117278

"Just send the PDF to GPT-5.4"

Ok. We did. Here's what happened:
• Reading order? Wrong.
• Tables? Half missing.
• Hallucinated data? Everywhere.
• Bounding boxes? Nonexistent.
• Cost at 100K pages? Brutal.

So we're doing it live. LlamaParse vs. The LLMs — a free webinar where we parse the ugliest documents we can find across every leading model and show the results side by side.

Hosted by George, Head of Engineering at @llama_index

Register 👇 https://t.co/To4m9ZlWhO

[1 media attachment]

@BoWang87 · Mar 03, 2026 · ID 77475623

Prof. Donald Knuth opened his new paper with "Shock! Shock!"

Claude Opus 4.6 had just solved an open problem he'd been working on for weeks — a graph decomposition conjecture from The Art of Computer Programming. He named the paper "Claude's Cycles."

31 explorations. ~1 hour. Knuth read the output, wrote the formal proof, and closed with: "It seems I'll have to revise my opinions about generative AI one of these days."

The man who wrote the bible of computer science just said that. In a paper named after an AI.

Paper: https://t.co/juSOmK9vOt

[1 media attachment]

@BoWang87 · Mar 04, 2026 · ID 33929065

A new Nature paper from Johns Hopkins (by Prof. Lin @DingchangLin) just solved one of the hardest problems in biology: how do you record what every cell in a tissue experienced over time, not just what it looks like right now?

The answer: GEMINI — Granularly Expanding Memory for Intracellular Narrative Integration. It works exactly like tree rings. Cells are genetically engineered to express a computationally designed protein assembly. As the assembly grows inside the cell, it captures cellular activity as fluorescent ring patterns — each ring a timestamp, each ring's properties encoding signal intensity. Look at a cross-section under a microscope and you can read the cell's history backward, with ~15-minute resolution.

The key: cells build the recorder themselves. GEMINI doesn't interfere with normal function — it just quietly writes.

What they demonstrated:

In a full tumor xenograft, GEMINI captured every cancer cell's activity history across the entire tumor while it continued to grow normally. For the first time, researchers can look back and see how different regions of the same tumor responded differently to therapy over time — not snapshots, but film.

In a mouse brain, GEMINI recorded neural activity dynamics without disrupting behavior, coordination, or memory. It could temporally resolve the history of a brain seizure.

Why this matters: every tool we have in biology gives you state — what the cell looks like now. Sequencing, imaging, proteomics — all snapshots. GEMINI gives you trajectory. It's the difference between a photograph and a video, applied to every cell in an organ simultaneously.

The team is explicit that AI-based decoding tools will be central to reading GEMINI's output at whole-brain scale. This is the data layer that makes temporal single-cell atlases possible.

Paper: https://t.co/TsObknQqga

Congratulations @DingchangLin

[media attachment]

@OpenAI · Mar 05, 2026 · ID 99326334

GPT-5.4 is our most factual and efficient model: fewer tokens, faster speed. In ChatGPT, GPT-5.4 Thinking has improved deep web research, better context retention when it thinks for longer — and oh — you can now interrupt the model and add instructions or adjust its direction mid-response. Steering is available this week on Android and web. iOS coming soon.

[media attachment]

@Modular · Mar 02, 2026 · ID 96696317

Building for the AI era means rethinking the stack from the ground up. Modular co-founder and CEO @clattner_llvm joined @shanselman on @Hanselminutes to talk about Mojo 🔥, heterogeneous compute, and why AI infrastructure demands new abstractions. Watch here ↓ https://t.co/AKCJQEoKNJ

[1 media attachment]

@Modular · Mar 03, 2026 · ID 76941593

MAX is how Modular is rethinking the AI stack from first principles, bringing together modeling, performance, and portability in one open framework. Hear directly from our co-founder and CEO @clattner_llvm on why the stack needs to evolve and what that means for the future of AI infrastructure.

[1 media attachment]

@Modular · Mar 03, 2026 · ID 15706284

Watch here: https://t.co/vBI679nVqH

[1 media attachment]

@Modular · Mar 05, 2026 · ID 12130301

You shouldn't have to choose between peak GPU performance and code you can actually maintain. We built Structured Mojo 🔥 Kernels to fix that: performance, usability, and portability without the tradeoff. 14k lines down to 7k. ~1.8k TFLOPS held. We wrote a 4-part series on how. Part 1 is up: https://t.co/zMYWMfDOb2

[1 media attachment]

@braingridai · Mar 03, 2026 · ID 97137307

We just shipped Designs. Here's the problem it solves: most UI work fails because you don't know what it should look like until after your coding agent already built it wrong.

You describe a dashboard. The agent builds it. You realize the layout doesn't work. You prompt again. The agent rebuilds. Something else breaks. Three iterations later you're debugging CSS instead of shipping features.

Designs puts the iteration where it belongs: before a single line of code gets written. BrainGrid now generates actual UI designs for your requirements. You can iterate on them with the agent, annotate what needs to change, select specific elements to tweak. Once you lock it in, that design becomes part of the requirement that gets handed to your coding tool. No more building the wrong UI three times because you couldn't visualize it from a text prompt.

It works with new apps and existing ones. If you're adding a feature to something you've already built, BrainGrid matches your existing app's look and feel so the new design doesn't feel bolted on. The designs get included in your Requirements doc when you fetch from CLI or MCP. Your coding agent knows exactly what to build.

This is the part most builders skip, and it's why UI work takes twice as long as it should. Now you can see it, fix it, and lock it before the agent touches your codebase.

[1 media attachment]

@acossta · Mar 05, 2026 · ID 34522195

Claude Code worked non-stop for 5 hours and 5 mins doing this refactor. Another level https://t.co/DIiidHbgHz

[2 media attachments]

@aiordieshow · Mar 03, 2026 · ID 59708265

BIG BIG SPACE IV https://t.co/KDKd0f9bw9

[media attachment]

@perplexity_ai · Mar 04, 2026 · ID 26853379

Introducing Voice Mode in Perplexity Computer. You can now just talk and do things. https://t.co/eTZW1F8tUW

[media attachment]