Your curated collection of saved posts and media

Showing 24 posts · last 7 days · quality filtered
L
llama_index
@llama_index
πŸ“…
Feb 12, 2026
31d ago
πŸ†”38241477

2026 is the year of long-horizon agents. @sequoia predicts that this year, agents will be able to tackle long-horizon tasks and work autonomously for hours to solve ambiguous tasks. We're excited about how this translates to knowledge work automation, particularly over documents. Let's take a look at "Long Horizon Document Agents"

πŸ•°οΈ Agents are evolving to work autonomously over weeks, not just minutes, handling complex document tasks end-to-end.
πŸ”„ These agents can continuously monitor events like document changes, comments, and deadlines - not just respond to chat prompts
πŸ“ They maintain persistent task backlogs and can collaborate iteratively on living documents like FAQs, PRDs, and legal contracts
🎯 The interface shifts from chat boxes to "agent inboxes" that manage ongoing document tasks with clear status and context
⚑ This enables true automation of multi-step knowledge work - from due diligence memo updates to contract redline collaboration loops

2026 is shaping up to be the year agents evolve from "workflows" to "employees" - and we're building the document processing infrastructure to make this possible.

Read @jerryjliu0's full blog on long horizon document agents: https://t.co/1DwRnMRseH

Media 1 · Media 2
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 13, 2026
30d ago
πŸ†”04207766

πŸš€ The @posthog team has just rolled out LlamaIndex support for their LLM Analytics, and we built a demo to showcase what’s possible. Using LlamaIndex, LlamaParse, and OpenAI, our Agent Workflow compares product specifications and matches users with the most suitable option for their use case πŸ› οΈ

πŸ¦” Thanks to PostHog’s observability integration, the demo automatically tracks OpenAI usage, including:
β€’ Token consumption
β€’ Cost breakdown
β€’ Latency metrics

πŸŽ₯ Check out the video below to see it in action πŸ‘‡
πŸ‘©β€πŸ’» GitHub: https://t.co/elk5VKi8IF
πŸ“š Docs: https://t.co/IZI3w6BYKy
πŸ¦™ LlamaCloud: https://t.co/wZjhFV29gN

Media 2
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 16, 2026
27d ago
πŸ†”51451294

What if an AI agent could review every invoice against your contracts β€” and flag what doesn't match? That's exactly what our Invoice Reconciler demo does. Here's how it works:

πŸ“„ Upload your contracts and invoices β†’ LlamaParse converts them into clean, LLM-readable Markdown
πŸ“‚ Everything gets indexed in LlamaCloud β€” searchable and ready for RAG
πŸ” Define your reconciliation rules (unit price match, correct math, line item match, etc.)
πŸ€– A LlamaAgent workflow analyzes each invoice against your contracts and rules β€” then approves or rejects with confidence scores and detailed reasoning

You can even chat with your invoices and contracts directly β€” ask "what have we bought?" or "what contracts do we have in place?" and get cited answers instantly.

The whole thing is powered by LlamaCloud: LlamaParse for document ingestion, LlamaCloud indexes for retrieval, and LlamaAgent Workflows for orchestration.

πŸŽ₯ Watch the full walkthrough: https://t.co/LX57pjDfwN

Media 1
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 17, 2026
26d ago
πŸ†”75508302

"It's somewhere in the PDF" is not a citation. Page-level extraction in LlamaExtract gives you:
βœ“ Data mapped to specific pages
βœ“ Bounding boxes showing exact locations
βœ“ Audit-ready citations

Turn 200-page docs into skimmable, structured insights πŸ‘‡ https://t.co/BTkwspmefz

Media 1
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 18, 2026
25d ago
πŸ†”67429826

πŸ† We're running a LlamaAgents contest right now. Throw your hardest documents at our agent builder, and tell us how it goes. Want help getting started? We have a new walkthrough for the LlamaAgent Builder by @tuanacelik πŸ’¬ Describe a document workflow in natural language, and it builds a full agent for you. In this video, the prompt was basically: "split a resume book into individual resumes, ignore cover pages and curriculum pages, extract resume work and education related fields..." πŸ› οΈ From that, the agent builder reasons about which LlamaCloud tools to use, lands on LlamaSplit + LlamaExtract, configures both, iterates on the workflow structure, and gives you a deployable agent with an API and UI. No dragging boxes around. No writing workflow code (unless you want to). Just describe the problem and let it figure out the architecture. You own the code, it pushes to your GitHub. Clone it, open in Cursor, customize whatever you need. https://t.co/QAvGwI3FIg

Media 1
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 19, 2026
24d ago
πŸ†”62706517

More reasoning doesn't always mean better results - especially for document parsing. We tested GPT-5.2 at four reasoning levels on complex documents and found that higher reasoning actually hurt performance while dramatically increasing costs and latency.

🧠 Reasoning models hallucinate content that isn't there, filling in "missing" table cells with inferred values
πŸ“Š They split single tables into multiple sections by overthinking structural boundaries
⚑ Processing time increased 5x with xHigh reasoning (241s vs 47s) while accuracy stayed flat at ~0.79
πŸ’° Our LlamaParse Agentic outperformed all reasoning levels at 18x lower cost and 13x faster speed

You can't reason past what you can't see. Vision encoders lose pixel-level information before reasoning even starts, and no amount of thinking tokens can recover that lost detail.

Our solution uses a pipeline approach - specialized OCR extracts text at native resolution, then LLMs structure what's already been accurately read. Each component plays to its strengths instead of forcing one model to handle everything.

Read the full analysis: https://t.co/gWDOpfHnWm

Media 1 · Media 2
πŸ–ΌοΈ Media
J
jerryjliu0
@jerryjliu0
πŸ“…
Feb 19, 2026
24d ago
πŸ†”58644561

Coding agents are fundamentally changing software engineering in terms of velocity, role, and org structure. We published a memo to our internal engineering team detailing our growing expectations in terms of role/scope.

🟠 Before, the tasks of prioritization, engineering planning, and implementation were divided between EMs, PMs, senior ICs, and junior ICs
🟒 Now, ICs are expected to handle *all* of product prioritization, product speccing, and implementation

This is due to a few trends πŸ“ˆ:
- Coding agents have brought implementation costs down to ~0. The role of engineers is writing prompts
- LLMs and sub-agents have reduced the PM work of synthesizing feedback down to ~0 too

The main job of any β€œengineer” is to be an e2e product owner: being able to translate requirements into specifications, and delegate tasks to various subagents for implementation.

Every engineer is told to offload as much as possible to their favorite tools, whether it’s Claude Code, Cursor, Devin, Codex, regular ChatGPT and more. We celebrate and share learnings around burning tokens, as long as it helps drive additional productivity!

Media 1
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 20, 2026
23d ago
πŸ†”11679102

πŸš€ Big drop from @GoogleDeepMind: Gemini 3.1 Pro is here, and we built a hands-on demo powered by LlamaCloud to put it to work and turn your receipt photos into real financial insights!

Using our Agent Workflows, the app:
πŸ“Έ Parses receipt images with LlamaParse (Agentic tier)
πŸ—‚ Stores everything locally in an SQLite database
πŸ“Š Aggregates your spending monthly
🧠 Uses Gemini 3.1 Pro to analyze trends and generate actionable tips to improve your finances

Check out the demo below! πŸ‘‡
πŸ‘©β€πŸ’» GitHub repo: https://t.co/Ny22F4I3n1
πŸ¦™ Get started with LlamaCloud: https://t.co/zyE5lXTPFV

Media 2
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 23, 2026
20d ago
πŸ†”72096324

πŸš€ LlamaAgents Builder just leveled up: File uploads are here!

Our natural language interface for building agentic document workflows now supports file uploads. You can provide example documents as context, and the agent will use them as a starting point to design and tailor your workflow.

The result? Applications that better match your real-world use case. The more representative your sample files, the more accurate your final app.

πŸŽ₯ Watch the full walkthrough: https://t.co/LQW2PEZ8d9
πŸ¦™ Get started with LlamaCloud: https://t.co/wZjhFV29gN

Media 2
πŸ–ΌοΈ Media
J
jerryjliu0
@jerryjliu0
πŸ“…
Feb 23, 2026
20d ago
πŸ†”31451334

We built an AI agent that lets you vibe-code document extraction - high accuracy and citations over the most complex documents. Our latest release lets you upload documents as context. All you then have to do is describe what you want extracted in natural language.

πŸ’‘ Our agent will then read the document with file tools to infer the right schema, validation rules, and other pre/postprocessing logic.
βœ… It will give you back a workflow that can extract over thousands/millions of documents at scale. You can still of course review and edit every output before approving.

Stop handling paperwork manually; just upload files, describe your task, and let our agent handle the rest. Our vision for LlamaAgents is to provide the most advanced and easy-to-use way for you to orchestrate document work.

Walkthrough: https://t.co/dAtzlZbot4
Check it out: https://t.co/XYZmx5TFz8

If you’re interested in reducing the operational burden of document extraction (invoices, claims, onboarding forms), come talk to us! https://t.co/Ht5jwxSrQB

Media 2
+1 more
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 24, 2026
19d ago
πŸ†”36396844

Document OCR benchmarks are hitting a ceiling - and that's a problem for real-world AI applications. Our latest analysis reveals why OmniDocBench, the go-to standard for document parsing evaluation, is becoming inadequate as models like GLM-OCR @Zai_org achieve 94.6% accuracy while still failing on complex real-world documents.

πŸ“Š Models are saturating OmniDocBench scores but still struggle with complex financial reports, legal filings, and domain-specific documents
🎯 Rigid exact-match evaluation penalizes semantically correct outputs that differ in formatting (HTML vs markdown, spacing, etc.)
⚑ AI agents need semantic correctness, not perfect formatting matches - current benchmarks miss this critical distinction
πŸ”¬ The benchmark's 1,355 pages can't capture the full complexity of production document processing needs

The document parsing challenge isn't solved just because benchmark scores look impressive. We need evaluation methods that reward semantic understanding over exact formatting, especially as AI agents become the primary consumers of parsed content.

We're building parsing models focused on semantic correctness for complex visual documents. If you're scaling OCR workloads in production, LlamaParse handles the edge cases that benchmarks miss.

Read our full analysis: https://t.co/tcZP1PM8kv

Media 1 · Media 2
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 26, 2026
17d ago
πŸ†”02795905

Build a private equity deal sourcing agent that automatically classifies investment opportunities and extracts key financial metrics using our LlamaAgents Builder. This step-by-step guide shows you how to create an agent that processes deal files like teasers and financial summaries:

🎯 Classify deals into buyout, growth, or minority investment strategies
πŸ“Š Extract critical metrics including revenue, EBITDA, growth rates, and debt levels
πŸš€ Deploy directly to GitHub and get a working UI without writing code
πŸ”§ Iterate and refine your agent through natural language conversations

The tutorial covers prompt engineering best practices, using example files effectively, visualizing agent workflows, and deploying to production. We demonstrate the complete process from initial prompt to testing the deployed application with real deal documents.

Read the full tutorial: https://t.co/WcT2j3nEoi

Media 1 · Media 2
πŸ–ΌοΈ Media
L
llama_index
@llama_index
πŸ“…
Feb 27, 2026
16d ago
πŸ†”34531120

Turn your PDF charts into pandas DataFrames with specialized chart parsing in LlamaParse! This tutorial walks you through extracting structured data from charts and graphs in PDFs, then running data analysis with pandas - no manual data entry required.

πŸ“Š Enable specialized chart parsing to convert visual charts into structured table data
🐼 Extract table rows directly from parsed PDF pages and load them into DataFrames
πŸ“ˆ Perform year-over-year analysis, calculate gaps between metrics, and create visualizations
⚑ Use the items view to get per-page structured data including tables and figures

We demonstrate this using a 2024 Executive Summary PDF, extracting a fiscal year chart showing Budget Deficit vs Net Operating Cost data spanning 2020-2024, and reproducing the key financial insights.

Check out the full tutorial: https://t.co/sOVtFM3xE1

Media 1
πŸ–ΌοΈ Media
T
tuanacelik
@tuanacelik
πŸ“…
Feb 27, 2026
16d ago
πŸ†”40765042

Since joining @llama_index, my focus has shifted from 'everything agents' to 'document agents': agents that can handle work over all manner of complex documents. So, I tried out the latest chart parsing capabilities of LlamaParse.

Charts in PDFs are notoriously painful to work with. You can see the data (bars, axes, labels), but actually getting it into a format you can analyze is a different matter.

I tried parsing a U.S. Treasury executive summary PDF, pulling a grouped bar chart showing Budget Deficit vs. Net Operating Cost for fiscal years 2020–2024, and turning it into a pandas DataFrame you can run analysis on (although really you can then do whatever, e.g. provide it to an agent for downstream tasks).

Once parsed, the chart's underlying data comes back as a table in the items tree for that page. From there: grab the rows, construct a DataFrame, etc. In the example, I'm computing year-over-year changes in both metrics, measuring the gap between them across the five-year window, and just to be sure, I reproduced a bar chart that mirrors the original PDF visualization.

You can try it out here: https://t.co/8WHV4xzcDS

Media 2
πŸ–ΌοΈ Media
H
HelloSurgeAI
@HelloSurgeAI
πŸ“…
Feb 10, 2026
33d ago
πŸ†”29930734

We put Opus 4.6 through our Hemingway-bench Writing Leaderboard. How did it fare? Claude continues to dominate GPT-5.2, but lags behind the Geminis.

The new writing hierarchy:
πŸ‘‘ Gemini 3 Flash
πŸ₯ˆ Gemini 3 Pro
πŸ₯‰ Opus 4.6 (New!)
4️⃣ Opus 4.5
5️⃣ GPT-5.2 Chat

For example: one H-bench prompt requests a cryptic Instagram post for casting auditions.

GPT-5.2: "Casting call? Never heard of her." (??? πŸ’€)

Opus 4.6: "Currently accepting applications for professional liars, dramatic criers, and people who can walk through a door convincingly on the first take. You know who you are."

Media 1
πŸ–ΌοΈ Media
H
HelloSurgeAI
@HelloSurgeAI
πŸ“…
Feb 10, 2026
33d ago
πŸ†”32576001

Another Hemingway-bench prompt asks for an oral presentation about time management.

GPT-5.2 writes like a LinkedIn engagement farm: "When people hear β€œworking from home,” they often think it means more freedom, more comfort, and maybe even more free time. And sometimes that’s true. But what doesn’t get talked about enough is how easily work-from-home life can get messy if you don’t manage your time well." (πŸ₯±)

Opus 4.6 feels like a charismatic creative working the room: "So... raise your hand if you've ever "worked from home" and somehow ended up four hours into a Netflix series at 2 PM on a Tuesday. No judgment. We've all been there."

Media 1
πŸ–ΌοΈ Media
H
HelloSurgeAI
@HelloSurgeAI
πŸ“…
Feb 11, 2026
32d ago
πŸ†”73396062

We’ve finally done it. Forbes just ranked our CEO *54* spots above Taylor Swift on their America’s Greatest Innovators list. https://t.co/9h6OPZRQy9

While we’re honored that Forbes thinks Edwin’s strategy is more innovative than a 10-minute song about a scarf, we want to clarify a few things:
1. We will NOT be releasing our next benchmark as a limited-edition vinyl variant.
2. Jake was great in Zodiac.
3. We aren’t saying we’re better at songwriting, but we *are* saying we’ve never seen Taylor build an RL environment.

See you at next year's Grammys, @taylorswift13.

Media 1 · Media 2
πŸ–ΌοΈ Media
E
echen
@echen
πŸ“…
Feb 19, 2026
24d ago
πŸ†”85089539

Everyone’s building $100M "agentic" models, so we @HelloSurgeAI built a simulated company to see if they could actually hold down a job. Spoiler: they're all fired.

Welcome to EnterpriseBench -- CoreCraft edition. CoreCraft is a high-growth hardware startup (i.e., RL environment) with 23 tools, 2500 entities, and enough corporate red tape to make Harvey cry.

The best agent in the world (Opus 4.6! πŸ‘‘) scored under 30%. The #2 model (GPT-5.2 πŸ₯ˆ) gave up because a search returned 10 results and it couldn't figure out how to change the date filter. Another one (Gemini 3 Flash, #9) literally made up a delivery date just to deny a customer's refund. Savage. (The new Gemini 3.1 Pro? Still lagging behind, at πŸ₯‰)

My favorite: GPT-5.2 spent 11 tool calls curating a promotional email to help a customer reach Platinum tier... a tier she was already in. "Here are 3 items over $0 you can buy!"

"We would obviously never run ads in the way Anthropic depicts them...." -- thanks Sam.

The good news? We trained a model on this chaos and it got better at its job - even translating those skills to other benchmarks. (e.g., +7.4% on Tau2-Bench Retail)

Check out the full EnterpriseBench: CoreCraft leaderboard below, and read about our RL environment and research!
Blog post: https://t.co/mv4I1dCtOC
Paper: https://t.co/EaOHmExm1r
Leaderboard: https://t.co/7fb6fewGIQ

Media 1
πŸ–ΌοΈ Media
D
DavidSacks
@DavidSacks
πŸ“…
Feb 26, 2026
17d ago
πŸ†”27237251

Narrative Violation: β€œJob Postings For Software Engineers Are Rapidly Rising” https://t.co/yn2SkpZxPJ

Media 1
πŸ–ΌοΈ Media
πŸ”s_batzoglou retweeted
D
David Sacks
@DavidSacks
πŸ“…
Feb 26, 2026
17d ago
πŸ†”27237251

Narrative Violation: β€œJob Postings For Software Engineers Are Rapidly Rising” https://t.co/yn2SkpZxPJ

Media 1
❀️3,608
likes
πŸ”317
retweets
πŸ–ΌοΈ Media
B
BoWang87
@BoWang87
πŸ“…
Feb 26, 2026
17d ago
πŸ†”01605551

Thrilled to share our review paper, out today in @NatureRevGenet: "Harnessing artificial intelligence to advance CRISPR-based genome editing technologies"

Full paper: πŸ”— https://t.co/ZBJcgDZduY

CRISPR has already changed medicine. AI is now changing CRISPR. We spent a long time mapping the full landscape of where machine learning and deep learning are having real, measurable impact across the genome editing workflow β€” and where the most exciting opportunities lie ahead. Here's what we cover:

Guide RNA design β€” Deep learning models now predict on- and off-target activity for Cas9, Cas12, Cas13, and emerging systems like TnpB and IscB. We've gone from sequence heuristics to transformer-based models that generalize across organisms. Cell-type-specific generalization remains a frontier.

Base and prime editing β€” ML models predict bystander effects, product purity, and editing efficiency from sequence context alone. For prime editing, tools like PRIDICT and DeepPE have made pegRNA design far more tractable at scale.

Enzyme engineering β€” Protein language models (ESM, EVOLVEpro) are now guiding directed evolution of Cas proteins β€” expanding PAM compatibility, reducing immunogenicity, improving compactness β€” at a pace impossible through classical lab iteration alone.

Novel enzyme discovery β€” Foundation models trained on metagenomics are uncovering entirely new CRISPR systems from microbial diversity: new Cas variants, TnpB systems, and eukaryotic Fanzor proteins. The search space is enormous; AI is how we navigate it.

Virtual cell models β€” This is where I'm most excited. AI-powered virtual cells can, in principle, predict the functional consequences of any edit in any cell type β€” selecting targets, anticipating off-targets, modeling tissue-specific outcomes. But realizing this vision requires causally-rich, contextually diverse perturbation data. Scale of data matters as much as scale of model.

Delivery β€” ML-guided LNP design is closing the last mile between an edit that works in a dish and one that works in a patient.

Across all of this, one theme recurs: AI accelerates where data is abundant and well-structured. The field's next challenge is generating that data at the right diversity and scale.

This paper was a true collaboration. Huge thanks to Tyler Thomson, Gen Li, Amy Strilchuk, @HAOTIANCUI1, and Bowen Li β€” you each brought something irreplaceable to this. Special shoutout to @BowenLi_Lab for his leadership in this work!

Media 1 · Media 2
πŸ–ΌοΈ Media
M
Modular
@Modular
πŸ“…
Jan 28, 2026
46d ago
πŸ†”14226264

Have questions you’d like addressed during the meeting? Drop them here: https://t.co/4DXYuyzHkP

Media 1
πŸ–ΌοΈ Media
M
Modular
@Modular
πŸ“…
Feb 09, 2026
34d ago
πŸ†”38149881

From desktop applications to national laboratory research, see what developers are building with MojoπŸ”₯ This month's Community Meeting features GTK bindings with live GUI demos, Oak Ridge National Laboratory's GPU benchmark study comparing NVIDIA and AMD performance, and the 26.1 release including compile-time reflection and Apple Silicon GPU support. https://t.co/aral6XFkJZ

Media 1
πŸ–ΌοΈ Media
M
Modular
@Modular
πŸ“…
Feb 10, 2026
33d ago
πŸ†”26042133

Modular has acquired @bentomlai! 🀝 10K+ orgs use BentoML for production AI, including 50+ Fortune 500 companies. We're pairing their deployment platform with MAX + Mojo's hardware optimization. BentoML stays open source (Apache 2.0), and we’re doubling down on OSS in 2026. Ask BentoML founder @chaoyu_ and @clattner_llvm anything on Feb 17 at 9:30am PT. Get all the details: https://t.co/lifotwMzR2

πŸ–ΌοΈ Media