Your curated collection of saved posts and media
@DiamondEyesFox @durov https://t.co/Drl94NfDOR
Don't overcomplicate your AI agents. As an example, here is a minimal and very capable agent for automated theorem proving. The prevailing approach to automated theorem proving involves complex, multi-component systems with heavy computational overhead. But does it need to be that complex? This research introduces a deliberately minimal agent architecture for formal theorem proving. It interfaces with Lean and demonstrates that a streamlined, pared-down approach can achieve competitive performance on proof generation benchmarks. It turns out that simplicity is a feature, not a limitation. By stripping away unnecessary complexity, the agent becomes more reproducible, efficient, and accessible. Sophisticated results don't require sophisticated infrastructure. Paper: https://t.co/3p5MfNQII4 Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
Interesting research on how hierarchies spontaneously emerge in multi-agent systems. Start with a group of cooperative agents. There are no leaders and no structure. Just collaboration. What happens over time? Hierarchies form on their own. This new research looks at the dynamics of how initially flat, cooperative multi-agent systems naturally transition into hierarchical organizations. They identify the mechanisms and conditions that drive this structural shift. Why does it matter? Understanding hierarchy emergence is critical for designing multi-agent systems where organizational structure matters. Whether you're building agent swarms, collaborative AI teams, or simulating social systems, knowing when and why hierarchies form helps you design better systems or prevent unintended power structures. Paper: https://t.co/cKJKd59JU6 Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Can AI agents agree? Communication is one of the biggest challenges in multi-agent systems. New research tests LLM-based agents on Byzantine consensus games, scenarios where agents must agree on a value even when some participants behave adversarially. The main finding: valid agreement is unreliable even in fully benign settings, and degrades further as group size grows. Most failures come from convergence stalls and timeouts, not subtle value corruption. Why does it matter? Multi-agent systems are being deployed in high-stakes coordination tasks. This paper is an early signal that reliable consensus is not an emergent property you can assume. It needs to be designed explicitly. Paper: https://t.co/3fllhchiKX Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
New research on improving self-reflection in language agents. A core problem with agent self-reflection is that models tend to generate repetitive reflections that add noise instead of signal, hurting overall reasoning performance. It introduces ParamMem, a parametric memory module that encodes cross-sample reflection patterns directly into model parameters, then uses temperature-controlled sampling to generate diverse reflections at inference time. ParamMem shows consistent improvements over SOTA baselines across code generation, mathematical reasoning, and multi-hop QA. It also enables weak-to-strong transfer and self-improvement without needing a stronger external model, making it a practical upgrade for agentic pipelines. Paper: https://t.co/16Yp56j8Jm Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

MCP is dead? What are your thoughts? I mostly use Skills and CLI lately. I still use a few MCP tools for orchestrating agents more efficiently. https://t.co/o6saSxNQ9s
Theory of Mind in Multi-agent LLM Systems. A good read for anyone building systems where agents need to model each other's beliefs to coordinate effectively. This work introduces a multi-agent architecture combining Theory of Mind, Belief-Desire-Intention models, and symbolic solvers for logical verification, then evaluates how these cognitive mechanisms affect collaborative decision-making across multiple LLMs. The results reveal a complex interdependency where cognitive mechanisms like ToM don't automatically improve coordination. Their effectiveness depends heavily on underlying LLM capabilities. Knowing when and how to add these mechanisms is key to building reliable multi-agent systems. Paper: https://t.co/8ASbUgzGjF Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
Pay close attention to proactive AI agents. This is one of the wildest applications of agent harnesses I've seen. The MIT paper introduces NeuroSkill, a real-time agentic system that models human cognitive and emotional state by integrating Brain-Computer Interface signals with foundation models. "Human State of Mind" provided via SKILL dot md. The system runs fully offline on the edge. Its NeuroLoop harness enables agentic workflows that engage users across cognitive and emotional levels, responding to both explicit and implicit requests through actionable tool calls. Why does it matter? Most AI agents respond only to explicit user requests. NeuroSkill explores the frontier of proactive agents that sense and respond to implicit human states, opening new possibilities for adaptive human-AI interaction. Paper: https://t.co/kO3Ie2Dbvz Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
Interesting new research on LLM agent memory. Agent engineers, pay attention to this one. (bookmark it) It introduces a diagnostic framework that separates retrieval failures from utilization failures in agent memory systems. The main findings: - Retrieval method matters far more than how you write memories. - Accuracy varies 20 percentage points across retrieval approaches but only 3-8 points across writing strategies. - Simple raw chunking matches or outperforms expensive alternatives like Mem0-style fact extraction or MemGPT-style summarization. Teams investing heavily in sophisticated memory writing pipelines may be optimizing the wrong thing. Improving retrieval quality yields larger gains than increasing write-time sophistication. Paper: https://t.co/ZZvtsJXIJp Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

Banger CLI tool released by Google. CLI for Google Workspace + a bunch of useful Agent Skills to go with it. We had a few unofficial ones floating around, so it's nice to finally see an official one. Testing it already. https://t.co/jDWw45P4oA
PAI just launched publicly, and this feels like a big leap for AI video. Had a chance to test it early. Most video generation models top out at 8-15 seconds. PAI can generate 60-second, 4K videos with up to 16 shots. The real unlock is multi-turn editing. You can actually go back and refine specific scenes, adjust performance and composition, without regenerating the whole thing. Characters stay consistent across shots. It maintains narrative continuity the way you'd expect from a real production pipeline. This is the first AI video tool that feels like it was built for video storytelling, not just clip generation. Worth trying if you're building anything with narrative video.
Don't overload your AGENTS dot md files. Keep them brief. GitHub's analysis of 2,500+ repos found what makes AGENTS dot md files work: - provide the agent a specific job or persona - exact commands to run - well-defined boundaries to follow - clear examples of good outputs https://t.co/GFSEjLsx8C
You can only like this post if you know what this is π https://t.co/IUpWLlWN13
Narrative violation. Cursor goes $1B to $2B in 3mos. Claude Code went $0 to $2.5B in 8mos. Everyone in the tech/X bubble think people are wholesale ditching Cursor, but enterprise diffusion is glacial. Most of the world just got a hold of it. https://t.co/7RBU7mvosz
Ben Thompson with the best take on DOD v. Anthropic, which is basically: if you don't want the government to treat your technology like nuclear weapons, stop comparing your technology to nuclear weapons. Hype Tax. https://t.co/2fUVhI3HY0
omg this title, this paper https://t.co/OXL9C4v2nX
omg this title, this paper https://t.co/OXL9C4v2nX

Fascinating paper with so many interesting observations. One that jumped out to me, which arguably could have got more attention, is the divergence between discrimination and calibration of agents. Calibration (see "CAL" on the predictability column) β the alignment between predicted confidence and actual accuracy β has improved noticeably in recent frontier models. But discrimination ( "AUROC" on the predictability column) β the ability to distinguish tasks the agent will solve from those it won't β shows divergent trends and has in some cases worsened. This matters enormously for deployment in real world contexts. An agent can be well-calibrated in aggregate (e.g. saying "I'm 70% confident" and being right 70% of the time) while being completely unable to flag which specific tasks it will fail at. Discrimination is therefore critical for anyone building autonomous workflows. You need the agent to know when to escalate, rather than just having good statistical properties across a population of tasks. I'm intrigued by what this means from a hardware perspective. Most of these reliability failures will stem from properties of model weights and training. But if this paper is correct, and trends in agent reliability continue to lag capabilities, it creates a strong case for architectures that enable rapid re-inference and consistency-checking (running the same query multiple times and comparing outputs). Here, low-latency, high-throughput inference hardware would have an outsized advantage. In this sense, the reliability tax on compute is basically a multiplier on inference demand.
Most agents donβt fail on modelsβ¦ they fail on context: those ugly, messy, complex documents that trip up even the latest LLMs (PDFs, tables, messy scans). Don't worry. We got you. π VC-backed (seed+) startup? Join the LlamaParse Startup Program: β free credits β dedicated slack channel + priority support β alignment call with our founder Jerry Liu β community spotlight (millions of devs) β production-ready ingestion pipelines Apply today spots are limited β https://t.co/61csPhQULp
LlamaIndex has evolved far beyond a RAG framework - we're now focused on agentic document processing that automates knowledge work. π Agent orchestration has fundamentally changed with sophisticated reasoning loops, tool discovery through Skills/MCP, and coding agents that write Python for you π Document understanding remains a massive opportunity - frontier vision models still struggle with complex tables, charts, and long documents at scale π’ LlamaParse now processes 300k+ users across 50+ formats for enterprises like @OneCarlyle, @CEMEX, and @KPMG with multi-agent workflows combining OCR, computer vision, and LLM reasoning βοΈ Real automation potential exists in workflows where humans manually process documents daily - financial analysis, contract review, insurance underwriting can all become end-to-end agentic processes Our mission is now providing core infrastructure to automate knowledge work over documents, not just being connective tissue between LLMs and data. Read about our evolution and what's next: https://t.co/M0DbsIdGrF

3 years ago, you mightβve known @llama_index as a RAG framework. Today we are not a RAG framework. We are an agentic document processing platform π¦π I wrote a blog post detailing the evolution of our company over the past ~3 years and why we believe our current position is enduring in the rapidly evolving landscape of evolving AI. There are two main points that I want to highlight: 1οΈβ£ One of the most important opportunities in todayβs world is to provide high-quality unstructured context to AI agents. We see ourselves as the best in class OCR module that can unlock context from the hardest document containers (PDFs, Word, Powerpoint, Excel, and more) 2οΈβ£ Agent reasoning loops have gotten a lot more sophisticated. General LLM abstractions are a lot less relevant. Retrieval patterns have completely changed. We need to build deep, focused tooling that actually provides value in this world of long-running agents. Note: We are not giving up on OSS tooling. We think open-source software is extremely important for democratizing AI access. We will continue to build OSS that is more aligned with our core focus area of AI-native document processing. We will continue to support framework users and point them to updated resources for relevant releases. Come check out our blog: https://t.co/2hGgzYtI3v Our core managed platform is LlamaParse. If youβre interested come check out our platform: https://t.co/TqP6OT5U5O

If you need to split complex or composite documents into structured categories or sections, LlamaSplit is built for the job βοΈ With the intuitive UI, you can: β’Define a custom configuration for how your documents should be categorized β’Specify the exact sections or impact types you want extracted β’Run the job and explore the results through an interactive interface π In this walkthrough, @itsclelia demonstrates how to configure LlamaSplit to break down Environmental Impact Reports into clearly defined impact categories π³ π₯ Watch the full video here: π Or get started right away with the docs (UI + code examples): https://t.co/kAMUqwOCDW
Huge thank you to everyone who joined the @GoogleDeepMind hackathon in NYC with us over the weekend π Our DevRel @tuanacelik gave a 30 minute workshop to get participants started on document agents with LlamaParse. We saw some amazing projects being submitted with no lack of creativity and imagination. Congrats to the 3 winning teams, and see you next time!
Adobe Acrobat has PDF splitting. We have agentic PDF splitting π€βοΈ Simply define the categories you want in natural language, and our split agent will automatically βchunkβ the document into subsets of pages and tag them with the appropriate categories. This is super useful to break apart complicated document packets like resumes, tax forms, identification docs, expense reports, and more. Check out @itscleliaβs video below, and come sign up to LlamaParse if youβre interested! Docs: https://t.co/UdxT3sJfkF LlamaParse: https://t.co/TqP6OT5U5O