Your curated collection of saved posts and media
Great work showing prompt synthesis as a new scaling axis for reasoning. Good training data is scarce. This work showcases a framework that might make it possible to construct high-quality training problems for reasoning-focused LLMs. Technical details below: https://t.co/TUlFsSRAqN
It was an honour to speak at the @UN this week to address the UN Security Council on the impacts of AI on international peace & security, and to join the high-level multi-stakeholder informal meeting to launch the Global Dialogue on AI Governance. We need global cooperation to steer AI toward a safe and beneficial future, and I was very pleased to see so many people come together these past few days with that shared goal.

We raised a $28M seed from Threshold Ventures, AIX Ventures, and NVentures (Nvidia's venture capital arm), alongside 10+ unicorn founders and top AI researchers, to build reasoning models that generate real-time simulations and games. Models are bottlenecked by practical simulations that can act as Reinforcement Learning environments. Human self-expression is bounded by tools that let us create alternate realities. At Moonlake, we are building a future where anyone can create interactive worlds, bring their child-like wonder to life, learn within them, and most importantly, share experiences with people we care about. More in 🧵
GTC is on again at DC! I will be hand picking one golden ticket winner for a complimentary pass, special seating for Jensen's keynote, NV swags, and other perks! Reply with your coolest open-source project on GR00T N1/N1.5/Dream models! https://t.co/XVfvqzTo7G
The rise of humanoid platforms presents new opportunities and unique challenges. Join @yukez at #CoRL2025 as he shares the latest research on robot foundation models and presents new updates with the #NVIDIAIsaac GR00T platform. Learn more: https://t.co/LrzONs1Gzc https://t.co/v8GRGPUveA

@sebkrier There is an LW post by Scott Alexander making a similar point https://t.co/3R8x3s2ENt
Assembling a select group of federal AI specialists. Engage leading scholars and policymakers in the Princeton AI Policy Precepts! Washington DC. Oct 21, Nov 14. Only 24 spots: Apply! https://t.co/B05PbGvmO0 @random_walker @sayashk @PeterHndrsn @PrincetonSPIA @PrincetonCITP https://t.co/IpAgyk8UnP
On our evals for HAL, we found that agents figure out they're being evaluated even on capability evals. For example, here Claude 3.7 Sonnet *looks up the benchmark on HuggingFace* to find the answer to an AssistantBench question. There were many such cases across benchmarks and models. Of course, you can make it harder for the agent to cheat, such as by blocking HuggingFace or encrypting the dataset. But so long as the benchmark is available somewhere in public, an agent could theoretically follow the same steps a human would to access it (e.g., decrypting a password-protected benchmark on its sandbox). So agent log analysis will become necessary even for capability evals. HAL now has logs from 20,000+ rollouts across 9 benchmarks, and we are analyzing all of these logs using @TransluceAI's Docent.
To make a model that *doesn't* instantly learn to distinguish between "fake-ass alignment test" and "normal task." ...it seems like the first thing to do would be "make all alignment evals very small variations on actual capability evals." Do people do this?
Most videos in my Sora feed (context: scrolling for 25 min, haven't liked any vids, no follows) are clear copyright infringement, ranging from cute Pokémon videos to mediocre Family Guy spoofs and what seems like (but is not described as) Nazi/WWII German uniform inspired SpongeBob. https://t.co/DSAhGNof4C

Many AI policy decisions are complicated. "Don't ban self-driving cars" is really not. Good new piece from @KelseyTuoc, with a lede that pulls no punches: https://t.co/OdjSizgALP

SemTools has gotten some huge updates over the last few weeks: 1. A new `workspace` feature to speed up search calls over large datasets by caching embeddings with @lancedb. On a dataset of 1000 papers, search time goes from minutes to seconds 2. Now installable with npm! https://t.co/IjNRy990QJ
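The caching idea in the post above can be illustrated with a minimal Python sketch. It is not SemTools' implementation and does not use the LanceDB API: `EmbeddingCache`, `toy_embed`, and the in-memory dictionary are hypothetical stand-ins, assumed here only to show why a second search over unchanged files can skip the embedding step.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


class EmbeddingCache:
    """Cache embeddings by content hash so unchanged text is never re-embedded."""

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self.embed_fn = embed_fn
        self.store: Dict[str, Vector] = {}  # assumption: in-memory, not a persistent DB
        self.misses = 0  # number of times the (expensive) embed_fn was called

    def get(self, text: str) -> Vector:
        # Key on the content itself, so edits to a file invalidate its entry.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]


def toy_embed(text: str) -> Vector:
    # Hypothetical stand-in for a real embedding model: vowel counts.
    lowered = text.lower()
    return [float(lowered.count(ch)) for ch in "aeiou"]


cache = EmbeddingCache(toy_embed)
docs = ["attention is all you need", "retrieval augmented generation"]
first = [cache.get(d) for d in docs]   # embeds both documents
second = [cache.get(d) for d in docs]  # served entirely from the cache
```

A persistent workspace would presumably write the store to disk so the savings carry across separate CLI invocations; the dictionary here only lasts for one process.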

Don't miss $20,000 in prizes at the Fullstack Agents Hackathon, next Saturday! We are partnering with CopilotKit, Composio, Microsoft for Startups, B Capital & AI Tinkerers to put on an amazing hackathon. Participants will start with a boilerplate fullstack agent application connecting a LlamaIndex Agent to a frontend with AG-UI. The Agent will have access to thousands of tools via Composio. $20k+ in prizes on the line for the teams that can transform the template into a powerful fullstack agent for their use-case. Venue is the Microsoft SV Center, September 27th -- register today! https://t.co/zre2OXf1bw
Coding agents like Claude Code are powerful, but they can't understand your business documents by default - here's how to fix that. We've identified three complementary approaches to give coding agents true document intelligence: • Access docs through MCP - Connect @claude_code to your indexed document repositories via Model Context Protocol, giving it instant access to your business context like policies, reports, and specifications • Operate through enhanced CLI - Extend familiar command-line tools with document parsing capabilities, letting agents use grep, cat, and find operations on complex PDFs and structured documents • Build agentic workflows - Teach @claude_code to generate AI-native applications that adapt to new document formats instead of breaking on hard-coded rules. Best approach? Use all three together for maximum impact. The reality is that 90% of enterprise data lives in documents, and coding agents that can't process this information will build generic, brittle applications. By combining MCP integration, enhanced CLI tools, and workflow generation capabilities, you can bridge the gap between natural language business requirements and real enterprise automation. Learn the complete strategy: https://t.co/wgBJhKEapx

THIS SATURDAY: don't miss $20,000 in prizes at the Fullstack Agents Hackathon! We are partnering with CopilotKit, Composio, Microsoft for Startups, B Capital & AI Tinkerers to put on an amazing hackathon. Participants will start with a boilerplate fullstack agent application connecting a LlamaIndex Agent to a frontend with AG-UI. The Agent will have access to thousands of tools via Composio. $20k+ in prizes on the line for the teams that can transform the template into a powerful fullstack agent for their use-case. Venue is the Microsoft SV Center, September 27th -- register today! https://t.co/zre2OXf1bw
Building a full-stack agent application? Our OSS engineer @itsclelia just dropped an example using LlamaIndex Workflows and @nextjs + @tailwindcss for the UI! How does it work? • The workflow retrieves context from a @qdrant vector DB • @OpenAI is used as a translation layer • The result is rendered in the UI. Try it out: https://t.co/ikR2pfhLSH Find the repo on GitHub: https://t.co/zlGRw1Tiog Learn more about LlamaIndex TS workflows: https://t.co/EfV0Sil0wv
I was at Vector Space Day in Berlin today, and it was nothing short of incredible. So many people, an amazing energy, a wonderful location and awesome talks! I got the opportunity of being among the speakers, and talking about how you can use @llama_index workflows in combination with @qdrant_engine to build context rich AI systems. I also got to meet many amazing people from the AI space I am looking to keep in touch with and, if I didn't get a chance to talk to you, you can always reach out to me on any platforms! Thanks @qdrant_engine again for this amazing event and looking forward to what's coming next!

Give Claude Code a semantic filesystem. Giving Claude Code access to the right CLI tools over your filesystem turns it into a general agent capable of automating far more knowledge work beyond code - it can do dynamic financial/legal/medical/technical/backoffice analysis over any subset of documents. With our latest release of semtools, you can now manually or *agentically* create a persistent workspace over any subset of files. This gives Claude Code the ability to get blazing-fast, local semantic search over any data, while still allowing it to chain with commands like grep/cat/etc. so that it can load in dynamic context instead of naive top-k vector search. The coding agent can dynamically index data and use those indexes, instead of having to rebuild them every time. So you get the benefits of fast search along with agentic reasoning over the CLI tools mentioned above. Come check it out! https://t.co/xg1iqbghIr
This open-source NotebookLM alternative demonstrates a complete architecture for document-powered AI apps: • Event-driven workflows orchestrate complex multi-step processes like document parsing, summary generation, and podcast creation • LlamaCloud handles the heavy lifting with automated document ingestion pipelines and structured data extraction • State management allows workflows to save progress and resume later, perfect for long-running document processing tasks • Built-in observability with @opentelemetry integration gives you insights into every step of your workflow execution. The project integrates LlamaExtract for transforming documents into an initial notebook with a mind map, FAQs, summaries, and services like @elevenlabsio for text-to-speech generation. Explore the complete NotebookLlaMa implementation: https://t.co/Mj4I4Isjkp
Claude Sonnet 4.5 is here and we have day 0 support as usual! Sonnet 4.5 excels at coding, so we got it to generate its own fireworks celebration of its launch! Read Anthropic's announcement post here: https://t.co/QdRZ5qMyVn Check out our demo notebook (with the fireworks code) here: https://t.co/2WZrlMuTVQ Or get started with building right now! https://t.co/ABfVJyrWpW
Build an Express Agent with LlamaIndex TypeScript Workflows. We have new docs walking through a production-ready agent system that combines tool-calling capabilities with real-world deployment patterns using our TypeScript workflows framework. • Create a workflow with a basic tool-calling agent loop that can execute tasks autonomously • Add state management to track agent progress and maintain context across interactions • Implement human-in-the-loop patterns for critical decisions that require human approval • Deploy your agent to an Express server for real-world usage. Build a complete agent system from scratch, starting with basic tool calling and progressing to sophisticated state management and human oversight. You'll work with @OpenAI APIs and see how our workflow system handles complex agentic patterns with clean, readable TypeScript code. Perfect for developers who want to move beyond simple chatbots and build agents that can handle multi-step processes with proper oversight and deployment infrastructure. Start building: https://t.co/LGRtJIcmb3
Build and ship document agents 10× faster. We have some great news! LlamaAgents are here, and we're opening them up for early access to a small group of users! LlamaAgents help you deploy document agents with a single click! Automate document-centric tasks like invoice processing, contract review, and claims handling without the typical months-long development cycle: • 90% ready-to-use templates powered by LlamaIndex, LlamaExtract, and LlamaCloud Index get you from idea to deployed agent in minutes • Deploy anywhere - use our managed cloud or bring your own infrastructure with headless, single-click deployments • Built for infinite extensibility unlike rigid SaaS agents - adapt workflows to your business logic, not the other way around • Free your team from routine document processing to focus on higher-value work. Join the early access waitlist and be among the first to build with LlamaAgents: https://t.co/a15jhO0xvP

We tested whether coding agents with CLI tools are all you need for complex document search and analysis tasks. Our new SemTools benchmark used 1000 @arxiv papers to compare agents with and without semantic search capabilities: • Agents with semantic search provided more detailed, comprehensive answers across all question types • CLI-based approach proves incredibly powerful relative to the effort - Unix tooling gives agents grep, find, and file system navigation out of the box • Complex cross-referencing and temporal analysis tasks showed the biggest improvement with semantic search tools • SemTools adds parse (via LlamaParse) and semantic search capabilities directly to command-line agents like @claudeai Code and Gemini CLI. The combination of existing Unix tools plus semantic search capabilities can often replace more complex RAG setups while being faster to implement and more flexible to use. Read the full benchmark results and methodology: https://t.co/3LeaejfRWc

Had a great conversation with @jerryjliu0 from @llama_index, @dvellante and @knightrm about agents, open source, and enterprise-grade infra. Thank you to the team at @SiliconANGLE & @theCUBE for joining us at @UiPath FUSION! Check it out: https://t.co/vZJdRZW2OW
Build a full-stack website powered by a LlamaIndex agent in minutes with AG-UI from @CopilotKit and @composiohq! This full-featured template application for building Agent apps gets you up and running in no time! Battle-tested by dozens of participants at our recent joint hackathon, it has everything you need to get started, including the right docs, MCP servers, and troubleshooting. Check out the repo here: https://t.co/RpODVIlgmC
Surge is not a body shop. We work with humans who are skilled, creative, unique (+ paid well, imagine!). So we deliver the best data/AI training available. It helps that our CEO is the "Michael Jordan of AI data." https://t.co/YGYLkBTkl7
Thrilled to see @MetaAI launch Gaia2, built inside their new Agent RL Environment platform! Proud that Surge AI helped contribute, just as we did with GAIA two years ago. A quick story 🧵 https://t.co/Mc4whQiRsY
Infusing AI with humanity comes in many forms. Sometimes it involves drinking. https://t.co/xK3cmtNPHG
ReasoningBank: memory for self-evolving LLM agents • Distills strategies from both successes & failures • Enables agents to learn, reuse, and improve over time • Outperforms prior memory methods on web & SWE tasks (+34.2% eff., −16% steps) https://t.co/yxvj9vsMmM
RLHI: Reinforcement Learning from Human Interaction • Moves beyond expert-annotated data; learns from real user conversations • Two methods: 1. User-Guided Rewrites 2. User-Based Rewards • Outperforms baselines in personalization, instruction-following & reasoning https://t.co/JA6PR6it59
SWAX: short windows, long memory • Hybrid of sliding-window attn + xLSTM RNN • Counter-intuitive: shorter windows → better long-term recall • Fix: stochastic window sizes = strong short + long context performance • Outperforms fixed window attention https://t.co/Lg44gM7S8E
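The stochastic window idea from the SWAX post can be sketched roughly in Python. This is not the paper's implementation: the xLSTM half of the hybrid is omitted, and the function names and the candidate sizes in `window_choices` are assumptions made for illustration. It only shows a causal sliding-window mask whose width is re-sampled at each training step.

```python
import random
from typing import List, Sequence


def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j.

    Causal (j <= i) and restricted to the most recent `window` positions.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def sample_window(choices: Sequence[int], rng: random.Random) -> int:
    # Draw one window size per training step instead of fixing it up front.
    return rng.choice(list(choices))


rng = random.Random(0)
window_choices = (2, 4, 8)  # hypothetical candidate window sizes
masks = []
for step in range(3):
    w = sample_window(window_choices, rng)
    masks.append(sliding_window_mask(seq_len=8, window=w))
```

In a real model the boolean mask would be applied to the attention scores before the softmax; the nested lists here stand in for a tensor.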
Today, I am launching @axiommathai At Axiom, we are building a self-improving superintelligent reasoner, starting with an AI mathematician. https://t.co/pDedXx7SJc
@teortaxesTex @TheZvi As for context length, I think we will have more sophisticated context management, which should have greater impact than brute-force extension of the attention span to something like 100M tokens. No need to have everything stored as kv cache, but make sure to store all the info organized and reachable in some way, and we can possibly store the table of contents of the recipe in the kv cache. For example, we humans remember all the memorable GRPO papers w/o every fine detail, as we remember how to retrieve them (i.e. how to search for the paper and which pages to go to for the relevant info, etc). This and some Claude Code updates may be relevant: https://t.co/FKWvyoSRFs