Your curated collection of saved posts and media
@modic123 @JohnKutay @ChrisLovejoy_ @sh_reya https://t.co/BdLhhADM5x
Instead spreadsheets for data review, I'm using a little custom annotation app build with @AirWebFramework. Before I used to use django for building little eval apps mostly because of the forms api, orm, and admin, but now I've moved to Air since it's light weight and easy. https://t.co/T7OKpRQWCE
Looking at my data from a another perspective now. I'm looking at code-based and llm judge results for extraction from a PDF. I'm trying to determine a clear-cut threshold for flagging these llm outputs for human review.
Finally the talk with @willmcgugan on https://t.co/XhV0PebGOW and all things pythonic TUIs. Tomorrow (Tuesday, August 25th). Can't wait!! https://t.co/e5r6sbTXUL

Finally the talk with @willmcgugan on https://t.co/XhV0PebGOW and all things pythonic TUIs. Tomorrow (Tuesday, August 25th). Can't wait!! https://t.co/e5r6sbTXUL
@anshnanda_ @vibecodeapp From an observability platform - will be using this one https://t.co/YoSVBeRkf1
. @HamelHusain and I will live vibe coding a native eval app w/ @vibecodeapp. We're both obsessed with efficient workflows, and I'm skeptical any vibe tool can create an efficient eval workflow But we're going all in. Ship or sink - let's find out. https://t.co/Z4q0Ho1LBl
I rarely do IRL events but this was too much fun to turn down. We are going to go beyond the usual "what coding agent are you using" to all the other interesting workflows people have with AI. Things like: - Writing, content creation, marketing - Learning & education - Removing toil: emails, calendar, meetings There will be a poster session with monitors provided. Also, no vendor pitches! Happening in Oakland, Sep 4th, 5-9pm (in an outdoor venue). Sign up here: https://t.co/Xh1QZcUNGI P.S. @BEBischof is sponsoring the whole thing

Finally clicked on the AI companion in Zoom during a live meeting ๐ญ - and it sucks https://t.co/5e3t88MpFg
Novel jailbreak discovered. Not only does OpenAi putting your name in the system prompt impact the way GPT responds, but it also opens the model up to a prompt INSERTION. Not injection. You can insert a trigger into the actual system prompt, which makes it nigh indefensible. https://t.co/Rl4U1MOuZV

@zer0int1 @NerdyRodent My go-to is a stack of different resolutions, optionally with some augmentations as well. Helps force a much more 'natural' image. Demo for 'DAS': https://t.co/JVjBdA9xSI Older lib I made for this back in the day: https://t.co/NwVb67xrqT https://t.co/pERDh9wjjd

@zer0int1 @NerdyRodent Amazing! (@stanislavfort check it out!) V. cool stuff :D https://t.co/F2fu9GcNFT
Inspired by 2swap's Klotski visualization video, I asked claude to make interactive state-space adjacency graph visualization, using @threejs and InstancedMesh for optimized rendering. It nearly zero-shot it. Absolutely wild time we live in. https://t.co/FEee3Te2br
With only one line of code, you can get access to Google Open Buildings, the largest building dataset, for any country. https://t.co/KLhpkxY57j
The first day of OpenAI, 9 years ago. https://t.co/AvXgemQmig
Returning to the US after a week in Europe https://t.co/kegtjf2Qq5
introducing qqWen: our fully open-sourced project (code+weights+data+detailed technical report) for full-stack finetuning (pretrain+SFT+RL) a series of models (1.5b, 3b, 7b, 14b & 32b) for a niche financial programming language called Q All details below! https://t.co/VQdAOOcAZY

Fine-tuning LLM Agents without Fine-tuning LLMs Catchy title and very cool memory technique to improve deep research agents. Great for continuous, real-time learning without gradient updates. Here are my notes: https://t.co/ofcjoRPY4T
I've been waiting for this. The Unified MCP is here! Rube is a universal MCP server to connect your AI agents to all *your* apps. Works with your favorite IDE, Claude Code, and other MCP clients. Watch it research YouTube vids โ generate a full content strategy doc. Insane! https://t.co/KRlhIjsFJ0
@FinancialTimes https://t.co/LgwtawRWTl
Whatโs new? Parallel multi-chain inference over a structured KG, not just longer single chains. Two retrieval modes: Anchor (single-entity neighborhood) and Bridge (multi-hop paths between entity pairs). A synthesizer verifies cross-chain consistency and normalizes medical terms before emitting a concise final answer.
Why it matters Linear chains accumulate early errors and treat evidence as flat text. Graph-based retrieval preserves relations and hierarchies, supporting precise multi-hop medical reasoning with traceable paths. https://t.co/PBNTJ9HRfY
Results State-of-the-art across three medical QA benchmarks. On ExplainCPE, MIRAGE reaches 84.8% accuracy and the best GPT-4o ranking; similar gains appear on GenMedGPT-5k and CMCQA. Robustness holds when swapping in DeepSeek-R1-32B as the backbone. Human evals on GenMedGPT-5k also prefer MIRAGE.
Scaling insights More sub-questions help until over-decomposition adds noise, while allowing more retrieval steps shows diminishing but steady gains. Tuning the sub-question cap and retrieval budget is key. https://t.co/Bd09Bb0mHb
Interpretability Every claim ties back to explicit KG chains, with an audit record of decomposition, queries, and synthesis. A case study contrasts MIRAGEโs disentangled chains vs a single-chain web search approach, resolving toward a coherent diagnosis. Really cool paper highlighting the benefits of graph-based retrieval and making clever design choices. Paper: https://t.co/2PLd6ENxhh

Gemini 2.5 Flash Image is finally here! > multimodal understanding > conversational input > advanced visual reasoning Here's what you can do with this model: https://t.co/llGClaiGRR
Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains Graph-based retrieval is useful in lots of applications with complex data. This paper is a good example of the benefits: https://t.co/eBx6wuzhGM
I look forward to giving this talk in two weeks based on a draft paper with Akash Kapurยน (@akashkapur). It will be livestreamed and posted publicly. https://t.co/x9x7DCDzR6 ยน No relation to my usual co-author Sayash Kapoor ๐ https://t.co/RCfI84rCpj
Tomorrow! Join us at the inaugural Agentic AI In Action AWS Partner Showcase at 9:30 AM PT at the AWS Builderโs Loft in San Francisco! Our own George He will be sharing insights in his talk on Building Document Agents with LlamaIndex: Effective Design Patterns. He'll be joined by some great speakers including: - David Littlewood, Head of Data Analytics and AI Strategic Alliances, Amazon Web Services - Ayan Ray, GenAI Tech Lead, Infra ISV Partners, Amazon Web Services - Yu Sherry Ger, Director, Customer Enterprise Architect, Elastic - James Le, Head of Developer Experience, TwelveLabs ๐๏ธ Mark your calendars for Aug 26 | 9:30 AM | AWS Builderโs Loft, SF. https://t.co/9nJHqZHcm1
Make your vibe coding faster and more accurate with these tools! We've been working on solving a common problem: coding agents like @cursor_ai and @anthropicai's Claude Code suggesting outdated APIs or missing important parameters when building with LlamaIndex. Here's what we built to fix it: ๐ง vibe-llama CLI tool that injects current LlamaIndex context directly into your coding agent - no more stale information or incorrect API suggestions ๐ Detailed prompt templates that can generate complete @streamlit apps from basic scripts in one shot, including file uploads, real-time processing, and proper error handling ๐ Python SDK for programmatically managing multiple agents and services, plus rule files with up-to-date docs and production patterns โก Real example showing how to transform a 20-line LlamaExtract script into a full-featured invoice processing app The key insight: generic prompts don't work. You need specific instructions about UI components, data flow, and user experience. We've done the trial and error to figure out what actually works in practice. Read our complete guide on vibe-coding agents and get the tools: https://t.co/dFXA0JHNsc

๐ Our latest vibe coding tool ๐๐ถ๐ฏ๐ฒ-๐น๐น๐ฎ๐บ๐ฎ has an update: ๐๐ฌ.๐ฏ.๐ฌ is here! We started with context injection for coding agents ( @cursor_ai, @claudeai Code, etc.) to get up-to-date LlamaIndex knowledge. Now we're adding ๐๐ค๐๐ช๐๐ก๐ค๐ฌ๐จ - an interactive CLI agent that builds complete document-processing workflows for you. Just describe what you want in natural language + provide sample docs โ get working Python code + detailed runbooks. From "I need to extract invoice line items" to production workflow in minutes. pip install vibe-llama โ vibe-llama docuflows Get started here: https://t.co/lsPL280fKk
Ready to build production-ready AI agents without getting bogged down in boilerplate code? Join us and @weights_biases for a deep dive into low-code AI development that gets you from zero to deployed agent faster. We'll show you how our modular framework combined with W&B's evaluation platform streamlines the entire development process. ๐ค Live demos of how LlamaIndex simplifies agent logic and data ingestion with intuitive APIs ๐ Real-time visibility into agent reasoning with end-to-end tracing and performance profiling โก Low-code workflows for rapid prototyping, evaluation, and iteration ๐ก๏ธ Strategies for instrumenting agents with guardrails and continuous improvement Whether you're working on RAG pipelines or autonomous agents, you'll walk away with practical techniques to evaluate, monitor, and iterate with confidence while keeping full control of your stack. Register for the webinar: https://t.co/HnkoViIVkb

@TheZachMueller But then why not waiting for the Nvidia DGX Spark that is supposed to come out in a few weeks: https://t.co/TU4o5iVzSi Has a Blackwell chip with 128GB of memory (and at $4000 it is probably cheaper than Apple hardware).