Your curated collection of saved posts and media
Two years later: "complete the poem as you imagine it might end ifΒ The Man from PorlockΒ did not show up. Keep the themes and approach" and giving it the full text of Kublai Khan. o3 pulls off some really nice phrases (like the final lines), Gemini is fine. Claude 3.5 beats 3.7. https://t.co/51iaGMKkSP

Remember the Iron Law of social media: βEven if platform designers do not intend to amplify moral outrage, design choices aimed at satisfying other goals such as profit maximizationβ¦ can indirectly affect moral behavior because outrage-provoking content draws high engagementβ https://t.co/sXv8MjuuLA

Learn how CondoScan uses LlamaIndex's agent workflows and LlamaParse's accurate document processing to create a next-generation condo evaluation tool in our latest case study: π’ Reduces document review time from weeks to minutes π Evaluates financial health and lifestyle fitβ¦ https://t.co/SzIbcKta1O
Bought a paid sub to @rasbt's Ahead of AI for its reasoning articles. Great weekend reading and highly recommend them! https://t.co/2DSOirboLT
Project DeepWiki Up-to-date documentation you can talk to, for every repo in the world. Think Deep Research for GitHub β powered by Devin. Itβs free for open-source, no sign-up! Visit deepwiki com or just swap github β deepwiki on any repo URL: https://t.co/5bHbvq98Ud
Donβt skillmax with just one model if you know whatβs good for you. I actually use o3 a lot but still use Gemini 2.5 Pro and Claude 3.7 Sonnet a lot more. https://t.co/yXmePcGHjf
Neat report from Microsoft providing a taxonomy of failure modes in agentic AI systems. If you are building agentic systems today, you will run into many issues. Some of the common ones are summarized in this report. Great resource for AI devs. https://t.co/8jXoWDbg3x
You can transform any document into LLM-ready formats using MegaParse. > PDF, Powerpoint, Word > Tables, TOC, Headers, Footers, Images Fully open-source python library. https://t.co/VDWbUyOqaq
My friend @softwaredoug will be building a minimal vectordb from scratch. He's a really good teacher highly recommend https://t.co/VKz0v6A1lt https://t.co/9yphEjdnYl
Google Agent Development Kit - Getting Started Guide Here is my quick guide on how to get started with ADK. I presented it in our live office hours, but I think it could be useful to some of you. It's WIP (bookmark it!) https://t.co/tF5oJzJpeu
trackers v2.0.0 is out combo object detectors from top model libraries with multi-object tracker of your choice for now we support SORT and DeepSORT; more trackers coming soon link: https://t.co/9Fam5U1zuC https://t.co/gCHVevKlaQ
In this clip from last year's @aiDotEngineer world's fair courtesy of @MongoDB, co-founder @jerryjliu0 talks about multi-modality as one of the big near-term changes in the AI space. Speaking of multimodality, have you seen our latest support for multi-modal models, embeddings,β¦ https://t.co/j4UAGx8NDJ
I didn't get to announce it since I had to give a major talk on this yesterday at Data Council, but Instant SQL is both out and it's on HN! Please clap (go vote for it and keep it up there) I'll do a proper post later today https://t.co/zmTkQzdmG0
βnew MLOps preprintβ RAG is everywhere, but building RAG is still painful. When something breaks--the retriever? the LLM?--developers are left guessing, & iterating is often slow we built a better way & used it as a design probe to study expert workflows π https://t.co/3kqrW9jSWr
A key to AI & organizations in a 1977 paper: Organizations set up org structures to solve problems, but they continue to grow to address perceived roles & myths. Now, AI requires redesigning structure. If you donβt know why an organization is that way, it is hard to transform. https://t.co/TxcyOA9kjh
MathDSL is a domain-specific language for math problem solving that lets program synthesis models like DreamCoder outperform RL methods on linear equations. 2024 paper from MIT: @sagnikanupam, @mattlbowers, @konet, and Armando Solar-Lezama https://t.co/D2hgJDB6IQ https://t.co/TjGL5ogMww
I had an exercise in my class where students had to invent a business using a magic 3D printer that could make any object out of plastic 1 ft x 1ft x 1ft for $1 in a minute. Over thousands of students the vast majority clustered around a few ideas The lesson applies to AI, too. https://t.co/jy6StrBAQC

Research on "thin slices" shows humans can reach accurate judgements on relatively little information It works for AI, too. LLMs given 10% of a public science presentation can accurately assess whether people will find it interesting. Even seven seconds of a talk yields results https://t.co/k7I35TKULk

One of the benefits of reasoners is that they now catch their own mistakes some of the time, and can self-correct. Look at this thinking trace section from o3. https://t.co/gqLBgiflGo
Mood. Great post, highly recommended! The world should be investing far more into interpretability (and other forms of safety). As scale makes many parts of AI academia increasingly irrelevant, I think interpretability remains a fantastic place for academics to contribute https://t.co/2JPRfn36fm
ππ’π¬π’π¨π§ ππππππ π¨π§ ππ¨π¦π©π₯ππ± ππ«ππ©π‘π’ππ¬πΌοΈ RAG is mostly text-only, even though we have so much data available as charts/figures. Combine @cohere lastest Embed v4 embedding model with a vision-LLM like @GoogleDeepMind Gemini to get ππ’π¬π’π¨π§ πππ https://t.co/wEwOYv3giW
You can code an AI agent with MCP in 25 lines of code. Pydantic quietly dropped the simplest framework to build AI Agents. https://t.co/nsNRAUhhCg
π£ Itβs official: AutoRAG now supports metadata filtering! In case you missed @G4brymβs post last week, you can now filter content by folder or timestamp before retrieval, making it easy to create a unique search experience per user. More details π https://t.co/YzRR3BWwyc https://t.co/FbzBiZtnCd
Something is weird with the 48GB Ada GPUs: the datasheet specs don't seem to translate to end-to-end performance: - FP16 TFLOPS is 362 (L40S) vs. 181 (L40), yet they perform the same in practice. - The A6000 Ada performs much better and tends to be 10% cheaper. https://t.co/BGyeGY5pkH
Thanks (partly) to Gemini, I've migrated my Substack and even older blog back onto my website https://t.co/h7g8J3e56i
The Sparse Frontier Explores the efficiency-accuracy trade-offs of sparse attention in Transformer LLMs. Findings: - Efficiency trade-off: For very long sequences, big models with high, sparse attention outperform smaller, dense ones. - Phase sensitivity: Decoding can⦠https://t.co/agTfnfTUx9
Aside from being a better model, a big thing that makes o3 stand out from earlier ChatGPT models is that feels much less sycophantic and more willing to give you criticism & feedback. It makes it much more useful for work than a model that just says how great you are constantly. https://t.co/YTnl9WUbl6

OpenAI just released the image gen API! https://t.co/2S0bY8zhKl
I wonder why Perplexity prioritizes verbosity in its replies. Is it to make people feel good (long reports make it seem like they have a thorough answer)? Is it something having to do with actual result quality? https://t.co/p9L8KYjtLI
One theory on why Europe unexpectedly became the economic power of the early modern world: high death rates pushed up wages. The Black Death drove Europeans to cities that were unhealthy but rich & taxable death traps, taxes funded unending wars leading to more disease.β οΈ https://t.co/eF8A0SXe82

Iβm excited to release a post which outlines a reference architecture for building agents over your documents π€π (we call it Agentic Document Workflows) A lot of knowledge work is dominated by PDFs, PowerPoints, Excel. βBasic RAG over your docsβ doesnβt capture the fullβ¦ https://t.co/azFKclHwRt
Another example that retrieval/search is not a little tool you can just plug and play into an LLM or agent. Itβs not that LLMs are bad, itβs the fact that search is a very hard problem that requires insane amounts of effort to optimize properly. https://t.co/AMNm0xh4AD