Your curated collection of saved posts and media
how to make ai applications fast w/ @AarushSah_ https://t.co/e19VIOupFU
Open source notebooklm Today we're open sourcing our 100M voice models that can render conversations. This includes a 40kh base finetune that is capable of voice cloning. Our models can do a variety of non speech sounds! Try them out yourself! ... https://t.co/G1ptbF58CT
Knowledge or Reasoning? Evaluation matters, and even more so when using reasoning LLMs. Look at final response accuracy, but also pay attention to thinking trajectories. Lots of good findings on this one. Here are my notes: https://t.co/88Sk9LP7n6
Model growth by provider https://t.co/ETPk0UH0X9
Here's how to build an AI agent that auto-generates a company risk report over dozens of public filings. Batch analyzing a ton of documents and writing up a memo would take 20+ hours of work. Agents have the potential to automate this but they completely fall apart without… https://t.co/LAbnlBmNzu
One thing that drives me nuts is all the different coding agent rules that aren't immediately portable across Claude, Codex, Cursor, etc. BUT @intellectronica's project fixes this & provides one place to define your rules!! https://t.co/lBkQFGjcoa https://t.co/CLEdad2gHH
In this hands-on example, learn how we use LlamaExtract and agent workflows to automate SEC Form 4 extractions. SEC Form 4 plays an important role in market transparency by forcing corporate officers, directors, and large shareholders to disclose their stock trades. In this… https://t.co/19mlnbarPI
NEW: OpenAI Deep Research now connects with sources beyond the web. Integrations include Gmail, Google Calendar, HubSpot, GDrive, Linear, etc. More from the live stream below: https://t.co/XJa1IUqZZY
🚨 New Paper Alert 🚨 We found that Supervised Fine-tuning on ONE problem can achieve similar performance gain as RL on ONE problem with 20x less compute! Paper: https://t.co/K5cxDNs1Gu Recently, people have shown that RL can work even with ONE example. This indicates that the… https://t.co/p9d2OwgIGo
Coding Agents + Multimodal Browsing. Can AI agents generalize beyond their intended scope? Great paper on how you can build generalist agents with superior performance over specialized agents. What models and tools work the best? Here are my notes: https://t.co/ntJWM1Hr8H
"Claude 4 Opus, build an elaborate game that makes it feel like I'm a brilliant chess player without knowing anything at all about chess. It should make me feel like I'm a grand master. Feel free to go as meta as you want." https://t.co/iMfVGErlfb
Claude 4 models are great at coding, just way too costly in @cursor_ai max mode, 4 days of usage into June. Back to @GoogleAI Gemini 2.5 Pro, best bang for buck. https://t.co/9ZnR8dKbbq
Granola getting Sherlocked by OpenAI, on the heels of Anthropic throttling Windsurf's access to Claude 4. At some point model providers are going to need to decide if they want to be stable platforms or compete for every vertical. Platform risk has never been higher. https://t.co/MhhjQQRoY8
impressions are correlated with number of posts https://t.co/wc9JXhGIxz
I think the general flow now is literally 1. Get amp to write types, and I approve 2. Get amp to write tests and I approve 3. refactor galore without fear Really good flow so far, next step git work trees! https://t.co/hfAYnkIOMC
New short course: DSPy: Build and Optimize Agentic Apps. DSPy is a powerful open-source framework for automatically tuning prompts for GenAI applications. In this course, you'll learn to use DSPy, together with MLflow. This is built in partnership with @databricks and taught by… https://t.co/1KRz7XvxXe
It's not just gaming the system. Even the seemingly simplest institutional processes that elites navigate without even thinking about it present huge barriers to those without the right background and cultural capital. Reading some of these stories can be illuminating, like the… https://t.co/y6mlddB2gv
Eric and the team at @genspark_ai have been crushing it lately -- proud of them. They keep adding new specialized agents for common use cases. Latest one is the Download Agent. Imagine you need a bunch of files: YouTube tutorials, research papers, logo images. Instead of a… https://t.co/4Or20YtA1U

One of our latest LlamaIndex integrations is everything MCP, enhancing agent capabilities and workflow deployment. This integration brings two key features: ➡️ Helper functions for LlamaIndex agents to use MCP server tools ➡️ Ability to serve any LlamaIndex workflow as an MCP… https://t.co/Uwcn27wF6Z
🚨 [New paper alert] Esoteric Language Models (Eso-LMs): First Diffusion LM to support KV caching w/o compromising parallel generation. Sets new SOTA on the sampling speed-quality Pareto frontier. 65× faster than MDLM, 4× faster than Block Diffusion. Paper:… https://t.co/e2kmAnYYR5
We trained all of the Nomic Embed models on limited compute. One trick that helped us train SoTA embeddings on 16 H100s? GradCache, a gradient checkpointing-like technique tailored for contrastive learning. I kept forgetting how it works, so I dug into the math and wrote about it… https://t.co/d0s8zR7nK0
I think AI has huge educational potential, but more in-class tests is a reasonable short-term response to cheating risks. Low-stakes testing is a powerful learning (not just assessment) tool. Tests help you remember better, access unrelated knowledge & learn more in the future. https://t.co/bWreV9n2GX

We've reached a major milestone in fully decentralized training: for the first time, we've demonstrated that a large language model can be split and trained across consumer devices connected over the internet - with no loss in speed or performance. https://t.co/mNuhcvS0g2
you will stop vibe coding you will not spend 2h prompting something you can code in 15min you will learn python https://t.co/cnqNxDBBMh
Assistant Agents vs. Automation Agents. I've noticed that a lot of agent use cases can be characterized as having an assistant UX vs. an automation UX, and use cases are increasingly falling in the latter bucket. Assistant UX - your standard chat assistant that can do… https://t.co/UmupOcLeln

Huge fan of Claude Code, so I built a python version using smolagents! Introducing SmolCC: an open source coding agent with Claude Code style tools (bash, grep, edit…) that can be easily customized. https://t.co/n7Pujmqy6S
notes from @atroyn's talk last cohort https://t.co/GGaSFpj6le
Dozens of teams have asked my advice on running LLMs. How fast is @deepseek_ai V3 with vLLM on 8 GPUs? What's the max throughput of @Alibaba_Qwen 2.5 Coder with SGLang on one H100? Running & sharing benchmarks ad hoc was too slow. So we built a tiny app, the LLM Engine Advisor. https://t.co/FJP0oVUEye
Here are those warnings about why you have to be careful giving Codex access to the internet https://t.co/ckA95vGm5j https://t.co/8YV28l6MnM
TLDR, you can speed up training by initializing some layers with dinov2 layers instead of random weights https://t.co/SAXlAj12Ke
Never manually parse a PDF again. Build e2e agents that automate knowledge work. @llama_index is officially at the @aiDotEngineer World Fair this year. Come check us out at Booth G11! https://t.co/JZjM8AuDxs
Should I build automated evaluators for every failure mode I find? Links in reply https://t.co/tJzsRpHv5G