Deleted this post because Kevin is right - it isn't clear that a better prompted LLM, or a consensus or pass experiment with multiple attempts, would not be able to solve more of these problems. Testing LLMs is challenging for these reasons. https://t.co/nM6wc4Rg7b
Game changer for scraping. This repo lets you easily scrape web pages and have the output in LLM-friendly formats (JSON, cleaned HTML, markdown). • Supports crawling multiple URLs simultaneously • Extracts and returns all media tags (Images, Audio, and Video) • Extracts all… https://t.co/bVx1OUeEIB
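The core conversion step such a scraper automates can be sketched with the standard library alone. This is a hypothetical illustration of HTML-to-markdown cleanup plus media-tag extraction, not the repo's actual API:

```python
from html.parser import HTMLParser

class MarkdownExtractor(HTMLParser):
    """Crude HTML -> markdown-ish text converter that also collects media tags."""
    def __init__(self):
        super().__init__()
        self.parts = []   # markdown fragments, joined at the end
        self.media = []   # (tag, src) pairs for img/audio/video

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "audio", "video") and "src" in attrs:
            self.media.append((tag, attrs["src"]))
        elif tag == "h1":
            self.parts.append("# ")
        elif tag == "li":
            self.parts.append("- ")

    def handle_endtag(self, tag):
        if tag in ("h1", "p", "li"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data.strip())

def to_markdown(html):
    """Return (markdown_text, media_tags) for an HTML string."""
    p = MarkdownExtractor()
    p.feed(html)
    return "".join(p.parts), p.media
```

A real crawler would fetch pages concurrently and handle far more tags; this only shows the shape of the output an LLM-friendly scraper produces.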
Axolotl v0.8.0 is out today! Major features include support for Sequence Parallelism, Gemma3, Multimodal (beta), the Muon optimizer, and a major expansion to our docs! We've worked to make sure that our features are composable, leading to 3.6x speedups over vanilla HF+FA2 with >50%… https://t.co/WCZ0IJaqcz
Deleted this. The key point is true (all the major labs offer secure versions that will not train on your data, governed by the same rules as other cloud services), and it is also true that Claude will not train on your data. However, the policies for free Gemini are not that clear. https://t.co/yfhOY8dhZz
Pretty cool "Multi-Head Attention Shape Transformations (Cheat Sheet)" shared by a reader: https://t.co/9Nprk4XHgJ https://t.co/VBft7wtuR7
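The shape transformations such a cheat sheet covers can be traced in a few lines of NumPy. A sketch of the standard multi-head attention reshapes, with toy dimensions chosen for illustration:

```python
import numpy as np

batch, seq, d_model, n_heads = 2, 5, 16, 4
d_head = d_model // n_heads  # 16 / 4 = 4

x = np.random.randn(batch, seq, d_model)
Wq = np.random.randn(d_model, d_model)

# 1) project: (B, S, D) @ (D, D) -> (B, S, D)
q = x @ Wq
# 2) split heads: (B, S, D) -> (B, S, H, Dh) -> (B, H, S, Dh)
q = q.reshape(batch, seq, n_heads, d_head).transpose(0, 2, 1, 3)
k = v = q  # self-attention: K and V follow the same shape path
# 3) scores: (B, H, S, Dh) @ (B, H, Dh, S) -> (B, H, S, S)
scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)  # softmax over the last axis
# 4) mix values: (B, H, S, S) @ (B, H, S, Dh) -> (B, H, S, Dh)
out = weights @ v
# 5) merge heads: (B, H, S, Dh) -> (B, S, H, Dh) -> (B, S, D)
out = out.transpose(0, 2, 1, 3).reshape(batch, seq, d_model)
```

The two transposes (split then merge) are where most shape bugs hide; printing `.shape` after each step reproduces the cheat sheet.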
We've implemented a simple toolkit for fine-tuning powerful coding models using only RL with an entirely local, zero-setup sandboxed code interpreter. We found very promising results using a tiny fraction of data & training time vs SFT. Check out our blogpost for more details! … https://t.co/IMiRO3LS3C
AI 2027? AI predictions are wild... what do you think? https://t.co/HbEz3UmmT6
This was pretty impressive for a one-shot from Gemini 2.5 with the only prompt being: "the poem lepanto, but about the war of 1812" https://t.co/OxF5Kz8rFP
This is very cool, and a really impressive step forward. I do think the classic cartoon format makes some of the less coherent storytelling and prompt-following seem less relevant (which the authors acknowledge); this is still the weakness of AI video. For example, this is the… https://t.co/d3c5j5XGJL
Working with multimodal content has never been easier with our new Image, Audio and PDF classes. Added 200,000 new downloads with the new launch! Check it out at https://t.co/ch3nc9HEXd https://t.co/ARwf3aZhDW
Going with a crazy marketing copy for Parlance Labs. Is it too crazy? https://t.co/c5kFtRPYkI
Google might've created the successor of the Transformer architecture. It's a new architecture that pairs attention with a learnable long-term memory module. Attention handles short-term context with accurate dependency modeling. The memory module stores and retrieves… https://t.co/HO1cS3RIG1
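A heavily simplified toy of the short-term/long-term split: sliding-window attention over recent tokens plus a linear associative memory updated online. This is my own sketch of the general idea, not the paper's architecture:

```python
import numpy as np

d, window = 8, 4
M = np.zeros((d, d))   # long-term memory: toy linear associative map
history = []           # short-term context window

def step(x, lr=0.1):
    """Process one token embedding: attention over the recent window (short-term),
    a read from the memory module (long-term), then a 'surprise' update to memory."""
    global M
    history.append(x)
    ctx = np.stack(history[-window:])        # (<=W, d) recent tokens only
    scores = ctx @ x / np.sqrt(d)            # attention logits against current token
    w = np.exp(scores - scores.max())
    w /= w.sum()                             # softmax
    short = w @ ctx                          # short-term attention readout
    long_read = M @ x                        # long-term memory readout
    M += lr * np.outer(x - long_read, x)     # nudge memory toward new evidence
    return short + long_read
```

The real module is a learned neural memory trained end-to-end; the point here is only that attention sees a bounded window while the memory accumulates state across the whole stream.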
We're excited to introduce a brand-new layout agent within LlamaParse that gives you best-in-class document parsing and extraction with precise visual citations. It uses SOTA VLMs to 1) detect all the blocks on a page (tables/charts/paragraphs), and 2) dynamically… https://t.co/2WRRXxIRa1
Our brand-new layout agent uses state-of-the-art LLMs ranging from faster/cheaper to larger/better (Flash 2.0 to Sonnet-3.7) to dynamically parse a page in a layout-aware way. The layout agent first parses the overall layout of the document and breaks it into chunks. It then… https://t.co/mh8lXJGUYq
I still receive several consulting requests for optimizing prompts for RAG and agentic systems. It's usually the same techniques that work really well. So I've packaged prompting best practices in this 4hr course (all code). Learn it once and you're good going forward. https://t.co/xcMvdNoSVX
// Tracing LLM Outputs Back to Trillions of Training Tokens // Presents OLMOTRACE, the first system that can trace LLM outputs verbatim back to their entire multi-trillion-token training sets in real time! https://t.co/Xs4R7vJcx7
Some interesting points: - They are now data-constrained, not compute-constrained. Future progress relies on algos w/ better sample-efficiency. - Training GPT-4 now requires only 5-10 people. - Expect 10M+ GPU training runs, potentially "semi-synchronous" or decentralized. https://t.co/dV82TY0S7P
I can confirm that Gemini 2.5 Pro is really great at creative writing. Already using it for agentic systems that require editing, reviewing, writing, and refining outputs. https://t.co/pdXQI09UsD
Multimodal foundation models will enable new workflows in medical imaging (and healthcare more broadly), especially compared to the conventional approach. This is what we focus on at @SophontAI https://t.co/E3ujIBnqXw
I built a small voice agent with @livekit that goes out and searches the web using GPT-4o's web browsing when you ask it for recommendations. Tool calling is a bit funky, but otherwise I'm quite happy with the progress so far. https://t.co/6gqZoDFuxn
As we all know by now, reasoning models often generate longer responses, which raises compute costs. Now, this new paper (https://t.co/SwxBs8RsTq) shows that this behavior comes from the RL training process, not from an actual need for long answers for better accuracy. The RL… https://t.co/JnTmDNiVgg
people used to pay me like 20k just for explaining and helping people set up generative benchmarking. Come check out this lightning lesson where Kelly explains why it's so important that evaluations are used as we move systems from prototype to production https://t.co/nN8Z3p3sNk https://t.co/YuJ0bkil2P
made the mistake of continuing down this rabbit hole: apparently babistories itself is a synthetically generated dataset from the memory mosaics paper (https://t.co/lYKd4dPsBH), which is a synthetic copy of tinystories, which itself is also synthetically generated so the "strong… https://t.co/gN4bUzwJDT
New video + post + experiment. What if you could take a dumb model and a smart model and interpolate between them? Then extrapolate out to get an even better response? I talk about some related papers, then try out an idea, documenting the process and the (negative) result. https://t.co/ofOR118TwW
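The interpolate-then-extrapolate idea can be sketched as plain linear blending of checkpoint weights. A hypothetical helper, assuming both models share an identical parameter layout:

```python
import numpy as np

def interpolate_weights(weak, strong, alpha):
    """Linear interpolation (alpha in [0, 1]) or extrapolation (alpha > 1)
    between two checkpoints stored as name -> array dicts."""
    return {name: (1 - alpha) * weak[name] + alpha * strong[name]
            for name in weak}

# toy checkpoints standing in for the "dumb" and "smart" models
weak = {"w": np.array([1.0, 2.0])}
strong = {"w": np.array([3.0, 4.0])}

mid = interpolate_weights(weak, strong, 0.5)     # halfway blend
beyond = interpolate_weights(weak, strong, 1.5)  # extrapolate past "strong"
```

alpha > 1 continues along the weak-to-strong direction, which is the "extrapolate out for an even better response" bet; as the (negative) result suggests, nothing guarantees quality keeps improving off that line.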
Google quietly released a powerful recommender systems library optimized for JAX and TPUs, based on Keras. It's called RecML. It has native support for SparseCore (latest hardware for handling large distributed embeddings) https://t.co/IBNlKXUwcz
Excited to share that we won the AI Math Olympiad competition on @kaggle with a mind-blowing score of solving 34/50 student-level math problems using an LLM. Summary of our solution below. https://t.co/kpNygidHAV
Starting today, memory in ChatGPT can now reference all of your past chats to provide more personalized responses, drawing on your preferences and interests to make it even more helpful for writing, getting advice, learning, and beyond. https://t.co/s9BrWl94iY
from my notes on the childhoods of people who went on to do exceptional work https://t.co/cIxmlAh3et
AI agents fact-checking each other reduce hallucinations by over 2,800%. This new research paper introduces a 4-agent NLP pipeline that flags, explains, and rewrites hallucinated content. Each agent runs a different LLM and focuses on a distinct task: generation, review,… https://t.co/h5KKXvrcib
which VLM is the best? we are building @roboflow VLM playground to find out - test multiple VLMs in parallel and for free - open-source VLMs like PaliGemma and DeepSeek-VL coming soon - we had GPT-4o as well, but @OpenAI banned us for "distillation"; wtf? link:β¦ https://t.co/G9NyC2J726
added Gemini 2.5 Pro support to Claude Code. Feels faster and smarter than Sonnet 3.7. My go-to local coding assistant now. Link below ⬇️ https://t.co/Q2nnIM2Dzo
You can now get `fastkmeans` at your nearest PyPI reseller. It serves just one (1) purpose: run GPU-accelerated k-means that can do 200k+ clusters without going OOM and without any installation pain. (bonus: the API mimics both faiss & sklearn, so it slots in just about anywhere.) https://t.co/WlfzN4t89k
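Since the API mimics sklearn, usage presumably follows the classic fit/predict pattern. A minimal NumPy stand-in showing that interface (my own toy implementation for illustration, not fastkmeans itself):

```python
import numpy as np

class MiniKMeans:
    """Tiny NumPy k-means with an sklearn-like fit/predict API."""
    def __init__(self, n_clusters, n_iter=20, seed=0):
        self.n_clusters, self.n_iter = n_clusters, n_iter
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        # initialize centers from random data points
        idx = self.rng.choice(len(X), self.n_clusters, replace=False)
        self.cluster_centers_ = X[idx].copy()
        for _ in range(self.n_iter):
            labels = self.predict(X)
            for k in range(self.n_clusters):
                pts = X[labels == k]
                if len(pts):  # keep old center if a cluster goes empty
                    self.cluster_centers_[k] = pts.mean(axis=0)
        return self

    def predict(self, X):
        # squared distance of every point to every center -> nearest center
        d = ((X[:, None, :] - self.cluster_centers_[None]) ** 2).sum(-1)
        return d.argmin(axis=1)
```

A drop-in like `MiniKMeans(2).fit(X).predict(X)` is the whole surface; the GPU-accelerated version's appeal is that the same two calls scale to 200k+ clusters.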