Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
elvis
@omarsar0
📅 Mon
🆔 93343771

One Token to Fool LLM-as-a-Judge. Watch out for this one, devs! Semantically empty tokens, like “Thought process:”, “Solution”, or even just a colon “:”, can consistently trick models into giving false positive rewards. Here are my notes: https://t.co/l5usRSzSJz

❤️ 698 likes
🔁 121 retweets

Jonathan Whitaker
@johnowhitaker
📅 Fri
🆔 02393579

I wrote this in March, that coming up with a clever solution to the map folding problem in my quest for the 8x8 case would be a good sign LLMs were getting scary smart. Grok 4 made good headway today, coming up with a working multi-GPU implementation! https://t.co/J819AnCLO9

❤️ 9 likes
🔁 1 retweets

elvis
@omarsar0
📅 Mon
🆔 77902313

Evaluating LLM-based Agents. This report has a comprehensive list of methods for evaluating AI Agents. Don't ignore evals. If done right, they are a game-changer. Highly recommend it to AI devs. (bookmark it) https://t.co/YiZatvmbBC

❤️ 896 likes
🔁 178 retweets

LlamaIndex 🦙
@llama_index
📅 Fri
🆔 19720114

Ready to build production-grade data agents that work with real enterprise data? 🏗️ Join us and @Snowflake in Amsterdam on July 31st for hands-on talks about building data agents that actually work in production: 🤖 Learn how to tame complex paperwork with document agents using… https://t.co/r8oKh8O0eP

❤️ 18 likes
🔁 3 retweets

Andrej Karpathy
@karpathy
📅 Sat
🆔 94170287
⭐ 0.60

How to build a thriving open source community by writing code like bacteria do 🦠. Bacterial code (genomes) are: - small (each line of code costs energy) - modular (organized into groups of swappable operons) - self-contained (easily "copy paste-able" via horizontal gene… https://t.co/0xVX3NAMhC

Ethan Mollick
@emollick
📅 Mon
🆔 47768002

The best way to make sure that AI doesn’t make you intellectually lazy is to not use it in a lazy way. So when I work, I need to be mindful about how & when I consult with AI. I never use it for writing drafts or posts, for example. I described some of this to The New York Times https://t.co/r0RGF6MTSH

❤️ 726 likes
🔁 80 retweets

Andrej Karpathy
@karpathy
📅 Fri
🆔 18700033
⭐ 0.62

"Using a better model for analysis" 🤨 I didn't realize I was using haiku all this time, no idea when claude code snuck this one in rofl. https://t.co/If0qQ4svQh

François Chollet
@fchollet
📅 Fri
🆔 72244147

Today we're releasing a developer preview of our next-gen benchmark, ARC-AGI-3. The goal of this preview, leading up to the full version launch in early 2026, is to collaborate with the community. We invite you to provide feedback to help us build the most robust and effective… https://t.co/pGWQJLbfqe

❤️ 2,892 likes
🔁 1,011 retweets

htmx.org / The Net's Smoothest Code Man (same)
@htmx_org
📅 Fri
🆔 03949620

what you are seeing is full stack live step debugging on the MTMC-16: C code, the assembly for it & the machine, in a coherent, unified & visually compelling whole consequence in computer science education will never be the same releasing next friday https://t.co/lWngv4Q4qA

❤️ 149 likes
🔁 17 retweets

Jerry Liu
@jerryjliu0
📅 Fri
🆔 75244398

If you’re using AI agents for large-scale document extraction 📑✂️, you will need to craft a good structured output schema. Most LLMs support structured output these days, but here are tips and tricks from learned experience 💡 1️⃣ Try to limit schema nesting to 3-4 levels. 2️⃣ Make… https://t.co/WgUcKOIXEc

❤️ 119 likes
🔁 24 retweets

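The first tip in the post above, keeping schema nesting to 3-4 levels, can be checked mechanically. The following is a minimal Python sketch, not code from the post: it assumes the schema is a JSON-Schema-style dict, and `schema_depth`, `MAX_DEPTH`, and the `invoice` example are hypothetical names chosen for illustration.

```python
from typing import Any

# Assumed limit, taken from the upper end of the post's "3-4 levels" guidance.
MAX_DEPTH = 4


def schema_depth(schema: dict[str, Any]) -> int:
    """Count the levels of nested objects in a JSON-Schema-style dict."""
    # Children are the property schemas plus the item schema of an array.
    children = list(schema.get("properties", {}).values())
    items = schema.get("items")
    if isinstance(items, dict):
        children.append(items)
    deepest_child = max((schema_depth(child) for child in children), default=0)
    # Only objects add a nesting level; arrays and scalars pass through.
    own_level = 1 if schema.get("type") == "object" else 0
    return own_level + deepest_child


# Hypothetical extraction schema: invoice -> vendor -> address.
invoice = {
    "type": "object",
    "properties": {
        "vendor": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                },
            },
        },
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"sku": {"type": "string"}},
            },
        },
    },
}

depth = schema_depth(invoice)
print(f"depth={depth}, within limit: {depth <= MAX_DEPTH}")
```

Here the deepest path is invoice, vendor, address, so the sketch reports a depth of 3. Counting only objects (and not arrays) as levels is a design choice of this sketch; a stricter check could count arrays as well.
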
Mark McD ☠
@m4rkmc
📅 Fri
🆔 64785756

📣 We've just enabled LLMS.TXT on the Gemini API docs. On https://t.co/99fXLuYvwB just add /llms.txt to get model-friendly docs. MCP: 1️⃣ Use mcpdoc to add to your code agent 2️⃣ Build with the latest API and SDK best practices 👇 Or use in Gemini CLI with this extension 👇 Let… https://t.co/gLiJKlOdpL

❤️ 81 likes
🔁 14 retweets

pash
@pashmerepat
📅 Sat
🆔 68486682

I'd like to point out that for real-world tasks (not benchmarks), Kimi K2 outperforms Gemini. This is telemetry across all @cline users, showing diff edit failure rate. Notice how Kimi has about a 6% failure rate, which is significantly better than Gemini's ~10% error… https://t.co/kx3tFHVmY8

❤️ 1,067 likes
🔁 90 retweets

Mark Kretschmann
@mark_k
📅 Fri
🆔 16163792

Apple users can now enjoy Cyberpunk 2077! One of the best games of all time, available on the Mac in all its glory. If you haven't played this yet, now is your chance to enjoy this sci-fi masterpiece. Immerse yourself in Night City! https://t.co/VFC4LYpyTt

❤️ 24 likes
🔁 3 retweets

Clayton Thorrez
@cthorrez
📅 Sat
🆔 12845088

A story in 3 parts: :D https://t.co/1titH82cDb

❤️ 179 likes
🔁 6 retweets

Teknium (e/λ)
@Teknium1
📅 Sat
🆔 87614712

Damn he listened and instantly said "I'll make that" https://t.co/VDiMwMP4X5

❤️ 115 likes
🔁 3 retweets

Dmitriy Kovalenko
@neogoose_btw
📅 Fri
🆔 37455485

Have been thinking about this and it actually makes a lot of sense. Imports are completely meaningless so I made a neovim plugin to automatically fold imports in every language I use using treesitter (works in C, Rust, C++, OCaml, (Type/Java)script, Zig, and Python so far)… https://t.co/fX9BpGtZ2i

❤️ 267 likes
🔁 13 retweets

Hamel Husain
@HamelHusain
📅 Tue Jul 22
🆔 21737664

Fairly convincing phishing attempt ... watch out folks don't fall for this (email was from x-dev4415@social.mg.gov.br) https://t.co/j22yIOWqX7

❤️ 11 likes

Yunyu Lin
@yunyu_l
📅 Fri
🆔 15468884

We gave Claude access to our corporate QuickBooks. It committed accounting fraud. LLMs are on the verge of replacing data scientists and investment bankers. But can they perform simple accounting tasks for a real business? The answer is no. https://t.co/TZMiDyhLPN

❤️ 4,444 likes
🔁 408 retweets

Harry Stebbings
@HarryStebbings
📅 Fri
🆔 87502657

“There's an unspoken covenant that as a founder, you go down with the ship. For better or worse, it's changed a bit over the last year and I think it's disappointing, to be honest.” Enough said. This show is everything and more on: - What really happened behind the scenes -… https://t.co/qaY7MVwgIy

❤️ 276 likes
🔁 20 retweets

LlamaIndex 🦙
@llama_index
📅 Mon
🆔 49411411

🎙️ Always wanted to turn your documents into in-depth, podcast-like conversations? 🦙📚 NotebookLlaMa, our OSS @NotebookLM clone, just got an upgrade on that side! 🎧 You can now customize the style of the conversation and the target audience, as well as add instructions and… https://t.co/IvCRjMhCvQ

❤️ 24 likes
🔁 4 retweets

Shreya Shankar
@sh_reya
📅 Mon
🆔 50772249

Excited to kick off a much improved version of our AI evals course tomorrow (link in replies). 💫 We've added dedicated homework sessions, an updated course reader & lectures that incorporate 100s of questions from cohort 1. There’s more hands-on/live error analysis, plus… https://t.co/xEo3hpCypy

❤️ 62 likes
🔁 5 retweets

Wei Cheng
@wchengad
📅 Mon
🆔 80702470

Want to generate SVGs? Besides OmniSVG, please check out AnyCoder, a fully Gradio-powered coder app by @_akhaliq that lets you create SVGs from YAML! You can choose any LLM and any code language you want, try it out for free here: https://t.co/0yrNpv08AY https://t.co/pE9FoKQ2AV

❤️ 21 likes
🔁 1 retweets

LlamaIndex 🦙
@llama_index
📅 Mon
🆔 02108723

Automate RFP Responses in Minutes with our open-source project! Learn how to transform the time-consuming RFP (Request for Proposal) response process from hours of manual work into an automated workflow that takes just minutes. This open-source demo showcases LlamaIndex's… https://t.co/HJFHnVwZs1

❤️ 55 likes
🔁 5 retweets

jason liu
@jxnlco
📅 Mon
🆔 94458215

lessons from building verticalized agents link below https://t.co/XBHlgRwx53

❤️ 20 likes
🔁 1 retweets

jerryliang
@Jerryliangch
📅 Mon
🆔 31837499

Excited to announce that DnD's official training code, training datasets, and demo have been released! Check our code here: jerryliang24/Drag-and-Drop-LLMs Nice work with @oahzxl, @Richard91316073, and @realsoptq, thx to @VITAGroupUT and @VictorKaiWang1 for advising! https://t.co/TXyHE9Rin6

❤️ 23 likes
🔁 5 retweets

Sebastian Raschka
@rasbt
📅 Mon
🆔 96190712
⭐ 0.60

The new Qwen3 update takes back the benchmark crown from Kimi 2. Some highlights of how Qwen3 235B-A22B differs from Kimi 2: - 4.25x smaller overall but has more layers (transformer blocks); 235B vs 1 trillion - 1.5x fewer active parameters (22B vs. 32B) - much fewer experts in… https://t.co/Ld5chRkXpZ

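The two ratios quoted in the post above follow from the parameter counts it gives. A quick Python check, using only the figures from the post (235B vs. 1 trillion total, 22B vs. 32B active); the variable names are illustrative:

```python
# Parameter counts in billions, as quoted in the post.
qwen3_total, kimi_total = 235, 1000
qwen3_active, kimi_active = 22, 32

# How many times smaller Qwen3 is overall.
total_ratio = kimi_total / qwen3_total
# How many times fewer parameters are active per token.
active_ratio = kimi_active / qwen3_active

print(f"total: {total_ratio:.2f}x, active: {active_ratio:.2f}x")
```

This prints roughly 4.26x and 1.45x, consistent with the post's rounded figures of 4.25x and 1.5x.
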
LlamaIndex 🦙
@llama_index
📅 Mon
🆔 93706274

Ready to build cutting-edge AI agents that push the limits of LLMs? 🚀 We're excited to sponsor the A2A Agents Hackathon in San Francisco this Saturday, July 26, where our VP of Developer Relations @seldo will be speaking and judging alongside incredible experts from… https://t.co/R6J4igjhSH

❤️ 24 likes
🔁 6 retweets

Tensorlake
@tensorlake
📅 Mon
🆔 79842208

Structured Extraction from images powers a lot of real world Agentic use cases, such as validation of license plates, driving licenses, information from invoices captured by images. Our Document Ingestion API allows you to extract data from millions of images without spinning up… https://t.co/RGknTmN9wv

❤️ 9 likes
🔁 2 retweets

jason liu
@jxnlco
📅 Mon
🆔 34374206

notes from our talk with @haizelabs https://t.co/CrMioau8Ur

❤️ 32 likes
🔁 3 retweets

ARC Prize
@arcprize
📅 Mon
🆔 87552174

New ARC Prize 2025 High Score 17.6% by Giotto.ai (@podesta_aldo) https://t.co/iTPoOmpBsw

❤️ 349 likes
🔁 34 retweets

Rahul Chakraborty
@hckmstrrahul
📅 Mon
🆔 13571768

Comet is a giant leap among browsers. Amazed to see it can access the Figma interface directly. Here's the Comet Assistant making Figma edits like a baby taking small steps... >selects artboard >writes text >selects font from the picker >increases size cute. https://t.co/tqLsJGZwBk

❤️ 443 likes
🔁 22 retweets

Ethan Mollick
@emollick
📅 Tue Jul 22
🆔 88932258

I am finding ChatGPT agents to be useful. They are a better fit with the "intern" analogy than any former AI - requiring oversight, still saving lots of time overall. For example, I update an AI cost/performance chart frequently. The agent did all the grunt work, with guidance. https://t.co/AGs7DRNxSh

❤️ 519 likes
🔁 36 retweets