Your curated collection of saved posts and media

Showing 32 posts Β· last 14 days Β· by score

ch402 (@ch402) Β· Aug 05, 2025

Valuable synthesis across labs! Make sure to check out the tutorial video - https://t.co/nqh6ZFaxat

@neuronpedia β€’ Tue Aug 05 17:23

Today, we're releasing The Circuit Analysis Research Landscape: an interpretability post extending & open-sourcing Anthropic's circuit tracing work, co-authored by @Anthropic, @GoogleDeepMind, @GoodfireAI, @AiEleuther, and @decode_research. Here's a quick demo; details follow

[Media: 1 attachment]

ch402 (@ch402) Β· Aug 08, 2025

When we first started working with transcoders, I didn't really appreciate what a big change they were... https://t.co/Pm3YiWCMK8

[Media: 1 attachment]

ch402 (@ch402) Β· Aug 08, 2025

Let's consider a very simple problem, having a transcoder mimic the absolute value function. You can do this with two features per dimension: https://t.co/DWswkZmZVW

[Media: 1 attachment]

ch402 (@ch402) Β· Aug 08, 2025

Transcoders will learn the perfect solution! https://t.co/MoewAdgDOo

[Media: 1 attachment]

ch402 (@ch402) Β· Aug 08, 2025

But now let's add a repeated data point to the transcoder training data, p=[1,1,1,0,0,0,0...] The transcoder now learns a special feature to memorize that point! https://t.co/5ghvWDjqbg

[Media: 1 attachment]

ch402 (@ch402) Β· Aug 08, 2025

It turns out there's a fix! If we ask to match the Jacobian of absolute value, we get the correct solution again. https://t.co/dJ9acDGrZU

[Media: 1 attachment]

ch402 (@ch402) Β· Aug 08, 2025

What's the point of all of this? For me, this question of mechanistic faithfulness is the most important question in all the SAE debate. I think it's often mixed in with other things and kind of implicit, and I wanted to have a simple example that clearly isolates it. https://t.co/VqLQSKO7D7

[Media: 1 attachment]

ch402 (@ch402) Β· Aug 08, 2025

Our recent work on attribution graphs (https://t.co/qbIhdV7OKz) and its extension to attention (https://t.co/Mf8JLvWH9K) points toward how much potential they have if we can mitigate these issues.

[Media: 1 attachment]

AnthropicAI (@AnthropicAI) Β· Jul 29, 2025

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places. https://t.co/wJWRRTt4DG

[Media: 1 attachment]

AnthropicAI (@AnthropicAI) Β· Jul 29, 2025

The program will run for ~two months, with opportunities to extend for an additional four based on progress and performance. Apply by August 17 to join us in any of these locations:
- US: https://t.co/BhrekQsl8F
- UK: https://t.co/TPYNEony83
- Canada: https://t.co/F00QZ0hjqw

[Media: 3 attachments]

livgorton (@livgorton) Β· Aug 26, 2025

What if adversarial examples aren't a bug, but a direct consequence of how neural networks process information? We've found evidence that superposition – the way networks represent many more features than they have neurons – might cause adversarial examples. https://t.co/YL11r2FeOw

[Media: 1 attachment]

percyliang (@percyliang) Β· Jun 18, 2025

Assignment 3 (scaling laws): fit scaling laws using IsoFLOP. To simulate the high stakes of a training run, students got a training API [hyperparameters -> loss] and a fixed compute budget, and had to choose which runs to submit to gather data points. Behind the scenes, the training API was backed by interpolating between a bunch of precomputed runs. https://t.co/JpaDT8wIoE

[Media: 1 attachment]

percyliang (@percyliang) Β· Jun 18, 2025

Assignment 4 (data): convert Common Crawl HTML to text, filter filter filter (quality, harmful content, PII), deduplicate. This is the grunt work that doesn’t get enough appreciation. https://t.co/60V5MB9uv5

[Media: 1 attachment]

percyliang (@percyliang) Β· Jun 18, 2025

Assignment 5 (alignment): implement supervised fine-tuning, expert iteration, GRPO and variants, run RL on Qwen 2.5 Math 1.5B to improve MATH because it’s 2025. We thought about having students implement inference, but decided (probably wisely) to let people use vllm instead. https://t.co/mQOG46z2Eh

[Media: 1 attachment]

percyliang (@percyliang) Β· Jun 18, 2025

You can find all our lectures on YouTube (thanks to @StanfordOnline): https://t.co/l5WOdhWzNW and the assignments on the course website so you can do it yourself at home: https://t.co/HG6zdeLUtq

[Media: 1 attachment]

simonguozirui (@simonguozirui) Β· Jun 04, 2025

Designed some graphics for Stanford CS336 (Language Modeling from Scratch) by @percyliang @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Covering four assignments πŸ“š that teach you how to πŸ§‘β€πŸ³ cook an LLM from scratch:
- Build and Train a Tokenizer πŸ”€
- Write Triton kernels for Attention ⚑️
- Construct Scaling Laws πŸ“‰
- Implement GRPO πŸ™

[Media: 4 attachments]

dlwh (@dlwh) Β· Jun 26, 2025

So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me.) https://t.co/ePDDIL97Dg

[Media: 1 attachment]

AhmedSQRD (@AhmedSQRD) Β· Jul 23, 2025

Prompting Llama 3.1 70B with β€œMr and Mrs. D” can seed the generation of a near-exact copy of the entire ~300 page book β€˜Harry Potter & the Sorcerer’s Stone’ 🀯

We define a β€œnear-copy” as text that is identical modulo minor spelling / punctuation variations. Below is a piece of the diff between the Books3 version and what the model generated, showing how close the two are!

TL;DR: β€œMr and Mrs. D” => [Llama 3.1 70B] => near-exact copy of the entire ~300 page book. Read onπŸ‘‡ 1/🧡

[Media: 1 attachment]

percyliang (@percyliang) Β· Jul 28, 2025

HELM capabilities v1.9.0 is out (Grok 4 and Kimi K2 make the top 10 overall), and Kimi K2 is the best non-thinking model: https://t.co/xEnipRhILk

[Media: 1 attachment]

percyliang (@percyliang) Β· Jul 28, 2025

HELM safety v1.11.0 is also out. Kimi K2 is right up there, whereas Grok 4 is closer to the bottom than the top... https://t.co/mIYIlLIPbO

[Media: 1 attachment]

percyliang (@percyliang) Β· Aug 06, 2025

gpt-oss-120b is the top open-weight model (with Kimi K2 right on its tail) for capabilities (HELM capabilities v1.11): https://t.co/D3RExuNfbY

[Media: 1 attachment]

percyliang (@percyliang) Β· Aug 06, 2025

It is also the safest (HELM safety v1.13.0): https://t.co/P9tsrIK3V3

[Media: 1 attachment]

percyliang (@percyliang) Β· Aug 11, 2025

GPT-5 and GPT-5 mini added to HELM capabilities v1.12.0. Interestingly, GPT-5 mini tops the leaderboard ahead of GPT-5 because on Omni-MATH, GPT-5 uses more reasoning tokens (and is hard to control) and hits our reasoning token budget of 14096. Doing fair evals is tricky! https://t.co/hSmyQgke4S

[Media: 1 attachment]

kenziyuliu (@kenziyuliu) Β· Aug 26, 2025

New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions. Instead of contrived exams where progress β‰  value, we eval LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far: https://t.co/3TzD9ULEtg

[Media: 1 attachment]

TransluceAI (@TransluceAI) Β· Aug 26, 2025

Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like β€œis my model reward hacking” or β€œwhere does it violate instructions.” Today, anyone can get started with just a few lines of code! https://t.co/ki6MMGH73j

[Media: 1 attachment]

rohanpaul_ai (@rohanpaul_ai) Β· Aug 29, 2025

🧬 Bad news for medical LLMs. This paper finds that top medical AI models often match patterns instead of truly reasoning. Small wording tweaks cut accuracy by up to 38% on validated questions.

The team took 100 MedQA questions, replaced the correct choice with β€œNone of the other answers,” then kept the 68 items where a clinician confirmed that switch as correct. If a model truly reasons, it should still reach the same clinical decision despite that label swap. They asked each model to explain its steps before answering and compared accuracy on the original versus modified items.

All 6 models dropped on the NOTA set, the biggest hit was 38%, and even the reasoning models slipped. That pattern points to shortcut learning: the systems latch onto answer templates rather than working through the clinical logic.

Overall, the results show that high benchmark scores can mask a robustness gap, because small format shifts expose shallow pattern use rather than clinical reasoning.

[Media: 1 attachment]

GaryMarcus (@GaryMarcus) Β· Aug 29, 2025

One minute @mattturck is telling me that hallucinations are β€œa largely fixed problem”; the next minute ChatGPT 5 is telling a friend that Trump β€œis not in office”. πŸ€” https://t.co/VydYrOrpkd

[Media: 1 attachment]

GaryMarcus (@GaryMarcus) Β· Aug 29, 2025

35 million people laughed in my face. But we still don’t have a solution to hallucinations, boneheaded errors, and unreliable reasoning. Years later, the wall of reliability still looms in front of LLMs, unconquered. https://t.co/kGFg5vXK3z

[Media: 1 attachment]

GaryMarcus (@GaryMarcus) Β· Aug 29, 2025

@OrielJohann 🀣🀣🀣 https://t.co/PTZlFGRQnZ

[Media: 1 attachment]

GaryMarcus (@GaryMarcus) Β· Aug 29, 2025

β€œNasdaq pulls back on AI worries” https://t.co/LJVGHu0c89

[Media: 1 attachment]

SteveJo03138701 (@SteveJo03138701) Β· Aug 29, 2025

@GaryMarcus I wanted to find a vaguely familiar existing image that shows how one person can be correct while standing against a huge crowd :), so I asked Grok. Well, the result shows why I can't completely dismiss the LLMs... https://t.co/WtEKGJ5daa

[Media: 2 attachments]

GaryMarcus (@GaryMarcus) Β· Aug 29, 2025

AI Twitter, every day since 2019. (Not all e/accs shown.) https://t.co/v0d6Gn4P2h

[Media: 1 attachment]