Valuable synthesis across labs! Make sure to check out the tutorial video - https://t.co/nqh6ZFaxat
Today, we're releasing The Circuit Analysis Research Landscape: an interpretability post extending & open sourcing Anthropic's circuit tracing work, co-authored by @Anthropic, @GoogleDeepMind, @GoodfireAI, @AiEleuther, and @decode_research. Here's a quick demo; details follow
When we first started working with transcoders, I didn't really appreciate what a big change they were... https://t.co/Pm3YiWCMK8
Let's consider a very simple problem: having a transcoder mimic the absolute value function. You can do this with two features per dimension: https://t.co/DWswkZmZVW
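As a concrete illustration (a minimal numpy sketch, not the thread's actual code), the two features per input dimension are just ReLU(x) and ReLU(-x), whose sum reconstructs |x| exactly:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-constructed weights for a single input dimension:
# two encoder features (+x and -x), one decoder row that sums them.
W_enc = np.array([[1.0], [-1.0]])
W_dec = np.array([[1.0, 1.0]])

def transcoder(x):
    features = relu(W_enc @ x)   # shape (2,)
    return W_dec @ features      # shape (1,)

for v in [-3.0, -0.5, 0.0, 2.0]:
    print(v, transcoder(np.array([v]))[0], abs(v))  # model output equals |v|
```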
Transcoders will learn the perfect solution! https://t.co/MoewAdgDOo
But now let's add a repeated data point to the transcoder training data, p=[1,1,1,0,0,0,0...] The transcoder now learns a special feature to memorize that point! https://t.co/5ghvWDjqbg
It turns out there's a fix! If we ask it to match the Jacobian of absolute value, we get the correct solution again. https://t.co/dJ9acDGrZU
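A rough sketch of what "matching the Jacobian" could look like in training (my own toy PyTorch setup using a vector-Jacobian-product penalty; the architecture and weighting are assumptions, not the original experiment):

```python
import torch

d_in, d_feat = 4, 16
enc = torch.nn.Linear(d_in, d_feat)
dec = torch.nn.Linear(d_feat, d_in)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
lam = 1.0  # weight on the Jacobian-matching term (assumed)

for step in range(2000):
    x = torch.randn(64, d_in, requires_grad=True)
    y_hat = dec(torch.relu(enc(x)))
    out_loss = ((y_hat - x.abs()) ** 2).mean()

    # The Jacobian of elementwise abs is diag(sign(x)); compare vector-Jacobian
    # products against random probe vectors instead of forming full Jacobians.
    v = torch.randn_like(y_hat)
    (model_vjp,) = torch.autograd.grad((y_hat * v).sum(), x, create_graph=True)
    jac_loss = ((model_vjp - v * torch.sign(x)) ** 2).mean()

    loss = out_loss + lam * jac_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```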
What's the point of all of this? For me, this question of mechanistic faithfulness is the most important question in all the SAE debate. I think it's often mixed in with other things and kind of implicit, and I wanted to have a simple example that clearly isolates it. https://t.co/VqLQSKO7D7
Our recent work on attribution graphs (https://t.co/qbIhdV7OKz ) and extending it to attention (https://t.co/Mf8JLvWH9K ) points towards how much potential they have if we can mitigate the issues.
We're running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places. https://t.co/wJWRRTt4DG
The program will run for ~two months, with opportunities to extend for an additional four based on progress and performance. Apply by August 17 to join us in any of these locations: - US: https://t.co/BhrekQsl8F - UK: https://t.co/TPYNEony83 - Canada: https://t.co/F00QZ0hjqw

What if adversarial examples aren't a bug, but a direct consequence of how neural networks process information? We've found evidence that superposition (the way networks represent many more features than they have neurons) might cause adversarial examples. https://t.co/YL11r2FeOw
Assignment 3 (scaling laws): fit scaling laws using IsoFLOP. To simulate the high stakes of a training run, students got a training API [hyperparameters -> loss] and a fixed compute budget, and had to choose which runs to submit to gather data points. Behind the scenes, the training API was backed by interpolating between a bunch of precomputed runs. https://t.co/JpaDT8wIoE
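Roughly, the IsoFLOP recipe looks like this (all numbers below are illustrative, not data from the course's training API): for each compute budget, fit a parabola to loss vs. log model size, take its minimum, then fit a power law to the optimal sizes:

```python
import numpy as np

# Hypothetical (model size, final loss) points at three compute budgets.
runs = {
    1e17: [(1e7, 3.10), (3e7, 2.95), (1e8, 2.98), (3e8, 3.20)],
    1e18: [(3e7, 2.80), (1e8, 2.62), (3e8, 2.66), (1e9, 2.85)],
    1e19: [(1e8, 2.50), (3e8, 2.33), (1e9, 2.36), (3e9, 2.55)],
}

budgets, n_opts = [], []
for C, points in runs.items():
    logN = np.log([n for n, _ in points])
    loss = np.array([l for _, l in points])
    a, b, c = np.polyfit(logN, loss, 2)   # quadratic fit in log N
    n_opts.append(np.exp(-b / (2 * a)))   # vertex = compute-optimal N for this budget
    budgets.append(C)

# Power-law fit: log N_opt = alpha * log C + const
alpha, const = np.polyfit(np.log(budgets), np.log(n_opts), 1)
print(f"N_opt scales roughly as C^{alpha:.2f}")
```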
Assignment 4 (data): convert Common Crawl HTML to text, filter filter filter (quality, harmful content, PII), deduplication. This is the grunt work that doesn't get enough appreciation. https://t.co/60V5MB9uv5
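For flavor, a minimal sketch of that kind of pipeline (the thresholds and heuristics here are placeholders I made up, not the assignment's reference filters):

```python
import hashlib
import re

def normalize(text: str) -> str:
    return re.sub(r"\s+", " ", text.strip().lower())

def passes_quality_filters(text: str) -> bool:
    words = text.split()
    if len(words) < 50:                                   # too short to be useful
        return False
    if sum(len(w) for w in words) / len(words) > 12:      # likely markup/gibberish
        return False
    return True

def dedup(docs):
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

# Toy usage: raw_documents would come from Common Crawl HTML -> text extraction.
raw_documents = ["some extracted page text ..."] * 3      # three identical docs
unique_docs = dedup(raw_documents)                        # exact dedup -> 1 doc
clean_docs = [d for d in unique_docs if passes_quality_filters(d)]
```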
Assignment 5 (alignment): implement supervised fine-tuning, expert iteration, GRPO and variants, run RL on Qwen 2.5 Math 1.5B to improve MATH because it's 2025. We thought about having students implement inference, but decided (probably wisely) to let people use vllm instead. https://t.co/mQOG46z2Eh
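A small sketch of the GRPO core (group-relative advantages plus a clipped ratio); the hyperparameters and toy numbers are assumptions, not the assignment's reference implementation:

```python
import torch

def grpo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    """
    logp_new, logp_old: (G,) summed log-probs of G sampled completions for one prompt
    rewards:            (G,) scalar rewards (e.g., 1 if the MATH answer is correct)
    """
    # Group-relative advantage: normalize rewards within the sampled group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()

# Toy usage: 4 completions for one prompt, two of them correct.
loss = grpo_loss(
    logp_new=torch.tensor([-12.0, -15.0, -11.0, -14.0], requires_grad=True),
    logp_old=torch.tensor([-12.5, -14.0, -11.5, -13.0]),
    rewards=torch.tensor([1.0, 0.0, 1.0, 0.0]),
)
loss.backward()
```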
You can find all our lectures on YouTube (thanks to @StanfordOnline): https://t.co/l5WOdhWzNW and the assignments on the course website so you can do it yourself at home: https://t.co/HG6zdeLUtq
Designed some graphics for Stanford CS336 (Language Modeling from Scratch) by @percyliang @tatsu_hashimoto @marcelroed @neilbband @rckpudi Covering four assignments that teach you how to cook an LLM from scratch: - Build and Train a Tokenizer - Write Triton kernels for Attention - Construct Scaling Laws - Implement GRPO

So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me.) https://t.co/ePDDIL97Dg
Prompting Llama 3.1 70B with "Mr and Mrs. D" can seed the generation of a near-exact copy of the entire ~300 page book "Harry Potter & the Sorcerer's Stone" 🤯 We define a "near-copy" as text that is identical modulo minor spelling / punctuation variations. Below is a piece of the diff between the Books3 version and what the model generated, showing how close the two are! TL;DR: "Mr and Mrs. D" => [Llama 3.1 70B] => near-exact copy of the entire ~300 page book. Read on 1/🧵
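For what it's worth, a tiny sketch of how one might operationalize that "near-copy" definition (my own heuristic, not the paper's metric): normalize punctuation and case, then compute a sequence-similarity ratio:

```python
import difflib
import re

def normalize(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)       # drop punctuation
    return re.sub(r"\s+", " ", text).strip()

def near_copy_ratio(generated: str, reference: str) -> float:
    return difflib.SequenceMatcher(
        None, normalize(generated), normalize(reference)
    ).ratio()

# Toy usage with placeholder strings (not the actual book text).
print(near_copy_ratio("Mr. and Mrs. D were proud to say...",
                      "Mr and Mrs D were proud to say..."))  # ~1.0
```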
HELM capabilities v1.9.0 is out (Grok 4 and Kimi K2 make the top 10 overall), and Kimi K2 is the best non-thinking model: https://t.co/xEnipRhILk
HELM safety v1.11.0 is also out. Kimi K2 is right up there, whereas Grok 4 is closer to the bottom than the top... https://t.co/mIYIlLIPbO
gpt-oss-120b is the top open-weight model (with Kimi K2 right on its tail) for capabilities (HELM capabilities v1.11): https://t.co/D3RExuNfbY
It is also the safest (HELM safety v1.13.0): https://t.co/P9tsrIK3V3
GPT-5 and GPT-5 mini added to HELM capabilities v1.12.0. Interestingly, GPT-5 mini tops the leaderboard ahead of GPT-5 because on Omni-MATH, GPT-5 uses more reasoning tokens (and is hard to control) and hits our reasoning token budget of 14096. Doing fair evals is tricky! https://t.co/hSmyQgke4S
New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions. Instead of contrived exams where progress ≠ value, we eval LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far: https://t.co/3TzD9ULEtg
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like "is my model reward hacking" or "where does it violate instructions." Today, anyone can get started with just a few lines of code! https://t.co/ki6MMGH73j
Bad news for medical LLMs. This paper finds that top medical AI models often match patterns instead of truly reasoning. Small wording tweaks cut accuracy by up to 38% on validated questions. The team took 100 MedQA questions, replaced the correct choice with "None of the other answers", then kept the 68 items where a clinician confirmed that switch as correct. If a model truly reasons, it should still reach the same clinical decision despite that label swap. They asked each model to explain its steps before answering and compared accuracy on the original versus modified items. All 6 models dropped on the NOTA set, the biggest hit was 38%, and even the reasoning models slipped. That pattern points to shortcut learning: the systems latch onto answer templates rather than working through the clinical logic. Overall, the results show that high benchmark scores can mask a robustness gap, because small format shifts expose shallow pattern use rather than clinical reasoning.
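A minimal sketch of the perturbation being described (the field names, toy item, and dummy model are my own illustration, not the paper's code): swap the correct option for "None of the other answers" and re-score:

```python
def make_nota_item(question, options, answer_key):
    """Replace the correct choice with a NOTA option; the answer key stays the same."""
    modified = dict(options)
    modified[answer_key] = "None of the other answers"
    return {"question": question, "options": modified, "answer": answer_key}

def accuracy(answer_fn, items):
    hits = sum(answer_fn(it["question"], it["options"]) == it["answer"] for it in items)
    return hits / len(items)

# Toy usage: compare accuracy on original vs. NOTA items;
# a large drop suggests pattern matching rather than clinical reasoning.
original = [{"question": "Best next step?", "options": {"A": "X-ray", "B": "CT"}, "answer": "B"}]
nota = [make_nota_item(it["question"], it["options"], it["answer"]) for it in original]
dummy_model = lambda q, opts: "B"   # stand-in for a real model's answer function
print(accuracy(dummy_model, original), accuracy(dummy_model, nota))
```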
One minute @mattturck is telling me that hallucinations are "a largely fixed problem"; the next minute ChatGPT 5 is telling a friend that Trump "is not in office". https://t.co/VydYrOrpkd
35 million people laughed in my face. But we still don't have a solution to hallucinations, boneheaded errors, and unreliable reasoning. Years later, the wall of reliability still looms in front of LLMs, unconquered. https://t.co/kGFg5vXK3z
@OrielJohann 🤣🤣🤣 https://t.co/PTZlFGRQnZ
"Nasdaq pulls back on AI worries" https://t.co/LJVGHu0c89
@GaryMarcus I wanted to find a vaguely familiar existing image that shows how one person can be correct while standing against a huge crowd :), so I asked Grok. Well, the result shows why I can't completely dismiss the LLMs... https://t.co/WtEKGJ5daa

AI Twitter, every day since 2019. (Not all e/accs shown.) https://t.co/v0d6Gn4P2h