Your curated collection of saved posts and media

Showing 14 posts · last 7 days · quality filtered
nthngdy
@nthngdy
📅 Mar 12, 2026 · 3d ago
🆔21712152

🧡New paper: "Lost in Backpropagation: The LM Head is a Gradient Bottleneck" The output layer of LLMs destroys 95-99% of your training signal during backpropagation, and this significantly slows down pretraining πŸ‘‡ https://t.co/lnbGfesIFA

winglian
@winglian
📅 Mar 16, 2026 · 49m ago
🆔99383394
⭐0.36

I'm pretty confident this can be leveraged to graft a modified backward pass onto the LM head of a pretrained model and improve validation loss over the standard LM head backward pass. More to come soon.

XianghuiXie
@XianghuiXie
📅 Mar 15, 2026 · 9h ago
🆔66289235

Do you want a 3D character interacting with an object/pet/another person, following a desired action? Presenting Hoi3DGen: Generating High-Quality Human-Object-Interactions in 3D. Project: https://t.co/EE87KSjQCX Code: https://t.co/ddpLjciTWC https://t.co/QPTyXw45kk

🔁 Scobleizer retweeted
Xianghui Xie
@XianghuiXie
📅 Mar 15, 2026 · 9h ago
🆔66289235
⭐0.32

Do you want a 3D character interacting with an object/pet/another person, following a desired action? Presenting Hoi3DGen: Generating High-Quality Human-Object-Interactions in 3D. Project: https://t.co/EE87KSjQCX Code: https://t.co/ddpLjciTWC https://t.co/QPTyXw45kk

❀️16
likes
πŸ”6
retweets
Scobleizer
@Scobleizer
📅 Mar 16, 2026 · 2h ago
🆔24269942
⭐0.40

All AI posters at GTC. This is not for human consumption. This video is for AI to watch. Click the Grok button and talk to it about what it learned by seeing all the AI posters (highly technical) presented at @NVIDIAGTC tonight. Thanks NVIDIA for the badge and access. https://t.co/mKqIv1f6Dt

Scobleizer
@Scobleizer
📅 Mar 16, 2026 · 2h ago
🆔32585940
⭐0.36

Wow. Grok watched this video and made a complete list of everything it saw: https://t.co/fqC1fuwhwX Do you have any idea how cool this is? It read every poster.

karpathy
@karpathy
📅 Mar 16, 2026 · 2h ago
🆔46107835
⭐0.34

@Yulun_Du @ilyasut SGD is a ResNet too (the blocks of it are fwd+bwd), the residual stream is the weights so... 🤔 We're not taking the Attention is All You Need part literally enough? :D
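The analogy can be made concrete on a toy problem: each SGD step adds a residual update −lr · ∇L(w) to the weights, just as a ResNet block adds f(x) to its input, so unrolling training gives a stack of "blocks" (one forward+backward pass each) on a residual stream that is the weights. The quadratic loss and learning rate below are arbitrary choices for illustration.

```python
def grad(w):
    # Gradient of the toy loss L(w) = (w - 3)^2, chosen only for illustration.
    return 2.0 * (w - 3.0)


def sgd_block(w, lr):
    # One "block": a forward+backward pass yields the residual update -lr * dL/dw.
    return -lr * grad(w)


def run_sgd_as_resnet(w0, lr=0.1, steps=100):
    w = w0  # the "residual stream" is the weight itself
    for _ in range(steps):
        w = w + sgd_block(w, lr)  # x_{l+1} = x_l + f(x_l), as in a ResNet
    return w


if __name__ == "__main__":
    print(run_sgd_as_resnet(0.0))  # approaches the minimizer w = 3
```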

🔁 omarsar0 retweeted
elvis
@omarsar0
📅 Mar 15, 2026 · 14h ago
🆔07999894
⭐0.34

We mostly solved multi-node coordination decades ago in distributed computing. Turns out LLM teams face some of the same coordination problems today.
Here is a really good read for anyone designing multi-agent systems. It applies distributed systems theory to LLM teams and finds the same O(n²) communication bottlenecks, straggler delays, and consistency conflicts showing up directly. Decentralized teams wasted more rounds communicating without making progress, but they also recovered faster when individual agents stalled.
How does this relate to distributed systems? The work attempts to evaluate LLM teams as distributed systems. Instead of trial and error, it lays out a principled framework for deciding when teams help, how many agents to use, and what coordination structure fits the task.
Designing LLM teams without distributed systems principles is like building a cluster without understanding consensus protocols.
Paper: https://t.co/klHzUFJL1R
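The O(n²) figure corresponds to the standard pairwise-channel count: if every agent can message every other agent, a team of n agents has n(n − 1)/2 links, while a hub-and-spoke (centralized) team needs only n − 1. The sketch assumes these two textbook topologies as stand-ins for "decentralized" and "centralized"; the post does not describe the paper's exact setups.

```python
def channels_fully_connected(n):
    # Every pair of agents may talk directly: n choose 2, which grows as O(n^2).
    return n * (n - 1) // 2


def channels_star(n):
    # One coordinator plus n - 1 workers, each with a single link to the hub: O(n).
    return max(n - 1, 0)


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, channels_fully_connected(n), channels_star(n))
```

The trade-off the post describes shows up even here: the star has far fewer links, but every message passes through one coordinator, so that agent becomes a single point of delay or failure.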

❀️119
likes
πŸ”29
retweets
_akhaliq
@_akhaliq
📅 Mar 16, 2026 · 2h ago
🆔23785123

LookaheadKV
Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
paper: https://t.co/j8lLnqUARR https://t.co/URKtNQkFKx
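The post gives only the title, so the sketch below shows just the generic step that score-based KV cache eviction policies share: once the cache exceeds a budget, keep a window of the most recent tokens plus the older tokens with the highest importance scores. How LookaheadKV actually estimates those scores is not shown; the `scores` input (for example, accumulated attention) and the `keep_recent` window are hypothetical.

```python
from typing import List, Tuple


def evict_kv(cache: List[Tuple[int, str]], scores: List[float],
             budget: int, keep_recent: int = 2) -> List[Tuple[int, str]]:
    """Keep at most `budget` entries: the `keep_recent` newest plus the best-scoring older ones.

    `cache` holds (position, kv_placeholder) pairs in position order and
    `scores[i]` is a hypothetical importance score for cache[i].
    Surviving entries keep their original order.
    """
    if len(cache) <= budget:
        return list(cache)
    keep_recent = min(keep_recent, budget)
    n = len(cache)
    recent = set(range(n - keep_recent, n))
    older = [i for i in range(n) if i not in recent]
    older.sort(key=lambda i: scores[i], reverse=True)  # highest score first
    kept = recent | set(older[: budget - keep_recent])
    return [cache[i] for i in sorted(kept)]


if __name__ == "__main__":
    cache = [(i, f"kv{i}") for i in range(6)]
    scores = [0.9, 0.1, 0.5, 0.05, 0.3, 0.2]
    print(evict_kv(cache, scores, budget=4))
```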

_akhaliq
@_akhaliq
📅 Mar 16, 2026 · 2h ago
🆔78945671

LMEB
Long-horizon Memory Embedding Benchmark
paper: https://t.co/fT3sEwCRgd https://t.co/lCyEY9tadB

_akhaliq
@_akhaliq
📅 Mar 16, 2026 · 3h ago
🆔38022438

Multimodal OCR
Parse Anything from Documents
On document parsing benchmarks, it ranks second only to Gemini 3 Pro on our OCR Arena Elo leaderboard, surpasses existing open-source document parsing systems, and sets a new state of the art of 83.9 on olmOCR Bench. On structured graphics parsing, dots.mocr achieves higher reconstruction quality than Gemini 3 Pro across image-to-SVG benchmarks, demonstrating strong performance on charts, UI layouts, scientific figures, and chemical diagrams.
paper: https://t.co/d3MkBHMuWc
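Arena-style leaderboards rank systems from pairwise comparisons. The standard Elo update is sketched below as an assumption: the post does not say how OCR Arena computes its ratings (some arenas fit a Bradley-Terry model over all votes instead), and K = 32 is an arbitrary choice.

```python
def expected_score(r_a, r_b):
    # Probability-like expectation that A beats B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32.0):
    """Return updated (r_a, r_b); score_a is 1 for an A win, 0.5 for a tie, 0 for a loss."""
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta  # zero-sum: B loses what A gains


if __name__ == "__main__":
    print(elo_update(1000.0, 1000.0, 1.0))  # evenly matched, A wins: (1016.0, 984.0)
```

Because the update is proportional to the surprise (score minus expectation), a win against a much stronger opponent moves the ratings far more than a win against a weaker one.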

karpathy
@karpathy
📅 Mar 16, 2026 · 3h ago
🆔37734847
⭐0.38

@ChristosTzamos Wait this is so awesome!! Both 1) the C compiler to LLM weights and 2) the logarithmic complexity hard-max attention and its potential generalizations. Inspiring!
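For reference, hard-max attention replaces the softmax weights with a one-hot selection of the best-scoring key. The linear-scan sketch below is O(n) per query and does not show how the logarithmic-complexity variant mentioned in the post is achieved; it also includes a temperature-scaled softmax to show that hard-max is the low-temperature limit of ordinary attention. Function names and shapes are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hardmax_attention(query, keys, values):
    """Return the value whose key has the largest dot product with the query.

    Ties go to the earliest position, since max() keeps the first maximum.
    """
    scores = [dot(query, k) for k in keys]
    best = max(range(len(keys)), key=lambda i: scores[i])
    return list(values[best])


def softmax_attention(query, keys, values, temperature=1.0):
    """Softmax-weighted average of values, with a temperature knob."""
    scores = [dot(query, k) / temperature for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / z for j in range(dim)]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
    query = [1.0, 0.5]
    print(hardmax_attention(query, keys, values))                    # [5.0, 5.0]
    print(softmax_attention(query, keys, values, temperature=1e-3))  # close to [5.0, 5.0]
```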

HuggingPapers
@HuggingPapers
📅 Mar 15, 2026 · 14h ago
🆔52394270

IBM released NLE: Non-autoregressive LLM-based ASR by Transcript Editing
A non-autoregressive approach that formulates speech recognition as conditional transcript editing, achieving a 27x speedup over autoregressive baselines with 5.67% WER. https://t.co/LtjPtUxf5a
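WER, the metric quoted for NLE, is the word-level edit distance (substitutions + deletions + insertions) between the hypothesis and the reference transcript, divided by the number of reference words. A minimal stdlib sketch of that standard definition follows; it does not implement NLE's editing procedure, and real evaluations usually normalize text (case, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = minimum edits turning the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitute = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            cur[j] = min(prev[j] + 1,      # delete ref[i-1]
                         cur[j - 1] + 1,   # insert hyp[j-1]
                         substitute)       # match or substitute
        prev = cur
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 edits over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```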

HuggingPapers
@HuggingPapers
📅 Mar 15, 2026 · 18h ago
🆔28856067

XSkill: Continual learning from experience and skills
A dual-stream framework enabling multimodal agents to accumulate and reuse knowledge without parameter updates. Grounded in visual context, it distills structured workflows and tactical insights to improve reasoning and tool use.
