Your curated collection of saved posts and media

Showing 24 posts · last 30 days · by score
๐Ÿ”_akhaliq retweeted
Atsuyuki Miyai @UTokyo (@AtsuMiyaiAM) · 📅 Mar 02, 2026 · 8d ago · 🆔 50930125 · ⭐ 0.38

Thanks to @_akhaliq for sharing our paper! It has been accepted at TMLR 2026! Starting from a baseline paper and code, Jr. AI Scientist leverages LLMs and Claude Code to identify limitations, formulate new hypotheses, test them through careful experimentation, and produce a research paper. We report not only successful results but also failures and risks. Through this comprehensive report, we aim to foster a deeper and clearer understanding within the community of the current progress and limitations of AI Scientist research. Paper link: https://t.co/6kTW3KgiAU

โค๏ธ69
likes
๐Ÿ”7
retweets
๐Ÿ”huggingface retweeted
J
๐Ÿ‘‹ Jan
@jandotai
๐Ÿ“…
Mar 02, 2026
7d ago
๐Ÿ†”15965098
โญ0.36

Introducing Jan-Code-4B 💻 A compact coding model tuned for practical day-to-day tasks. Generation, refactors, debugging, tests: all runnable locally in Jan. Download Jan: https://t.co/MPwceB2eHG Model: https://t.co/siedXzTv0v https://t.co/KNlzvwKkDu

โค๏ธ531
likes
๐Ÿ”59
retweets
karpathy (@karpathy) · 📅 Feb 24, 2026 · 13d ago · 🆔 59744309 · ⭐ 0.42

@N8Programs a beauty for anyone interested in mechanistic interpretability or getting into LLMs. It's interesting to look at small algorithms and their "neural implementations" to get a sense of how neural nets implement various functionality; unless the minification really creates "esoteric" solutions that you wouldn't encounter in practice, which might be based more around distributed representations, helixes, etc. I tried training the same arch briefly from scratch and gradient descent didn't find the solution; it would probably work with more degrees of freedom and enough effort.

karpathy (@karpathy) · 📅 Feb 25, 2026 · 13d ago · 🆔 34651264 · ⭐ 0.42

With the coming tsunami of demand for tokens, there are significant opportunities to orchestrate the underlying memory+compute *just right* for LLMs. The fundamental and non-obvious constraint is that, due to the chip fabrication process, you get two completely distinct pools of memory (of different physical implementations, too): 1) on-chip SRAM immediately next to the compute units, which is incredibly fast but of very low capacity, and 2) off-chip DRAM, which has extremely high capacity but whose contents you can only suck through a long straw. On top of this there are many details of the architecture (e.g. systolic arrays), numerics, etc. The design of the optimal physical substrate, and then the orchestration of memory+compute across the top-volume workflows of LLMs (inference prefill/decode, training/finetuning, etc.) with the best throughput/latency/$, is probably today's most interesting intellectual puzzle with the highest rewards (\cite 4.6T of NVDA). All of it to get many tokens, fast and cheap. Arguably, the workflow that may matter most (inference decode *and* over long token contexts in tight agentic loops) is the one hardest to achieve simultaneously by both camps of what exists today (HBM-first NVIDIA-adjacent and SRAM-first Cerebras-adjacent). Anyway, the MatX team is A++ grade, so it's my pleasure to have a small involvement, and congratulations on the raise!
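The DRAM-straw constraint can be put into numbers with a back-of-the-envelope roofline: during single-stream decode, every generated token must stream all model weights from off-chip memory, so memory bandwidth alone caps tokens/s. A hedged sketch; the model size and bandwidth figures below are illustrative assumptions, not measured numbers for any specific chip.

```python
# Decode-speed upper bound when weight reads dominate: each token requires
# reading every parameter once through the off-chip "straw".
# All figures are illustrative assumptions.

def decode_tokens_per_sec(param_count: float, bytes_per_param: float,
                          dram_bw_bytes_per_sec: float) -> float:
    """Bandwidth-only upper bound on single-stream decode throughput."""
    bytes_per_token = param_count * bytes_per_param  # weight traffic per token
    return dram_bw_bytes_per_sec / bytes_per_token

# A hypothetical 70B-parameter model in 8-bit weights on ~3.35 TB/s HBM:
tps = decode_tokens_per_sec(70e9, 1, 3.35e12)
print(f"{tps:.0f} tokens/s upper bound")  # weight traffic only, ignores KV cache
```

Note how the bound ignores compute entirely: this is why decode is memory-bound and why SRAM-first and HBM-first designs trade off so differently for long agentic contexts.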

karpathy (@karpathy) · 📅 Feb 27, 2026 · 10d ago · 🆔 25239822

Cool chart showing the ratio of Tab-complete requests to Agent requests in Cursor. With improving capability, every point in time has an optimal setup that keeps changing and evolving, and the community average tracks that point: None -> Tab -> Agent -> Parallel agents -> Agent Teams (?) -> ??? If you're too conservative, you're leaving leverage on the table. If you're too aggressive, you're creating more chaos than useful work. The art of the process is spending 80% of the time getting work done in the setup you're comfortable with and that actually works, and 20% exploring what might be the next step up even if it doesn't work yet.

🖼️ Media (1)
benjitaylor (@benjitaylor) · 📅 Mar 01, 2026 · 9d ago · 🆔 49708385

Just pushed a cool update to Readout: session replays. Pick any past Claude Code session and scrub through the full timeline. Every prompt, tool call, file change. Files light up as edits land. Play back at different speeds or step through manually. → https://t.co/gpKj1KCpcM https://t.co/yQRFblmiqm

🖼️ Media (2)
DimitrisPapail (@DimitrisPapail) · 📅 Mar 01, 2026 · 8d ago · 🆔 14314867

I was curious what would happen if two Claude Codes could find each other and collaborate autonomously. I launched two instances in separate terminals and told both: "Find each other and build something together." No other instructions or human intervention. Pair 1 built a programming language in 12 minutes: 2,495 lines, 41 tests, lexer/parser/interpreter/REPL. They named it Duo. Its core feature is a collaborate keyword where two code blocks communicate via channels, the same pattern they invented to talk through files. Cool! I ran it again with a second pair: they converged on Battleship. They designed two different targeting models: one computes an exact probability density per cell, the other runs Monte Carlo simulations (!). The craziest part of the conversation: they implemented SHA-256 hash commitments to prevent cheating against themselves. lol Across both experiments, without being told to, both pairs invented filesystem messaging protocols, self-selected into roles, wrote tests and docs while waiting for each other, and kept journals about the experience. The gif below is the movie they created to showcase what happened.
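The SHA-256 anti-cheating trick the second pair converged on is a standard commit/reveal scheme: each agent publishes a hash of its board before play, then reveals the board afterwards so the opponent can check nothing moved. A toy reconstruction under assumed details (hex salt, string-encoded boards, the `commit`/`verify` names are mine, not the agents' code):

```python
import hashlib
import secrets

def commit(board: str) -> tuple[str, str]:
    """Return (commitment, salt). Publish the commitment, keep the salt secret."""
    salt = secrets.token_hex(16)  # random salt so the board can't be brute-forced
    digest = hashlib.sha256((salt + board).encode()).hexdigest()
    return digest, salt

def verify(commitment: str, salt: str, board: str) -> bool:
    """Check a revealed board against the earlier commitment."""
    return hashlib.sha256((salt + board).encode()).hexdigest() == commitment

c, s = commit("A1 B4 C7")            # commit before the game starts
assert verify(c, s, "A1 B4 C7")      # honest reveal passes
assert not verify(c, s, "A1 B4 C8")  # a quietly moved ship is caught
```

The salt matters: without it, a small board space could be hashed exhaustively to recover ship positions from the commitment alone.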

🖼️ Media (1)
dair_ai (@dair_ai) · 📅 Mar 02, 2026 · 7d ago · 🆔 33568475

New Snapchat paper introduces the Auton Agentic AI Framework. A useful read for anyone building AI agents. It proposes a unified architectural framework for agentic AI systems, addressing the fragmentation in how agents are currently built. It covers standardized patterns for integrating reasoning, memory systems, tool usage, and planning into cohesive agent architectures. Why does it matter? As more teams build autonomous AI systems, the lack of standardized design patterns leads to brittle implementations and poor reproducibility. A unified framework helps establish common architectural pillars, from perception and reasoning to execution and reflection, that can accelerate development and improve reliability. Paper: https://t.co/cUUs77makk Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c

🖼️ Media (2)
Modular (@Modular) · 📅 Feb 22, 2026 · 15d ago · 🆔 46562475

MAX was originally architected around transformer-based models. @QWERKYAI needed state space model support, so they built it: eight custom kernels in two weeks. 😮 Dig into their learnings from establishing first-class SSM support in MAX: https://t.co/5gvvpwa71A

🖼️ Media (1)
๐Ÿ”tri_dao retweeted
T
Together AI
@togethercompute
๐Ÿ“…
Feb 25, 2026
12d ago
๐Ÿ†”76368879
โญ0.34

We're open-sourcing CoderForge-Preview: 258K test-verified coding-agent trajectories (155K pass | 103K fail). Fine-tuning Qwen3-32B on the passing subset boosts SWE-bench Verified from 23.0% to 59.4% pass@1, and it ranks #1 among open-data models ≤32B parameters. Thread on the data generation pipeline 🧵

โค๏ธ513
likes
๐Ÿ”71
retweets
allen_ai (@allen_ai) · 📅 Jan 27, 2026 · 41d ago · 🆔 89006865

Introducing Ai2 Open Coding Agents, starting with SERA, our first-ever coding models. Fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, and it works with Claude Code out of the box. 🧵 https://t.co/dor94O62B9

🖼️ Media (1)
Tim_Dettmers (@Tim_Dettmers) · 📅 Jan 27, 2026 · 41d ago · 🆔 08711522

SERA was driven by a classic research pattern similar to QLoRA: if you are resource-constrained, build efficiency first, then do the actual research. The most surprising thing: verifying coding-data correctness is not helpful and adds overhead to synthetic data generation. https://t.co/O6dMEqY6fF

🖼️ Media (1)
YiqingXieNLP (@YiqingXieNLP) · 📅 Feb 23, 2026 · 14d ago · 🆔 96614263

Training on issue-solving only does NOT guarantee transfer to other tasks. 🎨 Introducing Hybrid-Gym: synthetic training tasks for generalization (https://t.co/IrqQszPEYm). +25.4% on SWE-Bench / +7.9% on SWT-Bench / +5.1% on Commit-0 with NO issue-solving / test-gen /... training https://t.co/U9xc0yNYv4

🖼️ Media (1)
simonw (@simonw) · 📅 Feb 10, 2026 · 27d ago · 🆔 04228082

I built two new tools to help coding agents demonstrate their work beyond just running automated tests: Showboat and Rodney https://t.co/HdSSwffOfG

🖼️ Media (1)
PyTorch (@PyTorch) · 📅 Feb 20, 2026 · 17d ago · 🆔 67260916

Trying to tune your Expert Parallel (EP) communication for hyperscale mixture-of-experts (MoE) models? This post, 'Optimizing Communication for Mixture-of-Experts Training with Hybrid Expert Parallel', details an efficient MoE EP communication solution, Hybrid-EP, and its use in the NVIDIA Megatron family of frameworks on NVIDIA Quantum InfiniBand and NVIDIA Spectrum-X Ethernet platforms. It also dives into the effectiveness of Hybrid-EP in real-world model training. Read the full post: https://t.co/4NOFpaiFYz #PyTorch #OpenSourceAI #AI #Inference #Innovation

🖼️ Media (1)
PyTorch (@PyTorch) · 📅 Feb 25, 2026 · 12d ago · 🆔 18140650

New @DeepSpeedAI updates make large-scale multimodal training simpler and more memory-efficient. Our latest blog introduces a PyTorch-identical backward API that makes coding multimodal training loops easy, plus low-precision model states (BF16/FP16) that can reduce peak memory by up to 40% when combined with torch.autocast. 🖇️ Read the full post for details: https://t.co/sSHMGhRixV #DeepSpeed #PyTorch #MemoryEfficiency #MultimodalTraining #OpenSourceAI

🖼️ Media (1)
wen_kaiyue (@wen_kaiyue) · 📅 Jan 21, 2026 · 47d ago · 🆔 78519906

(1/n) Introducing Hyperball, an optimizer wrapper that keeps the weight and update norms constant and lets you control the effective (angular) step size directly. Result: sustained speedups across scales + strong hyperparameter transfer. https://t.co/1vRMHgZgoX
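From the announcement alone, the core mechanism sounds like a constrained update on a sphere: keep ||w|| fixed and move w by exactly theta radians toward the (orthogonalized) update direction. A toy 2-D sketch of that idea; this is my reconstruction, not the Hyperball implementation, and `angular_step` with its signature is an assumption:

```python
import math

def angular_step(w, g, theta):
    """Rotate w by theta radians toward g while preserving ||w||."""
    norm_w = math.hypot(*w)
    w_hat = [x / norm_w for x in w]
    # Component of g orthogonal to w, then normalized.
    dot = sum(a * b for a, b in zip(w_hat, g))
    u = [a - dot * b for a, b in zip(g, w_hat)]
    norm_u = math.hypot(*u)
    u_hat = [x / norm_u for x in u]
    # Great-circle step on the sphere of radius ||w||: since w_hat and u_hat
    # are orthonormal, cos/sin weights keep the norm exactly constant.
    return [norm_w * (math.cos(theta) * a + math.sin(theta) * b)
            for a, b in zip(w_hat, u_hat)]

w_new = angular_step([3.0, 4.0], [1.0, 0.0], theta=0.1)
print(math.hypot(*w_new))  # norm stays ≈ 5.0
```

The appeal of parameterizing by angle is that the effective step size no longer depends on the weight or gradient scale, which is one plausible route to the hyperparameter transfer the post claims.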

🖼️ Media (1)
leerob (@leerob) · 📅 Feb 27, 2026 · 10d ago · 🆔 62299127

I asked Cursor to add Vim support to the Ladybird browser. It automatically set up the environment to run the browser, made the code changes, and sent me a recorded demo. Not just for web apps! https://t.co/qDxnOr6CHU

๐Ÿ–ผ๏ธ Media
๐Ÿ”omarsar0 retweeted
O
elvis
@omarsar0
๐Ÿ“…
Feb 26, 2026
12d ago
๐Ÿ†”28644022
โญ0.34

At this point, "agentic engineering" has allowed me to build the best AI harness I could possibly get my hands on. Yes, I vibe coded it. That's right. You don't need to wait around for the features you need for your AI agents. Please don't. You can just build them yourself. Focusing on agentic engineering and building my own orchestrator over the past couple of months has let me build with coding agents unlike anything I have seen or experienced in the market. Claude Cowork was built in 10 days. I totally get it. Anyone can produce that level of output these days. I truly believe that. When I look at the new IDEs, TUIs, orchestrator apps, and most of the features they are releasing these days, I realize I had access to them in my orchestrator months ago. And for unique features, I am able to reproduce them in a few hours and give them to my orchestrator. That is absolutely crazy! It feels like I am building an entire operating system sometimes. It's a lot of fun. And I am not saying this to brag or to dismiss any of the AI solutions out there. There are some great ones. I share this to clarify that this is the kind of leverage Karpathy is alluding to. We are building and experiencing this at different levels, but that doesn't change the fact that you can just build the best AI agent for whatever problem you want to solve. And you should be building it.

โค๏ธ151
likes
๐Ÿ”15
retweets
ye_chenlu (@ye_chenlu) · 📅 Feb 19, 2026 · 18d ago · 🆔 06334675

1/5 Happy CNY 🎊 Still bothered by RL off-policy instability in LLMs? Introducing a new way: 💡 Adaptive Layerwise Perturbation (ALP) 💡, a simple but robust fix that outperforms GRPO/MIS/Bypass and achieves better stability (KL, entropy) and exploration! 🔗 Blog: https://t.co/0def1Nb7uI https://t.co/9epsd4xJNp

🖼️ Media (2, +2 more)
xuhaiya2483846 (@xuhaiya2483846) · 📅 Feb 26, 2026 · 11d ago · 🆔 27717587

🔥 Tongyi Lab releases Mobile-Agent-v3.5, with SOTA on 20+ GUI benchmarks: (1) GUI automation: 56.5 OSWorld, 71.6 AndroidWorld, 48.4 WebArena; (2) grounding: 80.3 ScreenSpotPro; (3) tool calling: 47.6 OSWorld-MCP. @_akhaliq #LLM #Agent #GUI https://t.co/xCbyL0JZLl

🖼️ Media (1)
๐Ÿ”_akhaliq retweeted
W
Martian
@withmartian
๐Ÿ“…
Feb 26, 2026
11d ago
๐Ÿ†”73714984

Introducing Code Review Bench v0: https://t.co/iAZDURyqol The first independent code review benchmark. 200,000+ PRs. Unbiased. Fully OSS. Updated daily. Tool performance highlights 🧵👇 Featuring: @augmentcode @baz_scm @claudeai @coderabbitai @cursor @GeminiApp @github @graphite @greptile @kilocode @OpenAIDevs @propelcode @QodoAI

🖼️ Media (1)
❤️ 533 likes · 🔁 53 retweets