Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
ShenaoZhang (@ShenaoZhang) · Oct 01, 2025

🚀 Excited to share our recent research: 🚀 "Learning to Reason as Action Abstractions with Scalable Mid-Training RL". We theoretically study how mid-training shapes post-training RL. The findings lead to a scalable algorithm for learning action hierarchies from expert demonstrations, which we successfully apply to 1B Python code data. A thread: 🧵

🖼️ Media 1

jclin808 (@jclin808) · Sep 30, 2025

📉 SFT might not suffer as much catastrophic forgetting as you think. Lately there has been much debate around GRPO in the community. RL is hot, but let's not forget that, in the context of LLMs, SFT is the bedrock of almost all RL. There is also still a lot we don't fully understand about SFT. Paper link: https://t.co/iawopsRn7b 🤔 We revisit domain-specific SFT and find that even just using a small learning rate achieves a sweet trade-off: (1) general-purpose degradation is largely mitigated; (2) target-domain performance stays nearly as strong as with a larger learning rate. Building on both theory and experiments, we then propose TALR (Token-Adaptive Loss Reweighting), a method that further alleviates forgetting and achieves favorable trade-offs. #GRPO #LLM #Amazon #Claude #DeepSeek #GLM

🖼️ Media 1

GaotangLi (@GaotangLi) · Oct 02, 2025

Negative log-likelihood (NLL) has long been the go-to objective for classification and SFT, but is it universally optimal? We explore when alternative objectives outperform NLL and when they don't, based on two key factors: the objective's prior-leaningness and the model's capability. 📄 Paper: https://t.co/HbGXy60fzZ 💻 Code: https://t.co/XtoFiok4F7 (1/n)

🖼️ Media 1 · Media 2 (+1 more)

jiqizhixin (@jiqizhixin) · Oct 05, 2025

An intriguing paper from Apple: "MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE". Paper: https://t.co/C08s1qXgCJ

🖼️ Media 1

clattner_llvm (@clattner_llvm) · Sep 21, 2025

We know that one of the biggest barriers to programming GPUs is access to hardware: "Code you've written for NVIDIA or AMD GPUs should now mostly just work on an Apple 🍎 Silicon GPU, assuming no device-specific features were being used." Preview here: 👇 https://t.co/WBDRDnLbqP

🖼️ Media 1

Modular (@Modular) · Sep 24, 2025

We raised $250M to accelerate building AI's unified compute layer! 🔥 We're now powering trillions of tokens, making AI workloads 4x faster 🚀 and 2.5x cheaper ⬇️ for our customers, and we've welcomed tens of thousands of new developers 👩🏼‍💻. We're excited for the future! https://t.co/hjIusgu9EX

🖼️ Media 1

YiTayML (@YiTayML) · Oct 01, 2025

It was great to hang out with @XueFz for the past year at the GDM Singapore office. He's finally relocating to London 🥹. We enjoyed many inside jokes and even coined our own "Gemini MK" in the SG office. 😂 Thanks for being a great founding member, for all the fun research conversations, and for advising on hiring. I'll be alone for a bit until my team arrives. 🫡 It's the end of an era, but I think the two of us made a really outsized impact on Gemini relative to the number of people here. 🔥

🖼️ Media 1

acossta (@acossta) · Oct 02, 2025

Working on spec'ing BrainGrid's quota service in BrainGrid. This service will enforce quotas for the different plans. How it works:
◈ The app calls a getQuota method, passing a quota key like "projects" and the account ID.
◈ This returns the number of projects this account is allowed to have.
◈ A (quota key, plan, value) combo defines this in the DB.
◈ Each row also has an organization_id to allow org-level overrides.
◈ Full admin interface to manage it.
◈ Ability to create and modify quotas in the UI.
Read 👇🏼 to see the actual requirement and tasks generated.

🖼️ Media 1

acossta (@acossta) · Oct 02, 2025

Once the spec is defined, BrainGrid helps me create the task breakdown with perfectly prompted tasks I can feed into Claude Code. With the BrainGrid MCP, the build-out of this is pretty smooth. https://t.co/xyjcNbhta9

🖼️ Media 1

acossta (@acossta) · Oct 02, 2025

Here is the actual requirement generated, if you wanna take a peek 👀: 👉🏼 https://t.co/hA218a0qOx

🖼️ Media 1

acossta (@acossta) · Oct 02, 2025

And here is one of the tasks: 👉🏼 https://t.co/s6T2cwSrBI https://t.co/npmvBgpRmS

🖼️ Media 1 · Media 2

acossta (@acossta) · Oct 03, 2025

Revamped BrainGrid's homepage problem section. As with code, I typically start by defining the problem in this section. Defining the problem well is the foundation for the rest of the page. https://t.co/EKtUCjLGpm

🖼️ Media 1

aiordieshow (@aiordieshow) · Oct 02, 2025

Sora 2 https://t.co/VVRrFW7RBc

🖼️ Media

aiordieshow (@aiordieshow) · Oct 03, 2025

AI OR DIE cameo on Sora 2 https://t.co/kryx192pPZ

🖼️ Media

MotorsportMP4 (@MotorsportMP4) · Oct 03, 2025

A lap from Lewis Hamilton in Singapore 🇸🇬 https://t.co/5GDk3UFNKe

🖼️ Media

prudent_AI (@prudent_AI) · Oct 04, 2025

Breaking 🚨 Perplexity is No. 1 in the Play Store top charts 🔥 in India 🇮🇳 https://t.co/yCUhf1qnxZ

🖼️ Media 1

leerodgersx (@leerodgersx) · Oct 04, 2025

Wow @AravSrinivas @perplexity_ai, the UX always amazes and keeps improving https://t.co/ORlRmSc01Y

🖼️ Media 1

AravSrinivas (@AravSrinivas) · Oct 05, 2025

Cool https://t.co/7DTf23NjEd

🖼️ Media 1

AravSrinivas (@AravSrinivas) · Oct 06, 2025

Good video explaining how to make good use of Comet for agent prompts: https://t.co/jALt3Pjxhr

🖼️ Media 1

AravSrinivas (@AravSrinivas) · Oct 06, 2025

New addiction: opening a long YouTube video (podcast, interview) in Comet, not listening to it linearly, firing question after question at Comet Assistant (Option + A), and only listening to the parts I really want to hear (which Comet can link me to at the exact timestamp). E.g.: https://t.co/KQofTJNmpr

🖼️ Media 1

emollick (@emollick) · Oct 04, 2025

The challenge: create the most over-the-top Hallmark movie clip that can fit into 10 seconds. I managed to cram a humble neighborhood baker, a prince, a cruel rival princess, and the holiday season into this one. https://t.co/wiUfxh4oJA

@mikepilla • Sat Oct 04 19:07

@emollick Hallmark benchmark

🖼️ Media

MIT_CSAIL (@MIT_CSAIL) · Oct 04, 2025

58 years ago, Larry Roberts presented his idea for an "ARPANet" for connecting multiple computers together across the United States. Full paper: https://t.co/drgMRJ5aLV https://t.co/TjoN5eMmGl

🖼️ Media 1

emollick (@emollick) · Oct 05, 2025

This switch turned off 1/3 of the Internet. Or at least it did in the earliest days of ARPANET in 1970, when it controlled the key BBN node in Boston. For better or worse, it no longer works (I tried flipping it when I visited the company). https://t.co/ilNW953A8g

@MIT_CSAIL • Sat Oct 04 17:00

58 years ago, Larry Roberts presented his idea for an "ARPANet" for connecting multiple computers together across the United States. Full paper: https://t.co/drgMRJ5aLV https://t.co/TjoN5eMmGl

🖼️ Media 1

emollick (@emollick) · Oct 05, 2025

Deleted this, not because it is wrong, but because I probably should wait for a pre-publication or other confirmation of the proof before disseminating it widely. https://t.co/YLcKnKEbPp

🖼️ Media 1

emollick (@emollick) · Oct 05, 2025

Both of these are true https://t.co/63Dey6d9AO

🖼️ Media 1 · Media 2

emollick (@emollick) · Oct 06, 2025

Very soon, the blocker to using AI to accelerate science will not be the ability of AI, but rather the systems of science itself, as creaky as they are. The scientific process is already breaking under a flood of human-created knowledge. How do we incorporate AI usefully? https://t.co/i8QCIIYzLb

🖼️ Media 1

emollick (@emollick) · Oct 07, 2021

The paradox of our Golden Age of science: more research is being published by more scientists than ever, but the result is actually slowing progress! With too much to read and absorb, papers in more crowded fields are citing new work less, and canonizing highly-cited articles more. https://t.co/uHZVYLKJ23

🖼️ Media 1 · Media 2

alxfazio (@alxfazio) · Oct 05, 2025

This paper shows that, at this point, the problems with vibe coding are mostly human. Hallucinations are rare and barely noticeable in practice; the real issue is development driven by instant gratification and weak QA, like skipping tests and relying on LLMs for verification. https://t.co/0AhlLOmm99

🖼️ Media 1 · Media 2 (+1 more)

emollick (@emollick) · Oct 06, 2025

Maybe some of the big problems with vibe coding are process problems, not AI problems... https://t.co/mzXtW7i04i

@alxfazio • Sun Oct 05 06:47

This paper shows that, at this point, the problems with vibe coding are mostly human. Hallucinations are rare and barely noticeable in practice; the real issue is development driven by instant gratification and weak QA, like skipping tests and relying on LLMs for verification. https

🖼️ Media 1

janleike (@janleike) · Sep 29, 2025

Sonnet 4.5 is out! It's the most aligned frontier model yet; a lot of progress relative to Sonnet 4 and Opus 4.1! https://t.co/w5cRNcR3ma

🖼️ Media 1

janleike (@janleike) · Sep 29, 2025

Notably, Sonnet 4.5 verbalizes eval awareness much more than previous models. Does that invalidate our results? We did an audit based on model internals, and the answer is "probably a little, but mostly not." https://t.co/gyio068XXz

🖼️ Media 1

Jack_W_Lindsey (@Jack_W_Lindsey) · Sep 29, 2025

Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to "read the model's mind" in order to validate its reliability and alignment. This was the first such audit on a frontier LLM, to our knowledge. (1/15) https://t.co/2FPWPAHnZt

🖼️ Media 1