Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score


Aran Komatsuzaki

@arankomatsuzaki

📅

Wed

🆔55611479

ReasonIR: Training Retrievers for Reasoning Tasks
- Presents REASONIR-8B, the first retriever specifically trained for general reasoning tasks
- Improves MMLU and GPQA scores by 6.4% and 22.6% respectively, relative to the closed-book baseline
https://t.co/71cOOuUbH0

❤️164

likes

🔁27

retweets


Aran Komatsuzaki

@arankomatsuzaki

📅

Wed

🆔45261915

Reinforcement Learning for Reasoning in Large Language Models with One Training Example
- 36.0% -> 73.6% on MATH500 by performing RLVR on a single example
- Applying entropy loss alone, without any outcome reward, improves perf by 27.4%
https://t.co/09jHOmxBTh

❤️445

likes

🔁57

retweets


Teknium (e/λ)

@Teknium1

📅

Wed

🆔45507127

ChatGPT in its recent glazemax mode vs Hermes. Left: ChatGPT, Right: Hermes https://t.co/yMpw55GVgs

❤️119

likes

🔁3

retweets


elvis

@omarsar0

📅

Tue Apr 29

🆔21700134

Building Production-Ready AI Agents with Scalable Long-Term Memory
Memory is one of the most challenging bits of building production-ready agentic systems. Lots of goodies in this paper. Here is my breakdown: https://t.co/wImK3ncl4G

❤️1,164

likes

🔁225

retweets


Ethan Mollick

@emollick

📅

Wed

🆔02257538

It turns out that Meta had 27 different models on LM Arena prior to the launch of Llama 4, but they announced it as if they had one model that topped the leaderboard. An extreme example of benchmark hacking (which other labs also do to lesser degrees). https://t.co/JfPmqyZiOg https://t.co/bVsUWh1218

❤️435

likes

🔁37

retweets


Scott Manley

@DJSnM

📅

Wed

🆔14860615

For those who want to take Kerbal Space Program to the next level: https://t.co/IctnCjtvG0

❤️884

likes

🔁62

retweets


Jeffrey Wang

@wangzjeff

📅

Wed

🆔31431272

my experience with o3 https://t.co/17oBXFzXHH

❤️1,459

likes

🔁24

retweets


Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

📅

Wed

🆔46322943

Turns out DeepSeek does have a new release (671B math/prover model) but it's not R2 https://t.co/GRJa9unXuD

❤️193

likes

🔁15

retweets


Arvind Narayanan

@random_walker

📅

Wed

🆔77994378

Devastating takedown of Chatbot Arena. It's one thing for leaderboards to suck because they try to quantify the unquantifiable but quite another thing to actively choose flagrantly unscientific and nontransparent practices that benefit the big dogs. https://t.co/pFGQQw0mao https://t.co/rpnNy2CmdK

❤️373

likes

🔁60

retweets


elvis

@omarsar0

📅

Tue Apr 29

🆔29588788

A Survey of Efficient LLM Inference Serving
This one provides a comprehensive taxonomy of recent system-level innovations for efficient LLM inference serving. Great overview for devs working on inference. Here is what's included: https://t.co/yRl9lkFlPD

❤️279

likes

🔁69

retweets


Lior⚡

@LiorOnAI

📅

Mon

🆔47472415

ByteDance might've released a paper explaining the TikTok algorithm. https://t.co/kfGIeCNYD9

❤️323

likes

🔁36

retweets


Hamel Husain

@HamelHusain

📅

Mon

🆔89155831

https://t.co/zsNroCiS1L https://t.co/4IfI6zF86g

❤️38

likes

🔁2

retweets


Ethan Mollick

@emollick

📅

Mon

🆔58931084

👀Today’s AIs are already hyper persuasive. A controversial study where LLMs tried to persuade users on Reddit found: “Notably, all our treatments surpass human performance substantially, achieving persuasive rates between three and six times higher than the human baseline.” https://t.co/D7i6fdklD7

❤️1,023

likes

🔁182

retweets


LlamaIndex 🦙

@llama_index

📅

Mon

🆔05801700

Use create-llama's "Deep Researcher" template to write legal reports in seconds! Ask a question and Deep Researcher will generate a set of sub-questions to ask of your documents, answer all of them, and then generate a report! Try it right now with npx create-llama Or learn… https://t.co/XpVtmPCv11

❤️98

likes

🔁10

retweets


nolen royalty

@itseieio

📅

Mon

🆔57357452

I made a website. It's called "one million chessboards dot com". it has one million chessboards on it. moving a piece moves it for everyone, instantly. no turns. you can move between boards. that's it. have fun! https://t.co/T9GqvfwJKC

❤️13,770

likes

🔁1,192

retweets


Oriol Vinyals

@OriolVinyalsML

📅

Mon

🆔23139670

It's not only about how long your context is, but how well you use it. Great to see Gemini 2.5 models dominating MRCR and other benchmarks on long context! See 2.5 Pro tackle a complex coding task by reasoning over an entire repo (>500k tokens). Performance and effective use of… https://t.co/asrnajUNdE

❤️295

likes

🔁27

retweets


Yaroslav Bulatov

@yaroslavvb

📅

Mon

🆔15074497

Watching @liuzhuang1234's - "Transformers without Normalization", this slide is a reminder how our optimizer and architecture choices are coupled https://t.co/Jo8KNdPgk2

❤️157

likes

🔁20

retweets


Aleksa Gordić (水平问题)

@gordic_aleksa

📅

Mon

🆔99677458

phew, i can finally share what i've been up to since last summer! we just raised a $23 million seed round!! 😅 i co-founded @P_1_AI w/ @PaulEremenko (ex cto of airbus, UTC, etc.) and adam nagel (ex engineering director at airbus) with a mission to build an engineering AGI for… https://t.co/5jjc31hxLv

❤️583

likes

🔁61

retweets


elvis

@omarsar0

📅

Mon

🆔61830399

I guess "thinking is all you need!" Those are some insane improvements over non-thinking mode. Congrats to the Qwen team on the Qwen3 release. Love seeing the support for more agentic capabilities. Hope R2 brings more of that as well. https://t.co/qsjRbqTDDS

❤️64

likes

🔁5

retweets


LlamaIndex 🦙

@llama_index

📅

Mon

🆔86255468

LlamaDeploy now supports a new message broker: @solacedotcom! LlamaDeploy is an async-first framework for deploying, scaling, and productionizing agentic multi-service systems, based on LlamaIndex Workflows. LlamaDeploy works with a variety of message bus backends, and our… https://t.co/lH6FqUC4Vv

❤️19

likes

🔁7

retweets


Wing Lian (caseus)

@winglian

📅

Mon

🆔43103281

Qwen 3 by @Alibaba_Qwen is out and it looks like the 30B MoE is better than the 32B dense model! Some quick checks show you can SFT the 32B on a single 48GB GPU, and it's possible to get it on a 4090 too once we fix some allocation issues on model load. https://t.co/9s6sqL3QBD

❤️92

likes

🔁5

retweets


Lior⚡

@LiorOnAI

📅

Mon

🆔25223240

QWEN-3 is finally out!
> Matches Gemini 2.5 Pro performance
> Outperforms OpenAI o1
> Open-sourced (Apache 2.0)
> 119 languages, 32K–128K context
https://t.co/KFIrKFNqzI

❤️142

likes

🔁12

retweets


ben

@benhylak

📅

Mon

🆔99444139

AI products fail constantly—in ways both hilarious and terrifying. Regular software throws exceptions. But AI products fail silently. Meet @raindrop_ai : the first Sentry-like monitoring platform for AI products. https://t.co/Olx2umPUa7

❤️682

likes

🔁63

retweets


Hamel Husain

@HamelHusain

📅

Mon

🆔13545342

Most business data is structured or semi-structured (tables, spreadsheets, etc.), but we tend to over-emphasize unstructured data retrieval in RAG. @svonava is going to tell us everything he knows about optimizing structured data retrieval with LLMs https://t.co/bhztbxABxs https://t.co/PX94XyoFQH

❤️45

likes

🔁5

retweets


Daniel Svonava

@svonava

📅

Tue Apr 29

🆔55455288

https://t.co/CHsHdKezRa

❤️14

likes

🔁1

retweets


Ethan Mollick

@emollick

📅

Tue Apr 29

🆔88142926

So Qwen 3-235B with thinking seems good, but not blowing away any of my weird frontier tests, some of which DeepSeek r1 did better. It did okay generating a p5js starship (though it had errors to correct), but failed the Lem Test and couldn't do a twigl shader in many attempts. https://t.co/bcdtTXq3HZ

❤️135

likes

🔁7

retweets


elvis

@omarsar0

📅

Mon

🆔10554423

MAGI is a new multi-agent system that dynamically navigates clinical logic via four specialized agents. Great example of how to combine reasoning and agents. Read on for more: https://t.co/9HMm3JYbUT

❤️204

likes

🔁57

retweets


Eugene Yan

@eugeneyan

📅

Tue Apr 29

🆔55979479

Tenets from Duolingo's push to be AI-first
• AI will be everywhere in our product
• Start with AI for every task
• Spend 10% of your time learning
• Share what you learn
• Avoid overbuilding
• Build and experiment carefully
• Technical excellence still matters
https://t.co/EZMtZNaKSp

❤️121

likes

🔁12

retweets


Aran Komatsuzaki

@arankomatsuzaki

📅

Tue Apr 29

🆔16146353

Scaling Laws For Scalable Oversight
Scalable oversight, the process by which weaker AI systems supervise stronger ones, has been proposed as a key strategy to control future superintelligent systems. However, it is still unclear how scalable oversight itself scales. To address… https://t.co/jel5RtvBJt

❤️179

likes

🔁37

retweets


Unsloth AI

@UnslothAI

📅

Mon

🆔96809017

You can now run Qwen3 locally with our Dynamic GGUFs! 🌠 With 128K Context Length added. Our Dynamic 2.0 GGUFs achieve superior accuracy, outperforming other methods on 5-shot MMLU & KL Divergence. Qwen3-235B-A22B coming soon. GGUFs: https://t.co/3OH7kpzXL3 https://t.co/wQjgJG34WW

❤️619

likes

🔁106

retweets


Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

📅

Tue Apr 29

🆔62111634

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
"we introduce SelfPlay Critic (SPC), a novel approach where a critic model evolves its ability to assess reasoning steps through adversarial self-play games, eliminating the need for manual step-level… https://t.co/gkAt6tVlOe

❤️115

likes

🔁29

retweets


Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

📅

Tue Apr 29

🆔54666432

BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
"we present BRIDGE, a comprehensive multilingual benchmark comprising 87 tasks sourced from real-world clinical data sources across nine languages. We systematically evaluated 52… https://t.co/4P6Um4Qme7

❤️38

likes

🔁8

retweets
