Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score

vishal
@vishal_learner
📅 Mon
🆔 10167101

7 standout ideas from Lesson 2 (Error Analysis) of Hamel and Shreya's AI Evals course https://t.co/MSwtYF7P1m

❤️ 23 likes
🔁 3 retweets
🖼️ Media

Tarun Amasa
@TarunAmasa
📅 Wed
🆔 55905140

It’s official. We’ve raised $14m led by @OpenAI Startup Fund to bring AI to Excel. Endex is the first AI agent to live inside Excel. For the past year, we've been working with financial firms. Today we’re releasing it to the world. Our capacity is limited; comment below for… https://t.co/ULbDFajjCQ

❤️ 7,024 likes
🔁 500 retweets
🖼️ Media

Simone Scardapane
@s_scardapane
📅 Fri
🆔 74730520

*The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs* by @p_nawrot @PontiEdoardo @cheeesio @seb_ruder They study sparse attention techniques at scale, comparing to small dense models at the same compute budget. https://t.co/8dt7ceWhMe https://t.co/Fke4cDj4UC

❤️ 193 likes
🔁 28 retweets
🖼️ Media

Sandra Kublik
@itsSandraKublik
📅 Thu Jul 31
🆔 43119504

Say hello to Command A Vision - our first multimodal Command model! It brings SOTA image + text reasoning, enterprise-grade security, and a low serving footprint to visual workflows. Happy building :)) https://t.co/dZngI2fFkA

❤️ 194 likes
🔁 19 retweets
🖼️ Media

Tanishq Mathew Abraham, Ph.D.
@iScienceLuvr
📅 Tue Jul 29
🆔 61612821

Self-Guided Masked Autoencoder "we propose self-guided masked autoencoder, which internally generates informed mask by utilizing its progress in patch clustering, substituting the naive random masking of the vanilla MAE. Our approach significantly boosts its learning process… https://t.co/3TKwUHjZ3E

❤️ 312 likes
🔁 51 retweets
🖼️ Media

Ethan Mollick
@emollick
📅 Sun
🆔 78263912

Kinda amazing: the mystery model "summit" with the prompt "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future" & "make it better" 2,351 lines of code. First time https://t.co/Wkr7vvwYIB

❤️ 2,592 likes
🔁 215 retweets
🖼️ Media

jason liu
@jxnlco
📅 Thu Jul 31
🆔 52594711

how do we approach evaluation of rag https://t.co/ChkK3YcaY2

❤️ 5 likes
🖼️ Media

jason liu
@jxnlco
📅 Thu Jul 31
🆔 44191452

notes from @JuliaANeagu's talk on hallucinations https://t.co/DHVkDDmRbp

❤️ 16 likes
🔁 4 retweets
🖼️ Media

Wolfram Ravenwolf
@WolframRvnwlf
📅 Thu Jul 31
🆔 42551659

🚨 BREAKING: China is no longer catching up; they're setting the pace! Six Qwen3 models released in one week: from big ones that surpass all open models and nearly all closed AIs to small versions that can run on your laptop - each SOTA and top-tier in its class. I've been… https://t.co/UnX80thQBU

❤️ 468 likes
🔁 71 retweets
🖼️ Media

Ethan Mollick
@emollick
📅 Thu Jul 31
🆔 72056732

I keep seeing the Microsoft paper on AI use at work being used as a list of which jobs will be destroyed. But having high overlap with AI does not necessarily mean these jobs are at most risk of replacement with AI. As I described in my book, Co-Intelligence, it's complicated https://t.co/kts3JzUxrU

❤️ 289 likes
🔁 42 retweets
🖼️ Media

Skylar Payne
@skylar_b_payne
📅 Thu Jul 31
🆔 52694262

The answer is always evals. Yesterday I gave a guest lecture in @HamelHusain and @sh_reya AI Evals course... And most people were interested in answers to the same kind of question: "should I do/use X?" These questions always have the same answer: "evals". Link to full post… https://t.co/HmneAiHQ4d

❤️ 12 likes
🔁 1 retweet
🖼️ Media

Ethan Mollick
@emollick
📅 Thu Jul 31
🆔 09495249

One thing to pay attention to in benchmarking AI is how success is being measured. Models can be very fragile, getting the right answer rarely, but measurably more than chance, and look very good on benchmarks using PASS@10, but fail often in reality. https://t.co/ifO6cKwyt2 https://t.co/IshWWsXkGp

❤️ 71 likes
🔁 10 retweets
🖼️ Media
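
To make the PASS@10 point in the post above concrete, here is a minimal sketch. It assumes every attempt succeeds independently with the same probability `p`, which is a simplification not stated in the post, and the function names are hypothetical. The second function is the combinatorial estimator commonly used to compute pass@k from `n` sampled attempts of which `c` were correct.

```python
from math import comb


def pass_at_k_from_rate(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds,
    given a per-attempt success probability p (assumed constant)."""
    return 1.0 - (1.0 - p) ** k


def pass_at_k_estimate(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts with c correct:
    1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # A model that is right only 10% of the time per attempt...
    print(f"pass@1  = {pass_at_k_from_rate(0.10, 1):.3f}")
    # ...still gets credit on most problems when given 10 tries.
    print(f"pass@10 = {pass_at_k_from_rate(0.10, 10):.3f}")
    print(f"2 of 20 correct, k=10: {pass_at_k_estimate(20, 2, 10):.3f}")
```

Under these assumptions a model that fails nine single attempts out of ten still scores about 0.651 on pass@10, which is the gap between benchmark numbers and everyday reliability that the post describes.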

Hamel Husain
@HamelHusain
📅 Fri
🆔 08987889

My toxic trait is I like to make my own _repr_html_ in notebooks for complex data objects. Underrated way to experience joy with LLMs https://t.co/Q6iJHraoMr

❤️ 83 likes
🔁 8 retweets
🖼️ Media
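
For readers who have not used the hook mentioned in the post above: IPython and Jupyter look for a `_repr_html_` method on an object and, when it exists, render the returned HTML string as the cell output. The sketch below is only an illustration; the `EvalTrace` class and its fields are hypothetical and do not come from the post.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class EvalTrace:
    """Hypothetical container for one LLM interaction."""
    prompt: str
    response: str
    tags: list[str] = field(default_factory=list)

    def _repr_html_(self) -> str:
        # Jupyter calls this method and renders the returned HTML string.
        rows = [
            ("Prompt", self.prompt),
            ("Response", self.response),
            ("Tags", ", ".join(self.tags)),
        ]
        # Escape every value so user text cannot inject markup.
        body = "".join(
            f"<tr><th>{escape(label)}</th><td>{escape(value)}</td></tr>"
            for label, value in rows
        )
        return f"<table>{body}</table>"
```

In a notebook, leaving `EvalTrace("What is 2 < 3?", "True", ["math"])` as the last expression of a cell should display the table; outside a notebook the method can be called directly to inspect the HTML.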

Ethan Mollick
@emollick
📅 Fri
🆔 66342299

Having played with it a bunch, Horizon-alpha did a pretty solid version of Missile Command with Relativistic Effects with a few rounds of feedback, passed the Lem Test the first time (without reasoning), and drew a passable TikZ unicorn (if you know, you know). Very quick model. https://t.co/PwAwWqkauv

❤️ 110 likes
🔁 8 retweets
🖼️ Media

Shreya Shankar
@sh_reya
📅 Fri
🆔 96868423

Ironically, I feel like most AI teams are allergic to evals. Many folks want to fully automate evaluation instead of just sitting down to read a few traces :-( But not the students in the AI evals course!! Makes me so happy. If you're curious about the secrets they're learning,… https://t.co/N4RWCoNvPN

❤️ 99 likes
🔁 5 retweets
🖼️ Media

vishal
@vishal_learner
📅 Fri
🆔 88456304

3 standout ideas from Lesson 3, and 4 standout ideas from Chapter 4 of the Course Reader from Hamel and Shreya's AI Evals course https://t.co/HgqPcu2eG8

❤️ 18 likes
🔁 3 retweets
🖼️ Media

机器之心 JIQIZHIXIN
@jiqizhixin
📅 Fri
🆔 64101590

ByteDance is exploring diffusion LLMs too! 👀 Seed Diffusion Preview: a blazing-fast LLM for code, built on discrete-state diffusion. With 2,146 tokens/sec inference on H20 GPUs, it outpaces Mercury & Gemini Diffusion, while matching their performance on standard code… https://t.co/KELdXb2YKu

❤️ 536 likes
🔁 77 retweets
🖼️ Media

Teknium (e/λ)
@Teknium1
📅 Fri
🆔 50910553

I really love synthetic data https://t.co/LmvNiTkNlH https://t.co/AZpP3DHpMt

❤️ 406 likes
🔁 42 retweets
🖼️ Media

Ethan Mollick
@emollick
📅 Fri
🆔 94377035

Had early access to Gemini with Deep Think. Very good model, big gains over standard Gemini 2.5 Pro for a lot of problems. Here is the first attempt at the starship control panel prompt I try with every model. First time I have seen a model make a 3D interface in response. https://t.co/bLFF2IcOP3

❤️ 962 likes
🔁 89 retweets
🖼️ Media

Teknium (e/λ)
@Teknium1
📅 Fri
🆔 54947230

For my pretraining friends, perhaps @saurabh_shah2, @LucasAtkins7, @eliebakouch, etc. Does this cover most distinct skills or tasks of pretraining today? Surely I'm missing some - let me know :) https://t.co/5lWjhi1Kjg

❤️ 168 likes
🔁 8 retweets
🖼️ Media

Nils Reimers
@Nils_Reimers
📅 Fri
🆔 87746067

End2End Vision-RAG with Cohere Our data is multi-modal 🖼️, but most RAG pipelines are still text-only. This causes massive problems with complex visual information. With Cmd-A-Vision from @cohere you now get a sota vision model for Vision-RAG https://t.co/rFPHoQqQtA

❤️ 741 likes
🔁 71 retweets
🖼️ Media

Ethan Mollick
@emollick
📅 Fri
🆔 15221589

🚨New prompting report, from us: Don't bother with threats. Does threatening an AI really make it perform better (the way Google founder Brin claimed)? How about offering to tip the AI? We find no impact of threats or tips on average performance (but variance at question level) https://t.co/OMufYxTZlg

❤️ 259 likes
🔁 40 retweets
🖼️ Media

LlamaIndex 🦙
@llama_index
📅 Fri
🆔 04068483
⭐ 0.70

Build robust LLM applications with private data using LlamaIndex and @novita_labs's powerful model inference capabilities. 🔄 Connect diverse data sources through our comprehensive connector ecosystem - from PDFs and databases to APIs and documents 🧠 Transform your data into… https://t.co/TZ9JYmWXDi

🖼️ Media

LlamaIndex 🦙
@llama_index
📅 Fri
🆔 46974431
⭐ 0.66

Whether you want to chat with your terminal or add a voice assistant to your web-app, we got you covered with our Gemini Live integration, now available in TypeScript! 👇 Check out the demo below, where @itsclelia shows you how to set up and run a simple terminal chat - but if… https://t.co/R2N6KwhMwt

🖼️ Media

Jacob Matson
@matsonj
📅 Fri
🆔 15969315

built baby's first eval: I want to find the best LLM for co-writing. What that means for me is I want it to have a great way of thinking in patterns of words, not numbers. so using NYT Connections puzzles seemed to be the perfect subject. https://t.co/QvKNoSs0cD

❤️ 58 likes
🔁 5 retweets
🖼️ Media

Ankur Goyal
@ankrgyl
📅 Fri
🆔 26108535

We are no longer charging per user. This applies to both free and pro plans. It is a privilege to be able to simplify our pricing. We've grown exponentially over the last year, which has made it obvious that the more you eval & log, the more value we can provide. Plain & simple. https://t.co/X3K6mNRpvL

❤️ 74 likes
🔁 5 retweets
🖼️ Media

elvis
@omarsar0
📅 Fri
🆔 80454086

Self-evolving agents are an important component to artificial superintelligence. Finally, there is a survey paper on the topic. I believe it covers a lot of the important literature on self-evolving agents. A top read for the weekend! https://t.co/VErWVPUp6z

❤️ 771 likes
🔁 153 retweets
🖼️ Media

Cerebras
@CerebrasSystems
📅 Fri
🆔 77440464

Cerebras Code: 20x faster than Claude, 1x the price Today we are launching two monthly coding plans: ➡️Cerebras Code Pro: $50/m – for indie developers ➡️Cerebras Code Max: $200/m – for power users with 5x rate limits Both plans get: Qwen3-Coder at 2,000 tokens/s, 131K context,… https://t.co/YUCtGzdyhf

❤️ 1,481 likes
🔁 143 retweets
🖼️ Media

jason liu
@jxnlco
📅 Fri
🆔 61662136

Bunch of good conversations in today's RAG office hours. Most teams get RAG wrong because they're obsessed with the AI instead of the data. The real money is in finding what users actually need through data analysis. I've seen $100k/month value unlocked just by identifying… https://t.co/BuFqpoUQeH

❤️ 94 likes
🔁 8 retweets
🖼️ Media

elvis
@omarsar0
📅 Fri
🆔 83145791

Anyone can build useful AI Agents. But it requires having a solid framework to design and improve AI agents. That's what I'll teach in my new training on Building Effective AI Agents. Topics include context engineering, augmenting AI agents, multi-agent systems, and more. https://t.co/xPUV8QvK7L

❤️ 428 likes
🔁 73 retweets
🖼️ Media

Hamel Husain
@HamelHusain
📅 Sun
🆔 79487498

Reorganized the evals FAQ into categories, since there are so many now! You can also download the FAQ in different formats (pdf, markdown) from the sidebar on the page directly. https://t.co/JkGztq1AaH https://t.co/rQVf3bqe2c

❤️ 439 likes
🔁 60 retweets
🖼️ Media

Eugene Yan
@eugeneyan
📅 Sat
🆔 39234389

Look Mum, I'm in the news*! (Now she has proof for our relatives that I'm not just on a looong vacation and bumming around in the US lol) * LangChain, VentureBeat, Geeky Gadgets, Smol AI News https://t.co/csnvi5bmxn

❤️ 63 likes
🔁 4 retweets
🖼️ Media