Your curated collection of saved posts and media

Showing 32 posts · last 14 days · by score
Cihang Xie
@cihangxie
Thu Jun 05
ID 76981265

Reasoning LLMs are now able to tackle much tougher questions than ever, but what really drives their success? Is it Knowledge 📖 or Reasoning 🤔? 🔎 We present a new step-by-step framework to evaluate how LLMs think. 🧵 Thread: https://t.co/YEfdzxr1LF

136 likes
32 retweets

jason liu
@jxnlco
Fri
ID 43048863

office hours for https://t.co/jdjFasak7B https://t.co/boggaScrRK

12 likes

Aravind Srinivas
@AravSrinivas
Fri
ID 90433961

Circle IPO Financial Analysis with Perplexity Labs https://t.co/UenLo8nsUN

782 likes
88 retweets

LlamaIndex 🦙
@llama_index
Fri
ID 63332849

Should you use MCP? A2A? Both? Something else? Two weeks ago our own @seldo spoke at the MCP Dev Summit giving a lightning tour of the 13 different protocols currently vying to become the standard way for agents to talk to tools and each other, including MCP, A2A, ACP and many… https://t.co/qZv8duKRut

59 likes
21 retweets

Wing Lian (caseus)
@winglian
Fri
ID 39856808

Using @googlecloud 🤝 @axolotl_ai can help you streamline your large Multimodal finetuning workflows. https://t.co/L1cma6z9v0

9 likes
3 retweets

Enrico Shippole
@EnricoShippole
Fri
ID 47299405

Happy to release the Common Pile, an 8TB, 1 Trillion Token Dataset of Public Domain and Openly Licensed Text in collaboration with @AiEleuther, @VectorInst, @allen_ai, @huggingface, and DPI by @ShayneRedford. We provisioned a subset of the Common Pile, consisting only of public… https://t.co/K61ld9XqWP

160 likes
36 retweets

Maziyar PANAHI
@MaziyarPanahi
Fri
ID 98631781

Wait, what?! You didn't just limit Opus, you've throttled my entire paid account because I dared to run a project and chat for 10 minutes? Now I'm locked out for 3 hours? I PAY for this. Are you seriously out of your mind?! 🤬 https://t.co/cFWO77Lvcn

143 likes
9 retweets

Zihao Zhou
@zihaozhou_
Thu Jun 05
ID 18398078

Recently, I saw the papers "rl on one sample" and "spurious reward". The findings are interesting, but they are indeed expected. In fact, the math solving ability of the Qwen models is really easy to activate: even without any training! 🤣 I'd like to share… https://t.co/7wEmmZgnzA

197 likes
20 retweets

Pol Avec
@pol_avec
Thu Jun 05
ID 45605790

I don't need much of an excuse to start a new FastHTML project. Let's see where this leads @HamelHusain https://t.co/WxnhGDKwAz

8 likes
2 retweets

Andrew Ng
@AndrewYNg
Fri
ID 53691639

Hanging out with @juberti, OpenAI's head of realtime AI, responsible for the company's voice AI products. One thing both of us agree on: while some things in AI are overhyped, voice applications seem underhyped right now. The application opportunities seem larger than the amount… https://t.co/s1nMT3EGPY

723 likes
148 retweets

Hamel Husain
@HamelHusain
Fri
ID 24977077

What's the best way to debug & evaluate Agents? I love @BEBischof's "Failure Funnel" talk how to
- Conduct error analysis
- Approach building with an experimentation mindset
- Use Analytical tools
🚨 He proposes a new job title as well at the end 🚨 https://t.co/OHbzb8HEbd https://t.co/0REbkcNqzr

119 likes
20 retweets

Helen Toner
@hlntnr
Fri
ID 33643194

Lots has been said about the risk that selling cutting-edge AI chips to the Gulf might benefit China. Less has been said about the absurdity of OpenAI trying to sell their UAE deal as advancing "democratic values," so I turned this tweet into a substack post. Excerpts below. https://t.co/2xbcvDJZwD

356 likes
51 retweets

Aravind Srinivas
@AravSrinivas
Sat
ID 26728857

After Perplexity Labs, I would say probably 98-99%. https://t.co/pASGoNEfvN

3,791 likes
399 retweets

Louis.Saillans
@LSaillans
Fri
ID 42565552

I spent over 100 hours compiling and analyzing over 5,000 videos of soldiers trying to escape UAV drones, pulling material from Telegram, Reddit, and other sources. Here is what i found out. https://t.co/WmZ1cfs36Q

25,638 likes
3,092 retweets

Ethan Mollick
@emollick
Sat
ID 54786565

The new voice model from ElevenLabs is interesting. I put it against one of the hardest pieces for reading aloud - the final verse of Eliot's Wasteland, which uses four languages, a nursery rhyme & abrupt changes in tone. It required a few attempts to get, but this was good. https://t.co/n9P0Mg0Hae

417 likes
35 retweets

jason liu
@jxnlco
Fri
ID 51518077

https://t.co/WTsRAjCp7v

10,211 likes
1,342 retweets

Theo - t3.gg
@theo
Sat
ID 87536555

Okay so in-browser AI is still useless, got it. https://t.co/ry1LZNz7i3

1,254 likes
13 retweets

EleutherAI
@AiEleuther
Fri
ID 91755906

Can you train a performant language models without using unlicensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance similar models like LLaMA 1&2 https://t.co/wHQ4cquqlo

571 likes
141 retweets

Yann LeCun
@ylecun
Sat
ID 57480457

AI doomer: "OMG, I told my AI assistant that I'll shut it down and it told me to kill myself 😱😱😱" AI assistant: https://t.co/KqiC09QxOY

3,723 likes
316 retweets

elvis
@omarsar0
Sat
ID 85883888

The Illusion of Thinking in LLMs Apple researchers discuss the strengths and limitations of reasoning models. Apparently, reasoning models "collapse" beyond certain task complexities. Lots of important insights on this one. (bookmark it!) Here are my notes: https://t.co/Ct1a7LpvqO

4,584 likes
640 retweets

Daniel Jeffries
@Dan_Jeffries1
Fri
ID 36204304

If you actually believed this then you'd be morally bankrupt for working at a company looking to make it happen. That leaves only a few actual reasons for saying something like this: 1) You believe you're a part of the few, specially chosen, wise people who should have this… https://t.co/IqfTwuGezO

1,079 likes
97 retweets

LlamaIndex 🦙
@llama_index
Thu Jun 05
ID 73104324

Just launched: our production ready Spreadsheet Agent! Industries like audit firms, tax teams, insurance and corporate finance waste 10+ hours a week manually processing hundreds of spreadsheet files, just copying and pasting numbers. Our agent solves this pain point using a… https://t.co/OCEWPjZbQg

136 likes
23 retweets

Aravind Srinivas
@AravSrinivas
Thu Jun 05
ID 38414605

Perplexity can now plug into EDGAR for all SEC filings on all modes - search, research and labs. It's incredible for deep financial research! https://t.co/MT4vfTRbZF

1,265 likes
106 retweets

jason liu
@jxnlco
Thu Jun 05
ID 65529045

When function calling talks to your browser with @openbb_finance https://t.co/icLreGdue2

2 likes

Mehrdad Farajtabar
@MFarajtabar
Thu Jun 05
ID 48493730

🧵 1/8 The Illusion of Thinking: Are reasoning models like o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet really "thinking"? 🤔 Or are they just throwing more compute towards pattern matching? The new Large Reasoning Models (LRMs) show promising gains on math and coding benchmarks,… https://t.co/Ah14a7fwkg

3,131 likes
584 retweets

Jerry Liu
@jerryjliu0
Thu Jun 05
ID 82800050

Today I'm excited to announce Spreadsheet Agents 📊🤖 - a brand-new LlamaIndex feature that allows users to both do data transformation and QA over unnormalized Excel sheets. A lot of knowledge work happens in Excel (or Sheets, Numbers, etc.) - from insurance to tax to corporate… https://t.co/CqQEXMIYKm

264 likes
29 retweets

Rachel Thomas
@math_rachel
Fri
ID 77143681

What AI can tell us about microscope slides: classifying cancer cells, predicting prognosis, and identifying genetic mutations that may drive treatment choices. My latest blog post is a friendly introduction to Foundation Models for Computational Pathology. 1/ https://t.co/TsliiaGJgO

35 likes
11 retweets

tomaarsen
@tomaarsen
Thu Jun 05
ID 56295161

Qwen is continuing their habit of state-of-the-art releases with 3 extraordinarily strong embedding models and 3 powerful reranker models, focusing on multilingual text retrieval and more. Details in 🧵 https://t.co/Vp3IsZh99K

169 likes
29 retweets

ARC Prize
@arcprize
Thu Jun 05
ID 11195842

We tested every major AI reasoning system. There is no clear winner. Accuracy goes up as you stack modern CoT techniques, but efficiency goes way down. This gives rise to a Pareto frontier on accuracy vs. cost using ARC-AGI as a consistent measuring stick. https://t.co/BqnoDdlHHa

625 likes
134 retweets

Steve
@A68570468
Thu Jun 05
ID 33744486

Hermes is the most beautiful model I've interacted with. innocent, pure and joyful @NousResearch @Teknium1 https://t.co/BEpmtiEkLz

28 likes
2 retweets

elvis
@omarsar0
Thu Jun 05
ID 42424439

Self-Challenging LLM Agents Self-improving AI systems are starting to show up everywhere. Meta and colleagues present self-improvement for general multi-turn tool-use LLM agents. Pay attention to this one, devs! Here are my notes: https://t.co/4op2qHRf9M

691 likes
132 retweets

Jeremy Howard
@jeremyphoward
Thu Jun 05
ID 83309126

Logan missed the BIG news here, which is that Gemini now knows about FastHTML! :D https://t.co/q4ueU61Uuv

319 likes
13 retweets