Your curated collection of saved posts and media

Showing 32 posts Β· last 14 days Β· by score
V
Taelin
@VictorTaelin
πŸ“…
Thu Jul 10
πŸ†”15139952

Something about this kind of prompt is simply unfathomable to LLMs. They just can't perform better than chance, and I'm not sure why. Most people will dismiss this as just being "hard math stuff", but it is not, I swear. It is just alien to you because it is *niche*, thus, it… https://t.co/R9j73R0wAC

Media 1
❀️739
likes
πŸ”66
retweets
πŸ–ΌοΈ Media
S
Sayash Kapoor
@sayashk
πŸ“…
Thu Jul 10
πŸ†”58404130

After we invented the dynamo, it took us 40 years to electrify factories. In the process, we had to redesign the entire factory layout β€”Β electrifying existing factories didn't cut it. Software engineering will likewise need to undergo drastic changes to truly benefit from AI.… https://t.co/Vu67Annscv

Media 1
❀️193
likes
πŸ”33
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Thu Jul 10
πŸ†”99629470

Excited to announce my new short course: Building Agentic Applications with Replit Agent and n8n. With AI this capable, I believe anyone can become a builder. The stack I use here will teach you how to rapidly build agentic apps with no-code tools. https://t.co/MezezZtuvy

❀️162
likes
πŸ”15
retweets
πŸ–ΌοΈ Media
M
Maxime Labonne
@maximelabonne
πŸ“…
Thu Jul 10
πŸ†”75381864

Liquid AI open-sources a new generation of edge LLMs! πŸ₯³ I'm so happy to contribute to the open-source community with this release on @huggingface! LFM2 is a new architecture that combines best-in-class inference speed and quality into 350M, 700M, and 1.2B models. https://t.co/ILhTJknlAx

Media 1
❀️697
likes
πŸ”107
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Thu Jul 10
πŸ†”91869348

This large study of 187k developers using GitHub Copilot finds AI transforms nature of coding. Coders focus: more coding & less management. They need to coordinate less, working with fewer people They experiment more with new languages, which would increase earnings $1,683/year https://t.co/UN62E7Hmem

Media 1
❀️335
likes
πŸ”42
retweets
πŸ–ΌοΈ Media
J
Jeremy Howard
@jeremyphoward
πŸ“…
Thu Jul 10
πŸ†”56466171

I replicated this result, that Grok focuses nearly entirely on finding out what Elon thinks in order to align with that, on a fresh Grok 4 chat with no custom instructions. https://t.co/NgeMpGWBOB https://t.co/QTWzjtYuxR

Media 1
❀️4,925
likes
πŸ”739
retweets
πŸ–ΌοΈ Media
J
jack morris
@jxmnop
πŸ“…
Fri
πŸ†”81774280

so xAI just 10x’d the amount of compute we use on RL and the models only got a tiny bit better are we just doing RL wrong? or is pretraining just inherently much more useful https://t.co/xyknCRWemU

Media 1Media 2
❀️1,436
likes
πŸ”47
retweets
πŸ–ΌοΈ Media
A
Aravind Srinivas
@AravSrinivas
πŸ“…
Fri
πŸ†”15928315

β€œComet browser gives the first glimpse of 100x productivity” - Early Chrome PM, a16z GP. https://t.co/BhZN7latqL

Media 1
❀️608
likes
πŸ”41
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Fri
πŸ†”81019661

Grok 4, in general, is very influenced by search results and pretty credulous when it sees a web search result. When you ask it to code, it often looks for code online first and uses that. https://t.co/7C0dEIyO82

Media 1
❀️466
likes
πŸ”19
retweets
πŸ–ΌοΈ Media
J
Jeremy Howard
@jeremyphoward
πŸ“…
Thu Jul 10
πŸ†”96917714

Here's a complete unedited video of asking Grok for its views on the Israel/Palestine situation. It first searches twitter for what Elon thinks. Then it searches the web for Elon's views. Finally it adds some non-Elon bits at the end. ZA 54 of 64 citations are about Elon. https://t.co/6Mr33LByrm

❀️7,642
likes
πŸ”911
retweets
πŸ–ΌοΈ Media
T
Teknium (e/Ξ»)
@Teknium1
πŸ“…
Thu Jul 17
πŸ†”81478051

Just merged a PR for an environment to improve LLM as a Judge as well as evaluate models on their capability of doing judgements! Did you know that all verifiable RL environments are nearly equivalent to benchmarks (and vice-versa!)? So we added an evaluate command to Atropos'… https://t.co/5cg5TQojE6

Media 1
❀️65
likes
πŸ”3
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Thu Jul 17
πŸ†”85083895

Agent Leaderboard v2 is here! > GPT-4.1 leads > Gemini-2.5-flash excels at tool selection > Kimi K2 is the top open-source model > Grok 4 falls short > Reasoning models lag behind > No single model dominates all domains More below: https://t.co/dcWDqXSj7m

Media 1
❀️1,810
likes
πŸ”214
retweets
πŸ–ΌοΈ Media
Z
Ravid Shwartz Ziv
@ziv_ravid
πŸ“…
Mon
πŸ†”43127246

If you're at ICML, come tomorrow (Tuesday) to Oscar's talk, where he will present our paper "Layer by layer: Uncovering hidden representations in language models" at 10am (West Ballroom D) and for the poster session at 11am (East Exhibition Hall A-B #E-2607). https://t.co/8gPdEvldQd

Media 1
❀️30
likes
πŸ”6
retweets
πŸ–ΌοΈ Media
H
Hamel Husain
@HamelHusain
πŸ“…
Thu Jul 17
πŸ†”42669416

The eval space is the most intense battle for AI market share I have seen second to coding agents. This is why we will have Arize & Braintrust go head-to-head. They will each show how to complete our 5 homework assignments using their tools . Over 1k students learning about… https://t.co/rjLzNOphBz

Media 1
❀️113
likes
πŸ”8
retweets
πŸ–ΌοΈ Media
S
Sayash Kapoor
@sayashk
πŸ“…
Thu Jul 17
πŸ†”91404244

The mainstream view of AI for science says AI will rapidly accelerate science, and that we're on track to cure cancer, double the human lifespan, colonize space, and achieve a century of progress in the next decade. In a new AI Snake Oil essay, @random_walker and I argue that… https://t.co/p5OBUluVyg

Media 1Media 2
+1 more
❀️233
likes
πŸ”62
retweets
πŸ–ΌοΈ Media
O
elvis
@omarsar0
πŸ“…
Thu Jul 17
πŸ†”18257591

Agentic-R1 This 7B model is surprisingly good at interleaved tool use and reasoning capabilities. It's fun to see small language models improving this fast. Knowledge distillation in full display. Here are my notes: https://t.co/mPaZA36JUv

Media 1
❀️618
likes
πŸ”112
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Fri
πŸ†”18445354

ChatGPT agent: "create a PDF of a novel D&D adventure, add illustrations, make it super interesting and deep, add tables, etc" "Fix the formatting, build it out more" Got a 19 page PDF. Agent doesn't do layouts well, but pulls off building a coherent adventure, hard for LLMs. https://t.co/OYEo3L1hgP

Media 1Media 2
+2 more
❀️341
likes
πŸ”24
retweets
πŸ–ΌοΈ Media
H
Hamel Husain
@HamelHusain
πŸ“…
Fri
πŸ†”88197898
⭐0.30

πŸ₯ @hwchase17 has entered the fight https://t.co/UkP9k6xcwF

Media 1
❀️22
likes
πŸ”5
retweets
πŸ–ΌοΈ Media
J
Jan Eric Lenssen
@janericlenssen
πŸ“…
Thu Jul 17
πŸ†”91336420

You can bring our Sudoku solving diffusion models to other domains! If you are at interested and at #ICML2025, come see @bartek_pog and @ChrisWewer's πŸŒ€ Spatial Reasoners package β€” now released in beta! Here are some examples for images and videos. Links below. https://t.co/mkSR4WxUGi

❀️193
likes
πŸ”26
retweets
πŸ–ΌοΈ Media
W
Wizard Gang
@wizardposting1
πŸ“…
Fri
πŸ†”52269010

https://t.co/uRGLcBWnxP

Media 1
❀️21,740
likes
πŸ”1,721
retweets
πŸ–ΌοΈ Media
S
Haider.
@slow_developer
πŸ“…
Fri
πŸ†”93804055

Eric Schmidt says we're on the verge of AI replacing most programming and math tasks these domains are scale-free and don't require real-world data β€” just compute "world-class AI mathematicians will emerge in the next year, and top-tier AI programmers within two" https://t.co/oJLjx9BIue

❀️671
likes
πŸ”118
retweets
πŸ–ΌοΈ Media
E
Eugene Yan
@eugeneyan
πŸ“…
Wed
πŸ†”53393802

How do you build an LLM-evaluator / LLM-as-Judge? The book for "AI Evals for PMs and Engineers" has a chapter devoted to it (35% discount: https://t.co/3botO05aw6) First, we need to define the right metrics. For example, we can start by listing the failure modes from our error… https://t.co/C3wVt06fT6

Media 1
❀️851
likes
πŸ”73
retweets
πŸ–ΌοΈ Media
Z
ZEKE
@zekedup
πŸ“…
Wed
πŸ†”45704402

https://t.co/93kn3u7W5T

Media 1
❀️523
likes
πŸ”4
retweets
πŸ–ΌοΈ Media
A
Arthur Gretton
@ArthurGretton
πŸ“…
Mon
πŸ†”20724360

Accelerated Diffusion Models via Speculative Sampling, at #icml25 ! at 16:30 Tuesday July 15 poster E-3012 https://t.co/GGuwJxdIFf @ValentinDeBort1 @agalashov @ArnaudDoucet1 https://t.co/mu3xgoIdCR

Media 1
❀️305
likes
πŸ”46
retweets
πŸ–ΌοΈ Media
S
SkalskiP
@skalskip92
πŸ“…
Wed
πŸ†”40086682

supervision-0.26.0 is out we finally released support for ViTPose and ViTPose++ pose estimation models from @huggingface transformers link: https://t.co/xXMRaS3Guk https://t.co/chhTNMWxJ0

❀️993
likes
πŸ”155
retweets
πŸ–ΌοΈ Media
J
jason liu
@jxnlco
πŸ“…
Wed
πŸ†”73998870

Tired of misleading benchmark results that don't reflect real-world embedding performance? Discover why your embeddings might be underperforming despite strong benchmark scores in this eye-opening workshop. Kelly Hong, Researcher at Chroma, will reveal the hidden pitfalls of… https://t.co/SsWBOZX1Qq

Media 1
❀️4
likes
πŸ–ΌοΈ Media
L
LlamaIndex πŸ¦™
@llama_index
πŸ“…
Wed
πŸ†”71760242

Join our own @tuanacelik at ODSC's Agentic AI Summit tomorrow! She'll be presenting a hands-on workshop where you'll learn how to: ➑️ Build agents using LlamaIndex - Learn to create autonomous applications that use goals and tools to accomplish tasks independently ➑️ Compose… https://t.co/6jcYIGR70s

Media 1
❀️15
likes
πŸ”6
retweets
πŸ–ΌοΈ Media
D
Dariusz Debowczyk
@ddebowczyk
πŸ“…
Wed
πŸ†”56096231

InstructorPHP v1.0.0 released! πŸŽ‰ First stable release of InstructorPHP - MIT/OS library for structured data extraction and LLM integration for PHP apps. Framework agnostic, batteries included. πŸ”₯ StructuredOutput, Inference, and Embeddings classes with fluent, cohesive APIs.… https://t.co/5WzUL2WqYZ

Media 1
❀️4
likes
πŸ”1
retweets
πŸ–ΌοΈ Media
T
TBPN
@tbpn
πŸ“…
Wed
πŸ†”49108009

BREAKING: Claude Code PMs Boris Cherny and Cat Wu have returned to Anthropic after a brief stint at Cursor. https://t.co/GGcNHfppMM

Media 1
❀️2,097
likes
πŸ”102
retweets
πŸ–ΌοΈ Media
J
jason liu
@jxnlco
πŸ“…
Wed
πŸ†”27987092

notes on context engineering with @walden_yan of @cognition_labs https://t.co/DJsWC1EESr

Media 1
❀️731
likes
πŸ”59
retweets
πŸ–ΌοΈ Media
E
Ethan Mollick
@emollick
πŸ“…
Wed
πŸ†”90207096

The research on AI companions and mental health is still very preliminary & unclear as to long-term impact. Seems like an important topic to research right now. (I would also hope that xAI is tracking anonymized data about their new companion product for known potential harms) https://t.co/bvU8X2tWbE

Media 1Media 2
+1 more
❀️201
likes
πŸ”19
retweets
πŸ–ΌοΈ Media
C
Chip Huyen
@chipro
πŸ“…
Wed
πŸ†”08184115

I open sourced Sniffly, a tool that analyzes Claude Code logs to help me understand my usage patterns and errors. Key learnings. 1. The biggest type of errors Claude Code made is Content Not Found (20 - 30%). It tries to find files or functions that don't exist. So I… https://t.co/nxcjMcjR9C

Media 1Media 2
❀️1,187
likes
πŸ”124
retweets
πŸ–ΌοΈ Media