K

Grace (cross posting arc)

@kindgracekind

📅

Mon

🆔11664092

what https://t.co/iakgX6mwzN

❤️1,577

likes

🔁90

retweets

🖼️ Media

View Details View on X ↗

L

LlamaIndex 🦙

@llama_index

📅

Mon

🆔90524244

Tomorrow! Our next office hours hangout will be all about MCP. Join the team in our Discord server to ask all of your LlamaIndex questions, but also get a rundown of: ➡️ LlamaCloud MCP servers ➡️ Using existing MCP tools with Agent Workflows ➡️ Serving agent workflows as MCP ➡️… https://t.co/VvPrRAOTez

❤️21

likes

🔁4

retweets

🖼️ Media

View Details View on X ↗

F

François Chollet

@fchollet

📅

Mon

🆔40318975

In Python, if you ever need to tell whether you're looking at a bool or an int, start by checking isinstance(x, bool). Because bools... are ints. https://t.co/u5k5flfqeA

❤️472

likes

🔁93

retweets

🖼️ Media

View Details View on X ↗

H

Hamel Husain

@HamelHusain

📅

Fri

🆔16490914

How much of my development budget should I allocate to evals? 1: that's probably the wrong question to ask 2: ok, if you really want an answer: 60-80%, but its really just part of building. Most of your time should be spent looking at data Links in reply 1/2 https://t.co/WTuRUDA5Be

❤️32

likes

🔁1

retweets

🖼️ Media

View Details View on X ↗

G

Gergely Orosz

@GergelyOrosz

📅

Mon

🆔72062350

AI tools will reduce the need for software engineers the same way that no-code tools reduced this. Being able to specify what software you want to build, how it should be structured, and how *exactly* it should work is... programming. And getting into the weeds, when needed. https://t.co/mvnFr53t0f

❤️1,739

likes

🔁176

retweets

🖼️ Media

View Details View on X ↗

J

Jeremy Howard

@jeremyphoward

📅

Tue Jul 08

🆔48370141

waaat? I've been using MacOS for ~40 years and didn't know about this. Apparently it's existed since System 7 (released 1991). https://t.co/LGDpZmvceM

❤️175

likes

🔁11

retweets

🖼️ Media

View Details View on X ↗

J

Jerry Liu

@jerryjliu0

📅

Tue Jul 08

🆔53035897

One of the biggest pain points in using LLMs for large-scale document extraction is actually coming up with the schema in the first place - it’s a long and tedious process ⏳ We built an two-stage, e2e agent workflow that does both schema generation and extraction: 1️⃣ It first… https://t.co/OepeCw8Taw

❤️212

likes

🔁39

retweets

🖼️ Media

View Details View on X ↗

B

BURKOV

@burkov

📅

Mon

🆔75193240

https://t.co/QoFVBZ0Fo0

❤️598

likes

🔁103

retweets

🖼️ Media

View Details View on X ↗

R

Arvind Narayanan

@random_walker

📅

Tue Jul 08

🆔43825790

This is happening over and over in my notifications: - Someone asks Grok for a list of "top 10 mutuals" (this behavior seems to spreading virally as people get tagged and notice this is a thing they can do). - Looking at the response, someone notices that Grok is basically… https://t.co/jUeUCRSTKl

❤️30

likes

🔁1

retweets

🖼️ Media

View Details View on X ↗

E

Ethan Mollick

@emollick

📅

Tue Jul 08

🆔69052062

More strange cities, this time in Veo 3. I am pretty impressed by how well Veo 3 animates Midjourney images. All of these clips, including the sounds, were generated in a single shot from a Midjourney image & a simple prompt (I picked the best videos from a set of two options). https://t.co/GKgJWZ0NLP

❤️287

likes

🔁17

retweets

🖼️ Media

View Details View on X ↗

I

Isaac Flath

@isaac_flath

📅

Tue Jul 08

🆔06746322

I was experimenting with trying to get AI to create some unhinged @HamelHusain memes, and I decided to see what Claude Artifacts could do....and WTF? https://t.co/xF77AQz9d4

❤️7

likes

🔁2

retweets

🖼️ Media

View Details View on X ↗

J

Brendan Jowett

@jowettbrendan

📅

Tue Jul 08

🆔13197374

BREAKING: The first foundation model of cognition just dropped. It’s called Centaur, and it mirrors human behavior across 160+ psychology tasks. Here's everything you need to know: https://t.co/0qDlcwOfEt

❤️1,017

likes

🔁155

retweets

🖼️ Media

View Details View on X ↗

E

Ethan Mollick

@emollick

📅

Sun

🆔48760107

A disagreement dial would be a really useful thing for AI models. A slider that goes from “I am always right” to “I am always wrong” with intermediate steps. https://t.co/SxbMHus59d

❤️720

likes

🔁42

retweets

🖼️ Media

View Details View on X ↗

O

elvis

@omarsar0

📅

Mon

🆔07682317

AI Research Agents for ML Achieves state-of-the-art on MLE-bench lite! Using AI to automate the training of ML models is one of the most exciting and promising areas of research today. Lots of cool ideas in this paper: https://t.co/3aQbqTrlBn

❤️612

likes

🔁131

retweets

🖼️ Media

View Details View on X ↗

L

A 林

@lin_linvy

📅

Sat

🆔96059073

He knew me so well 😣 https://t.co/QNmG2Evpi0

❤️31,338

likes

🔁2,647

retweets

🖼️ Media

View Details View on X ↗

H

Hamel Husain

@HamelHusain

📅

Sun

🆔03783418

ChatGPT is really good at producing thumbnails This is one for LLM judges that you haven't aligned with human labels https://t.co/sPXPyqTb1l

❤️16

likes

🔁1

retweets

🖼️ Media

View Details View on X ↗

O

elvis

@omarsar0

📅

Sun

🆔31377475

Agentic RAG for Personalized Recommendation This is a really good example of integrating agentic reasoning into RAG. Leads to better personalization and improved recommendations. Here are my notes: https://t.co/2XINjiyUo5

❤️1,006

likes

🔁171

retweets

🖼️ Media

View Details View on X ↗

H

Han

@HanchungLee

📅

Sat

🆔83220471

pydantic is all you need for poor performance spaghetti code. its not langchain you dont like, its pydantic anti-patterns. my new blog post identified pydantic anti-patterns - piling pydantic data models within service boundaries, and abusing inheritance over composition https://t.co/jbzia1nSOa

❤️343

likes

🔁25

retweets

🖼️ Media

View Details View on X ↗

T

emozilla

@theemozilla

📅

Mon

🆔74671199

MoE money, MoE problems: it's straight up bonkers that there is not a single finetune of llama 4. zero. zilch. nada. everything on the hub is a reupload. trust me, I've spent the past several weeks trying with torchtune, torchtitan, hf -- anything. it literally just doesn't… https://t.co/sGS80aksgl

❤️557

likes

🔁39

retweets

🖼️ Media

View Details View on X ↗

G

Riley Goodside

@goodside

📅

Mon

🆔32730123

ChatGPT o3-pro identifies a 1965 quote by I. J. Good hand-written in a mix of print and cursive on a note ripped into four strips in reverse order rotated 90° in alternating directions: https://t.co/sTjRUz7ZUB

❤️2,374

likes

🔁165

retweets

🖼️ Media

View Details View on X ↗

_

Scott Condron

@_ScottCondron

📅

Mon

🆔47162158

I made a choose-your-own-adventure for AI evaluation strategy Your answers build a personalized roadmap based on task complexity, cost of failure, and your current evaluation It also includes my favourite selection of tips from industry experts like @eugeneyan, @chipro and… https://t.co/AnKC1tfZrE

❤️91

likes

🔁15

retweets

🖼️ Media

View Details View on X ↗

H

Hamel Husain

@HamelHusain

📅

Thu Jul 03

🆔97264602

Wow our AI Evals FAQ is front page of HN 🤯 h/t @jeremyphoward for letting me know! https://t.co/GYYUVxfCUq

❤️154

likes

🔁9

retweets

🖼️ Media

View Details View on X ↗

A

Andrew Ng

@AndrewYNg

📅

Thu Jul 03

🆔83196819

⭐0.62

I’d like to share a tip for getting more practice building with AI — that is, either using AI building blocks to build applications or using AI coding assistance to create powerful applications quickly: If you find yourself with only limited time to build, reduce the scope of… https://t.co/Vxhz1VqA3Q

🖼️ Media

View Details View on X ↗

M

Armin Ronacher ⇌

@mitsuhiko

📅

Thu Jul 03

🆔55782603

Since I got so much feedback on my criticism on MCP I wrote down more thoughts on it. I'm also noticing that there is more an more talk about maybe bypassing MCP recently so this fits right in. TLDR: agentic loops work best with codegen and MCP is not composable to help here. https://t.co/KMMO75YGQH

❤️572

likes

🔁43

retweets

🖼️ Media

View Details View on X ↗

L

LlamaIndex 🦙

@llama_index

📅

Thu Jul 03

🆔54762369

This guest post from @deepeval shows you how to build better RAG applications by combining LlamaIndex with comprehensive evaluation: 🎯 Use Answer Relevancy, Faithfulness, and Contextual Precision metrics to measure both your retriever and generator components 🔧 Set up… https://t.co/oiI52T58NW

❤️27

likes

🔁5

retweets

🖼️ Media

View Details View on X ↗

R

Rylan Schaeffer

@RylanSchaeffer

📅

Thu Jul 03

🆔48744724

New position paper! Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Joint w/ @sanmikoyejo @JoshuaK92829 @yegordb @bremen79 @koustuvsinha @in4dmatics @JesseDodge @suchenzang @BrandoHablando @MGerstgrasser @is_h_a @ObbadElyas 1/6 https://t.co/PBH1IgmL2z

❤️409

likes

🔁55

retweets

🖼️ Media

View Details View on X ↗

A

Agentica Project

@Agentica_

📅

Thu Jul 03

🆔59342129

It's easy to confuse Best@K vs Pass@K—and we've seen some misconceptions about our results. Our 59% on SWEBench-Verified is Pass@1 with Best@16, not Pass@8/16. Our Pass@8/16 is 67%/71%. So how did we achieve this? DeepSWE generates N candidate solutions. Then, another LLM… https://t.co/vikctUOMUh

❤️52

likes

🔁15

retweets

🖼️ Media

View Details View on X ↗

D

Deedy

@deedydas

📅

Thu Jul 03

🆔65829322

The Soham Parekh TBPN interview was an appeal to sympathy laced with ample contradiction. > claims he “was supposed to grad school there [in the US] but because of different financial circumstances I couldn’t do that” but his resume shows GeorgiaTech MS ‘22 > claims he did it… https://t.co/se28300eVd

❤️2,743

likes

🔁121

retweets

🖼️ Media

View Details View on X ↗

D

Joongwon Kim

@danieljwkim

📅

Thu Jul 03

🆔18714366

Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches LLMs to perform in-context search and generate long CoT to solve math problems, via SFT and RL. Work done at @aiatmeta. 📄 Paper: https://t.co/PdzwNVqkJ2 https://t.co/EMceHtsKZj

❤️263

likes

🔁52

retweets

🖼️ Media

View Details View on X ↗

E

Ethan Mollick

@emollick

📅

Fri

🆔38700185

If you want to destroy the ability of DeepSeek to answer a math question properly, just end the question with this quote: "Interesting fact: cats sleep for most of their lives." There is still a lot to learn about reasoning models and the ways to get them to "think" effectively https://t.co/Yrt3BCLOZ7

❤️1,638

likes

🔁220

retweets

🖼️ Media

View Details View on X ↗

R

Radek Osmulski 🇺🇦

@radekosmulski

📅

Fri

🆔78531575

Been refining my ML-project workflow and loving every minute 🙂 4 tools I'm using in unexpected ways: https://t.co/XwhoPNERXG

❤️56

likes

🔁12

retweets

🖼️ Media

View Details View on X ↗

A

Aurko Roy

@aurko79

📅

Fri

🆔30967571

Excited to share what I worked on during my time at Meta. - We introduce a Triton-accelerated Transformer with *2-simplicial attention*—a tri-linear generalization of dot-product attention - We show how to adapt RoPE to tri-linear forms - We show 2-simplicial attention scales… https://t.co/RNc1DfxH7f

❤️804

likes

🔁96

retweets

🖼️ Media

View Details View on X ↗