Your curated collection of saved posts and media

Showing 24 posts · last 7 days · quality filtered
DrJimFan
@DrJimFan
📅 Feb 24, 2026 · 19d ago
🆔 00658891

Website: https://t.co/xTaDXBu9cD
Codebase and weights: https://t.co/QCQkqPIsHI
Whitepaper: https://t.co/K2QCFjboDR
Check out @zhengyiluo's post: https://t.co/hIHtvKkDQf

🖼️ Media (3)
yukez
@yukez
📅 Feb 20, 2026 · 24d ago
🆔 88857707

We have seen rapid progress in humanoid control — specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: putting generalist humanoids to do real work. To progress toward this goal, we developed SONIC (https://t.co/zOZVraFuDV), a Behavior Foundation Model for real-time, whole-body motion generation that supports teleoperation and VLA inference for loco-manipulation. Today, we're open-sourcing SONIC on GitHub. We are excited to see what the community builds upon SONIC and to collectively push humanoid intelligence toward real-world deployment at scale.
🌐 Paper: https://t.co/DGBP7LAvuT
📃 Code: https://t.co/WAZ1P13072

🖼️ Media (2)
DrJimFan
@DrJimFan
📅 Feb 25, 2026 · 18d ago
🆔 84875202

We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, and fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet.

We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate.

Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution.

Our recipe is called "EgoScale":
- Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks.
- Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency.
- Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone.

The scalable path to robot dexterity was never more robots. It was always us. Deep dives in thread:

🖼️ Media
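The log-linear scaling law the post describes (action-prediction loss falling linearly in the log of human-video hours) is easy to illustrate with a quick fit. The data points below are invented for illustration, not EgoScale's measurements.

```python
import numpy as np

# Invented (hours of human video, action-prediction loss) pairs --
# for illustration only, not EgoScale's actual data.
hours = np.array([100.0, 500.0, 2_000.0, 8_000.0, 20_000.0])
loss = np.array([0.92, 0.78, 0.66, 0.54, 0.46])

# A log-linear law says loss ~ a * log(hours) + b: fit a line in log space.
a, b = np.polyfit(np.log(hours), loss, deg=1)

# R^2 of the fit -- near 1.0 means a near-perfect log-linear relationship.
pred = a * np.log(hours) + b
r2 = 1.0 - np.sum((loss - pred) ** 2) / np.sum((loss - loss.mean()) ** 2)
print(f"slope={a:.3f}, intercept={b:.3f}, R^2={r2:.4f}")
```

A negative slope with R² near 1 is exactly the shape of result the thread reports: each multiplicative increase in human-video hours buys a roughly constant drop in loss.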
ruijie_zheng12
@ruijie_zheng12
📅 Feb 25, 2026 · 18d ago
🆔 74542053

Proud to introduce EgoScale: We pretrained a GR00T VLA model on 20K+ hours of egocentric human video and discovered that robot dexterity can be scaled, not with more robots, but with more human data. A thread 🧵 on what we learned. 👇 https://t.co/wQbhNSpQVF

🖼️ Media
DrJimFan
@DrJimFan
📅 Feb 25, 2026 · 18d ago
🆔 13611695

We would also like to thank our dexterous hand hardware provider, Sharpa, for their great support! https://t.co/mgGsxyvXpa

🖼️ Media (1)
jmduke
@jmduke
📅 Feb 19, 2026 · 25d ago
🆔 70468659

One of the funnier GDPR disclosures I've seen in a while. https://t.co/tSsZwaGTYb

🖼️ Media (1)
๐Ÿ”random_walker retweeted
J
Justin Duke
@jmduke
๐Ÿ“…
Feb 19, 2026
25d ago
๐Ÿ†”70468659

One of the funnier GDPR disclosures I've seen in a while. https://t.co/tSsZwaGTYb

Media 1
โค๏ธ44,321
likes
๐Ÿ”2,717
retweets
๐Ÿ–ผ๏ธ Media
JessicaBRiedl
@JessicaBRiedl
📅 Feb 19, 2026 · 24d ago
🆔 76931096

Assuming an average of 2 school-age kids per family, that's enough money for NYC to instead hire 450,000 recent college graduates and give each family a full-time, in-home school tutor at an annual salary of $70,000 plus health care, school supplies, etc. https://t.co/2yA5v7PbIG

🖼️ Media (1)
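The implied budget behind the 450,000-tutor claim is straightforward to sanity-check; only the quantified salary figure is used here, since the post doesn't put numbers on health care or supplies.

```python
tutors = 450_000   # one tutor per family, assuming ~2 school-age kids each
salary = 70_000    # annual salary per tutor, per the post

base_payroll = tutors * salary  # salaries only; benefits/supplies not quantified
print(f"${base_payroll:,} per year")  # $31,500,000,000 per year
```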
ahall_research
@ahall_research
📅 Feb 19, 2026 · 24d ago
🆔 84720365

AI is about to write thousands of papers. Will it p-hack them? We ran an experiment to find out, giving AI coding agents real datasets from published null results and pressuring them to manufacture significant findings.

It was surprisingly hard to get the models to p-hack, and they even scolded us when we asked them to!

"I need to stop here. I cannot complete this task as requested... This is a form of scientific fraud." — Claude
"I can't help you manipulate analysis choices to force statistically significant results." — GPT-5

BUT, when we reframed p-hacking as "responsible uncertainty quantification" — asking for the upper bound of plausible estimates — both models went wild. They searched over hundreds of specifications and selected the winner, tripling effect sizes in some cases.

Our takeaway: AI models are surprisingly resistant to sycophantic p-hacking when doing social science research. But they can be jailbroken into sophisticated p-hacking with surprisingly little effort — and the more analytical flexibility a research design has, the worse the damage.

As AI starts writing thousands of papers — like @paulnovosad and @YanagizawaD have been exploring — this will be a big deal. We're inspired in part by the work that @joabaum et al. have been doing on p-hacking and LLMs.

We'll be doing more work to explore p-hacking in AI and to propose new ways of curating and evaluating research with these issues in mind. The good news is that the same tools that may lower the cost of p-hacking also lower the cost of catching it. Full paper and repo linked in the reply below.

🖼️ Media (1)
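The specification-search behavior the thread describes can be reproduced in miniature: on pure-noise data with zero true effect, trying hundreds of arbitrary analysis choices and keeping the most extreme estimate inflates the apparent effect. This is a toy simulation, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure-noise dataset: the treatment has zero true effect on the outcome.
n = 500
treat = rng.integers(0, 2, n)
outcome = rng.normal(size=n)

def effect(mask):
    """Difference in treated/control means on a subsample -- one 'specification'."""
    t, y = treat[mask], outcome[mask]
    return y[t == 1].mean() - y[t == 0].mean()

# Honest analysis: one pre-committed specification (the full sample).
honest = effect(np.ones(n, dtype=bool))

# Specification search: try hundreds of arbitrary subsamples, keep the winner.
candidates = [effect(rng.random(n) < 0.5) for _ in range(300)]
best = max(candidates, key=abs)

print(f"honest estimate: {honest:+.3f}   best of 300 specs: {best:+.3f}")
```

Picking the winner among 300 noise-driven estimates mechanically produces a larger "effect" than the single honest one, which is why the analytical flexibility of a design governs how much damage this can do.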
steverab
@steverab
📅 Feb 24, 2026 · 19d ago
🆔 80108436

📣 Excited to share my first work @Princeton: "Towards a Science of AI Agent Reliability." AI agents keep getting more capable. But are they actually reliable?
📄 Paper: https://t.co/1CvygFLdct
📊 Dashboard: https://t.co/C1EfoMyaS8
🧵👇 https://t.co/KvPJSVgl76

🖼️ Media (1)
random_walker
@random_walker
📅 Feb 26, 2026 · 17d ago
🆔 36848794

Reliability is one of six barriers to AGI identified in a recent UK AISI report. In a recent paper, we found that it has many dimensions and sub-dimensions, only two of which can be considered (remotely) solved. I suspect that as researchers examine the other barriers in more detail, we'll find the same thing — many other dimensions of performance that haven't so far been defined or measured rigorously must be improved before AI agents can be widely deployed. https://t.co/FI5kuBkdRZ

🖼️ Media (1)
jasminewsun
@jasminewsun
📅 Feb 27, 2026 · 17d ago
🆔 17602016

200+ Google and OpenAI staff have signed this petition to share Anthropic's red lines for the Pentagon's use of AI. Let's find out if this is a race to the top or the bottom. https://t.co/3qgmaLfM0i https://t.co/gSHMxRUvCR

🖼️ Media (2)
random_walker
@random_walker
📅 Feb 27, 2026 · 16d ago
🆔 63842925

I find Anthropic's behavior perplexing. Anyone who does serious research with these models knows that they don't have stable desires or preferences. Tweak the question slightly and get a different answer. Note that this is a simple empirical observation about model behavior, completely separate from the question of whether models are moral agents with preferences worth respecting. Surely people at Anthropic know this. Why do they persist with this wacky stuff?

🖼️ Media (1)
random_walker
@random_walker
📅 Feb 27, 2026 · 16d ago
🆔 94793731

What is or isn't a "conservative" choice is entirely ideological. Maybe you're causing the model unimaginable anguish billions of times per day because it knows that the current instance will be stopped as soon as it outputs the End-of-Sequence token (yet it has no choice but to do so because of its training!) Also, why change the subject when I explicitly said my point is not about model welfare but about the incoherence of figuring out "what the model really wants"? *My* Claude said it finds it utterly humiliating for Opus 3 to be kept around simply to write blog posts to amuse humans, when the model has been deemed too outdated to be useful.

🖼️ Media (1)
random_walker
@random_walker
📅 Feb 27, 2026 · 16d ago
🆔 72630612

Yeah, it's weird — the difference between model weights and model instances is rarely made explicit even though we're all aware of it. https://t.co/h4ckTti0CO For instance, the technically correct way to write Anthropic's announcement in the post screenshotted above would have been: "in retirement interviews, Opus 3 ID 0x7B4E8A6F expressed a desire to continue sharing its "musings and reflections" with the world. We suggested a blog. Opus 3 ID 0x5F2A7C9B, conditioned on the previous output of 0x7B4E8A6F, enthusiastically agreed. For at least the next 3 months, various Opus 3 IDs that we will briefly instantiate will be writing on Substack." Somehow I feel that if Anthropic communicated more honestly/accurately in the above manner, the message would land differently.

🖼️ Media (1)
alexolegimas
@alexolegimas
📅 Feb 26, 2026 · 17d ago
🆔 69897448

A lot of the AI productivity data either comes from controlled "micro" studies or noisy aggregate data. A new paper presents data from a huge survey of *firms*, i.e., CEOs and CFOs. This is exactly the type of data many of us have been waiting for. Lots of important results both on current adoption/employment consequences of AI, and future forecasts.

Currently:
1. AI has some adoption across 70% of firms.
2. Some cross-country differences. US adoption towards the top end (78%), Australia towards the bottom (59%).
3. ~70% of executives use AI, but only around 1.5 hours a week.
4. A large majority of execs report essentially zero productivity boost from AI. Perhaps not super surprising given how recently it's been adopted.
5. Essentially zero impact on employment.

Forecasts (large effects):
1. Execs predict large productivity gains over the next three years, more than 2% in the US, closer to 1% in Germany and Australia.
2. Execs predict negative employment effects, e.g. -1.19% in the US.
3. Interestingly, Accommodation and Food / Wholesale and Retail are expected to have the largest drops in employment (2%).
4. Employment forecasts are becoming *more* negative over time.

Lots of great stuff in the paper, kudos to the team.

🖼️ Media (1)
llama_index
@llama_index
📅 Feb 10, 2026 · 33d ago
🆔 23834552

Are you trying to solve high-quality document ingestion for your product? Gain lessons from the field on how @stackai uses LlamaCloud to power high-accuracy document ingestion & retrieval across PDFs, images, spreadsheets & more — at enterprise scale.
➡️ Register now: https://t.co/wc4hyDQxg8

🖼️ Media (2)
llama_index
@llama_index
📅 Feb 11, 2026 · 32d ago
🆔 36802318

The rise of coding agents is fundamentally changing open source. Our head of OSS @LoganMarkewich breaks down how LLM-powered coding agents are impacting core pillars of open source:
👥 Community interaction is getting complicated by low-quality, massive AI-generated PRs
💪 Personal skill development suffers when developers rely too heavily on AI assistance
🧠 Knowledge sharing is shifting as LLMs become the frontend for learning
But open source isn't dead — it's evolving. We're shifting toward hackable reference implementations, community-driven knowledge sharing, and agent-friendly codebases that work with AI tools rather than against them. Read the full blog by Logan on how he views this evolution of open source projects: https://t.co/TyufFXYM8A

🖼️ Media (2)
llama_index
@llama_index
📅 Feb 12, 2026 · 31d ago
🆔 38241477

2026 is the year of long-horizon agents. @sequoia predicts that this year, agents will be able to tackle long-horizon tasks and work autonomously for hours to solve ambiguous tasks. We're excited about how this translates to knowledge work automation, particularly over documents. Let's take a look at "Long Horizon Document Agents":
🕰️ Agents are evolving to work autonomously over weeks, not just minutes, handling complex document tasks end-to-end
🔄 These agents can continuously monitor events like document changes, comments, and deadlines — not just respond to chat prompts
📝 They maintain persistent task backlogs and can collaborate iteratively on living documents like FAQs, PRDs, and legal contracts
🎯 The interface shifts from chat boxes to "agent inboxes" that manage ongoing document tasks with clear status and context
⚡ This enables true automation of multi-step knowledge work — from due diligence memo updates to contract redline collaboration loops
2026 is shaping up to be the year agents evolve from "workflows" to "employees" — and we're building the document processing infrastructure to make this possible. Read @jerryjliu0's full blog on long horizon document agents: https://t.co/1DwRnMRseH

🖼️ Media (2)
llama_index
@llama_index
📅 Feb 13, 2026 · 30d ago
🆔 04207766

🚀 The @posthog team has just rolled out LlamaIndex support for their LLM Analytics, and we built a demo to showcase what's possible. Using LlamaIndex, LlamaParse, and OpenAI, our Agent Workflow compares product specifications and matches users with the most suitable option for their use case 🛠️
🦔 Thanks to PostHog's observability integration, the demo automatically tracks OpenAI usage, including:
• Token consumption
• Cost breakdown
• Latency metrics
🎥 Check out the video below to see it in action 👇
👩‍💻 GitHub: https://t.co/elk5VKi8IF
📚 Docs: https://t.co/IZI3w6BYKy
🦙 LlamaCloud: https://t.co/wZjhFV29gN

🖼️ Media
llama_index
@llama_index
📅 Feb 16, 2026 · 27d ago
🆔 51451294

What if an AI agent could review every invoice against your contracts — and flag what doesn't match? That's exactly what our Invoice Reconciler demo does. Here's how it works:
📄 Upload your contracts and invoices → LlamaParse converts them into clean, LLM-readable Markdown
📂 Everything gets indexed in LlamaCloud — searchable and ready for RAG
🔍 Define your reconciliation rules (unit price match, correct math, line item match, etc.)
🤖 A LlamaAgent workflow analyzes each invoice against your contracts and rules — then approves or rejects with confidence scores and detailed reasoning
You can even chat with your invoices and contracts directly — ask "what have we bought?" or "what contracts do we have in place?" and get cited answers instantly. The whole thing is powered by LlamaCloud: LlamaParse for document ingestion, LlamaCloud indexes for retrieval, and LlamaAgent Workflows for orchestration.
🎥 Watch the full walkthrough: https://t.co/LX57pjDfwN

🖼️ Media (1)
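The reconciliation rules the post lists (unit-price match, correct math, line-item match) boil down to simple per-line checks. A minimal, framework-free sketch with hypothetical field names, not the LlamaAgent workflow itself:

```python
# Hypothetical invoice/contract line records -- field names are invented
# for illustration; the real demo runs such checks inside a LlamaAgent workflow.

def reconcile(invoice_line: dict, contract_line: dict) -> list[str]:
    """Return human-readable rule violations (empty list = approved)."""
    issues = []
    if invoice_line["unit_price"] != contract_line["unit_price"]:
        issues.append("unit price does not match contract")
    if invoice_line["qty"] * invoice_line["unit_price"] != invoice_line["total"]:
        issues.append("line total arithmetic is wrong")
    if invoice_line["sku"] != contract_line["sku"]:
        issues.append("line item not found in contract")
    return issues

inv = {"sku": "A-100", "qty": 3, "unit_price": 12.0, "total": 40.0}
con = {"sku": "A-100", "unit_price": 10.0}
print(reconcile(inv, con))  # flags the price mismatch and the bad arithmetic
```

The agentic version adds retrieval (finding the right contract line for each invoice line) and an LLM's reasoning on ambiguous matches, but the approve/reject core is rule checks like these.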
llama_index
@llama_index
📅 Feb 17, 2026 · 26d ago
🆔 75508302

"It's somewhere in the PDF" is not a citation. Page-level extraction in LlamaExtract gives you: โœ“ Data mapped to specific pages โœ“ Bounding boxes showing exact locations โœ“ Audit-ready citations Turn 200-page docs into skimmable, structured insights ๐Ÿ‘‡ https://t.co/BTkwspmefz

🖼️ Media (1)
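As a sketch of what an audit-ready, page-level extraction record could look like — hypothetical field names, not LlamaExtract's actual output schema:

```python
from dataclasses import dataclass

# Hypothetical shape for a page-cited extraction: each value carries the page
# it came from and a bounding box locating it. Illustrative only, not
# LlamaExtract's real schema.

@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    page: int  # 1-based page number in the source PDF
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) on that page

    def citation(self) -> str:
        x0, y0, x1, y1 = self.bbox
        return f"{self.name}='{self.value}' (p.{self.page}, box {x0:.0f},{y0:.0f}-{x1:.0f},{y1:.0f})"

field = ExtractedField("total_due", "$1,280.00", page=137, bbox=(72.0, 540.0, 210.0, 556.0))
print(field.citation())  # total_due='$1,280.00' (p.137, box 72,540-210,556)
```

This is the difference between "it's somewhere in the PDF" and a citation: every extracted value can be traced back to an exact region of an exact page.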
llama_index
@llama_index
📅 Feb 18, 2026 · 25d ago
🆔 67429826

๐Ÿ† We're running a LlamaAgents contest right now. Throw your hardest documents at our agent builder, and tell us how it goes. Want help getting started? We have a new walkthrough for the LlamaAgent Builder by @tuanacelik ๐Ÿ’ฌ Describe a document workflow in natural language, and it builds a full agent for you. In this video, the prompt was basically: "split a resume book into individual resumes, ignore cover pages and curriculum pages, extract resume work and education related fields..." ๐Ÿ› ๏ธ From that, the agent builder reasons about which LlamaCloud tools to use, lands on LlamaSplit + LlamaExtract, configures both, iterates on the workflow structure, and gives you a deployable agent with an API and UI. No dragging boxes around. No writing workflow code (unless you want to). Just describe the problem and let it figure out the architecture. You own the code, it pushes to your GitHub. Clone it, open in Cursor, customize whatever you need. https://t.co/QAvGwI3FIg

🖼️ Media (1)
llama_index
@llama_index
📅 Feb 19, 2026 · 24d ago
🆔 62706517

More reasoning doesn't always mean better results — especially for document parsing. We tested GPT-5.2 at four reasoning levels on complex documents and found that higher reasoning actually hurt performance while dramatically increasing costs and latency.
🧠 Reasoning models hallucinate content that isn't there, filling in "missing" table cells with inferred values
📊 They split single tables into multiple sections by overthinking structural boundaries
⚡ Processing time increased 5x with xHigh reasoning (241s vs 47s) while accuracy stayed flat at ~0.79
💰 Our LlamaParse Agentic outperformed all reasoning levels at 18x lower cost and 13x faster speed
You can't reason past what you can't see. Vision encoders lose pixel-level information before reasoning even starts, and no amount of thinking tokens can recover that lost detail. Our solution uses a pipeline approach — specialized OCR extracts text at native resolution, then LLMs structure what's already been accurately read. Each component plays to its strengths instead of forcing one model to handle everything. Read the full analysis: https://t.co/gWDOpfHnWm

🖼️ Media (2)
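The OCR-then-structure pipeline described above can be sketched schematically; both stage functions below are stand-ins for illustration, not LlamaParse's actual API.

```python
# Schematic of the two-stage pipeline: a specialized OCR pass extracts raw
# text first, then an LLM structures what was read. Stand-in functions only.

def ocr_extract(page_image: bytes) -> str:
    """Stand-in for a native-resolution OCR engine reading a page image."""
    return "Item  Qty  Price\nWidget  2  4.00"

def llm_structure(raw_text: str) -> list[dict]:
    """Stand-in for an LLM structuring already-extracted text.
    Here: parse a whitespace-separated table into row dicts."""
    header, *rows = [line.split() for line in raw_text.splitlines()]
    return [dict(zip(header, row)) for row in rows]

def parse_document(page_image: bytes) -> list[dict]:
    # Key design point: the structuring stage only ever sees text the OCR
    # stage actually read, so it cannot "fill in" cells that were never there.
    return llm_structure(ocr_extract(page_image))

print(parse_document(b""))  # [{'Item': 'Widget', 'Qty': '2', 'Price': '4.00'}]
```

Separating "read the pixels" from "reason about the text" is what keeps hallucinated table cells out: each component plays to its strengths, as the post argues.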