Your curated collection of saved posts and media
@karpathy https://t.co/kaZwaiAhn6
The AI Consumer Index (ACE). Most AI benchmarks today focus on reasoning and coding. But most people use AI to shop, cook, and plan their weekends. In those domains, LLM hallucinations continue to be a real problem. 73% of ChatGPT messages (according to a recent report) are now non-work-related. Consumers are using AI for everyday tasks, and we have no systematic way to measure how well models perform on them. This new research introduces ACE (AI Consumer Index), a benchmark assessing whether frontier models can perform high-value consumer tasks across shopping, food, gaming, and DIY. Consumer tasks require grounding in real-world information. A model that hallucinates a product price or provides a dead link isn't just wrong, it's actively unhelpful. ACE's grading methodology dynamically checks whether responses are grounded in retrieved web sources, penalizing hallucinations with negative scores. The results expose a substantial gap: GPT-5 (Thinking = High) leads at 56.1%, followed by o3 Pro at 55.2%. The best model scores only 45.4% on Shopping. Models frequently hallucinate prices and product features, scoring negative on grounded criteria. The study found that on "Provides link(s)" in Shopping, Gemini 3 Pro scores -54%. That's not just failing to provide links, it's confidently providing dead or fabricated ones. Other models like Opus 4.5 face similar issues. All of these issues can be improved with multi-agent systems, but it's important to be aware of them first. The benchmark includes 400 hidden test cases created by 47 domain experts. Each case has fine-grained rubrics distinguishing whether failures come from not meeting requirements versus hallucinating information. Paper: https://t.co/VBSBCJMFHQ ACE reveals the gap between benchmark performance and real-world utility.
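The post does not give ACE's actual scoring formula, so the following is only a hypothetical sketch of how a rubric could reward met criteria and penalize hallucinated ones with a negative score. The `Criterion` fields, the +1/0/-1 values, and the averaging are all assumptions for illustration, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item for a single response (hypothetical structure)."""
    name: str
    grounded: bool   # True if the claim must be backed by a retrieved source
    met: bool        # True if the response addresses the criterion
    supported: bool  # True if the claim was found in the retrieved sources


def score_criterion(c: Criterion) -> int:
    """Assumed scheme: +1 met, 0 not met, -1 met but unsupported (hallucinated)."""
    if not c.met:
        return 0
    if c.grounded and not c.supported:
        return -1
    return 1


def score_response(criteria: list[Criterion]) -> float:
    """Average the per-criterion scores; the result lies in [-1.0, 1.0]."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    return sum(score_criterion(c) for c in criteria) / len(criteria)


rubric = [
    Criterion("States price", grounded=True, met=True, supported=True),
    Criterion("Provides link(s)", grounded=True, met=True, supported=False),
    Criterion("Stays within budget", grounded=False, met=True, supported=False),
    Criterion("Mentions warranty", grounded=False, met=False, supported=False),
]
print(score_response(rubric))
```

Under this assumed scheme, a model that always supplies links that cannot be verified would average below zero on "Provides link(s)", which is the kind of result the post describes for negative scores.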
Join us for AI Dev Days this week! Day 2 of the event, on December 11th, is all about enhancing developer productivity with AI - we've got some really great sessions from the @code team that you don't want to miss. Learn more at our blog: https://t.co/9s8SNUBbT7
For the first time in six years, MIRI is running a fundraiser. Our target is $6M. Please consider supporting our efforts to alert the world, and identify solutions, to the danger of artificial superintelligence. SFF will match the first $1.6M! https://t.co/EWNoIKsHnB
#NativeAmerican #native #nativetwitter #cloutmma3 #เนเธเธตเธขเธฃเนเนเธเธญเธฐเธงเธญเธขเธเน #OTDirecto29D #afuaasantewaasingathon #sueperkupa #NewYearsHonours https://t.co/z1SWoZu1o2

Love wins holy shit #rayfrog #bullfrog #Ramon https://t.co/G6H1iNTiYc

Gay old men canvas no way #charpim #rayfrog #pongorma #dedusmuln #giroro #dororo https://t.co/9a2JhGti0S

This July 4th, we contemplate parallels between the colonization of Turtle Island ("North America") and Palestine. Supporting Palestinians' right to return and right to self-determination in their homeland goes hand in hand with supporting Indigenous people's demand for #LandBack 🧵
Pretty in pink? Want to join us? Repost it as much as you can. https://t.co/iYaxIbEWUY https://t.co/QDMVqIISWb

ใใญใใฐๅๅ ใๅทฆ่ค็ฉบๆฐใใใใคใฉในใใใใใซใ ๆถผๆฃฎ็ๅธใ ๅๅๅถไฝไบๅฎ๏ผ #wf2019w #ใใคใใฃใ https://t.co/R7UVXJ3hQD

This is awesome!!! Thank you @Algorand @AlgoFoundation for the shout out!!!! https://t.co/r73QC8FTBc

https://t.co/LbokDLyLnt.frens Saturday learning and exploring. Some outputs I like below https://t.co/3SAG0v4wrJ
Shout out and big thank you to @nft_highmali for collecting both collabs between me and @Gogolitus! Instant full set 'Floppy risk' https://t.co/QfxAQG10Xb https://t.co/vhqlzxbhHg

rฬตฬคฬฃออฬนฬบฬฬฬฬฬฬอฤฬทฬชฬฅฬฬฒฬณฬฏฬฐฬอฬออฬพฬฬฬออ bฬตอฬญฬฉอฬฬฬรถฬถฬฐฬปอฬฬบฬซฬฬฬoฬดฬขฬฬฬออฬ ฬออฬออฬอ แบฬถฬกฬกออฬ ฬปฬฬ ฬ ฬฬ โ ๏ธFlash warningsโ ๏ธ https://t.co/xYEi0559fp

behind the scenes - the making of natives font https://t.co/OSUKvsBmzH

Integrating LLMs with knowledge bases. Important read for AI practitioners. LLMs generate impressive text but struggle with hallucinations, outdated knowledge, and reasoning over structured data. The default response has been scaling up (more parameters, more compute, more cost). But bigger models don't solve the fundamental problem: LLMs lack reliable access to external, verifiable knowledge. This new survey examines how RAG, Knowledge Graphs, and hybrid approaches address these limitations. The key insight is that integration happens at three levels: Level 1 focuses on retrieval, getting the right information into the model; Level 2 addresses reasoning, synthesizing retrieved knowledge for complex tasks; Level 3 handles optimization, adapting systems for domain-specific needs. KAG showed 19.1% exact match improvement over basic RAG on HotpotQA. Think-on-Graph achieved significant accuracy gains over Chain-of-Thought on complex QA. The practical applications span finance, medicine, and code generation. FinAgent combines RAG with reinforcement learning for trading decisions. UMLS integration improves diagnostic accuracy in medical AI. Codex leverages retrieval to enhance code generation quality. Open challenges remain: knowledge drift requires continuous updates, domain-specific representations don't always align with LLM embeddings, and standardized evaluation benchmarks are still lacking. The path to reliable LLMs isn't just scale. It's thoughtful integration with structured knowledge that provides factual grounding and enables complex reasoning. Paper: https://t.co/vl8ZPf4ncA Learn to build RAG and AI agents in our academy: https://t.co/zQXQt0PMbG
🦸 1Wrk's SuperApp, built on #AWS, is helping businesses hire faster & manage talent more intelligently. Discover how AWS is providing a "backbone" for #generativeAI innovation & helping the #startup scale: https://t.co/hxJI58TMJ9 https://t.co/8Ij87Ipbpb
@DeepgramAI announces integration with key #AWS solutions at #AWSreInvent. Accessing advanced voice #AI models & capabilities within their existing #AWS environments will enable customers to build, deploy & scale their applications faster & more securely. https://t.co/FiXYywbF0s
#awsreinvent is back & Day 1 is done! https://t.co/PZYZLKBKHV We met the next generation of leaders, talked about the latest trends in #AI & celebrated opening night in style with the global cloud community. What a day for #startups! https://t.co/XMhbIyRxzI
Applications for the 2026 Physical AI Fellowship are now open! https://t.co/Wvje7yrqsy At #awsreinvent, @MassRobotics & @nvidia announce the second edition of the 8-week program designed to help robotics & physical #AI startups from around the world scale faster & smarter.
@twelve_labs launches its most powerful video understanding model at #awsreinvent. https://t.co/jEHEFPJYQe Marengo 3.0 "shatters the limits of what's possible" for developers & enterprise, enabling them to search, navigate & understand video content at scale. https://t.co/ogx3CtxAJa
Networking, exploring the future with Matt Garman & celebrating success: what a fantastic day at #awsreinvent! The energy was electric, with Day 2 bringing innovation & insights. Keep it up, cloud community! https://t.co/KzKSpjirk6 https://t.co/NaIVNBZBXA
At #awsreinvent, @BrainBoxAI announces the expansion of its AI pilot for sustainability, with #AWS. The project will scale across the US, improving energy efficiency & decarbonization at #Amazon fulfillment sites. https://t.co/5cBry2jJ4J https://t.co/ZcZNEZeytD
Using AI & machine learning to "unmix" audio recordings, @AudioShake makes audio workflows clear & easy to control. https://t.co/tEDUnJIANA Backed by #AWS infrastructure, AudioShake is able to process large workloads while staying flexible & fast. https://t.co/xG4ymlPPQ6
@Get_Writer launches new agent supervision & orchestration tools with #AWS. At #awsreinvent, the startup announced its integration with #AmazonBedrock, enabling enterprises to easily build secure agents & get wider access to #AI models. https://t.co/cdLnaXoHm0 https://t.co/eKx7VYsl9p
Day 3 of #awsreinvent was buzzing with AI insights & our VP of Startups Jason Bennett shares the highlights. From a showcase of bold ideas to a keynote from VP of Agentic #AI @SwamiSivasubram, we saw the future of transformative tech. https://t.co/SYabp5Nvmd https://t.co/FdFScnBcrw
@agilityrobotics is redefining the warehouse workforce with its humanoid robot, Digit. Discover how the startup continuously optimizes & scales its AI model training through #AWS Cloud, driving both productivity & wellness. https://t.co/jaQ7JmSLZf https://t.co/jFQFuCIbp4
#awsreinvent: what a week! Future leaders pitched bold ideas, #startups built global connections, investors shared insights & pioneers unpacked the next era of #AI. A huge thank you to the amazing cloud community. See you all next year! https://t.co/uepoI953f6 https://t.co/x4z0RCnDvL
@FPLGOAT7 Ranked 4,000 GW1, currently 49k. Any suggestions? https://t.co/EmCYm2yftH

Great overview of Nvidia open source goodies for researchers @NaderLikeLadder https://t.co/VMjG3DhzJD

I'm at NeurIPS! Giving a talk today on state of OSS & why NVIDIA spends millions of GPU hours training models just to give it all away: datasets, recipes, weights, architecture. 545pm @ Exhibit Hall A,B https://t.co/G3SBf9iHr8